github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Roman Khavronenko	e9ee043879	lib/storage: make `indexdb/tagFilters` cache size configurable (#2667) The default size of `indexdb/tagFilters` can now be overridden via the `storage.cacheSizeIndexDBTagFilters` flag. Be careful when changing the default size, since it may lead to inefficient operation of vmstorage or OOM errors. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2663 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2022-06-01 14:57:39 +03:00
Aliaksandr Valialkin	afced37c0b	all: add initial support for query tracing See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#query-tracing Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403	2022-06-01 02:31:44 +03:00
Aliaksandr Valialkin	38beb9fe04	lib/storage: add ability to change the indexdb rotation time offset with -retentionTimezoneOffset command-line flag This is a follow-up for `0fbf59199a` See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2574	2022-05-25 16:07:14 +03:00
Aliaksandr Valialkin	e961aec551	app/vmstorage: do not allow to set -retentionPeriod smaller than one day VictoriaMetrics doesn't support retention periods smaller than one day, so do not allow to set it to small values. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2496	2022-05-07 00:54:42 +03:00
Aliaksandr Valialkin	925fa9a7de	docs/Cluster-VictoriaMetrics.md: make the description for `-rpc.disableCompression` command-line flag more clear	2022-05-06 16:24:56 +03:00
Roman Khavronenko	c41ae2db2c	vmstorage: switch to rich duration parser for flag `snapshotsMaxAge` (#2542) The switch is supposed to allow setting `d`, `w`, `y` duration units. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-05-05 21:13:55 +03:00
Aliaksandr Valialkin	361b08c30e	lib/storage: leave the last sample per each discrete interval during the deduplication This aligns better with staleness logic in Prometheus - https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness	2022-05-02 21:59:31 +03:00
Artem Navoiev	11db05a4ff	lib/{storage,flagutil} - Add option for snapshot autoremoval (#2487) * lib/{storage,flagutil} - Add option for snapshot autoremoval - add prometheus-like duration as command flag - add option to delete stale snapshots - update duration.go flag to re-use own code * wip * lib/flagutil: re-use Duration.Set() call in NewDuration * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-05-02 11:24:12 +03:00
Aliaksandr Valialkin	a436836402	lib/flagutil: re-use Duration.Set() call in NewDuration	2022-05-02 10:58:08 +03:00
Aliaksandr Valialkin	ed1b394a1a	app/vmstorage: expose `vm_indexdb_items_added_total` and `vm_indexdb_items_added_size_bytes_total` counters at `/metrics` page These counters can be used for monitoring the rate of addition of new entries in indexdb (aka inverted index). See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2471	2022-04-21 13:19:42 +03:00
Aliaksandr Valialkin	81b7a31cb1	app/vmstorage: properly handle `maxSeries` limit passed from vmselect to vmstorage	2022-04-12 11:19:07 +03:00
Aliaksandr Valialkin	a4a15a462b	app/vmselect/netstorage: bump RPC API versions for vmselect->vmstorage communications This is a follow-up after `b843f0e229`	2022-04-08 12:36:04 +03:00
Nikolay	4cf6219e07	lib/{storage,regexpcache}: replaces regexpCacheMap with LRU cache (#2293) * lib/{storage,regexpcache}: replaces regexpCacheMap with LRU cache It should decrease memory usage for regexp caching by storing cacheEntry by pointer - golang map should be able to effectively shrink its size. Original issue with this case - unexpected map growth and storage OOM. Apply suggestions from code review Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> Adds missing metrics for regexp cache and regexpPrefixes cache * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-03-26 12:57:27 +02:00
Aliaksandr Valialkin	b843f0e229	app/vmselect: add fine-grained limits for the number of returned/scanned time series for various APIs	2022-03-26 11:28:14 +02:00
Aliaksandr Valialkin	698458b742	lib/httpserver: extract the code responsible for initializing server-side TLS config into netutil.GetServerTLSConfig	2022-03-17 19:46:20 +02:00
Roman Khavronenko	bd7837d524	lib: allow to configure cache size by type (#2206) * lib: allow to configure cache size by type https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1940 Signed-off-by: hagen1778 <roman@victoriametrics.com> * Apply suggestions from code review * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-21 13:55:51 +02:00
Roman Khavronenko	d107f86fbc	lib/index: reduce read/write load after indexDB rotation (#2177) * lib/index: reduce read/write load after indexDB rotation IndexDB in VM is responsible for storing TSIDs - IDs used for identifying time series. The index is stored on disk and used by both the ingestion and the read path. IndexDB is stored separately from data parts and is global for all stored data. It can't be deleted partially as VM deletes data parts. Instead, indexDB is rotated once per `retention` interval. The rotation procedure means that the `current` indexDB becomes `previous`, and a freshly created indexDB struct becomes `current`. So at any time, VM holds indexDB for the current and previous retention periods. When a time series is ingested or queried, VM checks if its TSID is present in the `current` indexDB. If it is missing, it checks the `previous` indexDB. If the TSID was found, it gets copied to the `current` indexDB. In this way the `current` indexDB stores only series which were active during the retention period. To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both the write and the read path consult `tsidCache`, and on a miss the real lookup happens. When rotation happens, VM resets the `tsidCache`. This is needed for the ingestion path to trigger `current` indexDB re-population. Since index re-population requires additional resources, every index rotation event may cause some extra load on CPU and disk. While it may be unnoticeable in most cases, for systems with a very high number of unique series each rotation may lead to performance degradation for some period of time. This PR makes an attempt to smooth out resource usage after the rotation. The changes are the following: 1. `tsidCache` is no longer reset after the rotation; 2. Instead, each entry in `tsidCache` gains a notion of the indexDB to which it belongs; 3. On the ingestion path after the rotation we check if the requested TSID was found in `tsidCache`. Then we have 3 branches: 3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID. 3.2 Slow path. It wasn't found, so we generate it from scratch, add it to the `current` indexDB, and add it to `tsidCache`. 3.3 Smooth path. It was found but does not belong to the `current` indexDB. In this case, we add it to the `current` indexDB with some probability. The probability is based on the time passed since the last rotation, with some threshold. The more time has passed since rotation, the higher the chance to re-populate the `current` indexDB. The default re-population interval in this PR is set to `1h`, during which entries from the `previous` index are supposed to slowly re-populate the `current` index. The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs were moved from the `previous` indexDB to the `current` indexDB. This metric is supposed to grow only during the first `1h` after the last rotation. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-12 00:34:44 +02:00
Aliaksandr Valialkin	34d14c4940	all: substitute zeroTime with time.Time{}, since this generates more optimal binary code	2022-02-07 14:36:41 +02:00
Aliaksandr Valialkin	5f266370c5	all: follow-up after `4bdd10ab90` Properly use new bytesutil.Resize* functions	2022-02-01 17:49:28 +02:00
Aliaksandr Valialkin	6232eaa938	lib/bytesutil: split Resize() into ResizeNoCopy() and ResizeWithCopy() functions Previously bytesutil.Resize() was copying the original byte slice contents to a newly allocated slice. This wasted CPU cycles and memory bandwidth in some places, where the original slice contents weren't needed after slice resizing. Switch such places to bytesutil.ResizeNoCopy(). Rename the original bytesutil.Resize() function to bytesutil.ResizeWithCopy() for the sake of improved readability. Additionally, allocate new slice with `make()` instead of `append()`. This guarantees that the capacity of the allocated slice exactly matches the requested size. The `append()` could return a slice with bigger capacity as an optimization for further `append()` calls. This could result in excess memory usage when the returned byte slice was cached (for instance, in lib/blockcache). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-01-25 15:28:42 +02:00
Aliaksandr Valialkin	6ae584b9b3	lib/{mergeset,storage}: properly limit cache sizes for indexdb Previously these caches could exceed limits set via `-memory.allowedPercent` and/or `-memory.allowedBytes`, since limits were set independently per each data part. If the number of data parts was big, then limits could be exceeded, which could result in out of memory errors. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-01-20 18:45:03 +02:00
Aliaksandr Valialkin	cdfe854c9b	lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge() This improves the code readability and debuggability, since the output of these functions stops depending on global state.	2021-12-14 20:52:29 +02:00
Aliaksandr Valialkin	ab4be24397	app/vmstorage: export vm_cache_size_max_bytes metrics for determining capacity of various caches The vm_cache_size_max_bytes metric can be used for determining caches which reach their capacity via the following query: vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9	2021-12-02 10:30:01 +02:00
Aliaksandr Valialkin	4fb19fe34b	all: consistently return `application/json` content-type without `charset=utf-8` The `application/json` content-type has utf-8 encoding by default. See https://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897	2021-11-09 18:07:22 +02:00
Aliaksandr Valialkin	4fddcf4c83	app/{vminsert,vmstorage}: follow-up after `a171916ef5` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269	2021-10-08 14:09:51 +03:00
Nikolay	a171916ef5	Adds read-only mode for vmstorage node (#1680) * adds read-only mode for vmstorage https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269 * changes order a bit * moves isFreeDiskLimitReached var to storage struct renames functions to be consistent change protoparser api - with optional storage limit check for given opened storage * renames freeSpaceLimit to ReadOnly	2021-10-08 12:52:56 +03:00
Aliaksandr Valialkin	c2f37f049b	lib/storage: properly search series by multiple tag filters matching empty labels such as foo{bar=~"baz|",x=~"y|"} Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1601 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/395	2021-09-09 21:12:53 +03:00
Aliaksandr Valialkin	c473d8ffe1	lib/storage: re-use the per-day inverted index search code for searching in global index This allows removing a big pile of outdated code for global index search. This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486	2021-07-30 10:28:20 +03:00
Aliaksandr Valialkin	49bf3abf67	app/vmselect: follow-up for `626073bca8` * Rename -search.maxMetricsPointSearch to -search.maxSamplesPerQuery, so it is more consistent with the existing -search.maxSamplesPerSeries * Move the -search.maxSamplesPerQuery from vmstorage to vmselect, so it could effectively limit the number of raw samples obtained from all the vmstorage nodes * Document the -search.maxSamplesPerQuery in docs/CHANGELOG.md	2021-07-28 18:00:04 +03:00
匠心零度	626073bca8	protect vmselect: avoid too many metric points, which make vmselect CPU load very high (#1478) * protection vmselect…… * protection vmselect…… * protection vmselect…… * All checks have failed, fix Co-authored-by: lirenzuo <lirenzuo@shein.com>	2021-07-28 14:39:35 +03:00
Aliaksandr Valialkin	22c6e64bbc	lib/storage: consistency renaming: tagCache -> tagFiltersCache This improves code readability	2021-07-06 11:03:30 +03:00
Aliaksandr Valialkin	44855f0c9b	app/{vmselect,vmstorage}: clarify the description for `-dedup.minScrapeInterval` command-line flag Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1426	2021-07-02 15:06:41 +03:00
Aliaksandr Valialkin	165a9f9200	app/vmstorage: add ability to limit series cardinality via `-storage.maxHourlySeries` and `-storage.maxDailySeries` command-line flags	2021-05-20 15:31:57 +03:00
Aliaksandr Valialkin	2839055513	lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss	2021-05-13 09:01:05 +03:00
Nikolay	be87be34a4	Adds tsdb match filters (#1282) * init work on filters * init propose for status filters * fixes tsdb status adds test * fix bug * removes checks from test	2021-05-12 17:16:58 +03:00
Aliaksandr Valialkin	9c505d27dd	lib/ingestserver: properly close incoming connections during graceful shutdown	2021-05-08 19:53:45 +03:00
Aliaksandr Valialkin	4a5f45c77e	app/vminsert: add support for data ingestion via other vminsert nodes	2021-05-08 19:53:45 +03:00
Aliaksandr Valialkin	6dc5d3b357	all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com	2021-04-20 20:20:01 +03:00
Aliaksandr Valialkin	0f7ece84f3	app/vmstorage/transport: reduce memory allocations on data ingestion path	2021-04-10 17:36:00 +03:00
Aliaksandr Valialkin	2585058a5f	Makefile: prepare `arm64` and `amd64` release archives for cluster version on `make release` command	2021-04-05 23:01:45 +03:00
Aliaksandr Valialkin	4028d692f5	app: do not process non-GET requests at `/` handler	2021-04-02 22:56:38 +03:00
Aliaksandr Valialkin	512addc608	app/{vminsert,vmagent}: add `-sortLabels` command-line option for sorting time series labels before ingesting them in the storage This option can be useful when samples for the same time series are ingested with distinct order of labels. For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.	2021-03-31 23:27:21 +03:00
Aliaksandr Valialkin	ae1c653d55	lib/storage: reduce memory usage when ingesting samples for the same time series with distinct order of labels	2021-03-31 21:22:40 +03:00
Aliaksandr Valialkin	8ef1184adf	app/vmstorage: add `vm_index_search_duration_seconds` histogram for monitoring the performance of index search	2021-03-17 01:13:15 +02:00
Aliaksandr Valialkin	d074326970	app/vmstorage: add `-logNewSeries` command-line flag for determining the source of series churn rate	2021-03-15 22:40:28 +02:00
Aliaksandr Valialkin	c67a07b469	lib/handshake: log read/write operation duration on connection errors This improves debuggability of network errors	2021-03-02 21:20:20 +02:00
Aliaksandr Valialkin	83da939947	app/vmstorage: export vm_composite_filter_success_conversions_total and vm_composite_filter_missing_conversions_total metrics	2021-02-17 19:13:49 +02:00
Aliaksandr Valialkin	c769f8321d	deployment/docker: embed tzdata into prod Go app instead of installing it into base docker image While this increases app size by 700Kb, this allows using -loggerTimezone in a scratch base image See https://github.com/golang/go/issues/38017	2021-02-12 04:56:27 +02:00
Aliaksandr Valialkin	ff7850aec0	deployment/docker: use `docker buildx` for creating multiarch builds See https://github.com/docker/buildx/	2021-02-12 04:35:35 +02:00
Aliaksandr Valialkin	08f21d8761	app/vmstorage: export vm_composite_index_min_timestamp metric	2021-02-10 17:14:00 +02:00
Aliaksandr Valialkin	148422bcba	lib/storage: disable composite index usage when querying old data	2021-02-10 14:57:58 +02:00
Aliaksandr Valialkin	fa0ef143b1	lib/storage: optimize search by label filters matching big number of time series	2021-02-10 00:46:17 +02:00
Aliaksandr Valialkin	e8ee9fa7fe	app/vmstorage: export missing `vm_cache_size_bytes` metrics for indexdb and data caches	2021-02-09 00:49:58 +02:00
Aliaksandr Valialkin	4b930b9ffe	app/vmselect: add ability to set Graphite-compatible filter via `{__graphite__="foo.*.bar"}` syntax	2021-02-03 01:17:19 +02:00
Aliaksandr Valialkin	d5a2b120e9	app/vmstorage: disable final merge by default, since it may result in high disk IO and CPU usage without measurable benefits such as increased query performance and reduced disk space usage	2021-01-08 00:12:12 +02:00
Aliaksandr Valialkin	a2eb451de4	app/{vmagent,vminsert}: follow-up for `ce8c2dd1f1`: return `/targets` page in HTML when requested via web browser	2020-12-14 14:13:01 +02:00
Aliaksandr Valialkin	1a237c6903	all: properly handle CPU limits set on the host system/container This can reduce memory usage on systems with enabled CPU limits. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/946	2020-12-08 21:07:03 +02:00
Aliaksandr Valialkin	bdac2171f1	all: do not print usage info for all the flags when incorrect command-line flag is passed This should improve usability for VictoriaMetrics apps that have big number of command-line flags, i.e. all the apps.	2020-12-03 21:46:19 +02:00
Aliaksandr Valialkin	433ae806ac	app/vmselect: implement `/tags/tagSeries` and `/tags/tagMultiSeries` in order to be consistent with single-node VictoriaMetrics	2020-11-23 14:57:08 +02:00
Aliaksandr Valialkin	7d76fdedcc	app/vmselect: use storage.NewSearchQuery() instead of constructing storage.SearchQuery in-place This should prevent from bugs when AccountID and ProjectID aren't set in storage.SearchQuery.	2020-11-16 18:04:33 +02:00
Aliaksandr Valialkin	eea1be0d5c	app/vmselect/graphite: add /tags/findSeries handler from Graphite Tags API See https://graphite.readthedocs.io/en/stable/tags.html#exploring-tags	2020-11-16 12:52:23 +02:00
Aliaksandr Valialkin	7ceaf4ba8f	all: consistently return text-based HTTP responses with charset=utf-8 This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897	2020-11-13 10:30:21 +02:00
immerrr again	1ec1a9f27f	app/vmstorage: add "/internal/force_flush" endpoint (#893)	2020-11-11 14:46:37 +02:00
Aliaksandr Valialkin	767231f41f	app/vmstorage/transport: properly handle request to labelValuesOnTimeRange	2020-11-05 02:08:04 +02:00
Aliaksandr Valialkin	c5e6c5f5a6	app/vmselect: optimize querying for `/api/v1/labels` and `/api/v1/label/<name>/values` when `start` and `end` args are set	2020-11-05 01:19:29 +02:00
Aliaksandr Valialkin	9c5cd5a6c5	lib/storage: code cleanup after `5bfd4e6218`	2020-10-20 16:10:53 +03:00
Aliaksandr Valialkin	0db7c2b500	app/vmstorage: support for `-retentionPeriod` smaller than one month Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/173 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/17	2020-10-20 14:42:46 +03:00
Aliaksandr Valialkin	d2e917d1cb	app/vmstorage: add `vm_rows_added_to_storage_total` metric, which shows the total number of rows added to storage since app start	2020-10-09 13:36:17 +03:00
Aliaksandr Valialkin	b51fa16177	app/vmstorage: add `-finalMergeDelay` command-line flag for configuring the delay before final merge for per-month partitions after no new data is ingested to it	2020-10-07 17:42:31 +03:00
Aliaksandr Valialkin	abfd3a8fab	app/{vminsert,vmselect,vmstorage}: add a link to https://victoriametrics.github.io/Cluster-VictoriaMetrics.html from main page of every cluster component	2020-10-06 15:30:07 +03:00
Aliaksandr Valialkin	fd7dd5064a	lib/storage: code cleanup after `10f2eedee0` Remove the code that uses metricIDs caches for the current and the previous hour during metricIDs search, since this code became unused after implementing per-day inverted index almost a year ago. While at it, fix a bug, which could prevent from finding time series with names containing dots (aka Graphite-like names such as `foo.bar.baz`).	2020-10-01 19:12:04 +03:00
Aliaksandr Valialkin	536aa8779a	app/vmstorage: rename `vm_{big|small}_merge_need_free_disk_space` to `vm_merge_need_free_disk_space` This simplifies alerting.	2020-09-29 22:53:33 +03:00
Aliaksandr Valialkin	097a4c10dd	app/vmstorage: add metrics for determining whether background merges need additional disk space to complete These metrics are: * vm_small_merge_need_free_disk_space * vm_big_merge_need_free_disk_space Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686	2020-09-29 21:47:47 +03:00
Aliaksandr Valialkin	2ee0dc27a6	app/vmstorage: parallelize data processing obtained from a single connection from vminsert Previously vmstorage could use only a single CPU core for data processing from a single connection from vminsert. Now all the CPU cores can be used for data processing from a single connection from vminsert. This should improve the maximum data ingestion performance for a single vminsert->vmstorage connection.	2020-09-28 21:41:16 +03:00
Aliaksandr Valialkin	543f3aea97	all: consistently use "%w" formatting in fmt.Errorf for wrapped errors	2020-09-23 22:48:21 +03:00
Aliaksandr Valialkin	9b15b11f74	app/vmstorage: added `-forceMergeAuthKey` command-line flag for protecting `/internal/force_merge` endpoint	2020-09-17 14:24:20 +03:00
Aliaksandr Valialkin	d96858b921	lib/storage: add `/internal/force_merge` handler for running forced compactions on historical per-month partitions This may be useful for freeing up storage space after time series deletion. See https://victoriametrics.github.io/#force-merge for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686	2020-09-17 12:20:56 +03:00
Aliaksandr Valialkin	f307e6f432	app/vmselect: initial implementation of Graphite Metrics API See https://graphite-api.readthedocs.io/en/latest/api.html#the-metrics-api	2020-09-11 00:30:20 +03:00
Aliaksandr Valialkin	f5cb213ef9	lib/storage: reuse timestamp blocks for adjacent metric blocks with identical timestamps This should reduce disk space usage when scraping targets containing metrics with identical names such as `node_cpu_seconds_total`, histograms, quantiles, etc. Expose `vm_timestamps_blocks_merged_total` and `vm_timestamps_bytes_saved_total` metrics for monitoring the effectiveness of timestamp blocks merging.	2020-09-09 23:59:21 +03:00
Aliaksandr Valialkin	6721e47ae9	app: respect CPU limits set via cgroups Update GOMAXPROCS to limits set via cgroups. This should reduce CPU thrashing and reduce memory usage for cases when VictoriaMetrics components run in containers with CPU limits. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/685	2020-08-11 23:01:03 +03:00
Aliaksandr Valialkin	b3d4ff7ee2	app/vmstorage: improve error logging when the request times out	2020-08-10 13:17:24 +03:00
Aliaksandr Valialkin	a455930ab4	app/vmstorage: rename `vm_cache_size_entries{type="storage/prefetchedMetricIDs"}` to `vm_cache_entries{type="storage/prefetchedMetricIDs"}` to be consistent with other `vm_cache_entries` metrics	2020-08-06 16:34:18 +03:00
Aliaksandr Valialkin	a3e91c593b	lib/storage: limit the number of concurrent calls to storage.searchTSIDs to GOMAXPROCS*2 This should limit the maximum memory usage and reduce CPU thrashing on vmstorage when multiple heavy queries are executed. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648	2020-08-05 18:27:21 +03:00
Aliaksandr Valialkin	94471a1273	app: remove duplicate *-pure makefile rules	2020-07-31 20:01:30 +03:00
Aliaksandr Valialkin	106e302d7a	all: add missing APP_NAME to vm*-GOARCH builds	2020-07-31 13:45:32 +03:00
Aliaksandr Valialkin	29bbab0ec9	lib/storage: remove prioritizing of merging small parts over merging big parts, since it doesn't work as expected The prioritizing could lead to big merge starvation, which could end up in too big number of parts that must be merged into big parts. Multiple big merges may be initiated after the migration from v1.39.0 or v1.39.1. It is OK - these merges should be finished soon, which should return CPU and disk IO usage to normal levels. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/618	2020-07-30 20:02:22 +03:00
Aliaksandr Valialkin	fb3d1380ac	lib/storage: respect `-search.maxQueryDuration` when searching for time series in inverted index Previously the time spent on inverted index search could exceed the configured `-search.maxQueryDuration`. This commit stops searching in inverted index on query timeout.	2020-07-23 21:22:05 +03:00
Aliaksandr Valialkin	b8303afcd8	lib/storage: improve prioritizing of data ingestion over querying Prioritize also small merges over big merges. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648	2020-07-23 01:40:38 +03:00
Aliaksandr Valialkin	31ef39e8da	lib/httpserver: log remote address in error message from `httpserver.Errorf` This should improve detection of the root cause of errors. Thanks to Anant for the idea.	2020-07-20 14:06:29 +03:00
Aliaksandr Valialkin	0bff96fe4b	lib/storage: prioritize data ingestion over heavy queries Heavy queries could result in the lack of CPU resources for processing the current data ingestion stream. Prevent this by delaying queries' execution until free resources are available for data ingestion. Expose `vm_search_delays_total` metric, which may be used for alerting when there are not enough CPU resources for data ingestion and/or for executing heavy queries. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2020-07-05 19:44:04 +03:00
Aliaksandr Valialkin	4cb3e7595c	app/vmstorage: add `-denyQueriesOutsideRetention` command-line flag for denying queries outside the configured retention	2020-07-01 00:58:42 +03:00
Aliaksandr Valialkin	d962568e93	all: use %w instead of %s for wrapping errors in `fmt.Errorf` This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode. See https://blog.golang.org/go1.13-errors for details.	2020-06-30 23:33:46 +03:00
Aliaksandr Valialkin	01719f4949	app/vmstorage/transport: simplify setupTfss in order to prevent the possibility of nil tfs Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/534	2020-06-05 13:17:26 +03:00
Aliaksandr Valialkin	e4cef1b678	app/vmstorage: prevent from serving conns from vminsert and vmselect after the server is closed Previously it was possible that the connection is served after the server is closed if the following steps are performed: 1) Server accepts new connection. 2) Server.MustClose() is called and successfully finished. 3) Server starts processing the connection accepted at step 1. There could be various crashes like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/534 since the storage may be already closed. Now the server closes the connection at step 3 without processing it.	2020-06-05 11:55:48 +03:00
Aliaksandr Valialkin	901093279e	app/vmstorage/transport: update stale comment - vmstorage now sends small `ack` packets to `vminsert`	2020-05-21 14:04:52 +03:00
Aliaksandr Valialkin	2784015a4d	all: print `--help` output to stdout instead of stderr This is easier to grep and pipe	2020-05-16 12:03:06 +03:00
Aliaksandr Valialkin	a853869e75	app/vmstorage/transport: prevent from uncontrolled memory usage growth when `vminsert` sends big packets with too long labels Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/490	2020-05-15 15:42:54 +03:00
Aliaksandr Valialkin	1e5c1d7eaa	app/vmstorage: add `vm_slow_metric_name_loads_total` metric, which could be used as an indicator when more RAM is needed for improving query performance	2020-05-15 14:12:24 +03:00
Aliaksandr Valialkin	d6b9a49481	app/vmstorage: add `vm_slow_row_inserts_total` and `vm_slow_per_day_index_inserts_total` metrics for determining whether VictoriaMetrics requires more RAM for the current number of active time series	2020-05-15 13:46:57 +03:00
Aliaksandr Valialkin	3d3f41b961	app/vmstorage/transport: fix panic during server stop on 32-bit arches See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/212	2020-05-12 20:21:40 +03:00
Aliaksandr Valialkin	f7753b1469	lib/storage: gradually pre-populate per-day inverted index for the next day This should prevent from CPU usage spikes at 00:00 UTC every day when inverted index for new day must be quickly created for all the active time series. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430	2020-05-12 12:13:32 +03:00
Aliaksandr Valialkin	3052b479b7	lib/httpserver: reduce typical duration for http server graceful shutdown Previously the duration for graceful shutdown for http server could take more than a minute because of improperly set timeouts in setNetworkTimeout. Now typical duration for graceful shutdown should be reduced to less than 5 seconds.	2020-05-07 14:16:38 +03:00
Aliaksandr Valialkin	989d84cf3f	app/{vminsert,vmstorage}: wait for `ack` from `vmstorage` after each packet sent to it from `vminsert` This should protect from possible data loss when `vmstorage` is stopped while the packet is sent from `vminsert`. This commit switches to new protocol between vminsert and vmstorage, which is incompatible with the previous protocol. So it is required that both vminsert and vmstorage nodes are updated.	2020-04-27 09:53:26 +03:00
Aliaksandr Valialkin	e933cbac16	lib/storage: postpone reading data from blocks during search This eliminates the need for storing block data into temporary files on a single-node VictoriaMetrics during heavy queries, which touch big number of time series over long time ranges. This improves single-node VM performance on heavy queries by up to 2x.	2020-04-27 08:44:01 +03:00
Aliaksandr Valialkin	f9526809e5	app/vmselect: add `/api/v1/status/tsdb` page with useful stats for locating root cause for high cardinality issues See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268	2020-04-22 22:03:23 +03:00
Aliaksandr Valialkin	a53e332a93	app/vmstorage: add missing shutdown for http server on graceful shutdown This could result in the following panic during graceful shutdown when `/metrics` page is requested: http: panic serving 10.101.66.5:57366: runtime error: invalid memory address or nil pointer dereference goroutine 2050 [running]: net/http.(*conn).serve.func1(0xc00ef22000) net/http/server.go:1772 +0x139 panic(0xa0fc00, 0xe91d80) runtime/panic.go:973 +0x3e3 github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache.(*Cache).UpdateStats(0x0, 0xc0000516c8) github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache/cache.go:224 +0x37 github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*indexDB).UpdateMetrics(0xc00b931d00, 0xc02c41acf8) github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/index_db.go:258 +0x9f github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*Storage).UpdateMetrics(0xc0000bc7e0, 0xc02c41ac00) github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/storage.go:413 +0x4c5 main.registerStorageMetrics.func1(0x0) github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:186 +0xd9 main.registerStorageMetrics.func3(0xc00008c380) github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:196 +0x26 main.registerStorageMetrics.func7(0xc) github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:211 +0x26 github.com/VictoriaMetrics/metrics.(*Gauge).marshalTo(0xc000010148, 0xaa407d, 0x20, 0xb50d60, 0xc005319890) github.com/VictoriaMetrics/metrics@v1.11.2/gauge.go:38 +0x3f github.com/VictoriaMetrics/metrics.(*Set).WritePrometheus(0xc000084300, 0x7fd56809c940, 0xc005319860) github.com/VictoriaMetrics/metrics@v1.11.2/set.go:51 +0x1e1 github.com/VictoriaMetrics/metrics.WritePrometheus(0x7fd56809c940, 0xc005319860, 0xa16f01) github.com/VictoriaMetrics/metrics@v1.11.2/metrics.go:42 +0x41 github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.writePrometheusMetrics(0x7fd56809c940, 0xc005319860) github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/metrics.go:16 +0x44 github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper(0xb5a120, 0xc005319860, 0xc005018f00, 0xc00002cc90) github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:154 +0x58d github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.gzipHandler.func1(0xb5a120, 0xc005319860, 0xc005018f00) github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:119 +0x8e net/http.HandlerFunc.ServeHTTP(0xc00002d110, 0xb5a660, 0xc0044141c0, 0xc005018f00) net/http/server.go:2012 +0x44 net/http.serverHandler.ServeHTTP(0xc004414000, 0xb5a660, 0xc0044141c0, 0xc005018f00) net/http/server.go:2807 +0xa3 net/http.(*conn).serve(0xc00ef22000, 0xb5bf60, 0xc010532080) net/http/server.go:1895 +0x86c created by net/http.(*Server).Serve net/http/server.go:2933 +0x35c	2020-04-02 21:09:55 +03:00
Aliaksandr Valialkin	3b744f3c32	app/vmstorage: typo fix	2020-04-01 23:43:09 +03:00
Aliaksandr Valialkin	f838cdc86e	app/vmstorage: add `vm_free_disk_space_bytes` metric for monitoring the remaining disk space at `-storageDataPath`	2020-04-01 23:10:44 +03:00
Aliaksandr Valialkin	d450249955	lib/storage: properly handle `{label=~"foo|"}` filters as Prometheus does Such filters must match all the time series with `label="foo"` plus all the time series without `label`. Previously only time series with `label="foo"` were matched.	2020-03-30 20:21:47 +03:00
Dmitry Naumov	b84071fc25	Rootless docker images by default (#358 ) * Rootless docker images by default * Migrate to rootless base image Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>	2020-03-27 21:18:32 +02:00
Aliaksandr Valialkin	8939c19281	app/vmstorage: return 500 status code instead of 200 status code on internal errors inside `/snapshot/*` handlers	2020-03-10 23:54:27 +02:00
Aliaksandr Valialkin	cf9aee4ec3	all: properly split `vm_deduplicated_samples_total` among cluster components Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/345	2020-02-27 23:47:51 +02:00
Aliaksandr Valialkin	afecb34491	app/vmstorage: limit the maximum error message size before sending it to client Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/315	2020-02-13 17:33:12 +02:00
Aliaksandr Valialkin	1010a57882	all: allow setting flags via environment vars Now flags can be set via environment vars with the same names as flags. Command-line flags override flags set via env vars. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/311	2020-02-10 13:31:21 +02:00
Aliaksandr Valialkin	ea66212c93	lib/storage: move `-dedup.minScrapeInterval` flag outside lib/storage, so it doesn't show up in `vminsert` in cluster version	2020-02-10 13:07:25 +02:00
Aliaksandr Valialkin	4ed5e9a7ce	lib/storage: pre-fetch metricNames for the found metricIDs in Search.Init This should speed up Search.NextMetricBlock loop for big number of found time series.	2020-01-30 15:16:16 +02:00
Aliaksandr Valialkin	ea53a21b02	all: consistently log durations in seconds with millisecond precision This should improve logs readability	2020-01-22 18:35:24 +02:00
Aliaksandr Valialkin	cffaeda0f1	all: publish Docker images for the following GOARCH: amd64, arm, arm64, ppc64le and 386 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/258	2019-12-11 23:33:11 +02:00
Aliaksandr Valialkin	4e22b521c2	lib/storage: remove metricID with missing metricID->metricName entry The metricID->metricName entry can be missing in the indexdb after unclean shutdown when only a part of entries for new time series is written into indexdb. Recover from such a situation by removing the broken metricID. A new metricID will be automatically created for the time series with the given metricName when a new data point arrives for it.	2019-12-02 20:52:13 +02:00
Aliaksandr Valialkin	4c63caa37c	deployment/docker/certs: update TLS certs source from alpine:3.9 to alpine:3.10	2019-11-29 19:55:36 +02:00
Aliaksandr Valialkin	d297b65089	lib/storage: add `vm_cache_size_bytes{type="storage/hour_metric_ids"}` metric	2019-11-13 20:26:05 +02:00
Aliaksandr Valialkin	494ad0fdb3	lib/storage: remove inmemory index for recent hour, since it uses too much memory Production workload shows that the index requires ~4Kb of RAM per active time series. This is too much for high number of active time series, so let's delete this index. Now the queries should fall back to the index for the current day instead of the index for the recent hour. The query performance for the current day index should be good enough given the 100M rows/sec scan speed per CPU core.	2019-11-13 18:08:58 +02:00
Aliaksandr Valialkin	633dd81bb5	lib/storage: add `-disableRecentHourIndex` flag for disabling inmemory index for recent hour This may be useful for saving RAM on high number of time series aka high cardinality	2019-11-13 15:10:12 +02:00
Aliaksandr Valialkin	f1620ba7c0	lib/storage: fix inmemory inverted index issues found in v1.29 Issues fixed: - Slow startup times. Now the index is loaded from cache during start. - High memory usage related to superfluous index copies every 10 seconds.	2019-11-13 13:35:38 +02:00
Aliaksandr Valialkin	87b39222be	Revert "lib/fs: do not postpone directory removal on NFS error" This reverts commit 21aeb02b46649ac9906cb37733f7b155a77a0db9.	2019-11-12 16:29:50 +02:00
Aliaksandr Valialkin	c48e39eea9	lib/storage: add tests for dateMetricIDCache	2019-11-11 13:21:05 +02:00
Aliaksandr Valialkin	5f52eb7653	lib/fs: do not postpone directory removal on NFS error Continue trying to remove NFS directory on temporary errors for up to a minute. The previous async removal process breaks in the following case during VictoriaMetrics start - VictoriaMetrics opens index, finds incomplete merge transactions and starts replaying them. - The transaction instructs removing old directories for parts, which were already merged into bigger part. - VictoriaMetrics removes these directories, but their removal is delayed due to NFS errors. - VictoriaMetrics scans partition directory after all the incomplete merge transactions are finished and finds directories, which should be removed, but weren't still removed due to NFS errors. - VictoriaMetrics panics when it finds unexpected empty directory. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162	2019-11-10 13:27:16 +02:00
Aliaksandr Valialkin	9ea2bd822e	lib/storage: implement per-day inverted index	2019-11-10 00:20:32 +02:00
Aliaksandr Valialkin	dea2f3efed	lib/storage: use specialized cache for (date, metricID) entries This improves ingestion performance.	2019-11-09 23:09:18 +02:00
Aliaksandr Valialkin	46e67bb78c	lib/storage: export `vm_new_timeseries_created_total` metric for determining time series churn rate	2019-11-08 19:58:21 +02:00
Aliaksandr Valialkin	0063c857f5	lib/storage: add inmemory inverted index for the last hour It should improve performance for `last N hours` dashboards with update intervals smaller than 1 hour.	2019-11-08 19:37:46 +02:00
Aliaksandr Valialkin	1c777e0245	lib/storage: substitute error message about unsorted items in the index block after metricIDs merge with counter The origin of the error has been detected and documented in the code, so it is enough to export a counter for such errors at `vm_index_blocks_with_metric_ids_incorrect_order_total`, so it could be monitored and alerted on high error rates. Export also the counter for processed index blocks with metricIDs - `vm_index_blocks_with_metric_ids_processed_total`, so its rate could be compared to `rate(vm_index_blocks_with_metric_ids_incorrect_order_total)`.	2019-11-06 14:32:41 +02:00
Aliaksandr Valialkin	6ab9c98a1e	app/vmstorage: add `-bigMergeConcurrency` and `-smallMergeConcurrency` flags for tuning the maximum number of CPU cores used during merges	2019-10-31 16:17:29 +02:00
Aliaksandr Valialkin	b101064f8b	all: report the number of bytes read on io.ReadFull error This should simplify error investigation similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/175	2019-09-11 14:50:24 +03:00
Aliaksandr Valialkin	2c654258ef	lib/fs: add MustStopDirRemover for waiting until pending directories are removed on graceful shutdown This patch is mainly required for laggy NFS. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162	2019-09-05 11:17:17 +03:00
Aliaksandr Valialkin	5893a9f9a3	app/vmstorage: increase default values for search.maxTagKeys, search.maxTagValues and search.maxUniqueTimeseries	2019-08-27 14:28:26 +03:00
Aliaksandr Valialkin	f56c1298ad	app/vmstorage: add `vm_concurrent_addrows_*` metrics for tracking concurrency for Storage.AddRows calls Track also the number of dropped rows due to the exceeded timeout on concurrency limit for Storage.AddRows. This number is tracked in `vm_concurrent_addrows_dropped_rows_total`	2019-08-06 15:08:43 +03:00
Aliaksandr Valialkin	880b1d80b1	app/vmselect: optimize `/api/v1/series` by skipping storage data Fetch and process only time series metainfo.	2019-08-04 23:00:46 +03:00
Aliaksandr Valialkin	8253790157	app/vmstorage: consistency renaming for `ignored rows` metrics vm_too_big_timestamp_rows_total -> vm_rows_ignored_total{reason="big_timestamp"} vm_too_small_timestamp_rows_total -> vm_rows_ignored_total{reason="small_timestamp"}	2019-07-26 20:02:24 +03:00
Aliaksandr Valialkin	c6bec48927	lib/storage: add metrics for calculating skipped rows outside the retention The metrics are: - vm_too_big_timestamp_rows_total - vm_too_small_timestamp_rows_total	2019-07-26 14:11:56 +03:00
Aliaksandr Valialkin	54f035d4ce	all: small updates after PR #114	2019-07-24 17:43:43 +03:00
Aliaksandr Valialkin	ba8195c58e	all: consistency renaming: bytesSize -> sizeBytes	2019-07-10 00:47:42 +03:00
Aliaksandr Valialkin	41f512af1c	all: add `vm_data_size_bytes` metrics for easy monitoring of on-disk data size and on-disk inverted index size	2019-07-04 19:43:04 +03:00
Aliaksandr Valialkin	a0c22a6830	app/vmstorage: add `vm_cache_entries{type="storage/hour_metric_ids"}` metric for tracking active time series count	2019-06-19 18:37:38 +03:00
Aliaksandr Valialkin	945894e049	app/vmselect: properly handle empty label (aka __name__) in LabelEntries handler	2019-06-10 19:55:02 +03:00
Aliaksandr Valialkin	75a0acf72d	app/vmselect: add `/api/v1/labels/count` handler for quick detection of labels with the maximum number of distinct values	2019-06-10 19:54:55 +03:00
Aliaksandr Valialkin	547bcdce63	app/vmstorage: enable compression of responses to vmselect by default This should save vmstorage => vmselect network bandwidth in common case when recently added data is queried.	2019-06-10 14:54:59 +03:00
Aliaksandr Valialkin	d54f5fec0b	lib/storage: skip adaptive searching for tag filter matching the minimum number of metrics if the identical previous search didn't find such filter This should improve speed for searching metrics among high number of time series with high churn rate like in big Kubernetes clusters with frequent deployments.	2019-06-10 14:07:47 +03:00
Aliaksandr Valialkin	4c3913290a	app/vmstorage: add missing `_total` suffixes to newly added metrics	2019-06-09 22:11:41 +03:00
Aliaksandr Valialkin	d882afa905	lib/storage: optimize time series lookup for recent hours when the db contains many millions of time series with high churn rate (aka frequent deployments in Kubernetes)	2019-06-09 19:14:04 +03:00

253 commits