github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Roman Khavronenko	cf1a8bce6b	lib/index: reduce read/write load after indexDB rotation (#2177) * lib/index: reduce read/write load after indexDB rotation IndexDB in VM is responsible for storing TSIDs, the IDs used for identifying time series. The index is stored on disk and used by both the ingestion and read paths. IndexDB is stored separately from data parts and is global for all stored data. Unlike data parts, it can't be deleted partially. Instead, indexDB is rotated once per `retention` interval. The rotation procedure means that the `current` indexDB becomes `previous`, and a freshly created indexDB struct becomes `current`. So at any time, VM holds the indexDB for the current and previous retention periods. When a time series is ingested or queried, VM checks if its TSID is present in the `current` indexDB. If it is missing, it checks the `previous` indexDB. If the TSID is found there, it gets copied to the `current` indexDB. In this way the `current` indexDB stores only series which were active during the retention period. To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both the write and read paths consult `tsidCache`, and on a miss the real indexDB lookup happens. When rotation happens, VM resets the `tsidCache`. This is needed for the ingestion path to trigger `current` indexDB re-population. Since index re-population requires additional resources, every index rotation event may cause some extra load on CPU and disk. While it may be unnoticeable in most cases, for systems with a very high number of unique series each rotation may lead to performance degradation for some period of time. This PR attempts to smooth out resource usage after the rotation. The changes are the following: 1. `tsidCache` is no longer reset after the rotation; 2. Instead, each entry in `tsidCache` records the indexDB to which it belongs; 3. On the ingestion path after the rotation we check if the requested TSID was found in `tsidCache`. Then there are 3 branches: 3.1 Fast path. It was found and belongs to the `current` indexDB. Return the TSID. 3.2 Slow path. It wasn't found, so we generate it from scratch, add it to the `current` indexDB and add it to `tsidCache`. 3.3 Smooth path. It was found but does not belong to the `current` indexDB. In this case, we add it to the `current` indexDB with some probability. The probability is based on the time passed since the last rotation, with some threshold. The more time has passed since rotation, the higher the chance to re-populate the `current` indexDB. The default re-population interval in this PR is set to `1h`, during which entries from the `previous` index are supposed to slowly re-populate the `current` index. The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs were moved from the `previous` indexDB to the `current` indexDB. This metric is supposed to grow only during the first `1h` after the last rotation. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin	cebcb15ba4	lib/storage: verify that the tsidsFound contain the needed tsids in tests added at `f4dead529f`	2021-09-11 10:57:13 +03:00
Aliaksandr Valialkin	f4dead529f	lib/storage: properly search series by multiple tag filters matching empty labels such as foo{bar=~"baz|",x=~"y|"} Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1601 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/395	2021-09-09 21:09:21 +03:00
Aliaksandr Valialkin	d05cac6c98	lib/storage: re-use the per-day inverted index search code for searching in the global index This allows removing a big pile of outdated code for global index search. This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486	2021-07-30 10:31:37 +03:00
Aliaksandr Valialkin	84fb59b0ba	lib/storage: move deletedMetricIDs set from indexDB to Storage This makes the list of deleted metricIDs consistent when it is used from both the current indexDB and the previous indexDB (aka extDB). This should fix the issue which could lead to storing new samples under deleted metricIDs after indexDB rotation. See more details at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347#issuecomment-861232136 . Thanks to @tangqipengleoo for the initial analysis and the pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 . This commit resolves the issue in a more generic way compared to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 . The downside of the commit is that the deletedMetricIDs set isn't cleaned of metricIDs outside the retention; this requires an app restart. This should be OK in most cases.	2021-06-15 15:04:30 +03:00
Aliaksandr Valialkin	c4f3fbfa5d	lib/storage: reset cache on disk during series deletion and during indexdb rotation This should prevent inconsistent behavior (aka partially missing data for some time series) after an unclean shutdown. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347	2021-06-11 12:42:28 +03:00
Aliaksandr Valialkin	f54133b200	lib/storage: do not populate MetricID->MetricName cache during data ingestion This cache isn't needed during data ingestion, so there is no need to spend RAM on it. This reduces RAM usage on the data ingestion path by 30%	2021-05-24 03:02:46 +03:00
Aliaksandr Valialkin	d7be2753c0	lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss	2021-05-13 09:02:33 +03:00
Aliaksandr Valialkin	832651c6c2	app/vmselect: follow up after `8a0678678b` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1168	2021-05-12 17:18:30 +03:00
Nikolay	8a0678678b	Adds tsdb match filters (#1282) * init work on filters * init propose for status filters * fixes tsdb status adds test * fix bug * removes checks from test	2021-05-12 15:18:45 +03:00
Aliaksandr Valialkin	8e2afdf568	lib/storage: improve Search.NextMetricBlock performance by using MetricID->MetricName cache	2021-03-22 22:49:18 +02:00
Aliaksandr Valialkin	636c55b526	lib/mergeset: reduce memory usage for inmemoryBlock by using more compact items representation This also should reduce CPU time spent by GC, since inmemoryBlock.items don't have pointers now, so GC doesn't need visiting them.	2021-02-21 22:06:47 +02:00
Aliaksandr Valialkin	553016ea99	lib/storage: disable composite index usage when querying old data	2021-02-10 14:57:50 +02:00
Aliaksandr Valialkin	368b69b4c4	app/vmselect: properly handle errors in GetLabelsOnTimeRange and GetLabelValuesOnTimeRange	2020-11-05 01:38:38 +02:00
Aliaksandr Valialkin	b378cd6ed8	app/vmselect: optimize querying for `/api/v1/labels` and `/api/v1/label/<name>/values` when `start` and `end` args are set	2020-11-05 01:01:33 +02:00
Aliaksandr Valialkin	fe289331dd	lib/storage: remove obsolete code	2020-11-02 19:11:59 +02:00
Aliaksandr Valialkin	764dc2499f	lib/storage: code cleanup after `10f2eedee0` Remove the code that uses metricIDs caches for the current and the previous hour during metricIDs search, since this code became unused after implementing the per-day inverted index almost a year ago. While at it, fix a bug which could prevent finding time series with names containing dots (aka Graphite-like names such as `foo.bar.baz`).	2020-10-01 19:06:23 +03:00
Aliaksandr Valialkin	039c9d2441	lib/storage: respect `-search.maxQueryDuration` when searching for time series in inverted index Previously the time spent on inverted index search could exceed the configured `-search.maxQueryDuration`. This commit stops searching in inverted index on query timeout.	2020-07-23 21:21:42 +03:00
Aliaksandr Valialkin	e1107fec10	lib/storage: reset `MetricName->TSID` cache after marking metricIDs as deleted This is a follow-up commit after `12b16077c4`, which didn't reset the `tsidCache` in all the required places. This could result in indefinite errors like: missing metricName by metricID ...; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time Fix this by resetting the cache inside deleteMetricIDs function.	2020-07-14 14:06:32 +03:00
Aliaksandr Valialkin	d5dddb0953	all: use %w instead of %s for wrapping errors in `fmt.Errorf` This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode . See https://blog.golang.org/go1.13-errors for details.	2020-06-30 23:05:11 +03:00
Aliaksandr Valialkin	ae1cc0fc4b	lib/storage: properly match `{tag!="|foo"}` filters Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/546	2020-06-10 19:35:56 +03:00
Aliaksandr Valialkin	a7797dae09	lib/storage: fix Graphite wildcard matching, which has been broken in v1.36.0 v1.36.0 always returns empty responses for Graphite wildcards like the following {__name__=~"foo\\.[^.]\\.bar\\.baz"} Temporary workaround for v1.36.0 is to add `[^.]` to the end of the regexp.	2020-05-28 12:03:49 +03:00
Aliaksandr Valialkin	d186472081	lib/storage: improve search speed for time series matching Graphite wildcards such as `foo..bar.baz` Add an index for reversed Graphite-like metric names with dots. Use this index during search for filters like `__name__=~"foo\\.[^.]\\.bar\\.baz"` which end with a non-empty suffix with dots, i.e. `.bar.baz` in this case. This change may "hide" historical time series during queries. The workaround is to add `[.]` to the end of the regexp label filter, i.e. "foo\\.[^.]\\.bar\\.baz" should be substituted with "foo\\.[^.]\\.bar\\.baz[.]".	2020-05-27 21:45:52 +03:00
Aliaksandr Valialkin	4fc33163c4	lib/storage: optimize ingestion performance for new time series	2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin	364db13c9c	app/vmselect: add `/api/v1/status/tsdb` page with useful stats for locating root cause for high cardinality issues See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268	2020-04-22 22:03:43 +03:00
Aliaksandr Valialkin	e0d0348f36	lib/storage: add missing reset for tagFilter.matchesEmptyValue on tagFilter.Init	2020-04-01 17:42:44 +03:00
Aliaksandr Valialkin	7a4635f853	all: remove the remaining mentions of cluster version	2019-11-21 23:18:22 +02:00
Aliaksandr Valialkin	86a1cd700b	lib/storage: remove inmemory index for recent hour, since it uses too much memory Production workload shows that the index requires ~4Kb of RAM per active time series. This is too much for high number of active time series, so let's delete this index. Now the queries should fall back to the index for the current day instead of the index for the recent hour. The query performance for the current day index should be good enough given the 100M rows/sec scan speed per CPU core.	2019-11-13 17:58:07 +02:00
Aliaksandr Valialkin	ca259864e2	lib/storage: return back inmemory inverted index for recent hour Issues fixed: - Slow startup times. Now the index is loaded from cache during start. - High memory usage related to superfluous index copies every 10 seconds.	2019-11-13 13:11:04 +02:00
Aliaksandr Valialkin	01bb3c06c7	lib/storage: remove inmemory inverted index for recent hours Production load with >10M active time series showed it could slow down VictoriaMetrics startup times and could eat all the memory leading to OOM. Remove inmemory inverted index for recent hours until thorough testing on production data shows it works OK.	2019-11-13 10:45:53 +02:00
Mike Poindexter	f3ad330635	Add test for invalid caching of tsids (#232) * Add test for invalid caching of tsids * Clean up error handling	2019-11-12 15:09:33 +02:00
Aliaksandr Valialkin	ee7765b10d	lib/storage: implement per-day inverted index	2019-11-10 00:02:46 +02:00
Aliaksandr Valialkin	e472f0b23b	lib/storage: substitute error message about unsorted items in the index block after metricIDs merge with counter The origin of the error has been detected and documented in the code, so it is enough to export a counter for such errors at `vm_index_blocks_with_metric_ids_incorrect_order_total`, so it could be monitored and alerted on high error rates. Also export the counter for processed index blocks with metricIDs, `vm_index_blocks_with_metric_ids_processed_total`, so its rate could be compared to `rate(vm_index_blocks_with_metric_ids_incorrect_order_total)`.	2019-11-06 14:28:11 +02:00
Aliaksandr Valialkin	c1cf7d9f93	lib/storage: add tests for mergeTagToMetricIDsRows and return the original items if the function breaks the items' ordering. This should protect against data corruption issues revealed in the previous releases up to v1.28.0-beta5.	2019-10-08 16:27:35 +03:00
Aliaksandr Valialkin	2444433d83	lib/storage: add missing break in removeDuplicateMetricIDs	2019-09-25 18:23:43 +03:00
Aliaksandr Valialkin	ea4c828bae	lib/storage: remove duplicate MetricIDs in `tag->metricIDs` items before writing them into inverted index	2019-09-25 17:55:13 +03:00
Aliaksandr Valialkin	4e26ad869b	lib/{storage,mergeset}: verify PrepareBlock callback results Do not touch the first and the last item passed to PrepareBlock in order to preserve sort order of mergeset blocks.	2019-09-23 20:43:13 +03:00
Aliaksandr Valialkin	09fc6e22e5	all: use workingsetcache instead of fastcache This should reduce the amount of RAM required for processing time series with non-zero churn rate. The previous cache behavior can be restored with `-cache.oldBehavior` command-line flag.	2019-08-13 21:39:34 +03:00
Aliaksandr Valialkin	c14fd6c43f	lib/storage: typo fixes after `a77e88db7d`	2019-07-30 15:38:52 +03:00
Aliaksandr Valialkin	a77e88db7d	lib/storage: fix matching against tag filter with empty name Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/137	2019-07-30 15:15:09 +03:00
Aliaksandr Valialkin	683bf2a11f	lib/storage: make sure non-nil args are passed to openIndexDB	2019-06-25 20:10:04 +03:00
Aliaksandr Valialkin	cf63669303	lib/storage: skip searching in extDB if it doesn't contain items for the given time range This should improve inverted index search performance for a big number of unique time series when the search is performed only on recent data.	2019-06-25 13:00:37 +03:00
Aliaksandr Valialkin	dbd217b8f0	lib/storage: test GetSeriesCount	2019-06-10 12:43:34 +03:00
Aliaksandr Valialkin	d37924900b	lib/storage: optimize time series lookup for recent hours when the db contains many millions of time series with high churn rate (aka frequent deployments in Kubernetes)	2019-06-09 19:13:56 +03:00
Aliaksandr Valialkin	1836c415e6	all: open-sourcing single-node version	2019-05-23 00:18:06 +03:00

45 commits