github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2025-03-11 15:34:56 +00:00

Author	SHA1	Message	Date
Roman Khavronenko	63f6ac3ff8	lib/promutils: move time-related funcs from `promutils` to `timeutil` (#8403) Since the funcs `ParseDuration` and `ParseTimeMsec` are used in vlogs, vmalert, victoriametrics and other components, importing promutils only for them makes these components export the irrelevant `vm_rows_invalid_total{type="prometheus"}` metric. This change removes the `vm_rows_invalid_total{type="prometheus"}` metric from the /metrics page for these components. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2025-03-03 10:25:42 +01:00
Zhu Jiekun	3d3480140c	lib/storage: properly cache extDB metricsID on search error Previously, if indexDB search failed for some reason during search at previous indexDB (aka extDB), VictoriaMetrics stored empty search result at cache. It could cause incorrect search results at subsequent requests. This commit checks search error and stores request results only on success. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8345	2025-02-26 15:48:25 +01:00
Aliaksandr Valialkin	bc69d5f1a4	lib/mergeset: explicitly pass the interval for flushing in-memory data to disk at MustOpenTable() This allows using different intervals for flushing in-memory data among different mergeset.Table instances. The initial user of this feature is lib/logstorage.Storage, which explicitly passes Storage.flushInterval to every created mergeset.Table instance. Previously mergeset.Table instances were using a 5-second flush interval, which didn't depend on the Storage.flushInterval. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775	2025-02-24 15:23:50 +01:00
f41gh7	855dfb324d	app/vmselect: add query resource limits priority This commit adds support for overriding vmstorage `maxUniqueTimeseries` with specific resource limits: 1. `-search.maxLabelsAPISeries` for [/api/v1/labels](https://docs.victoriametrics.com/url-examples/#apiv1labels), [/api/v1/label/.../values](https://docs.victoriametrics.com/url-examples/#apiv1labelvalues) 2. `-search.maxSeries` for [/api/v1/series](https://docs.victoriametrics.com/url-examples/#apiv1series) 3. `-search.maxTSDBStatusSeries` for [/api/v1/status/tsdb](https://docs.victoriametrics.com/#tsdb-stats) 4. `-search.maxDeleteSeries` for [/api/v1/admin/tsdb/delete_series](https://docs.victoriametrics.com/url-examples/#apiv1admintsdbdelete_series) Currently, this limit priority logic cannot be applied to flags `-search.maxFederateSeries` and `-search.maxExportSeries`, because they share the same RPC `search_v7` with the /api/v1/query and /api/v1/query_range APIs, preventing vmstorage from identifying the actual API of the request. To address that, we need to add additional information to the protocol between vmstorage and vmselect, which should be introduced in the future when possible. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857	2025-02-19 18:18:32 +01:00
Artem Fetishev	ba0d7dc2fc	Allow disabling per-day index (#6976 ) ### Describe Your Changes Allow disabling the per-day index using the `-disablePerDayIndex` flag. This should significantly improve the ingestion rate and decrease the disk space usage for the use cases that assume small or no churn rate. See the docs added to `docs/README.md` for details. Both improvements are due to no data written to the per-day index. Benchmark results: ```shell rm -Rf ./lib/storage/Benchmark*; go test ./lib/storage -run=NONE -bench=BenchmarkStorageInsertWithAndWithoutPerDayIndex --loggerLevel=ERROR goos: linux goarch: amd64 pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage cpu: 13th Gen Intel(R) Core(TM) i7-1355U BenchmarkStorageInsertWithAndWithoutPerDayIndex/HighChurnRate/perDayIndexes-12 1 3850268120 ns/op 39.56 data-MiB 28.20 indexdb-MiB 259722 rows/s BenchmarkStorageInsertWithAndWithoutPerDayIndex/HighChurnRate/noPerDayIndexes-12 1 2916865725 ns/op 39.57 data-MiB 25.73 indexdb-MiB 342834 rows/s BenchmarkStorageInsertWithAndWithoutPerDayIndex/NoChurnRate/perDayIndexes-12 1 2218073474 ns/op 9.772 data-MiB 13.73 indexdb-MiB 450842 rows/s BenchmarkStorageInsertWithAndWithoutPerDayIndex/NoChurnRate/noPerDayIndexes-12 1 1295140898 ns/op 9.771 data-MiB 0.3566 indexdb-MiB 772119 rows/s PASS ok github.com/VictoriaMetrics/VictoriaMetrics/lib/storage 11.421s ``` Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>	2025-02-14 12:35:51 +01:00
Nikolay	b9a7bda0a1	lib/storage: refactoring introduce OpenOptions MustOpenStorage function may accept variable number of optional arguments. This commit combines optional arguments into dedicated OpenOptions struct. It reduces complexity of adding new optional arguments. Related PR: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8118	2025-02-13 11:10:44 +01:00
Artem Fetishev	631b736bc2	lib/storage: fix cardinality limiting for cases when insertion takes fast path (#8218) ### Describe Your Changes The cardinality limiter in this case does not receive the actual metricID but some other value found in r.TSID.MetricID, which is not initialized. Depending on the system and/or Go runtime implementation, this value can be 0 or some garbage value (which shouldn't have too wide a range). Thus, there is basically no limit for inserted metricIDs. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2025-02-05 15:22:34 +01:00
Aliaksandr Valialkin	dc9dba71b2	lib/storage: open per-month partitions in parallel This should reduce the time needed for opening the storage with retentions exceeding a few months. While at it, limit the concurrency of opening partitions in parallel to the number of available CPU cores, since higher concurrency may increase RAM usage and CPU usage without performance improvements if opening a single partition is a CPU-bound task. This is a follow-up for `17988942ab`	2025-01-27 16:07:14 +01:00
Nikolay	277fdd1070	lib/storage: reduce test suite batch size (#8022) Commit `eef6943084` added new test functions, which check various cases of metricName registration at data ingestion. The initial dataset had 4 batches with 100 rows each. This works fine on machines with 5GB+ of memory, but the i386 architecture supports only 4GB of memory per process. Due to this limitation, the dataset should be reduced to 3 batches of 30 rows. This keeps the same test functionality but reduces overall memory usage to ~3GB. Signed-off-by: f41gh7 <nik@victoriametrics.com>	2025-01-14 11:27:50 +01:00
Nikolay	e9f86af7f5	lib/storage: add a hint for merge about type of parts in merge (#7998) The hint allows choosing the type of cache to be used for index search: - in-memory parts store recently ingested samples and should use the main cache. This improves ingestion speed and the cache hit ratio for queries accessing recently ingested samples. - merges of file parts are performed in the background; using a separate cache avoids polluting the main cache with irrelevant entries. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182 --------- Signed-off-by: f41gh7 <nik@victoriametrics.com>	2025-01-10 16:01:39 +04:00
Nikolay	9ada784983	lib/storage: make finalDedup schedule interval configurable This commit makes configurable the interval for checking whether the final dedup process for historical data should be started. This allows spreading resource utilisation across multiple vmstorage/vmsingle instances over time, since final dedup may put additional pressure on disk and backup systems and make the cluster less stable. Storage unconditionally adds 25% jitter to the provided value. This should simplify configuration management in the Kubernetes ecosystem, because Kubernetes application pods must have the same configuration. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7880 --------- Signed-off-by: f41gh7 <nik@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2025-01-10 10:46:46 +01:00
Roman Khavronenko	c464d4484f	lib/storage: update dedup tests * update misleading comments about preferring NaNs on intervals. NaNs are only preferred on timestamp conflicts * add conflicting timestamps to the benchmark test. Previously, benchmark wasn't checking the timestamp conflict code branch. The updated results after `c0fcfd6b97` are the following: ``` benchstat old.txt new.txt goos: darwin goarch: arm64 pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage cpu: Apple M4 Pro │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ DeduplicateSamples/minScrapeInterval=3s-14 889.7n ± ∞ ¹ 904.3n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=4s-14 735.9n ± ∞ ¹ 748.7n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=10s-14 637.7n ± ∞ ¹ 659.3n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=3s-14 838.8n ± ∞ ¹ 810.4n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=4s-14 765.2n ± ∞ ¹ 735.1n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=10s-14 673.1n ± ∞ ¹ 622.4n ± ∞ ¹ ~ (p=1.000 n=1) ² geomean 751.7n 741.0n -1.42% ``` --- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-12-16 12:50:41 +01:00
Andrei Baidarov	0dc576d3da	lib/storage: prefer stale markers over other values on dedup interval Previously, during de-duplication staleness markers could be removed due to incorrect logic at values equality check. During the evaluation of a read query vmselect deduplicates samples using the dedupInterval option. It picks the highest value across all points with the same timestamp next to the border of dedupInterval. The issue is that any comparison with NaN via <, > returns false. This means that the position of NaN in srcValues could affect the result. This commit changes this logic with an additional step that explicitly checks for staleness markers in the following cases: 1. Deduplication on vmselect 2. Deduplication in vmstorage during merges 3. Deduplication in stream aggregation. The check is performed only for stale markers, because other NaNs are rejected on ingestion by vmstorage or by stream aggregation. Checking for stale markers in general slows down dedup speed by 3%: ``` benchstat old.txt new.txt goos: darwin goarch: arm64 pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage cpu: Apple M4 Pro │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ DeduplicateSamples/minScrapeInterval=1s-14 462.8n ± ∞ ¹ 425.2n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=2s-14 905.6n ± ∞ ¹ 903.3n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=5s-14 710.0n ± ∞ ¹ 698.9n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=10s-14 632.7n ± ∞ ¹ 638.5n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=1s-14 439.7n ± ∞ ¹ 409.9n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=2s-14 908.9n ± ∞ ¹ 882.2n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=5s-14 721.2n ± ∞ ¹ 684.7n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=10s-14 659.1n ± ∞ ¹ 630.6n ± ∞ ¹ ~ (p=1.000 n=1) ² geomean 659.5n 636.0n -3.56% ``` Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7674 --------- Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-12-12 12:34:17 +01:00
Andrii Chubatiuk	564e6ea024	app/{vminsert,vmagent}: drop time series on exceeding labels limits. Previously, time series with labels exceeding the configured limits were truncated and written to storage, potentially causing data inconsistency. This could lead to collisions between time series and make it difficult to identify the source due to truncated labels. This commit changes the behavior: * Such time series are now rejected outright. * Rejected time series are logged to stdout, and corresponding counters are incremented. * removes `vm_too_long_label_values_total`, `vm_too_long_label_names_total`, `vm_metrics_with_dropped_labels_total` metrics. * adds new values `[too_many_labels,too_long_label_name,too_long_label_value]` to `reason` label of the `vm_rows_ignored_total` metric name related issues: - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6928 - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7661	2024-12-10 21:19:16 +01:00
Andrii Chubatiuk	9cfdbc582f	refactoring: changed prompb to prompbmarshal everywhere where internal series transformations are happening (#7409) ### Describe Your Changes Doing similar changes for both vmagent and vminsert (like the one in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7399) ends up with almost the same implementation in each of the packages instead of having this shared code in one place. One of the reasons is the same Timeseries and Labels structures coming from the different prompb and prompbmarshal packages. My proposal is to use structures from the prompb package only to marshal/unmarshal sent/received data, and to use only structures from the prompbmarshal package for internal transformations. Another example where this can already help to simplify code is the streaming aggregation pipeline for vmsingle (now it first marshals prompb.Timeseries to storage.MetricRow and then, if streaming aggregation or deduplication is enabled, it unmarshals all the series back, but to prompbmarshal.Timeseries) ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-11-26 12:45:17 +01:00
Artem Fetishev	3383589fd1	lib/storage: confirm that changing retention period can cause previous indexDB deletion (#7569 ) ### Describe Your Changes Add test cases proving that it is possible to lose indexDB after changing the retention period. See #7609 ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-11-21 10:44:21 +01:00
Nikolay	1985110de2	lib/storage: properly check for minMissingTimestamps After the changes in commit `787b9cd`, the minimal timestamp check for extDB was performed without the context of the index search prefix. It worked fine for the single-node version, but the cluster version uses a different prefix for metricID search requests. This may lead to incomplete results if the minimal missing timestamp was cached for a tenant with a different ingestion pattern. Minimal reproducible case: - metrics were ingested for tenants 0 and 1 - at some point in time metrics ingestion for tenant 1 stopped - index records have the following timestamps layout: tenant 0: 1,2,3,4,5,6 tenant 1: 1,2,3,4 - after indexDB rotation, containsTimeRange lookups may produce incorrect results: a time range request for tenant 1 - 5:6 caches 5 as min timestamp; a request for the same or smaller time range for tenant 0 now returns empty results. Second case: - requests for a tenant without metrics always update the atomic value with an incorrect minimal time range for other tenants. This commit replaces the single atomic with a map of search prefix keys. It should have a slight performance overhead, but works consistently for the cluster version. minMissingTimestamp is cached by the prefix search key, which includes the tenantID. Since it is only populated at runtime, it doesn't hold unused tenants for queries. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7417	2024-11-15 16:25:13 +01:00
Zakhar Bessarab	9f9cc24e4c	Revert "lib/mergeset: add sparse indexdb cache (#7269 )" This reverts commit `837d0d136d`.	2024-11-04 10:29:14 -03:00
Zakhar Bessarab	4e50d6eed3	lib/storage/partition: prevent panic in case resulting in-memory part is empty after merge (#7329 ) It is possible for in-memory part to be empty if ingested samples are removed by retention filters. In this case, data will not be discarded due to retention before creating in memory part. After in-memory parts merge samples will be removed resulting in creating completely empty part at destination. This commit checks for resulting part and skips it, if it's empty. --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-10-27 20:40:13 +01:00
Zakhar Bessarab	837d0d136d	lib/mergeset: add sparse indexdb cache (#7269 ) Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182 - add a separate index cache for searches which might read through large amounts of random entries. Primary use-case for this is retention and downsampling filters, when applying filters background merge needs to fetch large amount of random entries which pollutes an index cache. Using different caches allows to reduce effect on memory usage and cache efficiency of the main cache while still having high cache hit rate. A separate cache size is 5% of allowed memory. - reduce size of indexdb/dataBlocks cache in order to free memory for new sparse cache. Reduced size by 5% and moved this to a separate cache. - add a separate metricName search which does not cache metric names - this is needed in order to allow disabling metric name caching when applying downsampling/retention filters. Applying filters during background merge accesses random entries, this fills up cache and does not provide an actual improvement due to random access nature. Merge performance and memory usage stats before and after the change: - before ![image](https://github.com/user-attachments/assets/485fffbb-c225-47ae-b5c5-bc8a7c57b36e) - after ![image](https://github.com/user-attachments/assets/f4ba3440-7c1c-4ec1-bc54-4d2ab431eef5) --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-10-24 15:21:17 +02:00
Artem Fetishev	6b9f57e5f7	lib/storage: Fix flaky test: TestStorageRotateIndexDB (#7267) This commit fixes the TestStorageRotateIndexDB flaky test reported at: #6977. Sample test failure: https://pastebin.com/bTSs8HP1 The test fails because one goroutine adds items to the indexDB table while another goroutine is closing that table. This may happen if indexDB rotation happens twice during one Storage.add() operation: - Storage.add() takes the current indexDB and adds index records to it - First index db rotation makes the current index DB a previous one (still ok at this point) - Second index db rotation removes the indexDB that was current two rotations earlier. It does this by setting the mustDrop flag to true and decrementing the ref counter. The ref counter reaches zero, which causes the underlying indexdb table to release its resources gracefully. Graceful release assumes that the table is not written anymore. But Storage.add() still adds items to it. The solution is to increment the indexDB ref counters while it is used inside add(). The unit test has been changed a little so that the test fails reliably. The idea is to make the add() function invocation last much longer, therefore the test inserts not just one record at a time but thousands of them. To see the test fail, just replace the idbsLocked() func with: ```go func (s *Storage) idbsLocked2() (*indexDB, *indexDB, func()) { return s.idbCurr.Load(), s.idbNext.Load(), func() {} } ``` --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>	2024-10-23 11:48:21 +02:00
Zhu Jiekun	8c50c38a80	vmstorage: auto calculate maxUniqueTimeseries based on resources (#6961) ### Describe Your Changes Add support for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6930 Calculate `-search.maxUniqueTimeseries` from `-search.maxConcurrentRequests` and the remaining memory if it's not set or is less than or equal to 0. The remaining memory is affected by `-memory.allowedPercent`, `-memory.allowedBytes` and the cgroup memory limit. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `85f60237e2`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-10-18 14:00:14 +02:00
Roman Khavronenko	0d4f4b8f7d	(app\|lib)/vmstorage: do not increment `vm_rows_ignored_total` on NaNs (#7166) The `vm_rows_ignored_total` metric signals ingestion issues to users, such as a bad timestamp or a parsing error. In commit `a5424e95b3` this metric started to increment each time vmstorage gets a NaN. But NaN is a valid value in the Prometheus data model and in the Prometheus metrics exposition format. Exporters from the Prometheus ecosystem can expose NaNs as metric values, and these values are delivered to vmstorage and increment the metric. Since there is nothing the user can do about this, as opposed to parsing errors or bad timestamps, there is not much sense in incrementing this metric. So this commit rolls back the `reason="nan_value"` increments. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-10-02 12:37:27 +02:00
Artem Fetishev	ed5da38ede	Introduce a flag for limiting the number of time series to delete (#7091 ) ### Describe Your Changes Introduce the `-search.maxDeleteSeries` flag that limits the number of time series that can be deleted with a single `/api/v1/admin/tsdb/delete_series` call. Currently, any number can be deleted and if the number is big (millions) then the operation may result in unaccounted CPU and memory usage spikes which in some cases may result in OOM kill (see #7027). The flag limits the number to 30k by default and the users may override it if needed at the vmstorage start time. --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-09-30 10:02:21 +02:00
Artem Fetishev	55febc0920	lib/storage: restore ability to put empty metric ID list into tagFiltersToMetricIDsCache (#7064) ### Describe Your Changes Currently, if the metricID list is empty it won't be marshalled and as a result won't be put into the tagFiltersToMetricIDsCache, which causes cache misses for the corresponding tagFilters. In some setups this causes severe search speed degradation (see #7009). The empty metric IDs case was covered before but was then accidentally removed in `6c21439`. This PR restores the coverage of this case. A new unit test can be used as proof that empty metricID lists are not added to the cache (just remove the fix in index_db.go and run the test to see the result). Also a benchmark has been added to see the implications of the compression. ``` user@laptop:~/p/github.com/rtm0/VictoriaMetrics/01/src$ go test ./lib/storage/ -run=NONE -bench BenchmarkMarshalUnmarshalMetricIDs --loggerLevel=ERROR goos: linux goarch: amd64 pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage cpu: 13th Gen Intel(R) Core(TM) i7-1355U BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-0-12 3237240 363.5 ns/op 0 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1-12 2831049 451.8 ns/op 0.4706 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10-12 1152764 1009 ns/op 1.667 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100-12 297055 3998 ns/op 5.755 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000-12 31172 34566 ns/op 8.484 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000-12 4900 289659 ns/op 9.416 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100000-12 447 2341173 ns/op 9.456 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000000-12 42 24926928 ns/op 9.468 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000000-12 5 204098872 ns/op 9.467 compression-rate PASS ok github.com/VictoriaMetrics/VictoriaMetrics/lib/storage 15.018s ``` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-09-20 17:21:53 +02:00
Aliaksandr Valialkin	787b9cd9a0	lib/storage: improve performance for indexSearch.containsTimeRange() The indexSearch.containsTimeRange() function is called for the current indexDB and the previous indexDB every time when searching for metricIDs by label filters. This function consumes a lot of additional CPU time for cases when queries with lightweight label filters are sent to VictoriaMetrics at high rate (e.g. thousands of RPS), like in the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7009 . Optimize indexSearch.containsTimeRange() function in the following ways: - Unconditionally return true if this function is called for the current indexDB, since there are very high chances that the current indexDB contains the data with timestamps in the requested time range. - Cache the minimum timestamp, which is missing in the indexed data for the previous indexDB. This is safe to do, since the previous indexDB is readonly. This optimization eliminates potentially slow lookup in the previous indexDB for typical use cases when the requested time range is close to the current time.	2024-09-20 13:07:20 +02:00
Aliaksandr Valialkin	6f61e9d49d	lib/storage: simplify indexDB.doExtDB() usage by removing the returned value Previously indexDB.doExtDB() was returning boolean value, which was indicating whether f callback was called. There is no need in returning this boolean value, since the f callback can determine on itself whether it was called. This simplifies the code a bit. While at it, document indexDB.doExtDB().	2024-09-20 11:59:57 +02:00
Roman Khavronenko	218c533874	lib/storage: follow-up after `d8f8822fa5` (#7036 ) Make function name and comments more clear. `d8f8822fa5` Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-09-20 11:50:47 +02:00
Nikolay	d8f8822fa5	lib/storage: consistently check for missing metricID index records (#6967) * Previously, only missing metricID->metricName index records were tracked with a deadline. But metricID->TSID index records could be missing as well. The IndexDB metrics fix exposed a misleading metric for such missing records. * This commit adds a check for missing metricID->TSID index records and deletes the missing metricID entry if it hits the 60-second deadline. Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6931 Signed-off-by: f41gh7 <nik@victoriametrics.com>	2024-09-16 10:05:08 +02:00
Artem Fetishev	a5424e95b3	lib/storage: adds metrics that count records that failed to insert ### Describe Your Changes Add storage metrics that count records that failed to insert: - `RowsReceivedTotal`: the number of records that have been received by the storage from the clients - `RowsAddedTotal`: the number of records that have actually been persisted. This value must be equal to `RowsReceivedTotal` if all the records have been valid ones. But it will be smaller otherwise. The values of the metrics below should provide insight into why some records haven't been added - `NaNValueRows`: the number of records whose value was `NaN` - `StaleNaNValueRows`: the number of records whose value was `Stale NaN` - `InvalidRawMetricNames`: the number of records whose raw metric name has failed to unmarshal. The following metrics existed before this PR and are listed here for completeness: - `TooSmallTimestampRows`: the number of records whose timestamp is negative or is older than retention period - `TooBigTimestampRows`: the number of records whose timestamp is too far in the future. - `HourlySeriesLimitRowsDropped`: the number of records that have not been added because the hourly series limit has been exceeded. - `DailySeriesLimitRowsDropped`: the number of records that have not been added because the daily series limit has been exceeded. --- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-06 17:57:21 +02:00
Artem Fetishev	39294b4919	lib/storage: do not drop stale NaN samples (#6936) This patch reverts `1fd3385`. After discussing it, we've come to the conclusion that this is a valid behavior which can be avoided by deleting the time series only once the corresponding stale NaNs have been received. On the other hand, the fix leads to lost stale NaNs in some rare but valid use cases. For example: - In a cluster configuration the samples for a given time series are normally sent to the same vmstorage replica. However, vminsert may reroute the samples to another replica because the original one is down or is overloaded. In this case the stale NaN may end up on a replica that has no data for that time series, but we still want to record that sample. Thus, reverting that fix. --- related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069 Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-05 16:45:09 +02:00
Hui Wang	b48f5f3e59	lib/storage: fix metric `vm_object_references{type="indexdb"}` (#6937) follow up `4ecc370acb` ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-09-05 16:42:49 +02:00
rtm0	4df243d530	lib/storage: improve the message of the tooManyTimeseries error (#6893 ) ### Describe Your Changes This is a follow-up for #6836. Per @valyala's [comment](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6836#discussion_r1730291704), the error message does not reflect which flag needs to be adjusted. ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-03 10:28:03 +02:00
rtm0	2c856c6951	tests: check Metrics.RowsAddedTotal in unit tests (#6895 ) ### Describe Your Changes This is a follow-up PR: Unit tests introduced in #6872 can now use RowsAddedTotal counter whose scope was fixed in #6841. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-30 14:31:15 +02:00
Nikolay	4ecc370acb	lib/storage: properly add previous indexDB metrics (#6890) Previously, some extIndexDB metrics were not registered. This resulted in missing metrics if a metric value was added to the extIndexDB, which is a usual case for search requests at both indexes. This commit updates all metrics from extIndexDB according to the current IndexDB, which should fix such cases. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6868 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-08-28 11:14:28 +02:00
rtm0	9fcfba3927	lib/storage: properly handle maxMetrics limit at metricID search `TL;DR` This PR improves the metric IDs search in IndexDB: - Avoid searching for metric IDs twice when `maxMetrics` limit is exceeded - Use correct error type for indicating that the `maxMetrics` limit is exceeded - Simplify the logic of deciding between per-day and global index search A unit test has been added to ensure that this refactoring does not break anything. --- Function calls before the fix: ``` idb.searchMetricIDs |__ is.searchMetricIDs |__ is.searchMetricIDsInternal |__ is.updateMetricIDsForTagFilters |__ is.tryUpdatingMetricIDsForDateRange | | |__ is.getMetricIDsForDateAndFilters ``` - `searchMetricIDsInternal` searches metric IDs for each filter set. It maintains a metric ID set variable which is updated every time the `updateMetricIDsForTagFilters` function is called. After each successful call, the function checks the length of the updated metric ID set and if it is greater than `maxMetrics`, the function returns `too many timeseries` error. - `updateMetricIDsForTagFilters` uses either per-day or global index to search metric IDs for the given filter set. The decision of which index to use is made within the `tryUpdatingMetricIDsForDateRange` function and if it returns `fallback to global search` error then the function uses global index by calling `getMetricIDsForDateAndFilters` with zero date. - `tryUpdatingMetricIDsForDateRange` first checks if the given time range is larger than 40 days and if so returns `fallback to global search` error. Otherwise it proceeds to searching for metric IDs within that time range by calling `getMetricIDsForDateAndFilters` for each date. - `getMetricIDsForDateAndFilters` searches for metric IDs for the given date and returns `fallback to global search` error if the number of found metric IDs is greater than `maxMetrics`. Problems with this solution: 1. The `fallback to global search` error returned by `getMetricIDsForDateAndFilters` in case when maxMetrics is exceeded is misleading. 2. If `tryUpdatingMetricIDsForDateRange` proceeds to date range search and returns `fallback to global search` error (because `getMetricIDsForDateAndFilters` returns it) then this will trigger global search in `updateMetricIDsForTagFilters`. However the global search uses the same maxMetrics value which means this search is destined to fail too. I.e. the same search is performed twice and fails twice. 3. `too many timeseries` error is already handled in `searchMetricIDsInternal` and therefore handling this error in `updateMetricIDsForTagFilters` is redundant. 4. `updateMetricIDsForTagFilters` is a better place to make a decision on whether to use per-day or global index. Solution: 1. Use a dedicated error for `too many timeseries` case 2. Handle `too many timeseries` error in `searchMetricIDsInternal` only 3. Move the per-day or global search decision from `tryUpdatingMetricIDsForDateRange` to `updateMetricIDsForTagFilters` and remove `fallback to global search` error. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-27 21:39:03 +02:00
rtm0	eef6943084	lib/storage: properly register index records with RegisterMetricNames Once the timeseries is in tsidCache, new entries won't be created in per-day index because the RegisterMetricNames() code does not consider different dates for the same timeseries. So this case has been added. The same bug exists for AddRows() but it is not manifested because the index entries are finally created in updatePerDateData(). RegisterMetricNames is also updated to increase the newTimeseriesCreated counter because it actually creates new time series in index. Unit tests have been added that check all possible data patterns (different metric names and dates) and code branches in both RegisterMetricNames and AddRows. The total number of new unit tests is around 100, which increased the running time of storage tests by 50%. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>	2024-08-27 21:33:53 +02:00
rtm0	30f98916f9	Move rowsAddedTotal counter to Storage (#6841) ### Describe Your Changes Reduced the scope of rowsAddedTotal variable from global to Storage. This metric clearly belongs to a given Storage object as it counts the number of records added by a given Storage instance. Reducing the scope improves the encapsulation and allows resetting this variable during the unit tests (i.e. every time a new Storage object is created by a test, that object gets a new variable). Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-08-27 21:30:37 +02:00
Aliaksandr Valialkin	8551fbe9f3	Revert "refactor(vmstorage): Refactor the code to reduce the time complexity of `MustAddRows` and improve readability (#6629)" This reverts commit `e280d90e9a`. Reason for revert: the updated code doesn't improve the performance of table.MustAddRows for the typical case when rows contain timestamps belonging to ptws[0]. The performance may be improved in theory for the case when all the rows belong to a partition other than ptws[0], but this partition is automatically moved to ptws[0] by the code at lines `6aad1d43e9/lib/storage/table.go (L287-L298)` , so the next time the typical case will work. Also the updated code makes the code harder to follow, since it introduces an additional level of indirection with non-trivial semantics inside table.MustAddRows - the partition.TimeRangeInPartition() function. This function needs to be inspected and understood when reading the code at table.MustAddRows(). This function depends on minTsInRows and maxTsInRows vars, which are defined and initialized many lines above the partition.TimeRangeInPartition() call. This complicates reading and understanding the code even more. The previous code was using a clearer loop over rows with the clear call to partition.HasTimestamp() for every timestamp in the row. The partition.HasTimestamp() call is used in the table.MustAddRows() function multiple times. This makes the use of partition.HasTimestamp() call more consistent, easier to understand and easier to maintain compared to the mix of partition.HasTimestamp() and partition.TimeRangeInPartition() calls. Also, there is no need to document some hardcore software engineering refactoring at docs/CHANGELOG.md, since the docs/CHANGELOG.md is intended for VictoriaMetrics users, who may not know software engineering. The docs/CHANGELOG.md must document user-visible changes, and the docs must be concise and clear for VictoriaMetrics users. See https://docs.victoriametrics.com/contributing/#pull-request-checklist for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6629	2024-07-25 14:32:09 +02:00
Ruixiang Tan	e280d90e9a	refactor(vmstorage): Refactor the code to reduce the time complexity of `MustAddRows` and improve readability (#6629) ### Describe Your Changes The original logic is not only highly complex but also poorly readable, so it can be modified to increase readability and reduce time complexity. --------- Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>	2024-07-25 08:55:12 +02:00
rtm0	bdc0e688e8	Fix inconsistent error handling in Storage.AddRows() (#6583) ### Describe Your Changes `Storage.AddRows()` returns an error only in one case: when `Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But the same error is treated as a warning when it happens inside `Storage.add()` or is returned by `Storage.prefillNextIndexDB()`. This commit fixes this inconsistency by treating the error returned by `Storage.updatePerDateData()` as a warning as well. As a result `Storage.add()` does not need a return value anymore and neither does `Storage.AddRows()`. Additionally, this commit adds a unit test that checks all cases that result in a row not being added to the storage. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-07-17 12:07:14 +02:00
Aliaksandr Valialkin	784327ea30	lib/uint64set: optimize Set.Has() for nil Set - it should be inlined now. This makes the checkDeleted variable at lib/storage/index_db.go unnecessary. This is a follow-up for `b984f4672e`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6342	2024-07-15 23:59:20 +02:00
Aliaksandr Valialkin	c995ccad93	lib/{storage,mergeset}: do not allow setting dataFlushInterval to values smaller than pending{Items,Rows}FlushInterval. Pending rows and items unconditionally remain in memory for up to pending{Items,Rows}FlushInterval, so there is no sense in setting dataFlushInterval (the interval for guaranteed flush of in-memory data to disk) to values smaller than pending{Items,Rows}FlushInterval, since this doesn't affect the interval for flushing pending rows and items from memory to disk. This is a follow-up for `4c80b17027`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6221	2024-07-15 10:08:15 +02:00
Aliaksandr Valialkin	3c02937a34	all: consistently use 'any' instead of 'interface{}' 'any' type is supported starting from Go1.18. Let's consistently use it instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.	2024-07-10 00:20:37 +02:00
rtm0	a42bd59ee4	Fix Date metricid cache consistency under concurrent use (#6534) ### Describe Your Changes Fix Date metricid cache consistency under concurrent use. When one goroutine calls Has() and does not find the cache entry in the immutable map it will acquire a lock and check the mutable map. And it is possible that before that lock is acquired, the entry is moved from the mutable map to the immutable map by another goroutine causing a cache miss. The fix is to check the immutable map again once the lock is acquired. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-06-26 17:33:38 +02:00
Roman Khavronenko	b984f4672e	lib/storage: filter deleted label names and values from `/api/v1/labels` and `/api/v1/label/.../values` (#6342) Check for deleted metrics when `match[]` filter matches small number of time series (optimized path). The issue was introduced in [v1.81.0](https://docs.victoriametrics.com/changelog_2022/#v1810). Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6300 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-29 14:07:44 +02:00
Aliaksandr Valialkin	4b458370c1	lib/logstorage: work-in-progress	2024-05-24 03:06:55 +02:00
Nikolay	a5d1013042	lib/storage: change default value for maxLabelValueLen to 1024 (#6313) * It must reduce memory usage for misbehaving clients, since VictoriaMetrics stores the sparse index in memory. * Reduce disk space usage for indexdb. * Prevent possible indexDB items drops. * It may trigger slow inserts and new timeseries registration due to the change of the default value for the flag. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6176 --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-05-22 21:53:53 +02:00
Aliaksandr Valialkin	ad505a7a9a	lib/logstorage: work-in-progress	2024-05-20 04:08:30 +02:00
Aliaksandr Valialkin	cc2647d212	lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit Change the return values for these functions - now they return the unmarshaled result plus the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling. This improves performance of these functions in hot loops of VictoriaLogs a bit.	2024-05-14 01:23:54 +02:00

809 commits