Newly added index entries can be missing after unclean shutdown, since they may not have been flushed to persistent storage yet.
Log this and delete the corresponding metricID, so it can be re-created next time.
This eliminates the need for storing block data into temporary files on a single-node VictoriaMetrics
during heavy queries that touch a large number of time series over long time ranges.
This improves single-node VM performance on heavy queries by up to 2x.
Now it leaves only the first data point on each `-dedup.minScrapeInterval` interval.
Previously it could leave two data points on the interval. This could lead to unexpected results
for `histogram_quantile(phi, sum(rate(buckets)) by (le))` query.
This should reduce the frequency of the following errors:
cannot find tag filter matching less than N time series; either increase -search.maxUniqueTimeseries or use more specific tag filters
more than N time series found on the time range [...]; either increase -search.maxUniqueTimeseries or shrink the time range
This case is possible when the corresponding metricID->metricName entry hasn't propagated to the inverted index yet.
This should fix the following error:
error when searching tsids for tfss [...]: cannot find metricName by metricID 1582417212213420669: EOF
- Sort tag filters in ascending order of the number of matching time series
in order to apply the most specific filters first.
- Fall back to metricName search for filters matching a large number of time series
(usually these are negative filters or regexp filters).
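The two steps above can be sketched as follows. The `tagFilter` type, the `planFilters` function and the `maxIndexSeries` threshold are hypothetical names introduced for illustration; the real code estimates match counts and picks the fallback point differently.

```go
package main

import (
	"fmt"
	"sort"
)

// tagFilter is a simplified, hypothetical filter description.
type tagFilter struct {
	expr           string // textual form, e.g. `job="api"`
	matchingSeries int    // estimated number of matching time series
}

// planFilters returns the filters to resolve via the inverted index, sorted
// from the most specific to the least specific, and the filters deferred to
// metricName matching because they match more than maxIndexSeries series.
func planFilters(tfs []tagFilter, maxIndexSeries int) (indexed, deferred []tagFilter) {
	sorted := append([]tagFilter(nil), tfs...) // do not reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].matchingSeries < sorted[j].matchingSeries
	})
	for _, tf := range sorted {
		if tf.matchingSeries > maxIndexSeries {
			deferred = append(deferred, tf)
			continue
		}
		indexed = append(indexed, tf)
	}
	return indexed, deferred
}

func main() {
	tfs := []tagFilter{
		{`instance!="host1"`, 900000},
		{`job="api"`, 1200},
		{`env=~"prod.*"`, 50000},
	}
	indexed, deferred := planFilters(tfs, 100000)
	for _, tf := range indexed {
		fmt.Println("index:", tf.expr)
	}
	for _, tf := range deferred {
		fmt.Println("metricName match:", tf.expr)
	}
}
```

Applying the most specific filter first keeps the candidate set small, so the deferred filters only need to be checked against the metric names of the few series that survive.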
Such filters must match all the time series with `label="foo"` plus all the time series without `label`.
Previously only time series with `label="foo"` were matched.
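A small sketch of the intended matching semantics, assuming the filters in question are regexp filters whose pattern also matches the empty string (the pattern `foo|` below is an assumed example). A missing label is treated as an empty value, so such a filter matches series that lack the label. The names `matchesRegexp` and `fooOrEmpty` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// fooOrEmpty is the anchored form of the assumed example pattern `foo|`.
var fooOrEmpty = regexp.MustCompile(`^(?:foo|)$`)

// matchesRegexp reports whether the series labels satisfy key=~re.
// A missing label is looked up as "", so a pattern that matches the empty
// string also matches series without the label.
func matchesRegexp(labels map[string]string, key string, re *regexp.Regexp) bool {
	return re.MatchString(labels[key])
}

func main() {
	series := []map[string]string{
		{"label": "foo"},
		{"label": "bar"},
		{"other": "x"}, // no `label` at all
	}
	for _, s := range series {
		fmt.Println(s, matchesRegexp(s, "label", fooOrEmpty))
	}
}
```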
This guarantees that the snapshot contains all the recently added data
from inmemory buffers when multiple concurrent calls to Storage.CreateSnapshot are performed.
`runTransactions` call issues async deletions for transaction files. The previously issued transaction deletions
can race with the next call to `runTransactions`. Prevent this by waiting until all the pending transaction
deletions are finished at the beginning of `runTransactions`. Also make sure that all the pending transaction
deletions are finished before returning from `runTransactions`.
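The waiting scheme can be sketched with a `sync.WaitGroup`. This is an illustration under assumptions: the `txnRunner` type and `removeFile` are hypothetical, applying the transaction itself is omitted, and `runTransactions` is assumed not to be called concurrently with itself.

```go
package main

import (
	"fmt"
	"sync"
)

// txnRunner is a hypothetical stand-in for the component owning runTransactions.
type txnRunner struct {
	pendingDeletions sync.WaitGroup // tracks async transaction file deletions

	mu      sync.Mutex
	removed []string // records "deleted" paths instead of touching the disk
}

// removeFile simulates deleting a transaction file.
func (r *txnRunner) removeFile(path string) {
	r.mu.Lock()
	r.removed = append(r.removed, path)
	r.mu.Unlock()
}

func (r *txnRunner) runTransactions(txnFiles []string) {
	// Wait for deletions issued earlier, so they cannot race with this call.
	r.pendingDeletions.Wait()
	for _, path := range txnFiles {
		// Applying the transaction is omitted in this sketch.
		r.pendingDeletions.Add(1)
		go func(p string) {
			defer r.pendingDeletions.Done()
			r.removeFile(p)
		}(path)
	}
	// Do not return until every deletion issued above has finished.
	r.pendingDeletions.Wait()
}

func main() {
	var r txnRunner
	r.runTransactions([]string{"txn/1", "txn/2"})
	r.runTransactions([]string{"txn/3"})
	fmt.Println(len(r.removed)) // prints: 3
}
```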
This should fix the issue on NFS when incompletely removed dirs may be left
after unclean shutdown (OOM, kill -9, hard reset, etc.), while the corresponding transaction
files are already removed.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162
The metricID->metricName entry can be missing in the indexdb after unclean shutdown
when only a part of entries for new time series is written into indexdb.
Recover from such a situation by removing the broken metricID. A new metricID
will be automatically created for the time series with the given metricName
when a new data point arrives for it.
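The recovery path can be sketched with an in-memory stand-in for the index. The `indexDB` type, its fields and `searchMetricNames` are hypothetical simplifications; the real indexdb is persistent and marks metricIDs as deleted through its own mechanisms.

```go
package main

import (
	"fmt"
	"log"
)

// indexDB is a hypothetical in-memory stand-in for the real index.
type indexDB struct {
	metricNames map[uint64]string   // metricID -> metricName
	deletedIDs  map[uint64]struct{} // metricIDs marked as deleted
}

// searchMetricNames resolves metricIDs to metricNames. A metricID without a
// metricName entry is logged and marked as deleted instead of failing the
// whole search.
func (db *indexDB) searchMetricNames(metricIDs []uint64) []string {
	names := make([]string, 0, len(metricIDs))
	for _, id := range metricIDs {
		name, ok := db.metricNames[id]
		if !ok {
			log.Printf("cannot find metricName by metricID %d; deleting the metricID", id)
			db.deletedIDs[id] = struct{}{}
			continue
		}
		names = append(names, name)
	}
	return names
}

func main() {
	db := &indexDB{
		metricNames: map[uint64]string{1: "http_requests_total", 3: "up"},
		deletedIDs:  map[uint64]struct{}{},
	}
	fmt.Println(db.searchMetricNames([]uint64{1, 2, 3}))
	fmt.Println(len(db.deletedIDs))
}
```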
See the corresponding benchmark in Prometheus - 23c0299d85/tsdb/head_bench_test.go (L52)
The benchmark allows performing apples-to-apples comparison of time series search
in Prometheus and VictoriaMetrics. The following article - https://www.robustperception.io/evaluating-performance-and-correctness -
contains incorrect numbers for VictoriaMetrics, since this benchmark did not exist at the time. Fix it.
Benchmarks can be repeated with the following commands from Prometheus and VictoriaMetrics source code roots:
- Prometheus: GOMAXPROCS=1 go test ./tsdb/ -run=111 -bench=BenchmarkHeadPostingForMatchers
- VictoriaMetrics: GOMAXPROCS=1 go test ./lib/storage/ -run=111 -bench=BenchmarkHeadPostingForMatchers
Benchmark results:
benchmark                                                       old ns/op   new ns/op   delta
BenchmarkHeadPostingForMatchers/n="1"                           272756688   364977      -99.87%
BenchmarkHeadPostingForMatchers/n="1",j="foo"                   138132923   1181636     -99.14%
BenchmarkHeadPostingForMatchers/j="foo",n="1"                   134723762   1141578     -99.15%
BenchmarkHeadPostingForMatchers/n="1",j!="foo"                  195823953   1148056     -99.41%
BenchmarkHeadPostingForMatchers/i=~".*"                         7962582919  8716755     -99.89%
BenchmarkHeadPostingForMatchers/i=~".+"                         7589543864  12096587    -99.84%
BenchmarkHeadPostingForMatchers/i=~""                           1142371741  16164560    -98.59%
BenchmarkHeadPostingForMatchers/i!=""                           9964150263  12230021    -99.88%
BenchmarkHeadPostingForMatchers/n="1",i=~".*",j="foo"           216995884   1173476     -99.46%
BenchmarkHeadPostingForMatchers/n="1",i=~".*",i!="2",j="foo"    202541348   1299743     -99.36%
BenchmarkHeadPostingForMatchers/n="1",i!=""                     486285711   11555193    -97.62%
BenchmarkHeadPostingForMatchers/n="1",i!="",j="foo"             350776931   5607506     -98.40%
BenchmarkHeadPostingForMatchers/n="1",i=~".+",j="foo"           380888565   6380335     -98.32%
BenchmarkHeadPostingForMatchers/n="1",i=~"1.+",j="foo"          89500296    2078970     -97.68%
BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!="2",j="foo"    379529654   6561368     -98.27%
BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!~"2.*",j="foo"  424563825   6757132     -98.41%
The first column (old) is for Prometheus, the second column (new) is for VictoriaMetrics.
Prometheus was using 3.5GB of RAM during the benchmark, while VictoriaMetrics was using 400MB of RAM.
Production workload shows that the index requires ~4KB of RAM per active time series.
This is too much for a high number of active time series, so let's delete this index.
Now the queries should fall back to the index for the current day instead of the index
for the recent hour. The query performance for the current day index should be good enough
given the 100M rows/sec scan speed per CPU core.
Issues fixed:
- Slow startup times. Now the index is loaded from cache during start.
- High memory usage related to superfluous index copies every 10 seconds.
The origin of the error has been detected and documented in the code,
so it is enough to export a counter for such errors at `vm_index_blocks_with_metric_ids_incorrect_order_total`,
so it could be monitored and alerted on high error rates.
Export also the counter for processed index blocks with metricIDs - `vm_index_blocks_with_metric_ids_processed_total`,
so its rate could be compared to `rate(vm_index_blocks_with_metric_ids_incorrect_order_total)`.
Slow loops could require seeks and expensive regexp matching, while fast loops just scan
all the metricIDs for the given `tag=value` prefix. So these operations must have separate
max loops multipliers.
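A minimal sketch of separate loop budgets, assuming the budget is computed as the maximum number of metrics multiplied by a per-kind multiplier. The constant names, their values and the `loopBudget` type are hypothetical and chosen only to show that slow loops get a much smaller allowance than fast ones.

```go
package main

import "fmt"

// Hypothetical multipliers: slow loops are far more expensive per iteration,
// so fewer of them are allowed per requested metric.
const (
	fastLoopsPerMetric = 400
	slowLoopsPerMetric = 20
)

// loopBudget counts how many index scan loops may still be performed.
type loopBudget struct {
	remaining int
}

// newLoopBudget returns the budget for scanning up to maxMetrics metricIDs.
func newLoopBudget(maxMetrics int, slow bool) *loopBudget {
	if slow {
		return &loopBudget{remaining: maxMetrics * slowLoopsPerMetric}
	}
	return &loopBudget{remaining: maxMetrics * fastLoopsPerMetric}
}

// spend consumes one loop and reports whether the budget allowed it.
func (b *loopBudget) spend() bool {
	if b.remaining <= 0 {
		return false
	}
	b.remaining--
	return true
}

func main() {
	fast := newLoopBudget(1000, false)
	slow := newLoopBudget(1000, true)
	fmt.Println(fast.remaining, slow.remaining) // prints: 400000 20000
}
```

A caller would invoke `spend` once per scanned index entry and fall back to another search strategy as soon as it returns false.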
The fastest tag filters are non-negative non-regexp, since they are the most specific.
The slowest tag filters are negative regexp, since they require scanning
all the entries for the given label.
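The ordering above can be expressed as a cost function used for sorting filters before they are applied. The `tagFilter` type and the `filterCost` scoring are hypothetical; the text only fixes the two extremes, so the relative order of the two middle classes (positive regexp before negative non-regexp) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// tagFilter is a simplified, hypothetical filter description.
type tagFilter struct {
	expr       string
	isNegative bool
	isRegexp   bool
}

// filterCost returns 0 for non-negative non-regexp filters (fastest) and
// 3 for negative regexp filters (slowest). The middle values are an assumption.
func filterCost(tf tagFilter) int {
	cost := 0
	if tf.isRegexp {
		cost++
	}
	if tf.isNegative {
		cost += 2
	}
	return cost
}

// sortByCost orders filters from the cheapest to the most expensive.
func sortByCost(tfs []tagFilter) {
	sort.SliceStable(tfs, func(i, j int) bool {
		return filterCost(tfs[i]) < filterCost(tfs[j])
	})
}

func main() {
	tfs := []tagFilter{
		{`i!~"2.*"`, true, true},
		{`i=~".+"`, false, true},
		{`n="1"`, false, false},
		{`j!="foo"`, true, false},
	}
	sortByCost(tfs)
	for _, tf := range tfs {
		fmt.Println(filterCost(tf), tf.expr)
	}
}
```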