github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2025-03-21 15:45:01 +00:00

Author	SHA1	Message	Date
Roman Khavronenko	fd39eb2b2a	lib/storage: update dedup tests * update misleading comments about preferring NaNs on intervals. NaNs are only preferred on timestamp conflicts * add conflicting timestamps to the benchmark test. Previously, benchmark wasn't checking the timestamp conflict code branch. The updated results after `c0fcfd6b97` are the following: ``` benchstat old.txt new.txt goos: darwin goarch: arm64 pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage cpu: Apple M4 Pro │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ DeduplicateSamples/minScrapeInterval=3s-14 889.7n ± ∞ ¹ 904.3n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=4s-14 735.9n ± ∞ ¹ 748.7n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=10s-14 637.7n ± ∞ ¹ 659.3n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=3s-14 838.8n ± ∞ ¹ 810.4n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=4s-14 765.2n ± ∞ ¹ 735.1n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=10s-14 673.1n ± ∞ ¹ 622.4n ± ∞ ¹ ~ (p=1.000 n=1) ² geomean 751.7n 741.0n -1.42% ``` ### Describe Your Changes Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications. --- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-12-18 22:38:36 +01:00
Andrei Baidarov	33a012c225	lib/storage: prefer stale markers over other values on dedup interval Previously, during de-duplication staleness markers could be removed due to incorrect logic at values equality check. During the evaluation of read query vmselect deduplicates samples using dedupInterval option. It picks the highest value across all points with the same timestamp next to the border of dedupInterval. The issue is any comparison with NaN via <, > returns false. This means that the position of NaN in srcValues could affect the result. This commit changes this logic with additional step, that explicitly checks for staleness marker for the following cases: 1. Deduplication on vmselect 2. Deduplication in vmstorage during merges 3. Deduplication in stream aggregation check performed only for stale markers, because other NaNs are rejected on ingestion by vmstorage or by stream aggregation. Checking for stale markers in general slows down dedup speed by 3%: ``` benchstat old.txt new.txt goos: darwin goarch: arm64 pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage cpu: Apple M4 Pro │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ DeduplicateSamples/minScrapeInterval=1s-14 462.8n ± ∞ ¹ 425.2n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=2s-14 905.6n ± ∞ ¹ 903.3n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=5s-14 710.0n ± ∞ ¹ 698.9n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamples/minScrapeInterval=10s-14 632.7n ± ∞ ¹ 638.5n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=1s-14 439.7n ± ∞ ¹ 409.9n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=2s-14 908.9n ± ∞ ¹ 882.2n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=5s-14 721.2n ± ∞ ¹ 684.7n ± ∞ ¹ ~ (p=1.000 n=1) ² DeduplicateSamplesDuringMerge/minScrapeInterval=10s-14 659.1n ± ∞ ¹ 630.6n ± ∞ ¹ ~ (p=1.000 n=1) ² geomean 659.5n 636.0n -3.56% ``` Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7674 --------- Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-12-12 13:00:34 +01:00
Aliaksandr Valialkin	e56d5e1918	lib/storage: follow-up after `7c0ae3a86a` - Update docs at https://docs.victoriametrics.com/#deduplication - Optimize the deduplication loop a bit Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333	2022-12-08 18:18:49 -08:00
Roman Khavronenko	909cd04c55	lib/storage: keep sample with the biggest value on timestamp conflict (#3421 ) The change leaves raw sample with the biggest value for identical timestamps per each `-dedup.minScrapeInterval` discrete interval when the deduplication is enabled. ``` benchstat old.txt new.txt name old time/op new time/op delta DeduplicateSamples/minScrapeInterval=1s-10 817ns ± 2% 832ns ± 3% ~ (p=0.052 n=10+10) DeduplicateSamples/minScrapeInterval=2s-10 1.56µs ± 1% 2.12µs ± 0% +35.19% (p=0.000 n=9+7) DeduplicateSamples/minScrapeInterval=5s-10 1.32µs ± 3% 1.65µs ± 2% +25.57% (p=0.000 n=10+10) DeduplicateSamples/minScrapeInterval=10s-10 1.13µs ± 2% 1.50µs ± 1% +32.85% (p=0.000 n=10+10) name old speed new speed delta DeduplicateSamples/minScrapeInterval=1s-10 10.0GB/s ± 2% 9.9GB/s ± 3% ~ (p=0.052 n=10+10) DeduplicateSamples/minScrapeInterval=2s-10 5.24GB/s ± 1% 3.87GB/s ± 0% -26.03% (p=0.000 n=9+7) DeduplicateSamples/minScrapeInterval=5s-10 6.22GB/s ± 3% 4.96GB/s ± 2% -20.37% (p=0.000 n=10+10) DeduplicateSamples/minScrapeInterval=10s-10 7.28GB/s ± 2% 5.48GB/s ± 1% -24.74% (p=0.000 n=10+10) ``` https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333 Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-12-08 18:18:36 -08:00
Aliaksandr Valialkin	361b08c30e	lib/storage: leave the last sample per each discrete interval during the deduplicaton This aligns better with staleness logic in Prometheus - https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness	2022-05-02 21:59:31 +03:00
Aliaksandr Valialkin	cdfe854c9b	lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge() This improves the code readability and debuggability, since the output of these functions stops depending on global state.	2021-12-14 20:52:29 +02:00
Aliaksandr Valialkin	9d3f9da5ad	lib/storage: make sure the second call to DeduplicateSamples and deduplicateSamplesDuringMerge doesnt change samples	2021-07-15 12:18:38 +03:00
Aliaksandr Valialkin	ef3c58d7a3	lib/storage: properly determine when the deduplication is needed in needsDedup Previously needsDedup() could return true if the de-duplication wasn't needed for the following case: d < interval / \ \| v \| v \| interval interval Now it properly returns false for this case	2021-07-12 10:54:51 +03:00
Aliaksandr Valialkin	7f3e884a31	all: spelling fix: superflouos->superfluous. This is a follow-up for `0acdab3ab9`	2020-11-24 12:42:04 +02:00
Aliaksandr Valialkin	a0000c3a6e	lib/storage: improve deduplication algorithm Now it leaves only the first data point on each `-dedup.minScrapeInterval` interval. Previously it may leave two data points on the interval. This could lead to unexpected results for `histogram_quantile(phi, sum(rate(buckets)) by (le))` query.	2020-04-26 13:10:18 +03:00
Aliaksandr Valialkin	e0c6da8e2a	lib/storage: disable deduplication after dedup tests are complete The rest of tests expect that the de-duplication is disabled.	2020-04-10 17:33:38 +03:00
Aliaksandr Valialkin	8ed0d5471a	lib/storage: correctly handle `-dedup.minScrapeInterval` values smaller than 8ms Such small values may be used for removing samples with duplicate timestamps. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/409 for details.	2020-04-10 16:40:41 +03:00

12 commits