Aliaksandr Valialkin
e56d5e1918
lib/storage: follow-up after 7c0ae3a86a
...
- Update docs at https://docs.victoriametrics.com/#deduplication
- Optimize the deduplication loop a bit
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333
2022-12-08 18:18:49 -08:00
Roman Khavronenko
909cd04c55
lib/storage: keep sample with the biggest value on timestamp conflict ( #3421 )
...
The change leaves raw sample with the biggest value for identical
timestamps per each `-dedup.minScrapeInterval` discrete interval
when the deduplication is enabled.
```
benchstat old.txt new.txt
name old time/op new time/op delta
DeduplicateSamples/minScrapeInterval=1s-10 817ns ± 2% 832ns ± 3% ~ (p=0.052 n=10+10)
DeduplicateSamples/minScrapeInterval=2s-10 1.56µs ± 1% 2.12µs ± 0% +35.19% (p=0.000 n=9+7)
DeduplicateSamples/minScrapeInterval=5s-10 1.32µs ± 3% 1.65µs ± 2% +25.57% (p=0.000 n=10+10)
DeduplicateSamples/minScrapeInterval=10s-10 1.13µs ± 2% 1.50µs ± 1% +32.85% (p=0.000 n=10+10)
name old speed new speed delta
DeduplicateSamples/minScrapeInterval=1s-10 10.0GB/s ± 2% 9.9GB/s ± 3% ~ (p=0.052 n=10+10)
DeduplicateSamples/minScrapeInterval=2s-10 5.24GB/s ± 1% 3.87GB/s ± 0% -26.03% (p=0.000 n=9+7)
DeduplicateSamples/minScrapeInterval=5s-10 6.22GB/s ± 3% 4.96GB/s ± 2% -20.37% (p=0.000 n=10+10)
DeduplicateSamples/minScrapeInterval=10s-10 7.28GB/s ± 2% 5.48GB/s ± 1% -24.74% (p=0.000 n=10+10)
```
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-08 18:18:36 -08:00
Aliaksandr Valialkin
361b08c30e
lib/storage: leave the last sample per each discrete interval during the deduplicaton
...
This aligns better with staleness logic in Prometheus - https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness
2022-05-02 21:59:31 +03:00
Aliaksandr Valialkin
cdfe854c9b
lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge()
...
This improves the code readability and debuggability, since the output of these functions
stops depending on global state.
2021-12-14 20:52:29 +02:00
Aliaksandr Valialkin
9d3f9da5ad
lib/storage: make sure the second call to DeduplicateSamples and deduplicateSamplesDuringMerge doesnt change samples
2021-07-15 12:18:38 +03:00
Aliaksandr Valialkin
ef3c58d7a3
lib/storage: properly determine when the deduplication is needed in needsDedup
...
Previously needsDedup() could return true if the de-duplication wasn't needed for the following case:
d < interval
/ \
| v | v |
interval interval
Now it properly returns false for this case
2021-07-12 10:54:51 +03:00
Aliaksandr Valialkin
7f3e884a31
all: spelling fix: superflouos->superfluous. This is a follow-up for 0acdab3ab9
2020-11-24 12:42:04 +02:00
Aliaksandr Valialkin
a0000c3a6e
lib/storage: improve deduplication algorithm
...
Now it leaves only the first data point on each `-dedup.minScrapeInterval` interval.
Previously it may leave two data points on the interval. This could lead to unexpected results
for `histogram_quantile(phi, sum(rate(buckets)) by (le))` query.
2020-04-26 13:10:18 +03:00
Aliaksandr Valialkin
e0c6da8e2a
lib/storage: disable deduplication after dedup tests are complete
...
The rest of tests expect that the de-duplication is disabled.
2020-04-10 17:33:38 +03:00
Aliaksandr Valialkin
8ed0d5471a
lib/storage: correctly handle -dedup.minScrapeInterval
values smaller than 8ms
...
Such small values may be used for removing samples with duplicate timestamps.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/409 for details.
2020-04-10 16:40:41 +03:00