Aliaksandr Valialkin
5b9e6b9d24
lib/storage: follow-up after 7c0ae3a86a
...
- Update docs at https://docs.victoriametrics.com/#deduplication
- Optimize the deduplication loop a bit
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333
2022-12-08 18:16:57 -08:00
Roman Khavronenko
7c0ae3a86a
lib/storage: keep sample with the biggest value on timestamp conflict ( #3421 )
...
The change leaves raw sample with the biggest value for identical
timestamps per each `-dedup.minScrapeInterval` discrete interval
when the deduplication is enabled.
```
benchstat old.txt new.txt
name old time/op new time/op delta
DeduplicateSamples/minScrapeInterval=1s-10 817ns ± 2% 832ns ± 3% ~ (p=0.052 n=10+10)
DeduplicateSamples/minScrapeInterval=2s-10 1.56µs ± 1% 2.12µs ± 0% +35.19% (p=0.000 n=9+7)
DeduplicateSamples/minScrapeInterval=5s-10 1.32µs ± 3% 1.65µs ± 2% +25.57% (p=0.000 n=10+10)
DeduplicateSamples/minScrapeInterval=10s-10 1.13µs ± 2% 1.50µs ± 1% +32.85% (p=0.000 n=10+10)
name old speed new speed delta
DeduplicateSamples/minScrapeInterval=1s-10 10.0GB/s ± 2% 9.9GB/s ± 3% ~ (p=0.052 n=10+10)
DeduplicateSamples/minScrapeInterval=2s-10 5.24GB/s ± 1% 3.87GB/s ± 0% -26.03% (p=0.000 n=9+7)
DeduplicateSamples/minScrapeInterval=5s-10 6.22GB/s ± 3% 4.96GB/s ± 2% -20.37% (p=0.000 n=10+10)
DeduplicateSamples/minScrapeInterval=10s-10 7.28GB/s ± 2% 5.48GB/s ± 1% -24.74% (p=0.000 n=10+10)
```
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-08 18:06:11 -08:00
Aliaksandr Valialkin
0d86644d65
lib/storage: leave the last sample per each discrete interval during the deduplicaton
...
This aligns better with staleness logic in Prometheus - https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness
2022-05-02 21:50:45 +03:00
Aliaksandr Valialkin
1d20a19c7d
lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge()
...
This improves the code readability and debuggability, since the output of these functions
stops depending on global state.
2021-12-14 20:49:12 +02:00
Aliaksandr Valialkin
d472b03e34
lib/storage: make sure the second call to DeduplicateSamples and deduplicateSamplesDuringMerge doesnt change samples
2021-07-15 12:17:45 +03:00
Aliaksandr Valialkin
8ca2799478
lib/storage: properly determine when the deduplication is needed in needsDedup
...
Previously needsDedup() could return true if the de-duplication wasn't needed for the following case:
d < interval
/ \
| v | v |
interval interval
Now it properly returns false for this case
2021-07-12 10:53:30 +03:00
Aliaksandr Valialkin
78d2715d04
all: spelling fix: superflouos->superfluous. This is a follow-up for 0acdab3ab9
2020-11-24 12:42:22 +02:00
Aliaksandr Valialkin
d7c1ff8b0c
lib/storage: improve deduplication algorithm
...
Now it leaves only the first data point on each `-dedup.minScrapeInterval` interval.
Previously it may leave two data points on the interval. This could lead to unexpected results
for `histogram_quantile(phi, sum(rate(buckets)) by (le))` query.
2020-04-26 13:10:02 +03:00
Aliaksandr Valialkin
4de6c6bbf0
lib/storage: disable deduplication after dedup tests are complete
...
The rest of tests expect that the de-duplication is disabled.
2020-04-10 17:28:31 +03:00
Aliaksandr Valialkin
ded0c0d3c7
lib/storage: correctly handle -dedup.minScrapeInterval
values smaller than 8ms
...
Such small values may be used for removing samples with duplicate timestamps.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/409 for details.
2020-04-10 16:36:41 +03:00