added details about deduplication (#2527)

* added details about deduplication

* Update docs/README.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* Apply suggestions from code review

* Update docs/README.md

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
This commit is contained in:
Max Golionko 2022-05-02 20:45:57 +08:00 committed by Aliaksandr Valialkin
parent a70ac35ac7
commit be81a0a327
No known key found for this signature in database
GPG key ID: A72BEC6CD3D0DED1

View file

@ -1097,6 +1097,12 @@ with the enabled de-duplication. See [this section](#deduplication) for details.
VictoriaMetrics de-duplicates data points if `-dedup.minScrapeInterval` command-line flag is set to positive duration. For example, `-dedup.minScrapeInterval=60s` would de-duplicate data points on the same time series if they fall within the same discrete 60s bucket. The earliest data point will be kept. In the case of equal timestamps, an arbitrary data point will be kept. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2112#issuecomment-1032587618) for more details on how downsampling works.
The same `-dedup.minScrapeInterval` command-line flag value must be passed to both `vmselect` and `vmstorage` nodes because of the following aspects.
By default `vminsert` tries to route all the samples for a single time series to a single `vmstorage` node. But samples for a single time series can be spread among multiple `vmstorage` nodes under certain conditions:
* when adding/removing `vmstorage` nodes. Then new samples for a part of time series will be routed to another `vmstorage` nodes;
* when `vmstorage` nodes are temporarily unavailable (for instance, during their restart). Then new samples are re-routed to the remaining available `vmstorage` nodes;
* when `vmstorage` node has no enough capacity for processing incoming data stream. Then `vminsert` re-routes new samples to other `vmstorage` nodes.
The `-dedup.minScrapeInterval=D` is equivalent to `-downsampling.period=0s:D` if [downsampling](#downsampling) is enabled. It is safe to use deduplication and downsampling simultaneously.
The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs. It is recommended to have a single `scrape_interval` across all the scrape targets. See [this article](https://www.robustperception.io/keep-it-simple-scrape_interval-id) for details.