mirror of
https://github.com/VictoriaMetrics/VictoriaMetrics.git
synced 2024-11-21 14:44:00 +00:00
docs: clarify deduplication docs (#4371)
The purpose of the change is to highlight what HA pair is and how deduplication needs identical labels to be present in raw samples.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4367

Signed-off-by: hagen1778 <roman@victoriametrics.com>
parent 20dc3db71e
commit 8185c2466c
5 changed files with 93 additions and 40 deletions
35
README.md
@@ -1466,22 +1466,37 @@ with the enabled de-duplication. See [this section](#deduplication) for details.
 
 ## Deduplication
 
-VictoriaMetrics leaves a single raw sample with the biggest timestamp per each `-dedup.minScrapeInterval` discrete interval
-if `-dedup.minScrapeInterval` is set to positive duration. For example, `-dedup.minScrapeInterval=60s` would leave a single
-raw sample with the biggest timestamp per each discrete 60s interval.
+VictoriaMetrics leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) for each [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series)
+per each `-dedup.minScrapeInterval` discrete interval if `-dedup.minScrapeInterval` is set to positive duration.
+For example, `-dedup.minScrapeInterval=60s` would leave a single raw sample with the biggest timestamp per each discrete
+`60s` interval.
 This aligns with the [staleness rules in Prometheus](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness).
 
-If multiple raw samples have the same biggest timestamp on the given `-dedup.minScrapeInterval` discrete interval, then the sample with the biggest value is left.
+If multiple raw samples have **the same timestamp** on the given `-dedup.minScrapeInterval` discrete interval,
+then the sample with **the biggest value** is kept.
 
-The `-dedup.minScrapeInterval=D` is equivalent to `-downsampling.period=0s:D` if [downsampling](#downsampling) is enabled. So it is safe to use deduplication and downsampling simultaneously.
+Please note, [labels](https://docs.victoriametrics.com/keyConcepts.html#labels) of raw samples should be identical
+in order to be deduplicated. For example, this is why [HA pair of vmagents](https://docs.victoriametrics.com/vmagent.html#high-availability)
+needs to be identically configured.
 
-The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs. It is recommended to have a single `scrape_interval` across all the scrape targets. See [this article](https://www.robustperception.io/keep-it-simple-scrape_interval-id) for details.
+The `-dedup.minScrapeInterval=D` is equivalent to `-downsampling.period=0s:D` if [downsampling](#downsampling) is enabled.
+So it is safe to use deduplication and downsampling simultaneously.
 
-The de-duplication reduces disk space usage if multiple identically configured [vmagent](https://docs.victoriametrics.com/vmagent.html) or Prometheus instances in HA pair
-write data to the same VictoriaMetrics instance. These vmagent or Prometheus instances must have identical
-`external_labels` section in their configs, so they write data to the same time series. See also [how to set up multiple vmagent instances for scraping the same targets](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets).
+The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs.
+It is recommended to have a single `scrape_interval` across all the scrape targets.
+See [this article](https://www.robustperception.io/keep-it-simple-scrape_interval-id) for details.
 
-It is recommended passing different `-promscrape.cluster.name` values to HA pairs of `vmagent` instances, so the de-duplication consistently leaves samples for one `vmagent` instance and removes duplicate samples from other `vmagent` instances. See [these docs](https://docs.victoriametrics.com/vmagent.html#high-availability) for details.
+The de-duplication reduces disk space usage if multiple **identically configured** [vmagent](https://docs.victoriametrics.com/vmagent.html)
+or Prometheus instances in HA pair write data to the same VictoriaMetrics instance.
+These vmagent or Prometheus instances must have **identical** `external_labels` section in their configs,
+so they write data to the same time series.
+See also [how to set up multiple vmagent instances for scraping the same targets](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets).
+
+It is recommended passing different `-promscrape.cluster.name` values to each distinct HA pair of `vmagent` instances,
+so the de-duplication consistently leaves samples for one `vmagent` instance and removes duplicate samples
+from other `vmagent` instances.
+See [these docs](https://docs.victoriametrics.com/vmagent.html#high-availability) for details.
 
 ## Storage
 
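The selection rule described in the hunk above can be sketched in a few lines of Python. This is an illustration only, not the VictoriaMetrics implementation: the `Sample` type and the `deduplicate` function are hypothetical names, and the sketch assumes that discrete intervals are aligned to multiples of the interval length, which the text does not specify.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One raw sample of a single time series (hypothetical type)."""
    timestamp_ms: int
    value: float


def deduplicate(samples: list[Sample], interval_ms: int) -> list[Sample]:
    """Keep one sample per discrete interval for a single time series."""
    if interval_ms <= 0:
        # Deduplication only applies when the interval is a positive duration.
        return list(samples)
    kept: dict[int, Sample] = {}
    for sample in samples:
        # Assumption: intervals are aligned to multiples of interval_ms.
        bucket = sample.timestamp_ms // interval_ms
        best = kept.get(bucket)
        # Tuple comparison: the biggest timestamp wins, and on equal
        # timestamps the biggest value wins.
        if best is None or sample > best:
            kept[bucket] = sample
    return [kept[bucket] for bucket in sorted(kept)]


samples = [
    Sample(1_000, 1.0),
    Sample(59_000, 2.0),
    Sample(59_000, 5.0),  # same timestamp as the previous sample, bigger value
    Sample(61_000, 3.0),
]
deduplicated = deduplicate(samples, 60_000)
```

With a 60s interval, the first three samples share one interval and the sample `(59_000, 5.0)` is kept; the fourth sample falls into the next interval and is kept as well.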
@@ -752,14 +752,18 @@ See [these docs](https://docs.victoriametrics.com/#deduplication) for details.
 
 ## High availability
 
-It is possible to run multiple identically configured `vmagent` instances or `vmagent` [clusters](#scraping-big-number-of-targets),
-so they [scrape](#how-to-collect-metrics-in-prometheus-format) the same set of targets and push the collected data to the same set of VictoriaMetrics remote storage systems.
+It is possible to run multiple **identically configured** `vmagent` instances or `vmagent`
+[clusters](#scraping-big-number-of-targets), so they [scrape](#how-to-collect-metrics-in-prometheus-format)
+the same set of targets and push the collected data to the same set of VictoriaMetrics remote storage systems.
+Two **identically configured** vmagent instances or clusters is usually called an HA pair.
 
-In this case the deduplication must be configured at VictoriaMetrics in order to de-duplicate samples received from multiple identically configured `vmagent` instances or clusters.
+When running HA pairs, [deduplication](https://docs.victoriametrics.com/#deduplication) must be configured
+at VictoriaMetrics side in order to de-duplicate received samples.
 See [these docs](https://docs.victoriametrics.com/#deduplication) for details.
 
-It is also recommended passing different values to `-promscrape.cluster.name` command-line flag per each `vmagent` instance or per each `vmagent` cluster in HA setup.
-This is needed for proper data de-duplication. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2679) for details.
+It is also recommended passing different values to `-promscrape.cluster.name` command-line flag per each `vmagent`
+instance or per each `vmagent` cluster in HA setup. This is needed for proper data de-duplication.
+See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2679) for details.
 
 ## Scraping targets via a proxy
 
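Why the two members of an HA pair need identical configuration can be illustrated with a small sketch. It assumes that a time series is identified by its complete label set; the `series_key` and `count_series` helpers and the `replica` external label are hypothetical and stand in for any configuration drift between two `vmagent` instances.

```python
def series_key(labels: dict[str, str]) -> tuple[tuple[str, str], ...]:
    """Identify a time series by its complete, order-independent label set."""
    return tuple(sorted(labels.items()))


def count_series(label_sets: list[dict[str, str]]) -> int:
    """Count the distinct time series among the given label sets."""
    return len({series_key(labels) for labels in label_sets})


# Labels produced when scraping one target.
target = {"__name__": "up", "job": "node", "instance": "host1:9100"}

# Identically configured pair: both instances write to the same series,
# so their samples are candidates for deduplication.
identical_pair = [dict(target), dict(target)]

# Hypothetical drift: each instance adds its own `replica` external label,
# so the samples land in two different series and stay duplicated.
drifted_pair = [
    {**target, "replica": "a"},
    {**target, "replica": "b"},
]

identical_count = count_series(identical_pair)
drifted_count = count_series(drifted_pair)
```

In this sketch the identical pair yields one series and the drifted pair yields two, which is the situation the requirement for identical `external_labels` is meant to prevent.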
@@ -1469,22 +1469,37 @@ with the enabled de-duplication. See [this section](#deduplication) for details.
 
 ## Deduplication
 
-VictoriaMetrics leaves a single raw sample with the biggest timestamp per each `-dedup.minScrapeInterval` discrete interval
-if `-dedup.minScrapeInterval` is set to positive duration. For example, `-dedup.minScrapeInterval=60s` would leave a single
-raw sample with the biggest timestamp per each discrete 60s interval.
+VictoriaMetrics leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) for each [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series)
+per each `-dedup.minScrapeInterval` discrete interval if `-dedup.minScrapeInterval` is set to positive duration.
+For example, `-dedup.minScrapeInterval=60s` would leave a single raw sample with the biggest timestamp per each discrete
+`60s` interval.
 This aligns with the [staleness rules in Prometheus](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness).
 
-If multiple raw samples have the same biggest timestamp on the given `-dedup.minScrapeInterval` discrete interval, then the sample with the biggest value is left.
+If multiple raw samples have **the same timestamp** on the given `-dedup.minScrapeInterval` discrete interval,
+then the sample with **the biggest value** is kept.
 
-The `-dedup.minScrapeInterval=D` is equivalent to `-downsampling.period=0s:D` if [downsampling](#downsampling) is enabled. So it is safe to use deduplication and downsampling simultaneously.
+Please note, [labels](https://docs.victoriametrics.com/keyConcepts.html#labels) of raw samples should be identical
+in order to be deduplicated. For example, this is why [HA pair of vmagents](https://docs.victoriametrics.com/vmagent.html#high-availability)
+needs to be identically configured.
 
-The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs. It is recommended to have a single `scrape_interval` across all the scrape targets. See [this article](https://www.robustperception.io/keep-it-simple-scrape_interval-id) for details.
+The `-dedup.minScrapeInterval=D` is equivalent to `-downsampling.period=0s:D` if [downsampling](#downsampling) is enabled.
+So it is safe to use deduplication and downsampling simultaneously.
 
-The de-duplication reduces disk space usage if multiple identically configured [vmagent](https://docs.victoriametrics.com/vmagent.html) or Prometheus instances in HA pair
-write data to the same VictoriaMetrics instance. These vmagent or Prometheus instances must have identical
-`external_labels` section in their configs, so they write data to the same time series. See also [how to set up multiple vmagent instances for scraping the same targets](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets).
+The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs.
+It is recommended to have a single `scrape_interval` across all the scrape targets.
+See [this article](https://www.robustperception.io/keep-it-simple-scrape_interval-id) for details.
 
-It is recommended passing different `-promscrape.cluster.name` values to HA pairs of `vmagent` instances, so the de-duplication consistently leaves samples for one `vmagent` instance and removes duplicate samples from other `vmagent` instances. See [these docs](https://docs.victoriametrics.com/vmagent.html#high-availability) for details.
+The de-duplication reduces disk space usage if multiple **identically configured** [vmagent](https://docs.victoriametrics.com/vmagent.html)
+or Prometheus instances in HA pair write data to the same VictoriaMetrics instance.
+These vmagent or Prometheus instances must have **identical** `external_labels` section in their configs,
+so they write data to the same time series.
+See also [how to set up multiple vmagent instances for scraping the same targets](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets).
+
+It is recommended passing different `-promscrape.cluster.name` values to each distinct HA pair of `vmagent` instances,
+so the de-duplication consistently leaves samples for one `vmagent` instance and removes duplicate samples
+from other `vmagent` instances.
+See [these docs](https://docs.victoriametrics.com/vmagent.html#high-availability) for details.
 
 ## Storage
 
@@ -1477,22 +1477,37 @@ with the enabled de-duplication. See [this section](#deduplication) for details.
 
 ## Deduplication
 
-VictoriaMetrics leaves a single raw sample with the biggest timestamp per each `-dedup.minScrapeInterval` discrete interval
-if `-dedup.minScrapeInterval` is set to positive duration. For example, `-dedup.minScrapeInterval=60s` would leave a single
-raw sample with the biggest timestamp per each discrete 60s interval.
+VictoriaMetrics leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) for each [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series)
+per each `-dedup.minScrapeInterval` discrete interval if `-dedup.minScrapeInterval` is set to positive duration.
+For example, `-dedup.minScrapeInterval=60s` would leave a single raw sample with the biggest timestamp per each discrete
+`60s` interval.
 This aligns with the [staleness rules in Prometheus](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness).
 
-If multiple raw samples have the same biggest timestamp on the given `-dedup.minScrapeInterval` discrete interval, then the sample with the biggest value is left.
+If multiple raw samples have **the same timestamp** on the given `-dedup.minScrapeInterval` discrete interval,
+then the sample with **the biggest value** is kept.
 
-The `-dedup.minScrapeInterval=D` is equivalent to `-downsampling.period=0s:D` if [downsampling](#downsampling) is enabled. So it is safe to use deduplication and downsampling simultaneously.
+Please note, [labels](https://docs.victoriametrics.com/keyConcepts.html#labels) of raw samples should be identical
+in order to be deduplicated. For example, this is why [HA pair of vmagents](https://docs.victoriametrics.com/vmagent.html#high-availability)
+needs to be identically configured.
 
-The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs. It is recommended to have a single `scrape_interval` across all the scrape targets. See [this article](https://www.robustperception.io/keep-it-simple-scrape_interval-id) for details.
+The `-dedup.minScrapeInterval=D` is equivalent to `-downsampling.period=0s:D` if [downsampling](#downsampling) is enabled.
+So it is safe to use deduplication and downsampling simultaneously.
 
-The de-duplication reduces disk space usage if multiple identically configured [vmagent](https://docs.victoriametrics.com/vmagent.html) or Prometheus instances in HA pair
-write data to the same VictoriaMetrics instance. These vmagent or Prometheus instances must have identical
-`external_labels` section in their configs, so they write data to the same time series. See also [how to set up multiple vmagent instances for scraping the same targets](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets).
+The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs.
+It is recommended to have a single `scrape_interval` across all the scrape targets.
+See [this article](https://www.robustperception.io/keep-it-simple-scrape_interval-id) for details.
 
-It is recommended passing different `-promscrape.cluster.name` values to HA pairs of `vmagent` instances, so the de-duplication consistently leaves samples for one `vmagent` instance and removes duplicate samples from other `vmagent` instances. See [these docs](https://docs.victoriametrics.com/vmagent.html#high-availability) for details.
+The de-duplication reduces disk space usage if multiple **identically configured** [vmagent](https://docs.victoriametrics.com/vmagent.html)
+or Prometheus instances in HA pair write data to the same VictoriaMetrics instance.
+These vmagent or Prometheus instances must have **identical** `external_labels` section in their configs,
+so they write data to the same time series.
+See also [how to set up multiple vmagent instances for scraping the same targets](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets).
+
+It is recommended passing different `-promscrape.cluster.name` values to each distinct HA pair of `vmagent` instances,
+so the de-duplication consistently leaves samples for one `vmagent` instance and removes duplicate samples
+from other `vmagent` instances.
+See [these docs](https://docs.victoriametrics.com/vmagent.html#high-availability) for details.
 
 ## Storage
 
@@ -763,14 +763,18 @@ See [these docs](https://docs.victoriametrics.com/#deduplication) for details.
 
 ## High availability
 
-It is possible to run multiple identically configured `vmagent` instances or `vmagent` [clusters](#scraping-big-number-of-targets),
-so they [scrape](#how-to-collect-metrics-in-prometheus-format) the same set of targets and push the collected data to the same set of VictoriaMetrics remote storage systems.
+It is possible to run multiple **identically configured** `vmagent` instances or `vmagent`
+[clusters](#scraping-big-number-of-targets), so they [scrape](#how-to-collect-metrics-in-prometheus-format)
+the same set of targets and push the collected data to the same set of VictoriaMetrics remote storage systems.
+Two **identically configured** vmagent instances or clusters is usually called an HA pair.
 
-In this case the deduplication must be configured at VictoriaMetrics in order to de-duplicate samples received from multiple identically configured `vmagent` instances or clusters.
+When running HA pairs, [deduplication](https://docs.victoriametrics.com/#deduplication) must be configured
+at VictoriaMetrics side in order to de-duplicate received samples.
 See [these docs](https://docs.victoriametrics.com/#deduplication) for details.
 
-It is also recommended passing different values to `-promscrape.cluster.name` command-line flag per each `vmagent` instance or per each `vmagent` cluster in HA setup.
-This is needed for proper data de-duplication. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2679) for details.
+It is also recommended passing different values to `-promscrape.cluster.name` command-line flag per each `vmagent`
+instance or per each `vmagent` cluster in HA setup. This is needed for proper data de-duplication.
+See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2679) for details.
 
 ## Scraping targets via a proxy
 