AndrewChubatiuk b8f06e9636

docs: stream aggregation updated structure, added common mistakes section

2024-08-10 08:12:38 +03:00


Troubleshooting

Known scenarios

Unexpected spikes for total or increase outputs.
Lower than expected values for total_prometheus and increase_prometheus outputs.
High memory usage and CPU usage.
Unexpected results in vmagent cluster mode.

Staleness

Some outputs, including total, total_prometheus, increase and increase_prometheus, track the last seen value of each input time series in order to calculate output values correctly.

The last seen value of a time series is dropped if no new samples are received for that series during two consecutive aggregation intervals, as set by the interval option in the stream aggregation config. If a new sample for that series arrives after that, it is treated as the first sample of a new time series. This may lead to the following issues:

Lower than expected results for total_prometheus and increase_prometheus outputs, since they ignore the first sample in a new time series.
Unexpected spikes for total and increase outputs, since they assume that new time series start from 0.
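The effect of dropped state can be illustrated with a simplified Python model of a single counter series. This is a sketch, not VictoriaMetrics code: it assumes the last seen value is forgotten whenever the gap between two consecutive samples exceeds staleness_interval, assumes a monotonically increasing counter without resets, and uses a hypothetical function name, simulate.

```python
def simulate(samples, staleness_interval):
    """Simplified model of total and total_prometheus for one counter series.

    samples: list of (timestamp_seconds, counter_value) tuples sorted by time.
    Assumptions: the last seen value is dropped when the gap since the previous
    sample exceeds staleness_interval, and the counter never resets.
    Returns (total, total_prometheus).
    """
    total = 0.0
    total_prometheus = 0.0
    last_ts = None
    last_value = None
    for ts, value in samples:
        if last_ts is not None and ts - last_ts > staleness_interval:
            last_value = None  # state dropped: the series now looks new
        if last_value is None:
            # total assumes a new series starts from 0, so the whole value counts.
            total += value
            # total_prometheus ignores the first sample of a new series.
        else:
            delta = value - last_value
            total += delta
            total_prometheus += delta
        last_ts, last_value = ts, value
    return total, total_prometheus


# Samples every 60s, then a 240s ingestion delay before ts=360.
samples = [(0, 100), (60, 110), (120, 120), (360, 130), (420, 140)]
print(simulate(samples, staleness_interval=120))  # → (260.0, 30.0), default 2 x 60s
print(simulate(samples, staleness_interval=300))  # → (140.0, 40.0), delay covered
```

With the 120s staleness interval, total jumps by 130 at ts=360 instead of 10 (the unexpected spike), while total_prometheus misses the 10 increase between ts=120 and ts=360.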

These issues can be fixed in the following ways:

By increasing the interval option in the stream aggregation config, so it covers the expected delays in data ingestion pipelines.
By setting the staleness_interval option in the stream aggregation config, so it covers the expected delays in data ingestion pipelines. By default, staleness_interval equals 2 x interval.
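Whether the observed ingestion delays are covered can be checked with a small Python sketch. The helper names max_gap and state_survives are hypothetical; the default of 2 x interval comes from the text, while treating a gap exactly equal to staleness_interval as still tolerated is an assumption made for this sketch.

```python
def max_gap(timestamps):
    """Largest gap in seconds between consecutive samples (needs 2+ samples)."""
    return max(b - a for a, b in zip(timestamps, timestamps[1:]))


def state_survives(timestamps, interval, staleness_interval=None):
    """Return True if the last seen value is never dropped.

    Assumption: state is dropped only when a gap exceeds staleness_interval.
    """
    if staleness_interval is None:
        staleness_interval = 2 * interval  # documented default: 2 x interval
    return max_gap(timestamps) <= staleness_interval


timestamps = [0, 60, 120, 360, 420]
print(state_survives(timestamps, interval=60))                          # False
print(state_survives(timestamps, interval=150))                         # True: bigger interval
print(state_survives(timestamps, interval=60, staleness_interval=300))  # True: explicit option
```

Both fixes work the same way: they raise the effective staleness interval above the longest gap the pipeline produces.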

High resource usage

The following changes can help reduce memory and CPU usage during stream aggregation:

Use more specific match filters in the stream aggregation config, so only the raw samples that are actually needed are aggregated.
Increase the aggregation interval by setting a bigger duration for the interval option in the stream aggregation config.
Generate fewer output time series by using a less specific by list (fewer labels) or a more specific without list (more labels).
Drop unneeded long labels from input samples via input_relabel_configs.
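The effect of by and without on output cardinality can be estimated with a Python sketch that counts distinct output label sets for a hypothetical set of input series. It ignores metric names, and output_series_count is a hypothetical helper, not part of vmagent.

```python
def output_series_count(series, by=None, without=None):
    """Count distinct output series after grouping input label sets.

    series: list of dicts mapping label names to values.
    by: keep only these labels; without: drop these labels.
    """
    keys = set()
    for labels in series:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            dropped = set(without or ())
            kept = {k: v for k, v in labels.items() if k not in dropped}
        keys.add(tuple(sorted(kept.items())))
    return len(keys)


series = [
    {"instance": "a", "path": "/x", "code": "200"},
    {"instance": "a", "path": "/x", "code": "500"},
    {"instance": "b", "path": "/x", "code": "200"},
    {"instance": "b", "path": "/y", "code": "200"},
]
print(output_series_count(series))                                # 4
print(output_series_count(series, by=["path", "code"]))           # 3
print(output_series_count(series, by=["path"]))                   # 2: shorter by list
print(output_series_count(series, without=["instance", "code"]))  # 2: longer without list
```

Each distinct output label set is a separate aggregation state kept in memory, so fewer output series usually means lower memory and CPU usage.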

Cluster mode

If you run vmagent in cluster mode for stream aggregation, be careful when using the by or without options or when modifying sample labels via relabeling: incorrect usage may result in duplicates and data collisions.

For example, if more than one vmagent instance calculates increase for the http_requests_total metric with the by: [path] option, then all the vmagent instances write their results into the same set of output time series (one per path value), so their results collide. The proper fix is to add a unique label to all the output samples produced by each vmagent, so each instance writes to a distinct set of time series. These time series can then be aggregated as needed at query time.
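A toy Python model shows why the unique label matters. Two hypothetical vmagent instances each compute a per-path increase over their share of the input. The storage model, in which a later write to the same series replaces an earlier one, is a deliberate simplification of data collision, and push and sum_by_path are hypothetical helpers.

```python
def push(outputs, unique_label=False):
    """Write per-agent aggregation outputs into a toy storage.

    outputs: {agent_name: {path: increase}}.
    Simplification: one value per series, so a second write to the same
    series replaces the first (a stand-in for data collision).
    """
    storage = {}
    for agent, per_path in outputs.items():
        for path, value in per_path.items():
            labels = {"path": path}
            if unique_label:
                labels["vmagent"] = agent  # distinct series per instance
            storage[tuple(sorted(labels.items()))] = value
    return storage


def sum_by_path(storage):
    """Query-time aggregation, similar in spirit to sum(...) by (path)."""
    result = {}
    for key, value in storage.items():
        path = dict(key)["path"]
        result[path] = result.get(path, 0) + value
    return result


outputs = {"vmagent-0": {"/a": 5, "/b": 1}, "vmagent-1": {"/a": 3}}
print(sum_by_path(push(outputs)))                     # {'/a': 3, '/b': 1}: collision
print(sum_by_path(push(outputs, unique_label=True)))  # {'/a': 8, '/b': 1}
```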

If vmagent instances run in Docker or Kubernetes, you can reference the POD_NAME or HOSTNAME environment variable as a unique label value for each vmagent via the -remoteWrite.label=vmagent=%{HOSTNAME} command-line flag. See the VictoriaMetrics docs on how to reference environment variables in VictoriaMetrics components.
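The %{HOSTNAME} placeholder gives each instance a different label value because it is replaced with that instance's own environment variable. The Python sketch below mimics this substitution for illustration only; expand_placeholders is a hypothetical helper, and raising an error for an unknown variable is an assumption, not documented vmagent behavior.

```python
import re

# Hypothetical helper mimicking %{ENV_VAR} placeholder expansion.
_PLACEHOLDER = re.compile(r"%\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value, env):
    """Replace every %{NAME} in value with env[NAME].

    Assumption: an unknown variable is an error (KeyError here).
    """
    return _PLACEHOLDER.sub(lambda m: env[m.group(1)], value)


flag = "vmagent=%{HOSTNAME}"
print(expand_placeholders(flag, {"HOSTNAME": "vmagent-0"}))  # vmagent=vmagent-0
print(expand_placeholders(flag, {"HOSTNAME": "vmagent-1"}))  # vmagent=vmagent-1
```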
