---
sort: 5
weight: 5
title: Troubleshooting
menu:
  docs:
    identifier: stream-aggregation-troubleshooting
    parent: stream-aggregation
    weight: 5
aliases:
  - /stream-aggregation/troubleshooting/
  - /stream-aggregation/troubleshooting/index.html
---

Known scenarios

Staleness

Some outputs, such as total, total_prometheus, increase and increase_prometheus, track the last seen per-series values in order to properly calculate output values.

The last seen per-series value is dropped if no new samples are received for the given time series during two consecutive aggregation intervals, as set via the interval option in the stream aggregation config. If a new sample for the existing time series arrives after that, it is treated as the first sample of a new time series. This may lead to the following issues:

  • Lower than expected results for total_prometheus and increase_prometheus outputs, since they ignore the first sample in a new time series.
  • Unexpected spikes for total and increase outputs, since they assume that new time series start from 0.

These issues can be fixed in the following ways:

  • By increasing the interval option in the stream aggregation config, so it covers the expected delays in data ingestion pipelines.
  • By specifying the staleness_interval option in the stream aggregation config, so it covers the expected delays in data ingestion pipelines. By default, staleness_interval equals 2 x interval.
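
As a minimal sketch of the second fix, the config below sets staleness_interval explicitly. The metric name and both durations are illustrative assumptions; choose values that match the delays actually observed in your ingestion pipeline.

```yaml
# Illustrative stream aggregation config; the metric name and durations are assumptions.
- match: 'http_requests_total'
  interval: 1m
  # Keep per-series state for up to 5m without new samples
  # instead of the default 2 x interval (2m here).
  staleness_interval: 5m
  outputs: [total, increase]
```

With these values, a series that pauses for three minutes would still be treated as the same series when its next sample arrives, rather than as a new one.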

High resource usage

The following solutions can help reduce memory and CPU usage during streaming aggregation:

Cluster mode

If you use vmagent in cluster mode for streaming aggregation, be careful when using the by or without options or when modifying sample labels via relabeling, since incorrect usage may result in duplicates and data collisions.

For example, if more than one vmagent instance calculates increase for the http_requests_total metric with the by: [path] option, then all the vmagent instances will aggregate samples into the same set of time series, distinguished only by their path labels. The proper fix is to add a unique label to all the output samples produced by each vmagent, so they are aggregated into distinct sets of time series. These time series can then be aggregated as needed at query time.
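
The colliding setup described above would correspond to a config along these lines, loaded by every vmagent instance. This is only a sketch; the 1m interval is an assumption made for illustration.

```yaml
# Illustrative config shared by all vmagent instances; the 1m interval is an assumption.
- match: 'http_requests_total'
  interval: 1m
  # Only the path label is kept on the output series,
  # so outputs from different vmagent instances look identical.
  by: [path]
  outputs: [increase]
```

Because by: [path] keeps no label other than path, nothing in the output identifies which vmagent produced it unless each instance adds its own label.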

If vmagent instances run in Docker or Kubernetes, you can use the POD_NAME or HOSTNAME environment variable as a unique label value for each vmagent via the -remoteWrite.label=vmagent=%{HOSTNAME} command-line flag. See these docs on how to refer to environment variables in VictoriaMetrics components.
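
In Kubernetes, POD_NAME is not defined automatically, so it has to be exposed to the container, for example through the downward API. The fragment below is a hedged sketch of a pod template: the container name and image are illustrative, and all other vmagent flags are omitted.

```yaml
# Fragment of a pod template; container name and image are illustrative assumptions.
containers:
  - name: vmagent
    image: victoriametrics/vmagent
    env:
      # Expose the pod name to the container via the Kubernetes downward API.
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
    args:
      # Other required flags (such as -remoteWrite.url) are omitted here.
      - "-remoteWrite.label=vmagent=%{POD_NAME}"
```

Using %{HOSTNAME} instead avoids the extra env entry, since the container hostname normally defaults to the pod name.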