Troubleshooting
Known scenarios
- Unexpected spikes for total or increase outputs.
- Lower than expected values for total_prometheus and increase_prometheus outputs.
- High memory usage and CPU usage.
- Unexpected results in vmagent cluster mode.
Staleness
Some outputs, including total, increase, total_prometheus and increase_prometheus, track the last seen per-series values in order to properly calculate output values.
The last seen per-series value is dropped if no new samples are received for the given time series during two consecutive aggregation intervals specified in the stream aggregation config via the interval option.
If a new sample for the existing time series is received after that, then it is treated as the first sample for a new time series.
This may lead to the following issues:
- Lower than expected results for total_prometheus and increase_prometheus outputs, since they ignore the first sample in a new time series.
- Unexpected spikes for total and increase outputs, since they assume that new time series start from 0.
These issues can be fixed in the following ways:
- By increasing the interval option in the stream aggregation config, so it covers the expected delays in data ingestion pipelines.
- By specifying the staleness_interval option in the stream aggregation config, so it covers the expected delays in data ingestion pipelines. By default, staleness_interval equals 2 x interval.
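As an illustration, a stream aggregation rule with an explicit staleness_interval might look like the sketch below. The metric name, the 1m interval, and the 5m staleness window are assumed values chosen for the example; the window should be sized to cover the ingestion delays actually observed.

```yaml
# Sketch of a stream aggregation rule; the metric name and durations are assumptions.
- match: http_requests_total
  interval: 1m
  # Keep the last seen per-series values for 5m instead of the default 2 x interval (2m),
  # so samples delayed by up to 5m are not treated as the start of a new time series.
  staleness_interval: 5m
  outputs: [total, increase]
```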
High resource usage
The following solutions can help reduce memory usage and CPU usage during streaming aggregation:
- Use more specific match filters in the streaming aggregation config, so only the really needed raw samples are aggregated.
- Increase the aggregation interval by specifying a bigger duration for the interval option in the streaming aggregation config.
- Generate a lower number of output time series by using a less specific by list or a more specific without list.
- Drop unneeded long labels in input samples via input_relabel_configs.
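The options listed above can be combined in a single rule. The following is a minimal sketch, not a recommended configuration: the metric name, the job filter, the label names, and the 5m interval are all assumptions made for illustration.

```yaml
# Sketch only; metric names, label names and durations are assumptions.
- match: 'http_requests_total{job="api"}'  # narrow filter, so fewer raw samples are aggregated
  interval: 5m                             # longer aggregation interval
  by: [job, path]                          # short by list, so fewer output time series
  input_relabel_configs:
    # Drop long labels from input samples before they are aggregated.
    - action: labeldrop
      regex: "request_id|user_agent"
  outputs: [total]
```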
Cluster mode
If you use vmagent in cluster mode for streaming aggregation, then be careful when using the by or without options or when modifying sample labels via relabeling, since incorrect usage may result in duplicates and data collision.
For example, if more than one vmagent instance calculates increase for the http_requests_total metric with the by: [path] option, then all the vmagent instances will aggregate samples to the same set of time series with different path labels.
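In config form, the rule from this example might look like the sketch below (the 1m interval is an assumed value). When the same rule runs on several vmagent instances, each instance emits output series with identical names and labels, which is the source of the collision.

```yaml
# The rule from the example; the 1m interval is an assumption.
- match: http_requests_total
  interval: 1m
  by: [path]
  outputs: [increase]
```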
The proper fix would be adding a unique label to all the output samples produced by each vmagent, so they are aggregated into distinct sets of time series.
These time series can then be aggregated later as needed during querying.
If vmagent instances run in Docker or Kubernetes, then you can refer to the POD_NAME or HOSTNAME environment variables as a unique label value per each vmagent via the -remoteWrite.label=vmagent=%{HOSTNAME} command-line flag.
See these docs on how to refer to environment variables in VictoriaMetrics components.
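For Kubernetes, one possible way to wire this up is sketched below as a fragment of a pod template. The container name, image reference, remote write URL, and config path are assumptions for illustration; POD_NAME is populated from the pod metadata via the downward API.

```yaml
# Fragment of a Kubernetes pod template; names, URL and paths are assumptions.
containers:
  - name: vmagent
    image: victoriametrics/vmagent
    env:
      # Expose the pod name to the container as POD_NAME.
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
    args:
      - -remoteWrite.url=http://victoria-metrics:8428/api/v1/write
      - -remoteWrite.streamAggr.config=/etc/vmagent/stream-aggr.yaml
      # Attach a per-instance label to every sample sent by this vmagent.
      - -remoteWrite.label=vmagent=%{POD_NAME}
```

With such a label in place, the per-instance series can be summed at query time without the vmagent label to get the combined result.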