VictoriaMetrics/docs/stream-aggregation/use-cases.md

306 lines
14 KiB
Markdown
Raw Permalink Normal View History

---
sort: 1
weight: 1
title: Use cases
menu:
docs:
identifier: stream-aggregation-use-cases
parent: 'stream-aggregation'
weight: 1
aliases:
- /stream-aggregation/use-cases/
- /stream-aggregation/use-cases/index.html
---
## Statsd alternative
Stream aggregation can be used as [statsd](https://github.com/statsd/statsd) alternative in the following cases:
* [Counting input samples](#counting-input-samples)
* [Summing input metrics](#summing-input-metrics)
* [Quantiles over input metrics](#quantiles-over-input-metrics)
* [Histograms over input metrics](#histograms-over-input-metrics)
* [Aggregating histograms](#aggregating-histograms)
Currently, streaming aggregation is available only for [supported data ingestion protocols](https://docs.victoriametrics.com/#how-to-import-time-series-data)
and not available for [Statsd metrics format](https://github.com/statsd/statsd/blob/master/docs/metric_types.md).
## Recording rules alternative
Sometimes [alerting queries](https://docs.victoriametrics.com/vmalert#alerting-rules) may require non-trivial amounts of CPU, RAM,
disk IO and network bandwidth at metrics storage side. For example, if `http_request_duration_seconds` histogram is generated by thousands
of application instances, then the alerting query `histogram_quantile(0.99, sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)) > 0.5`
can become slow, since it needs to scan too big number of unique [time series](https://docs.victoriametrics.com/keyconcepts#time-series)
with `http_request_duration_seconds_bucket` name. This alerting query can be accelerated by pre-calculating
the `sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)` via [recording rule](https://docs.victoriametrics.com/vmalert#recording-rules).
But this recording rule may take too much time to execute too. In this case the slow recording rule can be substituted
with the following [stream aggregation config](./configuration/#configuration-file-reference):
```yaml
- match: 'http_request_duration_seconds_bucket'
interval: 5m
without: [instance]
outputs: [total]
```
This stream aggregation generates `http_request_duration_seconds_bucket:5m_without_instance_total` output series according to [output metric naming](./key-concepts.md#output-metric-names).
Then these series can be used in [alerting rules](https://docs.victoriametrics.com/vmalert#alerting-rules):
```metricsql
histogram_quantile(0.99, last_over_time(http_request_duration_seconds_bucket:5m_without_instance_total[5m])) > 0.5
```
This query is executed much faster than the original query, because it needs to scan much lower number of time series.
See [the list of aggregate output](./configuration/outputs), which can be specified at `output` field.
See also [aggregating by labels](./key-concepts.md/#aggregating-by-labels).
Field `interval` is recommended to be set to a value at least several times higher than your metrics collect interval.
## Reducing the number of stored samples
If per-[series](https://docs.victoriametrics.com/keyconcepts#time-series) samples are ingested at high frequency,
then this may result in high disk space usage, since too much data must be stored to disk. This also may result
in slow queries, since too much data must be processed during queries.
This can be fixed with the stream aggregation by increasing the interval between per-series samples stored in the database.
For example, the following [stream aggregation config](./configuration/#configuration-file-reference) reduces the frequency of input samples
to one sample per 5 minutes per each input time series (this operation is also known as downsampling):
```yaml
# Aggregate metrics ending with _total with `total` output.
# See {{% ref "./configuration/outputs" %}}
- match: '{__name__=~".+_total"}'
interval: 5m
outputs: [total]
# Downsample other metrics with `count_samples`, `sum_samples`, `min` and `max` outputs
# See {{% ref "./configuration/outputs" %}}
- match: '{__name__!~".+_total"}'
interval: 5m
outputs: [count_samples, sum_samples, min, max]
```
The aggregated output metrics have the following names according to [output metric naming](./key-concepts.md#output-metric-names):
```text
# For input metrics ending with _total
some_metric_total:5m_total
# For input metrics not ending with _total
some_metric:5m_count_samples
some_metric:5m_sum_samples
some_metric:5m_min
some_metric:5m_max
```
See [the list of aggregate output](./configuration/outputs), which can be specified at `output` field.
See also [aggregating histograms](#aggregating-histograms) and [aggregating by labels](./key-concepts.md#aggregating-by-labels).
## Reducing the number of stored series
Sometimes applications may generate too many [time series](https://docs.victoriametrics.com/keyconcepts#time-series).
For example, the `http_requests_total` metric may have `path` or `user` label with too big number of unique values.
In this case the following stream aggregation can be used for reducing the number metrics stored in VictoriaMetrics:
```yaml
- match: 'http_requests_total'
interval: 30s
without: [path, user]
outputs: [total]
```
This config specifies labels, which must be removed from the aggregate output, in the `without` list.
See [these docs](./key-concepts.md#aggregating-by-labels) for more details.
The aggregated output metric has the following name according to [output metric naming](./key-concepts.md#output-metric-names):
```text
http_requests_total:30s_without_path_user_total
```
See [the list of aggregate output](./configuration/outputs), which can be specified at `output` field.
See also [aggregating histograms](#aggregating-histograms).
## Counting input samples
If the monitored application generates event-based metrics, then it may be useful to count the number of such metrics
at stream aggregation level.
For example, if an advertising server generates `hits{some="labels"} 1` and `clicks{some="labels"} 1` metrics
per each incoming hit and click, then the following [stream aggregation config](./configuration/#configuration-file-reference)
can be used for counting these metrics per 30 second interval:
```yaml
- match: '{__name__=~"hits|clicks"}'
interval: 30s
outputs: [count_samples]
```
This config generates the following output metrics for `hits` and `clicks` input metrics
according to [output metric naming](./key-concepts.md#output-metric-names):
```text
hits:30s_count_samples count1
clicks:30s_count_samples count2
```
See [the list of aggregate output](./configuration/outputs), which can be specified at `output` field.
See also [aggregating by labels](./key-concepts.md#aggregating-by-labels).
## Summing input metrics
If the monitored application calculates some events and then sends the calculated number of events to VictoriaMetrics
at irregular intervals or at too high frequency, then stream aggregation can be used for summing such events
and writing the aggregate sums to the storage at regular intervals.
For example, if an advertising server generates `hits{some="labels} N` and `clicks{some="labels"} M` metrics
at irregular intervals, then the following [stream aggregation config](./configuration/#configuration-file-reference)
can be used for summing these metrics per minute:
```yaml
- match: '{__name__=~"hits|clicks"}'
interval: 1m
outputs: [sum_samples]
```
This config generates the following output metrics according to [output metric naming](https://docs.victoriametrics.com/keyconcepts#output-metric-names):
```text
hits:1m_sum_samples sum1
clicks:1m_sum_samples sum2
```
See [the list of aggregate output](./configuration/outputs), which can be specified at `output` field.
See also [aggregating by labels](./key-concepts.md#aggregating-by-labels).
## Quantiles over input metrics
If the monitored application generates measurement metrics per request, then it may be useful to calculate
the pre-defined set of [percentiles](https://en.wikipedia.org/wiki/Percentile) over these measurements.
For example, if the monitored application generates `request_duration_seconds N` and `response_size_bytes M` metrics
per each incoming request, then the following [stream aggregation config](./configuration/#configuration-file-reference)
can be used for calculating 50th and 99th percentiles for these metrics every 30 seconds:
```yaml
- match:
- request_duration_seconds
- response_size_bytes
interval: 30s
outputs: ["quantiles(0.50, 0.99)"]
```
This config generates the following output metrics according to [output metric naming](./key-concepts.md#output-metric-names):
```text
request_duration_seconds:30s_quantiles{quantile="0.50"} value1
request_duration_seconds:30s_quantiles{quantile="0.99"} value2
response_size_bytes:30s_quantiles{quantile="0.50"} value1
response_size_bytes:30s_quantiles{quantile="0.99"} value2
```
See [the list of aggregate output](./configuration/outputs), which can be specified at `output` field.
See also [histograms over input metrics](#histograms-over-input-metrics) and [aggregating by labels](./key-concepts.md#aggregating-by-labels).
## Histograms over input metrics
If the monitored application generates measurement metrics per request, then it may be useful to calculate
a [histogram](https://docs.victoriametrics.com/keyconcepts#histogram) over these metrics.
For example, if the monitored application generates `request_duration_seconds N` and `response_size_bytes M` metrics
per each incoming request, then the following [stream aggregation config](./configuration/#configuration-file-reference)
can be used for calculating [VictoriaMetrics histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
for these metrics every 60 seconds:
```yaml
- match:
- request_duration_seconds
- response_size_bytes
interval: 60s
outputs: [histogram_bucket]
```
This config generates the following output metrics according to [output metric naming](./key-concepts.md#output-metric-names).
```text
request_duration_seconds:60s_histogram_bucket{vmrange="start1...end1"} count1
request_duration_seconds:60s_histogram_bucket{vmrange="start2...end2"} count2
...
request_duration_seconds:60s_histogram_bucket{vmrange="startN...endN"} countN
response_size_bytes:60s_histogram_bucket{vmrange="start1...end1"} count1
response_size_bytes:60s_histogram_bucket{vmrange="start2...end2"} count2
...
response_size_bytes:60s_histogram_bucket{vmrange="startN...endN"} countN
```
The resulting histogram buckets can be queried with [MetricsQL](https://docs.victoriametrics.com/metricsql/) in the following ways:
1. An estimated 50th and 99th [percentiles](https://en.wikipedia.org/wiki/Percentile) of the request duration over the last hour:
```metricsql
histogram_quantiles("quantile", 0.50, 0.99, sum(increase(request_duration_seconds:60s_histogram_bucket[1h])) by (vmrange))
```
This query uses [histogram_quantiles](https://docs.victoriametrics.com/metricsql/#histogram_quantiles) function.
1. An estimated [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) of the request duration over the last hour:
```metricsql
histogram_stddev(sum(increase(request_duration_seconds:60s_histogram_bucket[1h])) by (vmrange))
```
This query uses [histogram_stddev](https://docs.victoriametrics.com/metricsql/#histogram_stddev) function.
1. An estimated share of requests with the duration smaller than `0.5s` over the last hour:
```metricsql
histogram_share(0.5, sum(increase(request_duration_seconds:60s_histogram_bucket[1h])) by (vmrange))
```
This query uses [histogram_share](https://docs.victoriametrics.com/metricsql/#histogram_share) function.
See [the list of aggregate output](./configuration/outputs), which can be specified at `output` field.
See also [quantiles over input metrics](#quantiles-over-input-metrics) and [aggregating by labels](./key-concepts.md#aggregating-by-labels).
## Aggregating histograms
[Histogram](https://docs.victoriametrics.com/keyconcepts#histogram) is a set of [counter](https://docs.victoriametrics.com/keyconcepts#counter)
metrics with different `vmrange` or `le` labels. As they're counters, the applicable aggregation output is
[total](./configuration/outputs/#total):
```yaml
- match: 'http_request_duration_seconds_bucket'
interval: 1m
without: [instance]
outputs: [total]
```
This config generates the following output metrics according to [output metric naming](./key-concepts.md#output-metric-names):
```text
http_request_duration_seconds_bucket:1m_without_instance_total{le="0.1"} value1
http_request_duration_seconds_bucket:1m_without_instance_total{le="0.2"} value2
http_request_duration_seconds_bucket:1m_without_instance_total{le="0.4"} value3
http_request_duration_seconds_bucket:1m_without_instance_total{le="1"} value4
http_request_duration_seconds_bucket:1m_without_instance_total{le="3"} value5
http_request_duration_seconds_bucket:1m_without_instance_total{le="+Inf" value6
```
The resulting metrics can be passed to [histogram_quantile](https://docs.victoriametrics.com/metricsql#histogram_quantile)
function:
```metricsql
histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket:1m_without_instance_total[5m])) by(le))
```
Please note, histograms can be aggregated if their `le` labels are configured identically.
[VictoriaMetrics histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
have no such requirement.
See [the list of aggregate output](./configuration/outputs), which can be specified at `output` field.
See also [histograms over input metrics](#histograms-over-input-metrics) and [quantiles over input metrics](#quantiles-over-input-metrics).