Key concepts
Single-node VictoriaMetrics supports relabeling, deduplication and stream aggregation for all the received data, scraped or pushed. The processed data is then stored in local storage and can't be forwarded further.
vmagent supports relabeling, deduplication and stream aggregation for all the received data, scraped or pushed. The collected data is then forwarded to the specified -remoteWrite.url destinations.
The data processing order is the following:
1. All the received data is relabeled according to the specified -remoteWrite.relabelConfig;
2. All the received data is deduplicated according to the specified -streamAggr.dedupInterval;
3. All the received data is aggregated according to the specified -streamAggr.config;
4. The resulting data from steps 1 and 2 is then replicated to each -remoteWrite.url;
5. Data sent to each -remoteWrite.url can be additionally relabeled according to the corresponding -remoteWrite.urlRelabelConfig (set individually per URL);
6. Data sent to each -remoteWrite.url can be additionally deduplicated according to the corresponding -remoteWrite.streamAggr.dedupInterval (set individually per URL);
7. Data sent to each -remoteWrite.url can be additionally aggregated according to the corresponding -remoteWrite.streamAggr.config (set individually per URL). Note that using -streamAggr.config and -remoteWrite.streamAggr.config together is not recommended unless you understand the complications.
Typical scenarios for data routing with vmagent:
- Aggregate incoming data and replicate to N destinations. For this, configure -streamAggr.config to aggregate the incoming data before replicating it to all the configured -remoteWrite.url destinations.
- Individually aggregate incoming data for each destination. For this, configure -remoteWrite.streamAggr.config for each -remoteWrite.url destination. Relabeling via -remoteWrite.urlRelabelConfig can be used for routing only selected metrics to each -remoteWrite.url destination.
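As a rough illustration of the second scenario, the file passed via -remoteWrite.urlRelabelConfig for one destination could keep only a subset of metrics. This is a minimal sketch that assumes Prometheus-compatible relabeling syntax; the http_ metric name prefix is a hypothetical example and is not taken from this document.

```yaml
# Hypothetical relabeling file for a single -remoteWrite.url destination.
# Keeps only series whose metric name starts with "http_";
# all other series are dropped before being sent to this destination.
- action: keep
  source_labels: [__name__]
  regex: "http_.+"
```

A different file with a different regex could then be given to another destination, so each one receives its own subset of metrics.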
Deduplication
vmagent supports online de-duplication of samples before sending them to the configured -remoteWrite.url. The de-duplication can be enabled via the following options:
- By specifying the desired de-duplication interval via the -streamAggr.dedupInterval command-line flag for all received data or via the -remoteWrite.streamAggr.dedupInterval command-line flag for the particular -remoteWrite.url destination. For example, ./vmagent -remoteWrite.url=http://remote-storage/api/v1/write -remoteWrite.streamAggr.dedupInterval=30s instructs vmagent to leave only the last sample per each seen time series per every 30 seconds. The de-duplication is performed after applying relabeling and before performing the aggregation. If -remoteWrite.streamAggr.config and/or -streamAggr.config is set, then the de-duplication is performed individually per each stream aggregation config for the matching samples after applying input_relabel_configs.
- By specifying the dedup_interval option individually per each stream aggregation config in -remoteWrite.streamAggr.config or -streamAggr.config.
Single-node VictoriaMetrics supports two types of de-duplication:
- After storing the duplicate samples to local storage. See the -dedup.minScrapeInterval command-line option.
- Before storing the duplicate samples to local storage. This type of de-duplication can be enabled via the following options:
  - By specifying the desired de-duplication interval via the -streamAggr.dedupInterval command-line flag. For example, ./victoria-metrics -streamAggr.dedupInterval=30s instructs VictoriaMetrics to leave only the last sample per each seen time series per every 30 seconds. The de-duplication is performed after applying relabeling and before performing the aggregation. If -remoteWrite.streamAggr.config and/or -streamAggr.config is set, then the de-duplication is performed individually per each stream aggregation config for the matching samples after applying input_relabel_configs.
  - By specifying the dedup_interval option individually per each stream aggregation config in -remoteWrite.streamAggr.config or -streamAggr.config.
It is possible to drop the given labels before applying the de-duplication. See Dropping unneeded labels.
The online de-duplication uses the same logic as the -dedup.minScrapeInterval command-line flag at VictoriaMetrics.
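The per-config dedup_interval option described above might look as follows. This is only a sketch: the match, interval and outputs values are illustrative choices borrowed from other examples in this document.

```yaml
# Leave only the last sample per series for every 30 seconds,
# then calculate a one-minute average from the de-duplicated samples.
- match: process_resident_memory_bytes
  interval: 1m
  dedup_interval: 30s
  outputs: [avg]
```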
Ignoring old samples
By default, all the input samples are taken into account during stream aggregation. If samples with old timestamps outside the current aggregation interval must be ignored, then the following options can be used:
- Pass the -streamAggr.ignoreOldSamples command-line flag to single-node VictoriaMetrics or to vmagent. At vmagent, the -remoteWrite.streamAggr.ignoreOldSamples flag can be specified individually per each -remoteWrite.url. This enables ignoring old samples for all the aggregation configs.
- Set the ignore_old_samples: true option at the particular aggregation config. This enables ignoring old samples for that particular aggregation config.
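A minimal sketch of the per-config variant, assuming a one-minute aggregation with a single output (both values are illustrative):

```yaml
# Samples with timestamps outside the current one-minute
# aggregation interval are ignored for this config only.
- interval: 1m
  outputs: [sum_samples]
  ignore_old_samples: true
```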
Ignore aggregation intervals on start
Streaming aggregation results may be incorrect for some time after the restart of vmagent or single-node VictoriaMetrics until all the buffered samples are sent from remote sources to vmagent or single-node VictoriaMetrics via supported data ingestion protocols. In this case it may be a good idea to drop the aggregated data during the first N aggregation intervals just after the restart of vmagent or single-node VictoriaMetrics. This can be done via the following options:
- Set the -streamAggr.ignoreFirstIntervals=<intervalsCount> command-line flag at single-node VictoriaMetrics or at vmagent to skip persisting the first <intervalsCount> aggregation intervals to the storage. At vmagent, the -remoteWrite.streamAggr.ignoreFirstIntervals=<intervalsCount> flag can be specified individually per each -remoteWrite.url. It is expected that all incomplete or queued data will be processed during the specified <intervalsCount> and all subsequent aggregation intervals will produce correct data.
- Set the ignore_first_intervals: <intervalsCount> option individually per aggregation config. This enables ignoring the first <intervalsCount> aggregation intervals for that particular aggregation config.
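For illustration, a per-config setting could look like the sketch below. The value 2 is an arbitrary assumption; a suitable count depends on how long the remote sources need to flush their buffered samples.

```yaml
# Drop the aggregated results of the first two one-minute
# intervals after a restart; later intervals are written as usual.
- interval: 1m
  outputs: [sum_samples]
  ignore_first_intervals: 2
```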
Flush time alignment
By default, the time for aggregated data flush is aligned by the interval option. For example:
- if interval: 1m is set, then the aggregated data is flushed to the storage at the end of every minute
- if interval: 1h is set, then the aggregated data is flushed to the storage at the end of every hour
If you do not need such an alignment, then set the no_align_flush_to_interval: true option in the aggregation config. In this case aggregated data flushes will be aligned to the vmagent start time or to the config reload time.
The aggregated data on the first and the last interval is dropped during vmagent start, restart or config reload, since the first and the last aggregation intervals are incomplete and usually contain confusing partial data. If you need to preserve the aggregated data on these intervals, then set the flush_on_shutdown: true option.
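The two options can be sketched as follows; the intervals and outputs are illustrative and the two configs are independent of each other.

```yaml
# Flush every minute counted from vmagent start or config reload,
# instead of at the end of every wall-clock minute.
- interval: 1m
  no_align_flush_to_interval: true
  outputs: [sum_samples]

# Keep the default hourly alignment, but preserve the aggregated data
# for the incomplete first and last intervals instead of dropping it.
- interval: 1h
  flush_on_shutdown: true
  outputs: [sum_samples]
```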
Output metric names
Output metric names for stream aggregation are constructed according to the following pattern:
<metric_name>:<interval>[_by_<by_labels>][_without_<without_labels>]_<output>
- <metric_name> is the original metric name.
- <interval> is the interval specified in the stream aggregation config.
- <by_labels> is the _-delimited sorted list of by labels. If the by list is missing in the config, then the _by_<by_labels> part isn't included in the output metric name.
- <without_labels> is an optional _-delimited sorted list of without labels specified in the stream aggregation config. If the without list is missing in the config, then the _without_<without_labels> part isn't included in the output metric name.
- <output> is the aggregate used for constructing the output metric. The aggregate name is taken from the outputs list at the corresponding stream aggregation config.
Both input and output metric names can be modified if needed via relabeling; see Relabeling.
It is possible to keep the original metric name after the aggregation by specifying the keep_metric_names: true option in the stream aggregation config. The keep_metric_names option can be used only if a single output is set in the outputs list.
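To make the pattern concrete, consider the sketch below. The metric name http_requests_total and the label names are hypothetical. Following the pattern above, the resulting series would be expected to be named http_requests_total:5m_by_instance_job_sum_samples.

```yaml
# <metric_name> = http_requests_total (hypothetical input metric)
# <interval>    = 5m
# <by_labels>   = instance_job (sorted, "_"-delimited)
# <output>      = sum_samples
- match: http_requests_total
  interval: 5m
  by: [instance, job]
  outputs: [sum_samples]
```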
Relabeling
It is possible to apply arbitrary relabeling to input and output metrics during stream aggregation via the input_relabel_configs and output_relabel_configs options in the stream aggregation config.
Relabeling rules inside input_relabel_configs are applied to samples matching the match filters before optional deduplication. Relabeling rules inside output_relabel_configs are applied to aggregated samples before sending them to the remote storage.
For example, the following config removes the :1m_sum_samples suffix added to the output metric name:
- interval: 1m
  outputs: [sum_samples]
  output_relabel_configs:
  - source_labels: [__name__]
    target_label: __name__
    regex: "(.+):.+"
Another option to remove the suffix added by stream aggregation is to add keep_metric_names: true to the config:
- interval: 1m
  outputs: [sum_samples]
  keep_metric_names: true
See also dropping unneeded labels.
Dropping unneeded labels
If you need to drop some labels from input samples before input relabeling, de-duplication and stream aggregation, then the following options exist:
- Specify a comma-separated list of label names to drop via the -streamAggr.dropInputLabels command-line flag, or via -remoteWrite.streamAggr.dropInputLabels individually per each -remoteWrite.url. For example, -streamAggr.dropInputLabels=replica,az instructs to drop the replica and az labels from input samples before applying de-duplication and stream aggregation.
- Specify the drop_input_labels list with the labels to drop. For example, the following config drops the replica label from input samples with the name process_resident_memory_bytes before calculating the average over one minute:

  - match: process_resident_memory_bytes
    interval: 1m
    drop_input_labels: [replica]
    outputs: [avg]
    keep_metric_names: true

A typical use case is to drop the replica label from samples received from high availability replicas.
Aggregating by labels
All the labels for the input metrics are preserved by default in the output metrics. For example, the input metric foo{app="bar",instance="host1"} results in the output metric foo:1m_sum_samples{app="bar",instance="host1"} when the following stream aggregation config is used:
- interval: 1m
  outputs: [sum_samples]
The input labels can be removed via the without list specified in the config. For example, the following config removes the instance label from output metrics by summing input samples across all the instances:
- interval: 1m
  without: [instance]
  outputs: [sum_samples]
In this case the foo{app="bar",instance="..."} input metrics are transformed into the foo:1m_without_instance_sum_samples{app="bar"} output metric according to output metric naming.
It is possible to specify the exact list of labels in the output metrics via the by list. For example, the following config sums input samples by the app label:
- interval: 1m
  by: [app]
  outputs: [sum_samples]
In this case the foo{app="bar",instance="..."} input metrics are transformed into the foo:1m_by_app_sum_samples{app="bar"} output metric according to output metric naming.
The labels used in the by and without lists can be modified via the input_relabel_configs section; see Relabeling.
See also aggregation outputs.
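As a sketch of modifying a grouping label via input_relabel_configs, the config below copies a hypothetical service label into app before the samples are grouped. It assumes Prometheus-compatible relabeling defaults (the replace action with "$1" as the replacement); the service label is not taken from this document.

```yaml
- interval: 1m
  by: [app]
  outputs: [sum_samples]
  input_relabel_configs:
  # Copy the value of the hypothetical "service" label into "app".
  # The "(.+)" regex leaves "app" untouched when "service" is empty.
  - source_labels: [service]
    target_label: app
    regex: "(.+)"
```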