VictoriaMetrics/app
Hui Wang a21aea5dd4
stream aggregation: perform deduplication for all received data when specifying `-streamAggr.dedupInterval` or `-remoteWrite.streamAggr.dedupInterval` command-line flag (#6711)

[The
documentation](https://docs.victoriametrics.com/stream-aggregation/)
contains conflicting descriptions regarding deduplication for
non-matched series when `-remoteWrite.streamAggr.config` and / or
`-streamAggr.config` are set:
1. The statement below says that **all the received data** is deduplicated:
>[vmagent](https://docs.victoriametrics.com/vmagent/) supports
relabeling, deduplication and stream aggregation for all the received
data, scraped or pushed. Then, the collected data will be forwarded to
specified -remoteWrite.url destinations. The data processing order is
the following:
>1. all the received data is relabeled according to the specified
[-remoteWrite.relabelConfig](https://docs.victoriametrics.com/vmagent/#relabeling)
(if it is set)
>2. all the received data is deduplicated according to specified
[-streamAggr.dedupInterval](https://docs.victoriametrics.com/stream-aggregation/#deduplication)
(if it is set to duration bigger than 0)

2. Another statement says that deduplication is performed individually
for the **matching samples**:
>The de-duplication is performed after applying
[relabeling](https://docs.victoriametrics.com/vmagent/#relabeling) and
before performing the aggregation. If the -remoteWrite.streamAggr.config
and / or -streamAggr.config is set, then the de-duplication is performed
individually per each [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config)
for the matching samples after applying
[input_relabel_configs](https://docs.victoriametrics.com/stream-aggregation/#relabeling).

Consider the following deduplication use cases:
1. To apply deduplication (globally or for a specific remoteWrite
destination) to all the received data, scraped or pushed
--- using `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval`.
2. To deduplicate and aggregate metrics that match the rule's `match`
filters
--- using `-remoteWrite.streamAggr.config` and specifying the
`dedup_interval` option in the [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
3. To deduplicate all the received data while having `streamAggr.config`
for some metrics
--- not possible with a single vmagent at the moment; it requires
setting up two levels of vmagents.

This PR implements case 3.
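
The intended data flow for case 3 can be illustrated with a minimal Python sketch. This is not the vmagent implementation: the `deduplicate` and `process` helpers, the `(series, timestamp, value)` sample tuples and the `match` predicate are hypothetical names introduced only for illustration. The sketch also assumes that deduplication keeps the latest sample per series within each interval, and that matched raw samples are replaced by a simple sum:

```python
from collections import defaultdict


def deduplicate(samples, dedup_interval):
    """Keep only the latest sample per (series, interval bucket).

    Assumption for this sketch: the sample with the biggest timestamp
    inside each dedup interval wins.
    """
    latest = {}
    for series, ts, value in samples:
        key = (series, ts // dedup_interval)
        if key not in latest or ts >= latest[key][1]:
            latest[key] = (series, ts, value)
    return sorted(latest.values(), key=lambda s: (s[0], s[1]))


def process(samples, dedup_interval, match):
    """Deduplicate ALL samples first, then aggregate only matching series."""
    deduped = deduplicate(samples, dedup_interval)
    matched = [s for s in deduped if match(s[0])]
    passthrough = [s for s in deduped if not match(s[0])]

    # Hypothetical aggregation: sum of the deduplicated values per series.
    totals = defaultdict(float)
    for series, _, value in matched:
        totals[series] += value
    return dict(totals), passthrough


samples = [
    ("http_requests", 1, 10.0),
    ("http_requests", 4, 12.0),   # same 10s bucket as ts=1, replaces it
    ("http_requests", 11, 15.0),
    ("cpu_usage", 2, 0.5),
    ("cpu_usage", 3, 0.7),        # same bucket as ts=2, replaces it
]
aggregated, forwarded = process(
    samples, dedup_interval=10, match=lambda name: name.startswith("http_")
)
print(aggregated)  # aggregated output for matching series
print(forwarded)   # non-matching series, still deduplicated
```

The point of the sketch is that `cpu_usage`, which does not match any aggregation rule, is still deduplicated before being forwarded. Previously this required two levels of vmagents.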

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit d523015f27)
2024-09-03 10:49:38 +02:00
..
victoria-logs app/victoria-logs/Makefile: add make victoria-logs-linux-loong64 build rule 2024-07-12 23:13:19 +02:00
vlinsert app/vlinsert/elasticsearch: add fake response for logstash requests (#6742) 2024-08-06 16:30:11 +02:00
vlogsgenerator all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter 2024-07-17 14:01:37 +02:00
vlselect app/{vmselect,vlselect}: run make vmui-update vmui-logs-update 2024-08-28 13:38:28 +02:00
vlstorage vendor: update github.com/VictoriaMetrics/metrics from v1.34.1 to v1.35.0 2024-07-15 10:45:39 +02:00
vmagent stream aggregation: perform deduplication for all received data when … (#6711) 2024-09-03 10:49:38 +02:00
vmalert tests: fix slice init length (#6897) 2024-08-30 11:18:21 +02:00
vmalert-tool deployment: build image for vmagent streamaggr benchmark (#6515) 2024-06-24 16:29:14 +02:00
vmauth app/vmauth: verify how backend response headers are propagated to vmauth client 2024-07-27 13:45:07 +02:00
vmbackup app: consistently use t.Fatal* instead of t.Error* (except of app/vmalert and app/vmctl - these packages will be processed in a separate commit) 2024-07-11 16:01:25 +02:00
vmbackupmanager all: replace old https://docs.victoriametrics.com/vmbackupmanager.html url with the new one - https://docs.victoriametrics.com/vmbackupmanager/ 2024-04-18 02:04:39 +02:00
vmctl lib/httputils: parse URL before creating HTTP transport (#6820) 2024-08-16 11:34:49 +02:00
vmgateway all: replace old https://docs.victoriametrics.com/vmgateway.html url with the new one - https://docs.victoriametrics.com/vmgateway/ 2024-04-18 02:08:53 +02:00
vminsert app/vminsert: returns back memory optimisation (#6794) 2024-08-13 10:49:09 -04:00
vmrestore deployment: build image for vmagent streamaggr benchmark (#6515) 2024-06-24 16:29:14 +02:00
vmselect app/{vmselect,vlselect}: run make vmui-update vmui-logs-update 2024-08-28 13:38:28 +02:00
vmstorage all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter 2024-07-17 14:01:37 +02:00
vmui deployment/docker: update Go builder from Go1.22.5 to Go1.23.0 (#6861) 2024-08-22 23:56:12 +02:00