Commit graph

40 commits

Author SHA1 Message Date
Aliaksandr Valialkin
1cedaf61cb
app/{vmagent,vminsert}: add an ability to ignore input samples outside the current aggregation interval for stream aggregation
See https://docs.victoriametrics.com/stream-aggregation.html#ignoring-old-samples
2024-03-17 23:03:47 +02:00
Andrii Chubatiuk
15e33d56f1
lib/streamaggr: pick sample with bigger timestamp or value on deduplicator (#5939)
Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication
This would require more memory for deduplication, since we need to track timestamp
for each record. However, deduplication should become more consistent.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-03-12 22:47:29 +01:00
Aliaksandr Valialkin
da611ad628
app/{vmagent,vminsert}: add -streamAggr.dropInputSamples command-line flag for dropping the specified labels from input samples before deduplication and streaming aggregation 2024-03-05 02:15:01 +02:00
Aliaksandr Valialkin
ed523b5bbc
app/{vminsert,vmagent}: allow using -streamAggr.dedupInterval without -streamAggr.config
This allows performing online de-duplication of incoming samples
2024-03-05 00:45:30 +02:00
Aliaksandr Valialkin
22d63ac7cd
lib/streamaggr: do not reset aggregation state after the aggregation took longer than the configured interval
It is better from user PoV preserving this state until the next flush
2024-03-04 20:03:06 +02:00
Aliaksandr Valialkin
32653db7d5
lib/streamaggr: add missing "s" suffix in the warning message when the de-duplication or aggregation couldnt be finished in a timely manner 2024-03-04 19:37:58 +02:00
Aliaksandr Valialkin
6319d029a8
lib/streamaggr: benchmark only flush routines in BenchmarkDedupAggrFlushSerial and BenchmarkAggregatorsFlushSerial 2024-03-04 19:12:28 +02:00
Aliaksandr Valialkin
074abd5bee
Revert "lib/streamaggr: do not flush dedup shards in parallel"
This reverts commit eb40395a1c.

Reason for revert: it has been appeared that the performance gain on multiple CPU cores
wasn't visible because the benchmark was generating incorrect pushSample.key.

See a207e0bf687d65f5198207477248d70c69284296
2024-03-04 19:12:28 +02:00
Aliaksandr Valialkin
e70177c5fb
lib/streamaggr: properly generate pushSample.key in benchmarks 2024-03-04 19:12:27 +02:00
Aliaksandr Valialkin
eb40395a1c
lib/streamaggr: do not flush dedup shards in parallel
This significantly increases CPU usage on systems with many CPU cores, while doesn't reduce flush latency too much
2024-03-04 17:00:20 +02:00
Aliaksandr Valialkin
946814afee
lib/streamaggr: reduce memory allocations when registering new series in deduplication and aggregation structs 2024-03-04 17:00:19 +02:00
Aliaksandr Valialkin
925f60841f
lib/streamaggr: make aggregate.runFlusher() more roubst and clear 2024-03-04 17:00:19 +02:00
Aliaksandr Valialkin
aa5e7e268c
lib/streamaggr: properly drop samples on the first incomplete interval
Previously samples were dropped on the first incomplete interval and the next complete interval.
Also make sure that the de-duplication is performed just before flushing the aggregate state.
This should help the case then dedup_interval = interval.
2024-03-04 17:00:18 +02:00
Aliaksandr Valialkin
86494518da
lib/streamaggr: explicitly call resetSeries after flushSeries
This makes the code less fragile
2024-03-04 06:01:18 +02:00
Aliaksandr Valialkin
ac3cf3f357
lib/streamaggr: enable time alignment for aggregate flushed to multiples of interval
For example, if `interval: 1m`, then data flush occurs at the end of every minute,
while `interval: 1h` leads to data flush at the end of every hour.

Add `no_align_flush_to_interval` option, which can be used for disabling the alignment.
2024-03-04 05:42:58 +02:00
Aliaksandr Valialkin
138a4d1c2b
lib/streamaggr: ignore the first sample in new time series during staleness_interval seconds after the stream aggregation start for total and increase outputs 2024-03-04 01:49:26 +02:00
Aliaksandr Valialkin
0422ae01ba
lib/streamaggr: flush dedup state and aggregation state in parallel on all the available CPU cores
This should reduce the time needed for aggregation state flush on systems with many CPU cores
2024-03-04 01:21:50 +02:00
Aliaksandr Valialkin
28a9e92b5e
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
  for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
  deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
  metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
  per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
  This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
  - vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
  - vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
  - vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
  - vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
  - vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
  - vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
  - vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
    which took longer than the configured interval
  - vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
    which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md

The memory usage reduction increases CPU usage during stream aggregation by up to 30%.

This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 02:42:50 +02:00
Aliaksandr Valialkin
eb8e95516f
lib/streamaggr: allow one second aggregation interval 2024-03-01 21:33:16 +02:00
hagen1778
98b805544e
lib/streamaggr: fix incorrect err message for min interval value
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 09:53:05 +01:00
Roman Khavronenko
aaa526e8ff
lib/streamaggr: skip unfinished aggregation state on shutdown by default (#5689)
Sending unfinished aggregate states tend to produce unexpected anomalies with lower values than expected.
The old behavior can be restored by specifying `flush_on_shutdown: true` setting in streaming aggregation config

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:45:23 +01:00
Aliaksandr Valialkin
f888a019fe
lib/streamaggr: expand %{ENV} placeholders in stream aggregation configs 2024-01-24 12:31:27 +02:00
Aliaksandr Valialkin
1f105dde98
all: allow dynamically reading *AuthKey flag values from files and urls
Examples:

1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url

The flag value is automatically updated when the file contents changes.
2024-01-21 22:03:38 +02:00
noodles2hg
8efe694160
lib/streamaggr/streamaggr.go: fix link in error message (#5439) 2023-12-08 16:55:05 +03:00
Nikolay
41f7940f97
lib/streamaggr: properly reference slice with labels (#5406)
* lib/streamaggr: properly reference slice with labels
by limiting slice capacity. It must fix issues with slice modification, in case of append new slice will be allocated, instead of modifying refrenced slice
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5402

* Reduce memory allocations when output_relabel_configs adds new labels to output samples

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-29 10:03:04 +02:00
Alexander Marshalov
33484d3365
lib/streamaggr: respect streamAgg.dropInput with empty stream aggr config (#5213)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5207
2023-10-20 15:55:58 +02:00
Aliaksandr Valialkin
52c13e9515
lib/streamaggr: follow-up for 736197179e
- Use a byte slice instead of a map for tracking indexes for matching series.
  This improves performance, since access by slice index is faster than access by map key.
- Re-use the byte slice for tracking indexes for matching series.
  This removes unnecessary memory allocations and improves stream aggregation performance a bit.
- Add an ability to return to the previous behvaiour by specifying -remoteWrite.streamAggr.dropInput command-line flag.
  In this case all the input samples are dropped when stream aggregation is enabled.
- Backport the new stream aggregation behaviour from vmagent to single-node VictoriaMetrics when -streamAggr.config
  option is set.
- Improve docs regarding this change at docs/CHANGELOG.md
- Document the new behavior at docs/stream-aggregation.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4243
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4575
2023-07-24 17:05:26 -07:00
Zakhar Bessarab
736197179e
{lib/streamaggr,vmagent/remotewrite}: breaking change for keepInput flag (#4575)
* {lib/streamaggr,vmagent/remotewrite}: breaking change for keepInput flag

Changes default behaviour of keepInput flag to write series which did not match any aggregators to the remote write.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4243

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update app/vmagent/remotewrite/remotewrite.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-07-24 16:33:30 -07:00
Aliaksandr Valialkin
01a2859f43
lib/streamaggr: skip de-duplication for series, which do not match the configured aggregation rules
Previously all the incoming samples were de-duplicated, even if their series doesn't
match aggregation rule filters. This could result in increased CPU usage.

Now the de-duplication isn't applied to samples for series, which do not match
aggregation rule filters. Such samples are just ignored.
2023-07-22 16:42:34 -07:00
Alexander Marshalov
70773f53d7
allow configuring staleness interval in stream aggregation (#4667) (#4670)
---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-20 16:07:33 +02:00
Aliaksandr Valialkin
887555669e
Revert "lib/streamaggr: discard samples with timestamps outside of aggregation interval (#4199)"
This reverts commit 9e99f2f5b3.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4068

Reason for revert: this breaks valid use cases:

- If timestamps aren't specified in the incoming samples on purpose. For example, if stream aggregation is used
  as StatsD replacement. StatsD protocol has no timestamp concept for incoming samples.
  See https://github.com/b/statsd_spec

- If all the samples must be aggregated, even if they contain stale timestamps.
  for example, if the stream aggregation produces some counter of some events,
  it may be better to count all the events even if they were delayed before
  being ingested into VictoriaMetrics.

Is is also unclear how to determine whether the sample becomes stale.
For example, if the aggregation interval equals to 1h, and the previous
aggregation cycle just finished 10 minutes ago, what to do with the newly
incoming sample with the timestamp 30 minutes older than the current time?
The answer highly depends on the context, so it is unsafe to uncoditionally
use a single logic for dropping the old samples here.
2023-05-08 16:52:27 -07:00
Zakhar Bessarab
9e99f2f5b3
lib/streamaggr: discard samples with timestamps outside of aggregation interval (#4199)
* lib/streamaggr: discard samples with timestamps not matching aggregation interval

Samples with timestamps lower than `now - aggregation_interval` are likely to be written via backfilling and should not be used for calculation of aggregation.
See #4068

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/streamaggr: make log message more descriptive, fix imports

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-27 11:59:49 +02:00
Aliaksandr Valialkin
d577657fb7
lib/streamaggr: follow-up for ff72ca14b9
- Make sure that the last successfully loaded config is used on hot-reload failure
- Properly cleanup resources occupied by already initialized aggregators
  when the current aggregator fails to be initialized
- Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config
  This should simplify monitoring and debugging failed reloads
- Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa
  could be in use at realoadSaConfig()
- Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability
  on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push().
- Remove fine-grained aggregator reload - reload all the aggregators on config change instead.
  This simplifies the code a bit. The fine-grained aggregator reload may be returned back
  if there will be demand from real users for it.
- Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag
- Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639
2023-03-31 22:30:38 -07:00
Alexander Marshalov
ff72ca14b9
added hot reload support for stream aggregation configs (#3969) (#3970)
added hot reload support for stream aggregation configs (#3969)

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-03-29 18:05:58 +02:00
Oleksandr Redko
9fff48c3e3
app,lib: fix typos in comments (#3804) 2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin
d655d6b047
lib/streamaggr: add ability to de-duplicate input samples before aggregation 2023-01-25 09:14:49 -08:00
Aliaksandr Valialkin
5fe7ff24c2
lib/streamaggr: limit the the number of concurrent flushes of the aggregate data to the exact number of available CPUs
This should reduce the maximum memory usage during concurrent flushes of the aggregate data
2023-01-07 00:18:51 -08:00
Aliaksandr Valialkin
5c4bd4f7c1
lib/streamaggr: limit the number of concurrent flushes of aggregate metrics in order to limit memory usage 2023-01-06 22:39:13 -08:00
Aliaksandr Valialkin
0e1f0ade31
lib/streamaggr: sort by and without labels in the aggregate output metric name
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3460
2023-01-05 02:08:44 -08:00
Aliaksandr Valialkin
fa13bbc48a
app/{vmagent,vminsert}: add support for streaming aggregation
See https://docs.victoriametrics.com/stream-aggregation.html

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3460
2023-01-03 22:19:21 -08:00