Commit graph

21 commits

Author SHA1 Message Date
Ivan Yurochko
5dd879cd17
lib/streamaggr: add ignore_first_sample_interval param for streamaggr cfg (#7313)
### Describe Your Changes

As of right now by default aggregated output in streaming aggregation
takes a staleness interval and only starts sending first samples after
the staleness interval passes. We have a use case where we prefer to
start sending data as soon as we have any. This adds the option to
configure when we start sending first samples

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7116

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-11-21 16:20:22 +01:00
Aliaksandr Valialkin
04981c7a7f
lib/streamaggr: remove resetState arg from aggrState.flushState()
The resetState arg was used only for the BenchmarkAggregatorsFlushInternalSerial benchmark.
This benchmark was testing aggregate state flush performance by keeping the same state across flushes.
The benhmark didn't reflect the performance and scalability of stream aggregation in production,
while it led to non-trivial code changes related to resetState arg handling.

So let's drop the benchmark together with all the code related to resetState handling,
in order to simplify the code at lib/streamaggr a bit.

Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314
2024-08-07 11:39:14 +02:00
Aliaksandr Valialkin
86c7afd126
lib/streamaggr: consistently use the same timestamp across all the output aggregated samples in a single aggregation interval
Prevsiously every aggregation output was using its own timestamp for the output aggregated samples
in a single aggregation interval. This could result in unexpected inconsitent timesetamps for the output
aggregated samples.

This commit consistently uses the same timestamp across all the output aggregated samples.
This commit makes sure that the duration between subsequent timestamps strictly equals
the configured aggregation interval.

Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314
This commit should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580
2024-08-07 11:39:13 +02:00
Aliaksandr Valialkin
db557b86ee
app/vmagent/remotewrite: follow-up for f153f54d11
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
  This improves code maintainability a bit.

- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
  This prevents from potential resource leaks.

- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
  This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.

- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
  in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.

- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
  This label should simplify investigation of the exposed metrics.

- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
  it is hard to use these labels in query filters and aggregation functions.

- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
  This metric shows the number of samples generated by the corresponding streaming aggregation rule.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
  which could modify the behaviour of the constructed streaming aggregator.
  Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.

- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
  This also allows passing nil instead of Options when default options are enough.

- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
  so they have consistent set of labels comparing to the rest of streaming aggregation metrics.

- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
  `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
  `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
  metrics. This is a follow-up for the commit 2eb1bc4f81 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604

- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
  figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
  in the commit 2eb1bc4f81 .

- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
  While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
2024-07-15 20:24:01 +02:00
Aliaksandr Valialkin
5354374b62
lib/streamaggr: follow-up for 9c3d44c8c9
- Consistently enumerate stream aggregation outputs in alphabetical order across the source code and docs.
  This should simplify future maintenance of the corresponding code and docs.

- Fix the link to `rate_sum()` at `see also` section of `rate_avg()` docs.

- Make more clear the docs for `rate_sum()` and `rate_avg()` outputs.

- Encapsulate output metric suffix inside rateAggrState. This eliminates possible bugs related
  to incorrect suffix passing to newRateAggrState().

- Rename rateAggrState.total field to less misleading rateAggrState.increase name, since it calculates
  counter increase in the current aggregation window.

- Set rateLastValueState.prevTimestamp on the first sample in time series instead of the second sample.
  This makes more clear the code logic.

- Move the code for removing outdated entries at rateAggrState into removeOldEntries() function.
  This make the code logic inside rateAggrState.flushState() more clear.

- Do not write output sample with zero value if there are no input series, which could be used
  for calculating the rate, e.g. if only a single sample is registered for every input series.

- Do not take into account input series with a single registered sample when calculating rate_avg(),
  since this leads to incorrect results.

- Move {rate,total}AggrState.flushState() function to the end of rate.go and total.go files, so they look more similar.
  This shuld simplify future mantenance.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6243
2024-07-15 08:40:09 +02:00
Aliaksandr Valialkin
3c02937a34
all: consistently use 'any' instead of 'interface{}'
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
2024-07-10 00:20:37 +02:00
Andrii Chubatiuk
861852f262
lib/streamaggr: added stale samples metric, added metrics labels (#6462)
### Describe Your Changes

- added stale metrics counters for input and output samples
- added labels for aggregator metrics =>
`name="{rwctx}:{aggrId}:{aggrSuffix}"`
   - rwctx - global or number starting from 1
   - aggrid - aggregator id starting from 1
   - aggrSuffix - <interval>_(by|without)_label1_label2_labeln
   e.g: `name="global:1:1m_without_instance_pod"`

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-07-01 14:56:17 +02:00
Aliaksandr Valialkin
36be090cd5
lib/streamaggr: follow-up for 7cb894a777
- Use bytesutil.InternString() instead of strings.Clone() for inputKey and outputKey in aggregatorpushSamples().
  This should reduce string allocation rate, since strings can be re-used between aggrState flushes.
- Reduce memory allocations at dedupAggrShard by storing dedupAggrSample by value in the active series map.
- Remove duplicate call to bytesutil.InternBytes() at Deduplicator, since it is already called inside dedupAggr.pushSamples().
- Add missing string interning at rateAggrState.pushSamples().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6402
2024-06-07 16:27:26 +02:00
Roman Khavronenko
7cb894a777
lib/streamaggr: reduce number of inuse objects (#6402)
The main change is getting rid of interning of sample key. It was
discovered that for cases with many unique time series aggregated by
vmagent interned keys could grow up to hundreds of millions of objects.
This has negative impact on the following aspects:
1. It slows down garbage collection cycles, as GC has to scan all inuse
objects periodically. The higher is the number of inuse objects, the
longer it takes/the more CPU it takes.
2. It slows down the hot path of samples aggregation where each key
needs to be looked up in the map first.

The change makes code more fragile, but suppose to provide performance
optimization for heavy-loaded vmagents with stream aggregation enabled.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-06-07 15:45:52 +02:00
Andrii Chubatiuk
9c3d44c8c9
lib/streamaggr: added rate and rate_avg output (#6243)
Added `rate` and `rate_avg` output
Resource usage is the same as for increase output, tested on a benchmark

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-05-13 15:39:49 +02:00
Roman Khavronenko
8a03e987cb
lib/streamaggr: set correct suffix <output>_prometheus (#6228)
Set correct suffix `<output>_prometheus` for aggregation outputs
`increase_prometheus` and `total_prometheus`
Before, outputs `total` and `total_prometheus` or `increase` and
`increase_prometheus` had the same suffix.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-08 13:11:30 +02:00
Aliaksandr Valialkin
6a465f6e29
lib/streamaggr: ignore out of order samples when calculating increase, increase_prometheus, total and total_prometheus outputs
Out of order samples may result in unexpected spikes for these outputs.
So it is better to ignore such samples.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931
2024-03-17 22:03:03 +02:00
Aliaksandr Valialkin
6319d029a8
lib/streamaggr: benchmark only flush routines in BenchmarkDedupAggrFlushSerial and BenchmarkAggregatorsFlushSerial 2024-03-04 19:12:28 +02:00
Aliaksandr Valialkin
b232968bb4
lib/streamaggr: reduce the number of pointers at "total" aggregation state
This should reduce load on GC when scanning heap objects.
2024-03-04 19:12:27 +02:00
Aliaksandr Valialkin
946814afee
lib/streamaggr: reduce memory allocations when registering new series in deduplication and aggregation structs 2024-03-04 17:00:19 +02:00
Aliaksandr Valialkin
138a4d1c2b
lib/streamaggr: ignore the first sample in new time series during staleness_interval seconds after the stream aggregation start for total and increase outputs 2024-03-04 01:49:26 +02:00
Aliaksandr Valialkin
0422ae01ba
lib/streamaggr: flush dedup state and aggregation state in parallel on all the available CPU cores
This should reduce the time needed for aggregation state flush on systems with many CPU cores
2024-03-04 01:21:50 +02:00
Aliaksandr Valialkin
28a9e92b5e
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
  for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
  deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
  metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
  per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
  This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
  - vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
  - vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
  - vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
  - vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
  - vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
  - vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
  - vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
    which took longer than the configured interval
  - vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
    which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md

The memory usage reduction increases CPU usage during stream aggregation by up to 30%.

This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 02:42:50 +02:00
Aliaksandr Valialkin
9763e2295b
lib/streamaggr: follow up for 70773f53d7
- Round staleness_interval durations to the upper number of seconds.
  This should prevent from under-calculations for fractional staleness intervals.
- Rename stalenessInterval field at *AggrState structs into stalenessSecs, since it holds seconds.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4667
2023-07-20 21:44:24 -07:00
Alexander Marshalov
70773f53d7
allow configuring staleness interval in stream aggregation (#4667) (#4670)
---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-20 16:07:33 +02:00
Aliaksandr Valialkin
fa13bbc48a
app/{vmagent,vminsert}: add support for streaming aggregation
See https://docs.victoriametrics.com/stream-aggregation.html

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3460
2023-01-03 22:19:21 -08:00