- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
This improves code maintainability a bit.
- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
This prevents from potential resource leaks.
- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.
- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.
- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
This label should simplify investigation of the exposed metrics.
- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
it is hard to use these labels in query filters and aggregation functions.
- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
This metric shows the number of samples generated by the corresponding streaming aggregation rule.
This metric has been added in the commit 861852f262 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
This metric has been added in the commit 861852f262 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
which could modify the behaviour of the constructed streaming aggregator.
Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.
- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
This also allows passing nil instead of Options when default options are enough.
- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
so they have consistent set of labels comparing to the rest of streaming aggregation metrics.
- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
`aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
`output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
metrics. This is a follow-up for the commit 2eb1bc4f81 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604
- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
in the commit 2eb1bc4f81 .
- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
- Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage
systems are configured with the disabled on-disk queue, every in-memory queue is full
and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common,
so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx
relabeling and other processing in this case.
- Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped().
Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set.
In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full
and on-disk queue is disabled.
The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing
the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage
processes pending data, so it drops aggregated samples in this case.
- Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output,
so it is clear that this flag can be set individually per each -remoteWrite.url.
- Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems
are configured with the disabled on-disk queue, then there is no sense in keeping samples
on some of these systems, while dropping samples on the remaining systems, since this
will result in global stall on the remote storage system with the disabled on-disk queue
and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false
from remotewrite.TryPush() in this case. This will result in infinite duplicate samples
written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload
is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
This allows proceeding with newly scraped / pushed samples by sending them to the remaining
remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set.
- Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test.
- Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url.
See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
- Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package.
This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries.
- Move common code for mustParsePromMetrics() function into lib/prompbmarshal package,
so it could be used in tests for building []prompbmarshal.TimeSeries from string.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206
### Describe Your Changes
- added stale metrics counters for input and output samples
- added labels for aggregator metrics =>
`name="{rwctx}:{aggrId}:{aggrSuffix}"`
- rwctx - global or number starting from 1
- aggrid - aggregator id starting from 1
- aggrSuffix - <interval>_(by|without)_label1_label2_labeln
e.g: `name="global:1:1m_without_instance_pod"`
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 861852f262)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html):
allow configuring `-remoteWrite.disableOnDiskQueue` and
`-remoteWrite.dropSamplesOnOverload` cmd-line flags per each
`-remoteWrite.url`. See this [pull
request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065).
Thanks to @rbizos for implementaion!
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add
labels `path` and `url` to metrics
`vmagent_remotewrite_push_failures_total` and
`vmagent_remotewrite_samples_dropped_total`. Now number of failed pushes
and dropped samples can be tracked per `-remoteWrite.url`.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Raphael Bizos <r.bizos@criteo.com>
(cherry picked from commit 87fd400dfc)
When at least one remote write has deduplication configured it cleans up
timeseries while they can be in use by another remote write without
deduplication
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 879771808b)
- Make the configuration more clear by accepting the list of ignored labels during sharding
via a dedicated command-line flag - -remoteWrite.shardByURL.ignoreLabels.
This prevents from overloading the meaning of -remoteWrite.shardByURL.labels command-line flag.
- Removed superfluous memory allocation per each processed sample if sharding by remote storage is enabled.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5938