Commit graph

81 commits

Author SHA1 Message Date
Roman Khavronenko
2c0892263f
docs: update stream aggregation docs (#7202)
This PR is based on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6777. The
differences are the following:
* it keeps backward compatibility for links
* it re-structures only original document file
* it adds #common-mistakes section, re-phrased

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit ca2a08eabe)
2024-10-11 14:28:06 +02:00
Dima Lazerka
465c7ad045
docs: fixes misspelled typos
Also tried to make it catch "Authorisation" in the future, fixed a lot
of other misspells along the way, but didn't make it catch
"Authorisation" anyway.

- Fix misspelled "Authorization" header name
- Fix misspelled "organization"
- Fix more misspells
2024-09-13 13:19:03 +02:00
Hui Wang
a21aea5dd4
stream aggregation: perform deduplication for all received data when … (#6711)
…specifying `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval` command-line flag

[The
documentation](https://docs.victoriametrics.com/stream-aggregation/)
contains conflicting descriptions regarding deduplication for
non-matched series when `-remoteWrite.streamAggr.config` and / or
`-streamAggr.config` are set:
1. Statement below says **all the received data** is deduplicated:
>[vmagent](https://docs.victoriametrics.com/vmagent/) supports
relabeling, deduplication and stream aggregation for all the received
data, scraped or pushed. Then, the collected data will be forwarded to
specified -remoteWrite.url destinations. The data processing order is
the following:
>1. all the received data is relabeled according to the specified
[-remoteWrite.relabelConfig](https://docs.victoriametrics.com/vmagent/#relabeling)
(if it is set)
>2. all the received data is deduplicated according to specified
[-streamAggr.dedupInterval](https://docs.victoriametrics.com/stream-aggregation/#deduplication)
(if it is set to duration bigger than 0)

2. Another statement says the deduplication is performed individually
for the **matching samples**
>The de-deduplication is performed after applying
[relabeling](https://docs.victoriametrics.com/vmagent/#relabeling) and
before performing the aggregation. If the -remoteWrite.streamAggr.config
and / or -streamAggr.config is set, then the de-duplication is performed
individually per each [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config)
for the matching samples after applying
[input_relabel_configs](https://docs.victoriametrics.com/stream-aggregation/#relabeling).

Considering the following deduplication use cases:
1. To apply deduplication(globally or for specific remoteWrite
destination) for all the received data, scraped or pushed
--- using `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval`.
2. To deduplicate and aggregate metrics that match the rule `match`
filters
--- using `-remoteWrite.streamAggr.config` and specifiying
`dedup_interval` option in [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
3. To deduplicate all the received data while having `streamAggr.config`
for some metrics
--- no way for a single vmagent now, need to set up two level vmagents

This PR implements case3.

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit d523015f27)
2024-09-03 10:49:38 +02:00
Hui Wang
e74d5f266e
stream aggregation: do not allow to enable -stream.keepInput and `k… (#6723)
…eep_metric_names` options in stream aggregation config together

With aggregated data and raw data under the same metric, results would
be confusing.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 62d19369a3)
2024-08-13 09:08:27 -04:00
Andrii Chubatiuk
77c3bbf3fc
docs: updated guides structure, removed deprecated sort option (#6767)
### Describe Your Changes

* `sort` param is unused by the current website engine, and was present only for compatibility
with previous website engine. It is time to remove it as it makes no effect
* re-structure guides content into folders to simplify assets management

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit 35d77a3bed)
2024-08-07 16:59:22 +02:00
Hui Wang
13a21a3ba0
app/vmagent/remotewrite: make -remoteWrite.streamAggr.ignoreFirstIntervals of array type (#6744)
Make `-remoteWrite.streamAggr.ignoreFirstIntervals` of array type so it could
 accept multiple values which can be applied to the corresponding`-remoteWrite.url`.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 8f5c26d788)
2024-08-07 09:57:49 +02:00
Aliaksandr Valialkin
83d53dfb0d
Revert "replaced global http refs with relative markdown ones (#6692)"
This reverts commit 537266363a.

Reason for revert: relative links in docs are much harder to maintain in consistent state
comparing to absolute links:

- It is non-trivial to figure out the proper relative link path when creating and editing docs.
- Relative links break after moving the doc files to another paths, and it is non-trivial
  to figure which links are broken after that.
- The updated relative links do not work properly right now in the docs.
  For example, the https://docs.victoriametrics.com/victorialogs/quickstart.md#building-from-source-code
  link at https://docs.victoriametrics.com/victorialogs/changelog/ leads to 404 page.

This is documented at https://docs.victoriametrics.com/#images-in-documentation .
2024-07-25 14:40:53 +02:00
Andrii Chubatiuk
b8d80ddae5
replaced global http refs with relative markdown ones (#6692)
### Describe Your Changes

Replaced global http links in docs with relative markdown ones

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-07-25 13:26:51 +02:00
Andrii Chubatiuk
6b97044d8a
view documentation locally (#6677)
- moved files from root to VictoriaMetrics folder to be able to mount
operator docs and VictoriaMetrics docs independently
- added ability to run website locally

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-07-25 12:27:05 +02:00
Andrii Chubatiuk
13b9a2cfc1
Code tooltip updates (#6657)
### Describe Your Changes

recently added custom `promtextmetric` and `influxtextmetric` prismjs
plugins to vmdocs to convert this
<img width="1081" alt="Screenshot 2024-07-17 at 12 13 29"
src="https://github.com/user-attachments/assets/66d4ea18-10fe-45ef-80b4-989d0eb3bd92">
to this
<img width="1087" alt="Screenshot 2024-07-17 at 12 24 34"
src="https://github.com/user-attachments/assets/60d4ab44-79e5-4c63-b966-54b989ead1aa">




### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-07-18 11:52:24 +02:00
Aliaksandr Valialkin
009f71fc4c
docs/stream-aggregation.md: an attempt to fix yaml formatting at https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config 2024-07-17 22:43:54 +02:00
Aliaksandr Valialkin
fec9e9a0a7
docs/stream-aggregation.md: clarify "Routing" chapter a bit after f153f54d11 2024-07-15 21:42:56 +02:00
Aliaksandr Valialkin
cbc637d1dd
app/vmagent/remotewrite: follow-up for f153f54d11
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
  This improves code maintainability a bit.

- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
  This prevents from potential resource leaks.

- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
  This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.

- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
  in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.

- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
  This label should simplify investigation of the exposed metrics.

- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
  it is hard to use these labels in query filters and aggregation functions.

- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
  This metric shows the number of samples generated by the corresponding streaming aggregation rule.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
  which could modify the behaviour of the constructed streaming aggregator.
  Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.

- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
  This also allows passing nil instead of Options when default options are enough.

- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
  so they have consistent set of labels comparing to the rest of streaming aggregation metrics.

- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
  `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
  `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
  metrics. This is a follow-up for the commit 2eb1bc4f81 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604

- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
  figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
  in the commit 2eb1bc4f81 .

- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
  While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
2024-07-15 20:25:36 +02:00
Aliaksandr Valialkin
21f049e211
lib/streamaggr: follow-up for 9c3d44c8c9
- Consistently enumerate stream aggregation outputs in alphabetical order across the source code and docs.
  This should simplify future maintenance of the corresponding code and docs.

- Fix the link to `rate_sum()` at `see also` section of `rate_avg()` docs.

- Make more clear the docs for `rate_sum()` and `rate_avg()` outputs.

- Encapsulate output metric suffix inside rateAggrState. This eliminates possible bugs related
  to incorrect suffix passing to newRateAggrState().

- Rename rateAggrState.total field to less misleading rateAggrState.increase name, since it calculates
  counter increase in the current aggregation window.

- Set rateLastValueState.prevTimestamp on the first sample in time series instead of the second sample.
  This makes more clear the code logic.

- Move the code for removing outdated entries at rateAggrState into removeOldEntries() function.
  This make the code logic inside rateAggrState.flushState() more clear.

- Do not write output sample with zero value if there are no input series, which could be used
  for calculating the rate, e.g. if only a single sample is registered for every input series.

- Do not take into account input series with a single registered sample when calculating rate_avg(),
  since this leads to incorrect results.

- Move {rate,total}AggrState.flushState() function to the end of rate.go and total.go files, so they look more similar.
  This shuld simplify future mantenance.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6243
2024-07-15 08:44:48 +02:00
Hui Wang
f3cbd62823
vmagent: fix vm_streamaggr_flushed_samples_total counter (#6604)
We use `vm_streamaggr_flushed_samples_total` to show the number of
produced samples by aggregation rule, previously it was overcounted, and
doesn't account for `output_relabel_configs`.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 2eb1bc4f81)
2024-07-12 14:19:17 +02:00
Arkadii Yakovets
8645b2cc8e
docs: add spellcheck command (#6562)
### Describe Your Changes

Implement spellcheck command:
  - add cspell configuration files
  - dockerize spellchecking process
  - add Makefile targets

This PR adds a standalone `make spellcheck` target to check `docs/*.md` files for spelling
errors. The target process is dockerized to be run in a separate npm environment.

Some `docs/` typo fixes also included.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Arkadii Yakovets <ark@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit fabf0b928e)
2024-07-11 12:40:24 +02:00
omahs
efc6b00b2c
docs: fix typos (#6600)
### Describe Your Changes

docs: fix typos

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit 8786a08d27)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-09 10:52:47 +02:00
Aliaksandr Valialkin
172ae1adf7
Revert c6c5a5a186 and b2765c45d0
Reason for revert:

There are many statsd servers exist:

- https://github.com/statsd/statsd - classical statsd server
- https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DatDog Agent ( https://docs.datadoghq.com/agent/ )
- https://github.com/avito-tech/bioyino - high-performance statsd server
- https://github.com/atlassian/gostatsd - statsd server in Go
- https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics

These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics
according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd (
the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target
according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ).

Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides
significant advantages over the existing statsd servers, while has no significant drawbacks comparing
to existing statsd servers.

The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server.
The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics (
see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers.
So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating
from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent).

Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server
with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation.
In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation.
This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated
statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed
during querying.

P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic
specialized configuration for aggregating of statsd metrics. The main requirements for this configuration:

- easy to write, read and update (ideally it should work out of the box for most cases without additional configuration)
- hard to misconfigure (e.g. hard to shoot yourself in the foot)

It would be great if this configuration will be compatible with the configuration of the most widely used statsd server.

In the mean time it is recommended continue using external statsd server.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600
2024-07-03 23:57:49 +02:00
Aliaksandr Valialkin
f8779d1ed2
lib/streamaggr: follow-up for the commit c0e4ccb7b5
- Clarify docs for `Ignore aggregation intervals on start` feature.

- Make more clear the code dealing with ignoreFirstIntervals at aggregator.runFlusher() functions.
  It is better from readability and maintainability PoV using distinct a.flush() calls
  for distinct cases instead of merging them into a single a.flush() call.

- Take into account the first incomplete interval when tracking the number of skipped aggregation intervals,
  since this behaviour is easier to understand by the end users.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6137
2024-07-02 21:34:48 +02:00
hagen1778
69625aa8a1
docs: mark without optional in stream aggr docs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit f8eea0f2c9)
2024-07-01 14:56:46 +02:00
Arkadii Yakovets
a6655322b1
docs: fix docs/ and README.md spelling errors (#6362)
Fixes `docs/` and `README.md` typos and errors.

Signed-off-by: Arkadii Yakovets <ark@victoriametrics.com>

(cherry picked from commit c740a8042e)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-03 11:53:33 +02:00
yumeiyin
95b8cf76f8
chore: remove redundant words (#6348)
(cherry picked from commit 9289c7512d)
2024-05-29 14:37:04 +02:00
Andrii Chubatiuk
fe332c3419
app/vmagent: add global aggregator (#6268)
Add global stream aggregation for VMAgent

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
(cherry picked from commit f153f54d11)
2024-05-17 14:01:31 +02:00
Andrii Chubatiuk
d9cddf1ad8
lib/streamaggr: added rate and rate_avg output (#6243)
Added `rate` and `rate_avg` output
Resource usage is the same as for increase output, tested on a benchmark

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 9c3d44c8c9)
2024-05-13 16:49:39 +02:00
Oleg
76af930e4a
Statsd protocol compatibility (#5053)
In this PR I added compatibility with [statsd
protocol](https://github.com/b/statsd_spec) with tags to be able to send
metrics directly from statsd clients to vmagent or directly to VM.
For example its compatible with
[statsd-instrument](https://github.com/Shopify/statsd-instrument) and
[dogstatsd-ruby](https://github.com/DataDog/dogstatsd-ruby) gems

Related issues: #5052, #206, #4600

(cherry picked from commit c6c5a5a186)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-10 14:27:31 +02:00
Hui Wang
abd29c15ab
docs: update vmalert and vmagent docs (#6207)
* restore and actualize doc section explaining duplicated labels error
* rm misleading comment about post-aggregation in stream aggregation

(cherry picked from commit e3c226cf92)
2024-04-30 10:30:19 +02:00
hagen1778
342290275e
app/streamaggr: follow-up after c0e4ccb7b5
* rm vmagent mentions from vminsert flags
* improve documentation wording, add links to related sections
* mention `ignore_first_intervals` in the stream aggr options
* update flags description
* add basic test for config parsing validation

Signed-off-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit bae3874e6a)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-22 14:39:23 +02:00
Andrii Chubatiuk
131367fb59
lib/streamaggr: add option to ignore first N aggregation intervals (#6137)
Stream aggregation may yield inaccurate results if it processes incomplete data.
This issue can arise when data is sourced from clients that maintain a queue of unsent data, such as Prometheus or vmagent.
 If the queue isn't fully cleared within the aggregation interval, only a portion of the time series may be included in that period, leading to distorted calculations.
To mitigate this we add an option to ignore first N aggregation intervals. It is expected, that client queues
will be cleared during the time while aggregation ignores first N intervals and all subsequent aggregations
will be correct.

(cherry picked from commit c0e4ccb7b5)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-22 14:34:36 +02:00
Aliaksandr Valialkin
b933001d2d
all: replace old https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html url with the new one - https://docs.victoriametrics.com/single-server-victoriametrics/ 2024-04-18 03:11:49 +02:00
Aliaksandr Valialkin
baf5c8d6d0
all: replace old https://docs.victoriametrics.com/keyConcepts.html url with the new one - https://docs.victoriametrics.com/keyconcepts/ 2024-04-18 02:34:09 +02:00
Aliaksandr Valialkin
728bcf0585
all: replace old https://docs.victoriametrics.com/stream-aggregation.html url with the new one - https://docs.victoriametrics.com/stream-aggregation/ 2024-04-18 02:20:00 +02:00
Aliaksandr Valialkin
64938732e3
all: replace old https://docs.victoriametrics.com/MetricsQL.html url with the new one - https://docs.victoriametrics.com/metricsql/ 2024-04-18 02:15:33 +02:00
Aliaksandr Valialkin
a99005eff6
all: replace old https://docs.victoriametrics.com/vmalert.html url with the new one - https://docs.victoriametrics.com/vmalert/ 2024-04-18 01:44:54 +02:00
Aliaksandr Valialkin
0211a04a52
all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/ 2024-04-18 01:32:57 +02:00
Aliaksandr Valialkin
cd222d6502
lib/streamaggr: ignore out of order samples for last output
This is a follow-up for 6a465f6e29

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931
2024-03-18 01:03:58 +02:00
Aliaksandr Valialkin
111d0aa2bf
app/{vmagent,vminsert}: add an ability to ignore input samples outside the current aggregation interval for stream aggregation
See https://docs.victoriametrics.com/stream-aggregation.html#ignoring-old-samples
2024-03-17 23:30:46 +02:00
Aliaksandr Valialkin
1f753a049a
lib/streamaggr: follow-up for 15e33d56f1
- Properly set pushSample.timestamp when flushing de-duplicated samples to stream aggregation
  This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931

- Re-classify this change as feature instead of bugfix at docs/CHANGELOG.md

- Verify de-duplication logic for samples with different timestamps

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5939
2024-03-17 23:23:57 +02:00
hagen1778
41a1efbea8
docs: follow-up 15e33d56f1
Update documentation according to changes in deduplication logic.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-17 23:07:51 +02:00
Aliaksandr Valialkin
abf347ec0f
docs: replace speed up with more clear accelerate wording 2024-03-12 03:03:15 +02:00
Aliaksandr Valialkin
27b9e8ed3e
app/{vmagent,vminsert}: add -streamAggr.dropInputSamples command-line flag for dropping the specified labels from input samples before deduplication and streaming aggregation 2024-03-05 02:27:27 +02:00
Aliaksandr Valialkin
c38c45d71f
app/{vminsert,vmagent}: allow using -streamAggr.dedupInterval without -streamAggr.config
This allows performing online de-duplication of incoming samples
2024-03-05 00:47:23 +02:00
Aliaksandr Valialkin
48a425898a
lib/streamaggr: enable time alignment for aggregate flushed to multiples of interval
For example, if `interval: 1m`, then data flush occurs at the end of every minute,
while `interval: 1h` leads to data flush at the end of every hour.

Add `no_align_flush_to_interval` option, which can be used for disabling the alignment.
2024-03-04 06:23:35 +02:00
Aliaksandr Valialkin
3ba9b2225e
docs/stream-aggregation.md: add troubleshooting section with solutions for common problems in streaming aggregation 2024-03-04 03:04:59 +02:00
Aliaksandr Valialkin
d0e6541f35
docs/stream-aggregation.md: typo fixes 2024-03-02 04:35:37 +02:00
Aliaksandr Valialkin
b912a45220
docs/stream-aggregation.md: remove superflouous output_relabel_configs from the config example for histogram aggregation 2024-03-02 03:36:08 +02:00
Aliaksandr Valialkin
0d5d46f9db
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
  for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
  deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
  metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
  per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
  This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
  - vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
  - vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
  - vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
  - vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
  - vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
  - vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
  - vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
    which took longer than the configured interval
  - vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
    which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md

The memory usage reduction increases CPU usage during stream aggregation by up to 30%.

This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 03:15:43 +02:00
hagen1778
216f268c1a
docs: follow-up after 491287ed15
* port un-synced changed from docs/readme to readme
* consistently use `sh` instead of `console` highlight, as it looks like
a more appropriate syntax highlight
* consistently use `sh` instead of `bash`, as it is shorter
* consistently use `yaml` instead of `yml`

See syntax codes here https://gohugo.io/content-management/syntax-highlighting/

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 17:06:26 +01:00
Roman Khavronenko
9e9f170fe7
lib/streamaggr: skip unfinished aggregation state on shutdown by default (#5689)
Sending unfinished aggregate states tend to produce unexpected anomalies with lower values than expected.
The old behavior can be restored by specifying `flush_on_shutdown: true` setting in streaming aggregation config

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:45:45 +01:00
Aliaksandr Valialkin
e6e5b97e1e
lib/streamaggr: expand %{ENV} placeholders in stream aggregation configs 2024-01-24 12:31:42 +02:00
Aliaksandr Valialkin
db6dadf1f7
docs: convert png images to webp in all the docs except of docs/operator/*
This reduces the size of docs/* folder from 33MB to 18MB

Images inside docs/operator/* must be converted at the https://github.com/VictoriaMetrics/operator/tree/master/docs
and then the updated images must be automatically propagated to the docs/operator/*

This is a follow-up for d3f919df3e

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5206
2023-11-22 19:29:47 +02:00