Commit graph

685 commits

Author SHA1 Message Date
Max Kotliar
b7b38b9551
Follow up to "vmagent/client: Use VictoriaMetrics remote write protocol by default, downgrade to Prometheus if needed" ()
### Describe Your Changes

Follow-up to
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462

Addressed review comments:
- Log panic with FATAL prefix to indicate possible on-disk data
corruption
- Moved version bump line to the tip block (v1.114.0 has already been
released) in changelog
- Removed duplicate vmagent entry from targets list from Makefile


### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2025-04-14 11:13:22 +02:00
Max Kotliar
01fff44a8d
vmagent/client: Use VictoriaMetrics remote write protocol by default, downgrade to Prometheus if needed ()
This commit improves how vmagent selects the remote write protocol.
Previously, vmagent [performed a handshake
probe](0ff1a3b154/lib/protoparser/protoparserutil/vmproto_handshake.go (L11))
at
[startup](0ff1a3b154/app/vmagent/remotewrite/client.go (L173)):

- If the probe succeeded, it used the VictoriaMetrics (VM) protocol.

- If the probe failed, it downgraded to the Prometheus protocol.

- No protocol changes occurred at runtime after the initial probe.

However, this approach had limitations:

- If vmstorage was unavailable during vmagent startup, vmagent would
immediately downgrade to the Prometheus protocol, leading to higher
network usage until vmagent was restarted. This case has been reported in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7615.

- If the remote write server was updated or downgraded (e.g., during a
fallback or migration), vmagent would not detect the protocol change. It
would continue retrying failed requests and eventually drop them.
A restart of vmagent was required to pick up the new protocol.

This commit introduces a more adaptive mechanism.
vmagent always starts with the VM protocol and downgrades to the
Prometheus protocol only if an unsupported media type or bad request
response is received.
When this happens, the protocol is downgraded for all future requests.
In-flight requests are re-packed from Zstd to Snappy and retried
immediately.
Snappy-encoded requests are dropped if an unsupported media type or bad
request is received (no retrying).

Additionally, the in-memory and persisted queues may mix snappy- and
zstd-encoded blocks. The proper encoding is determined before sending via
the encoding.IsZstd function.
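
For illustration, here is a minimal Go sketch of the downgrade rule described above. The helper names (`trySendBlock`, `repackToSnappy`) and the content type are assumptions made for the example; the actual vmagent client code differs in detail.

```go
package remotewritesketch

import (
	"bytes"
	"net/http"
)

// zstd frames start with the magic number 0x28 0xB5 0x2F 0xFD.
var zstdMagic = []byte{0x28, 0xb5, 0x2f, 0xfd}

func isZstd(block []byte) bool {
	return bytes.HasPrefix(block, zstdMagic)
}

// trySendBlock sends a block and applies the downgrade rule: a rejected zstd
// block is re-packed to snappy and retried immediately; a rejected snappy
// block is dropped without retrying.
func trySendBlock(c *http.Client, url string, block []byte, repackToSnappy func([]byte) []byte) error {
	resp, err := c.Post(url, "application/x-protobuf", bytes.NewReader(block))
	if err != nil {
		return err // network error: the block stays in the queue for a retry
	}
	resp.Body.Close()
	rejected := resp.StatusCode == http.StatusUnsupportedMediaType ||
		resp.StatusCode == http.StatusBadRequest
	if rejected && isZstd(block) {
		// The downgrade for all future requests would be recorded here.
		return trySendBlock(c, url, repackToSnappy(block), repackToSnappy)
	}
	return nil
}
```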

TODO:
* [x] Add tests
* [x] Update documentation
* [x] Changelog
* [x] Research on
[content-type](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786918054),
[accept-encoding](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462#issuecomment-2786923382)

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7615#top.

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-14 11:13:22 +02:00
Max Kotliar
79254126f1
vmagent/remotewrite: set content encoding header based on actual body
Improve remote write handling in vmagent by setting the
`Content-Encoding` header based on the actual request body, rather than
relying on configuration.

- Detects Zstd compression via the Zstd magic number, as sketched below.
- Falls back to Snappy if Zstd is not detected.
- The persistent queue may now contain mixed-encoding content.
- Add basic vmagent integration tests
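
A rough sketch of the sniffing idea, assuming the standard zstd frame magic number; this mirrors the intent of the change rather than the exact vmagent code:

```go
package remotewritesketch

import (
	"bytes"
	"net/http"
)

// zstd frames start with the magic number 0x28 0xB5 0x2F 0xFD
// (little-endian 0xFD2FB528).
var zstdMagic = []byte{0x28, 0xb5, 0x2f, 0xfd}

// setContentEncoding derives the Content-Encoding header from the actual
// block bytes instead of configuration; anything that is not zstd is treated
// as snappy, since the queue may contain mixed encodings.
func setContentEncoding(req *http.Request, body []byte) {
	if bytes.HasPrefix(body, zstdMagic) {
		req.Header.Set("Content-Encoding", "zstd")
		return
	}
	req.Header.Set("Content-Encoding", "snappy")
}
```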

Follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5344 and
12cd32fd75.

Extracted from
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301
2025-04-08 17:45:20 +02:00
Aliaksandr Valialkin
15e1f03940
app/vmagent: increase the default GOGC from 30 to 50
This reduces CPU usage by up to 30% in exchange for a 10% increase in RAM usage
when scraping thousands of targets, which expose millions of metrics in total.

This looks like a good tradeoff after the commit edac875179 ,
which reduced RAM usage by more than 10%, so the final RAM usage for vmagent
is still lower than the RAM usage at v1.114.0 by ~15%, while CPU usage drops by 30%.
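
As a reference for how such a default can be applied while still honoring an explicit user setting, here is a hedged sketch; the helper is illustrative rather than the actual vmagent code, and only the value 50 comes from this commit.

```go
package main

import (
	"os"
	"runtime/debug"
)

// setDefaultGOGC applies the built-in default GC percent unless the user set
// GOGC explicitly via the environment.
func setDefaultGOGC(percent int) {
	if os.Getenv("GOGC") != "" {
		return // an explicit user setting always wins
	}
	debug.SetGCPercent(percent)
}

func main() {
	setDefaultGOGC(50) // the new vmagent default from this commit
}
```
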
2025-04-01 21:08:55 +02:00
Max Kotliar
e6b00253a4
vmagent/remotewrite: fix golangci-lint code style issue
### Describe Your Changes

Fixes golangci-lint issues introduced in
98f1e32e39

```
--- a/app/vmagent/remotewrite/pendingseries.go
+++ b/app/vmagent/remotewrite/pendingseries.go
@@ -202,7 +202,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {

 	// Pre-allocate memory for labels.
 	labelsLen := len(wr.labels)
-	wr.labels = slicesutil.SetLength(wr.labels, labelsLen + len(labelsSrc))
+	wr.labels = slicesutil.SetLength(wr.labels, labelsLen+len(labelsSrc))
 	labelsDst := wr.labels[labelsLen:]

 	// Pre-allocate memory for byte slice needed for storing label names and values.
@@ -212,7 +212,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
 		neededBufLen += len(label.Name) + len(label.Value)
 	}
 	bufLen := len(wr.buf)
-	wr.buf = slicesutil.SetLength(wr.buf, bufLen + neededBufLen)
+	wr.buf = slicesutil.SetLength(wr.buf, bufLen+neededBufLen)
 	buf := wr.buf[:bufLen]

 	// Copy labels

```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-01 18:38:30 +04:00
Aliaksandr Valialkin
810a4f55d4
app/vmagent/remotewrite: optimize writeRequest.copyTimeSeries a bit
Pre-allocate memory for labels and for the needed byte buffer used
for holding the copied label names and values.
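
The pattern, in a self-contained sketch (slicesutil.SetLength is VictoriaMetrics-internal, so a plain grow helper stands in for it here):

```go
package pendingseriessketch

type Label struct {
	Name  string
	Value string
}

// setLength grows s to n items, reusing existing capacity when possible -
// a stand-in for slicesutil.SetLength.
func setLength(s []Label, n int) []Label {
	if n <= cap(s) {
		return s[:n]
	}
	return append(s[:cap(s)], make([]Label, n-cap(s))...)
}

// copyLabels pre-allocates the destination once and copies the labels in bulk
// instead of appending them one by one.
func copyLabels(dst, src []Label) []Label {
	dstLen := len(dst)
	dst = setLength(dst, dstLen+len(src)) // at most one allocation
	copy(dst[dstLen:], src)
	return dst
}
```
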
2025-04-01 16:00:30 +02:00
Roman Khavronenko
df98840167
lib/promscrape: support filtering targets via scrapePool GET param in /api/v1/targets API ()
This improves compatibility with the Prometheus `/api/v1/targets` API.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5343
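
A hypothetical usage example from a Go client; the host, port and pool name are placeholders:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Filter /api/v1/targets by scrape pool, as in Prometheus.
	u := "http://localhost:8429/api/v1/targets?scrapePool=" + url.QueryEscape("node-exporter")
	resp, err := http.Get(u)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON containing only targets from the given scrape pool
}
```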

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

(cherry picked from commit a2ba37be68)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-31 16:24:43 +02:00
Aliaksandr Valialkin
f83e780a55
lib/httputil: automatically initialize data transfer metrics for the created HTTP transports via NewTransport() 2025-03-27 15:22:15 +01:00
Aliaksandr Valialkin
88e82614bf
lib/httputil: add NewTransport() function for creating pre-initialized net/http.Transport 2025-03-26 20:16:39 +01:00
Aliaksandr Valialkin
a7b20ff241
lib: rename lib/influxutils to lib/influxutil for the sake of consistency naming of *util packages 2025-03-26 17:39:01 +01:00
Aliaksandr Valialkin
f3f9141ebb
lib: rename lib/promutils to lib/promutil for the sake of consistency for *util package naming 2025-03-26 17:33:13 +01:00
Aliaksandr Valialkin
e9bd27753b
lib/protoparser: rename lib/protoparser/datadogutils to lib/protoparser/datadogutil for the sake of consistency for *util package naming 2025-03-26 17:13:36 +01:00
Aliaksandr Valialkin
c0e9b15606
lib/protoparser: rename lib/protoparser/common to lib/protoparser/protoparserutil
This improves the readability of the code that uses this package.
2025-03-18 16:40:06 +01:00
Guillem Jover
1d8b7faf71
spelling and grammar fixes via codespell ()
### Describe Your Changes

Fix many spelling errors and some grammar, including misspellings in
filenames.

The change also fixes a typo in the metric name `vm_mmaped_files`, renaming it to `vm_mmapped_files`.
While this is a breaking change, this metric isn't used in alerts or dashboards,
so it should have low impact on users.

The change also deprecates `cspell`, as it is much heavier and less usable.
---------

Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>

(cherry picked from commit 76d205feae)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-17 16:38:11 +01:00
Aliaksandr Valialkin
f8aeb0e7fc
app/vlinsert: follow-up for 37ed1842ab
- Properly decode protobuf-encoded Loki request if it has no Content-Encoding header.
  Protobuf Loki message is snappy-encoded by default, so snappy decoding must be used
  when Content-Encoding header is missing.

- Return back the previous signatures of parseJSONRequest and parseProtobufRequest functions.
  This eliminates the churn in tests for these functions. This also fixes broken
  benchmarks BenchmarkParseJSONRequest and BenchmarkParseProtobufRequest, which consume
  the whole request body on the first iteration and do nothing on subsequent iterations.

- Put the CHANGELOG entries into correct places, since they were incorrectly put into already released
  versions of VictoriaMetrics and VictoriaLogs.

- Add support for reading zstd-compressed data ingestion requests into the remaining protocols
  at VictoriaLogs and VictoriaMetrics.

- Remove the `encoding` arg from PutUncompressedReader() - it has enough information about
  the passed reader arg in order to properly deal with it.

- Add ReadUncompressedData to lib/protoparser/common for reading uncompressed data from the reader until EOF
  (see the sketch after this list). This allows removing repeated code across request-based protocol parsers without streaming mode.

- Consistently limit the data ingestion request sizes which can be read by the ReadUncompressedData function.
  Previously this wasn't the case for all the supported protocols.
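
A sketch of the limited until-EOF read mentioned above; the function name and error text are assumptions:

```go
package protoparsersketch

import (
	"fmt"
	"io"
)

// readUncompressedData reads r until EOF, rejecting bodies larger than maxSize.
func readUncompressedData(r io.Reader, maxSize int64) ([]byte, error) {
	// Read one byte more than the limit in order to detect oversized bodies.
	data, err := io.ReadAll(io.LimitReader(r, maxSize+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > maxSize {
		return nil, fmt.Errorf("request body exceeds %d bytes", maxSize)
	}
	return data, nil
}
```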

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8416
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8380
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8300
2025-03-15 00:11:58 +01:00
Zhu Jiekun
cdadd5407d
docs: revert doc change for on-disk persistence and move new content to another section ()
### Describe Your Changes

revert doc change in
815bad3687
and move new content to another section.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 9548b7e442)
2025-03-14 12:30:07 +01:00
f41gh7
dd32d2f99d
lib/protoparser: support zstd in all logs http ingestion, datadog and otel metrics protocols ()
This commit introduces common readers for multiple compression encoding algorithms.

Currently, supported encodings are:
* zstd
* gzip
* deflate
* snappy

It adds the new common reader to all VictoriaLogs ingestion protocols
and updates OpenTelemetry metrics parsing for VictoriaMetrics components.

Also, it ports the zstd stream parser from the cluster branch.
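
A rough sketch of such a common reader, keyed by the request's Content-Encoding; the package choices here (klauspost/compress for zstd, golang/snappy) are illustrative assumptions, not necessarily the ones used by the commit:

```go
package protoparsersketch

import (
	"compress/flate"
	"compress/gzip"
	"fmt"
	"io"

	"github.com/golang/snappy"
	"github.com/klauspost/compress/zstd"
)

// newUncompressedReader wraps r with a decompressor chosen by the request's
// Content-Encoding value.
func newUncompressedReader(r io.Reader, encoding string) (io.Reader, error) {
	switch encoding {
	case "zstd":
		zr, err := zstd.NewReader(r)
		if err != nil {
			return nil, err
		}
		return zr, nil
	case "gzip":
		return gzip.NewReader(r)
	case "deflate":
		return flate.NewReader(r), nil
	case "snappy":
		return snappy.NewReader(r), nil
	case "", "identity":
		return r, nil // no compression
	default:
		return nil, fmt.Errorf("unsupported Content-Encoding %q", encoding)
	}
}
```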

Related issues:
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8380
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8300

---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2025-03-14 00:44:50 +01:00
Zhu Jiekun
bcd775098f
app/vmagent: prevent dropping persistent queue if -remoteWrite.showURL changed
Previously, if the value of the command-line flag `-remoteWrite.showURL` changed, vmagent dropped the content of its persistent queues. This is unexpected behavior and may lead to data loss in the queue.
Furthermore, if `-remoteWrite.showURL` is set to `true`, any change to URL query arguments leads to a persistent queue drop. The most common cases are the Kafka and GCP Pub/Sub integrations, which use URL query arguments for client configuration.
It also complicates copying the content of a persistent queue between vmagents, since that requires properly changing the name inside metainfo.json.

This commit removes the persistent queue name equality check from `lib/persistentqueue`. This check was added as additional protection against on-disk data corruption.
It's safe to skip this check for vmagent, because vmagent encodes the remoteWrite.url as part of the path to the queue, which guarantees that there will be no collisions.

related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8477.


### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2025-03-14 00:16:52 +01:00
Roman Khavronenko
3ec0247ee3
lib/prompbmarshal: move MustParsePromMetrics to protoparser/prometheus ()
`MustParsePromMetrics` imports `lib/protoparser/prometheus`, and this
package exposes the following metrics:
```
vm_protoparser_rows_read_total{type="promscrape"}
vm_rows_invalid_total{type="prometheus"}
```

This means every package that uses `lib/prompbmarshal` starts exposing
these metrics. For example, vlogs imports `lib/protoparser/common`, which
uses `lib/prompbmarshal.Label`. And only because of this, vlogs starts
exposing unrelated Prometheus metrics on its /metrics page.

Moving `MustParsePromMetrics` to `lib/protoparser/prometheus` seems like
the least intrusive change.
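
The root cause is ordinary Go package-level metric registration: a metric created in a package-level variable is registered as soon as the package is linked in, so every importer exposes it. A toy illustration, using github.com/VictoriaMetrics/metrics and the metric name quoted above:

```go
package promparsersketch

import "github.com/VictoriaMetrics/metrics"

// Registered at package initialization time: merely linking this package in -
// even when only an unrelated type is needed - makes the counter appear on the
// importer's /metrics page.
var rowsRead = metrics.NewCounter(`vm_protoparser_rows_read_total{type="promscrape"}`)

func countRow() {
	rowsRead.Inc()
}
```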


-----------

Depends on another change
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8403

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-02-27 22:55:32 +01:00
Andrii Chubatiuk
a041488786
lib/streamaggr: added aggregation windows ()
### Describe Your Changes

By default, stream aggregation and deduplication store a single state
per aggregation output result.
The data for each aggregator is flushed independently once per
aggregation interval, but there's no guarantee that
incoming samples with timestamps close to the aggregation interval's end
will get into it. For example, when aggregating
with `interval: 1m`, a data sample with timestamp 1739473078 (18:57:59)
can fall into aggregation round `18:58:00` or `18:59:00`.
It depends on network lag, load, clock synchronization, etc. In most
scenarios this doesn't impact aggregation or
deduplication results, which are consistent within the margin of error. But
for metrics represented as a collection of series, like
[histograms](https://docs.victoriametrics.com/keyconcepts/#histogram),
such inaccuracy leads to invalid aggregation results.

For this case, streaming aggregation and deduplication support a mode with
aggregation windows for the current and previous state. In this mode, the
flush doesn't happen immediately but is shifted by a calculated sample
lag, which improves correctness for delayed data.

Enabling this mode increases resource usage: memory usage is
expected to double, since aggregation stores two states
instead of one. However, this significantly improves the accuracy of
calculations. Aggregation windows can be enabled via
the following settings:

- `-streamAggr.enableWindows` at [single-node
VictoriaMetrics](https://docs.victoriametrics.com/single-server-victoriametrics/)
and [vmagent](https://docs.victoriametrics.com/vmagent/). At
[vmagent](https://docs.victoriametrics.com/vmagent/)
`-remoteWrite.streamAggr.enableWindows` flag can be specified
individually per each `-remoteWrite.url`.
If one of these flags is set, then all aggregators will use fixed
windows. In conjunction with `-remoteWrite.streamAggr.dedupInterval` or
`-streamAggr.dedupInterval`, fixed aggregation windows are enabled on the
deduplicator as well.
- `enable_windows` option in [aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
  It allows enabling aggregation windows for a specific aggregator.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit c8fc903669)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-02-19 13:31:37 +01:00
f41gh7
0b987a1534
lib/cgroup: use the default GOGC=100 for the most of VictoriaMetrics components
Historically, some VictoriaMetrics components were optimized for a low rate of memory allocations:
vmagent, single-node VictoriaMetrics and vmstorage. These components benefit from a low
GOGC value, since this allows reducing their memory usage in steady state on typical workloads.

Other VictoriaMetrics components aren't optimized for a reduced rate of memory allocations.
This results in increased CPU usage spent on garbage collection (GC) in these components,
since it must be triggered at a higher rate. See https://tip.golang.org/doc/gc-guide#GOGC for details.

These components do not use too much memory, so it is OK to increase GOGC for them
from 30 to 100 - this won't affect most users.

Keep GOGC at 30 only for vmagent, single-node VictoriaMetrics and vmstorage components.
See 077193d87c and 54b9e1d3cb .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7902
2024-12-23 19:44:55 +01:00
f41gh7
78ad858ff7
app/{vminsert,vmagent}: drop time series on exceeding labels limits.
Previously, time series with labels exceeding the configured limits were truncated and written to storage, potentially causing data inconsistency. This could lead to collisions between time series and make it difficult to identify the source due to truncated labels.

This commit changes the behavior:
* Such time series are now rejected outright.
* Rejected time series are logged to stdout, and corresponding counters are incremented.
* The `vm_too_long_label_values_total`, `vm_too_long_label_names_total` and `vm_metrics_with_dropped_labels_total` metrics are removed.
* New values `too_many_labels`, `too_long_label_name` and `too_long_label_value` are added to the `reason` label of the `vm_rows_ignored_total` metric.

related issues:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6928
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7661
2024-12-10 22:15:38 +01:00
f41gh7
f97489dcf9
app/vmagent: follow-up 430163d and 680b8c2
Removes the global defaultAuthToken, since it's no longer needed.
It was added as a fallback for the 'remoteWrite.multitenantURL' feature,
which was deprecated in v1.102 and removed.

Updates the newRemoteWriteCtxs function so it no longer accepts auth.Token.
This was also part of the feature removal.

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2024-11-29 15:23:21 +01:00
Nikolay
d0a508d1ca
app/vmagent: fixes multitenant token parse
Previously, vmagent produced a parsing error for the 'multitenant' auth token
value in the following cases:
* data ingestion with enableMultitenantEndpoints
* data scraping at promscrape

This is inconsistent with the other VictoriaMetrics components,
since 'multitenant' is a well-known token value for multitenancy via
labels, and vmagent is intended to be compatible with vminsert ingestion
endpoints.

This commit replaces NewToken with the NewTokenPossibleMultitenant function
for token parsing. It allows using the 'multitenant' value and
makes token values consistent across all components.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7694
2024-11-29 15:23:20 +01:00
Alexander Frolov
61050e2003
app/vmagent: respect Pushgateway protocol in multi-tenant vmagent handler ()
### Describe Your Changes

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3636 for
multi-tenant vmagent handler

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-11-20 18:37:35 +01:00
Hui Wang
e96e8bd469
vmagent: fix type of command-line flag -streamAggr.dedupInterval ()
Previously, the `m` unit was not correctly supported.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-10-17 11:11:46 -03:00
Andrii Chubatiuk
019171fdfc
lib/protoparser/influx: enable batch processing by default ()
### Describe Your Changes

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7090

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit daa7183749)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-15 11:51:48 +02:00
Artem Fetishev
be7dfd5ab3
app/(vmagent,vmalert)/remotewrite/client: Fix flag docs ()
### Describe Your Changes

The flag docs mention a flag that does not exist (and never existed).
Perhaps that was a typo.

`s/retryMaxInterval/retryMaxTime/g`

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
(cherry picked from commit e2c73dc89f)
2024-10-11 14:25:15 +02:00
Hui Wang
ecd37cf56c
stream aggregation: support configuring multiple labels per `remoteWrite.url` using `-remoteWrite.streamAggr.dropInputLabels` ()

Before, labels were set to all the `remoteWrite.url`.

address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6780

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit fbde238cdc)
2024-09-27 12:40:53 +02:00
Zhu Jiekun
73ae5dcfc5
vmagent: remote write respect Retry-After in header ()
### Describe Your Changes
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6097

#### Changed
- The remote write retry policy in `vmagent` is changed to the following (sketched below):
  1. Respect the `Retry-After` duration if it exists.
  2. Otherwise, calculate the next retry duration via the backoff policy (x2),
     capped by the max retry duration limit.
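
A minimal sketch of the new retry-delay calculation; header parsing here only handles the seconds form of Retry-After, and the real implementation may differ:

```go
package remotewritesketch

import (
	"net/http"
	"strconv"
	"time"
)

// nextRetryDuration prefers the server-provided Retry-After value and falls
// back to doubling the current backoff, capped by the configured maximum.
func nextRetryDuration(resp *http.Response, current, max time.Duration) time.Duration {
	if s := resp.Header.Get("Retry-After"); s != "" {
		if secs, err := strconv.Atoi(s); err == nil {
			return time.Duration(secs) * time.Second
		}
	}
	d := current * 2 // backoff policy (x2)
	if d > max {
		d = max
	}
	return d
}
```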

#### Docs
- `CHANGELOG.md`.

---
### Checklist
The following checks are mandatory:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Zakhar Bessarab <me@zekker-dev.tk>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 5319acb8ed)
2024-09-24 16:58:16 +02:00
f41gh7
395894688c
app/*/multiarch: return back empty value for TARGETARCH
follow-up after 91456ab5bb

docker buildx uses special variables such as TARGETARCH, which shouldn't be overwritten.

 See this article for details
https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2024-09-06 18:15:22 +02:00
Aliaksandr Valialkin
ac507466c3
all: suppress InvalidDefaultArgInFrom warning emitted by docker build when building Docker packages via make package-* command
Recent versions of `docker build` started generating the InvalidDefaultArgInFrom warning if a Dockerfile contains
an ARG without a default value. While this warning doesn't affect building Docker packages via `make package-*` commands,
it is better to suppress it, so it doesn't clutter `make package-*` output with noise,
which could hide real issues in the future.
2024-09-03 14:05:43 +02:00
Hui Wang
a21aea5dd4
stream aggregation: perform deduplication for all received data when specifying `-streamAggr.dedupInterval` or `-remoteWrite.streamAggr.dedupInterval` command-line flag ()

[The
documentation](https://docs.victoriametrics.com/stream-aggregation/)
contains conflicting descriptions regarding deduplication for
non-matched series when `-remoteWrite.streamAggr.config` and / or
`-streamAggr.config` are set:
1. Statement below says **all the received data** is deduplicated:
>[vmagent](https://docs.victoriametrics.com/vmagent/) supports
relabeling, deduplication and stream aggregation for all the received
data, scraped or pushed. Then, the collected data will be forwarded to
specified -remoteWrite.url destinations. The data processing order is
the following:
>1. all the received data is relabeled according to the specified
[-remoteWrite.relabelConfig](https://docs.victoriametrics.com/vmagent/#relabeling)
(if it is set)
>2. all the received data is deduplicated according to specified
[-streamAggr.dedupInterval](https://docs.victoriametrics.com/stream-aggregation/#deduplication)
(if it is set to duration bigger than 0)

2. Another statement says the deduplication is performed individually
for the **matching samples**
>The de-deduplication is performed after applying
[relabeling](https://docs.victoriametrics.com/vmagent/#relabeling) and
before performing the aggregation. If the -remoteWrite.streamAggr.config
and / or -streamAggr.config is set, then the de-duplication is performed
individually per each [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config)
for the matching samples after applying
[input_relabel_configs](https://docs.victoriametrics.com/stream-aggregation/#relabeling).

Considering the following deduplication use cases:
1. To apply deduplication (globally or for a specific remoteWrite
destination) for all the received data, scraped or pushed
--- using `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval`.
2. To deduplicate and aggregate metrics that match the rule `match`
filters
--- using `-remoteWrite.streamAggr.config` and specifying the
`dedup_interval` option in [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
3. To deduplicate all the received data while having `streamAggr.config`
for some metrics
--- previously not possible with a single vmagent; it required setting up
two levels of vmagents.

This PR implements case 3.

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit d523015f27)
2024-09-03 10:49:38 +02:00
Yury Akudovich
f759371c00
app/vmagent: add remoteWrite.retryMinInterval and remoteWrite.retryMaxTime flags ()
## Describe Your Changes

Add RemoteWrite Retry Controls

This PR introduces two new flags to the remote write functionality:
- remoteWrite.retryMinInterval
- remoteWrite.retryMaxTime

These flags provide finer control over the retry behavior for
remoteWrite operations, allowing users to customize the minimum interval
between retries and the maximum duration for retry attempts.

Fixes .

## Checklist

The following checks are mandatory:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Yury Akudovich <ya@matterlabs.dev>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit d0f5a9d77a)
2024-08-23 15:28:44 +02:00
ccliu
8729052623
vmagent: resolve the issue where usePromCompatibleNaming is not working ()
Describe Your Changes
When I use usePromCompatibleNaming with vmagent to process data that
needs to be formatted from different sources such as InfluxDB, I find
that it doesn't work.

However, it works in vminsert. I found that vminsert uses the
HasRelabeling method to determine whether to relabel.
```go
func HasRelabeling() bool {
	pcs := pcsGlobal.Load()
	return pcs.Len() > 0 || *usePromCompatibleNaming
}
```
in vmagent, the decision to relabel is determined only by
pcsGlobal.Len() > 0. However, in the applyRelabeling method, the
usePromCompatibleNaming logic is also used to determine whether to
relabel in the error handling.
```go
func (rctx *relabelCtx) applyRelabeling(tss []prompbmarshal.TimeSeries, pcs *promrelabel.ParsedConfigs) []prompbmarshal.TimeSeries {
	if pcs.Len() == 0 && !*usePromCompatibleNaming {
		// Nothing to change.
		return tss
	}
```
So I think that the logic for determining whether to relabel in vmagent
is not as expected; a sketch of the corresponding fix follows below.
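
A self-contained sketch of the implied fix, mirroring vminsert's HasRelabeling for vmagent; the types here are stand-ins for the real ones:

```go
package vmagentsketch

import "flag"

var usePromCompatibleNaming = flag.Bool("usePromCompatibleNaming", false,
	"whether to replace unsupported chars in metric names with underscores")

// parsedConfigs stands in for promrelabel.ParsedConfigs from the snippets above.
type parsedConfigs struct{ configs []string }

func (pcs *parsedConfigs) Len() int { return len(pcs.configs) }

// hasRelabeling must consider -usePromCompatibleNaming too, not only the
// number of parsed relabel configs - this is the gap described above.
func hasRelabeling(pcs *parsedConfigs) bool {
	return pcs.Len() > 0 || *usePromCompatibleNaming
}
```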

Checklist
The following checks are mandatory:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
(cherry picked from commit d134a310f3)
2024-08-13 10:33:55 -04:00
Hui Wang
e74d5f266e
stream aggregation: do not allow enabling -stream.keepInput and `keep_metric_names` options in stream aggregation config together ()

With aggregated data and raw data under the same metric, results would
be confusing.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 62d19369a3)
2024-08-13 09:08:27 -04:00
Hui Wang
13a21a3ba0
app/vmagent/remotewrite: make -remoteWrite.streamAggr.ignoreFirstIntervals of array type ()
Make `-remoteWrite.streamAggr.ignoreFirstIntervals` of array type so it can
accept multiple values, which are applied to the corresponding `-remoteWrite.url`.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 8f5c26d788)
2024-08-07 09:57:49 +02:00
Hui Wang
71ac65996b
app/vmagent/remotewrite: fix -streamAggr.dropInputLabels behavior ()
Fix `-streamAggr.dropInputLabels` behavior when global deduplication is enabled without `-streamAggr.config`.
Previously, `-remoteWrite.streamAggr.dropInputLabels` was misapplied.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 4863605469)
2024-08-07 09:57:49 +02:00
Zakhar Bessarab
0b1def6e24
app/{vminsert,vmagent}: add healthcheck for influx ingestion endpoints ()
### Describe Your Changes

This is useful for clients that validate InfluxDB availability before
data ingestion starts.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6653

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 9877a5e7d5)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-05 09:45:32 +02:00
Aliaksandr Valialkin
f8aa445945
all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter
The %q formatter may produce an incorrectly formatted JSON string if the original string
contains special chars such as \x1b . These must be encoded as \u001b , otherwise the resulting JSON string
cannot be parsed by JSON parsers.
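
A small demonstration of the difference, runnable as-is:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	s := "esc\x1b"
	// %q escapes the control char using Go syntax, which JSON parsers reject.
	fmt.Printf("%q\n", s) // "esc\x1b"
	// A JSON encoder emits the \u001b escape required by the JSON grammar.
	b, _ := json.Marshal(s)
	fmt.Println(string(b)) // "esc\u001b"
}
```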

This is a follow-up for c0caa69939

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
2024-07-17 14:01:37 +02:00
Aliaksandr Valialkin
8b76a40715
lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag()
This is a follow-up for 61dce6f2a1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329
2024-07-16 01:08:41 +02:00
Aliaksandr Valialkin
476bf400ac
lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc
- Rename GetStatDialFunc to NewStatDialFunc, since it returns a new function with every call
- NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil
- Simplify the implementation of NewStatDialFunc by removing sync.Map from there.
- Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils
- Use gauge instead of counter type for the *_conns metric (see the sketch below)
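
A hedged sketch of what such a dial func can look like; the metrics API used here (github.com/VictoriaMetrics/metrics) and all names are illustrative, not the actual lib/netutil code:

```go
package netutilsketch

import (
	"context"
	"net"
	"sync/atomic"

	"github.com/VictoriaMetrics/metrics"
)

type statConn struct {
	net.Conn
	closed *int64
	conns  *int64
}

func (c *statConn) Close() error {
	// Decrement the open-conns gauge exactly once per connection.
	if atomic.CompareAndSwapInt64(c.closed, 0, 1) {
		atomic.AddInt64(c.conns, -1)
	}
	return c.Conn.Close()
}

// newStatDialFunc returns a new dial func on every call - hence "New", not "Get".
func newStatDialFunc(metricName string) func(ctx context.Context, network, addr string) (net.Conn, error) {
	var conns int64
	// A gauge (not a counter): the value goes both up and down with open conns.
	metrics.NewGauge(metricName, func() float64 { return float64(atomic.LoadInt64(&conns)) })
	d := &net.Dialer{}
	return func(ctx context.Context, network, addr string) (net.Conn, error) {
		conn, err := d.DialContext(ctx, network, addr)
		if err != nil {
			return nil, err
		}
		atomic.AddInt64(&conns, 1)
		var closed int64
		return &statConn{Conn: conn, closed: &closed, conns: &conns}, nil
	}
}
```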

This is a follow-up for d7b5062917
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299
2024-07-15 23:05:46 +02:00
Aliaksandr Valialkin
cbc637d1dd
app/vmagent/remotewrite: follow-up for f153f54d11
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
  This improves code maintainability a bit.

- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
  This prevents from potential resource leaks.

- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
  This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.

- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
  in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.

- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
  This label should simplify investigation of the exposed metrics.

- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
  it is hard to use these labels in query filters and aggregation functions.

- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
  This metric shows the number of samples generated by the corresponding streaming aggregation rule.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
  which could modify the behaviour of the constructed streaming aggregator.
  Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.

- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
  This also allows passing nil instead of Options when default options are enough.

- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
  so they have consistent set of labels comparing to the rest of streaming aggregation metrics.

- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
  `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
  `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
  metrics. This is a follow-up for the commit 2eb1bc4f81 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604

- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
  figure out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for an unknown reason
  in the commit 2eb1bc4f81 .

- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
  While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
2024-07-15 20:25:36 +02:00
Aliaksandr Valialkin
3365dd508f
app/vmagent/remotewrite: do not spend CPU time on an attempt to send data to blocked queue if some queues are unblocked
Previously remotewrite.TryPush() was trying to send data to remote storages with blocked persistent queues,
if some persistent queues to other remote storage systems were unblocked. This resulted in excess CPU usage
on relabeling and stream aggregation for the remote storage with blocked queues.

The solution is to check whether some persistent storages have blocked queues and skip them before applying
per- -remoteWrite.url relabeling and streaming aggregation.

While at it, properly update per- -remoteWrite.url vmagent_remotewrite_samples_dropped_total and vmagent_remotewrite_push_failures_total
counters when global streaming aggregation cannot send data to remote storage systems because of blocked queues.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467 and https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268 .

This is a follow-up for 87fd400dfc and f153f54d11

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
2024-07-15 09:40:34 +02:00
Aliaksandr Valialkin
4921ec5604
docs/CHANGELOG.md: use new link to VictoriaMetrics cluster docs instead of old link
The old link was changed globally to the new link in the commit f4b1cbfef0 .
Unfortunately, old links are still posted in new commits :(

This is a follow-up for 680b8c25c8 .

While at it, remove the duplicate 'len(*remoteWriteURLs) > 0' check in the remotewrite.Init() function,
since this check is already made at the beginning of the function.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6253
2024-07-13 03:04:20 +02:00
Aliaksandr Valialkin
bc1f92d7f5
app/vmagent/remotewrite: follow-up for 87fd400dfc
- Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage
  systems are configured with the disabled on-disk queue, every in-memory queue is full
  and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common,
  so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx
  relabeling and other processing in this case.

- Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped().
  Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set.
  In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full
  and on-disk queue is disabled.
  The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing
  the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage
  processes pending data, so it drops aggregated samples in this case.

- Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output,
  so it is clear that this flag can be set individually per each -remoteWrite.url.

- Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems
  are configured with the disabled on-disk queue, then there is no sense in keeping samples
  on some of these systems, while dropping samples on the remaining systems, since this
  will result in global stall on the remote storage system with the disabled on-disk queue
  and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false
  from remotewrite.TryPush() in this case. This will result in infinite duplicate samples
  written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload
  is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
  This allows proceeding with newly scraped / pushed samples by sending them to the remaining
  remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set.

- Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test.

- Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url.
  See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
2024-07-13 02:30:10 +02:00
Aliaksandr Valialkin
7c97cef95c
app: consistently use t.Fatal* instead of t.Error* (except of app/vmalert and app/vmctl - these packages will be processed in a separate commit)
Consistently using t.Fatal* simplifies the test code and makes it less fragile, since it is a common error
to forget to perform proper cleanup after a t.Error* call. Also, t.Error* calls do not provide any practical
benefit when some tests fail. They just clutter the test output with additional noise,
which does not help in fixing failing tests most of the time (see the contrived example below).

This is a follow-up for a9525da8a4
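
A contrived example of the failure mode; the parse stub is hypothetical:

```go
package example

import "testing"

func parse(s string) *int { return nil } // stand-in for the function under test

func TestParse(t *testing.T) {
	v := parse("42")
	if v == nil {
		// With t.Errorf the test would keep running and hit the nil
		// dereference below; t.Fatalf stops it right here.
		t.Fatalf("unexpected nil result")
	}
	if *v != 42 {
		t.Fatalf("unexpected value: got %d, want 42", *v)
	}
}
```
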
2024-07-11 16:01:25 +02:00
Aliaksandr Valialkin
d6415b2572
all: consistently use 'any' instead of 'interface{}'
The 'any' type is supported starting from Go 1.18. Let's consistently use it
instead of the 'interface{}' type across the code base, since 'any' is easier to read than 'interface{}'.
2024-07-10 00:23:26 +02:00
Aliaksandr Valialkin
172ae1adf7
Revert c6c5a5a186 and b2765c45d0
Reason for revert:

There are many statsd servers exist:

- https://github.com/statsd/statsd - classical statsd server
- https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into the DataDog Agent ( https://docs.datadoghq.com/agent/ )
- https://github.com/avito-tech/bioyino - high-performance statsd server
- https://github.com/atlassian/gostatsd - statsd server in Go
- https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics

These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics
according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd (
the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target
according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ).

Adding support for the statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides
significant advantages over the existing statsd servers, while having no significant drawbacks compared
to existing statsd servers.

The main advantage of a statsd server built into VictoriaMetrics and vmagent is getting rid of the additional statsd server.
The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics (
see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers.
So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating
from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent).

Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server
with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation.
In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation.
This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated
statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed
during querying.

P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic
specialized configuration for aggregating of statsd metrics. The main requirements for this configuration:

- easy to write, read and update (ideally it should work out of the box for most cases without additional configuration)
- hard to misconfigure (e.g. hard to shoot yourself in the foot)

It would be great if this configuration will be compatible with the configuration of the most widely used statsd server.

In the meantime it is recommended to continue using an external statsd server.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600
2024-07-03 23:57:49 +02:00
Aliaksandr Valialkin
cd152693c6
Revert "Exemplar support ()"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for exemplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature, despite
the fact that the source code of the reverted commit has excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like a production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains histogram_quantile() function. It queries exemplars via special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
  exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is literally filled with thousands of exemplar dots.
  Neither the Prometheus API nor Grafana provides the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labels on the time range with the anomaly on the metrics' graph.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982
2024-07-03 16:09:18 +02:00