github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Hui Wang	e96e8bd469	vmagent: fix type of command-line flag `-streamAggr.dedupInterval` (#7081) Previously, the unit `m` was not supported correctly. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-10-17 11:11:46 -03:00
Andrii Chubatiuk	019171fdfc	lib/protoparser/influx: enable batch processing by default (#7165 ) ### Describe Your Changes Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7090 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `daa7183749`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-10-15 11:51:48 +02:00
Artem Fetishev	be7dfd5ab3	app/(vmagent,vmalert)/remotewrite/client: Fix flag docs (#7198 ) ### Describe Your Changes The flags docs mention the flag that does not exist (and never existed). Perhaps that was a typo. `s/retryMaxInterval/retryMaxTime/g` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> (cherry picked from commit `e2c73dc89f`)	2024-10-11 14:25:15 +02:00
Hui Wang	ecd37cf56c	stream aggregation: support configuring multiple labels per `remoteWrite… (#7073 ) ….url` using `-remoteWrite.streamAggr.dropInputLabels` Before, labels were set to all the `remoteWrite.url`. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6780 --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `fbde238cdc`)	2024-09-27 12:40:53 +02:00
Zhu Jiekun	73ae5dcfc5	vmagent: remote write respect Retry-After in header (#6124 ) ### Describe Your Changes related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6097 #### Changed - Remote write retry policy in `vmagent` is changed into: 1. Respect `Retry-After` duration if exists. 2. Otherwise, calculate next retry duration by backoff policy (x2) and max retry duration limit. #### Docs - `CHANGELOG.md`. --- ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Co-authored-by: Zakhar Bessarab <me@zekker-dev.tk> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `5319acb8ed`)	2024-09-24 16:58:16 +02:00
f41gh7	395894688c	app/*/multiarch: return back empty value for TARGETARCH follow-up after `91456ab5bb` docker buildx uses special variables, such as TARGETARCH and it shouldn't be overwritten. See this article for details https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/ Signed-off-by: f41gh7 <nik@victoriametrics.com>	2024-09-06 18:15:22 +02:00
Aliaksandr Valialkin	ac507466c3	all: suppress InvalidDefaultArgInFrom warning emitted by `docker build` when building Docker packages via `make package-` command Recent versions of `docker build` started generating the InvalidDefaultArgInFrom warning if Dockerfile contains an ARG without default value. While this warning doesn't affect building Docker packages via `make package-` commands, it is better suppressing the warning, so it doesn't clutter `make package-*` output with the noise, which can hide real issues in the future.	2024-09-03 14:05:43 +02:00
Hui Wang	a21aea5dd4	stream aggregation: perform deduplication for all received data when … (#6711 ) …specifying `-streamAggr.dedupInterval` or `-remoteWrite.streamAggr.dedupInterval` command-line flag [The documentation](https://docs.victoriametrics.com/stream-aggregation/) contains conflicting descriptions regarding deduplication for non-matched series when `-remoteWrite.streamAggr.config` and / or `-streamAggr.config` are set: 1. Statement below says all the received data is deduplicated: >[vmagent](https://docs.victoriametrics.com/vmagent/) supports relabeling, deduplication and stream aggregation for all the received data, scraped or pushed. Then, the collected data will be forwarded to specified -remoteWrite.url destinations. The data processing order is the following: >1. all the received data is relabeled according to the specified [-remoteWrite.relabelConfig](https://docs.victoriametrics.com/vmagent/#relabeling) (if it is set) >2. all the received data is deduplicated according to specified [-streamAggr.dedupInterval](https://docs.victoriametrics.com/stream-aggregation/#deduplication) (if it is set to duration bigger than 0) 2. Another statement says the deduplication is performed individually for the matching samples >The de-deduplication is performed after applying [relabeling](https://docs.victoriametrics.com/vmagent/#relabeling) and before performing the aggregation. If the -remoteWrite.streamAggr.config and / or -streamAggr.config is set, then the de-duplication is performed individually per each [stream aggregation config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config) for the matching samples after applying [input_relabel_configs](https://docs.victoriametrics.com/stream-aggregation/#relabeling). Considering the following deduplication use cases: 1. 
To apply deduplication (globally or for a specific remoteWrite destination) to all the received data, scraped or pushed --- using `-streamAggr.dedupInterval` or `-remoteWrite.streamAggr.dedupInterval`. 2. To deduplicate and aggregate metrics that match the rule `match` filters --- using `-remoteWrite.streamAggr.config` and specifying the `dedup_interval` option in [stream aggregation config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config). 3. To deduplicate all the received data while having `streamAggr.config` for some metrics --- currently not possible with a single vmagent; two levels of vmagents need to be set up. This PR implements case 3. --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `d523015f27`)	2024-09-03 10:49:38 +02:00
Yury Akudovich	f759371c00	app/vmagent: add `remoteWrite.retryMinInterval` and `remoteWrite.retryMaxTime` flags (#6289 ) ## Describe Your Changes Add RemoteWrite Retry Controls This PR introduces two new flags to the remote write functionality: - remoteWrite.retryMinInterval - remoteWrite.retryMaxTime These flags provide finer control over the retry behavior for remoteWrite operations, allowing users to customize the minimum interval between retries and the maximum duration for retry attempts. Fixes #5486. ## Checklist - [x] The following checks are mandatory: My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Yury Akudovich <ya@matterlabs.dev> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `d0f5a9d77a`)	2024-08-23 15:28:44 +02:00
ccliu	8729052623	vmagent: resolve the issue where usePromCompatibleNaming is not working (#6776) Describe Your Changes When I use usePromCompatibleNaming with vmagent to process data that needs to be formatted from different sources such as InfluxDB, I find that it doesn't work. However, it works in vminsert. I found that vminsert uses the HasRelabeling method to determine whether to relabel. ```go func HasRelabeling() bool { pcs := pcsGlobal.Load() return pcs.Len() > 0 || usePromCompatibleNaming } ``` In vmagent, the decision to relabel is determined only by pcsGlobal.Len() > 0. However, in the applyRelabeling method, the usePromCompatibleNaming logic is also used to determine whether to relabel in the error handling. ```go func (rctx relabelCtx) applyRelabeling(tss []prompbmarshal.TimeSeries, pcs promrelabel.ParsedConfigs) []prompbmarshal.TimeSeries { if pcs.Len() == 0 && !usePromCompatibleNaming { // Nothing to change. return tss } ``` So I think that the logic for determining whether to relabel in vmagent is not as expected. Checklist The following checks are mandatory: [✅]My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Co-authored-by: Roman Khavronenko <hagen1778@gmail.com> (cherry picked from commit `d134a310f3`)	2024-08-13 10:33:55 -04:00
Hui Wang	e74d5f266e	stream aggregation: do not allow to enable `-stream.keepInput` and `k… (#6723 ) …eep_metric_names` options in stream aggregation config together With aggregated data and raw data under the same metric, results would be confusing. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `62d19369a3`)	2024-08-13 09:08:27 -04:00
Hui Wang	13a21a3ba0	app/vmagent/remotewrite: make `-remoteWrite.streamAggr.ignoreFirstIntervals` of array type (#6744) Make `-remoteWrite.streamAggr.ignoreFirstIntervals` of array type, so it can accept multiple values, each applied to the corresponding `-remoteWrite.url`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `8f5c26d788`)	2024-08-07 09:57:49 +02:00
Hui Wang	71ac65996b	app/vmagent/remotewrite: fix `-streamAggr.dropInputLabels` behavior (#6743) Fix `-streamAggr.dropInputLabels` behavior when global deduplication is enabled without `-streamAggr.config`. Previously, `-remoteWrite.streamAggr.dropInputLabels` was misapplied. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `4863605469`)	2024-08-07 09:57:49 +02:00
Zakhar Bessarab	0b1def6e24	app/{vminsert,vmagent}: add healthcheck for influx ingestion endpoints (#6749 ) ### Describe Your Changes This is useful for clients which validate InfluxDB is available before data ingestion can be started. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6653 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `9877a5e7d5`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-05 09:45:32 +02:00
Aliaksandr Valialkin	f8aa445945	all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter The %q formatter may result in incorrectly formatted JSON string if the original string contains special chars such as \x1b . They must be encoded as \u001b , otherwise the resulting JSON string cannot be parsed by JSON parsers. This is a follow-up for `c0caa69939` See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24	2024-07-17 14:01:37 +02:00
Aliaksandr Valialkin	8b76a40715	lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag() This is a follow-up for `61dce6f2a1` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329	2024-07-16 01:08:41 +02:00
Aliaksandr Valialkin	476bf400ac	lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc - Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call - NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil - Simplify the implementation of NewStatDialFunc by removing sync.Map from there. - Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils - Use gauge instead of counter type for *_conns metric This is a follow-up for `d7b5062917` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299	2024-07-15 23:05:46 +02:00
Aliaksandr Valialkin	cbc637d1dd	app/vmagent/remotewrite: follow-up for `f153f54d11` - Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go . This improves code maintainability a bit. - Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs(). This prevents from potential resource leaks. - Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation. This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions. - Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics. - Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation. This label should simplify investigation of the exposed metrics. - Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability: it is hard to use these labels in query filters and aggregation functions. - Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` . This metric shows the number of samples generated by the corresponding streaming aggregation rule. This metric has been added in the commit `861852f262` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice. This metric has been added in the commit `861852f262` . 
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params, which could modify the behaviour of the constructed streaming aggregator. Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory. - Pass Options arg to LoadFromFile() function by reference, since this structure is quite big. This also allows passing nil instead of Options when default options are enough. - Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics, so they have consistent set of labels comparing to the rest of streaming aggregation metrics. - Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output metrics. This is a follow-up for the commit `2eb1bc4f81` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604 - Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason in the commit `2eb1bc4f81` . - Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function. While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268	2024-07-15 20:25:36 +02:00
Aliaksandr Valialkin	3365dd508f	app/vmagent/remotewrite: do not spend CPU time on an attempt to send data to blocked queue if some queues are unblocked Previously remotewrite.TryPush() was trying to send data to remote storages with blocked persistent queues, if some persistent queues to other remote storage systems were unblocked. This resulted in excess CPU usage on relabeling and stream aggregation for the remote storage with blocked queues. The solution is to check whether some persistent storages have blocked queues and skip them before applying per- -remoteWrite.url relabeling and streaming aggregation. While at it, properly update per- -remoteWrite.url vmagent_remotewrite_samples_dropped_total and vmagent_remotewrite_push_failures_total counters when global streaming aggregation cannot send data to remote storage systems because of blocked queues. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467 and https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268 . This is a follow-up for `87fd400dfc` and `f153f54d11` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065	2024-07-15 09:40:34 +02:00
Aliaksandr Valialkin	4921ec5604	docs/CHANGELOG.md: use new link to VictoriaMetrics cluster docs instead of old link The old link was changed globally to the new link in the commit `f4b1cbfef0` . Unfortunately, old links are still posted in new commits :( This is a follow-up for `680b8c25c8` . While at it, remove duplicate 'len(*remoteWriteURLs) > 0' check in the remotewrite.Init() functions, since this check is already made at the beginning of the function. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6253	2024-07-13 03:04:20 +02:00
Aliaksandr Valialkin	bc1f92d7f5	app/vmagent/remotewrite: follow-up for `87fd400dfc` - Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage systems are configured with the disabled on-disk queue, every in-memory queue is full and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common, so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx relabeling and other processing in this case. - Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped(). Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set. In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full and on-disk queue is disabled. The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage processes pending data, so it drops aggregated samples in this case. - Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output, so it is clear that this flag can be set individually per each -remoteWrite.url. - Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems are configured with the disabled on-disk queue, then there is no sense in keeping samples on some of these systems, while dropping samples on the remaining systems, since this will result in global stall on the remote storage system with the disabled on-disk queue and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false from remotewrite.TryPush() in this case. This will result in infinite duplicate samples written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set. 
This allows proceeding with newly scraped / pushed samples by sending them to the remaining remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set. - Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test. - Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url. See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065	2024-07-13 02:30:10 +02:00
Aliaksandr Valialkin	7c97cef95c	app: consistently use t.Fatal* instead of t.Error* (except for app/vmalert and app/vmctl - these packages will be processed in a separate commit) Consistently using t.Fatal* simplifies the test code and makes it less fragile, since it is a common error to forget to perform proper cleanup after a t.Error* call. Also, t.Error* calls do not provide any practical benefits when some tests fail. They just clutter test output with additional noise, which does not help in fixing failing tests most of the time. This is a follow-up for `a9525da8a4`	2024-07-11 16:01:25 +02:00
Aliaksandr Valialkin	d6415b2572	all: consistently use 'any' instead of 'interface{}' 'any' type is supported starting from Go1.18. Let's consistently use it instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.	2024-07-10 00:23:26 +02:00
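For context, `any` is a predeclared alias for `interface{}` since Go 1.18, so the replacement is purely cosmetic. A minimal sketch follows; the `describe` function is made up for illustration and is not part of the code base.

```go
package main

import "fmt"

// describe (hypothetical example function) accepts values of any type and
// returns the dynamic type name of each one.
func describe(vs ...any) []string {
	out := make([]string, 0, len(vs))
	for _, v := range vs {
		out = append(out, fmt.Sprintf("%T", v))
	}
	return out
}

func main() {
	var old interface{} = 42
	// No conversion is needed: any and interface{} are the same type.
	var cur any = old
	fmt.Println(describe(cur, "x", 1.5))
}
```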
Aliaksandr Valialkin	172ae1adf7	Revert `c6c5a5a186` and `b2765c45d0` Reason for revert: many statsd servers exist: - https://github.com/statsd/statsd - classical statsd server - https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DataDog Agent ( https://docs.datadoghq.com/agent/ ) - https://github.com/avito-tech/bioyino - high-performance statsd server - https://github.com/atlassian/gostatsd - statsd server in Go - https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd ( the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ). Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides significant advantages over the existing statsd servers, while having no significant drawbacks compared to existing statsd servers. The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server. The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics ( see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers. So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent). 
Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation. In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation. This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed during querying. P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic specialized configuration for aggregating of statsd metrics. The main requirements for this configuration: - easy to write, read and update (ideally it should work out of the box for most cases without additional configuration) - hard to misconfigure (e.g. hard to shoot yourself in the foot) It would be great if this configuration were compatible with the configuration of the most widely used statsd server. In the meantime it is recommended to continue using an external statsd server. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600	2024-07-03 23:57:49 +02:00
Aliaksandr Valialkin	cd152693c6	Revert "Exemplar support (#5982)" This reverts commit `5a3abfa041`. Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below). Adding support for exemplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ . Issues with Prometheus exemplars: - Prometheus still has only experimental support for exemplars after more than three years since they were introduced. It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like a production-ready feature. See `0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)` and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage - It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus . - It looks like exemplars are supported for Histogram metric types only - see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar . Exemplars aren't supported for Counter, Gauge and Summary metric types. - Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query contains histogram_quantile() function. 
It queries exemplars via special Prometheus API - https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.) and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph. This makes the graph unusable, since it is literally filled with thousands of exemplar dots. Neither the Prometheus API nor Grafana provides the ability to filter out unneeded exemplars. - Exemplars are usually connected to traces. While traces are good for some I doubt exemplars will become production-ready in the near future because of the issues outlined above. Alternative to exemplars: Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs - just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry! This doesn't work as expected in production as shown above. Are there better solutions, which work in production? Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job` and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry by these labels on the time range with the anomaly on metrics' graph. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982	2024-07-03 16:09:18 +02:00
Aliaksandr Valialkin	a5d60ad78e	app/vmagent/remotewrite,lib/streamaggr: re-use common code in tests after `879771808b` - Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package. This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries. - Move common code for mustParsePromMetrics() function into lib/prompbmarshal package, so it could be used in tests for building []prompbmarshal.TimeSeries from string. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206	2024-07-03 15:22:51 +02:00
Aliaksandr Valialkin	4268a310c1	app/vmagent/remotewrite/remotewrite.go: make remoteWriteCtx.TryPush code easier to follow Move the code responsible for relabelCtx clearing into deferred function. This allows making more clear the remoteWriteCtx.TryPush code. This is a follow-up for `879771808b` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205 While at it, clarify the description of the bugfix at docs/CHANGELOG.md	2024-07-03 14:18:51 +02:00
Aliaksandr Valialkin	f406764ccc	app/vmagent/remotewrite/streamaggr.go: clarify the description for -remoteWrite.streamAggr.* command-line flags, so they are applied to the corresponding -remoteWrite.url	2024-07-03 14:18:51 +02:00
Andrii Chubatiuk	937ae2ca90	lib/streamaggr: added stale samples metric, added metrics labels (#6462) ### Describe Your Changes - added stale metrics counters for input and output samples - added labels for aggregator metrics => `name="{rwctx}:{aggrId}:{aggrSuffix}"` - rwctx - global or number starting from 1 - aggrId - aggregator id starting from 1 - aggrSuffix - <interval>_(by|without)_label1_label2_labeln e.g: `name="global:1:1m_without_instance_pod"` ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `861852f262`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-07-01 15:01:49 +02:00
Andrii Chubatiuk	516848783e	deployment: build image for vmagent streamaggr benchmark (#6515 ) ### Describe Your Changes optionally build vmagent image for benchmark needed for https://github.com/VictoriaMetrics/ops/pull/1297 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). (cherry picked from commit `6b128da811`)	2024-06-24 16:29:14 +02:00
Hui Wang	028a80613f	lib/httpserver: allow reloadAuthKey and configAuthKey to override htt… (#6338 ) …pAuth.* address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329, makes `reloadAuthKey`, `configAuthKey`, `flagsAuthKey`, `pprofAuthKey` behavior the same way, but keys like `-snapshotAuthKey`, `-forceMergeAuthKey` are still protected by httpAuth.*. All the available key are listed in https://docs.victoriametrics.com/single-server-victoriametrics/#security. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `61dce6f2a1`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-10 12:41:29 +02:00
hagen1778	73c9981335	chore: follow-up after `c740a8042e` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `6d8e02f278`)	2024-06-03 11:53:37 +02:00
yumeiyin	95b8cf76f8	chore: remove redundant words (#6348 ) (cherry picked from commit `9289c7512d`)	2024-05-29 14:37:04 +02:00
Andrii Chubatiuk	2c4a42554a	app/vmagent: fixed streamaggr args (#6374 ) use GetOptionalArg instead of index to fallback to a first argument if index is absent for remotewrite.streamaggr.config (cherry picked from commit `7e5a206057`)	2024-05-29 14:04:24 +02:00
Hui Wang	5b8c3fc9d0	app/vmalert: support DNS SRV record in `-remoteWrite.url` (#6299 ) part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053, supports [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) address in `-remoteWrite.url` command-line option. (cherry picked from commit `d7b5062917`)	2024-05-22 10:53:22 +02:00
Roman Khavronenko	3e8b5e74d5	lib/streamaggr: skip empty aggregators (#6307 ) Prevent excessive resource usage when stream aggregation config file contains no matchers by preventing data from being pushed into the Aggregators object. Before this change a lot of extra work was invoked without reason. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `7ce052b32d`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-20 14:46:36 +02:00
Roman Khavronenko	8daa1d9505	app/vmagent: fix panic on shutdown when no global deduplication is co… (#6308 ) …nfigured Follow-up for `f153f54d11` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `7dc18bf67a`)	2024-05-20 14:46:10 +02:00
viperstars	ab78f3c89d	app/vmagent/remotewrite: skip sending empty block to downstream server (#6241 ) Occasionally, vmagent sends empty blocks to downstream servers. If a downstream server returns an unexpected response, vmagent gets stuck in a retry loop. While vmagent handles 400 and 409 errors, there are various prometheus remote write implementations that return different error codes. For example, vector returns a 422 error. To mitigate the risk of vmagent getting stuck in a retry loop, it is advisable to skip sending empty blocks to downstream servers. Co-authored-by: hao.peng <hao.peng@smartx.com> Co-authored-by: Zhu Jiekun <jiekun.dev@gmail.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `3661373cc2`)	2024-05-17 14:57:07 +02:00
Andrii Chubatiuk	fe332c3419	app/vmagent: add global aggregator (#6268 ) Add global stream aggregation for VMAgent https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467 (cherry picked from commit `f153f54d11`)	2024-05-17 14:01:31 +02:00
Nikolay	ee4a94a371	follow-up for `c6c5a5a186` (#6265 ) * adds datadog extensions for statsd: - multiple packed values (v1.1) - additional types distribution, histogram * adds type check and append metric type to the labels with special tag name `__statsd_metric_type__`. It simplifies streaming aggregation config. * remove statsd support from cluster, since cluster doesn't support stream aggregation. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `b2765c45d0`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-17 13:49:24 +02:00
Andrii Chubatiuk	ec2273b247	app/vmagent: removed deprecated -remoteWrite.multitenantURL flag support (#6253 ) Removed deprecated `-remoteWrite.multitenantURL` flag to simplify global stream aggregation --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `680b8c25c8`)	2024-05-13 16:49:33 +02:00
Roman Khavronenko	0bed453737	Feature allow configuring disableOnDiskQueue and dropSamplesOnOverload per url (#6248 ) * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): allow configuring `-remoteWrite.disableOnDiskQueue` and `-remoteWrite.dropSamplesOnOverload` cmd-line flags per each `-remoteWrite.url`. See this [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065). Thanks to @rbizos for implementation! * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add labels `path` and `url` to metrics `vmagent_remotewrite_push_failures_total` and `vmagent_remotewrite_samples_dropped_total`. Now number of failed pushes and dropped samples can be tracked per `-remoteWrite.url`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Raphael Bizos <r.bizos@criteo.com> (cherry picked from commit `87fd400dfc`)	2024-05-10 14:32:23 +02:00
qiangxuhui	885fc4122a	Add build support for loong64 (#6222 ) ### Describe Your Changes Added makefile rule for `GOARCH=loong64` to support building all VictoriaMetrics components on the `loongarch64` platform. ### Checklist The following checks are mandatory: * [X] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: qiangxuhui <qiangxuhui@loongson.cn> (cherry picked from commit `80f3644ee3`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-10 14:32:05 +02:00
Oleg	76af930e4a	Statsd protocol compatibility (#5053 ) In this PR I added compatibility with [statsd protocol](https://github.com/b/statsd_spec) with tags to be able to send metrics directly from statsd clients to vmagent or directly to VM. For example its compatible with [statsd-instrument](https://github.com/Shopify/statsd-instrument) and [dogstatsd-ruby](https://github.com/DataDog/dogstatsd-ruby) gems Related issues: #5052, #206, #4600 (cherry picked from commit `c6c5a5a186`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-10 14:27:31 +02:00
Ted Possible	0206a01d03	Exemplar support (#5982 ) This code adds Exemplars to VMagent and the promscrape parser adhering to OpenMetrics Specifications. This will allow forwarding of exemplars to Prometheus and other third party apps that support OpenMetrics specs. --------- Signed-off-by: Ted Possible <ted_possible@cable.comcast.com> (cherry picked from commit `5a3abfa041`)	2024-05-10 13:14:17 +02:00
Andrii Chubatiuk	e26b55db1e	app/vmagent/remotewrite: do not cleanup timeseries which are used in multiple remote write contexts (#6206 ) When at least one remote write has deduplication configured it cleans up timeseries while they can be in use by another remote write without deduplication https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `879771808b`)	2024-05-06 12:10:45 +02:00
hagen1778	57c841669c	app/vmagent: mention corner case with dangling queues and identical URLs See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6140 We don't cover this corner case as it has low chance for reproduction. Precisely, the requirements are following: 1. vmagent need to be configured with multiple identical `remoteWrite.url` flags; 2. At least one of the persistent queues need to be non-empty, which already signalizes about issues with setup; 3. vmagent need to be restarted with removing of one of `remoteWrite.url` flags. We do not document this case in vmagent.md as it seems to be a rare corner case and its explanation will require too much of explanation and confuse users. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `4251292708`)	2024-04-23 14:52:35 +02:00
hagen1778	342290275e	app/streamaggr: follow-up after `c0e4ccb7b5` * rm vmagent mentions from vminsert flags * improve documentation wording, add links to related sections * mention `ignore_first_intervals` in the stream aggr options * update flags description * add basic test for config parsing validation Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `bae3874e6a`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-04-22 14:39:23 +02:00
Andrii Chubatiuk	131367fb59	lib/streamaggr: add option to ignore first N aggregation intervals (#6137 ) Stream aggregation may yield inaccurate results if it processes incomplete data. This issue can arise when data is sourced from clients that maintain a queue of unsent data, such as Prometheus or vmagent. If the queue isn't fully cleared within the aggregation interval, only a portion of the time series may be included in that period, leading to distorted calculations. To mitigate this we add an option to ignore first N aggregation intervals. It is expected, that client queues will be cleared during the time while aggregation ignores first N intervals and all subsequent aggregations will be correct. (cherry picked from commit `c0e4ccb7b5`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-04-22 14:34:36 +02:00
Aliaksandr Valialkin	a249ab96b4	app/vmagent/influx: replace hybrid channel-based pool + sync.Pool with plain sync.Pool for pushCtx Data ingestion benchmark doesn't show memory usage difference between two approaches, so let's use simpler approach in order to improve code readability and maintainability. This is a follow-up for `77c597738c`	2024-04-20 21:38:25 +02:00

660 commits