github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	893a555051	Revert "lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451 )" This reverts commit `cd1aca217c`. Reason for revert: this commit makes no sense, since the firehose response has application/json content-type, so it must contain JSON-encoded timestamp and requestId fields according to https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#responseformat . HTML-escaping the requestId field may break the response, so that the client cannot correctly recognize the HTML-escaped requestId. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6451	2024-07-16 09:50:16 +02:00
Aliaksandr Valialkin	d6415b2572	all: consistently use 'any' instead of 'interface{}' 'any' type is supported starting from Go1.18. Let's consistently use it instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.	2024-07-10 00:23:26 +02:00
Aliaksandr Valialkin	9edeecabc8	lib: consistently use f-tests instead of table-driven tests This makes these tests easier to read and debug. This also reduces the test line count by 15% from 3K to 2.5K . See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e . While at it, consistently use t.Fatal* instead of t.Error, since t.Error usually leads to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.	2024-07-09 22:39:13 +02:00
Aliaksandr Valialkin	172ae1adf7	Revert `c6c5a5a186` and `b2765c45d0` Reason for revert: Many statsd servers exist: - https://github.com/statsd/statsd - classical statsd server - https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DataDog Agent ( https://docs.datadoghq.com/agent/ ) - https://github.com/avito-tech/bioyino - high-performance statsd server - https://github.com/atlassian/gostatsd - statsd server in Go - https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd ( the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ). Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides significant advantages over the existing statsd servers, while having no significant drawbacks compared to existing statsd servers. The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server. The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics ( see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers. So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent).
Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation. In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation. This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed during querying. P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic specialized configuration for aggregating of statsd metrics. The main requirements for this configuration: - easy to write, read and update (ideally it should work out of the box for most cases without additional configuration) - hard to misconfigure (e.g. hard to shoot yourself in the foot) It would be great if this configuration were compatible with the configuration of the most widely used statsd server. In the meantime it is recommended to continue using an external statsd server. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600	2024-07-03 23:57:49 +02:00
Aliaksandr Valialkin	cd152693c6	Revert "Exemplar support (#5982 )" This reverts commit `5a3abfa041`. Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below). Adding support for exemplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature, even though the source code of the reverted commit is of excellent quality. See https://docs.victoriametrics.com/goals/ . Issues with Prometheus exemplars: - Prometheus still has only experimental support for exemplars after more than three years since they were introduced. It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature. See `0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)` and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage - It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus . - It looks like exemplars are supported for Histogram metric types only - see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar . Exemplars aren't supported for Counter, Gauge and Summary metric types. - Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query contains histogram_quantile() function.
It queries exemplars via special Prometheus API - https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.) and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph. This makes the graph unusable, since it is literally filled with thousands of exemplar dots. Neither the Prometheus API nor Grafana provides the ability to filter out unneeded exemplars. - Exemplars are usually connected to traces. While traces are good for some I doubt exemplars will become production-ready in the near future because of the issues outlined above. Alternative to exemplars: Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs - just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry! This doesn't work as expected in production as shown above. Are there better solutions, which work in production? Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job` and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry by these labels on the time range with the anomaly on metrics' graph. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982	2024-07-03 16:09:18 +02:00
Andrii Chubatiuk	252aa5a3ab	lib/protoparser/graphite: added -graphite.sanitizeMetricName flag (#6489 ) ### Describe Your Changes Added flag to sanitize graphite metrics fixes #6077 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `476faf5578`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-07-02 17:16:00 +02:00
Roman Khavronenko	8c8d84e30a	lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451 ) All user input should be sanitized before rendering. This should prevent possible attacks. See https://github.com/VictoriaMetrics/VictoriaMetrics/security/code-scanning/203 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-10 18:06:24 +02:00
Nikolay	ee4a94a371	follow-up for `c6c5a5a186` (#6265 ) * adds datadog extensions for statsd: - multiple packed values (v1.1) - additional types distribution, histogram * adds type check and append metric type to the labels with special tag name `__statsd_metric_type__`. It simplifies streaming aggregation config. * remove statsd support from cluster, since cluster doesn't support stream aggregation. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `b2765c45d0`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-17 13:49:24 +02:00
hagen1778	864fbf9125	Statsd protocol compatibility (#5053 ) In this PR I added compatibility with [statsd protocol](https://github.com/b/statsd_spec) with tags to be able to send metrics directly from statsd clients to vmagent or directly to VM. For example its compatible with [statsd-instrument](https://github.com/Shopify/statsd-instrument) and [dogstatsd-ruby](https://github.com/DataDog/dogstatsd-ruby) gems Related issues: #5052, #206, #4600 (cherry picked from commit `c6c5a5a186`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-10 14:28:37 +02:00
Ted Possible	0206a01d03	Exemplar support (#5982 ) This code adds Exemplars to VMagent and the promscrape parser adhering to OpenMetrics Specifications. This will allow forwarding of exemplars to Prometheus and other third party apps that support OpenMetrics specs. --------- Signed-off-by: Ted Possible <ted_possible@cable.comcast.com> (cherry picked from commit `5a3abfa041`)	2024-05-10 13:14:17 +02:00
Aliaksandr Valialkin	4318f34644	lib/protoparser: substitute hybrid channel-based pools with plain sync.Pool Using plain sync.Pool simplifies the code without increasing memory usage and CPU usage. So it is better to use plain sync.Pool from readability and maintainability PoV. This is a follow-up for `8942f290eb`	2024-04-20 22:02:39 +02:00
Aliaksandr Valialkin	9004bc098e	all: use clear() built-in Go function for clearing []prompbmarshal.TimeSeries and []prompbmarshal.Label slices This makes the code a bit clearer.	2024-04-20 21:00:24 +02:00
Aliaksandr Valialkin	bb9bb600b3	lib/protoparser/opentelemetry: follow-up after `47892b4a4c` - Rename -opentelemetry.sanitizeMetrics command-line flag to the clearer -opentelemetry.usePrometheusNaming - Clarify the description of the change at docs/CHANGELOG.md - Rename promrelabel.SanitizeLabelNameParts to the clearer promrelabel.SplitMetricNameToTokens - Properly split metric names at '_' char in promrelabel.SplitMetricNameToTokens. - Add tests for various edge cases for Prometheus metric names' normalization according to the code at `b865505850/pkg/translator/prometheus/normalize_name.go` - Extract the code responsible for Prometheus metric names' normalization into a separate file (santize.go) Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6035	2024-04-03 03:09:52 +03:00
Aliaksandr Valialkin	00f59d6ddf	all: fix golangci-lint(revive) warnings after `0c0ed61ce7` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001	2024-04-03 03:00:45 +03:00
Aliaksandr Valialkin	faa2ba828a	app/vmagent: simplify code after `509df44d03` - Simplify the code in order to improve its maintenance - Properly pass tenant ID when processing multi-tenant opentelemetry request at vmagent Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6016	2024-04-03 02:50:46 +03:00
Andrii Chubatiuk	0799834aaa	opentelemetry: added cmd flag to sanitize metric names (#6035 )	2024-04-03 02:31:39 +03:00
Andrii Chubatiuk	914b23f1e8	app/{vmagent,vminsert}: fixed firehose response (#6016 )	2024-04-02 18:03:12 +03:00
Aliaksandr Valialkin	816202bca7	lib/protoparser/opentelemetry/firehose: verify that the full response is parsed properly in ProcessRequestBody This is a follow-up for `bf9cb84575` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5899	2024-03-01 00:39:47 +02:00
Andrii Chubatiuk	e575fb1aeb	opentelemetry: fix firehose message parsing (#5899 ) Co-authored-by: Andrii Chubatiuk <wachy@Andriis-MBP-2.lan>	2024-03-01 00:24:14 +02:00
Aliaksandr Valialkin	a9fb2e91a6	lib/protoparser/csvimport: use unsafe.Slice instead of deprecated reflect.SliceHeader	2024-02-29 17:20:05 +02:00
Aliaksandr Valialkin	7832d0800e	app/{vminsert,vmagent}: follow-up after `67a55b89a4` - Document the ability to read OpenTelemetry data from Amazon Firehose at docs/CHANGELOG.md - Simplify parsing Firehose data. There is no need to try to optimize the parsing with fastjson and byte slice tricks, since OpenTelemetry protocol is really slooow because of over-engineering. It is better to write clear code for better maintainability in the future. - Move Firehose parser from /lib/protoparser/firehose to lib/protoparser/opentelemetry/firehose, since it is used only by opentelemetry parser. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5893	2024-02-29 14:47:20 +02:00
Andrii Chubatiuk	60cf0c9656	{vmagent,vminsert}: added firehose http destination opentelemetry data ingestion support (#5893 ) Co-authored-by: Andrii Chubatiuk <wachy@Andriis-MBP-2.lan> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-02-29 14:46:16 +02:00
Hui Wang	d6ecfffa17	chore: add actual request size in error message (#5889 )	2024-02-29 02:40:57 +02:00
Nikolay	22762d7a69	app/vmselect: change export/csv timestamp format for rfc3339 to respect milliseconds (#5853 ) * app/vmselect: adds milliseconds to the csv export response for rfc3339 * milliseconds is the standard precision for VictoriaMetrics query request responses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5837 * app/victoria-metrics: adds tests for csv export/import follow-up after 3541a8d0cf96dd4f8563624c4aab6816615d0756 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-02-23 01:16:08 +02:00
Aliaksandr Valialkin	ec02e9ba19	lib/protoparser/datadogsketches: use math.RoundToEven() for calculating the rank The original code uses this function - see `48d52eeea6/pkg/quantile/sparse.go (L138)` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5775	2024-02-07 21:45:05 +02:00
Aliaksandr Valialkin	28fffdfcc7	lib/protoparser/datadogsketches: add more permalinks to the original source code These permalinks should help verifying the correctness of the code This is a follow-up after `07213f4e0c` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5775	2024-02-07 21:45:05 +02:00
Andrii Chubatiuk	3aa439a618	added ddsketch permalink (#5775 ) Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com>	2024-02-07 21:45:04 +02:00
Aliaksandr Valialkin	82f4e4e070	app/{vmagent,vminsert}: follow-up after `a1d1ccd6f2` - Document the change at docs/CHANGELOG.md - Copy changes from docs/Single-server-VictoriaMetrics.md to README.md - Add missing handler for processing multitenant requests ( https://docs.victoriametrics.com/vmagent/#multitenancy ) - Substitute github.com/stretchr/testify dependency with 3 lines of code in the added tests - Comment unclear code at lib/protoparser/datadogsketches/parser.go , so @AndrewChubatiuk could update it and add permalinks to the original source code there. - Various code cleanups Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5584 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3091	2024-02-07 01:31:52 +02:00
Andrii Chubatiuk	c634859c4f	support datadog /api/beta/sketches API (#5584 ) Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-02-07 01:30:00 +02:00
Aliaksandr Valialkin	e15f07d989	all: consistently clear prompbmarshal.Label by assigning an empty struct instead of zeroing Name and Value individually	2024-01-22 01:11:59 +02:00
Aliaksandr Valialkin	f7b589e38a	lib/prompb: switch to github.com/VictoriaMetrics/easyproto	2024-01-16 20:43:09 +02:00
Aliaksandr Valialkin	8cb138e8df	lib/protoparser/datadogv2: simplify code for parsing protobuf messages after `0597718435` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451	2024-01-16 20:35:17 +02:00
Aliaksandr Valialkin	f8ae2abd88	lib/protoparser/opentelemetry: use github.com/VictoriaMetrics/easyproto for protobuf message unmarshaling and marshaling This reduces VictoriaMetrics binary size by 100KB. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2570 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2424	2024-01-16 20:34:18 +02:00
Aliaksandr Valialkin	9eef72bce9	lib/protoparser/datadogv2: add support for reading protobuf-encoded requests at /api/v2/series endpoint Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094	2024-01-16 20:32:15 +02:00
Aliaksandr Valialkin	12de0d39eb	lib/protoparser/datadogv2: take into account source_type_name field, since it contains useful value such as kubernetes, docker, system, etc.	2023-12-21 23:05:52 +02:00
Aliaksandr Valialkin	6feef14095	lib/protoparser: add missing /datadog/ prefix to the /api/v2/series path in the description for -datadog.maxInsertRequestSize command-line flag	2023-12-21 21:05:24 +02:00
Aliaksandr Valialkin	62a105d9e9	app/{vminsert,vmagent}: preliminary support for /api/v2/series ingestion from new versions of DataDog Agent This commit adds only JSON support - https://docs.datadoghq.com/api/latest/metrics/#submit-metrics , while recent versions of DataDog Agent send data to /api/v2/series in undocumented Protobuf format. The support for this format will be added later. Thanks to @AndrewChubatiuk for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451	2023-12-21 20:50:27 +02:00
Aliaksandr Valialkin	76b120e355	lib/protoparser/opentelemetry: allow ingesting metrics without resource labels Some clients may ingest samples via OpenTelemetry protocol without Resource labels. Previously VictoriaMetrics was silently dropping such samples. The commit `317834f876` added vm_protoparser_rows_dropped_total{type="opentelemetry",reason="resource_not_set"} counter for tracking of such dropped samples. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5459 It is better from usability PoV to accept such samples instead of dropping them and incrementing the corresponding counter.	2023-12-17 19:16:43 +02:00
Zakhar Bessarab	61f400eccb	lib/protoparser/opentelemetry: add metric to track skipped rows without resource (#5459 ) Currently, it is impossible to understand why metrics are not ingested when resource is not set by OTEL exporter. Adding a metric should simplify debugging. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `317834f876`)	2023-12-15 11:54:07 +01:00
Aliaksandr Valialkin	559e4db512	Revert "add datadog /api/v2/series and /api/beta/sketches support (#5094 )" This reverts commit `d6b4c8e4ef`. Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080	2023-12-05 02:30:40 +02:00
Aliaksandr Valialkin	61db92cdc7	Revert "lib/protoparser/datadog: follow-up after 543f218fe96574b9b2189c8350bb09afa349e3bb" This reverts commit `73d18fbc7a`. Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080	2023-12-05 02:29:00 +02:00
Aliaksandr Valialkin	85fcefaa34	app/vmagent: code cleanup for Kafka and Google PubSub consumers / producers - Add links to relevant docs into descriptions for every -kafka.* and -gcp.pubsub.* command-line flags. - Wait until message processing goroutines are stopped before returning from gcppubsub.Stop(). - Prevent from multiple calls to Init() without Stop(). - Drop message if tenantID cannot be parsed properly. - Take into account tenantID for all the supported message formats. - Support gzip-compressed messages for graphite format. - Use exponential backoff sleep when the message cannot be pushed to remote storage systems because of disabled on-disk persistence - https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence - Unblock from sleep as soon as Stop() is called. Previously the sleep could take up to 2 seconds after Stop() is called. - Remove unused globalCtx and initContext from app/vmagent/remotewrite/gcppubsub - Mention Google PubSub support at docs/enterprise.md - Make Google PubSub docs more clear at docs/vmagent.md This is a follow-up for commits 115245924a5f096c5a3383d6cc8e8b6fbd421984 and e6eab781ce42285a6a1750dc01eba6801dd35516 . Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/717 Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/713	2023-12-04 22:51:04 +02:00
hagen1778	73d18fbc7a	lib/protoparser/datadog: follow-up after `543f218fe9` * prevent /api/v1 from panic on parsing rows * add tests for Extract function for v1 and v2 api's * separate request types in different pools to prevent different objects mixing * add changelog line `543f218fe9` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `98d0f81f21`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-01 13:56:23 +01:00
Andrii Chubatiuk	d6b4c8e4ef	add datadog /api/v2/series and /api/beta/sketches support (#5094 ) Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com> Co-authored-by: Nikolay <https://github.com/f41gh7> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `543f218fe9`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-01 13:55:32 +01:00
Roman Khavronenko	26242f526e	lib/protoparser: decrease `import.maxLineLen` from 100MB to 10MB (#5364 ) Tests showed that importing a single line with 70MB size takes 5.3GiB RSS memory for VictoriaMetrics single-node. In the scenario when user exports and imports data from one VM to another, it could possibly lead to OOM exception for destination VM. Importing a single line with 16MB size takes 1.3GiB RSS memory. Hence, the limit for `import.maxLineLen` was decreased from 100MB to 10MB to improve reliability of VictoriaMetrics during imports. Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-24 13:13:33 +02:00
Hui Wang	91379331eb	lib/protoparser/promremotewrite: fall back to zstd decoding if Snappy-decoding fails (#5344 ) This case is possible after the following steps: 1. vmagent successfully performed handshake with the -remoteWrite.url and the remote storage supports zstd-compressed data. 2. remote storage became unavailable or slow to ingest data, vmagent compressed the collected data into blocks with zstd and puts these blocks to persistent queue on disk. 3. vmagent restarts and the remote storage is unavailable during the handshake, then vmagent falls back to Snappy compression. 4. vmagent starts sending zstd-compressed data from persistent queue to the remote storage, while falsely advertizing it sends Snappy-compressed data. 5. The remote storage receives zstd-compressed data and fails unpacking it with Snappy. The solution is the same as `12cd32fd75`, just fall back to zstd decompression if Snappy decompression fails.	2023-11-17 15:53:18 +01:00
Aliaksandr Valialkin	12cd32fd75	lib/protoparser/promremotewrite: fall back to Snappy decoding if zstd decoding fails This case is possible after the following steps: 1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether the remote storage supports zstd-compressed data. 2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression for the data sent to the remote storage. 3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk. 4. The remote storage becomes available. 5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data. 6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage, while falsely advertizing it sends zstd-compressed data. 7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd. The solution is to just fall back to Snappy decompression if zstd decompression fails. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301	2023-11-13 21:25:39 +01:00
Aliaksandr Valialkin	f03e81c693	lib/promauth: follow-up for `e16d3f5639` - Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup don't prevent from processing the corresponding scrape targets after the file becomes correct, without the need to restart vmagent. Previously scrape targets with invalid TLS CA file or TLS client certificate files were permanently dropped after the first attempt to initialize them, and they didn't appear until the next vmagent reload or the next change in other places of the loaded scrape configs. - Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent. Previously the old TLS CA was used until vmagent restart. - Properly handle errors during http request creation for the second attempt to send data to remote system at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing, since the returned request is nil on error. - Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent. Previously it could miss details on the source of the request. - Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header of every http request issued by vmagent during service discovery or target scraping. Re-use the HTTP client instead until the corresponding scrape config changes. - Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached, e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server when auth header cannot be obtained because of temporary error. - Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config. Cache the loaded certificate and the error for one second. This should significantly reduce CPU load when scraping big number of targets with the same tls_config. 
- Allow loading TLS certificates from HTTP and HTTPS urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`. - Improve test coverage at lib/promauth - Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later. Previously vmagent was exiting in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959	2023-10-26 09:55:47 +02:00
Aliaksandr Valialkin	36a1fdca6c	all: consistently use %w instead of %s when an error is passed to fmt.Errorf() This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.	2023-10-26 09:44:40 +02:00
Aliaksandr Valialkin	815e9bf892	app/{vmagent,vminsert}: follow-up for NewRelic data ingestion protocol support This is a follow-up for `f60c08a7bd` Changes: - Make sure all the urls related to NewRelic protocol start from /newrelic . Previously some urls were started from /api/v1/newrelic - Remove /api/v1 part from NewRelic urls, since it makes no sense - Remove automatic transformation from CamelCase to snake_case for NewRelic labels and metric names, since it may complicate the transition from NewRelic to VictoriaMetrics. Preserve all the metric names and label names, so users could query metrics and labels by the same names which are used in NewRelic. The automatic transformation from CamelCase to snake_case can be added later as a special action for relabeling rules if needed. - Properly update per-tenant data ingestion stats at app/vmagent/newrelic/request_handler.go . Previously it was always zero. - Fix NewRelic urls in vmagent when multitenant data ingestion is enabled. Previously they were mistakenly started from `/`. - Document NewRelic data ingestion url at docs/Cluster-VictoriaMetrics.md - Remove superfluous memory allocations at lib/protoparser/newrelic - Improve tests at lib/protoparser/newrelic/* Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3520 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4712	2023-10-16 13:55:04 +02:00

213 commits