Commit graph

2912 commits

Author SHA1 Message Date
Andrii Chubatiuk
a7404304fc
obtain tenant information from headers 2025-03-08 11:17:34 +02:00
f41gh7
ec68ea2222
lib/metricnamestats: follow-up after b85b28d30a
* properly save state for cross-device mount points
* properly check empty state for tracker

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-03-06 23:18:49 +01:00
Nikolay
b85b28d30a
lib/storage: add tracker for time series metric names statistics
This feature allows tracking query requests by metric names. The tracker
state is stored in memory, capped at 1/100 of the memory allocated to the
storage. If the cap is exceeded, the tracker rejects adding new items and instead
registers query requests only for already observed metric names.

This feature is disabled by default; the new flag
`-storage.trackMetricNamesStats` enables it.

New APIs were added to the select component:

* /api/v1/status/metric_names_stats - returns a JSON object with usage statistics.
* /admin/api/v1/status/metric_names_stats/reset - resets the internal state of the tracker and the tsid cache.

New metrics were added for this feature:

* vm_cache_size_bytes{type="storage/metricNamesUsageTracker"}
* vm_cache_size{type="storage/metricNamesUsageTracker"}
* vm_cache_size_max_bytes{type="storage/metricNamesUsageTracker"}

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4458
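
A minimal usage sketch for the new endpoint, written in Go. The listen address below is an assumption for a single-node setup with `-storage.trackMetricNamesStats` enabled; in the cluster version the endpoint is served by the select component, so the host and URL prefix will differ.

```
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Endpoint path comes from this commit; host/port is an assumption.
	resp, err := http.Get("http://localhost:8428/api/v1/status/metric_names_stats")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// The response is a JSON object with per-metric-name usage statistics.
	fmt.Println(string(body))
}
```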

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2025-03-06 22:06:50 +01:00
Andrii Chubatiuk
26fba57cfa
lib/protoparser/opentelemetry: properly marshal nested attributes into JSON
Previously, the opentelemetry attribute parser added extra field names, following the
default Go JSON marshalling rules for structs:

```
type AnyValue struct {
 StringValue string
}
```
 Was serialized into:
```
{"StringValue": "some-string"}
```
 While opentelemetry-collector serializes it as
```
"some-string"
```

 This commit changes this behaviour and makes the parser compatible with the opentelemetry-collector format. See test cases for examples.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8384
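
An illustrative Go sketch of the idea (not the actual parser code; the type and field names are assumptions): marshal only the inner value of a one-of wrapper, so the output matches the opentelemetry-collector format.

```
package main

import (
	"encoding/json"
	"fmt"
)

// AnyValue mimics a one-of wrapper holding exactly one of the pointers.
type AnyValue struct {
	StringValue *string
	IntValue    *int64
}

// MarshalJSON emits only the inner value, producing "some-string"
// instead of {"StringValue": "some-string"}.
func (v *AnyValue) MarshalJSON() ([]byte, error) {
	switch {
	case v.StringValue != nil:
		return json.Marshal(*v.StringValue)
	case v.IntValue != nil:
		return json.Marshal(*v.IntValue)
	default:
		return []byte("null"), nil
	}
}

func main() {
	s := "some-string"
	data, _ := json.Marshal(&AnyValue{StringValue: &s})
	fmt.Println(string(data)) // "some-string"
}
```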
2025-03-05 16:35:07 +01:00
hagen1778
6db97d6f79
lib/timeutil: add test for ParseDuration
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8403#discussion_r1976110052

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-03 10:46:01 +01:00
Roman Khavronenko
63f6ac3ff8
lib/promutils: move time-related funcs from promutils to timeutil (#8403)
Since the funcs `ParseDuration` and `ParseTimeMsec` are used in vlogs,
vmalert, victoriametrics and other components, importing promutils only
for this reason makes them export the irrelevant
`vm_rows_invalid_total{type="prometheus"}` metric.

This change removes `vm_rows_invalid_total{type="prometheus"}` metric
from /metrics page for these components.
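
A hedged sketch of the intended usage after the move; the import path and signature are assumptions based on the commit description.

```
package main

import (
	"fmt"
	"log"

	// Importing lib/timeutil instead of lib/promutils avoids pulling in the
	// prometheus parser and its vm_rows_invalid_total{type="prometheus"} metric.
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
)

func main() {
	d, err := timeutil.ParseDuration("1d")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(d) // 24h0m0s
}
```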

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-03 10:25:42 +01:00
Zakhar Bessarab
99de272b72
lib/promrelabel/scrape_url: properly parse IPv6 address from __address__ label
Fix parsing of IPv6 addresses after discovery. Previously, this could lead
to a target being discovered and then discarded.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8374

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-02-28 14:19:10 +04:00
Aliaksandr Valialkin
744ac496bd
lib/logstorage: add ability to specify field name prefixes inside fields (...) lists passed to pack_json and pack_logfmt pipes 2025-02-27 22:54:18 +01:00
Roman Khavronenko
38d46d149f
lib/prompbmarshal: move MustParsePromMetrics to protoparser/prometheus (#8405)
`MustParsePromMetrics` imports `lib/protoparser/prometheus`, and this
package exposes the following metrics:
```
vm_protoparser_rows_read_total{type="promscrape"}
vm_rows_invalid_total{type="prometheus"}
```

It means every package that uses `lib/prompbmarshal` will start exposing
these metrics. For example, vlogs imports `lib/protoparser/common`, which
uses `lib/prompbmarshal.Label`. And only because of this, vlogs starts
exposing unrelated prometheus metrics on the /metrics page.

Moving `MustParsePromMetrics` to `lib/protoparser/prometheus` seems like
the least intrusive change.


-----------

Depends on another change
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8403

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-02-27 22:50:27 +01:00
Aliaksandr Valialkin
84d5771b41
lib/logstorage: allow passing * at in(*), contains_any(*) and contains_all(*)
Such filters are equivalent to `match all` filter aka `*`. These filters are needed for VictoriaLogs plugin for Grafana.

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/238#issuecomment-2685447673
2025-02-27 11:37:43 +01:00
Zhu Jiekun
3d3480140c
lib/storage: properly cache extDB metricsID on search error
Previously, if the indexDB search failed for some reason while searching the previous indexDB (aka extDB), VictoriaMetrics stored an empty search result in the cache. This could cause incorrect search results for subsequent requests.

 This commit checks the search error and stores results in the cache only on success.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8345
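
An illustrative Go sketch of the caching rule (hypothetical names, not the actual storage code): cache the result only when the extDB search succeeds.

```
// searchMetricIDs caches results only when the extDB search succeeds,
// so a transient error can no longer poison the cache with an empty result.
func searchMetricIDs(cache map[string][]uint64, key string, searchExtDB func() ([]uint64, error)) ([]uint64, error) {
	if ids, ok := cache[key]; ok {
		return ids, nil
	}
	ids, err := searchExtDB()
	if err != nil {
		// Do not cache anything on error - previously an empty result
		// could end up cached here, causing incorrect subsequent queries.
		return nil, err
	}
	cache[key] = ids
	return ids, nil
}
```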
2025-02-26 15:48:25 +01:00
Aliaksandr Valialkin
ae5e28524e
lib/logstorage: do not treat a string with leading zeros as a number at tryParseUint64
The "00123" string shouldn't be treated as 123 number.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8361
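
A minimal Go sketch of the rule (illustrative, not the actual tryParseUint64 implementation): reject strings with a leading zero unless the string is exactly "0".

```
package main

import (
	"fmt"
	"strconv"
)

// parseUint64Strict parses s as an unsigned integer, but refuses strings
// with leading zeros such as "00123", so they stay regular strings.
func parseUint64Strict(s string) (uint64, bool) {
	if s == "" {
		return 0, false
	}
	if len(s) > 1 && s[0] == '0' {
		return 0, false
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	fmt.Println(parseUint64Strict("123"))   // 123 true
	fmt.Println(parseUint64Strict("00123")) // 0 false
	fmt.Println(parseUint64Strict("0"))     // 0 true
}
```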
2025-02-25 22:10:36 +01:00
Aliaksandr Valialkin
1f75e5bb59
lib/logstorage: optimize common regex filters generated by Grafana
For example, `field:~".+"`, `field:~".*"` or `field:""`

Replace such filters with faster ones. For example, `field:~".*"` is replaced with `*`,
while `field:~".+"` is replaced with `field:*`.
2025-02-25 20:34:28 +01:00
Aliaksandr Valialkin
fef2cb9dc7
lib/regexutil: speed up Regex.MatchString for ".*" 2025-02-25 20:34:28 +01:00
Aliaksandr Valialkin
82cdcec6c6
lib/logstorage: run make fmt after 30974e7f3f 2025-02-25 18:37:40 +01:00
Aliaksandr Valialkin
30974e7f3f
lib/logstorage: add le_field and lt_field filters
These filters can be used for selecting logs where one field value is less than another field value.
These filters complement the `<=` and `<` filters for constant literals.
2025-02-25 18:24:50 +01:00
Aliaksandr Valialkin
edc750dd55
lib/logstorage: optimize eq_filter when it is applied to fields of the same type 2025-02-25 18:24:15 +01:00
Aliaksandr Valialkin
bc69d5f1a4
lib/mergeset: explicitly pass the interval for flushing in-memory data to disk at MustOpenTable()
This allows using different intervals for flushing in-memory data among different mergeset.Table instances.

The initial user of this feature is lib/logstorage.Storage, which explicitly passes Storage.flushInterval
to every created mergeset.Table instance. Previously mergeset.Table instances were using a 5 seconds
flush interval, which didn't depend on the Storage.flushInterval.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2025-02-24 15:23:50 +01:00
Aliaksandr Valialkin
6764a03fbe
lib/logstorage: properly use datadb.flushInterval as an interval between flushes for the in-memory parts
The dataFlushInterval variable has been mistakenly introduced in the commit 9dbd0f9085

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2025-02-24 15:06:36 +01:00
Aliaksandr Valialkin
2857188939
lib/logstorage: limit the maximum log field name length, which can be generated by JSONParser.ParseLogMessage
Make sure that the maximum log field name, which can be generated by JSONParser.ParseLogMessage,
doesn't exceed the hardcoded limit maxFieldNameSize. Stop flattening of nested JSON objects
when the resulting field name becomes longer than maxFieldNameSize, and return the nested JSON object
as a string instead.

This should prevent parse errors when ingesting deeply nested JSON logs with long field names.
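
An illustrative Go sketch of the limiting rule (hypothetical code, not JSONParser itself): once a flattened field name would exceed the limit, the remaining nested object is kept as a JSON string under the shorter parent name.

```
package main

import (
	"encoding/json"
	"fmt"
)

const maxFieldNameSize = 10 // tiny limit for demonstration purposes

// flatten emits flattened field names such as "a.b.c". When a generated
// name would exceed maxFieldNameSize, the nested object is stored as a
// JSON string under the parent name instead of being flattened further.
func flatten(prefix string, v any, out map[string]string) {
	obj, ok := v.(map[string]any)
	if !ok {
		out[prefix] = fmt.Sprint(v)
		return
	}
	for k, child := range obj {
		name := k
		if prefix != "" {
			name = prefix + "." + k
		}
		if len(name) > maxFieldNameSize {
			raw, _ := json.Marshal(obj)
			out[prefix] = string(raw) // keep the nested object as a string
			return
		}
		flatten(name, child, out)
	}
}

func main() {
	var v any
	_ = json.Unmarshal([]byte(`{"a":{"very_long_field_name":{"x":1}},"b":2}`), &v)
	out := map[string]string{}
	flatten("", v, out)
	fmt.Println(out) // map[a:{"very_long_field_name":{"x":1}} b:2]
}
```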
2025-02-24 12:55:24 +01:00
Aliaksandr Valialkin
c0b9732fc8
lib/logstorage: add a benchmark for JSONParser.ParseLogMessage 2025-02-24 12:50:49 +01:00
Aliaksandr Valialkin
bfad966e72
lib/encoding/zstd: reduce the number of cached zstd.Encoder instances
Use the real compression level supported by github.com/klauspost/compress/zstd as a cache map key.
The number of real compression levels is smaller than the number of zstd compression levels.
This should reduce the number of cached zstd.Encoder instances.

See https://github.com/klauspost/compress/discussions/1025
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7503#issuecomment-2500088591
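
A hedged Go sketch of the idea using github.com/klauspost/compress/zstd (an illustration, not the lib/encoding/zstd code): key the cache by the library's own EncoderLevel via EncoderLevelFromZstd, so several zstd levels mapping to the same real level share one Encoder.

```
package main

import (
	"fmt"
	"sync"

	"github.com/klauspost/compress/zstd"
)

var (
	mu       sync.Mutex
	encoders = map[zstd.EncoderLevel]*zstd.Encoder{}
)

// getEncoder returns a cached stateless encoder for the given zstd
// compression level. Distinct zstd levels that map to the same
// EncoderLevel reuse the same cached instance.
func getEncoder(level int) (*zstd.Encoder, error) {
	key := zstd.EncoderLevelFromZstd(level)
	mu.Lock()
	defer mu.Unlock()
	if e, ok := encoders[key]; ok {
		return e, nil
	}
	e, err := zstd.NewWriter(nil, zstd.WithEncoderLevel(key))
	if err != nil {
		return nil, err
	}
	encoders[key] = e
	return e, nil
}

func main() {
	a, _ := getEncoder(3)
	b, _ := getEncoder(4)
	// Levels 3 and 4 may map to the same real level and thus share an encoder.
	fmt.Println(a == b, len(encoders))
}
```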
2025-02-24 00:45:42 +01:00
Aliaksandr Valialkin
f1eac36a80
lib/logstorage: add contains_any and contains_all filters
- `contains_any` selects logs with fields containing at least one word/phrase from the provided list.
  The provided list can be generated by a subquery.

- `contains_all` selects logs with fields containing all the words and phrases from the provided list.
  The provided list can be generated by a subquery.
2025-02-22 21:55:58 +01:00
Aliaksandr Valialkin
4275653f03
lib/logstorage: do not spend CPU time on preparing values for already filtered out rows according to bm at filterEqField.applyToBlockSearch 2025-02-22 21:55:58 +01:00
Aliaksandr Valialkin
1ffd5e9b69
lib/logstorage: avoid extra memory allocations at getEmptyStrings() 2025-02-22 21:55:57 +01:00
Aliaksandr Valialkin
c372e10937
lib/logstorage: add an ability to drop duplicate words at unpack_words pipe 2025-02-22 21:55:57 +01:00
Aliaksandr Valialkin
7da98b540b
lib/logstorage: rename unpack_tokens to unpack_words pipe
The LogsQL defines a word at https://docs.victoriametrics.com/victorialogs/logsql/#word ,
so it is more natural to use unpack_words instead of unpack_tokens name for the pipe.
2025-02-22 21:55:56 +01:00
Aliaksandr Valialkin
387d0369da
lib/logstorage: optimize OR filter a bit for many inner filters
Use two operations on bitmaps per each inner filter instead of three operations.
2025-02-22 21:55:56 +01:00
Aliaksandr Valialkin
69f02f83ae
lib/logstorage: use clear() for clearing bitmap bits at resetBits() instead of a loop
The clear() call is easier to read and understand than the loop.
2025-02-22 21:55:55 +01:00
Aliaksandr Valialkin
3c1d738196
lib/logstorage: avoid calling bitmap.reset() at getBitmap()
The bitmap at getBitmap() must already be reset when it was returned to the pool via putBitmap().
This saves a bit of CPU.
2025-02-22 21:55:55 +01:00
Aliaksandr Valialkin
35e1c35281
lib/logstorage: improve error logging for improperly escaped backslashes inside quoted strings
This should simplify debugging LogsQL queries by users
2025-02-22 21:55:54 +01:00
Aliaksandr Valialkin
dfcfaba374
lib/logstorage: add field1:eq_field(field2) filter, which returns logs with identical values at field1 and field2 2025-02-22 21:55:54 +01:00
Aliaksandr Valialkin
cd5b24b377
lib/logstorage: optimize len, hash and json_array_len pipes for repeated values
Re-use the previous result instead of calculating a new result for repeated input values.
2025-02-22 21:55:54 +01:00
Aliaksandr Valialkin
d33e24ab9b
lib/logstorage: add json_array_len pipe for calculating the length of JSON arrays 2025-02-22 21:55:53 +01:00
Aliaksandr Valialkin
cd73c1bafb
lib/logstorage: refactor unroll_tokens into unpack_tokens pipe
unpack_tokens pipe generates a JSON array of unpacked tokens from the source field.
This composes better with other pipes such as unroll pipe.
2025-02-22 21:55:53 +01:00
Aliaksandr Valialkin
d32c697361
lib/logstorage: add unroll_tokens pipe for unrolling individual word tokens from the log field 2025-02-22 21:55:52 +01:00
Aliaksandr Valialkin
1ea3f72d50
lib/logstorage: simplify usage of top, uniq and unroll pipes by allowing comma-separated list of fields without parens
Examples:

   - `top 5 x, y` is equivalent to `top 5 by (x, y)`
   - `uniq foo, bar` is equivalent to `uniq by (foo, bar)`
   - `unroll foo, bar` is equivalent to `unroll (foo, bar)`
2025-02-20 22:36:09 +01:00
Aliaksandr Valialkin
31e88a692d
lib/logstorage: properly handle _time:<=max_time filter
_time:<=max_time filter must include logs with timestamps matching max_time.
For example, _time:<=2025-02-24Z must include logs with timestamps until the end of February 24, 2025.
2025-02-20 19:15:37 +01:00
Aliaksandr Valialkin
ffbd0ebbae
lib/logstorage: allow using '>', '>=', '<' and '<=' in '_time:...' filter
Examples:

  _time:>=2025-02-24Z selects logs with timestamps greater than or equal to 2025-02-24 UTC
  _time:>1d selects logs with timestamps older than one day compared to the current time

This simplifies writing queries with _time filters.
See https://docs.victoriametrics.com/victorialogs/logsql/#time-filter
2025-02-20 19:04:51 +01:00
f41gh7
855dfb324d
app/vmselect: add query resource limits priority
This commit adds support for overriding vmstorage `maxUniqueTimeseries` with specific
resource limits:
1. `-search.maxLabelsAPISeries` for
[/api/v1/labels](https://docs.victoriametrics.com/url-examples/#apiv1labels),
[/api/v1/label/.../values](https://docs.victoriametrics.com/url-examples/#apiv1labelvalues)
2. `-search.maxSeries` for
[/api/v1/series](https://docs.victoriametrics.com/url-examples/#apiv1series)
3. `-search.maxTSDBStatusSeries` for
[/api/v1/status/tsdb](https://docs.victoriametrics.com/#tsdb-stats)
4. `-search.maxDeleteSeries` for
[/api/v1/admin/tsdb/delete_series](https://docs.victoriametrics.com/url-examples/#apiv1admintsdbdelete_series)

Currently, this limit priority logic cannot be applied to flags
`-search.maxFederateSeries` and `-search.maxExportSeries`, because they
share the same RPC `search_v7` with the /api/v1/query and
/api/v1/query_range APIs, preventing vmstorage from identifying the
actual API of the request. To address that, we need to add additional
information to the protocol between vmstorage and vmselect, which should
be introduced in the future when possible.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857
2025-02-19 18:18:32 +01:00
Andrii Chubatiuk
58a80a9718
app/vlinsert/syslog: properly parse log line with characters escaped by rfc5424
Inside PARAM-VALUE, the characters '"' (ABNF %d34), '\' (ABNF %d92),
and ']' (ABNF %d93) MUST be escaped.  This is necessary to avoid
parsing errors.  Escaping ']' would not strictly be necessary but is
REQUIRED by this specification to avoid syslog application
implementation errors.  Each of these three characters MUST be
escaped as '\"', '\\', and '\]' respectively.  The backslash is used
for control character escaping for consistency with its use for
escaping in other parts of the syslog message as well as in traditional syslog.

 Related RFC:
https://datatracker.ietf.org/doc/html/rfc5424#section-6.3.3

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8282
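
An illustrative Go sketch of unescaping the three sequences the RFC requires inside PARAM-VALUE (not the actual vlinsert parser code):

```
package main

import (
	"fmt"
	"strings"
)

// paramValueUnescaper reverses the RFC 5424 PARAM-VALUE escaping:
// '\"' -> '"', '\\' -> '\', '\]' -> ']'.
var paramValueUnescaper = strings.NewReplacer(`\"`, `"`, `\\`, `\`, `\]`, `]`)

func unescapeParamValue(s string) string {
	return paramValueUnescaper.Replace(s)
}

func main() {
	fmt.Println(unescapeParamValue(`quoted \"value\" with \\ and \]`))
	// Output: quoted "value" with \ and ]
}
```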
2025-02-19 18:01:54 +01:00
Andrii Chubatiuk
e97c530fdc
lib/protoparser/influx: add -influx.forceStreamMode flag to force parsing all Influx data in stream mode (#8319)
Addresses #8269

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-02-19 17:05:02 +01:00
Andrii Chubatiuk
c8fc903669
lib/streamaggr: added aggregation windows (#6314)
### Describe Your Changes

By default, stream aggregation and deduplication store a single state
per aggregation output result.
The data for each aggregator is flushed independently once per
aggregation interval. But there's no guarantee that
incoming samples with timestamps close to the aggregation interval's end
will get into it. For example, when aggregating
with `interval: 1m` a data sample with timestamp 1739473078 (18:57:59)
can fall into aggregation round `18:58:00` or `18:59:00`.
It depends on network lag, load, clock synchronization, etc. In most
scenarios it doesn't impact aggregation or
deduplication results, which are consistent within margin of error. But
for metrics represented as a collection of series,
like
[histograms](https://docs.victoriametrics.com/keyconcepts/#histogram),
such inaccuracy leads to invalid aggregation results.

For this case, streaming aggregation and deduplication support a mode with
aggregation windows for the current and previous state. With this mode,
the flush doesn't happen immediately but is shifted by a calculated sample
lag, which improves correctness for delayed data.

Enabling this mode increases resource usage: memory usage is
expected to double, as aggregation will store two states
instead of one. However, this significantly improves the accuracy of
calculations. Aggregation windows can be enabled via
the following settings:

- `-streamAggr.enableWindows` at [single-node
VictoriaMetrics](https://docs.victoriametrics.com/single-server-victoriametrics/)
and [vmagent](https://docs.victoriametrics.com/vmagent/). At
[vmagent](https://docs.victoriametrics.com/vmagent/)
`-remoteWrite.streamAggr.enableWindows` flag can be specified
individually per each `-remoteWrite.url`.
If one of these flags is set, then all aggregators will use fixed
windows. In conjunction with `-remoteWrite.streamAggr.dedupInterval` or
`-streamAggr.dedupInterval`, fixed aggregation windows are enabled on the
deduplicator as well.
- `enable_windows` option in [aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
  It allows enabling aggregation windows for a specific aggregator.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-02-19 13:19:33 +01:00
hagen1778
38bded4e58
lib/logstorage: adjust expected compression ratio in tests
A follow-up after 9bb5ba5d2f
that impacted the compression ratio for data compressed with the native Go zstd lib (`make test-pure`).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-02-19 13:11:37 +01:00
Aliaksandr Valialkin
dce5eb88d3
lib/logstorage: remove optimizations from LogRows.sortFieldsInRows
It turned out that these optimizations do not give measurable performance improvements,
while they complicate the code too much and may result in a slowdown when the ingested logs have
different sets of fields.

This is a follow-up for 630601488e
2025-02-19 12:35:06 +01:00
Aliaksandr Valialkin
a50ab10998
lib/logstorage: return back the maximum number of files for log fields data from 256 to 128
It turned out that 256 files increase RAM usage too much compared to 128 files
when ingesting logs with hundreds of fields (aka wide events). So let's return to the 128 files
limit for now.

This is a follow-up for 9bb5ba5d2f
2025-02-19 12:35:06 +01:00
Aliaksandr Valialkin
b58e2ab214
lib/bytesutil: drop ByteBuffer.B when its capacity is bigger than 64KB at Reset
There is little sense in keeping overly big buffers - they just waste RAM and do not reduce
the load on GC much. So it is better to drop such buffers at Reset instead of keeping them around.
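
A minimal Go sketch of the approach (illustrative, not the actual lib/bytesutil code): drop the underlying slice on Reset once its capacity grows past 64KB, otherwise keep it for reuse.

```
package main

import "fmt"

// ByteBuffer is a simplified stand-in for the real bytesutil.ByteBuffer.
type ByteBuffer struct {
	B []byte
}

const maxRetainedCap = 64 * 1024

// Reset prepares the buffer for reuse. Buffers bigger than 64KB are dropped
// instead of being retained, since keeping them only wastes RAM.
func (bb *ByteBuffer) Reset() {
	if cap(bb.B) > maxRetainedCap {
		bb.B = nil
		return
	}
	bb.B = bb.B[:0]
}

func main() {
	var bb ByteBuffer
	bb.B = append(bb.B, make([]byte, 128*1024)...)
	bb.Reset()
	fmt.Println(cap(bb.B)) // 0 - the oversized buffer was dropped
}
```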
2025-02-19 12:35:06 +01:00
Aliaksandr Valialkin
659251beaa
lib/filestream: use smaller sizes for read buffers than for write buffers
The number of filestream readers is proportional to the number of parts to be merged,
while the number of filestream writers is proportional to the number of concurrent merges.
Usually around 4-16 parts are merged at once, so the number of active filestream readers is ~8x
bigger than the number of active filestream writers.

So it is a good idea to use a smaller size for read buffers compared to write buffers.
Limit the read buffer size to 64Kb, while the write buffer size is limited to 128Kb.
This should reduce the overall memory usage when merging parts with a big number of files.
This is the case for VictoriaLogs, which works with logs containing hundreds of fields (aka wide events).
2025-02-19 12:35:05 +01:00
Aliaksandr Valialkin
9bb5ba5d2f
lib/logstorage: make sure that the data for every log field is stored in a separate file until the number of files is smaller than 256
This should improve query performance for logs with hundreds of fields (aka wide events).
Previously there was a high chance that the data for multiple log fields was stored in the same file.
This could result in query performance slowdown and/or increased disk read IO,
since the operating system could read unnecessary data for fields which aren't used in the query.

Now log fields are guaranteed to be stored in separate files until the number of fields exceeds 256.
After that multiple log fields start sharing files.
2025-02-19 01:48:14 +01:00
Aliaksandr Valialkin
2a681f2e8d
lib/filestream: reduce the maximum size of the buffered data per every stream from 512Kb to 256Kb
This reduces memory usage when many filestreams are processed simultaneously.
This is the case for VictoriaLogs when it processes logs with hundreds of fields.
2025-02-19 01:45:07 +01:00