A follow-up after 9bb5ba5d2f, which impacted the compression ratio for data
compressed with the native Go zstd lib (`make test-pure`).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 38bded4e58)
It turned out that these optimizations do not give measurable performance improvements,
while they complicate the code too much and may result in a slowdown when the ingested logs
have different sets of fields.
This is a follow-up for 630601488e
(cherry picked from commit dce5eb88d3)
It turned out that 256 files increase RAM usage too much compared to 128 files
when ingesting logs with hundreds of fields (aka wide events). So let's return to the 128 files
limit for now.
This is a follow-up for 9bb5ba5d2f
(cherry picked from commit a50ab10998)
This should improve query performance for logs with hundreds of fields (aka wide events).
Previously there was a high chance that the data for multiple log fields was stored in the same file.
This could result in query performance slowdown and/or increased disk read IO,
since the operating system could read unnecessary data for fields which aren't used in the query.
Now log fields are guaranteed to be stored in separate files until the number of fields exceeds 256.
After that multiple log fields start sharing files.
(cherry picked from commit 9bb5ba5d2f)
- Re-use column names and values from the previously added rows when possible (see the sketch below).
This increases locality of reference for field names and values and improves
access speed for them.
- Postpone sorting the fields in the added rows until an in-memory part is created from them.
This allows optimizing the sorting for rows with the same set of fields.
This is usually the case for logs which belong to the same log stream.
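A minimal sketch of the reuse from the first item above, assuming a hypothetical helper name (reuseOrCopy is not the actual VictoriaLogs code):

import "strings"

// reuseOrCopy returns the previously stored string when the newly added field
// name or value is identical to it, so repeated strings across rows of the same
// stream share the same backing memory; otherwise the new string is copied.
func reuseOrCopy(prev, s string) string {
	if prev == s {
		return prev
	}
	return strings.Clone(s)
}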
(cherry picked from commit 630601488e)
This eliminates a class of potential bugs with incorrect stats calculations when an additional filter
is applied to the blockResult before it is passed to the stats function and this filter removes
all the rows from the blockResult.
1. Use distinct code paths for blockResult.getValues() and blockResult.getValuesBucketed().
This should simplify debugging and maintenance of the resulting code.
2. Do not load column values if all the values in the block fit into the same bucket.
Use blockResultColumn.minValue and blockResultColumn.maxValue for determining whether
column values must be loaded via blockResultColumn.getValuesEncoded().
This significantly improves performance for big buckets, which cover all the column
values in a block.
3. Properly calculate buckets for negative values.
4. Properly align weekly buckets to start on Monday.
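Below is a minimal sketch of the bucketing rules from items 3 and 4; the function names are illustrative, not the actual VictoriaLogs code:

import "time"

// bucketStart returns the start of the bucket of the given size that contains v.
// Plain integer division rounds toward zero, which mis-buckets negative values,
// so round toward negative infinity instead.
func bucketStart(v, bucketSize int64) int64 {
	r := v % bucketSize
	if r < 0 {
		r += bucketSize
	}
	return v - r
}

// weekStart returns the Monday 00:00 UTC starting the week that contains t.
// time.Weekday numbers Sunday as 0, so shift it to make Monday the first day.
func weekStart(t time.Time) time.Time {
	t = t.UTC().Truncate(24 * time.Hour)
	daysSinceMonday := (int(t.Weekday()) + 6) % 7
	return t.AddDate(0, 0, -daysSinceMonday)
}

For example, bucketStart(-5, 10) returns -10 rather than 0, so negative values fall into the correct bucket.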
This improves performance for queries which use `sort by (...) limit N` without mentioning the `_time` field.
For example, the following query should work faster now:
_time:1d | rm _time | sort by (request_duration desc) limit 10
(cherry picked from commit 422caf6bd7)
- Add a fast path for timestamps ending with 'Z'
- Use strings.LastIndexAny instead of strings.IndexAny when searching
  for the timezone offset at the end of the string. This works faster
  for timestamps with sub-second precision.
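A rough sketch of both lookups, assuming a hypothetical helper name (timezoneSuffix is not the actual parsing code):

import "strings"

// timezoneSuffix returns the timezone part of an RFC3339 timestamp.
// Timestamps ending with 'Z' take the fast path; otherwise the explicit
// offset ("+02:00" or "-07:00") is located with strings.LastIndexAny,
// which scans from the end and so skips the fractional seconds quickly.
func timezoneSuffix(s string) (string, bool) {
	if strings.HasSuffix(s, "Z") {
		return "Z", true
	}
	n := strings.LastIndexAny(s, "+-")
	if n <= strings.IndexByte(s, 'T') {
		// The '-' found belongs to the date part, so there is no explicit offset.
		return "", false
	}
	return s[n:], true
}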
(cherry picked from commit 335071cf3d)
Update the pipe state only once per series of repeated strings, uint8 values and tuples (see the sketch below).
This improves performance a bit for the following `top` pipes:
- top (string_field)
- top (uint8_field)
- top (field1, ..., fieldN)
Do not apply the optimization to uint16, uint32, uint64 and int64 fields, since they
usually contain a big number of unique values which do not repeat most of the time.
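A minimal sketch of the run-based update, assuming a plain counter map instead of the actual pipe state:

// addValues counts consecutive repeats of the same value first and touches the
// counter map only once per run instead of once per row.
func addValues(counts map[string]uint64, values []string) {
	for i := 0; i < len(values); {
		v := values[i]
		n := 1
		for i+n < len(values) && values[i+n] == v {
			n++
		}
		counts[v] += uint64(n)
		i += n
	}
}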
Previously RFC3339 timestamps with sub-second precision could be incorrectly compared by lessString().
For example, 2025-01-20T10:20:30.1Z was incorrectly treated as smaller than 2025-01-20T10:20:30.09Z,
because the first timestamp has a smaller decimal number after the last dot than the second one.
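A minimal sketch of the corrected comparison for the fractional part; lessFraction is an illustrative name, not the actual lessString() code:

// lessFraction compares the digits after the last dot as fractions rather than
// as integers: the shorter string is padded with trailing zeros, so "1" vs "09"
// becomes "10" vs "09" and .1 is correctly treated as bigger than .09.
func lessFraction(a, b string) bool {
	for len(a) < len(b) {
		a += "0"
	}
	for len(b) < len(a) {
		b += "0"
	}
	return a < b
}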
(cherry picked from commit 81d359507d)
Split unique values (groups) into shards according to the configured concurrency
during processing of the matching rows if the number of unique values exceeds the hardcoded threshold.
Previously this splitting was performed unconditionally at the merge stage when merging independently
calculated per-CPU states into a single state. It is faster to perform the split during rows processing
if the number of unique values is big.
This gives up to 30% performance improvement when these pipes are applied to a big number of unique values (groups).
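A minimal sketch of the split; the names are illustrative and the threshold check that enables this mode is omitted:

import "hash/fnv"

// uniqShards keeps unique group keys in per-shard maps selected by hash,
// so states built on different CPUs are partitioned the same way and
// can be merged shard by shard.
type uniqShards struct {
	shards []map[string]struct{}
}

func newUniqShards(concurrency int) *uniqShards {
	shards := make([]map[string]struct{}, concurrency)
	for i := range shards {
		shards[i] = map[string]struct{}{}
	}
	return &uniqShards{shards: shards}
}

func (us *uniqShards) add(key string) {
	h := fnv.New32a()
	h.Write([]byte(key))
	us.shards[h.Sum32()%uint32(len(us.shards))][key] = struct{}{}
}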
(cherry picked from commit 48602a1ae8)
The number of worker shards per pipe processor is set during query initialization.
It equals the `options(concurrency=N)` value if this option is set, or the number of available CPU cores otherwise.
This means that all the pipes must adhere to the given concurrency when passing data blocks
to the next pipe.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8201
The bug was introduced in 0214aa328e
It is better to use the AppendFieldsToJSON function directly
instead of hiding it under the RowsFormatter abstraction.
(cherry picked from commit 95f182053b)
This should reduce the time needed for opening the storage with retentions exceeding a few months.
While at it, limit the concurrency of opening partitions in parallel to the number of available CPU cores,
since higher concurrency may increase RAM and CPU usage without performance improvements
if opening a single partition is a CPU-bound task.
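A minimal sketch of the bounded parallel opening, assuming a caller-supplied open function rather than the actual partition-opening code:

import (
	"runtime"
	"sync"
)

// openConcurrently opens all the given partition paths in parallel, but runs at
// most GOMAXPROCS open calls at a time, so RAM and CPU usage stay capped even
// when the retention spans many partitions. The first error, if any, is returned.
func openConcurrently(paths []string, open func(path string) error) error {
	sem := make(chan struct{}, runtime.GOMAXPROCS(0))
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, path := range paths {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			sem <- struct{}{} // acquire a concurrency slot
			defer func() { <-sem }()
			if err := open(path); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(path)
	}
	wg.Wait()
	return firstErr
}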
This is a follow-up for 17988942ab
The purpose of extra filters ( https://docs.victoriametrics.com/victorialogs/querying/#extra-filters )
is to limit the subset of logs which can be queried. For example, it is expected that all the queries
with `extra_filters={tenant=123}` can access only logs which contain the `123` value in the `tenant` field.
Previously this wasn't the case, since the provided extra filters weren't applied to subqueries.
For example, the following query could be used to select all the logs outside `tenant=123`, for any `extra_filters` arg:
* | union({tenant!=123})
This commit fixes this by propagating extra filters to all the subqueries.
While at it, this commit also properly propagates the [start, end] time range filter from HTTP querying APIs
into all the subqueries, since this is what most users expect. This behaviour can be overridden on a per-subquery
basis with the `options(ignore_global_time_filter=true)` option - see https://docs.victoriametrics.com/victorialogs/logsql/#query-options
Also, optimizations are now properly applied across all the subqueries. Previously the optimizations in the Query.optimize()
function were applied only to the top-level query.
logger.Fatalf("BUG: ...") complicates investigating the bug, since it doesn't show the call stack,
which led to the bug. So it is better to consistently use logger.Panicf("BUG: ...") for logging programming bugs.
This is done via the 'options(concurrency=N)' prefix for the query.
For example, the following query is executed on at most 4 CPU cores:
options(concurrency=4) _time:1d | count_uniq(user_id)
This allows reducing RAM and CPU usage at the cost of longer query execution times,
since by default every query is executed in parallel on all the available CPU cores.
See https://docs.victoriametrics.com/victorialogs/logsql/#query-options
Also always initialize Query.timestamp with the timestamp from the lexer.
This should avoid potential problems with relative timestamps inside inner queries.
For example, the `_time:1h` filter in the following query is correctly executed
relative to the current timestamp:
foo:in(_time:1h | keep foo)