Commit graph

315 commits

Author SHA1 Message Date
Roman Khavronenko
d5d143f849
lib/promutils: move time-related funcs from promutils to timeutil (#8403)
Since the funcs `ParseDuration` and `ParseTimeMsec` are used in vlogs,
vmalert, victoriametrics and other components, importing promutils only
for this reason makes them export the irrelevant
`vm_rows_invalid_total{type="prometheus"}` metric.

This change removes `vm_rows_invalid_total{type="prometheus"}` metric
from /metrics page for these components.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 63f6ac3ff8)
2025-03-03 10:28:07 +01:00
Aliaksandr Valialkin
c8a12435ec
lib/logstorage: add ability to specify field name prefixes inside fields (...) lists passed to pack_json and pack_logfmt pipes 2025-02-27 22:56:14 +01:00
Aliaksandr Valialkin
a1aa4b7aa9
lib/logstorage: allow passing * at in(*), contains_any(*) and contains_all(*)
Such filters are equivalent to the `match all` filter, aka `*`. These filters are needed for the VictoriaLogs plugin for Grafana.

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/238#issuecomment-2685447673
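
For example, with a hypothetical `user_id` field, the following filters now all behave like the match-all filter:

    user_id:in(*)
    user_id:contains_any(*)
    user_id:contains_all(*)
    *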
2025-02-27 11:41:39 +01:00
Aliaksandr Valialkin
a3ff49def0
lib/logstorage: do not treat a string with leading zeros as a number at tryParseUint64
The "00123" string shouldn't be treated as 123 number.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8361
2025-02-26 16:07:47 +01:00
Aliaksandr Valialkin
dd1c0e3bb7
lib/logstorage: optimize common regex filters generated by Grafana
For example, `field:~".+"`, `field:~".*"` or `field:""`

Replace such filters with faster ones. For example, `field:~".*"` is replaced with `*`,
while `field:~".+"` is replaced with `field:*`.
2025-02-25 20:35:04 +01:00
Aliaksandr Valialkin
14a5ccdc83
lib/logstorage: run make fmt after 30974e7f3f
(cherry picked from commit 82cdcec6c6)
2025-02-25 19:13:31 +01:00
Aliaksandr Valialkin
9e0581533c
lib/logstorage: add le_field and lt_field filters
These filters can be used for selecting logs where one field value is less than another field value.
These filters complement the `<=` and `<` filters for constant literals.
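
For illustration, assuming these filters use the same `field1:filter_name(field2)` form as the eq_field filter below, with hypothetical field names:

    response_size:le_field(max_response_size) selects logs where response_size is less than or equal to max_response_size
    request_duration:lt_field(timeout) selects logs where request_duration is less than timeout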

(cherry picked from commit 30974e7f3f)
2025-02-25 19:13:31 +01:00
Aliaksandr Valialkin
3bc89226bb
lib/logstorage: optimize eq_filter when it is applied to fields of the same type
(cherry picked from commit edc750dd55)
2025-02-25 19:13:30 +01:00
Aliaksandr Valialkin
dc09d0bff4
lib/mergeset: explicitly pass the interval for flushing in-memory data to disk at MustOpenTable()
This allows using different intervals for flushing in-memory data among different mergeset.Table instances.

The initial user of this feature is lib/logstorage.Storage, which explicitly passes Storage.flushInterval
to every created mergeset.Table instance. Previously mergeset.Table instances were using a 5-second
flush interval, which didn't depend on the Storage.flushInterval.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2025-02-24 15:34:59 +01:00
Aliaksandr Valialkin
a964cc7a0c
lib/logstorage: properly use datadb.flushInterval as an interval between flushes for the in-memory parts
The dataFlushInterval variable was mistakenly introduced in commit 9dbd0f9085

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2025-02-24 15:34:59 +01:00
Aliaksandr Valialkin
d56f9327ec
lib/logstorage: limit the maximum length of log field names generated by JSONParser.ParseLogMessage
Make sure that log field names generated by JSONParser.ParseLogMessage
don't exceed the hardcoded length limit maxFieldNameSize. Stop flattening of nested JSON objects
when the resulting field name becomes longer than maxFieldNameSize, and return the nested JSON object
as a string instead.

This should prevent parse errors when ingesting deeply nested JSON logs with long field names.
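
For illustration (the nested key is made up, and the real maxFieldNameSize limit is much larger), flattening stops once the generated field name would become too long, and the remaining object is kept as a string:

    input:            {"a":{"some_very_long_nested_key":{"x":"y"}}}
    usual flattening: a.some_very_long_nested_key.x = "y"
    over the limit:   a.some_very_long_nested_key = {"x":"y"}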
2025-02-24 15:34:59 +01:00
Aliaksandr Valialkin
dc536d5626
lib/logstorage: add a benchmark for JSONParser.ParseLogMessage 2025-02-24 15:34:58 +01:00
Aliaksandr Valialkin
3ee4b3ef24
lib/logstorage: add contains_any and contains_all filters
- `contains_any` selects logs with fields containing at least one word/phrase from the provided list.
  The provided list can be generated by a subquery.

- `contains_all` selects logs with fields containing all the words and phrases from the provided list.
  The provided list can be generated by a subquery.
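
For example, with hypothetical field names and word lists:

    level:contains_any("error", "fatal") selects logs where the level field contains the word error or the word fatal
    _msg:contains_all("timeout", "upstream") selects logs where the _msg field contains both the word timeout and the word upstream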
2025-02-24 15:34:58 +01:00
Aliaksandr Valialkin
3e941920f6
lib/logstorage: do not spend CPU time on preparing values for already filtered out rows according to bm at filterEqField.applyToBlockSearch 2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
6975352d5a
lib/logstorage: avoid extra memory allocations at getEmptyStrings() 2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
a2d0846e86
lib/logstorage: add an ability to drop duplicate words at unpack_words pipe 2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
518ed87a3a
lib/logstorage: rename unpack_tokens to unpack_words pipe
The LogsQL defines a word at https://docs.victoriametrics.com/victorialogs/logsql/#word ,
so it is more natural to use unpack_words instead of unpack_tokens name for the pipe.
2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
4beceb67ab
lib/logstorage: optimize OR filter a bit for many inner filters
Use two bitmap operations per inner filter instead of three.
2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
bff5551ba5
lib/logstorage: use clear() for clearing bitmap bits at resetBits() instead of a loop
The clear() call is easier to read and understand than the loop.
2025-02-24 15:34:56 +01:00
Aliaksandr Valialkin
4dfd1407ba
lib/logstorage: avoid calling bitmap.reset() at getBitmap()
The bitmap returned by getBitmap() must already be reset, since it is reset before being returned to the pool via putBitmap().
This saves a bit of CPU.
2025-02-24 15:34:56 +01:00
Aliaksandr Valialkin
bc3e557f02
lib/logstorage: improve error logging for improperly escaped backslashes inside quoted strings
This should simplify debugging of LogsQL queries for users.
2025-02-24 15:34:56 +01:00
Aliaksandr Valialkin
1f11bc948e
lib/logstorage: add field1:eq_field(field2) filter, which returns logs with identical values at field1 and field2 2025-02-24 15:34:56 +01:00
Aliaksandr Valialkin
504c034cbf
lib/logstorage: optimize len, hash and json_array_len pipes for repeated values
Re-use the previous result instead of calculating a new result for repeated input values.
2025-02-24 15:34:56 +01:00
Aliaksandr Valialkin
959282090a
lib/logstorage: add json_array_len pipe for calculating the length of JSON arrays 2025-02-24 15:34:56 +01:00
Aliaksandr Valialkin
aef939dc20
lib/logstorage: refactor unroll_tokens into unpack_tokens pipe
unpack_tokens pipe generates a JSON array of unpacked tokens from the source field.
This composes better with other pipes such as unroll pipe.
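
For illustration (the field value is made up), a source field containing

    GET /foo/bar 200

is unpacked into the JSON array

    ["GET","foo","bar","200"]

which can then be processed further, for example with the unroll pipe.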
2025-02-24 15:34:55 +01:00
Aliaksandr Valialkin
afd74d82db
lib/logstorage: add unroll_tokens pipe for unrolling individual word tokens from the log field 2025-02-24 15:34:55 +01:00
Aliaksandr Valialkin
2dfd6bb689
lib/logstorage: simplify usage of top, uniq and unroll pipes by allowing comma-separated list of fields without parens
Examples:

   - `top 5 x, y` is equivalent to `top 5 by (x, y)`
   - `uniq foo, bar` is equivalent to `uniq by (foo, bar)`
   - `unroll foo, bar` is equivalent to `unroll (foo, bar)`
2025-02-21 12:43:26 +01:00
Aliaksandr Valialkin
061fd098b5
lib/logstorage: properly handle _time:<=max_time filter
_time:<=max_time filter must include logs with timestamps matching max_time.
For example, _time:<=2025-02-24Z must include logs with timestamps until the end of February 24, 2025.
2025-02-21 12:43:26 +01:00
Aliaksandr Valialkin
80d173471f
lib/logstorage: allow using '>', '>=', '<' and '<=' in '_time:...' filter
Examples:

  _time:>=2025-02-24Z selects logs with timestamps greater than or equal to 2025-02-24 UTC
  _time:>1d selects logs with timestamps older than one day compared to the current time

This simplifies writing queries with _time filters.
See https://docs.victoriametrics.com/victorialogs/logsql/#time-filter
2025-02-21 12:43:26 +01:00
Andrii Chubatiuk
94bf90842a
app/vlinsert/syslog: properly parse log line with characters escaped by rfc5424
Inside PARAM-VALUE, the characters '"' (ABNF %d34), '\' (ABNF %d92),
and ']' (ABNF %d93) MUST be escaped.  This is necessary to avoid
parsing errors.  Escaping ']' would not strictly be necessary but is
REQUIRED by this specification to avoid syslog application
implementation errors.  Each of these three characters MUST be
escaped as '\"', '\\', and '\]' respectively.  The backslash is used
for control character escaping for consistency with its use for
escaping in other parts of the syslog message as well as in traditional syslog.

Related RFC:
https://datatracker.ietf.org/doc/html/rfc5424#section-6.3.3

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8282
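
For illustration (all values are made up), such an escaped PARAM-VALUE now parses correctly:

    <165>1 2025-02-19T10:00:00Z host app 1234 ID47 [example@32473 msg="say \"hi\"" path="C:\\temp"] request finished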
2025-02-19 18:12:40 +01:00
hagen1778
bb302df170
lib/logstorage: adjust expected compression ratio in tests
A follow-up after 9bb5ba5d2f,
which impacted the compression ratio for data compressed with the native Go zstd lib (`make test-pure`).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 38bded4e58)
2025-02-19 13:30:05 +01:00
Aliaksandr Valialkin
697b775a46
lib/logstorage: remove optimizations from LogRows.sortFieldsInRows
It turned out that these optimizations do not give measurable performance improvements,
while they complicate the code too much and may result in a slowdown when the ingested logs have
different sets of fields.

This is a follow-up for 630601488e

(cherry picked from commit dce5eb88d3)
2025-02-19 13:30:04 +01:00
Aliaksandr Valialkin
d0d9fb2818
lib/logstorage: return the maximum number of files for log fields data from 256 back to 128
It turned out that 256 files increase RAM usage too much compared to 128 files
when ingesting logs with hundreds of fields (aka wide events), so let's return to the 128 files
limit for now.

This is a follow-up for 9bb5ba5d2f

(cherry picked from commit a50ab10998)
2025-02-19 13:30:04 +01:00
Aliaksandr Valialkin
a842114070
lib/logstorage: make sure that the data for every log field is stored in a separate file until the number of files is smaller than 256
This should improve query performance for logs with hundreds of fields (aka wide events).
Previously there was a high chance that the data for multiple log fields was stored in the same file.
This could result in a query performance slowdown and/or increased disk read IO,
since the operating system could read unnecessary data for fields that aren't used in the query.

Now log fields are guaranteed to be stored in separate files until the number of fields exceeds 256.
After that multiple log fields start sharing files.

(cherry picked from commit 9bb5ba5d2f)
2025-02-19 13:30:02 +01:00
Aliaksandr Valialkin
6a590de86f
lib/logstorage: optimize LogRows.mustAddInternal a bit
- Re-use column names and values from the previously added rows if possible.
  This increases locality of reference for field names and values and improves
  access speed for them.

- Postpone sorting fields in the added rows until creating an in-memory part from them.
  This allows optimizing the sorting for rows with the same set of fields.
  This is usually the case for logs that belong to the same log stream.

(cherry picked from commit 630601488e)
2025-02-19 13:30:02 +01:00
Aliaksandr Valialkin
893241b280
lib/logstorage: log the path to metadata file on errors at partHeader.mustReadMetadata
This should simplify troubleshooting

(cherry picked from commit f4ca5d3b1a)
2025-02-19 13:30:01 +01:00
Aliaksandr Valialkin
00d8e7a373
lib/logstorage: allow calling visitSubqueries on nil Query
This makes the code that calls Query.visitSubqueries less error-prone.

(cherry picked from commit 910f307ca2)
2025-02-19 13:30:01 +01:00
Aliaksandr Valialkin
3ba095a875
lib/logstorage: remove needExecuteQuery from filterIn and filterStreamID, since it isn't needed
(cherry picked from commit 6afd66dcc8)
2025-02-19 13:30:01 +01:00
Aliaksandr Valialkin
88363b46b5
lib/logstorage: consistently use Query.cloneShallow() for shallow cloning of the original query 2025-02-17 15:36:38 +01:00
Aliaksandr Valialkin
5e4b5f9969
lib/logstorage: move common code for parsing a query inside parens into a separate function 2025-02-17 15:36:37 +01:00
Aliaksandr Valialkin
6155b85a13
lib/logstorage: make sure that chunkedAllocator isn't used from concurrently running goroutines
This is needed in order to avoid data races
2025-02-17 15:36:37 +01:00
Aliaksandr Valialkin
7458aa392a
lib/logstorage: ensure that statsProcessor.updateStatsForAllRows() is called on non-empty blockResult
This eliminates a class of potential bugs with incorrect stats calculations when an additional filter
is applied to the blockResult before passing it to the stats function, and this filter removes
all the rows from blockResult.
2025-02-17 15:36:37 +01:00
Aliaksandr Valialkin
71636e922a
lib/logstorage: properly initialize minValue and maxValue at pipeLenProcessorShard and pipeHashProcessorShard
Previously this could result in an incorrect 0 result from the min() stats function applied to len() results.

This is a follow-up for eddeccfcfb
2025-02-17 15:36:36 +01:00
Roman Khavronenko
c1861bdf8b
bump golangci-lint to v1.64.4
See https://github.com/golangci/golangci-lint/releases/tag/v1.64.4

* address linting errors

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-02-13 11:18:09 +01:00
Aliaksandr Valialkin
59e9426068
lib/logstorage: attempt to use int64 bucketing before trying float64 bucketing at blockResult.getbucketedValue()
int64 bucketing is lossless and faster than float64 bucketing, so it is preferred over float64 bucketing
2025-02-13 00:02:20 +01:00
Aliaksandr Valialkin
7b38f7b5ef
lib/logstorage: refactor bucketing code
1. Use distinct code paths for blockResult.getValues() and blockResult.getValuesBucketed().
   This should simplify debugging and maintenance of the resulting code.

2. Do not load column values if all the values in the block fit the same bucket.
   Use blockResultColumn.minValue and blockResultColumn.maxValue for determining whether
   column values must be loaded via blockResultColumn.getValuesEncoded().
   This significantly improves performance for big buckets that cover all the column
   values in a block.

3. Properly calculate buckets for negative values.

4. Properly adjust weekly buckets by Monday.
2025-02-12 21:47:46 +01:00
Aliaksandr Valialkin
8d76c1c2c0
lib/logstorage: improve performance of stats by (...) bucketing a bit 2025-02-12 03:26:16 +01:00
Aliaksandr Valialkin
c6b3899c86
lib/logstorage/pipe_sort_topk.go: do not read _time field values if they aren't referred in the sort by(...)
This improves performance for queries that use `sort by (...) limit N` without mentioning the _time field.
For example, the following query should work faster now:

    _time:1d | rm _time | sort by (request_duration desc) limit 10

(cherry picked from commit 422caf6bd7)
2025-02-11 23:02:22 +01:00
Aliaksandr Valialkin
22591df851
lib/logstorage/block_result.go: remove misleading comment left after the commit eddeccfcfb
(cherry picked from commit 33c55d7a22)
2025-02-11 23:02:21 +01:00
Aliaksandr Valialkin
404901d7e8
lib/logstorage: optimize parsing timezone offset at TryParseTimestampRFC3339Nano()
- Add a fast path for timestamps ending with 'Z'
- Use strings.LastIndexAny instead of strings.IndexAny when searching
  for the timezone offset at the end of the string. This works faster
  for timestamps with sub-second precision.
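
For illustration:

    2025-02-11T23:02:21Z takes the new fast path for timestamps ending with 'Z'
    2025-02-11T23:02:21.123456789+01:00 requires locating the timezone offset at the end of the string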

(cherry picked from commit 335071cf3d)
2025-02-11 23:02:21 +01:00