Commit graph

336 commits

Author SHA1 Message Date
Aliaksandr Valialkin
7a46af3920
victorialogs: add cluster mode
Cluster mode is enabled when -storageNode command-line flag is passed to VictoriaLogs.
In this mode it spreads the ingested logs among storage nodes specified in the -storageNode flag.
It also queries storage nodes during `select` queries.

Cluster mode allows building multi-level cluster setup when top-level select node can query multiple lower-level clusters
and get global querying view.

See https://docs.victoriametrics.com/victorialogs/cluster/

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5077
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7950
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8223
2025-04-10 16:55:23 +02:00
Aliaksandr Valialkin
adae788b18
lib/logstorage: pad pipeStatsProcessorShard.groupMapShards in order to avoid false sharing when merging these shards in parallel on many CPU cores 2025-04-03 22:21:18 +02:00
Aliaksandr Valialkin
a65d10fcce
lib/logstorage: add padding between hitsMap items at hitsMapAdaptive.shards in order to avoid false sharing when processing the hitsMapAdaptive.shards on multiple CPU cores 2025-04-03 20:14:20 +02:00
Dan Dascalescu
0a49d8c930
chore: minor grammar fix in error messages ()
### Describe Your Changes

`its'` -> `its`

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-03-27 10:21:52 +01:00
Aliaksandr Valialkin
baf701889a
lib/logstorage: typo fix in the comment to Storage.GetStreamFieldValues() function 2025-03-19 13:22:16 +01:00
Aliaksandr Valialkin
ea4534d154
lib/logstorage: support for {field in (*)} and {field not_in (*)} syntax in LogsQL
This is needed for https://github.com/VictoriaMetrics/victorialogs-datasource/issues/238
to be consistent with `in(*)` feature, which has been added in the commit 84d5771b41
2025-03-19 12:09:55 +01:00
Guillem Jover
76d205feae
spelling and grammar fixes via codespell ()
### Describe Your Changes

Fix many spelling errors and some grammar, including misspellings in
filenames. 

The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`.
While this is a breaking change, this metric isn't used in alerts or dashboards. 
So it seems to have low impact on users.

The change also deprecates `cspell` as it is much heavier and less usable. 
---------

Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
2025-03-17 16:32:10 +01:00
Aliaksandr Valialkin
336f954056
lib/logstorage: switch the type of LogRows.streamTagCanonicals from [][]byte to []string
This reduces the size of LogRows.streamTagCanonicals by 1/3 because of the eliminated `cap` field
in the slice header (reflect.SliceHeader) compared to the string header (reflect.StringHeader).
2025-03-17 15:02:51 +01:00
Aliaksandr Valialkin
13ff9a8ebd
lib/{mergeset,storage,logstorage}: use chunked buffer instead of bytesutil.ByteBuffer as a storage for in-memory parts
This commit adds lib/chunkedbuffer.Buffer - an in-memory chunked buffer
optimized for random access via MustReadAt() function.
It is better than bytesutil.ByteBuffer for storing large volumes of data,
since it stores the data in chunks of a fixed size (4KiB at the moment)
instead of using a contiguous memory region. This has the following benefits over bytesutil.ByteBuffer:

- reduced memory fragmentation
- reduced memory re-allocations when new data is written to the buffer
- reduced memory usage, since the allocated chunks can be re-used
  by other Buffer instances after Buffer.Reset() call

Performance tests show up to 2x memory reduction for VictoriaLogs
when ingesting logs with big number of fields (aka wide events) under high speed.
2025-03-15 20:58:33 +01:00
Aliaksandr Valialkin
73aae546e0
lib/logstorage: pre-allocate buffers for fields and rows inside block.appendRowsTo()
This reduces the number of memory re-allocations inside the loop, which copies the rows.
2025-03-15 17:18:45 +01:00
Aliaksandr Valialkin
174a6db19f
lib/logstorage: pre-allocated buffers for fields and rows inside rows.appendRows()
This should reduce the number of memory re-allocations inside the loop, which copies the rows.
2025-03-15 16:39:19 +01:00
Aliaksandr Valialkin
0e413a7efb
lib/logstorage: pre-allocate the buffer needed for marshaling a block of strings inside marshalStringsBlock
This reduces the number of memory re-allocations when appending the strings to the buffer in the loop.
2025-03-15 15:56:33 +01:00
Aliaksandr Valialkin
9769ad3a24
lib/logstorage: optimize copying dict values inside valuesDict.copyFrom a bit
Pre-allocate the needed slice of strings and then assign items to it by index
instead of appending them. This reduces the number of memory allocations
and improves performance a bit.
2025-03-15 15:32:21 +01:00
Aliaksandr Valialkin
8e773564b1
lib/logstorage: intern column names instead of cloning them during data ingestion
This reduces the number of memory allocations when ingesting logs with big number of fields (aka wide events)
2025-03-15 15:29:54 +01:00
Aliaksandr Valialkin
2c7dd2b991
lib/logstorage: support for {label in (v1,...,vN)} and {label not_in (v1, ..., vN)} syntax 2025-03-15 01:35:13 +01:00
Aliaksandr Valialkin
c60b4175bb
app/vlinsert: add an ability to ignore log fields starting with the given prefixes
The `ignore_fields` HTTTP query args can contain prefixes ending with '*'.
For example, `ignore_fields=foo.*,bar` skips all the fields starting with `foo.`
during data ingestion.
2025-03-15 00:03:02 +01:00
Aliaksandr Valialkin
f874a3aa7b
lib/logstorage: show a link to query options docs in the error message emitted during failure to parse query options
This should help figuring out and fixing the error by the user.
2025-03-15 00:03:02 +01:00
Aliaksandr Valialkin
8c079602c1
lib/logstorage: optimize handling long constant fields
Long constant fields cannot be stored in columnsHeader as a const column,
because their size exceeds maxConstColumnValueSize, so they are stored as regular values.
This commit optimizes storing such fields by storing only a single value
across the field values in a block instead of storing multiple values.
This should improve data ingestion performance a bit. This also should improve query
performance when the query accesses such fields because of better cache locality.

Also improve persisting of constant string lengths by storing them only once.
2025-03-14 03:14:01 +01:00
Aliaksandr Valialkin
974d504043
lib/logstorage: add a test for marshalUint64Block / unmarshalUint64Block 2025-03-14 03:14:00 +01:00
Aliaksandr Valialkin
c62ccf11ae
lib/logstorage: newTestLogRows: create a const column, which cannot be stored in the column header because its length exceeds maxConstColumnValueSize 2025-03-14 03:14:00 +01:00
Aliaksandr Valialkin
4d44c3e154
lib/logstorage: properly parse floating-point numbers with leading zeroes in fractional part
Parsing for floating-point numbers with leading zeroes such as 1.023, 1.00234 has been broken
in the commit ae5e28524e .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8464
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8361
2025-03-12 15:25:58 +01:00
Roman Khavronenko
63f6ac3ff8
lib/promutils: move time-related funcs from promutils to timeutil ()
Since funcs `ParseDuration` and `ParseTimeMsec` are used in vlogs,
vmalert, victoriametrics and other components, importing promutils only
for this reason makes them to export irrelevant
`vm_rows_invalid_total{type="prometheus"}` metric.

This change removes `vm_rows_invalid_total{type="prometheus"}` metric
from /metrics page for these components.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-03 10:25:42 +01:00
Aliaksandr Valialkin
744ac496bd
lib/logstorage: add ability to specify field name prefixes inside fields (...) lists passed to pack_json and pack_logfmt pipes 2025-02-27 22:54:18 +01:00
Aliaksandr Valialkin
84d5771b41
lib/logstorage: allow passing * at in(*), contains_any(*) and contains_all(*)
Such filters are equivalent to `match all` filter aka `*`. These filters are needed for VictoriaLogs plugin for Grafana.

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/238#issuecomment-2685447673
2025-02-27 11:37:43 +01:00
Aliaksandr Valialkin
ae5e28524e
lib/logstorage: do not treat a string with leading zeros as a number at tryParseUint64
The "00123" string shouldn't be treated as 123 number.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8361
2025-02-25 22:10:36 +01:00
Aliaksandr Valialkin
1f75e5bb59
lib/logstorage: optimize common regex filters generated by Grafana
For example, `field:~".+"`, `field:~".*"` or `field:""`

Replace such filters to faster ones. For example, `field:~".*"` is replaced with `*`,
while `field:~".+"` is replaced with `field:*`.
2025-02-25 20:34:28 +01:00
Aliaksandr Valialkin
82cdcec6c6
lib/logstorage: run make fmt after 30974e7f3f 2025-02-25 18:37:40 +01:00
Aliaksandr Valialkin
30974e7f3f
lib/logstorage: add le_field and lt_field filters
These filters can be used for selecting logs where one field value is less than another field value.
These filter complement `<=` and `<` filters for constant literals.
2025-02-25 18:24:50 +01:00
Aliaksandr Valialkin
edc750dd55
lib/logstorage: optimize eq_filter when it is applied to fields of the same type 2025-02-25 18:24:15 +01:00
Aliaksandr Valialkin
bc69d5f1a4
lib/mergeset: explicitly pass the interval for flushing in-memory data to disk at MustOpenTable()
This allows using different intervals for flushing in-memory data among different mergeset.Table instances.

The initial user of this feature is lib/logstorage.Storage, which explicitly passes Storage.flushInterval
to every created mereset.Table instance. Previously mergeset.Table instances were using 5 seconds
flush interval, which didn't depend on the Storage.flushInterval.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2025-02-24 15:23:50 +01:00
Aliaksandr Valialkin
6764a03fbe
lib/logstorage: properly use datadb.flushInterval as an interval between flushes for the in-memory parts
The dataFlushInterval variable has been mistakenly introduced in the commit 9dbd0f9085

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2025-02-24 15:06:36 +01:00
Aliaksandr Valialkin
2857188939
lib/logstorage: limit the maximum log field name length, which can be generated by JSONParser.ParseLogMessage
Make sure that the maximum log field name, which can be generated by JSONParser.ParseLogMessage,
doesn't exceed the hardcoded limit maxFieldNameSize. Stop flattening of nested JSON objects
when the resulting field name becomes longer than maxFieldNameSize, and return the nested JSON object
as a string instead.

This should prevent from parse errors when ingesting deeply nested JSON logs with long field names.
2025-02-24 12:55:24 +01:00
Aliaksandr Valialkin
c0b9732fc8
lib/logstorage: add a benchmark for JSONParser.ParseLogMessage 2025-02-24 12:50:49 +01:00
Aliaksandr Valialkin
f1eac36a80
lib/logstorage: add contains_any and contains_all filters
- `contains_any` selects logs with fields containing at least one word/phrase from the provided list.
  The provided list can be generated by a subquery.

- `contains_all` selects logs with fields containing all the words and phrases from the provided list.
  The provided list can be generated by a subquery.
2025-02-22 21:55:58 +01:00
Aliaksandr Valialkin
4275653f03
lib/logstorage: do not spend CPU time on preparing values for already filtered out rows according to bm at filterEqField.applyToBlockSearch 2025-02-22 21:55:58 +01:00
Aliaksandr Valialkin
1ffd5e9b69
lib/logstorage: avoid extra memory allocations at getEmptyStrings() 2025-02-22 21:55:57 +01:00
Aliaksandr Valialkin
c372e10937
lib/logstorage: add an ability to drop duplicate words at unpack_words pipe 2025-02-22 21:55:57 +01:00
Aliaksandr Valialkin
7da98b540b
lib/logstorage: rename unpack_tokens to unpack_words pipe
The LogsQL defines a word at https://docs.victoriametrics.com/victorialogs/logsql/#word ,
so it is more natural to use unpack_words instead of unpack_tokens name for the pipe.
2025-02-22 21:55:56 +01:00
Aliaksandr Valialkin
387d0369da
lib/logstorage: optimize OR filter a bit for many inner filters
Use two operations on bitmaps per each inner filter instead of three operations.
2025-02-22 21:55:56 +01:00
Aliaksandr Valialkin
69f02f83ae
lib/logstorage: use clear() for clearing bitmap bits at resetBits() instead of a loop
The clear() call is easier to read and understand than the loop.
2025-02-22 21:55:55 +01:00
Aliaksandr Valialkin
3c1d738196
lib/logstorage: avoid calling bitmap.reset() at getBitmap()
The bitmap at getBitamp() must be already reset when it was returned to the pool via putBitamp().
Thise saves CPU a bit.
2025-02-22 21:55:55 +01:00
Aliaksandr Valialkin
35e1c35281
lib/logstorage: improve error logging for improperly escaped backslashes inside quoted strings
This should simplify debugging LogsQL queries by users
2025-02-22 21:55:54 +01:00
Aliaksandr Valialkin
dfcfaba374
lib/logstorage: add field1:eq_field(field2) filter, which returns logs with identical values at field1 and field2 2025-02-22 21:55:54 +01:00
Aliaksandr Valialkin
cd5b24b377
lib/logstorage: optimize len, hash and json_array_len pipes for repeated values
Re-use the previous result instead of calculating new result for repated input values
2025-02-22 21:55:54 +01:00
Aliaksandr Valialkin
d33e24ab9b
lib/logstorage: add json_array_len pipe for calculating the length of JSON arrays 2025-02-22 21:55:53 +01:00
Aliaksandr Valialkin
cd73c1bafb
lib/logstorage: refactor unroll_tokens into unpack_tokens pipe
unpack_tokens pipe generates a JSON array of unpacked tokens from the source field.
This composes better with other pipes such as unroll pipe.
2025-02-22 21:55:53 +01:00
Aliaksandr Valialkin
d32c697361
lib/logstorage: add unroll_tokens pipe for unrolling individual word tokens from the log field 2025-02-22 21:55:52 +01:00
Aliaksandr Valialkin
1ea3f72d50
lib/logstorage: simplify usage of top, uniq and unroll pipes by allowing comma-separated list of fields without parens
Examples:

   - `top 5 x, y` is equivalent to `top 5 by (x, y)`
   - `uniq foo, bar` is equivalent to `uniq by (foo, bar)`
   - `unroll foo, bar` is equivalent to `unroll (foo, bar)`
2025-02-20 22:36:09 +01:00
Aliaksandr Valialkin
31e88a692d
lib/logstorage: properly handle _time:<=max_time filter
_time:<=max_time filter must include logs with timestamps matching max_time.
For example, _time:<=2025-02-24Z must include logs with timestamps until the end of February 24, 2025.
2025-02-20 19:15:37 +01:00
Aliaksandr Valialkin
ffbd0ebbae
lib/logstorage: allow using '>', '>=', '<' and '<=' in '_time:...' filter
Examples:

  _time:>=2025-02-24Z selects logs with timestamps bigger or equal to 2025-02-24 UTC
  _time:>1d selects logs with timestamps older than one day comparing to the current time

This simplifies writing queries with _time filters.
See https://docs.victoriametrics.com/victorialogs/logsql/#time-filter
2025-02-20 19:04:51 +01:00