Commit graph

2970 commits

Zakhar Bessarab
04b6939c34
lib/promrelabel/scrape_url: properly parse IPv6 address from __address__ label
Fix parsing of IPv6 addresses after discovery. Previously, it could lead
to a target being discovered and then discarded.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8374

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit 99de272b72)
2025-02-28 14:20:24 +04:00
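For context, a minimal Go sketch of the bracket-aware parsing such a fix requires (`normalizeAddress` and the default port are illustrative, not the actual lib/promrelabel code): `net.SplitHostPort` only splits IPv6 hosts written in brackets, and `net.JoinHostPort` restores the brackets.

```
package main

import (
	"fmt"
	"net"
)

// normalizeAddress is an illustrative helper, not the actual
// lib/promrelabel code: IPv6 hosts must keep their brackets so that
// host and port can be separated unambiguously.
func normalizeAddress(addr, defaultPort string) string {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		// No port (e.g. a bare IPv6 address): treat the whole string
		// as the host and attach the default port.
		host, port = addr, defaultPort
	}
	// net.JoinHostPort re-adds brackets around IPv6 hosts.
	return net.JoinHostPort(host, port)
}

func main() {
	fmt.Println(normalizeAddress("[2001:db8::1]:9100", "80")) // [2001:db8::1]:9100
	fmt.Println(normalizeAddress("2001:db8::1", "80"))        // [2001:db8::1]:80
	fmt.Println(normalizeAddress("10.0.0.1:9100", "80"))      // 10.0.0.1:9100
}
```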
Aliaksandr Valialkin
c8a12435ec
lib/logstorage: add ability to specify field name prefixes inside fields (...) lists passed to pack_json and pack_logfmt pipes 2025-02-27 22:56:14 +01:00
Roman Khavronenko
3ec0247ee3
lib/prompbmarshal: move MustParsePromMetrics to protoparser/prometheus (#8405)
`MustParsePromMetrics` imports `lib/protoparser/prometheus`, and this
package exposes the following metrics:
```
vm_protoparser_rows_read_total{type="promscrape"}
vm_rows_invalid_total{type="prometheus"}
```

It means every package that uses `lib/prompbmarshal` will start exposing
these metrics. For example, vlogs imports `lib/protoparser/common` which
uses `lib/prompbmarshal.Label`. Only because of this, vlogs starts
exposing unrelated Prometheus metrics on its /metrics page.

Moving `MustParsePromMetrics` to `lib/protoparser/prometheus` seems like
the least intrusive change.


-----------

Depends on another change
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8403

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-02-27 22:55:32 +01:00
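The mechanism behind this move, sketched below (a simplified illustration, not the actual package): in Go, importing a package runs its package-level initializers, so metrics registered in a `var` block show up on /metrics of every binary that transitively imports the package.

```
// Illustration of the import side effect: these counters are created at
// package initialization, before any function from this package is called,
// so any importer exposes them on its /metrics page.
package prometheus

import "github.com/VictoriaMetrics/metrics"

var (
	rowsRead    = metrics.NewCounter(`vm_protoparser_rows_read_total{type="promscrape"}`)
	rowsInvalid = metrics.NewCounter(`vm_rows_invalid_total{type="prometheus"}`)
)
```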
Aliaksandr Valialkin
a1aa4b7aa9
lib/logstorage: allow passing * at in(*), contains_any(*) and contains_all(*)
Such filters are equivalent to the `match all` filter aka `*`. These filters are needed for the VictoriaLogs plugin for Grafana.

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/238#issuecomment-2685447673
2025-02-27 11:41:39 +01:00
Zhu Jiekun
6631899ead
lib/storage: properly cache extDB metricsID on search error
Previously, if an indexDB search failed for some reason while searching the previous indexDB (aka extDB), VictoriaMetrics stored an empty search result in the cache. This could cause incorrect search results for subsequent requests.

This commit checks the search error and stores request results only on success.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8345
2025-02-26 16:07:48 +01:00
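A minimal sketch of the fixed pattern (names are illustrative, not the actual VictoriaMetrics API): the cache entry is written only when the search succeeded, so a transient error can no longer poison the cache with an empty result.

```
package main

import "fmt"

// cache maps a search key to the found metric IDs.
var cache = map[string][]uint64{}

// searchCached stores results only on success, so a transient search
// error no longer poisons the cache with an empty result.
func searchCached(key string, search func() ([]uint64, error)) ([]uint64, error) {
	if ids, ok := cache[key]; ok {
		return ids, nil
	}
	ids, err := search()
	if err != nil {
		// Do NOT cache on error - previously an empty result landed here.
		return nil, err
	}
	cache[key] = ids
	return ids, nil
}

func main() {
	_, err := searchCached("k", func() ([]uint64, error) { return nil, fmt.Errorf("extDB unavailable") })
	fmt.Println(err, len(cache)) // extDB unavailable 0 - nothing was cached
}
```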
Aliaksandr Valialkin
a3ff49def0
lib/logstorage: do not treat a string with leading zeros as a number at tryParseUint64
The "00123" string shouldn't be treated as 123 number.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8361
2025-02-26 16:07:47 +01:00
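A sketch of the described behavior (overflow checks omitted for brevity; not the actual lib/logstorage code): strings with leading zeros such as "00123" are rejected, since the zeros may be significant, e.g. in identifiers.

```
package main

import "fmt"

// tryParseUint64 rejects strings with leading zeros instead of
// silently treating "00123" as 123.
func tryParseUint64(s string) (uint64, bool) {
	if len(s) == 0 || (len(s) > 1 && s[0] == '0') {
		return 0, false
	}
	var n uint64
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c < '0' || c > '9' {
			return 0, false
		}
		n = n*10 + uint64(c-'0')
	}
	return n, true
}

func main() {
	fmt.Println(tryParseUint64("123"))   // 123 true
	fmt.Println(tryParseUint64("00123")) // 0 false
	fmt.Println(tryParseUint64("0"))     // 0 true
}
```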
Aliaksandr Valialkin
dd1c0e3bb7
lib/logstorage: optimize common regex filters generated by Grafana
For example, `field:~".+"`, `field:~".*"` or `field:""`

Replace such filters with faster ones. For example, `field:~".*"` is replaced with `*`,
while `field:~".+"` is replaced with `field:*`.
2025-02-25 20:35:04 +01:00
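The rewrite rules, sketched on plain strings for illustration (the real optimizer works on parsed filter nodes, not query text):

```
package main

import (
	"fmt"
	"strconv"
)

// optimizeRegexFilter maps trivially-matching regexes to cheaper
// native filters, per the rules described in the commit.
func optimizeRegexFilter(field, expr string) string {
	switch expr {
	case ".*":
		return "*" // matches any log entry, even without the field
	case ".+":
		return field + ":*" // matches any non-empty value of the field
	default:
		return field + ":~" + strconv.Quote(expr)
	}
}

func main() {
	fmt.Println(optimizeRegexFilter("field", ".*")) // *
	fmt.Println(optimizeRegexFilter("field", ".+")) // field:*
}
```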
Aliaksandr Valialkin
e36e28a2b0
lib/regexutil: speed up Regex.MatchString for ".*" 2025-02-25 20:35:03 +01:00
Aliaksandr Valialkin
14a5ccdc83
lib/logstorage: run make fmt after 30974e7f3f
(cherry picked from commit 82cdcec6c6)
2025-02-25 19:13:31 +01:00
Aliaksandr Valialkin
9e0581533c
lib/logstorage: add le_field and lt_field filters
These filters can be used for selecting logs where one field value is less than another field value.
These filters complement the `<=` and `<` filters for constant literals.

(cherry picked from commit 30974e7f3f)
2025-02-25 19:13:31 +01:00
Aliaksandr Valialkin
3bc89226bb
lib/logstorage: optimize eq_filter when it is applied to fields of the same type
(cherry picked from commit edc750dd55)
2025-02-25 19:13:30 +01:00
Aliaksandr Valialkin
dc09d0bff4
lib/mergeset: explicitly pass the interval for flushing in-memory data to disk at MustOpenTable()
This allows using different intervals for flushing in-memory data among different mergeset.Table instances.

The initial user of this feature is lib/logstorage.Storage, which explicitly passes Storage.flushInterval
to every created mergeset.Table instance. Previously mergeset.Table instances used a 5-second
flush interval, which didn't depend on the Storage.flushInterval.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2025-02-24 15:34:59 +01:00
Aliaksandr Valialkin
a964cc7a0c
lib/logstorage: properly use datadb.flushInterval as an interval between flushes for the in-memory parts
The dataFlushInterval variable was mistakenly introduced in commit 9dbd0f9085

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2025-02-24 15:34:59 +01:00
Aliaksandr Valialkin
d56f9327ec
lib/logstorage: limit the maximum log field name length, which can be generated by JSONParser.ParseLogMessage
Make sure that the maximum log field name, which can be generated by JSONParser.ParseLogMessage,
doesn't exceed the hardcoded limit maxFieldNameSize. Stop flattening of nested JSON objects
when the resulting field name becomes longer than maxFieldNameSize, and return the nested JSON object
as a string instead.

This should prevent parse errors when ingesting deeply nested JSON logs with long field names.
2025-02-24 15:34:59 +01:00
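A sketch of the described flattening cutoff (the limit value and helper are illustrative; the real limit is hardcoded in lib/logstorage): nested objects are flattened into dot-separated names until the name would exceed the limit, after which the nested object is stored as a JSON string.

```
package main

import (
	"encoding/json"
	"fmt"
)

const maxFieldNameSize = 128 // illustrative value

// flatten stops descending once the generated field name exceeds
// maxFieldNameSize and stores the nested object as a string instead.
func flatten(prefix string, v map[string]any, out map[string]string) {
	for k, val := range v {
		name := k
		if prefix != "" {
			name = prefix + "." + k
		}
		child, isObj := val.(map[string]any)
		if isObj && len(name) <= maxFieldNameSize {
			flatten(name, child, out)
			continue
		}
		b, _ := json.Marshal(val)
		out[name] = string(b)
	}
}

func main() {
	var m map[string]any
	json.Unmarshal([]byte(`{"a":{"b":{"c":1}}}`), &m)
	out := map[string]string{}
	flatten("", m, out)
	fmt.Println(out) // map[a.b.c:1]
}
```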
Aliaksandr Valialkin
dc536d5626
lib/logstorage: add a benchmark for JSONParser.ParseLogMessage 2025-02-24 15:34:58 +01:00
Aliaksandr Valialkin
0d3ee707ba
lib/encoding/zstd: reduce the number of cached zstd.Encoder instances
Use the real compression level supported by github.com/klauspost/compress/zstd as a cache map key.
The number of real compression levels is smaller than the number of zstd compression levels.
This should reduce the number of cached zstd.Encoder instances.

See https://github.com/klauspost/compress/discussions/1025
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7503#issuecomment-2500088591
2025-02-24 15:34:58 +01:00
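A sketch of the caching scheme under the stated idea (function and variable names are illustrative, not the actual lib/encoding/zstd code): keying the cache by `zstd.EncoderLevelFromZstd` collapses the 22 zstd levels onto the few internal levels actually supported by github.com/klauspost/compress/zstd.

```
package zstd

import (
	"sync"

	"github.com/klauspost/compress/zstd"
)

var (
	encodersMu sync.Mutex
	// Keyed by the *real* encoder level: distinct zstd levels often map
	// to the same internal level and thus share one cached encoder.
	encoders = map[zstd.EncoderLevel]*zstd.Encoder{}
)

// getEncoder returns a cached encoder for the given zstd compression level.
func getEncoder(compressionLevel int) *zstd.Encoder {
	level := zstd.EncoderLevelFromZstd(compressionLevel)
	encodersMu.Lock()
	defer encodersMu.Unlock()
	e := encoders[level]
	if e == nil {
		// A nil writer is fine when the encoder is used via EncodeAll.
		e, _ = zstd.NewWriter(nil, zstd.WithEncoderLevel(level))
		encoders[level] = e
	}
	return e
}
```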
Aliaksandr Valialkin
3ee4b3ef24
lib/logstorage: add contains_any and contains_all filters
- `contains_any` selects logs with fields containing at least one word/phrase from the provided list.
  The provided list can be generated by a subquery.

- `contains_all` selects logs with fields containing all the words and phrases from the provided list.
  The provided list can be generated by a subquery.
2025-02-24 15:34:58 +01:00
Aliaksandr Valialkin
3e941920f6
lib/logstorage: do not spend CPU time on preparing values for already filtered out rows according to bm at filterEqField.applyToBlockSearch 2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
6975352d5a
lib/logstorage: avoid extra memory allocations at getEmptyStrings() 2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
a2d0846e86
lib/logstorage: add an ability to drop duplicate words at unpack_words pipe 2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
518ed87a3a
lib/logstorage: rename unpack_tokens to unpack_words pipe
LogsQL defines a word at https://docs.victoriametrics.com/victorialogs/logsql/#word ,
so it is more natural to name the pipe unpack_words instead of unpack_tokens.
2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
4beceb67ab
lib/logstorage: optimize OR filter a bit for many inner filters
Use two bitmap operations per inner filter instead of three.
2025-02-24 15:34:57 +01:00
Aliaksandr Valialkin
bff5551ba5
lib/logstorage: use clear() for clearing bitmap bits at resetBits() instead of a loop
The clear() call is easier to read and understand than the loop.
2025-02-24 15:34:56 +01:00
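The loop-vs-clear difference, in a runnable sketch (the `clear` builtin is available since Go 1.21 and zeroes all slice elements):

```
package main

import "fmt"

// resetBits zeroes all words of a bitmap. clear() replaces the explicit
// loop and compiles down to an optimized memory clear.
func resetBits(a []uint64) {
	clear(a) // previously: for i := range a { a[i] = 0 }
}

func main() {
	a := []uint64{1, 2, 3}
	resetBits(a)
	fmt.Println(a) // [0 0 0]
}
```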
Aliaksandr Valialkin
4dfd1407ba
lib/logstorage: avoid calling bitmap.reset() at getBitmap()
The bitmap at getBitmap() must already be reset, since it was reset when returned to the pool via putBitmap().
This saves a bit of CPU.
2025-02-24 15:34:56 +01:00
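The invariant behind this change, sketched with sync.Pool (types and names are illustrative): reset happens exactly once, on the way back into the pool, so get can skip the redundant reset.

```
package main

import (
	"fmt"
	"sync"
)

type bitmap struct{ a []uint64 }

func (bm *bitmap) reset() { clear(bm.a); bm.a = bm.a[:0] }

var bitmapPool = sync.Pool{New: func() any { return &bitmap{} }}

// getBitmap relies on the invariant that pooled bitmaps are already reset.
func getBitmap() *bitmap { return bitmapPool.Get().(*bitmap) }

// putBitmap resets exactly once, on the way back to the pool.
func putBitmap(bm *bitmap) {
	bm.reset()
	bitmapPool.Put(bm)
}

func main() {
	bm := getBitmap()
	bm.a = append(bm.a, 42)
	putBitmap(bm)
	fmt.Println(len(getBitmap().a)) // 0
}
```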
Aliaksandr Valialkin
bc3e557f02
lib/logstorage: improve error logging for improperly escaped backslashes inside quoted strings
This should simplify debugging LogsQL queries by users
2025-02-24 15:34:56 +01:00
Aliaksandr Valialkin
1f11bc948e
lib/logstorage: add field1:eq_field(field2) filter, which returns logs with identical values at field1 and field2 2025-02-24 15:34:56 +01:00
Aliaksandr Valialkin
504c034cbf
lib/logstorage: optimize len, hash and json_array_len pipes for repeated values
Re-use the previous result instead of calculating a new result for repeated input values.
2025-02-24 15:34:56 +01:00
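The repeated-value optimization, in a minimal sketch (the per-value work here stands in for the real pipe logic): column values in a block often repeat, so the result computed for the previous value can be reused.

```
package main

import "fmt"

// applyLen reuses the previous result when the input equals the
// previous input, skipping the per-value work for repeated values.
func applyLen(values []string) []int {
	results := make([]int, len(values))
	for i, v := range values {
		if i > 0 && v == values[i-1] {
			results[i] = results[i-1] // re-use instead of recomputing
			continue
		}
		results[i] = len(v) // stand-in for the real per-value work
	}
	return results
}

func main() {
	fmt.Println(applyLen([]string{"foo", "foo", "foo", "x"})) // [3 3 3 1]
}
```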
Aliaksandr Valialkin
959282090a
lib/logstorage: add json_array_len pipe for calculating the length of JSON arrays 2025-02-24 15:34:56 +01:00
Aliaksandr Valialkin
aef939dc20
lib/logstorage: refactor unroll_tokens into unpack_tokens pipe
unpack_tokens pipe generates a JSON array of unpacked tokens from the source field.
This composes better with other pipes such as the unroll pipe.
2025-02-24 15:34:55 +01:00
Aliaksandr Valialkin
afd74d82db
lib/logstorage: add unroll_tokens pipe for unrolling individual word tokens from the log field 2025-02-24 15:34:55 +01:00
Aliaksandr Valialkin
2dfd6bb689
lib/logstorage: simplify usage of top, uniq and unroll pipes by allowing comma-separated list of fields without parens
Examples:

   - `top 5 x, y` is equivalent to `top 5 by (x, y)`
   - `uniq foo, bar` is equivalent to `uniq by (foo, bar)`
   - `unroll foo, bar` is equivalent to `unroll (foo, bar)`
2025-02-21 12:43:26 +01:00
Aliaksandr Valialkin
061fd098b5
lib/logstorage: properly handle _time:<=max_time filter
_time:<=max_time filter must include logs with timestamps matching max_time.
For example, _time:<=2025-02-24Z must include logs with timestamps until the end of February 24, 2025.
2025-02-21 12:43:26 +01:00
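The inclusive-upper-bound arithmetic, sketched (a hypothetical helper, not the actual lib/logstorage code): for `_time:<=2025-02-24Z` the effective maximum timestamp must be the very end of February 24, not its beginning.

```
package main

import (
	"fmt"
	"time"
)

// inclusiveMaxTime returns the end of the given day: the start of the
// next day minus one nanosecond.
func inclusiveMaxTime(d time.Time) time.Time {
	return d.AddDate(0, 0, 1).Add(-time.Nanosecond)
}

func main() {
	day := time.Date(2025, 2, 24, 0, 0, 0, 0, time.UTC)
	fmt.Println(inclusiveMaxTime(day)) // 2025-02-24 23:59:59.999999999 +0000 UTC
}
```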
Aliaksandr Valialkin
80d173471f
lib/logstorage: allow using '>', '>=', '<' and '<=' in '_time:...' filter
Examples:

  _time:>=2025-02-24Z selects logs with timestamps greater than or equal to 2025-02-24 UTC
  _time:>1d selects logs with timestamps older than one day compared to the current time

This simplifies writing queries with _time filters.
See https://docs.victoriametrics.com/victorialogs/logsql/#time-filter
2025-02-21 12:43:26 +01:00
Hui Wang
93bbe10074
app/vmselect: add query resource limits priority
This commit adds support for overriding vmstorage `maxUniqueTimeseries` with specific
resource limits:
1. `-search.maxLabelsAPISeries` for
[/api/v1/labels](https://docs.victoriametrics.com/url-examples/#apiv1labels),
[/api/v1/label/.../values](https://docs.victoriametrics.com/url-examples/#apiv1labelvalues)
2. `-search.maxSeries` for
[/api/v1/series](https://docs.victoriametrics.com/url-examples/#apiv1series)
3. `-search.maxTSDBStatusSeries` for
[/api/v1/status/tsdb](https://docs.victoriametrics.com/#tsdb-stats)
4. `-search.maxDeleteSeries` for
[/api/v1/admin/tsdb/delete_series](https://docs.victoriametrics.com/url-examples/#apiv1admintsdbdelete_series)

Currently, this limit priority logic cannot be applied to flags
`-search.maxFederateSeries` and `-search.maxExportSeries`, because they
share the same RPC `search_v7` with the /api/v1/query and
/api/v1/query_range APIs, preventing vmstorage from identifying the
actual API of the request. To address that, we need to add additional
information to the protocol between vmstorage and vmselect, which should
be introduced in the future when possible.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857
2025-02-19 18:14:54 +01:00
Andrii Chubatiuk
94bf90842a
app/vlinsert/syslog: properly parse log line with characters escaped by rfc5424
Inside PARAM-VALUE, the characters '"' (ABNF %d34), '\' (ABNF %d92),
and ']' (ABNF %d93) MUST be escaped.  This is necessary to avoid
parsing errors.  Escaping ']' would not strictly be necessary but is
REQUIRED by this specification to avoid syslog application
implementation errors.  Each of these three characters MUST be
escaped as '\"', '\\', and '\]' respectively.  The backslash is used
for control character escaping for consistency with its use for
escaping in other parts of the syslog message as well as in traditional syslog.

 Related RFC:
https://datatracker.ietf.org/doc/html/rfc5424#section-6.3.3

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8282
2025-02-19 18:12:40 +01:00
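A sketch of the PARAM-VALUE unescaping per RFC 5424 section 6.3.3 (the helper name is illustrative, not the actual app/vlinsert/syslog code): `\"`, `\\` and `\]` are turned back into `"`, `\` and `]`.

```
package main

import (
	"fmt"
	"strings"
)

// unescapeParamValue reverses the three escapes required by RFC 5424
// inside PARAM-VALUE; other backslashes pass through unchanged.
func unescapeParamValue(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) {
			switch s[i+1] {
			case '"', '\\', ']':
				i++ // drop the backslash, keep the escaped character
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	fmt.Println(unescapeParamValue(`say \"hi\" \\ \]`)) // say "hi" \ ]
}
```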
Andrii Chubatiuk
99de7456c3
lib/protoparser/influx: add -influx.forceStreamMode flag to force parsing all Influx data in stream mode (#8319)
Addresses #8269

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-02-19 17:40:32 +01:00
Andrii Chubatiuk
a041488786
lib/streamaggr: added aggregation windows (#6314)
By default, stream aggregation and deduplication stores a single state
per each aggregation output result.
The data for each aggregator is flushed independently once per
aggregation interval. But there's no guarantee that
incoming samples with timestamps close to the aggregation interval's end
will get into it. For example, when aggregating
with `interval: 1m` a data sample with timestamp 1739473078 (18:57:59)
can fall into aggregation round `18:58:00` or `18:59:00`.
It depends on network lag, load, clock synchronization, etc. In most
scenarios it doesn't impact aggregation or
deduplication results, which are consistent within a margin of error. But
for metrics represented as a collection of series,
like
[histograms](https://docs.victoriametrics.com/keyconcepts/#histogram),
such inaccuracy leads to invalid aggregation results.

For this case, streaming aggregation and deduplication support a mode with
aggregation windows for the current and previous state. In this mode, the
flush doesn't happen immediately but is shifted by a calculated sample
lag, which improves correctness for delayed data.

Enabling this mode increases resource usage: memory usage is
expected to double, as aggregation will store two states
instead of one. However, this significantly improves the accuracy of
calculations. Aggregation windows can be enabled via
the following settings:

- `-streamAggr.enableWindows` at [single-node
VictoriaMetrics](https://docs.victoriametrics.com/single-server-victoriametrics/)
and [vmagent](https://docs.victoriametrics.com/vmagent/). At
[vmagent](https://docs.victoriametrics.com/vmagent/)
`-remoteWrite.streamAggr.enableWindows` flag can be specified
individually per each `-remoteWrite.url`.
If one of these flags is set, then all aggregators will be using fixed
windows. In conjunction with `-remoteWrite.streamAggr.dedupInterval` or
`-streamAggr.dedupInterval` fixed aggregation windows are enabled on
deduplicator as well.
- `enable_windows` option in [aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
  It allows enabling aggregation windows for a specific aggregator.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit c8fc903669)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-02-19 13:31:37 +01:00
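A purely conceptual sketch of the dual-window idea (this is not the streamaggr API; all names are illustrative): the aggregator keeps two states and routes each sample by timestamp, so a late sample near a boundary lands in the window it belongs to, at the cost of roughly doubled state.

```
package main

import "fmt"

// windowedAgg keeps a current and a previous aggregation state.
type windowedAgg struct {
	curStart  int64 // unix seconds of the current window start
	interval  int64
	cur, prev map[string]float64
}

// add routes a sample into the current or previous window by timestamp.
func (a *windowedAgg) add(key string, ts int64, v float64) {
	if ts < a.curStart {
		a.prev[key] += v // late sample: previous window
		return
	}
	a.cur[key] += v
}

// flushPrev flushes only the previous window; the current one keeps
// accumulating until the next interval boundary.
func (a *windowedAgg) flushPrev() map[string]float64 {
	out := a.prev
	a.prev, a.cur = a.cur, map[string]float64{}
	a.curStart += a.interval
	return out
}

func main() {
	a := &windowedAgg{curStart: 60, interval: 60, cur: map[string]float64{}, prev: map[string]float64{}}
	a.add("m", 59, 1) // late sample -> previous window
	a.add("m", 61, 1) // current window
	fmt.Println(a.flushPrev()) // map[m:1]
}
```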
hagen1778
bb302df170
lib/logstorage: adjust expected compression ratio in tests
A follow-up after 9bb5ba5d2f
that impacted the compression ratio for data compressed with the native Go zstd lib (`make test-pure`).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 38bded4e58)
2025-02-19 13:30:05 +01:00
Aliaksandr Valialkin
697b775a46
lib/logstorage: remove optimizations from LogRows.sortFieldsInRows
It turned out that these optimizations do not give measurable performance improvements,
while they complicate the code too much and may result in a slowdown when the ingested logs have
different sets of fields.

This is a follow-up for 630601488e

(cherry picked from commit dce5eb88d3)
2025-02-19 13:30:04 +01:00
Aliaksandr Valialkin
d0d9fb2818
lib/logstorage: return back the maximum number of files for log fields data from 256 to 128
It turned out that 256 files increase RAM usage too much compared to 128 files
when ingesting logs with hundreds of fields (aka wide events). So let's return to the 128 files
limit for now.

This is a follow-up for 9bb5ba5d2f

(cherry picked from commit a50ab10998)
2025-02-19 13:30:04 +01:00
Aliaksandr Valialkin
0a8d52376e
lib/bytesutil: drop ByteBuffer.B when its capacity is bigger than 64KB at Reset
There is little sense in keeping too big buffers - they just waste RAM and do not reduce
the load on GC much. So it is better to drop such buffers at Reset instead of keeping them around.

(cherry picked from commit b58e2ab214)
2025-02-19 13:30:03 +01:00
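The Reset policy, in a runnable sketch (a simplified stand-in for the real lib/bytesutil type): buffers that grew beyond 64KB are dropped so occasional huge values don't pin large allocations forever, while smaller buffers are kept for re-use.

```
package main

import "fmt"

const maxByteBufferCap = 64 * 1024

// ByteBuffer drops its underlying buffer at Reset once it has grown
// beyond 64KB; smaller buffers are truncated and kept for re-use.
type ByteBuffer struct{ B []byte }

func (bb *ByteBuffer) Reset() {
	if cap(bb.B) > maxByteBufferCap {
		bb.B = nil // let GC reclaim the oversized buffer
		return
	}
	bb.B = bb.B[:0]
}

func main() {
	bb := &ByteBuffer{B: make([]byte, 128*1024)}
	bb.Reset()
	fmt.Println(cap(bb.B)) // 0 - the big buffer was dropped
}
```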
Aliaksandr Valialkin
53849c95b7
lib/filestream: use smaller sizes for read buffers than for write buffers
The number of filestream readers is proportional to the number of parts to be merged,
while the number of filestream writers is proportional to the number of concurrent merges.
Usually around 4-16 parts are merged at once, so the number of active filestream readers is ~8x
bigger than the number of active filestream writers.

So it is a good idea to use smaller read buffers compared to write buffers.
Limit the read buffer size to 64KB, while the write buffer size is limited to 128KB.
This should reduce the overall memory usage when merging parts with a big number of files.
This is the case for VictoriaLogs, which works with logs containing hundreds of fields (aka wide events).

(cherry picked from commit 659251beaa)
2025-02-19 13:30:03 +01:00
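The sizing logic, illustrated with bufio (the real buffers live in lib/filestream, not bufio; the constants just mirror the commit): with ~8x more active readers than writers, total buffer memory is dominated by readers, so read buffers get the smaller cap.

```
package main

import (
	"bufio"
	"os"
)

const (
	maxReadBufferSize  = 64 * 1024  // many concurrent readers during merges
	maxWriteBufferSize = 128 * 1024 // few concurrent writers
)

func main() {
	r := bufio.NewReaderSize(os.Stdin, maxReadBufferSize)
	w := bufio.NewWriterSize(os.Stdout, maxWriteBufferSize)
	_ = r
	w.Flush()
}
```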
Aliaksandr Valialkin
a842114070
lib/logstorage: make sure that the data for every log field is stored in a separate file until the number of files is smaller than 256
This should improve query performance for logs with hundreds of fields (aka wide events).
Previously there was a high chance that the data for multiple log fields was stored in the same file.
This could result in query performance slowdown and/or increased disk read IO,
since the operating system could read unnecessary data for the fields, which aren't used in the query.

Now log fields are guaranteed to be stored in separate files until the number of fields exceeds 256.
After that multiple log fields start sharing files.

(cherry picked from commit 9bb5ba5d2f)
2025-02-19 13:30:02 +01:00
Aliaksandr Valialkin
0cd8591700
lib/filestream: reduce the maximum size of the buffered data per every stream from 512Kb to 256Kb
This reduces memory usage when many filestreams are processed simultaneously.
This is the case for VictoriaLogs when it processes logs with hundreds of fields.

(cherry picked from commit 2a681f2e8d)
2025-02-19 13:30:02 +01:00
Aliaksandr Valialkin
6a590de86f
lib/logstorage: optimize LogRows.mustAddInternal a bit
- Re-use column names and values from the previously added rows when possible.
  This increases locality of reference for field names and values and improves
  their access speed.

- Postpone sorting fields in the added rows until creating an in-memory part from them.
  This allows optimizing the sorting for rows with the same set of fields,
  which is usually the case for logs belonging to the same log stream.

(cherry picked from commit 630601488e)
2025-02-19 13:30:02 +01:00
Aliaksandr Valialkin
893241b280
lib/logstorage: log the path to metadata file on errors at partHeader.mustReadMetadata
This should simplify troubleshooting

(cherry picked from commit f4ca5d3b1a)
2025-02-19 13:30:01 +01:00
Aliaksandr Valialkin
00d8e7a373
lib/logstorage: allow calling visitSubqueries on nil Query
This makes the code that calls Query.visitSubqueries less error-prone.

(cherry picked from commit 910f307ca2)
2025-02-19 13:30:01 +01:00
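The nil-receiver pattern used here, in a minimal sketch (the Query type below is illustrative, not the actual lib/logstorage definition): Go allows calling methods on a nil pointer, so the method itself can absorb the nil check.

```
package main

import "fmt"

type Query struct{ subqueries []*Query }

// visitSubqueries tolerates a nil receiver, so callers don't need a
// nil check at every call site.
func (q *Query) visitSubqueries(visit func(*Query)) {
	if q == nil {
		return
	}
	visit(q)
	for _, sq := range q.subqueries {
		sq.visitSubqueries(visit)
	}
}

func main() {
	var q *Query // nil
	q.visitSubqueries(func(*Query) { fmt.Println("visited") }) // never called
	fmt.Println("no panic on nil Query")
}
```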
Aliaksandr Valialkin
3ba095a875
lib/logstorage: remove needExecuteQuery from filterIn and filterStreamID, since it isn't needed
(cherry picked from commit 6afd66dcc8)
2025-02-19 13:30:01 +01:00
Nikolay
46b66626c8
lib/httpserver: properly check basic authorization
Commit 68791f9ccc introduced a regression:
it performed the basicAuth check before built-in routes, which made it impossible
to bypass basic authorization with the `authKey` param.

This commit fixes that issue and removes the unneeded check. It also adds
integration tests for this case.

 Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7345

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-02-17 16:08:50 +01:00
Aliaksandr Valialkin
88363b46b5
lib/logstorage: consistently use Query.cloneShallow() for shallow cloning of the original query 2025-02-17 15:36:38 +01:00