Commit graph

147 commits

Author SHA1 Message Date
Guillem Jover
76d205feae
spelling and grammar fixes via codespell (#8497)
### Describe Your Changes

Fix many spelling errors and some grammar, including misspellings in
filenames. 

The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`.
While this is a breaking change, this metric isn't used in alerts or dashboards. 
So it seems to have low impact on users.

The change also deprecates `cspell` as it is much heavier and less usable. 
---------

Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
2025-03-17 16:32:10 +01:00
Aliaksandr Valialkin
2c7dd2b991
lib/logstorage: support for {label in (v1,...,vN)} and {label not_in (v1, ..., vN)} syntax 2025-03-15 01:35:13 +01:00
Aliaksandr Valialkin
744ac496bd
lib/logstorage: add ability to specify field name prefixes inside fields (...) lists passed to pack_json and pack_logfmt pipes 2025-02-27 22:54:18 +01:00
Aliaksandr Valialkin
30974e7f3f
lib/logstorage: add le_field and lt_field filters
These filters can be used for selecting logs where one field value is less than another field value.
These filter complement `<=` and `<` filters for constant literals.
2025-02-25 18:24:50 +01:00
Aliaksandr Valialkin
b8daa8afb4
docs/VictoriaLogs/LogsQL.md: typo fixes 2025-02-22 22:35:54 +01:00
Aliaksandr Valialkin
f1eac36a80
lib/logstorage: add contains_any and contains_all filters
- `contains_any` selects logs with fields containing at least one word/phrase from the provided list.
  The provided list can be generated by a subquery.

- `contains_all` selects logs with fields containing all the words and phrases from the provided list.
  The provided list can be generated by a subquery.
2025-02-22 21:55:58 +01:00
Aliaksandr Valialkin
c372e10937
lib/logstorage: add an ability to drop duplicate words at unpack_words pipe 2025-02-22 21:55:57 +01:00
Aliaksandr Valialkin
7da98b540b
lib/logstorage: rename unpack_tokens to unpack_words pipe
The LogsQL defines a word at https://docs.victoriametrics.com/victorialogs/logsql/#word ,
so it is more natural to use unpack_words instead of unpack_tokens name for the pipe.
2025-02-22 21:55:56 +01:00
Aliaksandr Valialkin
dfcfaba374
lib/logstorage: add field1:eq_field(field2) filter, which returns logs with identical values at field1 and field2 2025-02-22 21:55:54 +01:00
Aliaksandr Valialkin
d33e24ab9b
lib/logstorage: add json_array_len pipe for calculating the length of JSON arrays 2025-02-22 21:55:53 +01:00
Aliaksandr Valialkin
cd73c1bafb
lib/logstorage: refactor unroll_tokens into unpack_tokens pipe
unpack_tokens pipe generates a JSON array of unpacked tokens from the source field.
This composes better with other pipes such as unroll pipe.
2025-02-22 21:55:53 +01:00
Aliaksandr Valialkin
d32c697361
lib/logstorage: add unroll_tokens pipe for unrolling individual word tokens from the log field 2025-02-22 21:55:52 +01:00
Aliaksandr Valialkin
ffbd0ebbae
lib/logstorage: allow using '>', '>=', '<' and '<=' in '_time:...' filter
Examples:

  _time:>=2025-02-24Z selects logs with timestamps bigger or equal to 2025-02-24 UTC
  _time:>1d selects logs with timestamps older than one day comparing to the current time

This simplifies writing queries with _time filters.
See https://docs.victoriametrics.com/victorialogs/logsql/#time-filter
2025-02-20 19:04:51 +01:00
Aliaksandr Valialkin
815ff805e6
docs/VictoriaLogs/LogsQL.md: add a chapter about subquery filters 2025-02-13 09:44:25 +01:00
Aliaksandr Valialkin
b5392337bf
lib/logstorage: block_stat pipe: return the path to the part where the block is stored 2025-01-26 22:36:47 +01:00
Aliaksandr Valialkin
ad6c587494
lib/logstorage: properly propagate extra filters to all the subqueries
The purpose of extra filters ( https://docs.victoriametrics.com/victorialogs/querying/#extra-filters )
is to limit the subset of logs, which can be queried. For example, it is expected that all the queries
with `extra_filters={tenant=123}` can access only logs, which contain `123` value for the `tenant` field.

Previously this wasn't the case, since the provided extra filters weren't applied to subqueries.
For example, the following query could be used to select all the logs outside `tenant=123`, for any `extra_filters` arg:

    * | union({tenant!=123})

This commit fixes this by propagating extra filters to all the subqueries.

While at it, this commit also properly propagates [start, end] time range filter from HTTP querying APIs
into all the subqueries, since this is what most users expect. This behaviour can be overriden on per-subquery
basis with the `options(ignore_global_time_filter=true)` option - see https://docs.victoriametrics.com/victorialogs/logsql/#query-options

Also properly apply apply optimizations across all the subqueries. Previously the optimizations at Query.optimize()
function were applied only to the top-level query.
2025-01-24 18:49:25 +01:00
Aliaksandr Valialkin
026894054b
docs/VictoriaLogs/LogsQL.md: show how to unroll the returned histogram buckets into separate rows at histogram pipe docs 2025-01-24 16:39:21 +01:00
Aliaksandr Valialkin
5747e8b5d0
docs/VictoriaLogs/sql-to-logsql.md: add a guide on how to convert SQL to LogsQL 2025-01-24 04:35:19 +01:00
Aliaksandr Valialkin
2c271aa9b2
docs/VictoriaLogs/LogsQL.md: mention that field pipe can be used for improving query performance 2025-01-23 23:36:37 +01:00
Aliaksandr Valialkin
eddeccfcfb
lib/logstorage: add hash pipe for calculating hash over the given log field
This pipe may be useful for sharding log entries among hash buckets.
2025-01-23 04:16:46 +01:00
Aliaksandr Valialkin
b620b5cff5
lib/logstorage: add an ability to set query concurrency on a per-query basis
This is done via 'options(concurrency=N)' prefix for the query.
For example, the following query is executed on at most 4 CPU cores:

    options(concurrency=4) _time:1d | count_uniq(user_id)

This allows reducing RAM and CPU usage at the cost of longer query execution times,
since by default every query is executed in parallel on all the available CPU cores.

See https://docs.victoriametrics.com/victorialogs/logsql/#query-options
2025-01-23 02:42:16 +01:00
Aliaksandr Valialkin
bfbe06e912
lib/logstorage: add ability to execute INNER JOIN with join pipe 2025-01-20 16:56:20 +01:00
Aliaksandr Valialkin
71a7d0db4a
docs/VictoriaLogs/LogsQL.md: clarify docs about LogsQL pipes a bit 2025-01-20 16:55:22 +01:00
Aliaksandr Valialkin
1645542a8a
docs/VictoriaLogs/LogsQL.md: use top pipe in examples instead of stats by (...) count() | sort (...) limit N
`top` pipe is shorter and easier to understand
2025-01-16 04:22:18 +01:00
Aliaksandr Valialkin
5e4de8e860
docs/VictoriaLogs/LogsQL.md: use proper backticks around hexnumencode: 2025-01-16 04:12:37 +01:00
Aliaksandr Valialkin
6312d3bbba
docs/VictoriaLogs: typo fixes for block_stats pipe docs 2025-01-16 03:53:50 +01:00
Aliaksandr Valialkin
d2bede6b51
docs/VictoriaLogs/LogsQL.md: add missing they word 2025-01-16 03:40:42 +01:00
Aliaksandr Valialkin
5ca5069fc4
docs/VictoriaLogs/LogsQL.md: fix incorrect url to VictoriaMetrics histogram buckets
This is a follow-up for d2a791bef3
2025-01-16 00:02:51 +01:00
Aliaksandr Valialkin
f27e120aeb
lib/logstorage: add union pipe, which allows uniting results from multiple queries 2025-01-15 22:22:07 +01:00
Aliaksandr Valialkin
47fe8cf3be
lib/logstorage: math pipe: add rand() function 2025-01-15 22:22:06 +01:00
Aliaksandr Valialkin
d2a791bef3
lib/logstorage: add histogram stats function for calculating histogram buckets over numeric fields 2025-01-13 22:30:19 +01:00
Aliaksandr Valialkin
cf7ea78588
lib/logstorage: format pipe: add frequently used formatters
- url encoding / decoding with <urlencode:field> and <urldecode:field>
- base64 encoding / decoding with <base64encode:field> and <base64decode:field>
- hex encoding / decoding with <hexencode:field> and <hexdecode:field>
- hex encoding for integers with <hexnumencode:field> and <hexnumdecode:field>
2025-01-13 07:08:43 +01:00
Aliaksandr Valialkin
3f22d06b0c
lib/logstorage: add value_type filter to LogsQL
This filter can be used when debugging and exploring logs in order to understand better
which value types are used for storing the particular log fields.

The `value_type` filter complements `block_stats` pipe.
2025-01-12 22:21:39 +01:00
Aliaksandr Valialkin
c2811d8d11
docs/VictoriaLogs/LogsQL.md: fix a link to count_uniq_hash stats function docs
It must be consistent with the other stats functions

This is a follow-up for de0ae735aa
2024-12-22 14:39:27 +01:00
Aliaksandr Valialkin
5dc0413bc0
lib/logstorage: allow specifying hits column name in the top pipe via top ... hits as <column_name> syntax 2024-12-22 11:23:19 +01:00
Aliaksandr Valialkin
60f9f44150
lib/logstorage: reduce memory allocations at stats and top pipes
Use chunked allocator in order to reduce memory allocations. It allocates objects from slices of up to 64Kb size.
This improves performance for `stats` and `top` pipes by up to 2x when they are applied to big number of `by (...)` groups.

Also parallelize execution of `count_uniq`, `count_uniq_hash` and `uniq_values` stats functions,
so they are executed faster on hosts with many CPU cores when applied to fields with big number
of unique values.
2024-12-22 02:13:02 +01:00
Aliaksandr Valialkin
3d7f8377f7
lib/logstorage: do not return log fields with the same constant value across all the selected logs from facets pipe
Such log fields do not give any useful information during logs' exploration.
They just clutter the output of the `facets` pipe. So it is better to drop such fields by default.

If these fields are needed, then `keep_const_fields` option can be added to `facets` pipe.
2024-12-17 12:23:00 +01:00
Aliaksandr Valialkin
b42ed019f5
docs/VictoriaLogs/LogsQL.md: collapse_nums pipe docs: clarify that <N> is a placeholder 2024-12-11 16:34:40 +01:00
Aliaksandr Valialkin
5a41c7f5a5
docs/VictoriaLogs/LogsQL.md: mention that collapse_nums can miss collapsing some numbers or can collapse unexpected numbers
Suggest a solution with replace_regexp() pipe for custom collapsing.
2024-12-11 16:32:36 +01:00
hagen1778
d05fadf988
docs: fix typo in facets example
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-12-10 15:52:33 +01:00
Aliaksandr Valialkin
de0ae735aa
lib/logstorage: add count_uniq_hash function to stats pipe
This function calculates the number of unique value hashes. This number is a good approximation
for the number of unique values. The `count_uniq_hash` function uses less memory and works faster
than `count_uniq` when applied to fields with big number of unique values.
2024-12-09 13:29:41 +01:00
Aliaksandr Valialkin
f54f73033b
docs/VictoriaLogs/LogsQL.md: typo fix: remove double with with 2024-12-09 00:37:01 +01:00
Aliaksandr Valialkin
db961f8609
lib/logstorage: add an ability to detect common patterns at collapse_nums pipe
The following patterns are detected:

- `<N>-<N>-<N>-<N>-<N>` is replaced with `<UUID>`.
- `<N>.<N>.<N>.<N>` is replaced with `<IP4>`.
- `<N>:<N>:<N>` is replaced with `<TIME>`. Optional fractional seconds after the time are treated as a part of `<TIME>`.
- `<N>-<N>-<N>` and `<N>/<N>/<N>` is replaced with `<DATE>`.
- `<N>-<N>-<N>T<N>:<N>:<N>` and `<N>-<N>-<N> <N>:<N>:<N>` is replaced with `<DATETIME>`. Optional timezone after the datetime is treated as a part of `<DATETIME>`.
2024-12-08 20:09:02 +01:00
Aliaksandr Valialkin
65d831a0ee
lib/logstorage: add collapse_nums pipe, which replaces decimal and hexadecimal nums in the given log field with <N>
This is useful for detecting patterns across log messages, which differ by various numeric fields,
with the following query:

_time:1h | collapse_nums | top 10 by (_msg)
2024-12-08 01:03:30 +01:00
Aliaksandr Valialkin
48540ac409
app/vlselect: allow passing max_value_len query arg to /select/logsql/facets API
The max_value_len query arg allows controlling the maximum length of values
per every log field. If the length is exceeded, then the log field is dropped
from the results, since it contains incomplete (misleading) set of most frequently seen field values.
2024-12-07 14:30:07 +01:00
Aliaksandr Valialkin
08af80ebe0
lib/logstorage: add an ability to change the time window for searching for surrounding logs in the stream_context pipe
Thanks to @worker24h for the idea at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7637#issuecomment-2523313740
2024-12-06 15:47:52 +01:00
Aliaksandr Valialkin
dbec34bafc
lib/logstorage: add facets pipe for returning the most frequent values across all the log fields seen in the selected logs 2024-12-06 01:24:15 +01:00
Aliaksandr Valialkin
c3b8da81cd
lib/logstorage: add rate and rate_sum stats functions
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7415
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7646
2024-12-05 17:10:46 +01:00
Aliaksandr Valialkin
b57f8d3cb6
docs/VictoriaLogs/LogsQL.md: add references to first and last pipes from the top pipe description
`top` pipe can be confused with the `first` and `last` pipes, so add references to these pipes from the `top` pipe docs.
This should help users locating the needed pipes.
2024-12-05 13:33:40 +01:00
Aliaksandr Valialkin
534371031e
lib/logstorage: add first and last pipes
The `first N by (field)` pipe is a shorthand to `sort by (field) limit N`,
while the `last N by (field)` pipe is a shorthand to `sort by (field) desc limit N`.

While at it, add support for partitioning sort results by log groups and applying
individual limit per each group.

For example, the following query returns up to 3 logs per each host with the biggest value
for the `request_duration` field:

_time:5m | last 3 by (request_duration) partition by (host)

This query is equivalent to the following one:

_time:5m | sort by (request_duration) desc limit 3 partition by (host)

Automatically add the 'partition by (_time)` into `sort`, `first` and `last` pipes
used in the query to `/select/logsql/stats_query_range` API.
This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7699
2024-12-05 01:42:03 +01:00