
24 commits

Author SHA1 Message Date
Aliaksandr Valialkin
2dfd6bb689
lib/logstorage: simplify usage of top, uniq and unroll pipes by allowing comma-separated list of fields without parens
Examples:

   - `top 5 x, y` is equivalent to `top 5 by (x, y)`
   - `uniq foo, bar` is equivalent to `uniq by (foo, bar)`
   - `unroll foo, bar` is equivalent to `unroll (foo, bar)`
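A minimal Go sketch of how a parser could accept both the bare comma-separated form and the parenthesized form. It works on a raw string and uses a hypothetical helper `parseByFields`. It is not the LogsQL lexer from lib/logstorage.

```go
package main

import (
	"fmt"
	"strings"
)

// parseByFields is a hypothetical helper, not the lib/logstorage parser.
// It accepts "by (x, y)", "(x, y)" or the bare form "x, y" and returns
// the field names in order.
func parseByFields(s string) []string {
	s = strings.TrimSpace(s)
	// Strip the optional "by" keyword, but not a field name that merely
	// starts with "by" (such as "bytes").
	if rest, ok := strings.CutPrefix(s, "by"); ok && (strings.HasPrefix(rest, " ") || strings.HasPrefix(rest, "(")) {
		s = strings.TrimSpace(rest)
	}
	// Strip optional parens around the list.
	if strings.HasPrefix(s, "(") && strings.HasSuffix(s, ")") {
		s = s[1 : len(s)-1]
	}
	var fields []string
	for _, f := range strings.Split(s, ",") {
		if f = strings.TrimSpace(f); f != "" {
			fields = append(fields, f)
		}
	}
	return fields
}

func main() {
	fmt.Println(parseByFields("x, y"))       // [x y]
	fmt.Println(parseByFields("by (x, y)"))  // [x y]
	fmt.Println(parseByFields("(foo, bar)")) // [foo bar]
}
```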
2025-02-21 12:43:26 +01:00
Aliaksandr Valialkin
4760df3e04
lib/logstorage: optimize top pipe for repeated strings, uint8 values and tuples
Update the pipe state only once per series of repeated strings, uint8 values and tuples.
This improves performance a bit for the following `top` pipes:

- top (string_field)
- top (uint8_field)
- top (field1, ..., fieldN)

Do not apply the optimization for uint16, uint32, uint64 and int64 fields, since they
usually contain a big number of unique values, which do not repeat most of the time.
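The run-based update can be illustrated with the Go sketch below. The function name and the plain `map[string]uint64` state are illustrative assumptions. The point is that the map is touched once per run of equal adjacent values instead of once per row.

```go
package main

import "fmt"

// countHitsRunLength updates the hits map once per run of repeated adjacent
// values instead of once per row. Illustrative sketch only.
func countHitsRunLength(values []string, hits map[string]uint64) {
	if len(values) == 0 {
		return
	}
	prev := values[0]
	run := uint64(1)
	for _, v := range values[1:] {
		if v == prev {
			run++
			continue
		}
		hits[prev] += run
		prev, run = v, 1
	}
	hits[prev] += run // flush the last run
}

func main() {
	hits := make(map[string]uint64)
	// Three map updates instead of six.
	countHitsRunLength([]string{"a", "a", "a", "b", "b", "a"}, hits)
	fmt.Println(hits["a"], hits["b"]) // 4 2
}
```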
2025-02-11 17:01:22 +01:00
Aliaksandr Valialkin
a6171ca7e2
lib/logstorage: optimize performance for stats, top and uniq pipes a bit
Split unique values (groups) into shards according to the configured concurrency
during processing of the matching rows if the number of unique values exceeds the hardcoded threshold.
Previously this splitting was performed unconditionally at the merge stage when merging independently
calculated per-CPU states into a single state. It is faster to perform the split during rows processing
if the number of unique values is big.

This gives up to 30% performance improvements when these pipes are applied to a big number of unique values (groups).

(cherry picked from commit 48602a1ae8)
2025-02-07 18:32:30 +04:00
Aliaksandr Valialkin
851a5636aa
lib/logstorage: properly limit the number of concurrent workers at stats, top and uniq pipes according to the provided options(concurrency=N)
The number of worker shards for each pipe processor is determined during query initialization.
This number equals `options(concurrency=N)` if this option is set, or the number of available CPU cores otherwise.
This means that all the pipes must adhere to the given concurrency when passing data blocks
to the next pipe.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8201

The bug was introduced in 0214aa328e
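A Go sketch of the invariant this fix relies on. The names `workersCount`, `pipeProcessor` and `writeBlock(workerID, ...)` are hypothetical simplifications, and `runtime.NumCPU()` stands in for "available CPU cores". The processor allocates one state shard per worker. A pipe that passed blocks with a worker ID at or above that count would index past the shards.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// workersCount returns N from options(concurrency=N) when it is set (N > 0),
// otherwise the number of logical CPUs. Hypothetical helper.
func workersCount(concurrency int) int {
	if concurrency > 0 {
		return concurrency
	}
	return runtime.NumCPU()
}

// pipeProcessor holds one state shard per worker. A downstream pipe that
// used a workerID >= len(shards) would index out of range, which is why
// every pipe has to respect the configured concurrency.
type pipeProcessor struct {
	shards []uint64 // rows seen by each worker
}

func newPipeProcessor(workers int) *pipeProcessor {
	return &pipeProcessor{shards: make([]uint64, workers)}
}

// writeBlock is called concurrently, but each workerID touches only its own shard.
func (pp *pipeProcessor) writeBlock(workerID uint, rowsCount int) {
	pp.shards[workerID] += uint64(rowsCount)
}

func main() {
	workers := workersCount(4)
	pp := newPipeProcessor(workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(workerID uint) {
			defer wg.Done()
			pp.writeBlock(workerID, 10)
		}(uint(i))
	}
	wg.Wait()
	fmt.Println(len(pp.shards), pp.shards) // 4 [10 10 10 10]
}
```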
2025-02-06 13:44:35 +01:00
Aliaksandr Valialkin
fea934936b
lib/logstorage: properly propagate extra filters to all the subqueries
The purpose of extra filters ( https://docs.victoriametrics.com/victorialogs/querying/#extra-filters )
is to limit the subset of logs that can be queried. For example, all the queries
with `extra_filters={tenant=123}` are expected to access only logs that contain the `123` value in the `tenant` field.

Previously this wasn't the case, since the provided extra filters weren't applied to subqueries.
For example, the following query could be used to select all the logs outside `tenant=123`, for any `extra_filters` arg:

    * | union({tenant!=123})

This commit fixes this by propagating extra filters to all the subqueries.

While at it, this commit also properly propagates the [start, end] time range filter from HTTP querying APIs
into all the subqueries, since this is what most users expect. This behaviour can be overridden on a per-subquery
basis with the `options(ignore_global_time_filter=true)` option - see https://docs.victoriametrics.com/victorialogs/logsql/#query-options

Also properly apply optimizations across all the subqueries. Previously the optimizations in the Query.optimize()
function were applied only to the top-level query.
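The propagation of extra filters can be pictured as a recursive walk over the query tree. In the Go sketch below, the `query` type with a string filter and a list of subqueries is a hypothetical simplification. A real query holds parsed filters, not strings.

```go
package main

import "fmt"

// query is a hypothetical, simplified stand-in for a parsed LogsQL query:
// a filter expression plus the subqueries nested in it, e.g. via union(...).
type query struct {
	filter     string
	subqueries []*query
}

// addExtraFilters ANDs extra with the filter of q and, recursively, with the
// filter of every nested subquery, so no subquery can escape the restriction.
func addExtraFilters(q *query, extra string) {
	q.filter = fmt.Sprintf("(%s) AND (%s)", extra, q.filter)
	for _, sq := range q.subqueries {
		addExtraFilters(sq, extra)
	}
}

func main() {
	// Models the `* | union({tenant!=123})` example above.
	q := &query{filter: "*", subqueries: []*query{{filter: "{tenant!=123}"}}}
	addExtraFilters(q, "{tenant=123}")
	fmt.Println(q.filter)               // ({tenant=123}) AND (*)
	fmt.Println(q.subqueries[0].filter) // ({tenant=123}) AND ({tenant!=123})
}
```

With the filter pushed into the subquery, `{tenant!=123}` can no longer select logs outside `tenant=123`.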
2025-01-26 22:05:05 +01:00
Aliaksandr Valialkin
f4b08b70d2
lib/logstorage: improve performance of uniq pipe for integer columns with a big number of unique values
(cherry picked from commit b4f4ece162)
2025-01-16 17:07:32 +01:00
Aliaksandr Valialkin
46ee68683b
lib/logstorage: top pipe: allow mixing the order of hits and rank suffixes
(cherry picked from commit 99516a5730)
2025-01-14 14:29:48 +01:00
Aliaksandr Valialkin
dbb1007b43
lib/logstorage: track integer field values in integer map for top N (int_field)
This reduces memory usage by up to 2x for the map used for tracking hits.
This also reduces CPU usage for tracking integer fields.
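A Go sketch of tracking hits for an integer field with integer map keys. The helper names are hypothetical. On 64-bit platforms a string map key carries a 16-byte header plus the value bytes, while a uint64 key is 8 bytes. In this sketch, keys are converted to strings only once, when results are emitted.

```go
package main

import (
	"fmt"
	"strconv"
)

// countIntHits tracks hits for an integer column in a map keyed by uint64
// instead of by the string form of each value. Illustrative sketch.
func countIntHits(values []uint64) map[uint64]uint64 {
	hits := make(map[uint64]uint64)
	for _, v := range values {
		hits[v]++
	}
	return hits
}

// hitsAsStrings converts the integer keys to strings once, at output time.
func hitsAsStrings(hits map[uint64]uint64) map[string]uint64 {
	m := make(map[string]uint64, len(hits))
	for k, n := range hits {
		m[strconv.FormatUint(k, 10)] = n
	}
	return m
}

func main() {
	hits := countIntHits([]uint64{200, 404, 200, 500, 200})
	fmt.Println(hitsAsStrings(hits)["200"]) // 3
}
```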

(cherry picked from commit cc29692e27)
2025-01-14 14:29:47 +01:00
Aliaksandr Valialkin
a326a4747e
lib/logstorage: reduce memory allocations when splitting in(...) values into tokens and calculating hashes for these tokens
While at it, reduce memory allocations at Storage.getFieldValuesNoHits and make it more scalable on multi-CPU systems.

This improves performance of the in(<query>) filter when the <query> returns a big number of values.
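A Go sketch of the allocation-saving pattern: reused destination buffers for tokens and their hashes. The function names are hypothetical. Tokens are assumed to be runs of letters, digits and `_`. FNV-1a is used as a stand-in hash, not the hash lib/logstorage uses.

```go
package main

import (
	"fmt"
	"unicode"
)

// appendTokens appends the tokens of s to dst. Tokens are assumed to be runs
// of letters, digits and '_'. They are substrings of s, so no per-token
// memory is allocated.
func appendTokens(dst []string, s string) []string {
	start := -1
	for i, r := range s {
		isTokenChar := unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_'
		switch {
		case isTokenChar && start < 0:
			start = i
		case !isTokenChar && start >= 0:
			dst = append(dst, s[start:i])
			start = -1
		}
	}
	if start >= 0 {
		dst = append(dst, s[start:])
	}
	return dst
}

// appendHashes appends a 64-bit FNV-1a hash of every token to dst
// (FNV-1a is a stand-in; any fast 64-bit hash works here).
func appendHashes(dst []uint64, tokens []string) []uint64 {
	for _, t := range tokens {
		h := uint64(14695981039346656037) // FNV-1a offset basis
		for i := 0; i < len(t); i++ {
			h ^= uint64(t[i])
			h *= 1099511628211 // FNV-1a prime
		}
		dst = append(dst, h)
	}
	return dst
}

func main() {
	// The buffers are reused across values; they grow only when a value
	// has more tokens than any value before it.
	var tokens []string
	var hashes []uint64
	for _, v := range []string{"foo bar", "error: disk_full"} {
		tokens = appendTokens(tokens[:0], v)
		hashes = appendHashes(hashes[:0], tokens)
		fmt.Println(tokens, len(hashes))
	}
}
```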
2024-12-23 19:45:03 +01:00
Aliaksandr Valialkin
bb4dbbab7c
lib/logstorage: allow specifying the hits column name in the top pipe via `top ... hits as <column_name>` syntax
2024-12-23 19:45:03 +01:00
Aliaksandr Valialkin
6b0da64b30
lib/logstorage: reduce memory allocations at stats and top pipes
Use a chunked allocator in order to reduce memory allocations. It allocates objects from slices of up to 64KB in size.
This improves performance for `stats` and `top` pipes by up to 2x when they are applied to a big number of `by (...)` groups.

Also parallelize execution of `count_uniq`, `count_uniq_hash` and `uniq_values` stats functions,
so they are executed faster on hosts with many CPU cores when applied to fields with a big number
of unique values.
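The chunked allocation idea can be sketched as a generic Go type. `chunkedAllocator` and `statsEntry` are hypothetical and do not reproduce the allocator in lib/logstorage. Objects are carved from pre-allocated slices of up to 64KB, so many small per-group objects cost few heap allocations.

```go
package main

import (
	"fmt"
	"unsafe"
)

// maxChunkBytes caps the size of each pre-allocated chunk at 64KB.
const maxChunkBytes = 64 * 1024

// chunkedAllocator hands out *T values carved from large pre-allocated slices
// instead of allocating every object separately. Generic sketch only.
type chunkedAllocator[T any] struct {
	chunk []T
}

func (a *chunkedAllocator[T]) newItem() *T {
	if len(a.chunk) == cap(a.chunk) {
		// The current chunk is full: start a new one. Items already handed out
		// keep pointing into the old chunk, which stays alive through them.
		var zero T
		n := 1
		if size := int(unsafe.Sizeof(zero)); size > 0 && maxChunkBytes/size > 1 {
			n = maxChunkBytes / size
		}
		a.chunk = make([]T, 0, n)
	}
	a.chunk = a.chunk[:len(a.chunk)+1]
	return &a.chunk[len(a.chunk)-1]
}

// statsEntry is a hypothetical per-group state of a `stats by (...)` pipe.
type statsEntry struct {
	key  string
	hits uint64
}

func main() {
	var a chunkedAllocator[statsEntry]
	groups := make(map[string]*statsEntry)
	for _, k := range []string{"a", "b", "a"} {
		e := groups[k]
		if e == nil {
			e = a.newItem()
			e.key = k
			groups[k] = e
		}
		e.hits++
	}
	fmt.Println(groups["a"].hits, groups["b"].hits) // 2 1
}
```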
2024-12-23 19:45:02 +01:00
Aliaksandr Valialkin
e71a8e3a6c
lib/logstorage: add facets pipe for returning the most frequent values across all the log fields seen in the selected logs
(cherry picked from commit dbec34bafc)
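What the facets pipe computes can be sketched in Go. Logs are modeled as `map[string]string`, and the `facets` function and its `limit` parameter are assumptions of this sketch, not the pipe's implementation. The sketch counts values per field across all selected logs and keeps the most frequent ones per field.

```go
package main

import (
	"fmt"
	"sort"
)

type valueHits struct {
	value string
	hits  uint64
}

// facets counts how often each value occurs for every field across the given
// logs and returns up to limit most frequent values per field.
func facets(logs []map[string]string, limit int) map[string][]valueHits {
	counts := make(map[string]map[string]uint64)
	for _, entry := range logs {
		for field, value := range entry {
			m := counts[field]
			if m == nil {
				m = make(map[string]uint64)
				counts[field] = m
			}
			m[value]++
		}
	}
	result := make(map[string][]valueHits, len(counts))
	for field, m := range counts {
		vhs := make([]valueHits, 0, len(m))
		for v, n := range m {
			vhs = append(vhs, valueHits{v, n})
		}
		// Most frequent first; ties broken by value for stable output.
		sort.Slice(vhs, func(i, j int) bool {
			if vhs[i].hits != vhs[j].hits {
				return vhs[i].hits > vhs[j].hits
			}
			return vhs[i].value < vhs[j].value
		})
		if len(vhs) > limit {
			vhs = vhs[:limit]
		}
		result[field] = vhs
	}
	return result
}

func main() {
	logs := []map[string]string{
		{"level": "error", "host": "a"},
		{"level": "info", "host": "a"},
		{"level": "error", "host": "b"},
	}
	f := facets(logs, 1)
	fmt.Println(f["level"][0], f["host"][0]) // {error 2} {a 2}
}
```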
2024-12-09 12:23:27 +01:00
Aliaksandr Valialkin
a4ea3b87d7
lib/logstorage: optimize the query immediately after parsing it
This eliminates possible bugs related to forgotten Query.Optimize() calls.

This also allows removing the optimize() function from the pipe interface.

While at it, drop filterNoop inside filterAnd.

(cherry picked from commit 66b2987f49)
2024-11-08 17:07:56 +01:00
Aliaksandr Valialkin
1dd01b8a8f
lib/logstorage: follow-up for af831a6c906158f371f1b6810706fa0a54b78386
Sync the rank-related code between the top and sort pipes.

(cherry picked from commit 7a623c225f)
2024-10-30 09:52:52 +01:00
Aliaksandr Valialkin
329d9a46ee
lib/logstorage: add an ability to return rank from top pipe results
(cherry picked from commit 3c06d083ea)
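A Go sketch of attaching a rank to top results. It assumes ranks are assigned sequentially from 1 in output order and that ties in hits are broken by value; both are assumptions of this sketch. `rankedValue` and `topWithRank` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// rankedValue is one row of a hypothetical `top N by (field) rank` result.
type rankedValue struct {
	value string
	hits  uint64
	rank  int
}

// topWithRank returns the n most frequent values, ordered by hits
// descending (ties broken by value), with ranks 1..n assigned in output order.
func topWithRank(hits map[string]uint64, n int) []rankedValue {
	if n < 0 {
		n = 0
	}
	rows := make([]rankedValue, 0, len(hits))
	for v, h := range hits {
		rows = append(rows, rankedValue{value: v, hits: h})
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].hits != rows[j].hits {
			return rows[i].hits > rows[j].hits
		}
		return rows[i].value < rows[j].value
	})
	if len(rows) > n {
		rows = rows[:n]
	}
	for i := range rows {
		rows[i].rank = i + 1
	}
	return rows
}

func main() {
	rows := topWithRank(map[string]uint64{"GET": 7, "POST": 3, "PUT": 1}, 2)
	for _, r := range rows {
		fmt.Println(r.rank, r.value, r.hits)
	}
	// 1 GET 7
	// 2 POST 3
}
```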
2024-10-30 09:52:51 +01:00
Aliaksandr Valialkin
cd7823a310
lib/logstorage: optimize 'stats by(...)' calculations for by(...) fields with millions of unique values on multi-CPU systems
- Parallelize merging of per-CPU `stats by(...)` result shards.
- Parallelize writing `stats by(...)` results to the next pipe.
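A Go sketch of merging per-CPU result shards in parallel. It assumes a two-phase hash-partitioning scheme, which is one way such a merge can be parallelized and not a description of the actual code. In phase 1, every per-CPU map is split by key hash. In phase 2, each goroutine merges one partition from all per-CPU maps, so no key is touched by two goroutines.

```go
package main

import (
	"fmt"
	"hash/maphash"
	"sync"
)

// mergeShardsParallel merges per-CPU hit maps into numParts disjoint result
// partitions using two parallel phases, so no locks are needed.
func mergeShardsParallel(perCPU []map[string]uint64, numParts int) []map[string]uint64 {
	seed := maphash.MakeSeed()
	var wg sync.WaitGroup

	// Phase 1: every per-CPU map is split by key hash into numParts buckets,
	// one goroutine per per-CPU map.
	buckets := make([][]map[string]uint64, len(perCPU))
	for i, shard := range perCPU {
		wg.Add(1)
		go func(i int, shard map[string]uint64) {
			defer wg.Done()
			b := make([]map[string]uint64, numParts)
			for p := range b {
				b[p] = make(map[string]uint64)
			}
			for k, n := range shard {
				b[maphash.String(seed, k)%uint64(numParts)][k] = n
			}
			buckets[i] = b
		}(i, shard)
	}
	wg.Wait()

	// Phase 2: partition p merges bucket p of every per-CPU map. A key always
	// lands in the same partition, so each key is merged by one goroutine only.
	parts := make([]map[string]uint64, numParts)
	for p := 0; p < numParts; p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			m := make(map[string]uint64)
			for _, b := range buckets {
				for k, n := range b[p] {
					m[k] += n
				}
			}
			parts[p] = m
		}(p)
	}
	wg.Wait()
	return parts
}

func main() {
	perCPU := []map[string]uint64{{"a": 1, "b": 2}, {"a": 3, "c": 1}}
	total := make(map[string]uint64)
	for _, m := range mergeShardsParallel(perCPU, 2) {
		for k, n := range m {
			total[k] += n
		}
	}
	fmt.Println(total["a"], total["b"], total["c"]) // 4 2 1
}
```

Because the partitions are disjoint, they can also be written to the next pipe in parallel.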

(cherry picked from commit c4b2fdff70)
2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin
1000ae437c
lib/logstorage: optimize performance for top pipe when it is applied to a field with millions of unique values
- Use parallel merge of per-CPU shard results. This improves merge performance on multi-CPU systems.
- Use topN heap sort of per-shard results. This improves performance when results contain millions of entries.
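A Go sketch of topN selection with a bounded min-heap from `container/heap`. The `entry` type and `topN` function are hypothetical. The heap keeps only N candidates, so selecting the top N from M entries costs O(M log N) instead of sorting all M entries.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

type entry struct {
	value string
	hits  uint64
}

// minHeap keeps the entry with the fewest hits at the root, so it can be
// evicted when a better candidate arrives.
type minHeap []entry

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].hits < h[j].hits }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(entry)) }
func (h *minHeap) Pop() any {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

// topN returns the n entries with the most hits, sorted by hits descending.
func topN(hits map[string]uint64, n int) []entry {
	if n <= 0 {
		return nil
	}
	h := make(minHeap, 0, n)
	for v, c := range hits {
		if h.Len() < n {
			heap.Push(&h, entry{v, c})
		} else if c > h[0].hits {
			// Replace the current minimum and restore the heap property.
			h[0] = entry{v, c}
			heap.Fix(&h, 0)
		}
	}
	sort.Slice(h, func(i, j int) bool { return h[i].hits > h[j].hits })
	return h
}

func main() {
	fmt.Println(topN(map[string]uint64{"a": 5, "b": 1, "c": 3, "d": 4}, 2)) // [{a 5} {d 4}]
}
```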

(cherry picked from commit 192c07f76a)
2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin
81f3e07e1e
lib/logstorage: do not count dictionary values which have no matching logs in count_uniq stats function
Create blockResultColumn.forEachDictValue* helper functions for visiting matching
dictionary values. These helper functions should prevent counting dictionary values
without matching logs in the future.
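A Go sketch of visiting only those dictionary values that are referenced by matching rows. The helper name echoes the forEachDictValue* helpers mentioned above, but its signature is hypothetical, and uint8 dictionary indexes are an assumption of this sketch.

```go
package main

import "fmt"

// forEachMatchingDictValue calls f for every dictionary value referenced by at
// least one matching row. rowIdxs holds the dictionary index of each matching
// row (uint8 indexes are an assumption of this sketch).
func forEachMatchingDictValue(dictValues []string, rowIdxs []uint8, f func(v string)) {
	var seen [256]bool
	for _, idx := range rowIdxs {
		seen[idx] = true
	}
	for i := 0; i < len(dictValues) && i < len(seen); i++ {
		if seen[i] {
			f(dictValues[i])
		}
	}
}

func main() {
	// The block dictionary has three values, but the filter matched only rows
	// that reference "GET" and "PUT".
	dict := []string{"GET", "POST", "PUT"}
	matching := []uint8{0, 0, 2}
	uniq := 0
	forEachMatchingDictValue(dict, matching, func(string) { uniq++ })
	fmt.Println(uniq) // 2, not len(dict) == 3
}
```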

This is a follow-up for 0c0f013a60
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152
2024-10-01 13:36:27 +02:00
Aliaksandr Valialkin
dbcf06cd85
lib/logstorage: skip values with zero hits for 'uniq', 'top' and 'field_values' pipes
See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/72#issuecomment-2352078483
2024-09-30 14:16:21 +02:00
Aliaksandr Valialkin
58d1e517de
lib/logstorage: clear hits slice obtained from encoding.GetUint64s() before updating it with hits for valueTypeDict column
encoding.GetUint64s() returns an uninitialized slice, which may contain arbitrary values.
So values in this slice must be reset to zero before using it for counting hits in `uniq` and `top` pipes.
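A Go sketch of counting per-dictionary-value hits in a reused buffer. This is a hypothetical helper: a plain slice passed as `buf` stands in for a pooled slice such as the one from encoding.GetUint64s(), which is not called here. The buffer is cleared first, and values with zero hits are skipped, matching the `uniq` and `top` fixes above.

```go
package main

import "fmt"

// countDictHits counts hits per dictionary value using buf as scratch space.
// buf may still hold values from earlier use, so it is cleared before
// counting. Values with zero hits are not emitted.
func countDictHits(buf []uint64, dictValues []string, rowIdxs []uint8, emit func(v string, hits uint64)) []uint64 {
	if cap(buf) < len(dictValues) {
		buf = make([]uint64, len(dictValues))
	}
	buf = buf[:len(dictValues)]
	for i := range buf {
		buf[i] = 0 // without this, stale counts would leak into the result
	}
	for _, idx := range rowIdxs {
		buf[idx]++
	}
	for i, n := range buf {
		if n > 0 {
			emit(dictValues[i], n)
		}
	}
	return buf
}

func main() {
	stale := []uint64{9, 9, 9} // leftovers from a previous block
	countDictHits(stale, []string{"a", "b", "c"}, []uint8{0, 2, 2}, func(v string, hits uint64) {
		fmt.Println(v, hits)
	})
	// a 1
	// c 2
}
```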
2024-09-29 10:29:50 +02:00
Aliaksandr Valialkin
b5d94f06f5
lib/logstorage: postpone initialization of per-shard stateSizeBudget until the first call to pipeProcessor.writeBlock()
This simplifies pipeProcessor initialization logic a bit.
This also doesn't mangle the original maxStateSize value, which is used in error messages when the state size exceeds maxStateSize.
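A Go sketch of lazily initializing per-shard budgets while leaving `maxStateSize` unchanged. The types are hypothetical, and splitting `maxStateSize` evenly across shards is an assumption of this sketch, not how lib/logstorage divides the budget. Because the original limit is never modified, the error message reports it verbatim.

```go
package main

import "fmt"

// shardState is a hypothetical per-worker shard; its budget is initialized
// lazily on the first writeBlock() call rather than when the processor is built.
type shardState struct {
	budget      int
	initialized bool
}

type pipeProcessor struct {
	maxStateSize int // never modified, so error messages report the configured limit
	shards       []shardState
}

// writeBlock charges stateSize bytes to the shard of workerID.
// Splitting maxStateSize evenly across shards is an assumption of this sketch.
func (pp *pipeProcessor) writeBlock(workerID int, stateSize int) error {
	s := &pp.shards[workerID]
	if !s.initialized {
		s.budget = pp.maxStateSize / len(pp.shards)
		s.initialized = true
	}
	s.budget -= stateSize
	if s.budget < 0 {
		return fmt.Errorf("state size exceeds maxStateSize=%d bytes", pp.maxStateSize)
	}
	return nil
}

func main() {
	pp := &pipeProcessor{maxStateSize: 100, shards: make([]shardState, 2)}
	fmt.Println(pp.writeBlock(0, 30)) // <nil>
	fmt.Println(pp.writeBlock(0, 30)) // state size exceeds maxStateSize=100 bytes
}
```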
2024-09-29 10:29:49 +02:00
Aliaksandr Valialkin
246c339e3d
lib/logstorage: read the timestamps column only when it is really needed during query execution
Previously the timestamps column was read unconditionally on every query.
This could significantly slow down queries that do not need this column,
like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .
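A Go sketch of reading a column lazily. `lazyTimestamps` and its `load` callback are hypothetical; `load` stands in for reading the column from storage. A query that never asks for timestamps never triggers the read, and repeated accesses read the column only once.

```go
package main

import "fmt"

// lazyTimestamps loads the timestamps column on first access only. The load
// function stands in for reading the column from disk; names are hypothetical.
type lazyTimestamps struct {
	load   func() []int64
	values []int64
	loaded bool
}

func (c *lazyTimestamps) get() []int64 {
	if !c.loaded {
		c.values = c.load()
		c.loaded = true
	}
	return c.values
}

func main() {
	reads := 0
	ts := &lazyTimestamps{load: func() []int64 {
		reads++
		return []int64{1700000000000000000, 1700000001000000000}
	}}
	// Nothing has asked for timestamps yet, so nothing has been read.
	fmt.Println(reads) // 0
	_ = ts.get()
	_ = ts.get()
	fmt.Println(reads) // 1
}
```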
2024-09-25 19:18:37 +02:00
Aliaksandr Valialkin
dd62a2b9d6
lib/logstorage: work-in-progress
2024-06-27 14:21:03 +02:00
Aliaksandr Valialkin
1750991119
lib/logstorage: work-in-progress
2024-06-17 12:13:25 +02:00