github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2025-03-11 15:34:56 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	2dfd6bb689	lib/logstorage: simplify usage of `top`, `uniq` and `unroll` pipes by allowing comma-separated list of fields without parens Examples: - `top 5 x, y` is equivalent to `top 5 by (x, y)` - `uniq foo, bar` is equivalent to `uniq by (foo, bar)` - `unroll foo, bar` is equivalent to `unroll (foo, bar)`	2025-02-21 12:43:26 +01:00
Aliaksandr Valialkin	4760df3e04	lib/logstorage: optimize `top` pipe for repeated strings, uint8 values and tuples Update the pipe state only once per each series of repeated strings, uint8 values and tuples. This improves performance a bit for the following `top` pipes: - top (string_field) - top (uint8_field) - top (field1, ..., fieldN) Do not apply the optimization for uint16, uint32, uint64 and int64 fields, since they usually contain a big number of unique values, which do not repeat most of the time.	2025-02-11 17:01:22 +01:00
Aliaksandr Valialkin	a6171ca7e2	lib/logstorage: optimize performance for `stats`, `top` and `uniq` pipes a bit Split unique values (groups) into shards according to the configured concurrency during processing of the matching rows if the number of unique values exceeds the hardcoded threshold. Previously this splitting was performed unconditionally at the merge stage when merging independently calculated per-CPU states into a single state. It is faster to perform the split during rows processing if the number of unique values is big. This gives up to 30% performance improvement when these pipes are applied to a big number of unique values (groups). (cherry picked from commit `48602a1ae8`)	2025-02-07 18:32:30 +04:00
Aliaksandr Valialkin	851a5636aa	lib/logstorage: properly limit the number of concurrent workers at `stats`, `top` and `uniq` pipes according to the provided `options(concurrency=N)` The worker shards for each pipe processor are created during query initialization. Their number equals the `options(concurrency=N)` value if this option is set, or the number of available CPU cores otherwise. This means that all the pipes must adhere to the given concurrency when passing data blocks to the next pipe. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8201 The bug has been introduced in `0214aa328e`	2025-02-06 13:44:35 +01:00
Aliaksandr Valialkin	fea934936b	lib/logstorage: properly propagate extra filters to all the subqueries The purpose of extra filters ( https://docs.victoriametrics.com/victorialogs/querying/#extra-filters ) is to limit the subset of logs, which can be queried. For example, it is expected that all the queries with `extra_filters={tenant=123}` can access only logs, which contain `123` value for the `tenant` field. Previously this wasn't the case, since the provided extra filters weren't applied to subqueries. For example, the following query could be used to select all the logs outside `tenant=123`, for any `extra_filters` arg: `* | union({tenant!=123})` This commit fixes this by propagating extra filters to all the subqueries. While at it, this commit also properly propagates [start, end] time range filter from HTTP querying APIs into all the subqueries, since this is what most users expect. This behaviour can be overridden on per-subquery basis with the `options(ignore_global_time_filter=true)` option - see https://docs.victoriametrics.com/victorialogs/logsql/#query-options Also properly apply optimizations across all the subqueries. Previously the optimizations at Query.optimize() function were applied only to the top-level query.	2025-01-26 22:05:05 +01:00
Aliaksandr Valialkin	f4b08b70d2	lib/logstorage: improve performance of `unique` pipe for integer columns with big number of unique values (cherry picked from commit `b4f4ece162`)	2025-01-16 17:07:32 +01:00
Aliaksandr Valialkin	46ee68683b	lib/logstorage: `top` pipe: allow mixing the order of `hits` and `rank` suffixes (cherry picked from commit `99516a5730`)	2025-01-14 14:29:48 +01:00
Aliaksandr Valialkin	dbb1007b43	lib/logstorage: track integer field values in integer map for `top N (int_field)` This reduces memory usage by up to 2x for the map used for tracking hits. This also reduces CPU usage for tracking integer fields. (cherry picked from commit `cc29692e27`)	2025-01-14 14:29:47 +01:00
Aliaksandr Valialkin	a326a4747e	lib/logstorage: reduce memory allocations when splitting in(...) values into tokens and calculating hashes for these tokens While at it, reduce memory allocations at Storage.getFieldValuesNoHits and make it more scalable on multi-CPU systems. This improves performance of in(<query>) filter when the <query> returns big number of values.	2024-12-23 19:45:03 +01:00
Aliaksandr Valialkin	bb4dbbab7c	lib/logstorage: allow specifying hits column name in the `top` pipe via `top ... hits as <column_name>` syntax	2024-12-23 19:45:03 +01:00
Aliaksandr Valialkin	6b0da64b30	lib/logstorage: reduce memory allocations at `stats` and `top` pipes Use chunked allocator in order to reduce memory allocations. It allocates objects from slices of up to 64Kb size. This improves performance for `stats` and `top` pipes by up to 2x when they are applied to big number of `by (...)` groups. Also parallelize execution of `count_uniq`, `count_uniq_hash` and `uniq_values` stats functions, so they are executed faster on hosts with many CPU cores when applied to fields with big number of unique values.	2024-12-23 19:45:02 +01:00
Aliaksandr Valialkin	e71a8e3a6c	lib/logstorage: add `facets` pipe for returning the most frequent values across all the log fields seen in the selected logs (cherry picked from commit `dbec34bafc`)	2024-12-09 12:23:27 +01:00
Aliaksandr Valialkin	a4ea3b87d7	lib/logstorage: optimize query immediately after its parsing This eliminates possible bugs related to forgotten Query.Optimize() calls. This also allows removing optimize() function from pipe interface. While at it, drop filterNoop inside filterAnd. (cherry picked from commit `66b2987f49`)	2024-11-08 17:07:56 +01:00
Aliaksandr Valialkin	1dd01b8a8f	lib/logstorage: follow-up for af831a6c906158f371f1b6810706fa0a54b78386 Sync the code between top and sort pipes regarding the code related to rank. (cherry picked from commit `7a623c225f`)	2024-10-30 09:52:52 +01:00
Aliaksandr Valialkin	329d9a46ee	lib/logstorage: add an ability to return rank from `top` pipe results (cherry picked from commit `3c06d083ea`)	2024-10-30 09:52:51 +01:00
Aliaksandr Valialkin	cd7823a310	lib/logstorage: optimize 'stats by(...)' calculations for by(...) fields with millions of unique values on multi-CPU systems - Parallelize merging of per-CPU `stats by(...)` result shards. - Parallelize writing `stats by(...)` results to the next pipe. (cherry picked from commit `c4b2fdff70`)	2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin	1000ae437c	lib/logstorage: optimize performance for `top` pipe when it is applied to a field with millions of unique values - Use parallel merge of per-CPU shard results. This improves merge performance on multi-CPU systems. - Use topN heap sort of per-shard results. This improves performance when results contain millions of entries. (cherry picked from commit `192c07f76a`)	2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin	81f3e07e1e	lib/logstorage: do not count dictionary values which have no matching logs in `count_uniq` stats function Create blockResultColumn.forEachDictValue* helper functions for visiting matching dictionary values. These helper functions should prevent from counting dictionary values without matching logs in the future. This is a follow-up for `0c0f013a60` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152	2024-10-01 13:36:27 +02:00
Aliaksandr Valialkin	dbcf06cd85	lib/logstorage: skip values with zero hits for 'uniq', 'top' and 'field_values' pipes See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/72#issuecomment-2352078483	2024-09-30 14:16:21 +02:00
Aliaksandr Valialkin	58d1e517de	lib/logstorage: clear hits slice obtained from encoding.GetUint64s() before updating it with hits for valueTypeDict column encoding.GetUint64s() returns uninitialized slice, which may contain arbitrary values. So values in this slice must be reset to zero before using it for counting hits in `uniq` and `top` pipes.	2024-09-29 10:29:50 +02:00
Aliaksandr Valialkin	b5d94f06f5	lib/logstorage: postpone initialization of per-shard stateSizeBudget until the first call to pipeProcessor.writeBlock() This simplifies pipeProcessor initialization logic a bit. This also doesn't mangle the original maxStateSize value, which is used in error messages when the state size exceeds maxStateSize.	2024-09-29 10:29:49 +02:00
Aliaksandr Valialkin	246c339e3d	lib/logstorage: read timestamps column when it is really needed during query execution Previously timestamps column was read unconditionally on every query. This could significantly slow down queries, which do not need reading this column like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .	2024-09-25 19:18:37 +02:00
Aliaksandr Valialkin	dd62a2b9d6	lib/logstorage: work-in-progress	2024-06-27 14:21:03 +02:00
Aliaksandr Valialkin	1750991119	lib/logstorage: work-in-progress	2024-06-17 12:13:25 +02:00

24 commits