github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2025-03-11 15:34:56 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	1ea3f72d50	lib/logstorage: simplify usage of `top`, `uniq` and `unroll` pipes by allowing comma-separated list of fields without parens Examples: - `top 5 x, y` is equivalent to `top 5 by (x, y)` - `uniq foo, bar` is equivalent to `uniq by (foo, bar)` - `unroll foo, bar` is equivalent to `unroll (foo, bar)`	2025-02-20 22:36:09 +01:00
Aliaksandr Valialkin	292f709725	lib/logstorage: optimize `top` pipe for repeated strings, uint8 values and tuples Update the pipe state only once per run of repeated strings, uint8 values and tuples. This improves performance a bit for the following `top` pipes: - top (string_field) - top (uint8_field) - top (field1, ..., fieldN) Do not apply the optimization for uint16, uint32, uint64 and int64 fields, since they usually contain a big number of unique values, which do not repeat most of the time.	2025-02-11 17:00:46 +01:00
Aliaksandr Valialkin	48602a1ae8	lib/logstorage: optimize performance for `stats`, `top` and `uniq` pipes a bit Split unique values (groups) into shards according to the configured concurrency during processing of the matching rows if the number of unique values exceeds the hardcoded threshold. Previously this splitting was performed unconditionally at the merge stage when merging independently calculated per-CPU states into a single state. It is faster to perform the split during rows processing if the number of unique values is big. This gives a performance improvement of up to 30% when these pipes are applied to a big number of unique values (groups).	2025-02-06 13:46:32 +01:00
Aliaksandr Valialkin	171d4019cd	lib/logstorage: properly limit the number of concurrent workers at `stats`, `top` and `uniq` pipes according to the provided `options(concurrency=N)` The number of worker shards per pipe processor is determined during query initialization. This number equals the `options(concurrency=N)` value if this option is set, or the number of available CPU cores otherwise. This means that all the pipes must adhere to the given concurrency when passing data blocks to the next pipe. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8201 The bug has been introduced in `0214aa328e`	2025-02-06 09:16:56 +01:00
Aliaksandr Valialkin	ad6c587494	lib/logstorage: properly propagate extra filters to all the subqueries The purpose of extra filters ( https://docs.victoriametrics.com/victorialogs/querying/#extra-filters ) is to limit the subset of logs that can be queried. For example, it is expected that all the queries with `extra_filters={tenant=123}` can access only logs that contain the `123` value for the `tenant` field. Previously this wasn't the case, since the provided extra filters weren't applied to subqueries. For example, the following query could be used to select all the logs outside `tenant=123`, for any `extra_filters` arg: `* | union({tenant!=123})` This commit fixes this by propagating extra filters to all the subqueries. While at it, this commit also properly propagates the [start, end] time range filter from HTTP querying APIs into all the subqueries, since this is what most users expect. This behaviour can be overridden on a per-subquery basis with the `options(ignore_global_time_filter=true)` option - see https://docs.victoriametrics.com/victorialogs/logsql/#query-options Also properly apply optimizations across all the subqueries. Previously the optimizations at Query.optimize() function were applied only to the top-level query.	2025-01-24 18:49:25 +01:00
Aliaksandr Valialkin	b4f4ece162	lib/logstorage: improve performance of `uniq` pipe for integer columns with a big number of unique values	2025-01-15 19:53:10 +01:00
Aliaksandr Valialkin	99516a5730	lib/logstorage: `top` pipe: allow mixing the order of `hits` and `rank` suffixes	2025-01-13 22:30:19 +01:00
Aliaksandr Valialkin	cc29692e27	lib/logstorage: track integer field values in integer map for `top N (int_field)` This reduces memory usage by up to 2x for the map used for tracking hits. This also reduces CPU usage for tracking integer fields.	2025-01-13 22:30:18 +01:00
Aliaksandr Valialkin	c5949af9e8	lib/logstorage: reduce memory allocations when splitting in(...) values into tokens and calculating hashes for these tokens While at it, reduce memory allocations at Storage.getFieldValuesNoHits and make it more scalable on multi-CPU systems. This improves performance of in(<query>) filter when the <query> returns a big number of values.	2024-12-22 13:13:44 +01:00
Aliaksandr Valialkin	5dc0413bc0	lib/logstorage: allow specifying hits column name in the `top` pipe via `top ... hits as <column_name>` syntax	2024-12-22 11:23:19 +01:00
Aliaksandr Valialkin	60f9f44150	lib/logstorage: reduce memory allocations at `stats` and `top` pipes Use a chunked allocator to reduce memory allocations. It allocates objects from slices of up to 64KB in size. This improves performance for `stats` and `top` pipes by up to 2x when they are applied to a big number of `by (...)` groups. Also parallelize execution of `count_uniq`, `count_uniq_hash` and `uniq_values` stats functions, so they are executed faster on hosts with many CPU cores when applied to fields with a big number of unique values.	2024-12-22 02:13:02 +01:00
Aliaksandr Valialkin	dbec34bafc	lib/logstorage: add `facets` pipe for returning the most frequent values across all the log fields seen in the selected logs	2024-12-06 01:24:15 +01:00
Aliaksandr Valialkin	66b2987f49	lib/logstorage: optimize query immediately after its parsing This eliminates possible bugs related to forgotten Query.Optimize() calls. This also allows removing optimize() function from pipe interface. While at it, drop filterNoop inside filterAnd.	2024-11-08 16:43:54 +01:00
Aliaksandr Valialkin	7a623c225f	lib/logstorage: follow-up for af831a6c906158f371f1b6810706fa0a54b78386 Sync the rank-related code between the `top` and `sort` pipes.	2024-10-29 16:44:46 +01:00
Aliaksandr Valialkin	3c06d083ea	lib/logstorage: add an ability to return rank from `top` pipe results	2024-10-29 16:44:45 +01:00
Aliaksandr Valialkin	c4b2fdff70	lib/logstorage: optimize 'stats by(...)' calculations for by(...) fields with millions of unique values on multi-CPU systems - Parallelize merging of per-CPU `stats by(...)` result shards. - Parallelize writing `stats by(...)` results to the next pipe.	2024-10-18 02:22:41 +02:00
Aliaksandr Valialkin	192c07f76a	lib/logstorage: optimize performance for `top` pipe when it is applied to a field with millions of unique values - Use parallel merge of per-CPU shard results. This improves merge performance on multi-CPU systems. - Use topN heap sort of per-shard results. This improves performance when results contain millions of entries.	2024-10-18 02:21:56 +02:00
Aliaksandr Valialkin	a350be48b6	lib/logstorage: do not count dictionary values which have no matching logs in `count_uniq` stats function Create blockResultColumn.forEachDictValue* helper functions for visiting matching dictionary values. These helper functions should prevent counting dictionary values without matching logs in the future. This is a follow-up for `0c0f013a60` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152	2024-10-01 13:34:45 +02:00
Aliaksandr Valialkin	0c0f013a60	lib/logstorage: skip values with zero hits for 'uniq', 'top' and 'field_values' pipes See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/72#issuecomment-2352078483	2024-09-30 14:15:07 +02:00
Aliaksandr Valialkin	55eb321f77	lib/logstorage: clear hits slice obtained from encoding.GetUint64s() before updating it with hits for valueTypeDict column encoding.GetUint64s() returns an uninitialized slice, which may contain arbitrary values. So values in this slice must be reset to zero before using it for counting hits in `uniq` and `top` pipes.	2024-09-29 10:29:13 +02:00
Aliaksandr Valialkin	94afcbd9a9	lib/logstorage: postpone initialization of per-shard stateSizeBudget until the first call to pipeProcessor.writeBlock() This simplifies pipeProcessor initialization logic a bit. This also doesn't mangle the original maxStateSize value, which is used in error messages when the state size exceeds maxStateSize.	2024-09-29 10:29:13 +02:00
Aliaksandr Valialkin	4599429f51	lib/logstorage: read timestamps column when it is really needed during query execution Previously the timestamps column was read unconditionally on every query. This could significantly slow down queries which do not need to read this column, like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .	2024-09-25 19:17:47 +02:00
Aliaksandr Valialkin	87f1c8bd6c	lib/logstorage: work-in-progress	2024-06-27 14:20:43 +02:00
Aliaksandr Valialkin	2b6a634ec0	lib/logstorage: work-in-progress	2024-06-17 12:13:18 +02:00

24 commits
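Several `top` pipe commits above describe counting hits per value, updating the state only once per run of repeated values (292f709725), and selecting the results with a top-N heap (192c07f76a). The following is a minimal single-threaded sketch of those ideas, assuming string field values and ties broken by value; `hitsEntry`, `countHits` and `topN` are hypothetical names, and this is not the VictoriaLogs implementation.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// hitsEntry pairs a field value with the number of logs containing it.
type hitsEntry struct {
	value string
	hits  uint64
}

// worse reports whether a ranks below b: fewer hits, or equal hits and a
// lexicographically bigger value.
func worse(a, b hitsEntry) bool {
	if a.hits != b.hits {
		return a.hits < b.hits
	}
	return a.value > b.value
}

// minHeap keeps the best n entries seen so far; its root is the worst of
// them, so it is the one replaced when a better entry arrives.
type minHeap []hitsEntry

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return worse(h[i], h[j]) }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(hitsEntry)) }
func (h *minHeap) Pop() any {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

// countHits counts hits per value, touching the map once per run of
// consecutive identical values instead of once per row.
func countHits(values []string) map[string]uint64 {
	m := make(map[string]uint64)
	for i := 0; i < len(values); {
		j := i + 1
		for j < len(values) && values[j] == values[i] {
			j++
		}
		m[values[i]] += uint64(j - i)
		i = j
	}
	return m
}

// topN returns up to n entries, best first, using a bounded min-heap.
func topN(m map[string]uint64, n int) []hitsEntry {
	h := &minHeap{}
	for v, hits := range m {
		e := hitsEntry{value: v, hits: hits}
		if h.Len() < n {
			heap.Push(h, e)
		} else if n > 0 && worse((*h)[0], e) {
			(*h)[0] = e
			heap.Fix(h, 0)
		}
	}
	result := []hitsEntry(*h)
	// Heap order is not sorted order, so sort the survivors best-first.
	sort.Slice(result, func(i, j int) bool { return worse(result[j], result[i]) })
	return result
}

func main() {
	logs := []string{"a", "a", "a", "b", "c", "c", "a", "b", "c"}
	for _, e := range topN(countHits(logs), 2) {
		fmt.Println(e.value, e.hits) // prints "a 4", then "c 3"
	}
}
```

The bounded heap keeps memory at O(n) for the selection step, which matters when the counted map holds millions of unique values and only a handful of top entries are requested.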
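Commit 48602a1ae8 describes splitting groups into shards according to the configured concurrency once their number exceeds a hardcoded threshold during row processing, instead of always splitting at the merge stage. The sketch below illustrates that idea under stated assumptions: shard selection by FNV-1a hash and a threshold of 4 are chosen purely for demonstration, and `shardState`, `shardIndex` and `mergeAll` are hypothetical names rather than VictoriaLogs code.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shardingThreshold is a hypothetical stand-in for the hardcoded threshold.
const shardingThreshold = 4

// shardState is a hypothetical per-worker state counting hits per group.
type shardState struct {
	concurrency int
	groups      map[string]uint64   // used until the threshold is exceeded
	shards      []map[string]uint64 // used after the split; nil before it
}

func newShardState(concurrency int) *shardState {
	return &shardState{concurrency: concurrency, groups: make(map[string]uint64)}
}

// shardIndex maps a group key to a shard; the same key always gets the same index.
func shardIndex(key string, n int) int {
	h := fnv.New64a()
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(n))
}

func (s *shardState) add(key string) {
	if s.shards != nil {
		s.shards[shardIndex(key, s.concurrency)][key]++
		return
	}
	s.groups[key]++
	if len(s.groups) > shardingThreshold {
		s.split() // split early, while rows are still being processed
	}
}

// split moves the accumulated groups into per-shard maps.
func (s *shardState) split() {
	s.shards = make([]map[string]uint64, s.concurrency)
	for i := range s.shards {
		s.shards[i] = make(map[string]uint64)
	}
	for k, v := range s.groups {
		s.shards[shardIndex(k, s.concurrency)][k] = v
	}
	s.groups = nil
}

// mergeAll splits any state that never crossed the threshold (the merge-time
// path), then merges each shard index in its own goroutine. Shards are
// disjoint by key, so the goroutines never touch the same group.
func mergeAll(states []*shardState, concurrency int) []map[string]uint64 {
	for _, s := range states {
		if s.shards == nil {
			s.split()
		}
	}
	result := make([]map[string]uint64, concurrency)
	var wg sync.WaitGroup
	for i := range result {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out := make(map[string]uint64)
			for _, s := range states {
				for k, v := range s.shards[i] {
					out[k] += v
				}
			}
			result[i] = out
		}(i)
	}
	wg.Wait()
	return result
}

func main() {
	const concurrency = 2
	w1, w2 := newShardState(concurrency), newShardState(concurrency)
	for _, k := range []string{"a", "b", "c", "d", "e", "f", "a"} {
		w1.add(k) // the fifth distinct key triggers the split
	}
	w2.add("a")
	w2.add("b") // w2 stays below the threshold until mergeAll
	total := make(map[string]uint64)
	for _, shard := range mergeAll([]*shardState{w1, w2}, concurrency) {
		for k, v := range shard {
			total[k] += v
		}
	}
	fmt.Println(total["a"], total["b"], total["f"]) // prints: 3 2 1
}
```

The design point is that small states stay as a single cheap map, while states with many groups pay the sharding cost incrementally during row processing rather than in one burst at merge time.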