github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2025-03-21 15:45:01 +00:00

Author	SHA1	Message	Date
Guillem Jover	76d205feae	spelling and grammar fixes via codespell (#8497) ### Describe Your Changes Fix many spelling errors and some grammar, including misspellings in filenames. The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`. While this is a breaking change, this metric isn't used in alerts or dashboards. So it seems to have low impact on users. The change also deprecates `cspell` as it is much heavier and less usable. --------- Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com> Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>	2025-03-17 16:32:10 +01:00
Aliaksandr Valialkin	1ffd5e9b69	lib/logstorage: avoid extra memory allocations at getEmptyStrings()	2025-02-22 21:55:57 +01:00
Aliaksandr Valialkin	dfcfaba374	lib/logstorage: add `field1:eq_field(field2)` filter, which returns logs with identical values at field1 and field2	2025-02-22 21:55:54 +01:00
Aliaksandr Valialkin	49bd3bc70a	lib/logstorage: attempt to use int64 bucketing before trying float64 bucketing at blockResult.getbucketedValue() int64 bucketing is lossless and faster than float64 bucketing, so it is preferred over float64 bucketing	2025-02-13 00:01:19 +01:00
Aliaksandr Valialkin	c75cce1384	lib/logstorage: refactor bucketing code 1. Use distinct code paths for blockResult.getValues() and blockResult.getValuesBucketed(). This should simplify debugging and maintenance of the resulting code. 2. Do not load column values if all the values in the block fit the same bucket. Use blockResultColumn.minValue and blockResultColumn.maxValue for determining whether column values must be loaded via blockResultColumn.getValuesEncoded(). This significantly improves performance for big buckets, which cover all the column values in a block. 3. Properly calculate buckets for negative values. 4. Properly adjust weekly buckets by Monday.	2025-02-12 21:46:12 +01:00
Aliaksandr Valialkin	46293ac5f7	lib/logstorage: improve performance of `stats by (...)` bucketing a bit	2025-02-12 03:21:03 +01:00
Aliaksandr Valialkin	33c55d7a22	lib/logstorage/block_result.go: remove misleading comment left after the commit `eddeccfcfb`	2025-02-11 22:48:16 +01:00
Aliaksandr Valialkin	eddeccfcfb	lib/logstorage: add `hash` pipe for calculating hash over the given log field This pipe may be useful for sharding log entries among hash buckets.	2025-01-23 04:16:46 +01:00
Aliaksandr Valialkin	b4f4ece162	lib/logstorage: improve performance of `unique` pipe for integer columns with big number of unique values	2025-01-15 19:53:10 +01:00
Aliaksandr Valialkin	f018aa33cb	lib/logstorage: avoid callback overhead at visitValuesReadonly Process values in batches instead of passing every value in the callback. This improves performance of reading the encoded values from storage by up to 50%.	2025-01-13 22:30:17 +01:00
Aliaksandr Valialkin	b812de236b	lib/logstorage: run `make fmt` after `e610edf045`	2025-01-12 03:17:57 +01:00
Aliaksandr Valialkin	e610edf045	lib/logstorage: improve performance for `math` pipe - Pass the calculated results to the next pipe in float64 columns. Previously the results were converted to string columns. This could slow down further calculations. - Use custom optimized logic for processing numeric columns, which are passed to math pipe. Previously all the input columns were converted to string and then converted to float64 before math pipe calculations. - Initialize the newly added columns at blockResult as soon as they are added. This improves performance when big number of columns are calculated by math pipe.	2025-01-12 03:01:47 +01:00
Aliaksandr Valialkin	df723a4870	lib/logstorage: automatically detect columns with int64 values and store them as packed 8-byte int64 values Previously columns with negative int64 values were stored either as float64 or string depending on whether the negative int64 values are bigger or smaller than -2^53. If the integer values are smaller than -2^53, then they are stored as string, since float64 cannot hold such values without precision loss. Now such values are stored as int64. This should improve compression ratio and query performance over columns with negative int64 values.	2025-01-12 03:01:46 +01:00
Aliaksandr Valialkin	60f9f44150	lib/logstorage: reduce memory allocations at `stats` and `top` pipes Use chunked allocator in order to reduce memory allocations. It allocates objects from slices of up to 64Kb size. This improves performance for `stats` and `top` pipes by up to 2x when they are applied to big number of `by (...)` groups. Also parallelize execution of `count_uniq`, `count_uniq_hash` and `uniq_values` stats functions, so they are executed faster on hosts with many CPU cores when applied to fields with big number of unique values.	2024-12-22 02:13:02 +01:00
Aliaksandr Valialkin	471f1d0a09	lib/logstorage: fixed a typo in blockResult.reset() The commit `4599429f51` improperly set br.cs to nil, while it should set br.bs to nil instead. This resulted in excess memory allocations at br.csInit() and br.csInitFast().	2024-12-21 13:39:25 +01:00
Aliaksandr Valialkin	bddb0e369f	lib/logstorage: optimize `stream_context` pipe over log streams with tens of millions of logs `stream_context` is implemented in a way that requires scanning all the logs for the selected log streams. The scan is usually fast, since the majority of blocks are skipped because they do not contain rows with the needed timestamps. But there was a pathological case with `stream_context before N`: VictoriaLogs usually scans blocks in chronological order. That means that the `before` context logs are constantly updated with the new logs. This requires reading the actual data for the requested log fields from disk. The workaround is to split the process of obtaining stream context logs into two phases: 1. Select only timestamps for the stream context logs, without selecting other log fields. This operation is usually much faster than reading the requested log fields. 2. Select stream context logs for the selected timestamps. This operation is usually fast, since the requested number of context logs is usually not so big. Performance testing for the new algorithm shows up to 30x speed improvement for `stream_context before N` and up to 5x speed improvement for `stream_context after N` when applied to log stream with 50M logs. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7637	2024-12-06 15:00:46 +01:00
Aliaksandr Valialkin	546bf7d579	lib/logstorage: properly skip filtered out dict values when calculating uniq_values, min, max, row_min and row_max stats functions Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7458	2024-11-08 23:21:21 +01:00
Aliaksandr Valialkin	2023f017b1	lib/logstorage: optimize performance for queries, which select all the log fields for logs containing hundreds of log fields (aka "wide events") Unpack the full columnsHeader block instead of unpacking meta-information per each individual column when the query, which selects all the columns, is executed. This improves performance when scanning logs with big number of fields.	2024-10-18 02:22:42 +02:00
Aliaksandr Valialkin	507b206a7d	lib/logstorage: move getConstColumnValue() and getColumnHeader() methods from columnsHeader to blockSearch This localizes blockSearch.getColumnsHeader() call at block_search.go . This call is going to be optimized in the next commits in order to avoid unmarshaling of header data for unneeded columns, which weren't requested by getConstColumnValue() / getColumnHeader().	2024-10-13 14:29:02 +02:00
Aliaksandr Valialkin	867f671cc4	lib/logstorage: make sure that bs.br is non-nil before checking br.bs.bsw.bh.rowsCount there br.bs may be nil when br contains the block with additional filters applied during pipe calculations. For example, `* | count() if (error) errors`.	2024-10-12 20:51:29 +02:00
Aliaksandr Valialkin	a350be48b6	lib/logstorage: do not count dictionary values which have no matching logs in `count_uniq` stats function Create blockResultColumn.forEachDictValue* helper functions for visiting matching dictionary values. These helper functions should prevent from counting dictionary values without matching logs in the future. This is a follow-up for `0c0f013a60` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152	2024-10-01 13:34:45 +02:00
Aliaksandr Valialkin	b82bd0c2ec	lib/logstorage: improve performance for stream_context pipe over streams with big number of log entries Do not read timestamps for blocks, which cannot contain surrounding logs. This should improve performance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 . Also optimize min(_time) and max(_time) calculations a bit by avoiding conversion of timestamp to string when it isn't needed. This should improve performance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .	2024-09-26 22:22:23 +02:00
Aliaksandr Valialkin	65b93b17b1	lib/logstorage: lazily read column headers metadata during queries This improves performance for analytical queries, which do not need column headers metadata. For example, the following query doesn't need column headers metadata, since _stream and min(_time) are stored in block header, which is read separately from column headers metadata: _time:1w | stats by (_stream) min(_time) min_time This commit significantly improves the performance for this query. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070	2024-09-25 19:17:48 +02:00
Aliaksandr Valialkin	4599429f51	lib/logstorage: read timestamps column when it is really needed during query execution Previously timestamps column was read unconditionally on every query. This could significantly slow down queries, which do not need reading this column like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .	2024-09-25 19:17:47 +02:00
Aliaksandr Valialkin	7f1ba18719	lib/logstorage: improve the performance of obtaining _stream column value Substitute global streamTagsCache with per-blockSearch cache for ((stream.id) -> (_stream value)) entries. This improves scalability of obtaining _stream values on a machine with many CPU cores, since every CPU has its own blockSearch instance. This also should reduce memory usage when querying logs over big number of streams, since per-blockSearch cache of ((stream.id) -> (_stream value)) entries is limited in size, and its lifetime is bounded by a single query.	2024-09-24 20:57:00 +02:00
Aliaksandr Valialkin	9e1c037249	lib/logstorage: properly parse timezone offset at TryParseTimestampRFC3339Nano() The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets. While at it, make tryParseTimestampISO8601 function private in order to prevent from improper usage of this function from outside the lib/logstorage package. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508	2024-06-25 14:53:38 +02:00
Aliaksandr Valialkin	7229dd8c33	lib/logstorage: work-in-progress	2024-06-20 03:10:08 +02:00
Aliaksandr Valialkin	2b6a634ec0	lib/logstorage: work-in-progress	2024-06-17 12:13:18 +02:00
Aliaksandr Valialkin	8f5dc966f6	lib/logstorage: work-in-progress	2024-06-11 17:50:32 +02:00
Aliaksandr Valialkin	43cf221681	lib/logstorage: work-in-progress	2024-06-05 03:18:12 +02:00
Aliaksandr Valialkin	539fce9227	lib/logstorage: work-in-progress	2024-06-04 01:49:02 +02:00
Aliaksandr Valialkin	1de187bcb7	lib/logstorage: work-in-progress	2024-05-29 01:52:13 +02:00
Aliaksandr Valialkin	dc55146752	lib/logstorage: work-in-progress	2024-05-25 21:36:16 +02:00
Aliaksandr Valialkin	e2590f0485	lib/logstorage: work-in-progress	2024-05-25 00:30:58 +02:00
Aliaksandr Valialkin	4b458370c1	lib/logstorage: work-in-progress	2024-05-24 03:06:55 +02:00
Aliaksandr Valialkin	22107421eb	lib/logstorage: work-in-progress	2024-05-22 21:01:20 +02:00
Aliaksandr Valialkin	ad505a7a9a	lib/logstorage: work-in-progress	2024-05-20 04:08:30 +02:00
Aliaksandr Valialkin	0aa19a2837	lib/logstorage: work-in-progress	2024-05-15 04:55:44 +02:00
Aliaksandr Valialkin	da3af090c6	lib/logstorage: work-in-progress	2024-05-14 03:05:03 +02:00
Aliaksandr Valialkin	cb35e62e04	lib/logstorage: work-in-progress Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6258	2024-05-14 01:49:23 +02:00
hagen1778	17283fab6c	lib/logstorage: make linter happy Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-13 15:35:11 +02:00
Aliaksandr Valialkin	9dbd0f9085	lib/logstorage: initial implementation of pipes in LogsQL See https://docs.victoriametrics.com/victorialogs/logsql/#pipes	2024-05-12 16:33:31 +02:00

42 commits