github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	4b1611267f	lib/logstorage: properly return surrounding logs outside the selected time range by stream_context pipe Previously only logs inside the selected time range could be returned by stream_context pipe. For example, the following query could return up to 10 surrounding logs only for the last 5 minutes, while most users expect this query should return up to 10 surrounding logs without restrictions on the time range. _time:5m panic \| stream_context before 10 This enables the ability to implement stream context feature at VictoriaLogs web UI: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7063 . Reduce memory usage when returning stream context over big log streams with millions of entries. The new logic scans over all the log messages for the selected log stream, while keeping in memory only the given number of surrounding logs. Previously all the logs for the given log stream on the selected time range were loaded in memory before selecting the needed surrounding logs. This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 . Improve the scan performance for big log streams by fetching only the requested fields. For example, the following query should be executed much faster than before if logs contain many fields other than _stream, _msg and _time: panic \| stream_context after 30 \| fields _stream, _msg, _time	2024-09-26 17:03:45 +02:00
Aliaksandr Valialkin	037652d5ae	app/vlinsert: support `_time` field without timezone information during data ingestion Use local timezone of the host server in this case. The timezone can be overridden with TZ environment variable if needed. While at it, allow using whitespace instead of T as a delimiter between date and time in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted during data ingestion. This is valid ISO8601 format, which is used by some log shippers, so it should be supported. This format is also known as SQL datetime format. Also assume local time zone when time without timezone information is passed to querying APIs. Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string if the old behaviour is preferred. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721	2024-09-26 12:49:35 +02:00
Aliaksandr Valialkin	255d1d4e13	app/vlselect/logsql: clone the query with the current timestamp when performing live tailing requests in the loop Previously the original timestamp was used in the copied query, so _time:duration filters were applied to the original time range: (timestamp-duration ... timestamp]. This resulted in stopped live tailing, since new logs have timestamps bigger than the original time range. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7028	2024-09-26 08:57:23 +02:00
Aliaksandr Valialkin	e9950f6307	lib/logstorage: add `blocks_count` pipe This pipe is useful for debugging purposes when the number of processed blocks must be calculated for the given query: <query> \| blocks_count This helps detecting the root cause of query performance slowdown in cases like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070	2024-09-25 19:17:48 +02:00
Aliaksandr Valialkin	65b93b17b1	lib/logstorage: lazily read column headers metadata during queries This improves performance for analytical queries, which do not need column headers metadata. For example, the following query doesn't need column headers metadata, since _stream and min(_time) are stored in block header, which is read separately from column headers metadata: _time:1w \| stats by (_stream) min(_time) min_time This commit significantly improves the performance for this query. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070	2024-09-25 19:17:48 +02:00
Aliaksandr Valialkin	4599429f51	lib/logstorage: read timestamps column when it is really needed during query execution Previously timestamps column was read unconditionally on every query. This could significantly slow down queries, which do not need reading this column like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .	2024-09-25 19:17:47 +02:00
Aliaksandr Valialkin	7f1ba18719	lib/logstorage: improve the performance of obtaining _stream column value Substitute global streamTagsCache with per-blockSearch cache for ((stream.id) -> (_stream value)) entries. This improves scalability of obtaining _stream values on a machine with many CPU cores, since every CPU has its own blockSearch instance. This also should reduce memory usage when querying logs over big number of streams, since per-blockSearch cache of ((stream.id) -> (_stream value)) entries is limited in size, and its lifetime is bounded by a single query.	2024-09-24 20:57:00 +02:00
Aliaksandr Valialkin	cf2e7d0d92	lib/logstorage/consts.go: document that it isn't recommended setting maxColumnsPerBlock constant to too big values This should help avoiding cases like this one - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6425#issuecomment-2337446083	2024-09-24 18:51:46 +02:00
Aliaksandr Valialkin	f86e093b20	lib/logstorage: improve performance for streamID.marshalString() by more than 2x The streamID.marshalString() is executed in hot path if the query selects _stream_id field. Command to run the benchmark: go test ./lib/logstorage/ -run=NONE -bench=BenchmarkStreamIDMarshalString -benchtime=5s Results before the commit: BenchmarkStreamIDMarshalString-16 438480714 14.04 ns/op 71.23 MB/s 0 B/op 0 allocs/op Results after the commit: BenchmarkStreamIDMarshalString-16 982459660 6.049 ns/op 165.30 MB/s 0 B/op 0 allocs/op	2024-09-24 18:35:04 +02:00
Aliaksandr Valialkin	919d2dc90e	lib/logstorage: add benchmark for streamID.marshalString	2024-09-24 18:31:38 +02:00
Aliaksandr Valialkin	a3d8077959	lib/logstorage: make sure that getCommonTokens returns common tokens in the original order of tokens inside tokenSets arg This fixes flaky test TestGetCommonTokensForOrFilters: filter_or_test.go:143: unexpected tokens for field "_msg"; got ["foo" "bar"]; want ["bar" "foo"]	2024-09-19 15:59:48 +02:00
Aliaksandr Valialkin	657988ac3a	app/vlselect: consistently reuse the original query timestamp when executing /select/logsql/query with positive limit=N query arg Previously the query could return incorrect results, since the query timestamp was updated with every Query.Clone() call during iterative search for the time range with up to limit=N rows. While at it, optimize queries that find a low number of matching logs while spending a lot of CPU time searching across a big number of logs. The optimization reduces the upper bound of the time range to search if the current time range contains zero matching rows. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785	2024-09-08 14:32:23 +02:00
Aliaksandr Valialkin	45a3713bdb	lib/logstorage: preserve the order of tokens to check against bloom filters in AND filters Previously tokens from AND filters were extracted in random order. This could slow down checking them against bloom filters if the most specific tokens go at the beginning of the AND filters. Preserve the original order of tokens when matching them against bloom filters, so the user could control the performance of the query by putting the most specific AND filters at the beginning of the query. While at it, add tests for getCommonTokensForAndFilters() and getCommonTokensForOrFilters(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556	2024-09-08 12:27:30 +02:00
Aliaksandr Valialkin	eaee2d7db4	lib/logstorage: improve error logging for incorrect queries passed to /select/logsql/stats_query and /select/logsql/stats_query_range functions	2024-09-08 11:24:44 +02:00
Aliaksandr Valialkin	1cd06ace5a	lib/logstorage: properly extract common tokens from unsupported OR filters Previously the following query could miss rows matching !bar if these rows do not contain foo: foo OR !bar This is because of incorrect detection of common tokens for OR filters - all the unsupported filters were skipped (including the NOT filter (aka `!`)), while in this case zero common tokens must be returned. While at it, move repetitive code in TestFilterAnd and TestFilterOr into f function. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556	2024-09-08 11:14:55 +02:00
Aliaksandr Valialkin	0a40064a6f	app/vlselect: add /select/logsql/stats_query_range endpoint for building time series panels in VictoriaLogs plugin for Grafana Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6943 Updates https://github.com/VictoriaMetrics/victorialogs-datasource/issues/61	2024-09-07 00:41:47 +02:00
Aliaksandr Valialkin	c9bb4ddeed	app/vlselect: add /select/logsql/stats_query endpoint, which is going to be used by vmalert Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6942 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706	2024-09-06 23:06:43 +02:00
Aliaksandr Valialkin	00e7d5add3	lib/logstorage: substitute `\|` operator with `or` operator at `math` pipe This is needed for avoiding confusion between the `\|` operator at `math` pipe and `\|` pipe delimiter. For example, the query `* \| math foo / bar \| fields x` was unexpectedly parsed as `* \| math foo / (bar \| fields) as x`. Substituting `\|` with `or` inside `math` pipe fixes this ambiguity.	2024-09-06 22:44:14 +02:00
Aliaksandr Valialkin	0205170409	lib/logstorage: consistently use nsecsPerDay constant and remove nsecPerDay constant	2024-09-06 16:17:04 +02:00
Aliaksandr Valialkin	258ccfb953	lib/logstorage: pre-calculate hashes from tokens used in bloom filter search Previously per-token hashes for per-block bloom filters were re-calculated on every scanned block. This could be slow when the number of tokens is big or when the number of blocks to scan is big. Pre-calculate hashes for bloom filters and then use them for searching in bloom filters. This improves performance by 2.5x for in(...) filters with many values to search inside `in()`.	2024-09-05 19:44:17 +02:00
Aliaksandr Valialkin	49e57ea80e	lib/logstorage: delete unused function - bloomfilter.containsAny	2024-09-05 16:21:06 +02:00
Aliaksandr Valialkin	2dd845fa53	lib/logstorage: properly fix incorrect extraction of common tokens for `OR` filters at distinct log fields Previously (f1:foo OR f2:bar) was incorrectly returning `foo` token for `f1` and `bar` token for `f2`. These tokens were used for checking against bloom filter for every data block, so the data block, which didn't contain simultaneously `foo` token for `f1` field and `bar` token for `f2` field, was skipped. This was incorrect, since such a block may contain logs matching the original OR filter. The fix is to return common tokens from `OR`-delimited filters only if these tokens exist at EVERY such filter for the given field name. If some `OR`-delimited filter misses the given field name, then `OR`-delimited filters do not contain common tokens, which could be used for checking against bloom filter. While at it, add more tests covering various edge cases for filters delimited by AND and OR. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556	2024-09-05 14:29:50 +02:00
jackyin	975ed27a76	lib/logstorage: `and` filter results in unexpected response (#6556 ) fix #6554 andfilter shouldn't return orfilter field which result in bloomfilter return false. --------- Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-09-03 10:17:44 +02:00
Aliaksandr Valialkin	ac06569c49	app/vlinsert/loki: use easyproto instead for parsing Loki protobuf messages	2024-07-10 03:05:17 +02:00
Aliaksandr Valialkin	aa9bb99527	lib/logstorage: drop all the pipes from the query when calculating the number of matching logs at /select/logsql/hits API	2024-07-10 00:39:28 +02:00
Aliaksandr Valialkin	3c02937a34	all: consistently use 'any' instead of 'interface{}' 'any' type is supported starting from Go1.18. Let's consistently use it instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.	2024-07-10 00:20:37 +02:00
Aliaksandr Valialkin	a9525da8a4	lib: consistently use f-tests instead of table-driven tests This makes it easier to read and debug these tests. This also reduces test lines count by 15% from 3K to 2.5K. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e While at it, consistently use t.Fatal* instead of t.Error, since t.Error usually leads to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.	2024-07-09 22:40:50 +02:00
Aliaksandr Valialkin	c0caa69939	lib/logstorage: use quicktemplate.AppendJSONString instead of strconv.AppendQuote for encoding JSON strings The strconv.AppendQuote improperly encodes special chars such as \x1b . They must be encoded as \u001b . See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24	2024-07-05 01:22:23 +02:00
Aliaksandr Valialkin	3b6c78c26c	lib/logstorage: allow writing `after N` in front of `before N` at `stream_context` pipe	2024-07-02 01:38:20 +02:00
Aliaksandr Valialkin	6bb66cb3e9	lib/logstorage: properly search for the surrounding logs in `stream_context` pipe The set of log fields in the found logs may differ from the set of log fields present in the log stream. So compare only the log fields in the found logs when searching for the matching log entry in the log stream. While at it, return _stream field in the delimiter log entry, since this field is used by VictoriaLogs Web UI for grouping logs by log streams.	2024-07-01 02:29:50 +02:00
Aliaksandr Valialkin	bb0deb7ac4	lib/logstorage: add ability to store sorted log position into a separate field with `sort ... rank <fieldName>` syntax	2024-07-01 01:44:17 +02:00
Aliaksandr Valialkin	dc291d8980	lib/logstorage: add delimiter between log chunks returned from `\| stream_context` pipe	2024-07-01 01:30:37 +02:00
Aliaksandr Valialkin	d4ca651547	lib/logstorage: add `stream_context` pipe, which allows selecting surrounding logs for the matching logs	2024-06-28 19:14:29 +02:00
Aliaksandr Valialkin	0730f1324d	lib/logstorage: it is safe using `\| unroll` pipe in live tailing `\| unroll` pipe can make multiple copies of rows from the input row. This doesn't break live tailing, so allow `\| unroll` pipe in live tailing.	2024-06-27 19:44:57 +02:00
Aliaksandr Valialkin	87f1c8bd6c	lib/logstorage: work-in-progress	2024-06-27 14:20:43 +02:00
Aliaksandr Valialkin	dff5008392	app/vlstorage: add -retention.maxDiskSpaceUsageBytes command-line flag for limiting the retention at VictoriaLogs by disk space usage	2024-06-25 17:30:33 +02:00
Aliaksandr Valialkin	3eacd43fff	lib/logstorage: parse syslog structured data into separate fields in order to simplify further querying of this data	2024-06-25 14:53:39 +02:00
Aliaksandr Valialkin	9e1c037249	lib/logstorage: properly parse timezone offset at TryParseTimestampRFC3339Nano() The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets. While at it, make tryParseTimestampISO8601 function private in order to prevent from improper usage of this function from outside the lib/logstorage package. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508	2024-06-25 14:53:38 +02:00
Aliaksandr Valialkin	7252c5d258	lib/logstorage: make golangci-lint happy	2024-06-25 03:04:21 +02:00
Aliaksandr Valialkin	de7450b7e0	lib/logstorage: work-in-progress	2024-06-24 23:27:12 +02:00
Aliaksandr Valialkin	7229dd8c33	lib/logstorage: work-in-progress	2024-06-20 03:10:08 +02:00
Aliaksandr Valialkin	e498fa6960	app/vlinsert/syslog: allow accepting syslog messages with different configs at different ports	2024-06-17 23:16:34 +02:00
Aliaksandr Valialkin	2b6a634ec0	lib/logstorage: work-in-progress	2024-06-17 12:13:18 +02:00
Aliaksandr Valialkin	8f5dc966f6	lib/logstorage: work-in-progress	2024-06-11 17:50:32 +02:00
Aliaksandr Valialkin	0521e58a09	lib/logstorage: work-in-progress	2024-06-10 18:42:19 +02:00
Aliaksandr Valialkin	55d8379ae6	lib/logstorage: work-in-progress	2024-06-06 12:27:05 +02:00
Aliaksandr Valialkin	80a7c65ab7	lib/logstorage: allow using `eval` keyword instead of `math` keyword in `math` pipe	2024-06-05 10:07:49 +02:00
Aliaksandr Valialkin	43cf221681	lib/logstorage: work-in-progress	2024-06-05 03:18:12 +02:00
Aliaksandr Valialkin	96c29ab403	lib/logstorage: allow typing `asc` in `sort` pipe for the sake of consistency with `desc`	2024-06-04 02:29:10 +02:00
Aliaksandr Valialkin	539fce9227	lib/logstorage: work-in-progress	2024-06-04 01:49:02 +02:00
Aliaksandr Valialkin	b30e80b071	lib/logstorage: work-in-progress	2024-05-30 16:19:23 +02:00
Aliaksandr Valialkin	1de187bcb7	lib/logstorage: work-in-progress	2024-05-29 01:52:13 +02:00
Aliaksandr Valialkin	0aafca29be	lib/logstorage: work-in-progress	2024-05-28 19:29:41 +02:00
Aliaksandr Valialkin	99138e15c0	lib/logstorage: fix golangci-lint warnings	2024-05-26 02:01:32 +02:00
Aliaksandr Valialkin	1e203f35f7	lib/logstorage: work-in-progress	2024-05-26 01:55:21 +02:00
Aliaksandr Valialkin	7ac529c235	lib/logstorage: work-in-progress	2024-05-25 22:59:13 +02:00
Aliaksandr Valialkin	0b629ce5a5	lib/logstorage: re-use per-shard fields across processed blocks in pipePackJSON and pipeUnroll	2024-05-25 22:13:32 +02:00
Aliaksandr Valialkin	dc55146752	lib/logstorage: work-in-progress	2024-05-25 21:36:16 +02:00
Aliaksandr Valialkin	e2590f0485	lib/logstorage: work-in-progress	2024-05-25 00:30:58 +02:00
Aliaksandr Valialkin	4b458370c1	lib/logstorage: work-in-progress	2024-05-24 03:06:55 +02:00
Alexander Marshalov	7da541360e	[vmlogs] fixed time parsing with millisecond precision time (#6293 ) (#6295 ) fix for #6293 Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-05-22 21:46:50 +02:00
Aliaksandr Valialkin	22107421eb	lib/logstorage: work-in-progress	2024-05-22 21:01:20 +02:00
Aliaksandr Valialkin	bc4a0b8f37	lib/logstorage: fix golangci-lint warnings	2024-05-20 11:04:12 +02:00
Aliaksandr Valialkin	ad505a7a9a	lib/logstorage: work-in-progress	2024-05-20 04:08:30 +02:00
Aliaksandr Valialkin	0aa19a2837	lib/logstorage: work-in-progress	2024-05-15 04:55:44 +02:00
Aliaksandr Valialkin	da3af090c6	lib/logstorage: work-in-progress	2024-05-14 03:05:03 +02:00
Aliaksandr Valialkin	cb35e62e04	lib/logstorage: work-in-progress Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6258	2024-05-14 01:49:23 +02:00
Aliaksandr Valialkin	cc2647d212	lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit Change the return values for these functions - now they return the unmarshaled result plus the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling. This improves performance of these functions in hot loops of VictoriaLogs a bit.	2024-05-14 01:23:54 +02:00
hagen1778	17283fab6c	lib/logstorage: make linter happy Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-13 15:35:11 +02:00
Aliaksandr Valialkin	9dbd0f9085	lib/logstorage: initial implementation of pipes in LogsQL See https://docs.victoriametrics.com/victorialogs/logsql/#pipes	2024-05-12 16:33:31 +02:00
Aliaksandr Valialkin	590160ddbb	lib/slicesutil: add helper functions for setting slice length and extending its capacity The added helper functions - SetLength() and ExtendCapacity() - replace error-prone code with simple function calls.	2024-05-12 11:32:17 +02:00
wanshuangcheng	83216e956c	chore: fix function names in comment (#6076 ) Signed-off-by: wanshuangcheng <wanshuangcheng@outlook.com>	2024-04-08 01:11:12 -07:00
Aliaksandr Valialkin	918cccaddf	all: fix golangci-lint(revive) warnings after `0c0ed61ce7` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001	2024-04-02 23:16:29 +03:00
XLONG96	a5795f533d	lib/logstorage: avoid panic when parsing regex with stream filter (#5897 )	2024-02-29 15:31:54 +02:00
Aliaksandr Valialkin	4617dc8bbe	lib/logstorage: consistently use atomic.* types instead of atomic.* functions on regular types See `ea9e2b19a5`	2024-02-23 23:46:13 +02:00
Aliaksandr Valialkin	f81b480905	lib/mergeset: consistently use atomic.* types instead of atomic.* function calls on ordinary types See `ea9e2b19a5`	2024-02-23 23:29:35 +02:00
Aliaksandr Valialkin	275335c181	lib/logstorage: consistently use atomic.* type for refCount and mustDrop fields in datadb and storage structs in the same way as it is used in lib/storage See `ea9e2b19a5` and `a204fd69f1`	2024-02-23 23:04:42 +02:00
Aliaksandr Valialkin	0514091948	app/vlselect: follow-up for `451d2abf50` - Consistently return the first `limit` log entries if the total size of found log entries doesn't exceed 1Mb. See app/vlselect/logsql/sort_writer.go . Previously random log entries could be returned with each request. - Document the change at docs/VictoriaLogs/CHANGELOG.md - Document the `limit` query arg at docs/VictoriaLogs/querying/README.md - Make the change less intrusive. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5674 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5778	2024-02-18 23:05:51 +02:00
Dmytro Kozlov	451d2abf50	Enable the `limit` query param for the `/select/logsql/query` (#5778 ) * app/vlselect: add limit for logs query * app/vlselect: CHANGELOG.md * app/vlselect: stop search process if limit is reached, update logic, remove default limit * app/vlselect: fix tests * app/vlselect: fix filter tests * app/vlselect: fix tests	2024-02-18 22:58:47 +02:00
noodles2hg	cafd6f08b3	lib/logstorage: proper exit during block search (#5400 )	2024-02-01 12:11:05 +00:00
Jiajing LU	333bda8702	count inmemoryParts that have not been taken for merge (#5447 )	2024-02-01 12:06:28 +00:00
Aliaksandr Valialkin	2655c02d5e	lib/logstorage: make sure that WaitGroup.Add isn't called after stopCh is closed and WaitGroup.Wait is called This protects from rare panic, which may occur during graceful shutdown of VictoriaLogs	2024-01-26 21:17:02 +01:00
Aliaksandr Valialkin	3449d563bd	all: add up to 10% random jitter to the interval between periodic tasks performed by various components This should smooth CPU and RAM usage spikes related to these periodic tasks, by reducing the probability that multiple concurrent periodic tasks are performed at the same time.	2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin	cef7a39ba3	lib/logstorage: always check the previous indexBlockHeader for blocks with matching tenantID and/or streamID The previous indexBlockHeader may contain blocks for the matching tenantID and/or streamID, so it must be scanned unconditionally during the search. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5295 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4856 This is a follow-up for `89dcbc2fe7`	2023-11-13 23:13:53 +01:00
XLONG96	89dcbc2fe7	lib/logstorage: fix streamID and tenantID search (#4856 ) (#5295 )	2023-11-13 23:09:39 +01:00
Aliaksandr Valialkin	42dd71bb63	all: consistently use %w instead of %s when an error is passed to fmt.Errorf() This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.	2023-10-25 21:24:03 +02:00
Zakhar Bessarab	b296c8e95a	lib/logstorage: fix free space check (#5113 ) Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-10-03 12:39:41 +02:00
Aliaksandr Valialkin	8dce4eb189	lib/logstorage: follow-up for `94627113db` - Move uniqueFields from rows to blockStreamMerger struct. This allows localizing all the references to uniqueFields inside blockStreamMerger.mustWriteBlock(), which should improve readability and maintainability of the code. - Remove logging of the event when blocks cannot be merged because they contain more than maxColumnsPerBlock, since the provided logging didn't provide the solution for the issue with too many columns. I couldn't figure out the proper solution, which could be helpful for end user, so decided to remove the logging until we find the solution. This commit also contains the following additional changes: - It truncates field names longer than 128 chars during logs ingestion. This should prevent from ingesting bogus field names. This also should prevent from too big columnsHeader blocks, which could negatively affect search query performance, since columnsHeader is read on every scan of the corresponding data block. - It limits the maximum length of const column value to 256. Longer values are stored in an ordinary columns. This helps limiting the size of columnsHeader blocks and improving search query performance by avoiding reading too long const columns on every scan of the corresponding data block. - It deduplicates columns with identical names during data ingestion and background merging. Previously it was possible to pass columns with duplicate names to block.mustInitFromRows(), and they were stored as is in the block. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4969	2023-10-02 19:19:08 +02:00
Aliaksandr Valialkin	7b33a27874	lib/logstorage: follow-up for `8a23d08c21` - Compare the actual free disk space to the value provided via -storage.minFreeDiskSpaceBytes directly inside the Storage.IsReadOnly(). This should work fast in most cases. This simplifies the logic at lib/storage. - Do not take into account -storage.minFreeDiskSpaceBytes during background merges, since it results in uncontrolled growth of small parts when the free disk space approaches -storage.minFreeDiskSpaceBytes. The background merge logic uses another mechanism for determining whether there is enough disk space for the merge - it reserves the needed disk space before the merge and releases it after the merge. This prevents from out of disk space errors during background merge. - Properly handle corner cases for flushing in-memory data to disk when the storage enters read-only mode. This is better than losing the in-memory data. - Return back Storage.MustAddRows() instead of Storage.AddRows(), since the only case when AddRows() can return error is when the storage is in read-only mode. This case must be handled by the caller by calling Storage.IsReadOnly() before adding rows to the storage. This simplifies the code a bit, since the caller of Storage.MustAddRows() shouldn't handle errors returned by Storage.AddRows(). - Properly store parsed logs to Storage if parts of the request contain invalid log lines. Previously the parsed logs could be lost in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4945	2023-10-02 16:52:23 +02:00
Aliaksandr Valialkin	10d9214980	lib/logstorage: run up to GOMAXPROCS flushers of old in-memory parts to disk One flusher isn't enough under high data ingestion rate. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775	2023-10-02 16:20:59 +02:00
Aliaksandr Valialkin	da9ef90277	lib/logstorage: assist merging in-memory parts at data ingestion path if their number starts exceeding maxInmemoryPartsPerPartition This is a follow-up for `9310e9f584` , which removed data ingestion pacing. This can result in uncontrolled growth of in-memory parts under high data ingestion rate, which, in turn, can result in unbounded RAM usage, OOM crashes and slow query performance. While at it, consistently reset isInMerge field for parts passed to mergeParts() before returning from this function. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4828	2023-10-02 08:24:58 +02:00
Zakhar Bessarab	94627113db	lib/logstorage: prevent from panic during background merge (#4969 ) * lib/logstorage: prevent from panic during background merge Fixes panic during background merge when resulting block would contain more columns than maxColumnsPerBlock. Buffered data will be flushed and replaced by the next block. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/logstorage: clarify field description and comment Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-09-29 11:58:20 +02:00
Zakhar Bessarab	8a23d08c21	lib/logstorage: switch to read-only mode when running out of disk space (#4945 ) * lib/logstorage: switch to read-only mode when running out of disk space Added support of `--storage.minFreeDiskSpaceBytes` command-line flag to allow graceful handling of running out of disk space at `--storageDataPath`. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/logstorage: fix error handling logic during merge Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/logstorage: fix log level Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-09-29 11:55:38 +02:00
Zakhar Bessarab	9310e9f584	lib/logstorage/datadb: remove parts merge cond (#4828 ) It was added in order to limit the number of goroutines performing assisted merges during ingestion. It turned out that blocking ingestion goroutines lowers ingestion performance and limits overall ingestion to around 40k items per second because of lock contention. Removing the parts merge sync.Cond removes lock contention at the write path and significantly improves write performance. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-09-29 11:50:14 +02:00
Zakhar Bessarab	bea3431ed1	lib/storage/partition: add check to ensure parts exist on disk (#5017 ) * lib/storage/partition: add check to ensure parts exist on disk If part exists in parts.json but is missing on disk there will be a misleading error similar to "unexpected number of substrings in the part name". This change forces verification of part existence and throws a correct error in case it is missing on disk. Such issue can be result of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005 or disk corruption. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/partition: use filepath.Join instead of string concatenation Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/partition: add action points for error message Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * all: add a check for missing part in lib/mergeset and lib/logstorage --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-09-19 11:17:41 +02:00
Aliaksandr Valialkin	edee262ecc	Makefile: update golangci-lint from v1.51.2 to v1.54.2 See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2	2023-09-01 10:16:42 +02:00
Aliaksandr Valialkin	317a273c6d	lib/logstorage: eliminate data race when clearing s.ptwHot after deleting the corresponding partition The previous code could result in the following data race: 1. The s.ptwHot partition is marked to be deleted 2. ptw.decRef() is called on it 3. ptw.pt is set to nil 4. s.ptwHot.pt is accessed from concurrent goroutine, which leads to panic. The change clears s.ptwHot under s.partitionsLock in order to prevent from the data race. This is a follow-up for `8d50032dd6` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4895	2023-08-29 11:09:55 +02:00
crossoverJie	8d50032dd6	lib/logstorage: Set ptwHot to nil when the partition pointed by ptwHot is dropped (#4902 )	2023-08-29 11:01:19 +02:00
crossoverJie	cde5029bce	lib/logstorage: add nil check for ptwHot.pt (#4896 )	2023-08-27 01:24:26 +02:00
Aliaksandr Valialkin	f35d27aa2b	app/vlstorage: expose vl_data_size_bytes metric at /metrics page for tracking the on-disk data size (both indexdb and the data itself)	2023-07-31 07:56:53 -07:00


166 commits