Commit graph

154 commits

Author SHA1 Message Date
Aliaksandr Valialkin
45a3713bdb
lib/logstorage: preserve the order of tokens to check against bloom filters in AND filters
Previously tokens from AND filters were extracted in random order. This could slow down
checking them agains bloom filters if the most specific tokens go at the beginning of the AND filters.
Preserve the original order of tokens when matching them against bloom filters,
so the user could control the performance of the query by putting the most specific AND filters
at the beginning of the query.

While at it, add tests for getCommonTokensForAndFilters() and getCommonTokensForOrFilters().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
2024-09-08 12:27:30 +02:00
Aliaksandr Valialkin
eaee2d7db4
lib/logstorage: improve error logging for incorrect queries passed to /select/logsql/stats_query and /select/logsql/stats_query_range functions 2024-09-08 11:24:44 +02:00
Aliaksandr Valialkin
1cd06ace5a
lib/logstorage: properly extract common tokens from unsupported OR filters
Previously the following query could miss rows matching !bar if these rows do not contain foo:

   foo OR !bar

This is because of incorrect detection of common tokens for OR filters - all the unsupported filters
were skipped (including the NOT filter (aka `!`)), while in this case zero common tokens must be returned.

While at it, move repetiteve code in TestFilterAnd and TestFilterOr into f function.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
2024-09-08 11:14:55 +02:00
Aliaksandr Valialkin
0a40064a6f
app/vlselect: add /select/logsql/stats_query_range endpoint for building time series panels in VictoriaLogs plugin for Grafana
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6943
Updates https://github.com/VictoriaMetrics/victorialogs-datasource/issues/61
2024-09-07 00:41:47 +02:00
Aliaksandr Valialkin
c9bb4ddeed
app/vlselect: add /select/logsql/stats_query endpoint, which is going to be used by vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6942
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706
2024-09-06 23:06:43 +02:00
Aliaksandr Valialkin
00e7d5add3
lib/logstorage: substitute | operator with or operator at math pipe
This is needed for avoiding confusion between the `|` operator at `math` pipe and `|` pipe delimiter.
For example, the following query was parsed unexpectedly:

   * | math foo / bar | fields x

as

   * | math foo / (bar | fields) as x

Substituting `|` with `or` inside `math` pipe fixes this ambiguity.
2024-09-06 22:44:14 +02:00
Aliaksandr Valialkin
0205170409
lib/logstorage: consistently use nsecsPerDay constant and remove nsecPerDay constant 2024-09-06 16:17:04 +02:00
Aliaksandr Valialkin
258ccfb953
lib/logstorage: pre-calculate hashes from tokens used in bloom filter search
Previously per-token hashes for per-block bloom filters were re-calculated on every scanned block.
This could be slow when the number of tokens is big or when the number of blocks to scan is big.
Pre-calculate hashes for bloom filters and then use them for searching in bloom filters.
This improves performance by 2.5x for in(...) filters with many values to search inside `in()`.
2024-09-05 19:44:17 +02:00
Aliaksandr Valialkin
49e57ea80e
lib/logstorage: delete unused function - bloomfilter.containsAny 2024-09-05 16:21:06 +02:00
Aliaksandr Valialkin
2dd845fa53
lib/logstorage: properly fix incorrect extraction of common tokens for OR filters at distinct log fields
Previously (f1:foo OR f2:bar) was incorrectly returning `foo` token for `f1` and `bar` token for `f2`.
These tokens were used for checking against bloom filter for every data block, so the data block,
which didn't contain simultaneously `foo` token for `f1` field and `bar` token for `f2` field, was skipped.
This was incorrect, since such a block may contain logs matching the original OR filter.

The fix is to return common tokens from `OR`-delimted filters only if these tokens exist at EVERY such filter
for the given field name. If some `OR`-delimited filter misses the given field name, then `OR`-delimited filters
do not contain common tokens, which could be used for checking against bloom filter.

While at it, add more tests covering various edge cases for filters delimited by AND and OR.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
2024-09-05 14:29:50 +02:00
jackyin
975ed27a76
lib/logstorage: and filter results in unexpected response (#6556)
fix #6554
andfilter shouldn't return orfilter field which result in bloomfilter
return false.

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-09-03 10:17:44 +02:00
Aliaksandr Valialkin
ac06569c49
app/vlinsert/loki: use easyproto instead for parsing Loki protobuf messages 2024-07-10 03:05:17 +02:00
Aliaksandr Valialkin
aa9bb99527
lib/logstorage: drop all the pipes from the query when calculating the number of matching logs at /select/logsql/hits API 2024-07-10 00:39:28 +02:00
Aliaksandr Valialkin
3c02937a34
all: consistently use 'any' instead of 'interface{}'
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
2024-07-10 00:20:37 +02:00
Aliaksandr Valialkin
a9525da8a4
lib: consistently use f-tests instead of table-driven tests
This makes easier to read and debug these tests. This also reduces test lines count by 15% from 3K to 2.5K
See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e

While at it, consistently use t.Fatal* instead of t.Error*, since t.Error* usually leads
to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.
2024-07-09 22:40:50 +02:00
Aliaksandr Valialkin
c0caa69939
lib/logstorage: use quicktemplate.AppendJSONString instead of strconv.AppendQuote for encoding JSON strings
The strconv.AppendQuote improperly encodes special chars such as \x1b . They must be encoded as \u001b .

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
2024-07-05 01:22:23 +02:00
Aliaksandr Valialkin
3b6c78c26c
lib/logstorage: allow writing after N in front of before N at stream_context pipe 2024-07-02 01:38:20 +02:00
Aliaksandr Valialkin
6bb66cb3e9
lib/logstorage: properly search for the surrounding logs in stream_context pipe
The set of log fields in the found logs may differ from the set of log fields present in the log stream.
So compare only the log fields in the found logs when searching for the matching log entry in the log stream.

While at it, return _stream field in the delimiter log entry, since this field is used by VictoriaLogs Web UI
for grouping logs by log streams.
2024-07-01 02:29:50 +02:00
Aliaksandr Valialkin
bb0deb7ac4
lib/logstorage: add ability to store sorted log position into a separate field with sort ... rank <fieldName> syntax 2024-07-01 01:44:17 +02:00
Aliaksandr Valialkin
dc291d8980
lib/logstorage: add delimiter between log chunks returned from | stream_context pipe 2024-07-01 01:30:37 +02:00
Aliaksandr Valialkin
d4ca651547
lib/logstorage: add stream_context pipe, which allows selecting surrounding logs for the matching logs 2024-06-28 19:14:29 +02:00
Aliaksandr Valialkin
0730f1324d
lib/logstorage: it is safe using | unroll pipe in live tailing
`| unroll` pipe can make multiple copies of rows from the input row.
This doesn't break live tailing, so allow `| unroll` pipe in live tailing.
2024-06-27 19:44:57 +02:00
Aliaksandr Valialkin
87f1c8bd6c
lib/logstorage: work-in-progress 2024-06-27 14:20:43 +02:00
Aliaksandr Valialkin
dff5008392
app/vlstorage: add -retention.maxDiskSpaceUsageBytes command-line flag for limiting the retention at VictoriaLogs by disk space usage 2024-06-25 17:30:33 +02:00
Aliaksandr Valialkin
3eacd43fff
lib/logstorage: parse syslog structured data into separate fields in order to simplify further querying of this data 2024-06-25 14:53:39 +02:00
Aliaksandr Valialkin
9e1c037249
lib/logstorage: properly parse timezone offset at TryParseTimestampRFC3339Nano()
The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets.

While at it, make tryParseTimestampISO8601 function private in order to prevent
from improper usage of this function from outside the lib/logstorage package.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508
2024-06-25 14:53:38 +02:00
Aliaksandr Valialkin
7252c5d258
lib/logstorage: make golangci-lint happy 2024-06-25 03:04:21 +02:00
Aliaksandr Valialkin
de7450b7e0
lib/logstorage: work-in-progress 2024-06-24 23:27:12 +02:00
Aliaksandr Valialkin
7229dd8c33
lib/logstorage: work-in-progress 2024-06-20 03:10:08 +02:00
Aliaksandr Valialkin
e498fa6960
app/vlinsert/syslog: allow accepting syslog messages with different configs at different ports 2024-06-17 23:16:34 +02:00
Aliaksandr Valialkin
2b6a634ec0
lib/logstorage: work-in-progress 2024-06-17 12:13:18 +02:00
Aliaksandr Valialkin
8f5dc966f6
lib/logstorage: work-in-progress 2024-06-11 17:50:32 +02:00
Aliaksandr Valialkin
0521e58a09
lib/logstorage: work-in-progress 2024-06-10 18:42:19 +02:00
Aliaksandr Valialkin
55d8379ae6
lib/logstorage: work-in-progress 2024-06-06 12:27:05 +02:00
Aliaksandr Valialkin
80a7c65ab7
lib/logstorage: allow using eval keyword instead of math keyword in math pipe 2024-06-05 10:07:49 +02:00
Aliaksandr Valialkin
43cf221681
lib/logstorage: work-in-progress 2024-06-05 03:18:12 +02:00
Aliaksandr Valialkin
96c29ab403
lib/logstorage: allow typing asc in sort pipe for the sake of consistency with desc 2024-06-04 02:29:10 +02:00
Aliaksandr Valialkin
539fce9227
lib/logstorage: work-in-progress 2024-06-04 01:49:02 +02:00
Aliaksandr Valialkin
b30e80b071
lib/logstorage: work-in-progress 2024-05-30 16:19:23 +02:00
Aliaksandr Valialkin
1de187bcb7
lib/logstorage: work-in-progress 2024-05-29 01:52:13 +02:00
Aliaksandr Valialkin
0aafca29be
lib/logstorage: work-in-progress 2024-05-28 19:29:41 +02:00
Aliaksandr Valialkin
99138e15c0
lib/logstorage: fix golangci-lint warnings 2024-05-26 02:01:32 +02:00
Aliaksandr Valialkin
1e203f35f7
lib/logstorage: work-in-progress 2024-05-26 01:55:21 +02:00
Aliaksandr Valialkin
7ac529c235
lib/logstorage: work-in-progress 2024-05-25 22:59:13 +02:00
Aliaksandr Valialkin
0b629ce5a5
lib/logstorage: re-use per-shard fields across processed blocks in pipePackJSON and pipeUnroll 2024-05-25 22:13:32 +02:00
Aliaksandr Valialkin
dc55146752
lib/logstorage: work-in-progress 2024-05-25 21:36:16 +02:00
Aliaksandr Valialkin
e2590f0485
lib/logstorage: work-in-progress 2024-05-25 00:30:58 +02:00
Aliaksandr Valialkin
4b458370c1
lib/logstorage: work-in-progress 2024-05-24 03:06:55 +02:00
Alexander Marshalov
7da541360e
[vmlogs] fixed time parsing with millisecond precision time (#6293) (#6295)
fix for #6293

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-05-22 21:46:50 +02:00
Aliaksandr Valialkin
22107421eb
lib/logstorage: work-in-progress 2024-05-22 21:01:20 +02:00
Aliaksandr Valialkin
bc4a0b8f37
lib/logstorage: fix golangci-lint warnings 2024-05-20 11:04:12 +02:00
Aliaksandr Valialkin
ad505a7a9a
lib/logstorage: work-in-progress 2024-05-20 04:08:30 +02:00
Aliaksandr Valialkin
0aa19a2837
lib/logstorage: work-in-progress 2024-05-15 04:55:44 +02:00
Aliaksandr Valialkin
da3af090c6
lib/logstorage: work-in-progress 2024-05-14 03:05:03 +02:00
Aliaksandr Valialkin
cb35e62e04
lib/logstorage: work-in-progress
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6258
2024-05-14 01:49:23 +02:00
Aliaksandr Valialkin
cc2647d212
lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit
Change the return values for these functions - now they return the unmarshaled result plus
the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling.

This improves performance of these functions in hot loops of VictoriaLogs a bit.
2024-05-14 01:23:54 +02:00
hagen1778
17283fab6c
lib/logstorage: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-13 15:35:11 +02:00
Aliaksandr Valialkin
9dbd0f9085
lib/logstorage: initial implementation of pipes in LogsQL
See https://docs.victoriametrics.com/victorialogs/logsql/#pipes
2024-05-12 16:33:31 +02:00
Aliaksandr Valialkin
590160ddbb
lib/slicesutil: add helper functions for setting slice length and extending its capacity
The added helper functions - SetLength() and ExtendCapacity() - replace error-prone code with simple function calls.
2024-05-12 11:32:17 +02:00
wanshuangcheng
83216e956c
chore: fix function names in comment (#6076)
Signed-off-by: wanshuangcheng <wanshuangcheng@outlook.com>
2024-04-08 01:11:12 -07:00
Aliaksandr Valialkin
918cccaddf
all: fix golangci-lint(revive) warnings after 0c0ed61ce7
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001
2024-04-02 23:16:29 +03:00
XLONG96
a5795f533d
lib/logstorage: avoid panic when parsing regex with stream filter (#5897) 2024-02-29 15:31:54 +02:00
Aliaksandr Valialkin
4617dc8bbe
lib/logstorage: consistently use atomic.* types instead of atomic.* functions on regular types
See ea9e2b19a5
2024-02-23 23:46:13 +02:00
Aliaksandr Valialkin
f81b480905
lib/mergeset: consistently use atomic.* types instead of atomic.* function calls on ordinary types
See ea9e2b19a5
2024-02-23 23:29:35 +02:00
Aliaksandr Valialkin
275335c181
lib/logstorage: consistently use atomic.* type for refCount and mustDrop fields in datadb and storage structs in the same way as it is used in lib/storage
See ea9e2b19a5 and a204fd69f1
2024-02-23 23:04:42 +02:00
Aliaksandr Valialkin
0514091948
app/vlselect: follow-up for 451d2abf50
- Consistently return the first `limit` log entries if the total size of found log entries doesn't exceed 1Mb.
  See app/vlselect/logsql/sort_writer.go . Previously random log entries could be returned with each request.
- Document the change at docs/VictoriaLogs/CHANGELOG.md
- Document the `limit` query arg at docs/VictoriaLogs/querying/README.md
- Make the change less intrusive.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5674
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5778
2024-02-18 23:05:51 +02:00
Dmytro Kozlov
451d2abf50
Enable the limit query param for the /select/logsql/query (#5778)
* app/vlselect: add limit for logs query

* app/vlselect: CHANGELOG.md

* app/vlselect: stop search process if limit is reached, update logic, remove default limit

* app/vlselect: fix tests

* app/vlselect: fix filter tests

* app/vlselect: fix tests
2024-02-18 22:58:47 +02:00
noodles2hg
cafd6f08b3
lib/logstorage: proper exit during block search (#5400) 2024-02-01 12:11:05 +00:00
Jiajing LU
333bda8702
count inmemoryParts that have not been taken for merge (#5447) 2024-02-01 12:06:28 +00:00
Aliaksandr Valialkin
2655c02d5e
lib/logstorage: make sure that WaitGroup.Add isnt called after stopCh is closed and WaitGroup.Wait is called
This protects from rare panic, which may occur during graceful shutdown of VictoriaLogs
2024-01-26 21:17:02 +01:00
Aliaksandr Valialkin
3449d563bd
all: add up to 10% random jitter to the interval between periodic tasks performed by various components
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin
cef7a39ba3
lib/logstorage: always check the previous indexBlockHeader for blocks with matching tenantID and/or streamID
The previous indexBlockHeader may contain blocks for the matching tenantID and/or streamID,
so it must be scanned unconditionally during the search.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5295
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4856

This is a follow-up for 89dcbc2fe7
2023-11-13 23:13:53 +01:00
XLONG96
89dcbc2fe7
lib/logstorage: fix streamID and tenantID search (#4856) (#5295) 2023-11-13 23:09:39 +01:00
Aliaksandr Valialkin
42dd71bb63
all: consistently use %w instead of %s in when error is passed to fmt.Errorf()
This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.
2023-10-25 21:24:03 +02:00
Zakhar Bessarab
b296c8e95a
lib/logstorage: fix free space check (#5113)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-10-03 12:39:41 +02:00
Aliaksandr Valialkin
8dce4eb189
lib/logstorage: follow-up for 94627113db
- Move uniqueFields from rows to blockStreamMerger struct.
  This allows localizing all the references to uniqueFields inside blockStreamMerger.mustWriteBlock(),
  which should improve readability and maintainability of the code.

- Remove logging of the event when blocks cannot be merged because they contain more than maxColumnsPerBlock,
  since the provided logging didn't provide the solution for the issue with too many columns.
  I couldn't figure out the proper solution, which could be helpful for end user,
  so decided to remove the logging until we find the solution.

This commit also contains the following additional changes:

- It truncates field names longer than 128 chars during logs ingestion.
  This should prevent from ingesting bogus field names.
  This also should prevent from too big columnsHeader blocks,
  which could negatively affect search query performance,
  since columnsHeader is read on every scan of the corresponding data block.

- It limits the maximum length of const column value to 256.
  Longer values are stored in an ordinary columns.
  This helps limiting the size of columnsHeader blocks
  and improving search query performance by avoiding
  reading too long const columns on every scan of the corresponding data block.

- It deduplicates columns with identical names during data ingestion
  and background merging. Previously it was possible to pass columns with duplicate names
  to block.mustInitFromRows(), and they were stored as is in the block.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4969
2023-10-02 19:19:08 +02:00
Aliaksandr Valialkin
7b33a27874
lib/logstorage: follow-up for 8a23d08c21
- Compare the actual free disk space to the value provided via -storage.minFreeDiskSpaceBytes
  directly inside the Storage.IsReadOnly(). This should work fast in most cases.
  This simplifies the logic at lib/storage.

- Do not take into account -storage.minFreeDiskSpaceBytes during background merges, since
  it results in uncontrolled growth of small parts when the free disk space approaches -storage.minFreeDiskSpaceBytes.
  The background merge logic uses another mechanism for determining whether there is enough
  disk space for the merge - it reserves the needed disk space before the merge
  and releases it after the merge. This prevents from out of disk space errors during background merge.

- Properly handle corner cases for flushing in-memory data to disk when the storage
  enters read-only mode. This is better than losing the in-memory data.

- Return back Storage.MustAddRows() instead of Storage.AddRows(),
  since the only case when AddRows() can return error is when the storage is in read-only mode.
  This case must be handled by the caller by calling Storage.IsReadOnly()
  before adding rows to the storage.
  This simplifies the code a bit, since the caller of Storage.MustAddRows() shouldn't handle
  errors returned by Storage.AddRows().

- Properly store parsed logs to Storage if parts of the request contain invalid log lines.
  Previously the parsed logs could be lost in this case.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4945
2023-10-02 16:52:23 +02:00
Aliaksandr Valialkin
10d9214980
lib/logstorage: run up to GOMAXPROCS flushers of old in-memory parts to disk
One flusher isn't enough under high data ingestion rate.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2023-10-02 16:20:59 +02:00
Aliaksandr Valialkin
da9ef90277
lib/logstorage: assist merging in-memory parts at data ingestion path if their number starts exceeding maxInmemoryPartsPerPartition
This is a follow-up for 9310e9f584 , which removed data ingestion pacing.
This can result in uncontrolled growth of in-memory parts under high data ingestion rate,
which, in turn, can result in unbounded RAM usage, OOM crashes and slow query performance.

While at it, consistently reset isInMerge field for parts passed to mergeParts() before returning from this function.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4828
2023-10-02 08:24:58 +02:00
Zakhar Bessarab
94627113db
lib/logstorage: prevent from panic during background merge (#4969)
* lib/logstorage: prevent from panic during background merge

Fixes panic during background merge when resulting block would contain more columns than maxColumnsPerBlock.
Buffered data will be flushed and replaced by the next block.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: clarify field description and comment

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-29 11:58:20 +02:00
Zakhar Bessarab
8a23d08c21
lib/logstorage: switch to read-only mode when running out of disk space (#4945)
* lib/logstorage: switch to read-only mode when running out of disk space

Added support of `--storage.minFreeDiskSpaceBytes` command-line flag to allow graceful handling of running out of disk space at `--storageDataPath`.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: fix error handling logic during merge

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: fix log level

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-29 11:55:38 +02:00
Zakhar Bessarab
9310e9f584
lib/logstorage/datadb: remove parts merge cond (#4828)
It was added in order to limit number of goroutines performing assisted merges during ingestion.
It turned out that blocking ingestion goroutines lower ingestion performance and limits overall ingestion around 40k items per seconds because of lock contention.
Removing parts merge sync.Cond allows to remove lock contention at write path and significantly improves write performance.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-29 11:50:14 +02:00
Zakhar Bessarab
bea3431ed1
lib/storage/partition: add check to ensure parts exist on disk (#5017)
* lib/storage/partition: add check to ensure parts exist on disk

If part exists in parts.json but is missing on disk there will be a misleading error similar to "unexpected number of substrings in the part name".

This change forces verification of part existence and throws a correct error in case it is missing on disk.

Such issue can be result of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005 or disk corruption.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/partition: use filepath.Join instead of string concatenation

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/partition: add action points for error message

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* all: add a check for missing part in lib/mergeset and lib/logstorage

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-19 11:17:41 +02:00
Aliaksandr Valialkin
edee262ecc
Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 10:16:42 +02:00
Aliaksandr Valialkin
317a273c6d
lib/logstorage: eliminate data race when clearing s.ptwHot after deleting the corresponding partition
The previous code could result in the following data race:
1. The s.ptwHot partition is marked to be deleted
2. ptw.decRef() is called on it
3. ptw.pt is set to nil
4. s.ptwHot.pt is accessed from concurrent goroutine, which leads to panic.

The change clears s.ptwHot under s.partitionsLock in order to prevent from the data race.

This is a follow-up for 8d50032dd6

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4895
2023-08-29 11:09:55 +02:00
crossoverJie
8d50032dd6
lib/logstorage: Set ptwHot to nil when the partition pointed by ptwHot is dropped (#4902) 2023-08-29 11:01:19 +02:00
crossoverJie
cde5029bce
lib/logstorage: add nil check for ptwHot.pt (#4896) 2023-08-27 01:24:26 +02:00
Aliaksandr Valialkin
f35d27aa2b
app/vlstorage: expose vl_data_size_bytes metric at /metrics page for tracking the on-disk data size (both indexdb and the data itself) 2023-07-31 07:56:53 -07:00
Aliaksandr Valialkin
f548adce0b
app/vlinsert/loki: follow-up after 09df5b66fd
- Parse protobuf if Content-Type isn't set to `application/json` - this behavior is documented at https://grafana.com/docs/loki/latest/api/#push-log-entries-to-loki

- Properly handle gzip'ped JSON requests. The `gzip` header must be read from `Content-Encoding` instead of `Content-Type` header

- Properly flush all the parsed logs with the explicit call to vlstorage.MustAddRows() at the end of query handler

- Check JSON field types more strictly.

- Allow parsing Loki timestamp as floating-point number. Such a timestamp can be generated by some clients,
  which store timestamps in float64 instead of int64.

- Optimize parsing of Loki labels in Prometheus text exposition format.

- Simplify tests.

- Remove lib/slicesutil, since there are no more users for it.

- Update docs with missing info and fix various typos. For example, it should be enough to have `instance` and `job` labels
  as stream fields in most Loki setups.

- Allow empty of missing timestamps in the ingested logs.
  The current timestamp at VictoriaLogs side is then used for the ingested logs.
  This simplifies debugging and testing of the provided HTTP-based data ingestion APIs.

The remaining MAJOR issue, which needs to be addressed: victoria-logs binary size increased from 13MB to 22MB
after adding support for Loki data ingestion protocol at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4482 .
This is because of shitty protobuf dependencies. They must be replaced with another protobuf implementation
similar to the one used at lib/prompb or lib/prompbmarshal .
2023-07-20 16:48:21 -07:00
Zakhar Bessarab
09df5b66fd
app/vlinsert: add support of loki push protocol (#4482)
* app/vlinsert: add support of loki push protocol

- implemented loki push protocol for both Protobuf and JSON formats
- added examples in documentation
- added example docker-compose

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vlinsert: move protobuf metric into its own file

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* deployment/docker/victorialogs/promtail: update reference to docker image

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* deployment/docker/victorialogs/promtail: make volume name unique

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vlinsert/loki: add license reference

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* deployment/docker/victorialogs/promtail: fix volume name

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/VictoriaLogs/data-ingestion: add stream fields for loki JSON ingestion example

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vlinsert/loki: move entities to places where those are used

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vlinsert/loki: refactor to use common components

- use CommonParameters from insertutils
- stop ingestion after first error similar to elasticsearch and jsonline

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vlinsert/loki: address review feedback

- add missing logstorage.PutLogRows calls
- refactor tenant ID parsing to use common function
- reduce number of allocations for parsing by reusing  logfields slices
- add tests and benchmarks for requests processing funcs

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-07-20 10:10:55 +02:00
Aliaksandr Valialkin
163572ea97
lib/logstorage: go fmt after a8000b74c5 2023-07-18 16:04:51 -07:00
Aliaksandr Valialkin
a8000b74c5
lib/logstorage: properly encode "offset" search word just after _time filter 2023-07-18 16:00:06 -07:00
Aliaksandr Valialkin
ed00b03ecb
lib/logstorage: add abilty to speficy offset for the selected _time filter
The following syntax is supported: _time:filter offset off
For example:

- _time:5m offset 1h - 5-minute duration one hour before the current time
- _time:2023 offset 2w - 2023 year with the 2 weeks offset in the past
2023-07-17 19:07:42 -07:00
Aliaksandr Valialkin
118b093bdd
lib/logstorage: log the -retentionPeriod and -futureRetention values when the ingested log entry has timestamp outside the configured retention
This should simplify debugging
2023-07-17 19:07:41 -07:00
Aliaksandr Valialkin
bdfb80668d
lib/logstorage: support for short form of _time:(now-duration, now] filter: _time:duration 2023-07-17 19:07:40 -07:00
Aliaksandr Valialkin
3bf58326e7
lib/logstorage: LogsQL: replace exact_prefix("...") with exact("..."*)
This makes LogsQL queries more consistent with i("...") and i("..."*) syntax
2023-07-17 19:07:40 -07:00
Aliaksandr Valialkin
a79e53d82a
lib/logstorage: fix TestValuesEncoder() on 32-bit architectures 2023-07-13 11:27:13 -07:00
Dmytro Kozlov
79c42814cf
lib/logstorage: fix panic (#4620) 2023-07-13 09:53:41 +02:00
Aliaksandr Valialkin
3c5623ce7f
lib/logstorage: go fmt 2023-07-04 14:13:14 -07:00
Aliaksandr Valialkin
6d35d21f60
lib/logstorage: fix make test-pure tests 2023-07-04 13:14:30 -07:00