Commit 68791f9ccc introduced a regression.
It performed the basicAuth check before the built-in routes, which made it
impossible to bypass basic authorization with the `authKey` param.
This commit fixes that issue and removes the unneeded check. It also adds
integration tests for this case.
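A minimal sketch of the fixed ordering, with illustrative handler and key names (the real logic lives in lib/httpserver and is more involved):

```go
package main

import "net/http"

// checkAuth sketches the fixed order: a request that carries a valid
// route-specific authKey must be served even when basicAuth credentials
// are configured, so authKey is checked before falling back to basicAuth.
func checkAuth(w http.ResponseWriter, r *http.Request) bool {
	if authKeyMatches(r) {
		return true // e.g. a valid -reloadAuthKey bypasses basicAuth
	}
	return checkBasicAuth(w, r)
}

func authKeyMatches(r *http.Request) bool {
	return r.FormValue("authKey") == "secret" // placeholder comparison
}

func checkBasicAuth(w http.ResponseWriter, r *http.Request) bool {
	if _, _, ok := r.BasicAuth(); ok {
		return true // placeholder: real code compares against -httpAuth.*
	}
	w.Header().Set("WWW-Authenticate", `Basic realm="restricted"`)
	http.Error(w, "unauthorized", http.StatusUnauthorized)
	return false
}

func main() {}
```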
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7345
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
This eliminates a class of potential bugs with incorrect stats calculations when an additional filter
is applied to the blockResult before passing it to the stats function, and this filter removes
all the rows from blockResult.
### Describe Your Changes
Allow disabling the per-day index using the `-disablePerDayIndex` flag.
This should significantly improve the ingestion rate and decrease the
disk space usage for the use cases that assume small or no churn rate.
See the docs added to `docs/README.md` for details.
Both improvements are due to no data being written to the per-day index.
Benchmark results:
```shell
rm -Rf ./lib/storage/Benchmark*; go test ./lib/storage -run=NONE -bench=BenchmarkStorageInsertWithAndWithoutPerDayIndex --loggerLevel=ERROR
goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage
cpu: 13th Gen Intel(R) Core(TM) i7-1355U
BenchmarkStorageInsertWithAndWithoutPerDayIndex/HighChurnRate/perDayIndexes-12 1 3850268120 ns/op 39.56 data-MiB 28.20 indexdb-MiB 259722 rows/s
BenchmarkStorageInsertWithAndWithoutPerDayIndex/HighChurnRate/noPerDayIndexes-12 1 2916865725 ns/op 39.57 data-MiB 25.73 indexdb-MiB 342834 rows/s
BenchmarkStorageInsertWithAndWithoutPerDayIndex/NoChurnRate/perDayIndexes-12 1 2218073474 ns/op 9.772 data-MiB 13.73 indexdb-MiB 450842 rows/s
BenchmarkStorageInsertWithAndWithoutPerDayIndex/NoChurnRate/noPerDayIndexes-12 1 1295140898 ns/op 9.771 data-MiB 0.3566 indexdb-MiB 772119 rows/s
PASS
ok github.com/VictoriaMetrics/VictoriaMetrics/lib/storage 11.421s
```
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
The MustOpenStorage function may accept a variable number of optional
arguments. This commit combines the optional arguments into a dedicated OpenOptions
struct, which reduces the complexity of adding new optional arguments.
Related PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8118
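A minimal sketch of the options-struct pattern (the field names are illustrative; the real set lives in lib/storage):

```go
package main

import (
	"fmt"
	"time"
)

// OpenOptions groups the formerly variadic optional arguments of
// MustOpenStorage. Adding a new optional argument now means adding a
// struct field, without touching every call site.
type OpenOptions struct {
	Retention       time.Duration
	MaxHourlySeries int
	MaxDailySeries  int
}

type Storage struct{ path string }

func MustOpenStorage(path string, opts OpenOptions) *Storage {
	fmt.Printf("opening %s, retention=%s\n", path, opts.Retention)
	return &Storage{path: path}
}

func main() {
	_ = MustOpenStorage("/storage", OpenOptions{Retention: 31 * 24 * time.Hour})
}
```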
1. Use distinct code paths for blockResult.getValues() and blockResult.getValuesBucketed().
This should simplify debugging and maintenance of the resulting code.
2. Do not load column values if all the values in the block fit the same bucket.
Use blockResultColumn.minValue and blockResultColumn.maxValue for determining whether
column values must be loaded via blockResultColumn.getValuesEncoded().
This significantly improves performance for big buckets, which cover all the column
values in a block.
3. Properly calculate buckets for negative values.
4. Properly adjust weekly buckets by Monday.
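A minimal Go sketch of points 3 and 4 above (illustrative, not the actual blockResult code):

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// bucketStart returns the start of the bucket containing v. Using
// math.Floor instead of integer truncation keeps negative values in the
// correct bucket: with bucketSize=10, v=-3 belongs to [-10, 0), not [0, 10).
func bucketStart(v, bucketSize float64) float64 {
	return math.Floor(v/bucketSize) * bucketSize
}

// weekStart truncates t to the beginning of its week with Monday as the
// first day (time.Weekday numbers Sunday as 0).
func weekStart(t time.Time) time.Time {
	offsetDays := (int(t.Weekday()) + 6) % 7
	day := t.Truncate(24 * time.Hour)
	return day.AddDate(0, 0, -offsetDays)
}

func main() {
	fmt.Println(bucketStart(-3, 10))                                      // -10, not 0
	fmt.Println(weekStart(time.Date(2025, 1, 22, 15, 0, 0, 0, time.UTC))) // 2025-01-20, a Monday
}
```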
### Describe Your Changes
Fix a typo in the description of the `-pprofAuthKey` flag.
This improves performance for queries that use `sort by (...) limit N` without mentioning the _time field.
For example, the following query should work faster now:
_time:1d | rm _time | sort by (request_duration desc) limit 10
- Add a fast path for timestamps ending with 'Z'
- Use strings.LastIndexAny instead of strings.IndexAny for searching
for timezone offset at the end of the string. This works faster
for timestamps with sub-second precision.
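A minimal sketch of both optimizations (not the actual parser, which must additionally verify that a found '-' is a timezone offset and not a date separator):

```go
package main

import (
	"fmt"
	"strings"
)

// tzStart locates the timezone suffix of an RFC3339 timestamp.
// UTC timestamps end with 'Z' and take the fast path. Otherwise the
// offset ('+hh:mm' or '-hh:mm') sits near the end of the string, so
// scanning backwards with LastIndexAny is faster than IndexAny for
// timestamps with long sub-second fractions.
func tzStart(s string) int {
	if strings.HasSuffix(s, "Z") {
		return len(s) - 1
	}
	return strings.LastIndexAny(s, "+-")
}

func main() {
	fmt.Println(tzStart("2025-01-20T10:20:30.123456789Z"))      // fast path
	fmt.Println(tzStart("2025-01-20T10:20:30.123456789+02:00")) // backward scan
}
```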
Update the pipe state only once per each series of repeated strings, uint8 values and tuples.
This improves performance a bit for the following `top` pipes:
- top (string_field)
- top (uint8_field)
- top (field1, ..., fieldN)
Do not apply the optimization to uint16, uint32, uint64 and int64 fields, since they
usually contain a big number of unique values, which do not repeat most of the time.
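A minimal Go sketch of the run-based update (illustrative, not the actual pipe state code):

```go
package main

import "fmt"

// updateByRuns updates the state once per run of equal adjacent values
// instead of once per row. This pays off for string and uint8 columns,
// which often contain long runs of repeated values.
func updateByRuns(values []string, state map[string]uint64) {
	i := 0
	for i < len(values) {
		j := i + 1
		for j < len(values) && values[j] == values[i] {
			j++
		}
		state[values[i]] += uint64(j - i) // one update covers j-i rows
		i = j
	}
}

func main() {
	state := make(map[string]uint64)
	updateByRuns([]string{"a", "a", "a", "b", "b", "a"}, state)
	fmt.Println(state) // map[a:4 b:2]
}
```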
Previously RFC3339 timestamps with sub-second precision could be compared incorrectly by lessString().
For example, 2025-01-20T10:20:30.1Z was incorrectly treated as smaller than 2025-01-20T10:20:30.09Z,
because the digits after the last dot were compared as plain numbers (1 < 9), even though .1 denotes a bigger fraction than .09.
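A minimal sketch of the correct comparison for fractional parts sharing the same second-level prefix (the real fix lives in lessString()):

```go
package main

import "fmt"

// lessFractional compares fractional-second digit strings left-aligned:
// ".1" is 100ms while ".09" is 90ms. Padding the shorter fraction with
// trailing zeros makes a plain string comparison correct.
func lessFractional(a, b string) bool {
	for len(a) < len(b) {
		a += "0"
	}
	for len(b) < len(a) {
		b += "0"
	}
	return a < b
}

func main() {
	fmt.Println(lessFractional("1", "09")) // false: .1 (100ms) > .09 (90ms)
}
```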
Split unique values (groups) into shards according to the configured concurrency
during processing of the matching rows if the number of unique values exceeds the hardcoded threshold.
Previously this splitting was performed unconditionally at the merge stage when merging independently
calculated per-CPU states into a single state. It is faster to perform the split during rows processing
if the number of unique values is big.
This gives up to 30% performance improvement when these pipes are applied to a big number of unique values (groups).
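A minimal sketch of the hash-based sharding described above (the threshold logic and shard types in the actual pipes differ):

```go
package main

import (
	"fmt"

	"github.com/cespare/xxhash/v2"
)

// shardForKey routes a unique value to one of `concurrency` shards by
// hash, so big group sets can be split while rows are still being
// processed instead of at the merge stage.
func shardForKey(key string, concurrency int) int {
	return int(xxhash.Sum64String(key) % uint64(concurrency))
}

func main() {
	shards := make([]map[string]struct{}, 4)
	for i := range shards {
		shards[i] = make(map[string]struct{})
	}
	for _, key := range []string{"user-1", "user-2", "user-3"} {
		shards[shardForKey(key, len(shards))][key] = struct{}{}
	}
	fmt.Println(shards)
}
```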
The number of worker shards per pipe processor is set during query initialization.
It equals `options(concurrency=N)` if this option is set, or the number of available CPU cores otherwise.
This means that all the pipes must adhere to the given concurrency when passing data blocks
to the next pipe.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8201
The bug has been introduced in 0214aa328e
vmauth uses 'lib/httpserver' for serving HTTP requests. This server
unconditionally defines built-in routes (such as '/metrics',
'/health', etc), which makes it impossible to proxy HTTP requests to backends
that expose the same routes: vmauth's httpserver matches the built-in route
and returns a local response.
This commit adds a new flag, `httpInternalListenAddr`, with an empty
default value. When set, it removes the internal API routes from the public
router and exposes them on a separate HTTP server.
For example, the following configuration disables the internal routes at `0.0.0.0:8427` and serves them at `127.0.0.1:8426`:
`./bin/vmauth --auth.config=config.yaml --httpListenAddr=:8427 --httpInternalListenAddr=127.0.0.1:8426`
Related issues:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6468
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7345
### Describe Your Changes
The cardinality limiter in this case does not receive the actual
metricID but some other value found in r.TSID.MetricID, which is not
initialized. Depending on the system and/or Go runtime implementation,
this value can be 0 or some garbage value (which shouldn't have too wide
a range). Thus, there is basically no limit on inserted metricIDs.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
At this time `bufferedwriter` [silently ignores connection close
errors](78eaa056c0/lib/bufferedwriter/bufferedwriter.go (L67)).
It may be very convenient in some situations (to avoid logging such
unimportant errors), but it's too implicit and unsafe for the others.
For example, if you close an [export
API](https://docs.victoriametrics.com/#how-to-export-time-series) client
connection in the middle of communication, VictoriaMetrics won't notice
it and will keep hogging CPU by exporting all the data into nowhere
until it processes all of them. A few retries make this effectively
a DoS on the server.
This commit replaces this implicit error suppressing with explicit error
handling which fixes the issue with export API.
Issue was introduced at e78f3ac8ac
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7988
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This should reduce the time needed for opening the storage with retentions exceeding a few months.
While at it, limit the concurrency of opening partitions in parallel to the number of available CPU cores,
since higher concurrency may increase RAM usage and CPU usage without performance improvements
if opening a single partition is a CPU-bound task.
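A minimal sketch of the bounded-concurrency approach (function and partition names are illustrative):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// openPartitionsInParallel opens all partitions concurrently, but caps
// the concurrency at the number of CPU cores: extra goroutines only add
// RAM and scheduling overhead when opening a partition is CPU-bound.
func openPartitionsInParallel(partitions []string) {
	sem := make(chan struct{}, runtime.GOMAXPROCS(0))
	var wg sync.WaitGroup
	for _, pt := range partitions {
		wg.Add(1)
		go func(pt string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			fmt.Println("opening partition", pt) // placeholder for the real open
		}(pt)
	}
	wg.Wait()
}

func main() {
	openPartitionsInParallel([]string{"2024_11", "2024_12", "2025_01"})
}
```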
This is a follow-up for 17988942ab
Use the testing.Testing() function in order to determine whether the code runs in test.
This allows running tests at fast speed without the need to specify the DISABLE_FSYNC_FOR_TESTING
environment variable.
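A minimal sketch of the approach, assuming an illustrative mustSyncPath helper (testing.Testing() is available since Go 1.21 and reports whether the binary was built by `go test`):

```go
package main

import (
	"fmt"
	"testing"
)

// mustSyncPath skips fsync when running under `go test`, without any
// environment variable.
func mustSyncPath(path string) {
	if testing.Testing() {
		return // tests run fast without fsync
	}
	fmt.Println("fsync", path) // placeholder for the real fsync call
}

func main() {
	mustSyncPath("/storage/data")
}
```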
This is a follow-up for the commit 334cd92a6c
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6871
The purpose of extra filters ( https://docs.victoriametrics.com/victorialogs/querying/#extra-filters )
is to limit the subset of logs, which can be queried. For example, it is expected that all the queries
with `extra_filters={tenant=123}` can access only logs, which contain `123` value for the `tenant` field.
Previously this wasn't the case, since the provided extra filters weren't applied to subqueries.
For example, the following query could be used to select all the logs outside `tenant=123`, for any `extra_filters` arg:
* | union({tenant!=123})
This commit fixes this by propagating extra filters to all the subqueries.
While at it, this commit also properly propagates [start, end] time range filter from HTTP querying APIs
into all the subqueries, since this is what most users expect. This behaviour can be overridden on a per-subquery
basis with the `options(ignore_global_time_filter=true)` option - see https://docs.victoriametrics.com/victorialogs/logsql/#query-options
Also properly apply optimizations across all the subqueries. Previously the optimizations at the Query.optimize()
function were applied only to the top-level query.
logger.Fatalf("BUG: ...") complicates investigating the bug, since it doesn't show the call stack,
which led to the bug. So it is better to consistently use logger.Panicf("BUG: ...") for logging programming bugs.
Previously, NewChild elements of querytracer could be referenced by concurrent
storageNode goroutines. After an early return (if search.skipSlowReplicas is set),
the tracer objects could still be in use by concurrent workers.
This may cause panics and data races. The most probable case is when the parent tracer is finished, but children
still write data to it via the Donef() method, triggering a read-write data race at trace
formatting.
This commit adds new methods to the querytracer package that allow creating
children not referenced by the parent and adding them to the parent later.
An orphaned child must be registered at the parent when the goroutine returns. This is done synchronously by the single caller via the finishQueryTracer call.
If a child hasn't finished its work and a reference to it is used by a concurrent goroutine, a new child must be created instead, with
a context message.
This prevents panics and possible data races.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8114
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
This is done via 'options(concurrency=N)' prefix for the query.
For example, the following query is executed on at most 4 CPU cores:
options(concurrency=4) _time:1d | count_uniq(user_id)
This allows reducing RAM and CPU usage at the cost of longer query execution times,
since by default every query is executed in parallel on all the available CPU cores.
See https://docs.victoriametrics.com/victorialogs/logsql/#query-options
Also always initialize Query.timestamp with the timestamp from the lexer.
This should avoid potential problems with relative timestamps inside inner queries.
For example, the `_time:1h` filter in the following query is correctly executed
relative to the current timestamp:
foo:in(_time:1h | keep foo)
Despite the requirement in the OpenTelemetry spec that histograms should contain
a sum, the [OpenTelemetry collector promremotewrite
translator](37c8044abf/pkg/translator/prometheusremotewrite/helper.go (L222))
and [Prometheus OpenTelemetry
parsing](d52e689a20/storage/remote/otlptranslator/prometheusremotewrite/helper.go (L264))
skip only the sum if it's absent. Our current implementation drops the buckets
if the sum is absent, which causes issues for users who expect behaviour
similar to Prometheus.
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Optimize for
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6226
Users who set a `*AuthKey` flag will now receive a new response in the
body:
```go
// query arg not set
The provided authKey '' doesn't match -search.resetCacheAuthKey
// incorrect query arg
The provided authKey '5dxd71hsz==' doesn't match -search.resetCacheAuthKey
```
Previously, they received:
```
The provided authKey doesn't match -search.resetCacheAuthKey
```
Commit eef6943084 added new test
functions that check various cases for metricName registration at
data ingestion.
The initial dataset size was 4 batches with 100 rows each. It works fine on
machines with 5GB+ memory,
but the i386 architecture supports only 4GB of memory per process.
Due to this limitation, the dataset is reduced to 3 batches with
30 rows each. This keeps the same
test functionality, but reduces overall memory usage to ~3GB.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Samples parsing is a hot path. A bad client could easily overwhelm the
receiver with bad or unsupported data, so it is better to throttle such
messages.
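A minimal sketch of message throttling (the actual code uses the project's logger helpers; the interval and message are illustrative):

```go
package main

import (
	"log"
	"sync/atomic"
	"time"
)

var lastLogNano atomic.Int64

// warnThrottled emits at most one message per interval, so a client
// sending a stream of bad samples can't flood the logs.
func warnThrottled(interval time.Duration, msg string) {
	now := time.Now().UnixNano()
	prev := lastLogNano.Load()
	if now-prev >= int64(interval) && lastLogNano.CompareAndSwap(prev, now) {
		log.Printf("WARN: %s", msg)
	}
}

func main() {
	for i := 0; i < 1000; i++ {
		warnThrottled(5*time.Second, "cannot parse sample")
	}
	// Only the first message within each 5s window is logged.
}
```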
Follow-up after
b26a68641c
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Process values in batches instead of passing every value in the callback.
This improves performance of reading the encoded values from storage by up to 50%.
The new version has additional checks and reduced resource consumption, so
it doesn't time out for our internal repos.
To make linter happy, I addressed "redefinition of the built-in
function" lint error.
----
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- url encoding / decoding with <urlencode:field> and <urldecode:field>
- base64 encoding / decoding with <base64encode:field> and <base64decode:field>
- hex encoding / decoding with <hexencode:field> and <hexdecode:field>
- hex encoding for integers with <hexnumencode:field> and <hexnumdecode:field>
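For example, the following query (a usage sketch based on the `format` pipe syntax; the `user_agent` and `ua` field names are illustrative) stores the URL-encoded value of the `user_agent` field in the `ua` field:
_time:5m | format "<urlencode:user_agent>" as ua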
This allows reducing the state of every statsProcessor by removing the pointer to the corresponding statsFunc.
For example, this reduces statsCountProcessor size by 2x.
Previously finalizeStats() for some functions such as count_uniq() could run for long periods
of time after the query was canceled, since stopCh wasn't propagated to finalizeStats().
Previously integer values were converted to strings before being passed to the `updateState()` function at `count_uniq`
and `count_uniq_hash`. Later such values were converted back to integers in order to track them via the integer map of unique values.
This commit avoids the int -> string -> int conversion. Instead, it passes integers directly to the integer map of unique values.
This improves performance of `count_uniq` and `count_uniq_hash` functions even further.
This filter can be used when debugging and exploring logs in order to understand better
which value types are used for storing the particular log fields.
The `value_type` filter complements the `block_stats` pipe.
- Pass the calculated results to the next pipe in float64 columns.
Previously the results were converted to string columns. This could slow down further calculations.
- Use custom optimized logic for processing numeric columns, which are passed to math pipe.
Previously all the input columns were converted to string and then converted to float64
before math pipe calculations.
- Initialize the newly added columns at blockResult as soon as they are added.
This improves performance when a big number of columns is calculated by the math pipe.
Previously integer values were tracked in string maps. Now every input value is parsed as an integer.
On success, the parsed integer is tracked via specialized maps, which hold only integers.
This reduces CPU usage and memory usage in the general case.
Use the column name attached to the corresponding part. The lifetime of this column name exceeds the blockSearch lifetime,
so it is safe to use it here.
This is a follow-up for 8d968acd0a
Previously columns with negative int64 values were stored either as float64 or string
depending on whether the negative int64 values are bigger or smaller than -2^53.
If the integer values are smaller than -2^53, then they are stored as string, since float64 cannot
hold such values without precision loss. Now such values are stored as int64.
This should improve compression ratio and query performance over columns with negative int64 values.
Previously field values could be automatically converted to float64 with precision loss.
This could lead to unexpected results when querying such field values.
For example, "10007199254740992" was incorrectly represented as 10007199254740993.
This commit prevents from such lossy conversions when storing field values.
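A quick Go demonstration of the lossy conversion described above (float64 has a 53-bit mantissa, so odd integers above 2^53 round to a neighboring even value):

```go
package main

import "fmt"

func main() {
	v := int64(10007199254740993)
	fmt.Println(int64(float64(v))) // 10007199254740992
}
```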
While at it, prevent int64 overflow at the tryParseBytes and tryParseDuration functions,
which are used for parsing constants in queries for byte sizes and durations.
Now these functions return 1<<63-1 (the maximum int64 value) for constants exceeding
this value. Previously they could return arbitrary garbage for such constants.
The hint allows choosing the type of cache to be used for index search:
- in-memory parts store recently ingested samples and should use the
main cache. This improves ingestion speed and the cache hit ratio for
queries accessing recently ingested samples.
- merges of file parts are performed in the background; using a separate
cache allows avoiding pollution of the main cache with irrelevant
entries.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
This commit makes the interval for checking whether the final dedup
process for the historical data should be started configurable. It allows spreading
resource utilisation of multiple vmstorage/vmsingle instances in time,
since final dedup may add additional pressure on disks and backup systems
and make the cluster less stable. Storage unconditionally adds 25% jitter to
the provided value; this should simplify configuration management in the
Kubernetes ecosystem, because Kubernetes application pods must have the
same configuration.
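A minimal sketch of the jitter idea (a random extra delay of up to 25%, so pods sharing identical configuration still start final dedup at different times):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// withJitter spreads dedup runs across replicas that share the same
// configured interval.
func withJitter(interval time.Duration) time.Duration {
	return interval + time.Duration(rand.Int63n(int64(interval/4)))
}

func main() {
	fmt.Println(withJitter(time.Hour)) // somewhere in [1h, 1h15m)
}
```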
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7880
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
fix function name in comment
Signed-off-by: cuiweiyuan <cuiweiyuan@aliyun.com>
### Describe Your Changes
Currently, if multiple msgFields are present in a log row, it's not
obvious which field is selected as the _msg field. With this PR, the order
of msgField values, defined either via headers or query arg params,
defines the priority of these values.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7761
### Describe Your Changes
- the datadog /api/v2/logs API supports a message field in JSON format, which
is not documented and is used by the serverless extension. This PR allows the
message field to be both string and object type. Also added support for the
undocumented timestamp field
- added `-datadog.streamFields` and `-datadog.ignoreFields` flags to
configure default stream fields for datadog logs, where there's no
alternative option to pass extra headers and query args
- added ingestion of `max` and `min` values of data ingested via the
`datadogsketches` API, which is also actively used by serverless
extensions
- use the default `.` separator instead of `_` for sketches metric names
unless metric names are sanitized
This should prevent excess usage of CPU, RAM and other resources when too many logs
are passed to 'stream_context' pipe.
It is expected that 'stream_context' pipe results are investigated by humans, who cannot inspect
surrounding logs for millions of initial logs. That's why it is OK to limit the number of logs
and/or log streams, which can be passed to 'stream_context' pipe.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7766
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7903
While at it, reduce memory allocations at Storage.getFieldValuesNoHits and make it more scalable on multi-CPU systems.
This improves performance of in(<query>) filter when the <query> returns big number of values.
Use chunked allocator in order to reduce memory allocations. It allocates objects from slices of up to 64Kb size.
This improves performance for `stats` and `top` pipes by up to 2x when they are applied to big number of `by (...)` groups.
Also parallelize execution of `count_uniq`, `count_uniq_hash` and `uniq_values` stats functions,
so they are executed faster on hosts with many CPU cores when applied to fields with big number
of unique values.
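A minimal sketch of the chunked allocator described above (illustrative, not the actual implementation):

```go
package main

import "fmt"

// chunkedAllocator carves small objects out of 64KiB slabs so the
// garbage collector tracks one allocation per slab instead of one per
// object.
type chunkedAllocator struct {
	buf []byte
}

func (a *chunkedAllocator) alloc(n int) []byte {
	const chunkSize = 64 * 1024
	if n > chunkSize {
		return make([]byte, n) // big objects bypass the slab
	}
	if cap(a.buf)-len(a.buf) < n {
		a.buf = make([]byte, 0, chunkSize) // start a fresh slab
	}
	start := len(a.buf)
	a.buf = a.buf[:start+n]
	return a.buf[start : start+n : start+n] // full slice expr prevents overlap on append
}

func main() {
	var a chunkedAllocator
	b := a.alloc(16)
	fmt.Println(len(b), cap(b)) // 16 16
}
```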
The commit 4599429f51 improperly set br.cs to nil,
while it should set br.bs to nil instead. This resulted in excess memory allocations
at br.csInit() and br.csInitFast().
Historically some of the VictoriaMetrics components were optimized for a low rate of memory allocations.
These are: vmagent, single-node VictoriaMetrics and vmstorage. These components benefit from a low
GOGC value, since this allows reducing their memory usage in steady state on typical workloads.
Other VictoriaMetrics components aren't optimized for a reduced rate of memory allocations.
This results in increased CPU usage spent on garbage collection (GC) in these components,
since it must be triggered at a higher rate. See https://tip.golang.org/doc/gc-guide#GOGC for details.
These components do not use much memory, so it is OK to increase GOGC for them
from 30 to 100 - this won't affect most users.
Keep GOGC at 30 only for the vmagent, single-node VictoriaMetrics and vmstorage components.
See 077193d87c and 54b9e1d3cb .
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7902
Numeric fields can be stored as const values in a block of logs. In this case the `sort` pipe
incorrectly compared such values as strings instead of numbers, which resulted in incorrect
sort results: for example, 123 was treated as smaller than 2. Fix this by removing the incorrect case
for comparing const fields.
While at it, replace lessString() with strings.LessNatural() in sortBlockLess().
This improves sorting performance a bit, since sortBlockLess() already tried
comparing numeric values, so it doesn't need to spend CPU time on such a comparison again inside the lessString() call.
The commit 42c9183281 incorrectly replaced strings.LessNatural() with lessString()
inside the sortBlockLess() function.
While at it, allow passing an array of string values per each JSON entry at extra_filters and extra_stream_filters.
For example, `extra_filters={"foo":["bar","baz"]}` is converted into the `foo:in("bar", "baz")` extra filter,
while `extra_stream_filters={"foo":["bar","baz"]}` is converted into the `{foo=~"bar|baz"}` extra filter.
This should simplify creating faceted search when multiple values per a single log field must be selected.
This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7365#issuecomment-2447964259
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5542