github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	7094fa38bc	lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping Previously all the newly ingested time series were registered in global `MetricName -> TSID` index. This index was used during data ingestion for locating the TSID (internal series id) for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names). The `MetricName -> TSID` index is stored on disk in order to make sure that the data isn't lost on VictoriaMetrics restart or unclean shutdown. The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache, and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics uses in-memory cache for speeding up the lookup for active time series. This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk. VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases: - If `storage/tsid` cache capacity isn't enough for active time series. Then just increase available memory for VictoriaMetrics or reduce the number of active time series ingested into VictoriaMetrics. - If new time series is ingested into VictoriaMetrics. In this case it cannot find the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index, since it doesn't know that the index lacks the corresponding entry too. This is a typical event under high churn rate, when old time series are constantly substituted with new time series.
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index, are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics. Prior to this commit the `MetricName -> TSID` index was global, i.e. it contained entries sorted by `MetricName` for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod. This index can become very large under high churn rate and long retention. VictoriaMetrics caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups. The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series. This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly reducing the amount of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics consults only the index for the current day when new time series is ingested into it. The downside of this change is increased indexdb size on disk for workloads without high churn rate, e.g. with static time series, which do not change over time, since now VictoriaMetrics needs to store identical `MetricName -> TSID` entries for static time series for every day. This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation, since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .
At the same time the change fixes the issue, which could result in lost access to time series, which stop receiving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685 This is a follow-up for `1f28b46ae9`	2023-07-13 16:07:30 -07:00
Aliaksandr Valialkin	a360fd5f71	app/{vmselect,vlselect}: run `make vmui-update vmui-logs-update`	2023-07-09 12:43:48 -07:00
Haleygo	20e7db47ee	vmselect: fix result in Prometheus query when time is small (#4578 ) vmselect: fix result in Prometheus query when time is small Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-07-07 11:48:05 +02:00
Aliaksandr Valialkin	7f3b5431a1	app/vmselect/graphite: follow-up after `c7884f8686` - Consistently use -search.maxGraphiteTagValues for limiting tag values from auto-complete API - Use -search.maxGraphiteSeries for limiting paths (aka series), which can be returned from Graphite series API - Clarify the change in docs/CHANGELOG.md Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4339 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2841	2023-07-06 15:21:56 -07:00
Aliaksandr Valialkin	45e345806c	app/vmselect/netstorage: remove runtime.Gosched() call from unpackWorker() This should improve scalability of unpackWorker() on systems with many CPU cores. This is a follow-up for `a2ecf4fa4a` and `16f3b279a2` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-07-06 10:05:58 -07:00
Aliaksandr Valialkin	8be52ef217	app/vlselect: handle vmui at /select/vmui path instead of /vmui This simplifies routing at auth proxies such as vmauth to vlselect component, which serves VMUI - just route all the requests, which start with /select/, to vlselect.	2023-06-21 19:52:50 -07:00
Aliaksandr Valialkin	78eaa056c0	app/vmselect: move common http functionality from app/vmselect/searchutils to lib/httputils While at it, move app/vmselect/bufferedwriter to lib/bufferedwriter, since it is going to be used in VictoriaLogs	2023-06-19 22:34:20 -07:00
Dmytro Kozlov	c7884f8686	app/{graphite,netstorage,prometheus}: fix graphite search tags api limits, remove redundant limit from SeriesHandler handler (#4352) * app/{graphite,netstorage,prometheus}: fix graphite search tags api limits, remove unused limit from SeriesHandler handler * app/{graphite,netstorage,prometheus}: use search.maxTagValues for Graphite * app/{graphite,netstorage,prometheus}: update CHANGELOG.md * app/{graphite,netstorage,prometheus}: use own flags for Graphite API * app/{graphite,netstorage,prometheus}: cleanup * app/{graphite,netstorage,prometheus}: cleanup * app/{graphite,netstorage,prometheus}: update docs --------- Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-06-02 14:34:04 +02:00
Nikolay	228ea03bda	app/vmselect/graphite: fixes tests for arm (#4348) On ARM-based CPUs only 9 digits after the decimal point match in tests, especially for holtWinters functions. Since this affects only tests, it makes no sense to change float precision in the actual functions	2023-05-26 09:34:15 +02:00
Aliaksandr Valialkin	2b53ff774b	app/vmselect: log locations of sendPrometheusError() calls Previously the location inside the sendPrometheusError() was logged. This could make it hard to investigate error locations via `vm_log_messages_total` metric.	2023-05-18 20:39:53 -07:00
Aliaksandr Valialkin	d9b3a92348	app/vmselect/vmui: run `make vmui-update` after `39c1b0f8d1`	2023-05-18 12:15:12 -07:00
Alexander Marshalov	2e494e2375	fixed typos in documentation and commandline flags descriptions (#4275 )	2023-05-10 09:50:41 +02:00
Aliaksandr Valialkin	ec3943d14a	app/vmselect: small cleanup after `4f3f9950d0` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3807	2023-05-08 14:57:11 -07:00
Aliaksandr Valialkin	1db9b78b88	app/vmselect: small cleanup after `68e31a6000` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3811	2023-05-08 14:34:37 -07:00
Aliaksandr Valialkin	80946f06c2	app/{vmselect,vmctl}: move ParseTime() to lib/promutils Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4091 This is a follow-up for `e2053baf32`	2023-05-08 14:17:57 -07:00
Roman Khavronenko	baf456978d	vmselect: exit early from queue on context cancel (#4223) * vmselect: exit early from queue on context cancel When `-search.maxConcurrentRequests` is reached, vmselect puts request in the queue. It is expected that requests in the queue will be processed as soon as there is enough capacity to do so. However, it could happen that while the request was waiting its turn, the client has already canceled it (closed the connection, or just closed the tab with UI). In this case, we should de-queue such requests to avoid spending extra resources on them. Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmselect: address review comments Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-05-03 10:42:17 +02:00
Yury Molodov	4f3f9950d0	vmui: add metric relabel debug (#3889 ) * feat: add metric relabel debug (#3807) * fix: add link to relabeling cookbook * lib/promrelabel: merge, fix conflicts * lib/promrelabel: fix diff * docs/vmui: add metric relabel playground --------- Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>	2023-04-26 11:53:29 +03:00
Yury Molodov	68e31a6000	vmui: Integrate WITH template playground (#3831 ) * feat: add WithTemplate page * app/vmselect/prometheus: enable json mode for expand with expr API * app/vmselect/prometheus: enable CORS and add content type * feat: add api for expand with templates * fix: remove console from useExpandWithExprs * app/vmselect/prometheus: fix escaping * vmui: integrate WITH template * app/vmctl: check content type instead of form param * fix: add content-type for fetch with-exprs * fix: add a header to the server's response that allows the "Content-Type" header * app/vmctl: added comment and cleanup * app/vmctl: use format query param --------- Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>	2023-04-25 11:40:01 +03:00
Aliaksandr Valialkin	3727251910	lib/fs: add MustReadDir() function Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity. The fs.MustReadDir() logs the error with the directory name and the call stack on error before exit. This information should be enough for debugging the cause of the error.	2023-04-14 22:10:46 -07:00
Aliaksandr Valialkin	30425ca81a	lib/fs: rename WriteFileAtomically to MustWriteAtomic Callers of this function log the returned error and exit. So let's just log the error with the given filepath and the call stack inside the function itself and then exit. This simplifies the code at callers' place while leaving the same level of debuggability in case of errors.	2023-04-13 22:41:15 -07:00
Aliaksandr Valialkin	036a7b7365	lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist Callers of these functions log the returned error and then exit. The returned error already contains the path to directory, which failed to be created. So let's just log the error together with the call stack inside these functions. This leaves the debuggability of the returned error at the same level while allowing simplifying the code at callers' side. While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk(). It is expected that the inmemoryPart.MustStoreToDisk() must fail if there is already a directory under the given path.	2023-04-13 22:11:59 -07:00
Aliaksandr Valialkin	a3eebf118e	app/vmselect/vmui: run `make vmui-update` after `01fc228fb0`	2023-04-06 15:07:41 -07:00
Aliaksandr Valialkin	4770377fb3	app/vmselect/vmui: run `make vmui-update` after `a1601929ec`	2023-04-06 03:20:13 -07:00
Yury Molodov	74eea53dee	vmui: implement heatmap improvements (#4078 ) * fix: disabled limits for histogram * fix: add sorted buckets by upper bound * refactor: move line chart components to folder * feat: implement heatmap improvements (https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3384#issuecomment-1484023162) * app/vmselect/vmui: `make vmui-update` --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-04-05 22:13:57 -07:00
Aliaksandr Valialkin	de0fe02f6e	app/vmselect/vmui: run `make vmui-update` after `edb45d7fc1`	2023-04-02 21:21:51 -07:00
Aliaksandr Valialkin	06b721dd07	app/vmselect/vmui: run `make vmui-update` after `42087518ba`	2023-04-01 00:40:49 -07:00
Aliaksandr Valialkin	ffdf430be0	app/vmselect/graphite: open source Graphite Render API	2023-03-31 23:25:04 -07:00
Nikolay	9b1e002287	app/vmselect: properly remove temp files on Windows systems (#4020) With non-posix compliant systems it's not possible to remove unclosed files. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-27 18:10:15 -07:00
Aliaksandr Valialkin	02ee4ffd4d	app/vmselect/promql: follow-up for `79e1c6a6fc` - Document the fix at docs/CHANGELOG.md - Add tests with multiple adjacent zero buckets - Simplify the fix a bit Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/296 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4021	2023-03-27 18:03:36 -07:00
Ze'ev Klapow	79e1c6a6fc	fix le buckets when adjacent vmrange is empty (#4021) There is a bug here where if you have a single bucket like: foo{vmrange="4.084e+02...4.642e+02"} 2 123 The expected output is three le encoded buckets like: foo{le="4.084e+02"} 0 123 foo{le="4.642e+02"} 2 123 foo{le="+Inf"} 2 123 This correctly encodes the start and end of the vmrange. If however, the input contains the previous bucket, and that bucket is empty then you only get the end le and +Inf out currently, i.e: foo{vmrange="7.743e+05...8.799e+05"} 5 123 foo{vmrange="6.813e+05...7.743e+05"} 0 123 results in: foo{le="8.799e+05"} 5 123 foo{le="+Inf"} 5 123 This causes issues when you go to compute a quantile because this means that the assumed lower bound of the buckets is 0 and thus we interpolate between 0->end rather than the vmrange start->end as expected.	2023-03-27 17:54:19 -07:00
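The conversion that `79e1c6a6fc` describes can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the promql implementation: `vmrangeBucket`, `leBucket` and `vmrangeToLE` are hypothetical names, bucket bounds are plain `float64` values, and empty buckets are skipped so that a non-empty bucket always gets a zero-increment `le` entry at its start unless the previous emitted `le` already equals that start.

```go
package main

import (
	"math"
	"sort"
)

// vmrangeBucket is a hypothetical representation of foo{vmrange="start...end"} count.
type vmrangeBucket struct {
	start, end float64
	count      uint64
}

// leBucket is a hypothetical representation of foo{le="..."} with a cumulative count.
type leBucket struct {
	le    float64
	count uint64
}

// vmrangeToLE converts vmrange buckets into cumulative le buckets.
func vmrangeToLE(in []vmrangeBucket) []leBucket {
	// Work on a sorted copy so that the caller's slice is left untouched.
	buckets := append([]vmrangeBucket(nil), in...)
	sort.Slice(buckets, func(i, j int) bool { return buckets[i].end < buckets[j].end })

	var out []leBucket
	var cumulative uint64
	for _, b := range buckets {
		if b.count == 0 {
			// Empty buckets add no samples; the start of the next non-empty
			// bucket is still emitted below.
			continue
		}
		// Emit the lower bound unless the previous le already marks it.
		if len(out) == 0 || out[len(out)-1].le != b.start {
			out = append(out, leBucket{le: b.start, count: cumulative})
		}
		cumulative += b.count
		out = append(out, leBucket{le: b.end, count: cumulative})
	}
	// The +Inf bucket always carries the total count.
	return append(out, leBucket{le: math.Inf(1), count: cumulative})
}
```

With the two-bucket input from the commit message, this sketch emits the lower bound `7.743e+05` with count 0 before `8.799e+05` and `+Inf`, so a quantile computation would interpolate from the vmrange start rather than from 0.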
Aliaksandr Valialkin	622000797a	app/vmselect: follow-up for `10ab086366` - Expose stats.seriesFetched at `/api/v1/query_range` responses too for the sake of consistency. - Initialize QueryStats when it is needed and pass it to EvalConfig then. This guarantees that the QueryStats is properly collected when the query contains some subqueries.	2023-03-27 15:22:00 -07:00
Roman Khavronenko	4021aa11b5	app/vmselect: export `seriesFetched` stat for /query responses (#3925 ) The change adds a new field `seriesFetched` to EvalConfig object. Since EvalConfig object can be copied inside `Exec`, `seriesFetched` is a pointer which can be updated by all copied objects. The reason for having stats is that other components, like vmalert, could benefit from this information. Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-03-27 15:18:25 -07:00
Yury Molodov	3214b1c315	vmui: heatmap (#3780 ) * fix: add stroke and font for all axes * feat: add util for generate gradient * feat: add heatmap plugin * feat: add heatmap legend * feat: add heatmap graph (#3384) * vmui: add heatmap graph (#3384) * feat: add convert Prometheus to VictoriaMetrics histogram * fix: prevent re-render graph * feat: reset step for heatmap * feat: normalize heatmap data * fix: format heatmap legend * wip * app/vmselect/vmui: run `make vmui-update` --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-03-26 00:30:02 -07:00
Aliaksandr Valialkin	5832242b44	app/vmselect/netstorage: reduce the contention at fs.ReaderAt stats collection on systems with big number of CPU cores This optimization is based on the profile provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966#issuecomment-1483208419	2023-03-25 16:37:07 -07:00
Aliaksandr Valialkin	a1e496ced6	app/vmselect/netstorage: document why runtime.Gosched() is removed at `28f054bb00` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-03-25 16:36:51 -07:00
Zakhar Bessarab	28f054bb00	vmselect/netstorage: remove direct calls to `Gosched` to reduce amount of locks for global scope using `runtime.Gosched` requires acquiring global lock to check if there are any other goroutines to perform tasks. with the latest versions of runtime it can pause running goroutines automatically without requiring to call `Gosched` directly. Updates #3966 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-03-25 16:34:03 -07:00
Aliaksandr Valialkin	2b851e69d2	app/vmselect/promql: typo fix after `e7f46a0aab`	2023-03-24 23:46:30 -07:00
Aliaksandr Valialkin	e7f46a0aab	app/vmselect/promql: follow-up for `7205c79c5a` - Allocate and initialize seriesByWorkerID slice in a single go instead of initializing every item in the list separately. This should reduce CPU usage a bit. - Properly set anti-false sharing padding at timeseriesWithPadding structure - Document the change at docs/CHANGELOG.md Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-03-24 23:34:37 -07:00
Zakhar Bessarab	7205c79c5a	app/vmselect/promql: use lock-less approach to gather results of parallel processing for `evalRollup` funcs (#4004 ) vmselect/promql: refactor `evalRollupNoIncrementalAggregate` to use lock-less approach for parallel workers computation Locking there is causing issues when running on highly multi-core system as it introduces lock contention during results merge. New implementation uses lock less approach to store results per workerID and merges final result in the end, this is expected to significantly reduce lock contention and CPU usage for systems with high number of cores. Related: #3966 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * vmselect/promql: add pooling for `timeseriesWithPadding` to reduce allocations Related: #3966 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * vmselect/promql: refactor `evalRollupFuncWithSubquery` to avoid using locks Uses same approach as `evalRollupNoIncrementalAggregate` to remove locking between workers and reduce lock contention. Related: #3966 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-03-24 23:07:12 -07:00
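The lock-less result gathering described in `7205c79c5a` can be sketched roughly as below. This is an illustrative Go example and not the `evalRollupNoIncrementalAggregate` code: `paddedResults` and `squareAll` are hypothetical names, the per-item work is a stand-in (squaring an integer), and the padding size is an assumption meant to keep neighboring workers' slots on different cache lines.

```go
package main

import "sync"

// paddedResults holds the output of a single worker. The padding size is an
// assumption; it is meant to reduce false sharing between adjacent slots.
type paddedResults struct {
	values []int
	_      [128 - 24]byte
}

// squareAll processes inputs with the given number of workers (must be >= 1).
// Each worker appends only to its own slot indexed by workerID, so no mutex is
// needed while workers run. The slots are merged after all workers finish.
// The order of the returned values is not deterministic.
func squareAll(inputs []int, workers int) []int {
	perWorker := make([]paddedResults, workers)
	work := make(chan int)

	var wg sync.WaitGroup
	for workerID := 0; workerID < workers; workerID++ {
		wg.Add(1)
		go func(workerID int) {
			defer wg.Done()
			r := &perWorker[workerID]
			for v := range work {
				r.values = append(r.values, v*v)
			}
		}(workerID)
	}
	for _, v := range inputs {
		work <- v
	}
	close(work)
	wg.Wait()

	// Merge per-worker results in a single goroutine.
	var merged []int
	for i := range perWorker {
		merged = append(merged, perWorker[i].values...)
	}
	return merged
}
```

The trade-off in this sketch is some extra memory for the per-worker slots in exchange for removing the shared lock from the hot path.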
Aliaksandr Valialkin	ebc1caa5dc	app/vmselect/vmui: run `make vmui-update` after `dc2c712a29`	2023-03-24 18:01:39 -07:00
Aliaksandr Valialkin	e480b9881e	app/vmselect/promql: pass workerID to the callback inside doParallel() This opens the possibility to remove tssLock from evalRollupFuncWithSubquery() in the follow-up commit from @zekker6 in order to speed up the code for systems with many CPU cores. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-03-20 20:54:57 -07:00
Aliaksandr Valialkin	9e16329b2f	app/vmselect/promql: fix TestIncrementalAggr test on systems with less than 3 CPU cores This is a follow-up for `4856a4cf5a`	2023-03-20 20:37:18 -07:00
Aliaksandr Valialkin	70959d5dab	app/vmselect/netstorage: reduce the number of calls to runtime.Gosched() at timeseriesWorker() and unpackWorker() Call runtime.Gosched() only when there is work to steal from other workers. Simplify the timeseriesWorker() and unpackWorker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork(). This should reduce CPU usage when processing queries on systems with big number of CPU cores. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-03-20 20:31:02 -07:00
Aliaksandr Valialkin	4856a4cf5a	app/vmselect: optimize incremental aggregates a bit Substitute sync.Map with an ordinary slice indexed by workerID. This should reduce the overhead when updating the incremental aggregate state	2023-03-20 15:37:06 -07:00
Aliaksandr Valialkin	8622dee4b5	app/vmselect/vmui: `make vmui-update` after `d4525bd2d0`	2023-03-20 14:35:03 -07:00
oliverpool	fbefc940ef	app/vmselect/promql: add test to ensure 8-byte alignment (#3948 ) See `0af9e2b693`	2023-03-16 09:01:42 -07:00
Aliaksandr Valialkin	e8225d7d6b	app/vmselect/promql: prevent from `cannot unmarshal timeseries from rollupResultCache` panic after the upgrade to v1.89.0 The issue has been introduced in `0af9e2b693`	2023-03-12 19:09:39 -07:00
Aliaksandr Valialkin	1428aa2c22	app/vmselect/vmui: `make vmui-update` after `00a0816ab1`	2023-03-12 17:19:19 -07:00
Aliaksandr Valialkin	0af9e2b693	app/vmselect/promql: prevent from SIGBUS crash on architectures, which deny unaligned access to 8-byte words (e.g. ARM) Thanks to @oliverpool for nailing down the root cause of the issue and for the initial attempt to fix it at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3927	2023-03-12 16:32:08 -07:00
Aliaksandr Valialkin	b5db69fe05	app/vmselect/netstorage: do not intern string representation of MetricName for time series received from vmstorage It turned out that this interning may lead to increased memory usage and increased CPU usage when vmselect performs queries, which select big number of time series. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863	2023-03-12 00:52:35 -08:00

928 commits