github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-01 14:47:38 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	e51190a34c	Revert "app/vmselect: make vmselect resilient to absence of cache folder (#5987)" This reverts commit `cb23685681`. Reason for revert: the "fix" may hide programming bugs related to incorrect creation of folders before their use. This may complicate detecting and fixing such bugs in the future. There are the following fixes for the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 : - Configure the OS not to drop data from the system-wide temporary directory (aka /tmp). - Run VictoriaMetrics with the -cacheDataPath command-line flag pointing to a directory that cannot be removed automatically by the OS. The case when the user accidentally deletes the directory with some files created by VictoriaMetrics shouldn't be considered as expected, so VictoriaMetrics shouldn't try resolving this case automatically. From an operations and debuggability point of view, it is much better to crash with a clear `directory doesn't exist` error in this case.	2024-04-03 02:44:00 +03:00
Roman Khavronenko	548bf31dd2	app/vmselect: make vmselect resilient to absence of cache folder (#5987) vmselect uses a cache folder in the file system for two purposes: 1. Storing rollup cache results on shutdown; 2. Storing temporary search results from vmstorage during query execution. The cache folder may be deleted accidentally by the user, or by the OS during cleanup routines. This would cause vmselect to: 1. panic on /metrics call, because `MustGetFreeSpace` will fail; 2. return a query error to the user, as it won't be able to store temporary search results. The changes in this commit are the following: 1. Make `MustGetFreeSpace` try re-creating the cache folder if it is missing; 2. Make vmselect try re-creating the cache folder if it can't persist tmp search results. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com> (cherry picked from commit `cb23685681`)	2024-03-26 15:27:32 +01:00
Aliaksandr Valialkin	319d21eddf	app/vmselect/netstorage: use unsafe.SliceData instead of deprecated reflect.SliceHeader	2024-02-29 17:38:14 +02:00
Aliaksandr Valialkin	63d635a5e4	app: consistently use atomic.* types instead of atomic.* functions See `ea9e2b19a5`	2024-02-24 03:06:14 +02:00
Dan Dascalescu	0c7eda7c88	app/vmselect: simplify wording for `too many samples` error (#5827) (cherry picked from commit `17cf031fa1`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-02-20 16:29:11 +01:00
Aliaksandr Valialkin	7a9f0b32a2	app/vmselect/netstorage: prevent disk write IO when closing temporary files Remove the temporary file before closing it in order to signal the OS that it shouldn't store the file contents from page cache to disk when the file is closed. Gracefully handle the case when the file cannot be removed before being closed - in this case remove the file after closing it. This allows working on Windows. Also remove superfluous opening of the temporary file for reading - re-use the already opened file handle for writing. This is a follow-up for `9b1e002287` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4020 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2024-02-01 19:54:48 +02:00
Aliaksandr Valialkin	4f5cb17042	app/vmselect/netstorage: properly handle the case when an empty brsPool points to the end of brs.brs This case is possible after a new brsPool is allocated. The fix is to verify whether len(brsPool) >= len(brs.brs) before trying to append a new item to brsPool and sharing its contents with brs.brs. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5733	2024-01-31 10:31:51 +02:00
Aliaksandr Valialkin	d8c82b6421	app/vmselect/netstorage: initialize tmpBlocksFileWrapper at goroutine, which continues using it This may improve CPU cache locality	2024-01-26 21:29:30 +01:00
Aliaksandr Valialkin	cfc1193d15	app/vmselect/netstorage: limit the maximum brsPool size to 32Kb at ProcessSearchQuery() This avoids slow path in Go runtime for allocating objects bigger than 32Kb - see `704401ffa0/src/runtime/malloc.go (L11)` This also reduces memory usage a bit for vmselect and single-node VictoriaMetrics after the commit `5dd37ad836` . Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 14:12:27 +02:00
Aliaksandr Valialkin	fe4ea30a79	app/vmselect/netstorage: limit the size of metricNamesBuf to 32Kb in order to avoid slow path at Go runtime for allocating a byte slice of bigger size See `704401ffa0/src/runtime/malloc.go (L11)` This also reduces the average memory usage a bit for vmselect and single-node VictoriaMetrics after the commit `508c608062` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 13:50:59 +02:00
Aliaksandr Valialkin	953b96ced2	app/vmselect/netstorage: remove tswPool, since it isn't efficient	2024-01-23 02:29:13 +02:00
Aliaksandr Valialkin	68a59bfabd	app/vmselect/netstorage: avoid metricName->blockRef lookup when processing multiple blocks for the same time series This saves a few CPU cycles for common case	2024-01-23 02:29:12 +02:00
Aliaksandr Valialkin	f8a9ef8cbd	app/vmselect/netstorage: group per-vmstorage fields at tmpBlocksFileWrapperShard This improves code readability a bit	2024-01-23 02:29:12 +02:00
Aliaksandr Valialkin	d52b121222	app/vmselect/netstorage: use []blockRef from blockRefPool in order to reduce memory allocations	2024-01-23 02:29:11 +02:00
Aliaksandr Valialkin	5b05224eb9	app/vmselect/netstorage: substitute pointer to blockRefs by brssPool index at the metricName->blockRefs map This should reduce the pressure on Go GC, since it will see lower number of pointers. This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 02:29:11 +02:00
Aliaksandr Valialkin	b289f15f02	app/vmselect/netstorage: reduce the number of allocations for blockRefs objects in ProcessSearchQuery() This should reduce pressure on Go GC at vmselect The change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 02:29:11 +02:00
Aliaksandr Valialkin	2ab9a75cca	app/vmselect/netstorage: reduce the number of memory allocations in ProcessSearchQuery() by storing all the metric names in a single byte slice This reduces the number of memory allocations at the cost of possible memory usage increase, since now different metric name strings may hold references to the previous byte slice. This is a good tradeoff, since ProcessSearchQuery is called in vmselect, and vmselect isn't usually limited by memory. This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 02:29:10 +02:00
Aliaksandr Valialkin	c888d76c4b	app/vmselect/netstorage: make sure that at least a single result is collected from every storage group before deciding whether it is OK to skip results from the remaining storage nodes	2023-12-20 19:53:49 +02:00
Aliaksandr Valialkin	e4bb2808f1	app/vmselect: add support for vmstorage groups with independent -replicationFactor per group Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5197 See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#vmstorage-groups-at-vmselect Thanks to @zekker6 for the initial pull request at https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/718	2023-12-13 00:14:34 +02:00
Aliaksandr Valialkin	5b7f40907e	app/vmselect/netstorage: do not retry request when deadline is exceeded	2023-11-14 19:57:29 +01:00
Roman Khavronenko	cd2247b24a	app/vmselect: limit the number of parallel workers by 32 (#5195) * app/vmselect: limit the number of parallel workers by 32 The change should improve performance and memory usage during query processing on machines with big number of CPU cores. The number of parallel workers for query processing is controlled via `-search.maxWorkersPerQuery` command-line flag. By default, the number of workers is limited by the number of available CPU cores, but not more than 32. The limit can be increased via `-search.maxWorkersPerQuery`. Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip - The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage, so move it from the `resource usage limits` to `troubleshooting` chapter at docs/Single-server-VictoriaMetrics.md - Make the description for the `-search.maxWorkersPerQuery` command-line flag clearer - Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md - Limit the maximum value, which can be passed to `-search.maxWorkersPerQuery`, to GOMAXPROCS, because bigger values may worsen query performance and increase CPU usage - Improve the description of the change at docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX, since it is closer to a feature than to a bugfix. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-10-26 09:15:27 +02:00
Aliaksandr Valialkin	9c3a37597c	app/vmselect/netstorage: run `make fmt` after `58326dbf25`	2023-09-10 15:18:15 +02:00
Aliaksandr Valialkin	58326dbf25	app/vmselect: return 503 status code when partial responses are denied and some of vmstorage nodes are temporarily unavailable This should help detecting this case and automatically retrying the query at a healthy cluster replica in another availability zone. This commit is needed as a preparation for automatic query retry at another backend at vmauth on 5xx errors as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792#issuecomment-1674338561	2023-09-07 16:07:06 +02:00
Aliaksandr Valialkin	d8afd7fe98	Makefile: update golangci-lint from v1.51.2 to v1.54.2 See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2	2023-09-01 10:25:49 +02:00
Aliaksandr Valialkin	19d61737c1	app/{vminsert,vmselect}: follow-up after `2b7b3293c1` - Document the change at docs/CHANGELOG.md - Set the default value for -vmstorageUserTimeout to 3 seconds. This is much better than the 0 value, which means that TCP connection to unreachable vmstorage could block for up to 16 minutes. - Document -vmstorageUserTimeout at docs/Cluster-VictoriaMetrics.md	2023-08-29 12:17:39 +02:00
Will Jordan	2b7b3293c1	Add `vmstorageUserTimeout` flags to configure TCP user timeout (Linux) (#4423) `TCP_USER_TIMEOUT` (since Linux 2.6.37) specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the connection and return `ETIMEDOUT` to the application. Setting a low TCP user timeout allows RPC connections to quickly reroute around unavailable storage nodes during network interruptions.	2023-08-29 11:46:39 +02:00
Aliaksandr Valialkin	992c300ce9	all: replace atomic.Value with atomic.Pointer[T] This eliminates the need for .(*T) casting of results obtained from Load() Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes a double pointer to map, because map is already a pointer type.	2023-07-19 17:48:26 -07:00
Aliaksandr Valialkin	e1a2404db5	app/vmselect/netstorage: follow-up after `173ccf4333` - Clarify docs about -replicationFactor command-line flag at vmselect - Clarify description for -replicationFactor and -search.skipSlowReplicas command-line flags - Fix the logic for returning responses if -search.skipSlowReplicas command-line flag is enabled. The logic was broken in the `173ccf4333`, so it could return responses only if some of vmstorage nodes return error, while it should return when query results are successfully collected from more than (len(storageNodes) - replicationFactor) vmstorage nodes. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711	2023-07-09 11:58:22 -07:00
Haleygo	14e242d0b9	vmselect: fix result collect count (#4599)	2023-07-08 08:21:27 +02:00
Roman Khavronenko	173ccf4333	vmselect: introduce `search.skipSlowReplicas` cmd-line flag (#4538) * vmselect: introduce `search.skipSlowReplicas` cmd-line flag vmselect has two logical conditions during request processing when `-replicationFactor` cmd-line flag is set: 1. If at least `len(storageNodes) - replicationFactor` responded, it could skip waiting for the rest of nodes to respond. This could lead to problems described here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207. 2. Mark response as partial if less than `len(storageNodes) - replicationFactor` responded without an error. P1 proved error-prone and became the main reason why `-replicationFactor` wasn't recommended for use at vmselect level. However, this optimization could still be very useful in situations when there are slow and fast replicas in the cluster. P2, in contrast, remains viable and important unconditionally. Hiding P1 behind the feature flag `search.skipSlowReplicas` should make the `-replicationFactor` flag usable again and let users choose whether they want P1 to be respected. Related issues https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711 Signed-off-by: hagen1778 <roman@victoriametrics.com> * docs: update changelog Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-07 11:50:26 +02:00
Aliaksandr Valialkin	eb47ad4b69	app/vmselect/netstorage: remove runtime.Gosched() call from unpackWorker() This should improve scalability of unpackWorker() on systems with many CPU cores. This is a follow-up for `a2ecf4fa4a` and `16f3b279a2` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-07-06 10:07:42 -07:00
Aliaksandr Valialkin	ec75d9097d	app/vmselect/netstorage: follow-up after `11ac551d52` - Clarify the scope of the fix at docs/CHANGELOG.md - Handle the case when -search.maxSamplesPerSeries limit is exceeded in the same way as the -search.maxSamplesPerQuery limit. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4472	2023-07-05 21:13:34 -07:00
Aliaksandr Valialkin	643e99a157	app/vmselect/netstorage: improve code readability a bit after `6c84b61893` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4364	2023-07-05 20:48:38 -07:00
Roman Khavronenko	11ac551d52	app/vmselect/netstorage: properly process `-search.maxSamplesPerQuery` limit (#4472) Properly return the error to the user when the `-search.maxSamplesPerQuery` limit is exceeded. Previously, the user could have received a partial response instead. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-23 13:17:34 +02:00
Haleygo	6c84b61893	vmselect: fix sn init taking too much time (#4366) * vmselect: decrease start time for vmselect https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4364	2023-05-30 13:04:31 +02:00
Aliaksandr Valialkin	aac3dccfd1	lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist Callers of these functions log the returned error and then exit. The returned error already contains the path to the directory, which failed to be created. So let's just log the error together with the call stack inside these functions. This leaves the debuggability of the returned error at the same level while allowing the code at the callers' side to be simplified. While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk(). It is expected that inmemoryPart.MustStoreToDisk() must fail if there is already a directory under the given path.	2023-04-13 22:22:08 -07:00
Nikolay	b38a145cfd	app/vmselect: properly remove temp files on Windows systems (#4020) On non-POSIX-compliant systems it's not possible to remove unclosed files. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-27 18:10:44 -07:00
Aliaksandr Valialkin	db3bcbe56a	app/vmselect/netstorage: reduce the contention at fs.ReaderAt stats collection on systems with big number of CPU cores This optimization is based on the profile provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966#issuecomment-1483208419	2023-03-25 16:38:39 -07:00
Aliaksandr Valialkin	a2ecf4fa4a	app/vmselect/netstorage: document why runtime.Gosched() is removed at `28f054bb00` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-03-25 16:38:28 -07:00
Zakhar Bessarab	16f3b279a2	vmselect/netstorage: remove direct calls to `Gosched` to reduce amount of locks for global scope using `runtime.Gosched` requires acquiring global lock to check if there are any other goroutines to perform tasks. with the latest versions of runtime it can pause running goroutines automatically without requiring to call `Gosched` directly. Updates #3966 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-03-25 16:37:58 -07:00
Aliaksandr Valialkin	08da383eac	app/vmselect/netstorage: reduce the number of calls to runtime.Gosched() at timeseriesWorker() and unpackWorker() Call runtime.Gosched() only when there is work to steal from other workers. Simplify the timeseriesWorker() and unpackWorker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork(). This should reduce CPU usage when processing queries on systems with big number of CPU cores. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-03-20 20:32:56 -07:00
Aliaksandr Valialkin	18af01c387	app/vmselect: optimize incremental aggregates a bit Substitute sync.Map with an ordinary slice indexed by workerID. This should reduce the overhead when updating the incremental aggregate state	2023-03-20 15:42:13 -07:00
Aliaksandr Valialkin	e491fee1f4	app/vmselect/netstorage: do not intern string representation of MetricName for time series received from vmstorage It turned out that this interning may lead to increased memory usage and increased CPU usage when vmselect performs queries, which select big number of time series. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863	2023-03-12 00:44:08 -08:00
Oleksandr Redko	0e1c395609	app,lib: fix typos in comments (#3804)	2023-02-13 09:32:35 -08:00
Zakhar Bessarab	626bd22157	fix: vmselect multi-level setup panic (#3738) * app/vmselect/netstorage: fix panic for multi-level cluster setup when `replicationFactor` was set and request contained `trace` parameter (#3734) Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmselect/netstorage: use correct context for retry Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-02-01 08:56:36 -08:00
Aliaksandr Valialkin	26f6cfd3b2	app/vmselect/netstorage: tune the number of blocks per series which should be unpacked by a single goroutine instead of spinning up multiple goroutines This reduces overhead on time series data unpacking for typical cases, thus reducing CPU usage at vmselect Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641	2023-01-12 09:35:15 -08:00
Aliaksandr Valialkin	41b0b951f3	app/vmselect/netstorage: unpack series blocks in the current goroutine if their count doesn't exceed 100 This should improve performance a bit for common case	2023-01-12 01:31:38 -08:00
Aliaksandr Valialkin	98931449c1	app/vmselect/netstorage: reduce tail latency during query processing Previously the selected time series were split evenly among available CPU cores for further processing - e.g. unpacking the data and applying the given rollup function to the unpacked data. Some time series could be processed slower than others. This could result in uneven work distribution among available CPU cores, e.g. some CPU cores could complete their work sooner than others. This could slow down query execution. The new algorithm allows stealing time series to process from other CPU cores when all the local work is done. This should reduce the maximum time needed for query execution (aka tail latency). The new algorithm should also scale better on systems with many CPU cores, since every CPU processes locally assigned time series without inter-CPU communications. The inter-CPU communications are used only when all the local work is finished and the pending work from other CPUs needs to be stolen.	2023-01-10 13:42:26 -08:00
Aliaksandr Valialkin	158a280822	app/vmselect/netstorage: reduce memory allocations when unpacking time series Unpack time series with less than 4M samples in the currently running goroutine. Previously a new goroutine was being started for unpacking the samples. This was requiring additional memory allocations.	2023-01-09 23:17:34 -08:00
Aliaksandr Valialkin	abbac2c27c	app/vmselect/netstorage: pre-allocate 4 block references per each time series during querying Usually the number of blocks returned per each time series during queries is around 4. So it is a good idea to pre-allocate 4 block references per time series in order to reduce the number of memory allocations.	2023-01-09 22:08:30 -08:00
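
The temporary-file handling described in commit `7a9f0b32a2` (remove the file before closing it, fall back to removing it after the close, and re-use the write handle for reading) can be illustrated with a minimal Go sketch. This is not the VictoriaMetrics implementation: the function name `writeAndReadTemp` and the `tmp-blocks-*` file name pattern are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// writeAndReadTemp is a hypothetical helper. It writes data to a temporary
// file, reads it back through the same handle, and unlinks the file before
// closing it. If the unlink fails while the file is still open (as the commit
// message says can happen on Windows), the file is removed after the close.
func writeAndReadTemp(dir string, data []byte) ([]byte, error) {
	// os.CreateTemp opens the file for both reading and writing.
	f, err := os.CreateTemp(dir, "tmp-blocks-*")
	if err != nil {
		return nil, err
	}
	path := f.Name()

	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(path)
		return nil, err
	}

	// Re-use the handle opened for writing instead of re-opening the file.
	buf := make([]byte, len(data))
	if _, err := f.ReadAt(buf, 0); err != nil {
		f.Close()
		os.Remove(path)
		return nil, err
	}

	// Try to remove the file while it is still open.
	removeErr := os.Remove(path)
	closeErr := f.Close()
	if removeErr != nil {
		// Fallback: remove the file after it has been closed.
		if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
			return nil, err
		}
	}
	if closeErr != nil {
		return nil, closeErr
	}
	return buf, nil
}

func main() {
	got, err := writeAndReadTemp("", []byte("hello"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(got))
}
```

Passing an empty `dir` makes `os.CreateTemp` use the default temporary directory. In either branch the file no longer exists once the function returns without an error.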


213 commits
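
The early-return condition stated in commit `e1a2404db5` (return once results are successfully collected from more than `len(storageNodes) - replicationFactor` vmstorage nodes) is plain arithmetic and can be sketched as follows. The function name `canSkipRemainingNodes` and its parameters are hypothetical and are not taken from the VictoriaMetrics code base.

```go
package main

import "fmt"

// canSkipRemainingNodes is a hypothetical helper. It reports whether enough
// vmstorage nodes have answered successfully so that the remaining (slow)
// nodes may be skipped, using the condition quoted in the commit message:
// successes > totalNodes - replicationFactor.
func canSkipRemainingNodes(successes, totalNodes, replicationFactor int) bool {
	return successes > totalNodes-replicationFactor
}

func main() {
	// With 5 storage nodes and a replication factor of 2, the threshold is 3,
	// so at least 4 successful responses are needed.
	for successes := 3; successes <= 4; successes++ {
		fmt.Println(successes, canSkipRemainingNodes(successes, 5, 2))
	}
}
```

With 5 nodes and a replication factor of 2, three successful responses are not enough to skip the remaining nodes, while four are.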