github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	9c3a37597c	app/vmselect/netstorage: run `make fmt` after `58326dbf25`	2023-09-10 15:18:15 +02:00
Aliaksandr Valialkin	58326dbf25	app/vmselect: return 503 status code when partial responses are denied and some of vmstorage nodes are temporarily unavailable. This should help detect this case and automatically retry the query at a healthy cluster replica in another availability zone. This commit is needed as a preparation for automatic query retries at another backend in vmauth on 5xx errors, as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792#issuecomment-1674338561	2023-09-07 16:07:06 +02:00
Aliaksandr Valialkin	d8afd7fe98	Makefile: update golangci-lint from v1.51.2 to v1.54.2. See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2	2023-09-01 10:25:49 +02:00
Aliaksandr Valialkin	19d61737c1	app/{vminsert,vmselect}: follow-up after `2b7b3293c1`: - Document the change at docs/CHANGELOG.md - Set the default value for -vmstorageUserTimeout to 3 seconds. This is much better than the 0 value, which means that a TCP connection to an unreachable vmstorage could block for up to 16 minutes. - Document -vmstorageUserTimeout at docs/Cluster-VictoriaMetrics.md	2023-08-29 12:17:39 +02:00
Will Jordan	2b7b3293c1	Add `vmstorageUserTimeout` flags to configure TCP user timeout (Linux) (#4423). `TCP_USER_TIMEOUT` (since Linux 2.6.37) specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the connection and return `ETIMEDOUT` to the application. Setting a low TCP user timeout allows RPC connections to quickly reroute around unavailable storage nodes during network interruptions.	2023-08-29 11:46:39 +02:00
Aliaksandr Valialkin	992c300ce9	all: replace atomic.Value with atomic.Pointer[T]. This eliminates the need for .(*T) casting of results obtained from Load(). Leave atomic.Value for maps, since atomic.Pointer[map[...]...] makes a double pointer to a map, because a map is already a pointer type.	2023-07-19 17:48:26 -07:00
Aliaksandr Valialkin	e1a2404db5	app/vmselect/netstorage: follow-up after `173ccf4333`: - Clarify docs about -replicationFactor command-line flag at vmselect - Clarify description for -replicationFactor and -search.skipSlowReplicas command-line flags - Fix the logic for returning responses if -search.skipSlowReplicas command-line flag is enabled. The logic was broken in `173ccf4333`, so it could return responses only if some of the vmstorage nodes returned an error, while it should return when query results are successfully collected from more than (len(storageNodes) - replicationFactor) vmstorage nodes. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711	2023-07-09 11:58:22 -07:00
Haleygo	14e242d0b9	vmselect: fix result collect count (#4599)	2023-07-08 08:21:27 +02:00
Roman Khavronenko	173ccf4333	vmselect: introduce `search.skipSlowReplicas` cmd-line flag (#4538) * vmselect: introduce `search.skipSlowReplicas` cmd-line flag. vmselect has two logical conditions during request processing when the `-replicationFactor` cmd-line flag is set: 1. If at least `len(storageNodes) - replicationFactor` nodes responded, it could skip waiting for the rest of the nodes to respond. This could lead to problems described here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207. 2. Mark the response as partial if fewer than `len(storageNodes) - replicationFactor` nodes responded without an error. P1 proved to be error-prone and became the main reason why `-replicationFactor` wasn't recommended at the vmselect level. However, this optimization could still be very useful in situations when there are slow and fast replicas in the cluster. P2 remains viable and important unconditionally. Hiding P1 behind the `search.skipSlowReplicas` feature flag should make the `-replicationFactor` flag usable again and let users choose whether they want P1 to be respected. Related issues https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711 Signed-off-by: hagen1778 <roman@victoriametrics.com> * docs: update changelog Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-07 11:50:26 +02:00
Aliaksandr Valialkin	eb47ad4b69	app/vmselect/netstorage: remove runtime.Gosched() call from unpackWorker(). This should improve scalability of unpackWorker() on systems with many CPU cores. This is a follow-up for `a2ecf4fa4a` and `16f3b279a2`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-07-06 10:07:42 -07:00
Aliaksandr Valialkin	ec75d9097d	app/vmselect/netstorage: follow-up after `11ac551d52`: - Clarify the scope of the fix at docs/CHANGELOG.md - Handle the case when -search.maxSamplesPerSeries limit is exceeded in the same way as the -search.maxSamplesPerQuery limit. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4472	2023-07-05 21:13:34 -07:00
Aliaksandr Valialkin	643e99a157	app/vmselect/netstorage: improve code readability a bit after `6c84b61893`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4364	2023-07-05 20:48:38 -07:00
Roman Khavronenko	11ac551d52	app/vmselect/netstorage: properly process `-search.maxSamplesPerQuery` limit (#4472). Properly return the error to the user when the `-search.maxSamplesPerQuery` limit is exceeded. Previously, the user could have received a partial response instead. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-23 13:17:34 +02:00
Haleygo	6c84b61893	vmselect: fix init sn taking too much time (#4366) * vmselect: decrease start time for vmselect https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4364	2023-05-30 13:04:31 +02:00
Aliaksandr Valialkin	aac3dccfd1	lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist. Callers of these functions log the returned error and then exit. The returned error already contains the path to the directory that failed to be created. So let's just log the error together with the call stack inside these functions. This keeps the debuggability of the returned error at the same level while allowing simpler code on the callers' side. While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk(). It is expected that inmemoryPart.MustStoreToDisk() must fail if there is already a directory under the given path.	2023-04-13 22:22:08 -07:00
Nikolay	b38a145cfd	app/vmselect: properly remove temp files on Windows systems (#4020). On non-POSIX-compliant systems it's not possible to remove unclosed files. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-27 18:10:44 -07:00
Aliaksandr Valialkin	db3bcbe56a	app/vmselect/netstorage: reduce the contention at fs.ReaderAt stats collection on systems with a big number of CPU cores. This optimization is based on the profile provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966#issuecomment-1483208419	2023-03-25 16:38:39 -07:00
Aliaksandr Valialkin	a2ecf4fa4a	app/vmselect/netstorage: document why runtime.Gosched() is removed at `28f054bb00` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-03-25 16:38:28 -07:00
Zakhar Bessarab	16f3b279a2	vmselect/netstorage: remove direct calls to `Gosched` to reduce the amount of locks for global scope. Using `runtime.Gosched` requires acquiring a global lock to check if there are any other goroutines to perform tasks. With the latest versions of the runtime, running goroutines can be paused automatically without calling `Gosched` directly. Updates #3966 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-03-25 16:37:58 -07:00
Aliaksandr Valialkin	08da383eac	app/vmselect/netstorage: reduce the number of calls to runtime.Gosched() at timeseriesWorker() and unpackWorker(). Call runtime.Gosched() only when there is work to steal from other workers. Simplify the timeseriesWorker() and unpackWorker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork(). This should reduce CPU usage when processing queries on systems with a big number of CPU cores. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966	2023-03-20 20:32:56 -07:00
Aliaksandr Valialkin	18af01c387	app/vmselect: optimize incremental aggregates a bit. Replace sync.Map with an ordinary slice indexed by workerID. This should reduce the overhead when updating the incremental aggregate state.	2023-03-20 15:42:13 -07:00
Aliaksandr Valialkin	e491fee1f4	app/vmselect/netstorage: do not intern string representation of MetricName for time series received from vmstorage. It turned out that this interning may lead to increased memory usage and increased CPU usage when vmselect performs queries that select a big number of time series. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863	2023-03-12 00:44:08 -08:00
Oleksandr Redko	0e1c395609	app,lib: fix typos in comments (#3804)	2023-02-13 09:32:35 -08:00
Zakhar Bessarab	626bd22157	fix: vmselect multi-level setup panic (#3738) * app/vmselect/netstorage: fix panic for multi-level cluster setup when `replicationFactor` was set and request contained `trace` parameter (#3734) Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmselect/netstorage: use correct context for retry Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-02-01 08:56:36 -08:00
Aliaksandr Valialkin	26f6cfd3b2	app/vmselect/netstorage: tune the number of blocks per series which should be unpacked by a single goroutine instead of spinning up multiple goroutines. This reduces the overhead of time series data unpacking in typical cases, thus reducing CPU usage at vmselect. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641	2023-01-12 09:35:15 -08:00
Aliaksandr Valialkin	41b0b951f3	app/vmselect/netstorage: unpack series blocks in the current goroutine if their count doesn't exceed 100. This should improve performance a bit for the common case.	2023-01-12 01:31:38 -08:00
Aliaksandr Valialkin	98931449c1	app/vmselect/netstorage: reduce tail latency during query processing. Previously the selected time series were split evenly among available CPU cores for further processing, e.g. unpacking the data and applying the given rollup function to the unpacked data. Some time series could be processed slower than others. This could result in uneven work distribution among available CPU cores, e.g. some CPU cores could complete their work sooner than others. This could slow down query execution. The new algorithm allows stealing time series to process from other CPU cores when all the local work is done. This should reduce the maximum time needed for query execution (aka tail latency). The new algorithm should also scale better on systems with many CPU cores, since every CPU processes locally assigned time series without inter-CPU communications. The inter-CPU communications are used only when all the local work is finished and the pending work from other CPUs needs to be stolen.	2023-01-10 13:42:26 -08:00
Aliaksandr Valialkin	158a280822	app/vmselect/netstorage: reduce memory allocations when unpacking time series. Unpack time series with fewer than 4M samples in the currently running goroutine. Previously, a new goroutine was started for unpacking the samples. This required additional memory allocations.	2023-01-09 23:17:34 -08:00
Aliaksandr Valialkin	abbac2c27c	app/vmselect/netstorage: pre-allocate 4 block references per time series during querying. Usually the number of blocks returned per time series during queries is around 4. So it is a good idea to pre-allocate 4 block references per time series in order to reduce the number of memory allocations.	2023-01-09 22:08:30 -08:00
Aliaksandr Valialkin	2483c67579	app/vmselect/netstorage: cache canonical MetricName for time series returned from the storage. This reduces memory allocations for repeated queries, which return (almost) the same set of time series.	2023-01-09 21:56:27 -08:00
Aliaksandr Valialkin	b7a4650ab0	all: use metricsql.CompileRegexp instead of regexp.Compile for compiling regexps used in graphite queries. This should speed up repeated queries, since metricsql.CompileRegexp returns regexps from the cache on subsequent calls for the same input regexp.	2023-01-09 21:45:34 -08:00
Aliaksandr Valialkin	9f02f5a05a	app/vmselect/netstorage: eliminate memory allocation for sortBlocksHeap arg when calling mergeSortBlocks()	2023-01-09 21:29:01 -08:00
Aliaksandr Valialkin	96f04c9863	app/vmselect/netstorage: consistently select the sample with the biggest value out of samples with identical timestamps. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333 This fix is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3620, but doesn't slow down the common case with merging replicated data blocks so significantly. Benchmark results: Before the change: BenchmarkMergeSortBlocks/replicationFactor-1-4 13968 85643 ns/op 956.53 MB/s 1700 B/op 1 allocs/op BenchmarkMergeSortBlocks/replicationFactor-2-4 10806 109171 ns/op 1500.77 MB/s 2191 B/op 1 allocs/op BenchmarkMergeSortBlocks/replicationFactor-3-4 8887 130623 ns/op 1881.45 MB/s 2660 B/op 1 allocs/op BenchmarkMergeSortBlocks/replicationFactor-4-4 7440 157348 ns/op 2082.52 MB/s 3174 B/op 1 allocs/op BenchmarkMergeSortBlocks/replicationFactor-5-4 6534 184473 ns/op 2220.38 MB/s 3612 B/op 1 allocs/op BenchmarkMergeSortBlocks/overlapped-blocks-bestcase-4 13419 85205 ns/op 961.44 MB/s 2213 B/op 1 allocs/op BenchmarkMergeSortBlocks/overlapped-blocks-worstcase-4 579 1894900 ns/op 43.23 MB/s 46760 B/op 1 allocs/op After the change: BenchmarkMergeSortBlocks/replicationFactor-1-4 13832 85298 ns/op 960.40 MB/s 1716 B/op 1 allocs/op BenchmarkMergeSortBlocks/replicationFactor-2-4 8833 134222 ns/op 1220.66 MB/s 2675 B/op 1 allocs/op BenchmarkMergeSortBlocks/replicationFactor-3-4 6487 184830 ns/op 1329.65 MB/s 3636 B/op 1 allocs/op BenchmarkMergeSortBlocks/replicationFactor-4-4 4977 236318 ns/op 1386.61 MB/s 4733 B/op 1 allocs/op BenchmarkMergeSortBlocks/replicationFactor-5-4 4088 296734 ns/op 1380.36 MB/s 5761 B/op 1 allocs/op BenchmarkMergeSortBlocks/overlapped-blocks-bestcase-4 14083 84067 ns/op 974.47 MB/s 2110 B/op 1 allocs/op BenchmarkMergeSortBlocks/overlapped-blocks-worstcase-4 536 2043534 ns/op 40.09 MB/s 50511 B/op 1 allocs/op	2023-01-09 12:58:18 -08:00
Roman Khavronenko	909cd04c55	lib/storage: keep sample with the biggest value on timestamp conflict (#3421). The change leaves the raw sample with the biggest value for identical timestamps in each `-dedup.minScrapeInterval` discrete interval when deduplication is enabled. ``` benchstat old.txt new.txt name old time/op new time/op delta DeduplicateSamples/minScrapeInterval=1s-10 817ns ± 2% 832ns ± 3% ~ (p=0.052 n=10+10) DeduplicateSamples/minScrapeInterval=2s-10 1.56µs ± 1% 2.12µs ± 0% +35.19% (p=0.000 n=9+7) DeduplicateSamples/minScrapeInterval=5s-10 1.32µs ± 3% 1.65µs ± 2% +25.57% (p=0.000 n=10+10) DeduplicateSamples/minScrapeInterval=10s-10 1.13µs ± 2% 1.50µs ± 1% +32.85% (p=0.000 n=10+10) name old speed new speed delta DeduplicateSamples/minScrapeInterval=1s-10 10.0GB/s ± 2% 9.9GB/s ± 3% ~ (p=0.052 n=10+10) DeduplicateSamples/minScrapeInterval=2s-10 5.24GB/s ± 1% 3.87GB/s ± 0% -26.03% (p=0.000 n=9+7) DeduplicateSamples/minScrapeInterval=5s-10 6.22GB/s ± 3% 4.96GB/s ± 2% -20.37% (p=0.000 n=10+10) DeduplicateSamples/minScrapeInterval=10s-10 7.28GB/s ± 2% 5.48GB/s ± 1% -24.74% (p=0.000 n=10+10) ``` https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333 Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-12-08 18:18:36 -08:00
Aliaksandr Valialkin	3a25a4b1de	app/{vminsert,vmselect}: speed up TestInitStopNodes()	2022-12-03 23:53:14 -08:00
Zakhar Bessarab	e407e7243a	{app/vmstorage,app/vmselect}: add API to get list of existing tenants (#3348) * {app/vmstorage,app/vmselect}: add API to get list of existing tenants * {app/vmstorage,app/vmselect}: add API to get list of existing tenants * app/vmselect: fix error message * {app/vmstorage,app/vmselect}: fix error messages * app/vmselect: change log level for error handling * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-11-25 10:32:45 -08:00
Aliaksandr Valialkin	353b137ff0	app/vmselect/netstorage: typo fix after `61736e4a1d`	2022-11-19 00:53:54 +02:00
Aliaksandr Valialkin	97eafbe4a7	app/vmselect: clarify that it isn't recommended to set -replicationFactor at vmselect nodes even if replication is enabled at vminsert nodes	2022-11-18 14:04:12 +02:00
Aliaksandr Valialkin	61736e4a1d	app/vmselect/netstorage: remove superfluous map lookup at ProcessSearchQuery. This should reduce CPU usage a bit during querying.	2022-11-18 13:49:59 +02:00
Aliaksandr Valialkin	eb784ff399	app/vmselect/netstorage: emit more useful information in query traces when some of vmstorage nodes return errors or if there is no need to wait for their responses	2022-11-18 13:01:42 +02:00
Aliaksandr Valialkin	fe8d40f12c	app/{vminsert,vmselect}: test initialization with different numbers of storage nodes. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3329	2022-11-09 11:48:39 +02:00
Aliaksandr Valialkin	976bbe3677	app/{vminsert,vmselect}: limit the access to storageNodes to getStorageNodesBucket and setStorageNodesBucket functions. This makes the code more maintainable and easier to test.	2022-10-28 11:41:55 +03:00
Aliaksandr Valialkin	4f53147ed4	app/{vminsert,vmselect}/netstorage: allow calling Init()+MustStop() in a loop. Previously, the netstorage.MustStop() call didn't free up all the resources, so a subsequent call to netstorage.Init() would panic. This allows writing tests that call netstorage.Init() + netstorage.MustStop() in a loop.	2022-10-25 14:43:05 +03:00
Aliaksandr Valialkin	43bdd96a6e	app/vmselect: improve performance scalability on multi-CPU systems for `/api/v1/export/...` endpoints	2022-10-01 22:16:07 +03:00
Aliaksandr Valialkin	f0eea5b02d	app/vmselect/netstorage: fix a typo that leads to incorrect query results in the VictoriaMetrics cluster. The typo was introduced in commit `1a254ea20c`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3067	2022-09-08 13:46:40 +03:00
Aliaksandr Valialkin	9cca3a0a1b	app/vmselect/netstorage: fix a potential panic under high load. The panic may trigger while processing data blocks received from vmstorage nodes when some of the vmstorage nodes return an error or when `-replicationFactor` is set to values higher than 2 at `vmselect`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3058	2022-09-02 21:36:15 +03:00
Aliaksandr Valialkin	08b8467e97	app/vmselect/netstorage: make golangci-lint happy by naming the unused padding field as _	2022-08-22 00:32:37 +03:00
Aliaksandr Valialkin	9ddd2699fd	all: remove the remaining bits of io/ioutil. The io/ioutil package has been deprecated since Go1.16 (see https://tip.golang.org/doc/go1.16#ioutil). VictoriaMetrics requires at least Go1.18, so it is time to remove io/ioutil from the source code. This is a follow-up for `02ca2342ab`.	2022-08-22 00:22:41 +03:00
Aliaksandr Valialkin	87e0d69bf4	app/vmselect/netstorage: fix a bug introduced in `1a254ea20c`. The bug results in a `duplicate output time series` error because the same time series is added twice into the orderedMetricNames list inside tmpBlocksFileWrapper.Finalize(). While at it, properly release all the tmpBlocksFile structs on tbf.Finalize() error. Previously, only the remaining tbf entries were released. This could result in a resource leak.	2022-08-17 14:07:51 +03:00
Aliaksandr Valialkin	1a254ea20c	app/vmselect/netstorage: remove common contention points related to inter-CPU communications. This should improve vmselect performance scalability on systems with many CPU cores. The following tasks were done: - Use separate temporary files for storing the data read from each vmstorage node. This may result in the following potential issues: - Up to N times higher memory usage for performing each query, where N is the number of vmstorage nodes known to vmselect. This issue shouldn't increase chances of out of memory errors in most cases, since per-query memory overhead is quite low compared to the overall vmselect memory usage. - Up to N times higher number of open temporary files, where N is the number of vmstorage nodes known to vmselect. This issue should be fixed by increasing the limit on the number of open files. - Use separate counters per vmstorage node for various stats calculation when reading the data from vmstorage nodes.	2022-08-11 23:22:56 +03:00
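
The work-stealing scheme described in commit `98931449c1` can be illustrated with a minimal Go sketch. It is not the vmselect code: `workQueue`, `claim` and `processAll` are hypothetical names, and a per-queue atomic cursor stands in for whatever queueing vmselect actually uses. Each worker drains its own queue first and only touches other workers' queues once its local work is done, which is the property the commit message relies on to reduce tail latency.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// workQueue is a hypothetical per-worker queue: a fixed slice of items
// plus an atomic cursor pointing at the next unclaimed item.
type workQueue struct {
	items []int
	next  atomic.Int64
}

// claim atomically reserves the next item in q, if any is left.
// Every successful Add returns a distinct index, so each item is
// handed out exactly once even when several workers share the queue.
func (q *workQueue) claim() (int, bool) {
	i := q.next.Add(1) - 1
	if i >= int64(len(q.items)) {
		return 0, false
	}
	return q.items[i], true
}

// processAll distributes items round-robin across workers (workers must be > 0).
// Each worker processes its local queue first and then steals from the
// other queues, so no worker idles while another still has pending items.
func processAll(items []int, workers int, process func(int) int) int64 {
	queues := make([]*workQueue, workers)
	for i := range queues {
		queues[i] = &workQueue{}
	}
	for i, it := range items {
		q := queues[i%workers]
		q.items = append(q.items, it)
	}

	var total atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// off == 0 is the local queue; off > 0 are steal attempts.
			for off := 0; off < workers; off++ {
				q := queues[(w+off)%workers]
				for {
					it, ok := q.claim()
					if !ok {
						break
					}
					total.Add(int64(process(it)))
				}
			}
		}(w)
	}
	wg.Wait()
	return total.Load()
}

func main() {
	items := make([]int, 1000)
	for i := range items {
		items[i] = i + 1
	}
	sum := processAll(items, 4, func(x int) int { return x })
	fmt.Println(sum) // 500500
}
```

The atomic cursor lets the owner and any thieves claim from the same queue without a mutex; inter-worker contention only appears once a worker has run out of local items. The sketch needs Go 1.19 or later for `atomic.Int64`.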


192 commits
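
Commits `909cd04c55` and `96f04c9863` resolve samples with identical timestamps by keeping the biggest value. The sketch below shows only that conflict rule on a single time-sorted series; `dedupKeepMax` is a hypothetical helper, and it deliberately ignores `-dedup.minScrapeInterval` windows, block merging across replicas, and any special treatment of NaN or staleness markers that the real code may have.

```go
package main

import "fmt"

// dedupKeepMax is a hypothetical helper. It expects timestamps sorted in
// ascending order, with values[i] belonging to timestamps[i]. For every run
// of identical timestamps it keeps a single sample holding the biggest value.
// Comparisons involving NaN are always false, so whichever of a NaN or a
// number appears first in a run is the one that is kept.
func dedupKeepMax(timestamps []int64, values []float64) ([]int64, []float64) {
	dstTs := make([]int64, 0, len(timestamps))
	dstVs := make([]float64, 0, len(values))
	for i, ts := range timestamps {
		last := len(dstTs) - 1
		if last >= 0 && dstTs[last] == ts {
			// Timestamp conflict: keep the bigger of the two values.
			if values[i] > dstVs[last] {
				dstVs[last] = values[i]
			}
			continue
		}
		dstTs = append(dstTs, ts)
		dstVs = append(dstVs, values[i])
	}
	return dstTs, dstVs
}

func main() {
	ts := []int64{1000, 1000, 2000, 3000, 3000, 3000}
	vs := []float64{1, 5, 2, 7, 3, 9}
	gotTs, gotVs := dedupKeepMax(ts, vs)
	fmt.Println(gotTs, gotVs) // [1000 2000 3000] [5 2 9]
}
```

Picking the maximum, rather than the first or last sample seen, makes the result independent of the order in which replicas return their blocks, which is one reason such a rule gives consistent answers across repeated queries.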