Commit graph

5838 commits

Author SHA1 Message Date
Aliaksandr Valialkin
1c18b37604
app/vmselect/promql: follow-up for 79e1c6a6fc
- Document the fix at docs/CHANGELOG.md
- Add tests with multiple adjancent zero buckets
- Simplify the fix a bit

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/296
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4021
2023-03-27 18:06:10 -07:00
Ze'ev Klapow
8f33561797
fix le buckets when adjacent vmrange is empty (#4021)
There is a bug here where if you have a single bucket like:

foo{vmrange="4.084e+02...4.642e+02"} 2 123

The expected output is three le encoded buckets like:

foo{le="4.084e+02"} 0 123
foo{le="4.642e+02"} 2 123
foo{le="+Inf"} 2 123

This correctly encodes the start and end of the vmrange.
If however, the input contains the previous bucket, and that bucket is
empty then you only get the end le and +Inf out currently, i.e:

foo{vmrange="7.743e+05...8.799e+05"} 5 123
foo{vmrange="6.813e+05...7.743e+05"} 0 123

results in:

foo{le="8.799e+05"} 5 123
foo{le="+Inf"} 5 123

This causes issues when you go to compute a quantile because this means
that the assumed lower bound of the buckets is 0 and this we interpolate
between 0->end rather than the vmrange start->end as expected.
2023-03-27 18:06:08 -07:00
Aliaksandr Valialkin
7841651de6
docs/CHANGELOG.md: cut v1.87.4 release 2023-03-25 17:03:18 -07:00
Aliaksandr Valialkin
e6727fa51c
app/vmselect/netstorage: reduce the contention at fs.ReaderAt stats collection on systems with big number of CPU cores
This optimization is based on the profile provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966#issuecomment-1483208419
2023-03-25 16:45:31 -07:00
Aliaksandr Valialkin
f47e26025c
app/vmselect/netstorage: document why runtime.Gosched() is removed at 28f054bb00
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-25 16:45:31 -07:00
Zakhar Bessarab
1c34cc33e5
vmselect/netstorage: remove direct calls to Gosched to reduce amount of locks for global scope
using `runtime.Gosched` requires acquiring global lock to check if there are any other goroutines to perform tasks. with the latest versions of runtime it can pause running goroutines automatically without requiring to call `Gosched` directly.

Updates #3966

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-25 16:45:31 -07:00
Aliaksandr Valialkin
b0e64f95ad
app/vmselect/netstorage: reduce the number of calls to runtime.Gosched() at timeseriesWorker() and unpackWorker()
Call runtime.Gosched() only when there is a work to steal from other workers.
Simplify the timeseriesWorker() and unpackWroker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork().

This should reduce CPU usage when processing queries on systems with big number of CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-25 16:43:47 -07:00
Aliaksandr Valialkin
511db8c453
app/vmselect/promql: typo fix after e7f46a0aab 2023-03-25 01:25:49 -07:00
Aliaksandr Valialkin
cee0fb8071
app/vmselect/promql: follow-up for 7205c79c5a
- Allocate and initialize seriesByWorkerID slice in a single go instead
  of initializing every item in the list separately.
  This should reduce CPU usage a bit.
- Properly set anti-false sharing padding at timeseriesWithPadding structure
- Document the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-25 01:25:08 -07:00
Zakhar Bessarab
71044221f6
app/vmselect/promql: use lock-less approach to gather results of parallel processing for evalRollup* funcs (#4004)
* vmselect/promql: refactor `evalRollupNoIncrementalAggregate` to use lock-less approach for parallel workers computation

Locking there is causing issues when running on highly multi-core system as it introduces lock contention during results merge.

New implementation uses lock less approach to store results per workerID and merges final result in the end, this is expected to significantly reduce lock contention and CPU usage for systems with high number of cores.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: add pooling for `timeseriesWithPadding` to reduce allocations

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: refactor `evalRollupFuncWithSubquery` to avoid using locks

Uses same approach as `evalRollupNoIncrementalAggregate` to remove locking between workers and reduce lock contention.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-25 01:24:20 -07:00
Aliaksandr Valialkin
d8e218d99e
app/vmselect/promql: pass workerID to the callback inside doParallel()
This opens the possibility to remove tssLock from evalRollupFuncWithSubquery()
in the follow-up commit from @zekker6 in order to speed up the code
for systems with many CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-25 01:23:06 -07:00
Aliaksandr Valialkin
9b5245a836
app/vmselect/promql: fix TestIncrementalAggr test on systems less than 3 CPU cores
This is a follow-up for 4856a4cf5a
2023-03-25 01:21:43 -07:00
Aliaksandr Valialkin
c780c6a280
app/vmselect: optimize incremental aggregates a bit
Substitute sync.Map with an ordinary slice indexed by workerID.
This should reduce the overhead when updating the incremental aggregate state
2023-03-24 23:50:31 -07:00
oliverpool
a3bc64e7ed
app/vmselect/promql: add test to ensure 8-byte alignment (#3948)
See 0af9e2b693
2023-03-24 23:49:04 -07:00
Aliaksandr Valialkin
0dcd796323
vendor: run make vendor-update 2023-03-24 22:29:03 -07:00
Alexander Marshalov
95c67d8309
allowed using dashes and dots in environment variables names (#4009)
* allowed using dashes and dots in environment variables names for templating config files with envtemplate (#3999)

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* Apply suggestions from code review

---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-24 22:23:52 -07:00
Aliaksandr Valialkin
8a878fa169
app/vmbackup: simplify code a bit after 5ba347bd2c
Unconditionally call deleteSnapshot() func just after making the snapshot, either successful or unsuccessful

Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2055
2023-03-24 22:20:20 -07:00
Zakhar Bessarab
91a5661b86
app/vmbackup: delete created snapshot in case of error during backup (#4008)
Related issue: #2055

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-24 22:20:20 -07:00
Nikolay
c2380f73e6
lib/netutil: log only parsing errors for proxy-protocol (#3985)
* lib/netutil: log only parsing errors for proxy-protocol

Previosly every error was logged. With configured TCP health checks at load-balancer or kubernetes, vmauth spams a lot of false positive error message into logs

* Update docs/CHANGELOG.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* Update lib/netutil/tcplistener.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-03-24 22:14:32 -07:00
Aliaksandr Valialkin
e7c20f0c75
lib/{mergeset,storage}: prevent from long wait time when creating a snapshot under high data ingestion rate
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3873
2023-03-24 22:12:08 -07:00
Dmytro Kozlov
67a526f5f7
lib/storage: fix collect downsampling metrics (#489)
* lib/storage: fix downsampling

* lib/storage: update logic

* lib/storage: fix comments, removed unneeded check
2023-03-19 23:34:33 -07:00
Aliaksandr Valialkin
4793665c4f
docs/CHANGELOG.md: cut v1.87.3 2023-03-12 23:24:37 -07:00
Aliaksandr Valialkin
1050d6bc0f
app/vmselect/promql: prevent from cannot unmarshal timeseries from rollupResultCache panic after the upgrade to v1.89.0
The issue has been introduced in 0af9e2b693
2023-03-12 19:10:48 -07:00
Aliaksandr Valialkin
dedac538c6
Makefile: update golangci-lint from v1.51.1 to v1.51.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.51.2
2023-03-12 17:12:49 -07:00
Aliaksandr Valialkin
5a9300be13
app/vmselect: remove data race on updating EvalConfig.IsPartialResponse from concurrently running goroutines
This properly returns `is_partial: true` for partial responses.
2023-03-12 16:57:28 -07:00
Aliaksandr Valialkin
c8f8aa85a2
app/vmselect/promql: prevent from SIGBUS crash on architecures, which deny unaligned access to 8-byte words (e.g. ARM)
Thanks to @oliverpool for nailing down the root cause of the issue and for the initial attempt to fix it
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3927
2023-03-12 16:56:37 -07:00
Aliaksandr Valialkin
fe1d959ca5
docs/CHANGELOG.md: document 113a89904d 2023-03-12 03:00:41 -07:00
Nikolay
ed2e4685d9
lib/vmselectapi: fixes regression for disable compression setting (#3932)
after vmselect api refactoring it wasn't possible to disable response cache.
This patch restores correct behavior for rpc.disableCompression flag
2023-03-12 01:59:38 -08:00
Aliaksandr Valialkin
60fcf88e45
vendor: make vendor-update 2023-03-12 01:49:22 -08:00
Roman Khavronenko
3d2cfc4f33
security: bump go version to 1.20.2 (#3935)
upgrade Go builder from Go1.20.1 to Go1.20.2
See the list of issues addressed in Go1.20.2 here (https://github.com/golang/go/issues?q=milestone%3AGo1.20.2+label%3ACherryPickApproved).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-12 01:38:51 -08:00
Aliaksandr Valialkin
7f34158db5
app/vmselect/netstorage: do not intern string representation of MetricName for time series received from vmstorage
It has been appeared that this interning may lead to increased memory usage and increased CPU usage
when vmselect performs queries, which select big number of time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863
2023-03-12 01:36:07 -08:00
Aliaksandr Valialkin
f5d50c4bc5
docs/CHANGELOG.md: document 927d9da270 2023-03-12 01:33:53 -08:00
Nikolay
ea038a8940
lib/storage: correctly handle io.EOF error for pre-fetched metrics (#3946)
io.EOF shouldn't be returned from this function. It breaks all search
API logic and may result in empty query results.
2023-03-12 01:29:05 -08:00
Aliaksandr Valialkin
7feec58f91
docs/CHANGELOG.md: clarify the description for 6bfe9cc733
- Add the panic message to the description, so it is easier to google
- Add a link to the corresponding bugreport

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897
2023-03-12 01:26:28 -08:00
Nikolay
ce950d3b06
lib{mergset,storage}: prevent possible race condition with logging st… (#3900)
lib{mergset,storage}: prevent possible race condition with logging stats for merges

Previously partwrapper could be release by background process and reference for part may be invalid
during logging stats. It will lead to panic at vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897
2023-03-12 01:23:55 -08:00
Aliaksandr Valialkin
bc144e2b05
all: follow-up for 7a3e16e774
- Sync the description for -httpListenAddr.useProxyProtocol command-line flag at vmagent and vmauth,
  so it is consistent with the description at vmauth and victoria-metrics
- Add a sample of panic text to docs/CHANGELOG.md, so it could be googled
- Mention the -httpListenAddr.useProxyProtocol command-line flag in the description for the bugfix

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-12 01:19:55 -08:00
Nikolay
c80d0aaaf0
lib/netutil: fixes panic at proxy protocol (#3905)
it may occur if non proxy protocol message received by tcp server.
Listener Accept method must return only non-recoverable errors.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-12 01:12:53 -08:00
Zakhar Bessarab
3f57d4d489
lib/promscrape: correctly register vm_promscrape_config_* metrics (#3876)
* lib/promscrape: set `vm_promscrape_config_last_reload_successful` to 1 if there was no promscrape config provided

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: register `vm_promscrape_config_*` metrics only in case promscrape config is used

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-27 12:06:22 -08:00
Dmytro Kozlov
d161873857
app/vmctl: skip series if measurement not found (#3869)
app/vmctl: skip measurements with no fields for influxdb mode
2023-02-27 12:04:48 -08:00
Aliaksandr Valialkin
bd319ea7e5
docs/CHANGELOG.md: mention that v1.87.2 is an LTS release 2023-02-24 16:12:56 -08:00
Aliaksandr Valialkin
27afb0ac37
docs/CHANGELOG.md: document v1.79.9 release 2023-02-24 15:11:48 -08:00
Aliaksandr Valialkin
16578b58b6
docs/CHANGELOG.md: cut v1.87.2 release 2023-02-24 15:09:40 -08:00
Aliaksandr Valialkin
5816678a86
docs/CHANGELOG.md: document d8eaa511b0 2023-02-24 12:44:36 -08:00
Zakhar Bessarab
c9547f4fba
lib/{fs,mergeset,storage}: skip .must-remove. dirs when creating snapshot (#3858) (#3867) 2023-02-24 12:44:35 -08:00
Aliaksandr Valialkin
1026e25f33
docs/CHANGELOG.md: typo fix: scrape scrape -> scrape 2023-02-24 12:34:07 -08:00
Aliaksandr Valialkin
95b66f3de5
lib/promscrape: follow-up for 43e104a83f
- Return immediately on context cancel during the backoff sleep.
  This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747

- Add a comment describing why the second attempt to obtain the response from remote side
  is perfromed immediately after the first attempt.

- Remove fasthttp dependency from lib/promscrape/discoveryutils

- Set context deadline before calling doRequestWithPossibleRetry().
  This simplifies the doRequestWithPossibleRetry() a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3293
2023-02-24 12:26:33 -08:00
Zakhar Bessarab
3f64dae46b
fix: do not use exponential backoff for first retry of scrape request (#3824)
* fix: do not use exponential backoff for first retry of scrape request (#3293)

* lib/promscrape: refactor `doRequestWithPossibleRetry` backoff to simplify logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update lib/promscrape/client.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* lib/promscrape: refactor `doRequestWithPossibleRetry` to make it more straightforward

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-02-24 12:26:32 -08:00
Alexander Marshalov
808a06bfd4
fix interpolate function for filling only intermediate gaps (#3816) (#3857)
* fix interpolate function for filling only intermediate gaps (#3816)

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-23 19:30:23 -08:00
Aliaksandr Valialkin
8dd8972a7a
docs/CHANGELOG.md: document 6d019a3c37
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3830
2023-02-23 19:30:23 -08:00
Mattias Ängehov
13d0a2a3e7
Azure Service Discovery - Fix token fetch for Container Apps/App Services (#3832)
* Modify API version when running in Container App

* Handle expires on from token response

Response from IMDS does not always contain expires in value which is
currently used to get the token expiry time. An example resources that
doesn't provide it are Container Apps and App Service.

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>

* Fix client id parameter for user assigned identity

* Apply suggestions from code review

---------

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2023-02-23 19:30:23 -08:00