Improperly configured -bigMergeConcurrency command-line flag usually leads to uncontrolled
growth of unmerged parts, which, in turn, increases CPU usage and query durations.
So it is better deprecating this flag. In rare cases -smallMergeConcurrency command-line flag
can be used instead for controlling the concurrency of background merges.
There is a bug here where if you have a single bucket like:
foo{vmrange="4.084e+02...4.642e+02"} 2 123
The expected output is three le encoded buckets like:
foo{le="4.084e+02"} 0 123
foo{le="4.642e+02"} 2 123
foo{le="+Inf"} 2 123
This correctly encodes the start and end of the vmrange.
If however, the input contains the previous bucket, and that bucket is
empty then you only get the end le and +Inf out currently, i.e:
foo{vmrange="7.743e+05...8.799e+05"} 5 123
foo{vmrange="6.813e+05...7.743e+05"} 0 123
results in:
foo{le="8.799e+05"} 5 123
foo{le="+Inf"} 5 123
This causes issues when you go to compute a quantile because this means
that the assumed lower bound of the buckets is 0 and this we interpolate
between 0->end rather than the vmrange start->end as expected.
using `runtime.Gosched` requires acquiring global lock to check if there are any other goroutines to perform tasks. with the latest versions of runtime it can pause running goroutines automatically without requiring to call `Gosched` directly.
Updates #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Call runtime.Gosched() only when there is a work to steal from other workers.
Simplify the timeseriesWorker() and unpackWroker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork().
This should reduce CPU usage when processing queries on systems with big number of CPU cores.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
- Allocate and initialize seriesByWorkerID slice in a single go instead
of initializing every item in the list separately.
This should reduce CPU usage a bit.
- Properly set anti-false sharing padding at timeseriesWithPadding structure
- Document the change at docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
* vmselect/promql: refactor `evalRollupNoIncrementalAggregate` to use lock-less approach for parallel workers computation
Locking there is causing issues when running on highly multi-core system as it introduces lock contention during results merge.
New implementation uses lock less approach to store results per workerID and merges final result in the end, this is expected to significantly reduce lock contention and CPU usage for systems with high number of cores.
Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* vmselect/promql: add pooling for `timeseriesWithPadding` to reduce allocations
Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* vmselect/promql: refactor `evalRollupFuncWithSubquery` to avoid using locks
Uses same approach as `evalRollupNoIncrementalAggregate` to remove locking between workers and reduce lock contention.
Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This opens the possibility to remove tssLock from evalRollupFuncWithSubquery()
in the follow-up commit from @zekker6 in order to speed up the code
for systems with many CPU cores.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
- Sync the description for -httpListenAddr.useProxyProtocol command-line flag at vmagent and vmauth,
so it is consistent with the description at vmauth and victoria-metrics
- Add a sample of panic text to docs/CHANGELOG.md, so it could be googled
- Mention the -httpListenAddr.useProxyProtocol command-line flag in the description for the bugfix
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
* vmalert: use group's ID in UI to avoid collisions
Identical group names are allowed. So we should used IDs
for various groupings and aggregations in UI.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: prevent disabling state updates tracking
The minimum number of update states to track is now set to 1.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: properly update `debug` and `update_entries_limit` params on hot-reload
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: display `debug` field for rule in UI
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: exclude `updates` field from json marhsaling
This field isn't correctly marshaled right now.
And implementing the correct marshaling for it doesn't
seem right, since json representation is mostly used
by systems like Grafana. And Grafana doesn't expect this
field to be present.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* fix test for disabled state
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* fix test for disabled state
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: speed up state restore procedure on start
Alerts state restore procedure has been changed to become asynchronous.
It doesn't block groups start anymore which significantly improves vmalert's startup time.
Instead, state restore is called by each group in their goroutines after the first rules
evaluation.
While previously state restore attempt was made for all loaded alerting rules,
now it is called only for alerts which became active after the first evaluation.
This reduces the amount of API calls to the configured remote read URL.
This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now
and will have no effect if configured.
See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* make lint happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Apply suggestions from code review
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>