This reduces the number of memory allocations at the cost of possible memory usage increase,
since now different metric name strings may hold references to the previous byte slice.
This is good tradeoff, since ProcessSearchQuery is called in vmselect, and vmselect isn't usually limited by memory.
This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
* app/vmselect: limit the number of parallel workers by 32
The change should improve performance and memory usage during query processing
on machines with big number of CPU cores. The number of parallel workers for
query processing is controlled via `-search.maxWorkersPerQuery` command-line flag.
By default, the number of workers is limited by the number of available CPU cores,
but not more than 32. The limit can be increased via `-search.maxWorkersPerQuery`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
- The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage,
so move it from the `resource usage limits` to `troubleshooting` chapter at docs/Single-server-VictoriaMetrics.md
- Make more clear the description for the `-search.maxWorkersPerQuery` command-line flag
- Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md
- Limit the maximum value, which can be passed to `-search.maxWorkersPerQuery`, to GOMAXPROCS,
because bigger values may worsen query performance and increase CPU usage
- Improve the the description of the change at docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX,
since it is closer to a feature than to a bugfix.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Callers of these functions log the returned error and then exit. The returned error already contains the path
to directory, which was failed to be created. So let's just log the error together with the call stack
inside these functions. This leaves the debuggability of the returned error at the same level
while allows simplifying the code at callers' side.
While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk().
It is expected that the inmemoryPart.MustStoreToDick() must fail if there is already a directory under the given path.
using `runtime.Gosched` requires acquiring global lock to check if there are any other goroutines to perform tasks. with the latest versions of runtime it can pause running goroutines automatically without requiring to call `Gosched` directly.
Updates #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Call runtime.Gosched() only when there is a work to steal from other workers.
Simplify the timeseriesWorker() and unpackWroker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork().
This should reduce CPU usage when processing queries on systems with big number of CPU cores.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
Previously the selected time series were split evenly among available CPU cores
for further processing - e.g unpacking the data and applying the given rollup
function to the unpacked data.
Some time series could be processed slower than others.
This could result in uneven work distribution among available CPU cores,
e.g. some CPU cores could complete their work sooner than others.
This could slow down query execution.
The new algorithm allows stealing time series to process from other CPU cores
when all the local work is done. This should reduce the maximum time
needed for query execution (aka tail latency).
The new algorithm should also scale better on systems with many CPU cores,
since every CPU processes locally assigned time series without inter-CPU communications.
The inter-CPU communications are used only when all the local work is finished
and the pending work from other CPUs needs to be stealed.
Unpack time series with less than 400K samples in the currently running goroutine.
Previously a new goroutine was being started for unpacking the samples.
This was requiring additional memory allocations.
Usually the number of blocks returned per each time series during queries is around 4.
So it is a good idea to pre-allocate 4 block references per time series
in order to reduce the number of memory allocations.
The io/ioutil package is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics requires at least Go1.18, so it is time to remove the io/ioutil from source code
This is a follow-up for 02ca2342ab
Reduce inter-CPU communications when processing the query over big number of time series.
This should improve performance for queries over big number of time series
on systems with many CPU cores.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2896
Based on b596ac3745
Thanks to @zqyzyq for the idea.
- Use binary search instead of linear scan when locating the run of smallest timestamps
in blocks with intersected time ranges. This should improve performance
when merging blocks with big number of samples
- Skip samples with duplicate timestamps. This should increase query performance
in cluster version of VictoriaMetrics with the enabled replication.
Previously SearchMetricNames was returning unmarshaled metric names.
This wasn't great for vmstorage, which should spend additional CPU time
for marshaling the metric names before sending them to vmselect.
While at it, remove possible duplicate metric names, which could occur when
multiple samples for new time series are ingested via concurrent requests.
Also sort the metric names before returning them to the client.
This simplifies debugging of the returned metric names across repeated requests to /api/v1/series