lib/storage: limit max mergeConcurrency value for systems with high number of CPUs (#2673)

The number of merge workers affects the max part size during merges. This behaviour
protects the storage from running out of disk space in the scenario where all workers
are merging parts of the max size at the same time.
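
As a rough illustration of that protection, the sketch below assumes the per-merge
output size limit is obtained by splitting the free disk space evenly across merge
workers; the function and parameter names are illustrative and are not taken from
the actual lib/storage code.

```go
// Illustrative sketch only: if every worker may simultaneously produce a part
// up to this size, all in-flight merges together still fit into free disk space.
// freeDiskSpace and mergeWorkersCount are assumed inputs, not identifiers
// from the actual code.
func maxPartSizePerMerge(freeDiskSpace uint64, mergeWorkersCount int) uint64 {
	return freeDiskSpace / uint64(mergeWorkersCount)
}
```

Under this model, doubling the worker count halves the size of the parts each merge
is allowed to produce.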

This works well in most cases. But on systems where a high number of CPUs is
allocated to vmstorage components, it can significantly reduce the max part size
and result in more unmerged parts than expected.

While checking multiple highly loaded production setups it was discovered that
`max_over_time(vm_active_merges{type="storage/big"}[1h])` rarely exceeds 2,
and `max_over_time(vm_active_merges{type="storage/small"}[1h])` rarely exceeds 4.
The change in this commit limits the default merge concurrency accordingly.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Roman Khavronenko authored 2022-06-07 13:55:09 +02:00, committed by GitHub
parent 194258c7b4
commit 1ee1e986da

@@ -869,10 +869,18 @@ func hasActiveMerges(pws []*partWrapper) bool {
 }
 
 var (
-	bigMergeWorkersCount   = (cgroup.AvailableCPUs() + 1) / 2
-	smallMergeWorkersCount = (cgroup.AvailableCPUs() + 1) / 2
+	bigMergeWorkersCount   = getDefaultMergeConcurrency(4)
+	smallMergeWorkersCount = getDefaultMergeConcurrency(8)
 )
 
+func getDefaultMergeConcurrency(max int) int {
+	v := (cgroup.AvailableCPUs() + 1) / 2
+	if v > max {
+		v = max
+	}
+	return v
+}
+
 // SetBigMergeWorkersCount sets the maximum number of concurrent mergers for big blocks.
 //
 // The function must be called before opening or creating any storage.
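
For a quick sanity check of the new defaults, the standalone snippet below replays
the clamping logic outside of vmstorage; the extra `cpus` parameter stands in for
`cgroup.AvailableCPUs()` and is purely illustrative.

```go
// Standalone illustration (not part of the commit) of how the new defaults
// behave for different CPU counts. The cpus parameter replaces
// cgroup.AvailableCPUs() so the snippet runs without the cgroup package.
package main

import "fmt"

func defaultMergeConcurrency(max, cpus int) int {
	v := (cpus + 1) / 2
	if v > max {
		v = max
	}
	return v
}

func main() {
	for _, cpus := range []int{2, 8, 32, 64} {
		fmt.Printf("cpus=%d big=%d small=%d\n",
			cpus, defaultMergeConcurrency(4, cpus), defaultMergeConcurrency(8, cpus))
	}
}
```

For example, with 64 available CPUs the previous formula `(cpus + 1) / 2` would have
allowed 32 concurrent big merges and 32 concurrent small merges, while the clamp
reduces them to 4 and 8 respectively.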