app/vmselect/promql: propagate lower bucket values when fixing a histogram (#6547)

### Describe Your Changes In most cases histograms are exposed in sorted manner with lower buckets being first. This means that during scraping buckets with lower bounds have higher chance of being updated earlier than upper ones. Previously, values were propagated from upper to lower bounds, which means that in most cases that would produce results higher than expected once all buckets will become updated. Propagating from upper bound effectively limits highest value of histogram to the value of previous scrape. Once the data will become consistent in the subsequent evaluation this causes spikes in the result. Changing propagation to be from lower to higher buckets reduces value spikes in most cases due to nature of the original inconsistency. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580 An example histogram with previous(red) and updated(blue) versions: ![1719565540](https://github.com/VictoriaMetrics/VictoriaMetrics/assets/1367798/605c5e60-6abe-45b5-89b2-d470b60127b8) This also makes logic of filling nan values with lower buckets values: [1 2 3 nan nan nan] => [1 2 3 3 3 3] obsolete. Since buckets are now fixed from lower ones to upper this happens in the main loop, so there is no need in a second one. --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-03-11 15:34:56 +00:00 · 2024-07-10 17:15:29 +04:00 · 2024-07-10 17:15:29 +04:00 · 6a4bd5049b
commit 6a4bd5049b
parent 03e4c5c19c
4 changed files with 14 additions and 24 deletions
--- a/app/vmselect/promql/exec_test.go
+++ b/app/vmselect/promql/exec_test.go
@ -4461,7 +4461,7 @@ func TestExecSuccess(t *testing.T) {
 		),0.01)`
 		r := netstorage.Result{
 			MetricName: metricNameExpected,
-			Values:     []float64{18.57, 18.57, 18.57, 18.57, 18.57, 18.57},
+			Values:     []float64{30, 30, 30, 30, 30, 30},
 			Timestamps: timestampsExpected,
 		}
 		r.MetricName.Tags = []storage.Tag{{
--- a/app/vmselect/promql/transform.go
+++ b/app/vmselect/promql/transform.go
@ -1044,29 +1044,18 @@ func fixBrokenBuckets(i int, xss []leTimeseries) {
 	// Buckets are already sorted by le, so their values must be in ascending order,
 	// since the next bucket includes all the previous buckets.
 	// If the next bucket has lower value than the current bucket,
-	// then the current bucket must be substituted with the next bucket value.
-	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2819
+	// then the next bucket must be substituted with the current bucket value.
+	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580#issuecomment-2186659102
 	if len(xss) < 2 {
 		return
 	}
-	// Fill NaN in upper buckets with the first non-NaN value found in lower buckets.
-	for j := len(xss) - 1; j >= 0; j-- {
+
+	// Substitute upper bucket values with lower bucket values if the upper values are NaN
+	// or are bigger than the lower bucket values.
+	vNext := xss[0].ts.Values[0]
+	for j := 1; j < len(xss); j++ {
 		v := xss[j].ts.Values[i]
-		if !math.IsNaN(v) {
-			j++
-			for j < len(xss) {
-				xss[j].ts.Values[i] = v
-				j++
-			}
-			break
-		}
-	}
-	// Substitute lower bucket values with upper values if the lower values are NaN
-	// or are bigger than the upper bucket values.
-	vNext := xss[len(xss)-1].ts.Values[i]
-	for j := len(xss) - 2; j >= 0; j-- {
-		v := xss[j].ts.Values[i]
-		if math.IsNaN(v) || v > vNext {
+		if math.IsNaN(v) || vNext > v {
 			xss[j].ts.Values[i] = vNext
 		} else {
 			vNext = v
--- a/app/vmselect/promql/transform_test.go
+++ b/app/vmselect/promql/transform_test.go
@ -32,11 +32,11 @@ func TestFixBrokenBuckets(t *testing.T) {
 	f(nil, []float64{})
 	f([]float64{1}, []float64{1})
 	f([]float64{1, 2}, []float64{1, 2})
-	f([]float64{2, 1}, []float64{1, 1})
+	f([]float64{2, 1}, []float64{2, 2})
 	f([]float64{1, 2, 3, nan, nan}, []float64{1, 2, 3, 3, 3})
-	f([]float64{5, 1, 2, 3, nan}, []float64{1, 1, 2, 3, 3})
-	f([]float64{1, 5, 2, nan, 6, 3}, []float64{1, 2, 2, 3, 3, 3})
-	f([]float64{5, 10, 4, 3}, []float64{3, 3, 3, 3})
+	f([]float64{5, 1, 2, 3, nan}, []float64{5, 5, 5, 5, 5})
+	f([]float64{1, 5, 2, nan, 6, 3}, []float64{1, 5, 5, 5, 6, 6})
+	f([]float64{5, 10, 4, 3}, []float64{5, 10, 10, 10})
 }

 func TestVmrangeBucketsToLE(t *testing.T) {
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@ -59,6 +59,7 @@ See also [LTS releases](https://docs.victoriametrics.com/lts-releases/).
 * BUGFIX: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/cluster-victoriametrics/): Fix the dateMetricIDCache consistency issue that leads to duplicate per-day index entries when new time series are inserted concurrently. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6534) for details.
 * BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix input cursor position reset in modal settings. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6530).
 * BUGFIX: [vmbackupmanager](https://docs.victoriametrics.com/vmbackupmanager/): fix `vm_backup_last_run_failed` metric not being properly initialized during startup. Previously, it could imply an error even if the backup have been completed successfully. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6550) for the details.
+* BUGFIX: [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): properly calculate [histogram_quantile](https://docs.victoriametrics.com/MetricsQL.html#histogram_quantile) over Prometheus buckets with inconsistent values. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580#issuecomment-2186659102) and [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6547). Updates [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2819).

 ## [v1.102.0-rc2](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.102.0-rc2)