app/vmselect/promql: do not push down filters that enumerate more than 10k unique values

Such filters may slow down time series search, so just skip them.

This is a follow-up for e7f1ceeb84.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1827
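The rule above can be illustrated with a minimal Go sketch of the per-label decision made in `getCommonLabelFilters`. It is not the VictoriaMetrics code: `commonLabelFilters`, `uniqueValues`, the simplified `LabelFilter` type and the `limit` parameter (standing in for the hard-coded `10000`) are hypothetical, and escaping values with `regexp.QuoteMeta` is an assumption about what `joinRegexpValues` does. As in the diff, values are sorted only after the size check and only when a regexp filter is built.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// LabelFilter is a simplified, hypothetical stand-in for metricsql.LabelFilter.
type LabelFilter struct {
	Label    string
	Value    string
	IsRegexp bool
}

// commonLabelFilters (hypothetical) builds one filter per label from the values
// seen across series. Labels with more than limit unique values get no filter.
func commonLabelFilters(valuesByLabel map[string][]string, limit int) []LabelFilter {
	// Visit labels in sorted order so the output is deterministic.
	keys := make([]string, 0, len(valuesByLabel))
	for k := range valuesByLabel {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var lfs []LabelFilter
	for _, key := range keys {
		values := uniqueValues(valuesByLabel[key])
		if len(values) == 0 || len(values) > limit {
			// Nothing to filter on, or so many alternatives that the filter
			// could slow down the search for matching series: skip the label.
			continue
		}
		lf := LabelFilter{Label: key}
		if len(values) == 1 {
			lf.Value = values[0]
		} else {
			// Sort only in the regexp branch, after the size check.
			sort.Strings(values)
			for i, v := range values {
				values[i] = regexp.QuoteMeta(v) // assumed escaping
			}
			lf.Value = strings.Join(values, "|")
			lf.IsRegexp = true
		}
		lfs = append(lfs, lf)
	}
	return lfs
}

// uniqueValues returns the distinct strings of a in first-seen order.
func uniqueValues(a []string) []string {
	seen := make(map[string]struct{}, len(a))
	var out []string
	for _, s := range a {
		if _, ok := seen[s]; !ok {
			seen[s] = struct{}{}
			out = append(out, s)
		}
	}
	return out
}

func main() {
	filters := commonLabelFilters(map[string][]string{
		"job":      {"api", "api"},
		"instance": {"b:80", "a:80", "b:80"},
		"uid":      {"u1", "u2", "u3", "u4"},
	}, 3)
	for _, lf := range filters {
		fmt.Printf("%+v\n", lf)
	}
}
```

With `limit` set to 3, the `uid` label (4 unique values) gets no filter, `instance` becomes the regexp filter `a:80|b:80`, and `job` stays a plain equality filter on `api`.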
commit 4b850c2a59
Date: 2022-02-02 23:37:35 +02:00
parent 4ef32df4fa
2 changed files with 7 additions and 2 deletions
--- a/app/vmselect/promql/eval.go
+++ b/app/vmselect/promql/eval.go
@@ -382,12 +382,18 @@ func getCommonLabelFilters(tss []*timeseries) []metricsql.LabelFilter {
 			continue
 		}
 		values = getUniqueValues(values)
+		if len(values) > 10000 {
+			// Skip the filter on the given tag, since it needs to enumerate too many unique values.
+			// This may slow down the search for matching time series.
+			continue
+		}
 		lf := metricsql.LabelFilter{
 			Label: key,
 		}
 		if len(values) == 1 {
 			lf.Value = values[0]
 		} else {
+			sort.Strings(values)
 			lf.Value = joinRegexpValues(values)
 			lf.IsRegexp = true
 		}
@@ -408,7 +414,6 @@ func getUniqueValues(a []string) []string {
 			m[s] = struct{}{}
 		}
 	}
-	sort.Strings(results)
 	return results
 }

--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -11,7 +11,7 @@ sort: 15
  * Binary operations with `on()`, `without()`, `group_left()` and `group_right()` modifiers. For example, `foo{a="b"} on (a) + bar` is now optimized to `foo{a="b"} on (a) + bar{a="b"}`
  * Multi-level binary operations. For example, `foo{a="b"} + bar{x="y"} + baz{z="q"}` is now optimized to `foo{a="b",x="y",z="q"} + bar{a="b",x="y",z="q"} + baz{a="b",x="y",z="q"}`
  * Aggregate functions. For example, `sum(foo{a="b"}) by (c) + bar{c="d"}` is now optimized to `sum(foo{a="b",c="d"}) by (c) + bar{c="d"}`
-* FEATURE [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): optimize joining with `*_info` labels. For example: `kube_pod_created{namespace="prod"} * on (uid) group_left(node) kube_pod_info` now automatically adds the needed filters on `uid` label to `kube_pod_info` before selecting series for the right side of `*` operation. This may save CPU, RAM and disk IO resources. See [this article](https://www.robustperception.io/exposing-the-software-version-to-prometheus) for details on `*_info` labels.
+* FEATURE [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): optimize joining with `*_info` labels. For example: `kube_pod_created{namespace="prod"} * on (uid) group_left(node) kube_pod_info` now automatically adds the needed filters on `uid` label to `kube_pod_info` before selecting series for the right side of `*` operation. This may save CPU, RAM and disk IO resources. See [this article](https://www.robustperception.io/exposing-the-software-version-to-prometheus) for details on `*_info` labels. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1827).
 * FEATURE: all: expose `process_cpu_cores_available` metric, which shows the number of CPU cores available to the app. The number can be fractional if the corresponding cgroup limit is set to a fractional value. This metric is useful for alerting on CPU saturation. For example, the following query alerts when the app uses more than 90% of CPU during the last 5 minutes: `rate(process_cpu_seconds_total[5m]) / process_cpu_cores_available > 0.9` . See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2107).
 * FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add ability to configure notifiers (e.g. alertmanager) via a file in the way similar to Prometheus. See [these docs](https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file), [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2127).
 * FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add support for Consul service discovery for notifiers. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947).
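The `*_info` join optimization from the changelog entry above amounts to collecting the `uid` values from the left-hand side and attaching them as a filter to the right-hand selector, unless there are too many of them. The Go sketch below is a hypothetical illustration, not the MetricsQL optimizer: `pushDownFilters`, its parameters, and the choice to skip a label that is missing from some left-hand series are assumptions, and quoting with Go's `%q` verb matches MetricsQL string syntax only for plain values.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// pushDownFilters is hypothetical: it appends filters on the on(...) labels to
// metric, using the label values seen in the left-hand series. A label that is
// missing from some left-hand series, or that has more than limit unique
// values, is left unfiltered.
func pushDownFilters(metric string, left []map[string]string, onLabels []string, limit int) string {
	var parts []string
	for _, label := range onLabels {
		seen := make(map[string]bool)
		complete := true
		for _, series := range left {
			v, ok := series[label]
			if !ok {
				complete = false
				break
			}
			seen[v] = true
		}
		if !complete || len(seen) == 0 || len(seen) > limit {
			continue
		}
		values := make([]string, 0, len(seen))
		for v := range seen {
			values = append(values, v)
		}
		// Map iteration order is random, so sort to get a stable selector.
		sort.Strings(values)
		if len(values) == 1 {
			parts = append(parts, fmt.Sprintf("%s=%q", label, values[0]))
			continue
		}
		for i, v := range values {
			values[i] = regexp.QuoteMeta(v)
		}
		parts = append(parts, fmt.Sprintf("%s=~%q", label, strings.Join(values, "|")))
	}
	if len(parts) == 0 {
		return metric
	}
	return metric + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	// Hypothetical series returned for kube_pod_created{namespace="prod"}.
	left := []map[string]string{
		{"namespace": "prod", "uid": "pod-b"},
		{"namespace": "prod", "uid": "pod-a"},
	}
	fmt.Println(pushDownFilters("kube_pod_info", left, []string{"uid"}, 10000))
	fmt.Println(pushDownFilters("kube_pod_info", left, []string{"uid"}, 1))
}
```

Run as is, it prints `kube_pod_info{uid=~"pod-a|pod-b"}` and then, with the limit lowered to 1, the unfiltered `kube_pod_info`, mirroring how this commit drops the pushed-down filter once it would enumerate too many values.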