app/vmselect/prometheus: do not drop match[] filters if -search.ignoreExtraFiltersAtLabelsAPI flag is set

The `match[]` filter is mandatory at /api/v1/series, so it mustn't be dropped here.

There is no sense in dropping `match[]` filter together with `extra_label` and `extra_filters[]`
at /api/v1/labels and /api/v1/label/.../values if -search.ignoreExtraFiltersAtLabelsAPI commnad-line flag is set,
since:
- the `match[]` filter triggers slow path at these APIs;
- the `extra_label` and `extra_filters[]` filters narrow down the number of matched time series,
  so they improve performance comparing to the case when only `match[]` filter is left,
  while `extra_label` and `extra_filters[]` filters are dropped.

This is a follow-up for 0b7a23a91d
This commit is contained in:
Aliaksandr Valialkin 2024-03-06 13:25:49 +02:00
parent 98b56476d7
commit 2711732944
No known key found for this signature in database
GPG key ID: 52C003EE2BCDB9EB
5 changed files with 35 additions and 31 deletions

View file

@ -647,11 +647,16 @@ Some workloads may need fine-grained resource usage limits. In these cases the f
Bigger number of concurrent requests usually require bigger amounts of memory at both `vmselect` and `vmstorage`.
For example, if a single query needs 100 MiB of additional memory during its execution, then 100 concurrent queries
may need `100 * 100 MiB = 10 GiB` of additional memory. So it is better to limit the number of concurrent queries,
while suspending additional incoming queries if the concurrency limit is reached.
`vmselect` and `vmstorage` provides `-search.maxQueueDuration` command-line flag for limiting the maximum wait time for suspended queries.
while pausing additional incoming queries if the concurrency limit is reached.
`vmselect` and `vmstorage` provides `-search.maxQueueDuration` command-line flag for limiting the maximum wait time for paused queries.
See also `-search.maxMemoryPerQuery` command-line flag at `vmselect`.
- `-search.maxQueueDuration` at `vmselect` and `vmstorage` limits the maximum duration queries may wait for execution when `-search.maxConcurrentRequests`
concurrent queries are executed.
- `-search.ignoreExtraFiltersAtLabelsAPI` at `vmselect` enables ignoring of `match[]`, [`extra_filters[]` and `extra_label`](https://docs.victoriametrics.com/#prometheus-querying-api-enhancements)
query args at [/api/v1/labels](https://docs.victoriametrics.com/url-examples/#apiv1labels) and
[/api/v1/label/.../values](https://docs.victoriametrics.com/url-examples/#apiv1labelvalues).
This may be useful for reducing the load on `vmstorage` if the provided extra filters match too many time series.
The downside is that the endpoints can return labels and series, which do not match the provided extra filters.
- `-search.maxSamplesPerSeries` at `vmselect` limits the number of raw samples the query can process per each time series.
`vmselect` processes raw samples sequentially per each found time series during the query. It unpacks raw samples on the selected time range
per each time series into memory and then applies the given [rollup function](https://docs.victoriametrics.com/MetricsQL.html#rollup-functions).
@ -698,11 +703,6 @@ Some workloads may need fine-grained resource usage limits. In these cases the f
when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate).
In this case it might be useful to set the `-search.maxLabelsAPIDuration` to quite low value in order to limit CPU and memory usage.
See also `-search.maxLabelsAPISeries` and `-search.ignoreExtraFiltersAtLabelsAPI`.
- `-search.ignoreExtraFiltersAtLabelsAPI` at `vmselect` enables ignoring of `match[]`, [`extra_filters[]` and `extra_label`](https://docs.victoriametrics.com/#prometheus-querying-api-enhancements)
query args at [/api/v1/labels](https://docs.victoriametrics.com/url-examples/#apiv1labels) and
[/api/v1/label/.../values](https://docs.victoriametrics.com/url-examples/#apiv1labelvalues).
This may be useful for reducing the load on `vmstorage` if the provided extra filters match too many time series.
The downside is that the endpoints can return labels and series, which do not match the provided extra filters.
- `-storage.maxDailySeries` at `vmstorage` can be used for limiting the number of time series seen per day aka
[time series churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). See [cardinality limiter docs](#cardinality-limiter).
- `-storage.maxHourlySeries` at `vmstorage` can be used for limiting the number of [active time series](https://docs.victoriametrics.com/FAQ.html#what-is-an-active-time-series).

View file

@ -1284,18 +1284,21 @@ func getCommonParamsInternal(r *http.Request, startTime time.Time, requireNonEmp
if requireNonEmptyMatch && len(matches) == 0 {
return nil, fmt.Errorf("missing `match[]` arg")
}
filterss, err := getTagFilterssFromMatches(matches)
if err != nil {
return nil, err
}
var filterss [][]storage.TagFilter
if !isLabelsAPI || !*ignoreExtraFiltersAtLabelsAPI {
tagFilterss, err := getTagFilterssFromMatches(matches)
if err != nil {
return nil, err
}
if len(filterss) > 0 || !isLabelsAPI || !*ignoreExtraFiltersAtLabelsAPI {
// If matches isn't empty, then there is no sense in ignoring extra filters
// even if ignoreExtraLabelsAtLabelsAPI is set, since extra filters won't slow down
// the query - they can only improve query performance by reducing the number
// of matching series at the storage level.
etfs, err := searchutils.GetExtraTagFilters(r)
if err != nil {
return nil, err
}
filterss = searchutils.JoinTagFilterss(tagFilterss, etfs)
filterss = searchutils.JoinTagFilterss(filterss, etfs)
}
cp := &commonParams{

View file

@ -41,6 +41,8 @@ See also [LTS releases](https://docs.victoriametrics.com/lts-releases/).
* FEATURE: [stream aggregation](https://docs.victoriametrics.com/stream-aggregation/): expose `vm_streamaggr_flush_timeouts_total` and `vm_streamaggr_dedup_flush_timeouts_total` [counters](https://docs.victoriametrics.com/keyconcepts/#counter) at [`/metrics` page](https://docs.victoriametrics.com/#monitoring), which can be used for detecting flush timeouts for stream aggregation states. Expose also `vm_streamaggr_flush_duration_seconds` and `vm_streamaggr_dedup_flush_duration_seconds` [histograms](https://docs.victoriametrics.com/keyconcepts/#histogram) for monitoring the real flush durations of stream aggregation states.
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): improve trace display for better visual separation of branches. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5926).
* BUGFIX: do not drop `match[]` filter at [`/api/v1/series`](https://docs.victoriametrics.com/url-examples/#apiv1series) if `-search.ignoreExtraFiltersAtLabelsAPI` command-line flag is set, since this broke the `/api/v1/series` requests.
## [v1.99.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.99.0)
Released at 2024-03-01

View file

@ -658,11 +658,16 @@ Some workloads may need fine-grained resource usage limits. In these cases the f
Bigger number of concurrent requests usually require bigger amounts of memory at both `vmselect` and `vmstorage`.
For example, if a single query needs 100 MiB of additional memory during its execution, then 100 concurrent queries
may need `100 * 100 MiB = 10 GiB` of additional memory. So it is better to limit the number of concurrent queries,
while suspending additional incoming queries if the concurrency limit is reached.
`vmselect` and `vmstorage` provides `-search.maxQueueDuration` command-line flag for limiting the maximum wait time for suspended queries.
while pausing additional incoming queries if the concurrency limit is reached.
`vmselect` and `vmstorage` provides `-search.maxQueueDuration` command-line flag for limiting the maximum wait time for paused queries.
See also `-search.maxMemoryPerQuery` command-line flag at `vmselect`.
- `-search.maxQueueDuration` at `vmselect` and `vmstorage` limits the maximum duration queries may wait for execution when `-search.maxConcurrentRequests`
concurrent queries are executed.
- `-search.ignoreExtraFiltersAtLabelsAPI` at `vmselect` enables ignoring of `match[]`, [`extra_filters[]` and `extra_label`](https://docs.victoriametrics.com/#prometheus-querying-api-enhancements)
query args at [/api/v1/labels](https://docs.victoriametrics.com/url-examples/#apiv1labels) and
[/api/v1/label/.../values](https://docs.victoriametrics.com/url-examples/#apiv1labelvalues).
This may be useful for reducing the load on `vmstorage` if the provided extra filters match too many time series.
The downside is that the endpoints can return labels and series, which do not match the provided extra filters.
- `-search.maxSamplesPerSeries` at `vmselect` limits the number of raw samples the query can process per each time series.
`vmselect` processes raw samples sequentially per each found time series during the query. It unpacks raw samples on the selected time range
per each time series into memory and then applies the given [rollup function](https://docs.victoriametrics.com/MetricsQL.html#rollup-functions).
@ -709,11 +714,6 @@ Some workloads may need fine-grained resource usage limits. In these cases the f
when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate).
In this case it might be useful to set the `-search.maxLabelsAPIDuration` to quite low value in order to limit CPU and memory usage.
See also `-search.maxLabelsAPISeries` and `-search.ignoreExtraFiltersAtLabelsAPI`.
- `-search.ignoreExtraFiltersAtLabelsAPI` at `vmselect` enables ignoring of `match[]`, [`extra_filters[]` and `extra_label`](https://docs.victoriametrics.com/#prometheus-querying-api-enhancements)
query args at [/api/v1/labels](https://docs.victoriametrics.com/url-examples/#apiv1labels) and
[/api/v1/label/.../values](https://docs.victoriametrics.com/url-examples/#apiv1labelvalues).
This may be useful for reducing the load on `vmstorage` if the provided extra filters match too many time series.
The downside is that the endpoints can return labels and series, which do not match the provided extra filters.
- `-storage.maxDailySeries` at `vmstorage` can be used for limiting the number of time series seen per day aka
[time series churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). See [cardinality limiter docs](#cardinality-limiter).
- `-storage.maxHourlySeries` at `vmstorage` can be used for limiting the number of [active time series](https://docs.victoriametrics.com/FAQ.html#what-is-an-active-time-series).

View file

@ -261,9 +261,9 @@ Cluster version of VictoriaMetrics:
curl http://<vmselect>:8481/select/0/prometheus/api/v1/labels
```
By default, VictoriaMetrics returns labels seen during the last day starting at 00:00 UTC. An arbitrary time range can be set via [`start` and `end` query args](https://docs.victoriametrics.com/#timestamp-formats).
The specified `start..end` time range is rounded to day granularity because of performance optimization concerns.
By default, VictoriaMetrics returns labels seen during the last day starting at 00:00 UTC because of performance reasons.
An arbitrary time range can be set via [`start` and `end` query args](https://docs.victoriametrics.com/#timestamp-formats).
The specified `start..end` time range is rounded to UTC day granularity because of performance reasons.
Additional information:
* [Getting label names](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names)
@ -280,16 +280,15 @@ Single-node VictoriaMetrics:
curl http://localhost:8428/prometheus/api/v1/label/job/values
```
Cluster version of VictoriaMetrics:
```sh
curl http://<vmselect>:8481/select/0/prometheus/api/v1/label/job/values
```
By default, VictoriaMetrics returns labels values seen during the last day starting at 00:00 UTC. An arbitrary time range can be set via `start` and `end` query args.
The specified `start..end` time range is rounded to day granularity because of performance optimization concerns.
By default, VictoriaMetrics returns labels values seen during the last day starting at 00:00 UTC because of performance reasons.
An arbitrary time range can be set via `start` and `end` query args.
The specified `start..end` time range is rounded to UTC day granularity because of performance reasons.
Additional information:
* [Querying label values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values)
@ -361,9 +360,9 @@ Cluster version of VictoriaMetrics:
curl http://<vmselect>:8481/select/0/prometheus/api/v1/series -d 'match[]=vm_http_request_errors_total'
```
By default, VictoriaMetrics returns time series seen during the last day starting at 00:00 UTC. An arbitrary time range can be set via `start` and `end` query args.
The specified `start..end` time range is rounded to day granularity because of performance optimization concerns.
By default, VictoriaMetrics returns time series seen during the last day starting at 00:00 UTC because of performance reasons.
An arbitrary time range can be set via `start` and `end` query args.
The specified `start..end` time range is rounded to UTC day granularity because of performance reasons.
Additional information:
* [Finding series by label matchers](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers)