Commit graph

60 commits

Author SHA1 Message Date
Aliaksandr Valialkin
744f8c3fe7
app/vmselect/promql: add outliers_iqr(q) and outlier_iqr_over_time(m[d]) functions
These functions allow detecting anomalies in series and samples using Interquartile range method.
See Outliers section at https://en.wikipedia.org/wiki/Interquartile_range for more details.
2023-10-31 22:14:14 +01:00
Nikolay
4a50e9400c
app/vmselect: reduce lock contention for heavy aggregation requests (#5119)
reduce lock contention for heavy aggregation requests
previously lock contetion may happen on machine with big number of CPU due to enabled string interning. sync.Map was a choke point for all aggregation requests.
Now instead of interning, new string is created. It may increase CPU and memory usage for some cases.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-10 13:44:02 +02:00
Aliaksandr Valialkin
18dd0d1dbf
.golangci.yml: properly enable revive linter and fix all the warnings it detects 2023-02-26 12:19:58 -08:00
Roman Khavronenko
66d0b45651
vmselect/promql: check for deadline in count_values fn (#3806)
* vmselect/promql: check for deadline in `count_values` fn

`count_values` could be very slow during the data processing.
Checking for deadline between iterations supposed to reduce
probability of exceeding `search.maxQueryDuration`.

The change also adds a new trace record, which captures the time
spent in aggregation function. Before that, the trace for aggr funcs
could be confusing since it doesn't account for all the places where
time was spent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 17:10:38 -08:00
Aliaksandr Valialkin
a15da5ff73
app/vmselect/promql: add share(q) aggregate function for normalizing results across multiple time series in [0..1] value range per each timestamp and aggregation group 2023-02-18 22:43:54 -08:00
Oleksandr Redko
0e1c395609
app,lib: fix typos in comments (#3804) 2023-02-13 09:32:35 -08:00
Aliaksandr Valialkin
c8bd3534cb
app/vmselect/promql: eliminate memory allocation when sorting values inside float64s 2023-01-09 23:06:57 -08:00
Aliaksandr Valialkin
895d5d9d22
app/vmselect/promql: consistently intern series names obtained from marshalMetricNameSorted
This reduces memory allocations when the returned series names are used as map keys later
2023-01-09 22:46:30 -08:00
Aliaksandr Valialkin
efb3c630fe
app/vmselect/promql: intern output series names during normal aggregation 2023-01-09 22:15:31 -08:00
Aliaksandr Valialkin
808ce815e4
app/vmselect/promql: allow passing inf arg into functions, which accept numeric limit on the number of output time series
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3461
2022-12-10 22:48:01 -08:00
Aliaksandr Valialkin
98175493d7
app/vmselect/promql: remove empty series before applying aggregate function
Previously empty series (e.g. series with all NaN samples) were passed to aggregate functions.
Such series must be ingored by all the aggregate functions.
So it is better from consistency PoV filtering out empty series before applying aggregate functions.
2022-09-30 08:44:06 +03:00
Aliaksandr Valialkin
fe2269b999
all: remove explicit "xxhash" name when importing github.com/cespare/xxhash/v2 package
This package already has the same name, so there is no need in explicit name
2022-06-21 20:24:28 +03:00
Aliaksandr Valialkin
0ef7a05fc0
app/vmselect/promql: rename removeNaNs() to more clear removeEmptySeries() 2022-04-20 19:53:24 +03:00
Aliaksandr Valialkin
addae7fc6a
app/vmselect/promql: preserve the order of time series passed to limit_offset() function
Previously time series passed to `limit_offset()` were shuffled according to hash for their labels.
This was unexpected behaviour for most users.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1920 and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/951
2021-12-12 18:07:15 +02:00
Aliaksandr Valialkin
1eb756fd11
app/vmselect/promql: arrange function names in the code in alphabetical order
This should simplify code maintenance in the future
2021-11-14 13:54:43 +02:00
Aliaksandr Valialkin
a102fca4ac
app/vmselect/promql: add limit_offset(limit, offset, q) function, which can be used for paging over big number of time series
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1778
2021-11-03 16:03:10 +02:00
Aliaksandr Valialkin
ed994873fd
app/vmselect/promql: randomize the static selection of time series returned from limitk()
Sort series by a hash calculated from the series labels. This should guarantee "random" selection of the returned time series.
Previously the selection could be biased, since time series were sorted alphabetically by label names and label values.
2021-10-16 21:16:34 +03:00
Aliaksandr Valialkin
f1e1d20ac4
app/vmselect/promql: consistently return the same set of time series from limitk() function
This is the expected behaviour by most users.
2021-10-08 19:55:29 +03:00
Aliaksandr Valialkin
ec6eb03d65
app/vmselect/promql: add topk_last and bottomk_last functions 2021-09-30 13:23:27 +03:00
Aliaksandr Valialkin
eff31c10ec
app/vmselect/promql: follow-up after 526dd93b32
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1625
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1612
2021-09-27 18:57:03 +03:00
Roman Khavronenko
6061464d80
app/vmselect: quantile func compatiblity with Prometheus (#1646)
* app/vmselect: `quantile` func compatiblity with Prometheus

The `quantile` func was previously calculated by https://github.com/valyala/histogram
package. The result of such calculation was always the closest real value to
requested quantile. While in Prometheus implementation interpolation is used.
Such difference may result into discrepancy in output between Prometheus and
VictoriaMetrics.

This commit adds a Prometheus-like `quantile` function. It also used by other
functions which depend on it, such as `quantiles`, `quantile_over_time`, `median` etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1625

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmselect: `quantile` review fixes

* quantile functions were split into multiple to provide
different API for already sorted data;
* float64sPool is used for reducing allocations. Items in pool may have
different sizes, but defining a new pool was complicates due to name collisions;

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-09-27 18:57:02 +03:00
Aliaksandr Valialkin
4ed7548ced app/vmselect/promql: optimize quantiles() calculation
Calculate quantiles in one go instead of calculating each quantile individually
2021-09-17 12:38:21 +03:00
Aliaksandr Valialkin
3f8de4f82f app/vmselect/promql: add mad(q) and outliers_mad(tolerance, q) functions to MetricsQL 2021-09-16 13:35:38 +03:00
Aliaksandr Valialkin
19516aedb5 app/vmselect/promql: use Prometheus-compatible label value formatting for count_values function 2021-09-13 13:48:18 +03:00
Aliaksandr Valialkin
39bb6bdd79 app/vmselect/promql: add quantile("phiLabel", phi1, ..., phiN, q) aggregate function to MetricsQL
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1573
2021-08-27 18:40:25 +03:00
Aliaksandr Valialkin
f90bf265f4 app/vmselect/promql: properly detect aggregate topk* and bottomk* aggregate functions in order to disable duplicate sorting
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1189
2021-04-08 00:10:33 +03:00
Aliaksandr Valialkin
b9469de410 app/vmselect/promql: add ability to set label value additionally to label name for the remaining sum of time series returned from topk_* and bottomk_* functions in the form: topk_min(N, m, "label=value") 2021-04-02 23:56:30 +03:00
Aliaksandr Valialkin
efebc3b6fb app/vmselect/promql: code cleanup after 43823addea 2020-11-06 01:31:33 +02:00
n4mine
3127aa92b5 app/vmselect/promql: fix when the parameter of maxValue(), minValue() leading by NaN. it will cause {top,bottom}k_{max,min} return inappropriate result (#883) 2020-11-06 01:31:31 +02:00
Aliaksandr Valialkin
c4464594b7 app/vmselect/promql: allow passing optional third argument to topk_* and bottomk_* functions in order to obtain sum of time series outside top/bottom K 2020-10-20 20:09:55 +03:00
Aliaksandr Valialkin
f877e703c8 app/vmselect/promql: fix mode_over_time calculations
Previously `mode_over_time` could return garbage due to improper shuffling of input data points.
2020-10-13 11:58:30 +03:00
Aliaksandr Valialkin
285665e93b app/vmselect/promql: allow passing multiple args to aggregate functions such as avg(q1, q2, q3) 2020-08-15 01:15:16 +03:00
Aliaksandr Valialkin
bdb881c43b app/vmselect/promql: add zscore-related functions: zscore_over_time(m[d]) and zscore(q) by (...) 2020-08-03 21:52:15 +03:00
Aliaksandr Valialkin
9dccedc599 app/vmselect/promql: return empty values from group() if all the time series have no values at the given timestamp
This aligns `group()` behaviour to Prometheus
2020-07-28 13:41:04 +03:00
Aliaksandr Valialkin
4d2011a87d app/vmselect/promql: add mode() aggregate function 2020-07-20 15:30:11 +03:00
Aliaksandr Valialkin
eb402a17bd app/vmselect/promql: optimize group(rollup(m)) calculations 2020-07-17 16:47:30 +03:00
Aliaksandr Valialkin
fc8fe38a82 app/vmselect/promql: add group() aggregate function to MetricsQL
This function has been added in Prometheus 2.20. See https://github.com/prometheus/prometheus/pull/7480
2020-07-17 15:17:38 +03:00
Aliaksandr Valialkin
c64914a7e4 app/vmselect/promql: keep all labels for time series from any() call 2020-07-17 15:17:37 +03:00
Aliaksandr Valialkin
7d46dd452a app/vmselect/promql: move common code from aggrFuncOutliersK and newAggrFuncRangeTopK into getRangeTopKTimeseries 2020-05-19 16:11:03 +03:00
Aliaksandr Valialkin
37068064dd app/vmselect/promql: fix outilersk calculations 2020-05-19 14:45:10 +03:00
Aliaksandr Valialkin
fc81ea38d4 app/vmselect/promql: add outliersk(N, m) aggregate function for anomaly detection across groups of similar time series 2020-05-19 13:52:44 +03:00
Aliaksandr Valialkin
408ade27a9 app/vmselect/promql: add any(x) by (y) aggregate function, which returns any time series from q for each group y 2020-05-12 19:50:29 +03:00
Aliaksandr Valialkin
21c2982ac8 app/vmselect/promql: support for sum(x) by (y) limit N syntax in order to limit the number of output time series after aggregation 2020-05-12 19:50:12 +03:00
Aliaksandr Valialkin
9ed4951ec8 lib/metricsql: move it to a separate repository - github.com/VictoriaMetrics/metrics 2020-04-28 15:30:06 +03:00
Aliaksandr Valialkin
453d71d082 Rename lib/promql to lib/metricsql and apply small fixes 2019-12-25 22:09:09 +02:00
Mike Poindexter
009d1559db Split Extended PromQL parsing to a separate library 2019-12-25 22:09:07 +02:00
Aliaksandr Valialkin
b238997a84 all: rename Extended PromQL to PromQL extensions 2019-12-12 19:29:59 +02:00
Aliaksandr Valialkin
d39bba3547 app/vmselect/promql: add {topk|bottomk}_{min|max|avg|median} aggregate functions for returning the exact k time series on the given time range
The full list of functions added:
- `topk_min(k, q)` - returns top K time series with the max minimums on the given time range
- `topk_max(k, q)` - returns top K time series with the max maximums on the given time range
- `topk_avg(k, q)` - returns top K time series with the max averages on the given time range
- `topk_median(k, q)` - returns top K time series with the max medians on the given time range
- `bottomk_min(k, q)` - returns bottom K time series with the min minimums on the given time range
- `bottomk_max(k, q)` - returns bottom K time series with the min maximums on the given time range
- `bottomk_avg(k, q)` - returns bottom K time series with the min averages on the given time range
- `bottomk_median(k, q)` - returns bottom K time series with the min medians on the given time range
2019-12-05 19:27:45 +02:00
Aliaksandr Valialkin
f46fb6c740 app/vmselect/promql: re-use metrics.Histogram when calculating histogram function for each point on the graph
This should reduce the amounts memory allocations
2019-11-25 14:24:30 +02:00
Aliaksandr Valialkin
8bb254d960 app/vmselect/promql: add histogram aggregate function, which is useful for building heatmaps from multiple time series 2019-11-24 00:04:15 +02:00