Commit graph

518 commits

Author SHA1 Message Date
Aliaksandr Valialkin
0189081490
app/vmselect/promql: typo fix, which could lead to panic during range query execution
The panic is:

  BUG: unexpected values after merging new values

This is a follow-up for 41a0fdaf39
2023-11-01 09:58:15 +01:00
Aliaksandr Valialkin
a70818f72f
app/vmselect/promql: properly calculate rollup result if lookbehind window isn't set
This is a follow-up for 41a0fdaf39
2023-10-31 22:22:37 +01:00
Aliaksandr Valialkin
ea81f6fc36
app/vmselect/promql: add outliers_iqr(q) and outlier_iqr_over_time(m[d]) functions
These functions allow detecting anomalies in series and samples using Interquartile range method.
See Outliers section at https://en.wikipedia.org/wiki/Interquartile_range for more details.
2023-10-31 22:10:31 +01:00
Aliaksandr Valialkin
41a0fdaf39
app/vmselect/promql: optimize repeated SLI-like instant queries with lookbehind windows >= 1d
Repeated instant queries with long lookbehind windows, which contain one of the following rollup functions,
are optimized via partial result caching:

- sum_over_time()
- count_over_time()
- avg_over_time()
- increase()
- rate()

The basic idea of optimization is to calculate

  rf(m[d] @ t)

as

  rf(m[offset] @ t) + rf(m[d] @ (t-offset)) - rf(m[offset] @ (t-d))

where rf(m[d] @ (t-offset)) is cached query result, which was calculated previously

The offset may be in the range of up to 1 hour.
2023-10-31 19:25:23 +01:00
Aliaksandr Valialkin
51aab7bb17
app/vmselect/promql: wrap too long line after a950873fff 2023-10-31 18:59:10 +01:00
Roman Khavronenko
a950873fff
app/vmselect: expose vm_memory_intensive_queries_total counter metric (#5208)
The new metric gets increased each time `-search.logQueryMemoryUsage` memory limit
is exceeded by a query. This metric should help to identify expensive and heavy queries
without inspecting the logs.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 13:31:09 +01:00
Aliaksandr Valialkin
da77f4deeb
app/vmselect/promql: add labels_equal(q, "label1", "label2", ...) function
This function returns q series, which have identical values for the listed labels
"label1", "label2", ...

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5148
2023-10-16 21:50:11 +02:00
Aliaksandr Valialkin
bdb743c88d
app/vmselect/promql: add drop_empty_series() function for dropping empty series before performing additional calculations
This can be useful in the following queries:

   drop_empty_series(temperature <= 30) default 40

This query drops temperature series with all the values bigger than 30 on the selected time range,
while replacing gaps in the remaining series with 40.

The query without drop_empty_series:

  (temperature <= 30) default 40

would leave all the temperature series with all the values bigger than 30 on the selected time range,
and replace all their values with 40. This is not what could be epxected in some cases
like here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5071
2023-10-16 20:44:56 +02:00
Aliaksandr Valialkin
97a7128e4b
app/vmselect/promql: do not use unsafe conversion from bytes slice to string when storing a value by map key
The assigned map key shouldn't change over time, otherwise the map won't work properly.

This is a follow-up for 1f91f22b5f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-16 01:56:57 +02:00
Nikolay
1f91f22b5f
app/vmselect: reduce lock contention for heavy aggregation requests (#5119)
reduce lock contention for heavy aggregation requests
previously lock contetion may happen on machine with big number of CPU due to enabled string interning. sync.Map was a choke point for all aggregation requests.
Now instead of interning, new string is created. It may increase CPU and memory usage for some cases.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-10 13:45:20 +02:00
Aliaksandr Valialkin
71668637ce
app/vmselect/promql: follow-up for 896c85a4a4
- Clarify the description of the change at docs/CHANGELOG.md
- Make sure that bitmap_*(X, NaN) returns NaN

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4996
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5021
2023-10-02 20:08:26 +02:00
Dmytro Kozlov
896c85a4a4
app/vmselect: fix bitmap_*() functions behavior (#5021)
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4996

Signed-off-by: dmitryk-dk d.kozlov@victoriametrics.com

Signed-off-by: dmitryk-dk d.kozlov@victoriametrics.com
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-29 12:03:01 +02:00
Aliaksandr Valialkin
e453069dcd
app/vmselect/promql: run make fmt after 3b9605dba5 2023-09-25 16:16:14 +02:00
Aliaksandr Valialkin
3b9605dba5
app/vmselect/promql: do not sort q1 or q2 results
This makes sure that `q2` series are returned after `q1` series in the same way as Prometheus does

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763
2023-09-25 16:14:16 +02:00
Aliaksandr Valialkin
a740159541
app/vmselect/promql: completely substitute median_over_time() WITH template with regular median_over_time() rollup function
This is a follow-up for 34d7a670d0

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034
2023-09-25 15:28:12 +02:00
Zakhar Bessarab
34d7a670d0
app/vmselect/promql: add implementation of median_over_time for rollup functions list (#5042)
`median_over_time` is handled by predefined WITH template in MetricsQL library which translates it to `quantile_over_time(0.5)`
This makes it impossble to use `median_over_time` as a usual rollup function for `aggr_over_time`.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-25 14:01:00 +02:00
Dmytro Kozlov
d5f9619984
vmagent: add validation of MetricsQL functions (#4991)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-15 13:15:23 +02:00
Aliaksandr Valialkin
edee262ecc
Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 10:16:42 +02:00
Aliaksandr Valialkin
1c0e065216
app/vmselect/promql: add support for _ delimiters in numeric values
For example, 1_234_567_890 is equivalent to 1234567890,
while 1.234_567_890 is equivalent to 1.234567890
2023-08-30 14:33:41 +02:00
Aliaksandr Valialkin
072d891ed9
app/vmselect: prevent from panic when lookbehind window inside rollup function is parsed into negative value
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4795
2023-08-12 04:47:53 -07:00
Aliaksandr Valialkin
ac0b7e0421
Revert "vmui: change the response for active queries (#4782)"
This reverts commit 252643d100.

Reason for revert: the commit incorrectly fixes the the issue.
The `remoteAddr` must be properly quoted inside lib/httpserver.GetQuotedRemoteAddr().
It isn't quoted properly if the request contains X-Forwarded-For header.

The proper fix will be included in the follow-up commit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4676
2023-08-11 05:06:40 -07:00
Yury Molodov
252643d100
vmui: change the response for active queries (#4782)
* fix: change the response to a valid json (#4676)

* vmui/docs: fix response of active queries

https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4676
2023-08-10 12:27:28 +02:00
Damon07
3f6efab6ae
{app/vmselect,docs}: support share_eq_over_time#4441 (#4725)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4441

Co-authored-by: wangm <wangmm@tuya.com>
2023-07-31 15:23:59 +02:00
Aliaksandr Valialkin
bd95341190
app/vmselect/promql: fix tests after 781947a7e2 2023-07-20 21:25:38 -07:00
Aliaksandr Valialkin
c5f94fa5fc
app/vmselect: rename promql.WriteActiveQueries() to promql.ActiveQueriesHandler()
This makes it more consistent with the rest of handlers inside app/vmselect/main.go

This is a follow-up for 6a96fd8ed5

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4598
2023-07-20 11:32:33 -07:00
Aliaksandr Valialkin
140e7b6b74
all: replace atomic.Value with atomic.Pointer[T]
This eliminates the need in .(*T) casting for results obtained from Load()

Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map,
because map is already a pointer type.
2023-07-19 17:42:06 -07:00
Aliaksandr Valialkin
8a91eb25c4
app/vmselect: follow-up after 6a96fd8ed5
- Add `Active queries` chapter to VMUI docs
- Set `Content-Type: json` header inside promql.WriteActiveQueries() handler,
  in order to be consistent with other request handlers called at app/vmselect/main.go
- Pass the request to promql.WriteActiveQueries() handler, so it can change its output
  depending on the provided request params. This also improves consistency of
  promql.WriteActiveQueries() args with other request hanlers at app/vmselect/main.go

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4653
2023-07-19 16:26:03 -07:00
Yury Molodov
6a96fd8ed5
vmui: add Active Queries page (#4653)
* feat: add page to display a list of active queries (#4598)

* app/vmagent: code formatting

* fix: remove console

---------

Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
2023-07-19 15:47:21 -07:00
Aliaksandr Valialkin
8815080030
app/vmselect/promql: add the ability to copy all the labels from one side of group_left()/group_right() operation
This is performed by specifying `*` inside group_left()/group_right().
Also allow specifying prefix for the copied labels via `group_left(...) prefix "..."` and `group_right(...) prefix "..."` syntax.
For example, the following query adds all the namespace-related labels to pod info, and prefixes all the copied label names with "ns_" prefix:

  kube_pod_info * on(namespace) group_left(*) prefix "ns_" kube_namespace_labels

This resolves the following StackOverflow questions:

- https://stackoverflow.com/questions/76661818/how-to-add-namespace-labels-to-pod-labels-in-prometheus
- https://stackoverflow.com/questions/76653997/how-can-i-make-a-new-copy-of-kube-namespace-labels-metric-with-a-different-name
2023-07-17 19:07:39 -07:00
Aliaksandr Valialkin
be31bdc88c
app/vmselect/promql: recommend to use (a op b) keep_metric_names instead of a op b keep_metric_names
The `a op b keep_metric_names` is ambigouos to `a op (b keep_metric_names)` when `b` is a transform or rollup function.
For example, `a + rate(b) keep_metric_names`. So it is better to use more clear syntax: `(a op b) keep_metric_names`

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3710
2023-07-16 23:46:34 -07:00
Zakhar Bessarab
e2367b6d1c
metricsql: add support of using keep_metric_names for binary operations (#4109)
* metricsql: add support of using keep_metric_names for binary operations

This should help to avoid confusion with queries like one in the issue #3710.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-07-16 03:00:39 -07:00
Aliaksandr Valialkin
4cb024d8a3
all: add support for or filters in series selectors
This commit adds ability to select series matching distinct filters via a single series selector.
For example, the following selector selects series with either {env="prod",job="a"}
or {env="dev",job="b"} labels:

  {env="prod",job="a" or env="dev",job="b"}

The `or` filter is supported in all the VictoriaMetrics tools now.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3997
Uses https://github.com/VictoriaMetrics/metricsql/pull/14
2023-07-16 00:06:33 -07:00
Haleygo
20e7db47ee
vmselect: fix result in Prometheus query when time is small (#4578)
vmselect: fix result in Prometheus query when time is small

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-07 11:48:05 +02:00
Aliaksandr Valialkin
30425ca81a
lib/fs: rename WriteFileAtomically to MustWriteAtomic
Callers of this function log the returned error and exit.
So let's just log the error with the given filepath and the call stack
inside the function itself and then exit. This simplifies the code
at callers' place while leaves the same level of debuggability in case of errors.
2023-04-13 22:41:15 -07:00
Aliaksandr Valialkin
02ee4ffd4d
app/vmselect/promql: follow-up for 79e1c6a6fc
- Document the fix at docs/CHANGELOG.md
- Add tests with multiple adjancent zero buckets
- Simplify the fix a bit

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/296
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4021
2023-03-27 18:03:36 -07:00
Ze'ev Klapow
79e1c6a6fc
fix le buckets when adjacent vmrange is empty (#4021)
There is a bug here where if you have a single bucket like:

foo{vmrange="4.084e+02...4.642e+02"} 2 123

The expected output is three le encoded buckets like:

foo{le="4.084e+02"} 0 123
foo{le="4.642e+02"} 2 123
foo{le="+Inf"} 2 123

This correctly encodes the start and end of the vmrange.
If however, the input contains the previous bucket, and that bucket is
empty then you only get the end le and +Inf out currently, i.e:

foo{vmrange="7.743e+05...8.799e+05"} 5 123
foo{vmrange="6.813e+05...7.743e+05"} 0 123

results in:

foo{le="8.799e+05"} 5 123
foo{le="+Inf"} 5 123

This causes issues when you go to compute a quantile because this means
that the assumed lower bound of the buckets is 0 and this we interpolate
between 0->end rather than the vmrange start->end as expected.
2023-03-27 17:54:19 -07:00
Aliaksandr Valialkin
622000797a
app/vmselect: follow-up for 10ab086366
- Expose stats.seriesFetched at `/api/v1/query_range` responses too
  for the sake of consistency.

- Initialize QueryStats when it is needed and pass it to EvalConfig then.
  This guarantees that the QueryStats is properly collected when the query
  contains some subqueries.
2023-03-27 15:22:00 -07:00
Roman Khavronenko
4021aa11b5
app/vmselect: export seriesFetched stat for /query responses (#3925)
The change adds a new field `seriesFetched` to EvalConfig object.
Since EvalConfig object can be copied inside `Exec`,
`seriesFetched` is a pointer which can be updated by all copied
objects.

The reason for having stats is that other components, like vmalert,
could benefit from this information.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 15:18:25 -07:00
Aliaksandr Valialkin
2b851e69d2
app/vmselect/promql: typo fix after e7f46a0aab 2023-03-24 23:46:30 -07:00
Aliaksandr Valialkin
e7f46a0aab
app/vmselect/promql: follow-up for 7205c79c5a
- Allocate and initialize seriesByWorkerID slice in a single go instead
  of initializing every item in the list separately.
  This should reduce CPU usage a bit.
- Properly set anti-false sharing padding at timeseriesWithPadding structure
- Document the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-24 23:34:37 -07:00
Zakhar Bessarab
7205c79c5a
app/vmselect/promql: use lock-less approach to gather results of parallel processing for evalRollup* funcs (#4004)
* vmselect/promql: refactor `evalRollupNoIncrementalAggregate` to use lock-less approach for parallel workers computation

Locking there is causing issues when running on highly multi-core system as it introduces lock contention during results merge.

New implementation uses lock less approach to store results per workerID and merges final result in the end, this is expected to significantly reduce lock contention and CPU usage for systems with high number of cores.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: add pooling for `timeseriesWithPadding` to reduce allocations

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: refactor `evalRollupFuncWithSubquery` to avoid using locks

Uses same approach as `evalRollupNoIncrementalAggregate` to remove locking between workers and reduce lock contention.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-24 23:07:12 -07:00
Aliaksandr Valialkin
e480b9881e
app/vmselect/promql: pass workerID to the callback inside doParallel()
This opens the possibility to remove tssLock from evalRollupFuncWithSubquery()
in the follow-up commit from @zekker6 in order to speed up the code
for systems with many CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-20 20:54:57 -07:00
Aliaksandr Valialkin
9e16329b2f
app/vmselect/promql: fix TestIncrementalAggr test on systems less than 3 CPU cores
This is a follow-up for 4856a4cf5a
2023-03-20 20:37:18 -07:00
Aliaksandr Valialkin
4856a4cf5a
app/vmselect: optimize incremental aggregates a bit
Substitute sync.Map with an ordinary slice indexed by workerID.
This should reduce the overhead when updating the incremental aggregate state
2023-03-20 15:37:06 -07:00
oliverpool
fbefc940ef
app/vmselect/promql: add test to ensure 8-byte alignment (#3948)
See 0af9e2b693
2023-03-16 09:01:42 -07:00
Aliaksandr Valialkin
e8225d7d6b
app/vmselect/promql: prevent from cannot unmarshal timeseries from rollupResultCache panic after the upgrade to v1.89.0
The issue has been introduced in 0af9e2b693
2023-03-12 19:09:39 -07:00
Aliaksandr Valialkin
0af9e2b693
app/vmselect/promql: prevent from SIGBUS crash on architecures, which deny unaligned access to 8-byte words (e.g. ARM)
Thanks to @oliverpool for nailing down the root cause of the issue and for the initial attempt to fix it
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3927
2023-03-12 16:32:08 -07:00
Aliaksandr Valialkin
b3bb18d674
app/vmselect/promql: fix panic when calculating aggr_func(rollup*())
The panic has been introduced in dac21d874b
2023-02-27 11:48:27 -08:00
Aliaksandr Valialkin
f7ef80aaad
.golangci.yml: properly enable revive linter and fix all the warnings it detects 2023-02-26 12:18:59 -08:00
Roman Khavronenko
e1c3267e34
vmselect/promql: check for deadline in count_values fn (#3806)
* vmselect/promql: check for deadline in `count_values` fn

`count_values` could be very slow during the data processing.
Checking for deadline between iterations supposed to reduce
probability of exceeding `search.maxQueryDuration`.

The change also adds a new trace record, which captures the time
spent in aggregation function. Before that, the trace for aggr funcs
could be confusing since it doesn't account for all the places where
time was spent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 16:59:26 -08:00