Commit graph

146 commits

Author SHA1 Message Date
Andrei Baidarov
3120dc2054
app/vmselect: fixes possible panics for multitenant queries
This commit fixes panic for multitenant requests and empty storage node responses for tenants api.

 It also optimizes `populateSqTenantTokensIfNeeded` function calls, by making it only once for query request. Previously it was incorrectly called multiple times per each storage node request.

Related issue: 
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7549
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-11-15 16:12:30 +01:00
Roman Khavronenko
4114301955
lib/flagutil: rename Duration to RetentionDuration (#7284)
The purpose of this change is to reduce confusion between using
`flag.Duration` and `flagutils.Duration`. The reason is that
`flagutils.Duration` was mistakenly used for cases that required `m`
support. See
ab0d31a7b0

The change in name should clearly indicate the purpose of this data
type.

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-17 11:18:45 -03:00
Roman Khavronenko
f825a9de80
app/vmselect/promql: fix seriesFetched update logic (#7181)
### Describe Your Changes

evalInstantRollup could have overreport the number of fetched series if
`offset` checks will result into retry. This change updates fetched
series only if these checks were successful.

It also adds a comment to another potential place of over-reporting
series fetched. It doesn't fix it, because it would require spending
extra resources on such a check, while discrepancy in seriesFetched
doesn't affect calculations in any way.

Probably related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7170

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit ebd393d8b3)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-07 14:47:22 +02:00
Zakhar Bessarab
44b071296d
vmselect: add support of multi-tenant queries (#6346)
### Describe Your Changes

Added an ability to query data across multiple tenants. See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1434

Currently, the following endpoints work with multi-tenancy:
- /prometheus/api/v1/query
- /prometheus/api/v1/query_range
- /prometheus/api/v1/series
- /prometheus/api/v1/labels
- /prometheus/api/v1/label/<label_name>/values
- /prometheus/api/v1/status/active_queries
- /prometheus/api/v1/status/top_queries
- /prometheus/api/v1/status/tsdb
- /prometheus/api/v1/export
- /prometheus/api/v1/export/csv
- /vmui


A note regarding VMUI: endpoints such as `active_queries` and
`top_queries` have been updated to indicate whether query was a
single-tenant or multi-tenant, but UI needs to be updated to display
this info.
cc: @Loori-R 

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-10-01 16:37:18 +02:00
Aliaksandr Valialkin
01c8e12370
app/vlselect: add /select/logsql/stats_query endpoint, which is going to be used by vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6942
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706
2024-09-06 23:00:58 +02:00
Aliaksandr Valialkin
d6415b2572
all: consistently use 'any' instead of 'interface{}'
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
2024-07-10 00:23:26 +02:00
Aliaksandr Valialkin
6b81441ed0
app/vmselect: use strings.EqualFold instead of strings.ToLower where appropriate
Strings.EqualFold doesn't allocate memory contrary to strings.ToLower if the input string contains uppercase chars
2024-05-12 10:21:24 +02:00
Aliaksandr Valialkin
536d87cd51
app/vmselect/promql: properly estimate the needed amounts of memory for executing aggregate function over rollup function in incremental mode
Incremental aggregation processes only GOMAXPROCS time series at a time, so its' memory usage doesn't depend
on the number of input time series.

The issue has been introduced in 5138eaeea0

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3203
2024-05-12 10:14:27 +02:00
Aliaksandr Valialkin
b3dbbc22b9
app/vmselect/promql: properly handle args in count_values_over_time() function
Prevsiously they were swapped - the first arg should be the label name and the second arg should be label filters

This is a follow-up for e389b7b959e8144fdff5075bf7a5a39b2b0c6dd3

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5847
2024-02-25 01:48:37 +02:00
Aliaksandr Valialkin
63d635a5e4
app: consistently use atomic.* types instead of atomic.* functions
See ea9e2b19a5
2024-02-24 03:06:14 +02:00
Aliaksandr Valialkin
65fb54ab8f
app/vmselect/promql: move needSilenceIntervalForRollupFunc from eval.go to rollup.go
This should improve maintainability of the code related to rollup functions,
since it is located in rollup.go

While at it, properly return empty results from holt_winters(), rate_over_sum(),
sum2_over_time(), geomean_over_time() and distinct_over_time() when there are no real samples
on the selected lookbehind window. Previously the previous sample value was mistakenly
returned from these functions.
2024-02-23 01:05:11 +02:00
Aliaksandr Valialkin
202d8e2c40
docs: update -help output after 61d9df4c36
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/834
2024-02-08 14:50:56 +02:00
Roman Khavronenko
02e609b141
app/vmselect: set proper timestamp for cached instant responses (#5723)
* app/vmselect: set proper timestamp for cached instant responses

The change updates `getSumInstantValues` to prefer timestamp
from the most recent results. Before, timestamp from cached series
was used.

The old behavior had negative impact on recording rules as they
were getting responses with shifted timestamps in past.
Subsequent recording or alerting rules fetching results of these
recording rules could get no result due to staleness interval.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5659
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-30 22:20:16 +02:00
Aliaksandr Valialkin
d4a1a28543
app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery()
This is a follow-up for cf03e11d89

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630
2024-01-22 01:39:27 +02:00
Anton Tykhyy
51af1dfff7
Fix sum(aggr_over_time) 'got 1 args' error (#3028) (#5414)
app/vmselect/promql/eval.go:evalAggrFunc shunts evaluation
of AggrFuncExpr over rollupFunc over MetricsExpr to an optimized
path. tryGetArgRollupFuncWithMetricExpr() checks whether expression
can be shunted, but it mangles the AggrFuncExpr when the aggregation
function has more than one argument. This results in queries like
`sum(aggr_over_time("avg_over_time",m))` failing with error message
'expecting at least 2 args to "aggr_over_time"; got 1 args' while
the analogous query `sum(avg_over_time(m))` executes successfully.
This fix removes the unnecessary mangling.

Signed-off-by: Anton Tykhyy <atykhyy@gmail.com>
2023-12-14 12:49:01 +02:00
Aliaksandr Valialkin
509339bf63
app/vmselect: properly adjust the lower bound for the time range where raw samples must be selected for default_rollup() function
Previously the lower bound could be too small, which could result in missing values at the beginning of the graph
for default_rollup() function. This function is automatically applied to all the series selectors if they aren't
explicitly wrapped into a rollup function - see https://docs.victoriametrics.com/MetricsQL.html#implicit-query-conversions

While at it, properly take into account `-search.minStalenessInterval` command-line flag when adjusting
the lower bound for the selected time range.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5388
2023-12-06 14:46:18 +02:00
Aliaksandr Valialkin
633ec37022
app/vmselect/promql: typo fix after 7ca8ebef20
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332
2023-11-16 17:01:19 +01:00
Aliaksandr Valialkin
7ca8ebef20
app/vmselect/promql: properly handle duplicate series when merging cached results with the results obtained from the database
evalRollupFuncNoCache() may return time series with identical labels (aka duplicate series)
when performing queries satisfying all the following conditions:

- It must select time series with multiple metric names. For example, {__name__=~"foo|bar"}
- The series selector must be wrapped into rollup function, which drops metric names. For example, rate({__name__=~"foo|bar"})
- The rollup function must be wrapped into aggregate function, which has no streaming optimization.
  For example, quantile(0.9, rate({__name__=~"foo|bar"})

In this case VictoriaMetrics shouldn't return `cannot merge series: duplicate series found` error.
Instead, it should fall back to query execution with disabled cache.

Also properly store the merged results. Previously they were incorrectly stored because of a typo
introduced in the commit 41a0fdaf39

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5337
2023-11-16 16:16:17 +01:00
Aliaksandr Valialkin
2f885d8e57
app/vmselect/promql: typo fixes after 7cf7740d18 2023-11-14 03:34:25 +01:00
Aliaksandr Valialkin
9ff1ee333f
app/vmselect/promql: properly handle instant query optimization conrner cases for min_over_time() and max_over_time()
- If min_over_time(m[offset] @ timestamp) <= min_over_time(m[offset] @ (timestamp-window)),
  then the optimization can be applied.

- If max_over_time(m[offset] @ timestamp) >= max_over_time(m[offset] @ (timestamp-window)),
  then the optimization can be applied.
2023-11-14 02:58:18 +01:00
Aliaksandr Valialkin
d9ecc3f6d7
lib/logger: add -loggerMaxArgLen command-line flag for fine-tuning the maximum length of logged args 2023-11-13 09:43:49 +01:00
Aliaksandr Valialkin
c916294b61
app/vmselect/promql: optimize instant queries with min_over_time() and max_over_time() rollup functions
This is a follow-up for 41a0fdaf39
2023-11-13 09:43:18 +01:00
Aliaksandr Valialkin
bf01a97f17
docs/CHANGELOG.md: update the description of the optimization for SLO/SLI-like queries according to latest changes
See commits 4497a08e3d and 92826b0b4a
2023-11-02 20:09:22 +01:00
Aliaksandr Valialkin
ece7024f11
app/vmselect/promql: reduce the minimum lookbehind window for enabling SLO/SLI optimizations from 24 hours to 6 hours
This reduction is based on production testing.

Also expose -search.minWindowForInstantRollupOptimization command-line flag, so users could fine-tune this arg for their needs
2023-11-01 20:19:19 +01:00
Aliaksandr Valialkin
6a98f9df54
app/vmui: show query execution duration in the header of query input field
This should simplify the process of query optimization
2023-11-01 16:46:42 +01:00
Aliaksandr Valialkin
c5e3b11762
app/vmselect/promql: apply SLO-like optimization to all the count_*_over_time() functions
This is a follow-up for 41a0fdaf39
2023-11-01 09:58:50 +01:00
Aliaksandr Valialkin
b96d55e1e4
app/vmselect/promql: typo fix, which could lead to panic during range query execution
The panic is:

  BUG: unexpected values after merging new values

This is a follow-up for 41a0fdaf39
2023-11-01 09:58:50 +01:00
Aliaksandr Valialkin
7b7ad44e84
app/vmselect/promql: properly calculate rollup result if lookbehind window isn't set
This is a follow-up for 41a0fdaf39
2023-10-31 22:23:04 +01:00
Aliaksandr Valialkin
9661918bb4
app/vmselect/promql: optimize repeated SLI-like instant queries with lookbehind windows >= 1d
Repeated instant queries with long lookbehind windows, which contain one of the following rollup functions,
are optimized via partial result caching:

- sum_over_time()
- count_over_time()
- avg_over_time()
- increase()
- rate()

The basic idea of optimization is to calculate

  rf(m[d] @ t)

as

  rf(m[offset] @ t) + rf(m[d] @ (t-offset)) - rf(m[offset] @ (t-d))

where rf(m[d] @ (t-offset)) is cached query result, which was calculated previously

The offset may be in the range of up to 1 hour.
2023-10-31 20:08:38 +01:00
Aliaksandr Valialkin
9ba007a636
app/vmselect/promql: wrap too long line after a950873fff 2023-10-31 19:11:05 +01:00
Roman Khavronenko
9d8f93050c
app/vmselect: expose vm_memory_intensive_queries_total counter metric (#5208)
The new metric gets increased each time `-search.logQueryMemoryUsage` memory limit
is exceeded by a query. This metric should help to identify expensive and heavy queries
without inspecting the logs.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 19:02:22 +01:00
Nikolay
4a50e9400c
app/vmselect: reduce lock contention for heavy aggregation requests (#5119)
reduce lock contention for heavy aggregation requests
previously lock contetion may happen on machine with big number of CPU due to enabled string interning. sync.Map was a choke point for all aggregation requests.
Now instead of interning, new string is created. It may increase CPU and memory usage for some cases.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-10 13:44:02 +02:00
Aliaksandr Valialkin
5c80b11c15
app/vmselect: prevent from panic when lookbehind window inside rollup function is parsed into negative value
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4795
2023-08-12 04:49:56 -07:00
Aliaksandr Valialkin
a7fdc3fcc7
all: add support for or filters in series selectors
This commit adds ability to select series matching distinct filters via a single series selector.
For example, the following selector selects series with either {env="prod",job="a"}
or {env="dev",job="b"} labels:

  {env="prod",job="a" or env="dev",job="b"}

The `or` filter is supported in all the VictoriaMetrics tools now.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3997
Uses https://github.com/VictoriaMetrics/metricsql/pull/14
2023-07-15 23:56:18 -07:00
Haleygo
ef8e3eb9b3
vmselect: fix result in Prometheus query when time is small (#4578)
vmselect: fix result in Prometheus query when time is small

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-09 12:33:29 -07:00
Aliaksandr Valialkin
9387793f47
app/vmselect: follow-up for 10ab086366
- Expose stats.seriesFetched at `/api/v1/query_range` responses too
  for the sake of consistency.

- Initialize QueryStats when it is needed and pass it to EvalConfig then.
  This guarantees that the QueryStats is properly collected when the query
  contains some subqueries.
2023-03-27 15:11:42 -07:00
Roman Khavronenko
10ab086366
app/vmselect: export seriesFetched stat for /query responses (#3925)
The change adds a new field `seriesFetched` to EvalConfig object.
Since EvalConfig object can be copied inside `Exec`,
`seriesFetched` is a pointer which can be updated by all copied
objects.

The reason for having stats is that other components, like vmalert,
could benefit from this information.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 08:51:33 -07:00
Aliaksandr Valialkin
740fa57fdc
app/vmselect/promql: typo fix after e7f46a0aab 2023-03-24 23:47:11 -07:00
Aliaksandr Valialkin
7aff6f872f
app/vmselect/promql: follow-up for 7205c79c5a
- Allocate and initialize seriesByWorkerID slice in a single go instead
  of initializing every item in the list separately.
  This should reduce CPU usage a bit.
- Properly set anti-false sharing padding at timeseriesWithPadding structure
- Document the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-24 23:39:43 -07:00
Zakhar Bessarab
fec87e3ada
app/vmselect/promql: use lock-less approach to gather results of parallel processing for evalRollup* funcs (#4004)
* vmselect/promql: refactor `evalRollupNoIncrementalAggregate` to use lock-less approach for parallel workers computation

Locking there is causing issues when running on highly multi-core system as it introduces lock contention during results merge.

New implementation uses lock less approach to store results per workerID and merges final result in the end, this is expected to significantly reduce lock contention and CPU usage for systems with high number of cores.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: add pooling for `timeseriesWithPadding` to reduce allocations

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: refactor `evalRollupFuncWithSubquery` to avoid using locks

Uses same approach as `evalRollupNoIncrementalAggregate` to remove locking between workers and reduce lock contention.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-24 23:39:41 -07:00
Aliaksandr Valialkin
79d8f0e7c6
app/vmselect/promql: pass workerID to the callback inside doParallel()
This opens the possibility to remove tssLock from evalRollupFuncWithSubquery()
in the follow-up commit from @zekker6 in order to speed up the code
for systems with many CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-20 20:57:34 -07:00
Aliaksandr Valialkin
a6a4beb89a
app/vmselect: remove data race on updating EvalConfig.IsPartialResponse from concurrently running goroutines
This properly returns `is_partial: true` for partial responses.
2023-03-12 16:53:03 -07:00
Aliaksandr Valialkin
8efa9159cf
app/vmselect/promql: measure the time required for calculating the aggregate function from the prepared source time series 2023-02-23 20:06:02 -08:00
Aliaksandr Valialkin
6369c88a68
app/vmselect: add -search.logQueryMemoryUsage command-line flag for logging queries, which take big amounts of memory
Thanks to @michal-kralik for initial attempts for this feature:

- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3651
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3715

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3553
2023-02-23 18:52:44 -08:00
Oleksandr Redko
0e1c395609
app,lib: fix typos in comments (#3804) 2023-02-13 09:32:35 -08:00
Aliaksandr Valialkin
fe8802bbc8
app/vmselect/promql: reduce the number of memory allocations inside getCommonLabelFilters()
This should improve performance a bit for `q1 op q2` queries
2023-01-15 12:56:21 -08:00
Aliaksandr Valialkin
d33a65e401
app/vmselect/promql: reduce memory allocations at getCommonLabelFilters() function
Intern tag keys and values there
2023-01-12 01:27:34 -08:00
Aliaksandr Valialkin
8a35377cf3
app/vmselect/promql: move the eval function args in parallel query trace outside the loop 2023-01-10 22:23:43 -08:00
Aliaksandr Valialkin
b0fefe562a
app/vmselect/promql: optimize e1 op e2 when e1 returns an empty result
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3349
2022-11-21 16:09:49 +02:00
Aliaksandr Valialkin
a0f3247b14
app/vmselect/promql: properly handle zero and negative values for -search.maxMemoryPerQuery
This is a follow-up for 04a05f161c

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3203
2022-10-12 09:33:16 +03:00