Commit graph

1012 commits

Author SHA1 Message Date
Aliaksandr Valialkin
b49b8fed3c
app/vmselect/promql: simplify the code after 388d020b7c
Add a test, which verifies the correct sorting of float64 slices with NaNs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5506
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5509
2024-01-16 15:06:33 +02:00
Aliaksandr Valialkin
388d020b7c
app/vmselect/promql: follow-up for ce4f26db02
- Document the bugfix at docs/CHANGELOG.md
- Filter out NaN values before sorting as suggested at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5509#discussion_r1447369218
- Revert unrelated changes in lib/filestream and lib/fs
- Use simpler test at app/vmselect/promql/exec_test.go

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5506
2024-01-16 02:55:08 +02:00
Zongyang
ce4f26db02
FIX bottomk doesn't return any data when there are no time range overlap between timeseries (#5509)
* FIX sort order in bottomk

* Add lessWithNaNsReversed for bottomk

* Add ut for TopK

* Move lt from loop

* FIX lint

* FIX lint

* FIX lint

* Mod log format

---------

Co-authored-by: xiaozongyang <xiaozngyang@kanyun.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-16 02:19:36 +02:00
Aliaksandr Valialkin
190a6565ae
app/vmselect/promql: consistently sort results of a or b query
Previously the order of results returned from `a or b` query could change with each request
because the sorting for such query has been disabled in order to satisfy
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763 .

This commit executes `a or b` query as `sortByMetricName(a) or sortByMetricName(b)`.
This makes the order of returned time series consistent across requests,
while maintaining the requirement from https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763 ,
e.g. `b` results are consistently put after `a` results.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5393
2024-01-16 01:30:10 +02:00
rbizos
70cd09e736
Handling negative index in Graphite groupByNode/aliasByNode (#5581)
Handeling the error case with -1

Signed-off-by: Raphael Bizos <r.bizos@criteo.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-01-15 09:57:15 +01:00
Aliaksandr Valialkin
f2229c2e42
lib/prompb: change type of Label.Name and Label.Value from []byte to string
This makes it more consistent with lib/prompbmarshal.Label
2024-01-14 22:33:21 +02:00
Roman Khavronenko
8c1dcf4743
app/vmselect: drop rollupDefault function as duplicate (#5502)
* app/vmselect: drop `rollupDefault` function as duplicate

It is unclear why there are two identical fns `rollupDefault`
and `rollupDistinct`. Dropping one of them.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update app/vmselect/promql/rollup.go

* Update app/vmselect/promql/rollup.go

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-12-21 11:22:19 +02:00
Anton Tykhyy
66c76a4d4d
Fix sum(aggr_over_time) 'got 1 args' error (#3028) (#5414)
app/vmselect/promql/eval.go:evalAggrFunc shunts evaluation
of AggrFuncExpr over rollupFunc over MetricsExpr to an optimized
path. tryGetArgRollupFuncWithMetricExpr() checks whether expression
can be shunted, but it mangles the AggrFuncExpr when the aggregation
function has more than one argument. This results in queries like
`sum(aggr_over_time("avg_over_time",m))` failing with error message
'expecting at least 2 args to "aggr_over_time"; got 1 args' while
the analogous query `sum(avg_over_time(m))` executes successfully.
This fix removes the unnecessary mangling.

Signed-off-by: Anton Tykhyy <atykhyy@gmail.com>
2023-12-14 12:38:54 +02:00
Yury Molodov
1a5cdb4790
vmui: autocomplete usability improvements (#5422)
* vmui: add show quick tip for autocomplete

* vmui: auto-completion usability improvements #5348

* vmui: add const for min symbols in autocomplete

* Use proper queries to VictoriaMetrics

* vmui: fix comments for autocomplete

* app/vmselect: run `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-12-13 00:32:41 +02:00
Aliaksandr Valialkin
3d3b0e31e0
app/vmselect: add -search.maxResponseSeries command-line flag for limiting the number of time series a single response can return
This limit can be used for preventing from high memory usage at Grafana when the response returns too many series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5372
2023-12-10 00:54:42 +02:00
Alexander Marshalov
02a0a7f428
added field version to the response for /api/v1/status/buildinfo API for using more efficient API in Grafana for receiving label values, added additional info about setup Grafana datasource (#5370) (#5437) 2023-12-07 16:37:36 +02:00
Aliaksandr Valialkin
802adf3b65
app/vmselect/prometheus: go fmt after b39e9257eb 2023-12-07 16:01:18 +02:00
Aliaksandr Valialkin
b39e9257eb
app/vmselect/prometheus: properly encode Prometheus label values at /federate endpoint
Prometheus spec says that only \, \n and " must be escaped inside label values.
See 995743836e/content/docs/instrumenting/exposition_formats.md (L90)

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5431
2023-12-07 15:36:01 +02:00
Aliaksandr Valialkin
e4f5039509
app/vmselect: properly adjust the lower bound for the time range where raw samples must be selected for default_rollup() function
Previously the lower bound could be too small, which could result in missing values at the beginning of the graph
for default_rollup() function. This function is automatically applied to all the series selectors if they aren't
explicitly wrapped into a rollup function - see https://docs.victoriametrics.com/MetricsQL.html#implicit-query-conversions

While at it, properly take into account `-search.minStalenessInterval` command-line flag when adjusting
the lower bound for the selected time range.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5388
2023-12-06 14:20:14 +02:00
Aliaksandr Valialkin
f62e03b3d2
app/vmselect: do not limit concurrency for static and fast queries
Previously concurrency for static and fast queries was limited with the -search.maxConcurrentRequests
command-line flag. This could complicate identifying heavy queries via `vmui` at `Top queries` and `Active queries` pages,
since `vmui` and these pages couldn't be opened on overloaded vmselect.

Thanks to @f41gh7 for the idea.
2023-12-01 17:25:01 +02:00
luckyxiaoqiang
d7897e0d70
app/vmselect/promql: add day_of_year() function (#5368)
Co-authored-by: dingxiaoqiang <dingxiaoqiang@bytedance.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-11-28 11:54:00 +01:00
Aliaksandr Valialkin
faee0e43d1
app/vmselect/promql: reduce the number of memory allocations inside copyTimeseriesShallow()
Previously the number of memory allocations inside copyTimeseriesShallow() was equal to 1+len(tss)
Reduce this number to 2 by pre-allocating a slice of timeseries structs with len(tss) length.
2023-11-17 15:40:03 +01:00
Aliaksandr Valialkin
aae06e003e
app/vmselect: simplify code a bit after 63e0f16062
Use only a single call to prometheus.WriteErrorResponse() inside sendPrometheusError
2023-11-16 18:14:33 +01:00
Aliaksandr Valialkin
2cbdb1db22
app/vmselect/promql: properly handle duplicate series when merging cached results with the results obtained from the database
evalRollupFuncNoCache() may return time series with identical labels (aka duplicate series)
when performing queries satisfying all the following conditions:

- It must select time series with multiple metric names. For example, {__name__=~"foo|bar"}
- The series selector must be wrapped into rollup function, which drops metric names. For example, rate({__name__=~"foo|bar"})
- The rollup function must be wrapped into aggregate function, which has no streaming optimization.
  For example, quantile(0.9, rate({__name__=~"foo|bar"})

In this case VictoriaMetrics shouldn't return `cannot merge series: duplicate series found` error.
Instead, it should fall back to query execution with disabled cache.

Also properly store the merged results. Previously they were incorrectly stored because of a typo
introduced in the commit 41a0fdaf39

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5337
2023-11-16 16:01:40 +01:00
Aliaksandr Valialkin
fbc6289a21
app/vmselect/promql: typo fixes after 7cf7740d18 2023-11-14 03:34:37 +01:00
Aliaksandr Valialkin
7cf7740d18
app/vmselect/promql: properly handle instant query optimization conrner cases for min_over_time() and max_over_time()
- If min_over_time(m[offset] @ timestamp) <= min_over_time(m[offset] @ (timestamp-window)),
  then the optimization can be applied.

- If max_over_time(m[offset] @ timestamp) >= max_over_time(m[offset] @ (timestamp-window)),
  then the optimization can be applied.
2023-11-14 02:58:08 +01:00
Yury Molodov
0116a333fb
vmui: reduced the number of server requests (#5253)
* vmui: reduced the number of server requests

* run `make vmui-update vmui-logs-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 01:50:00 +01:00
Noah Labrecque
12aefa8a4b
fix: apply correct bounds to sf and tf (#5274) 2023-11-14 01:17:16 +01:00
Aliaksandr Valialkin
8af56ea2ed
lib/htmlcomponents: use relative links for the top page and for favicon.ico
This allows hiding VictoriaMetrics components behind proxies with arbitrary path prefixes.
For example, vmagent HTTP handlers can be served via /vmagent/ path prefix:

- http://proxy/vmagent/targets
- http://proxy/vmagent/service-discovery

The path prefix can be arbitrary. For example, below are vmagent urls
for /tenantID/vmagent/ path prefix:

- http://proxy/tenantID/vmagent/targets
- http://proxy/tenantID/vmagent/service-discovery

While at it, consistently serve favicon.ico from any path directory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5306
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5307
2023-11-13 20:29:05 +01:00
Aliaksandr Valialkin
cf23dc6480
all: cleanup: remove // +build ... lines, since they are no longer needed after Go1.17, and the minimum supported Go version for VictoriaMetrics source code is Go1.20 2023-11-13 19:12:51 +01:00
Aliaksandr Valialkin
230230cf0b
lib/logger: add -loggerMaxArgLen command-line flag for fine-tuning the maximum length of logged args 2023-11-11 12:30:08 +01:00
Aliaksandr Valialkin
80213f07fa
app/vmselect/promql: optimize instant queries with min_over_time() and max_over_time() rollup functions
This is a follow-up for 41a0fdaf39
2023-11-11 12:10:03 +01:00
Aliaksandr Valialkin
65db6609eb
docs/CHANGELOG.md: update the description of the optimization for SLO/SLI-like queries according to latest changes
See commits 4497a08e3d and 92826b0b4a
2023-11-02 20:05:05 +01:00
Aliaksandr Valialkin
eeb25bc4a5
app/vmselect/promql: add missing trace message in rollupResultCache.GetSeries() 2023-11-02 09:22:48 +01:00
Aliaksandr Valialkin
4497a08e3d
app/vmselect/promql: reduce the minimum lookbehind window for enabling SLO/SLI optimizations from 24 hours to 6 hours
This reduction is based on production testing.

Also expose -search.minWindowForInstantRollupOptimization command-line flag, so users could fine-tune this arg for their needs
2023-11-01 20:18:44 +01:00
Aliaksandr Valialkin
f59dda3223
app/vmselect: run make quicktemplate-gen after b8739bc00b 2023-11-01 17:52:56 +01:00
Aliaksandr Valialkin
b8739bc00b
app/vmselect: return stats.seriesFetched as string instead of number
vmalert expects string value for stats.seriesFetched, so it is impossible
switching to number without breaking compatibility with old vmalert releases :(

It is still unclear why stats.seriesFetched has string type in the first place...
2023-11-01 17:49:59 +01:00
Aliaksandr Valialkin
da887b49e7
app/vmui: show query execution duration in the header of query input field
This should simplify the process of query optimization
2023-11-01 16:43:51 +01:00
Aliaksandr Valialkin
92826b0b4a
app/vmselect/promql: apply SLO-like optimization to all the count_*_over_time() functions
This is a follow-up for 41a0fdaf39
2023-11-01 09:58:16 +01:00
Aliaksandr Valialkin
0189081490
app/vmselect/promql: typo fix, which could lead to panic during range query execution
The panic is:

  BUG: unexpected values after merging new values

This is a follow-up for 41a0fdaf39
2023-11-01 09:58:15 +01:00
Aliaksandr Valialkin
c4c6ee9485
app/vmui: fix non-working Disable cache checkbox at JSON and Table views 2023-10-31 22:58:06 +01:00
Aliaksandr Valialkin
a70818f72f
app/vmselect/promql: properly calculate rollup result if lookbehind window isn't set
This is a follow-up for 41a0fdaf39
2023-10-31 22:22:37 +01:00
Aliaksandr Valialkin
ea81f6fc36
app/vmselect/promql: add outliers_iqr(q) and outlier_iqr_over_time(m[d]) functions
These functions allow detecting anomalies in series and samples using Interquartile range method.
See Outliers section at https://en.wikipedia.org/wiki/Interquartile_range for more details.
2023-10-31 22:10:31 +01:00
Aliaksandr Valialkin
41a0fdaf39
app/vmselect/promql: optimize repeated SLI-like instant queries with lookbehind windows >= 1d
Repeated instant queries with long lookbehind windows, which contain one of the following rollup functions,
are optimized via partial result caching:

- sum_over_time()
- count_over_time()
- avg_over_time()
- increase()
- rate()

The basic idea of optimization is to calculate

  rf(m[d] @ t)

as

  rf(m[offset] @ t) + rf(m[d] @ (t-offset)) - rf(m[offset] @ (t-d))

where rf(m[d] @ (t-offset)) is cached query result, which was calculated previously

The offset may be in the range of up to 1 hour.
2023-10-31 19:25:23 +01:00
Aliaksandr Valialkin
51aab7bb17
app/vmselect/promql: wrap too long line after a950873fff 2023-10-31 18:59:10 +01:00
Roman Khavronenko
a950873fff
app/vmselect: expose vm_memory_intensive_queries_total counter metric (#5208)
The new metric gets increased each time `-search.logQueryMemoryUsage` memory limit
is exceeded by a query. This metric should help to identify expensive and heavy queries
without inspecting the logs.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 13:31:09 +01:00
Aliaksandr Valialkin
9149353a36
app/vmui: change the order of tables at Top queries tab
Move the most interesting table - queries with the most summary time to execute - to the top
2023-10-28 11:56:16 +02:00
Aliaksandr Valialkin
42dd71bb63
all: consistently use %w instead of %s in when error is passed to fmt.Errorf()
This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.
2023-10-25 21:24:03 +02:00
Roman Khavronenko
b8b6e120ff
app/vmselect: limit the number of parallel workers by 32 (#5195)
* app/vmselect: limit the number of parallel workers by 32

The change should improve performance and memory usage during query processing
on machines with big number of CPU cores. The number of parallel workers for
query processing is controlled via `-search.maxWorkersPerQuery` command-line flag.
By default, the number of workers is limited by the number of available CPU cores,
but not more than 32. The limit can be increased via `-search.maxWorkersPerQuery`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

- The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage,
  so move it from the `resource usage limits` to `troubleshooting` chapter at docs/Single-server-VictoriaMetrics.md

- Make more clear the description for the `-search.maxWorkersPerQuery` command-line flag

- Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md

- Limit the maximum value, which can be passed to `-search.maxWorkersPerQuery`, to GOMAXPROCS,
  because bigger values may worsen query performance and increase CPU usage

- Improve the the description of the change at docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX,
  since it is closer to a feature than to a bugfix.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-10-18 19:51:37 +02:00
Aliaksandr Valialkin
da77f4deeb
app/vmselect/promql: add labels_equal(q, "label1", "label2", ...) function
This function returns q series, which have identical values for the listed labels
"label1", "label2", ...

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5148
2023-10-16 21:50:11 +02:00
Aliaksandr Valialkin
bdb743c88d
app/vmselect/promql: add drop_empty_series() function for dropping empty series before performing additional calculations
This can be useful in the following queries:

   drop_empty_series(temperature <= 30) default 40

This query drops temperature series with all the values bigger than 30 on the selected time range,
while replacing gaps in the remaining series with 40.

The query without drop_empty_series:

  (temperature <= 30) default 40

would leave all the temperature series with all the values bigger than 30 on the selected time range,
and replace all their values with 40. This is not what could be epxected in some cases
like here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5071
2023-10-16 20:44:56 +02:00
Aliaksandr Valialkin
97a7128e4b
app/vmselect/promql: do not use unsafe conversion from bytes slice to string when storing a value by map key
The assigned map key shouldn't change over time, otherwise the map won't work properly.

This is a follow-up for 1f91f22b5f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-16 01:56:57 +02:00
Aliaksandr Valialkin
930a36df40
app/vmui: small UX enhancements
- Reduce vertical space usage, so more information is available on the screen without the need to scroll.
- Show information for lines with higher values at the top of the legend under the graph.
  This should simplify graph analysis when it contains many lines.
2023-10-12 19:54:19 +02:00
Aliaksandr Valialkin
31f7ef0811
app/{vmselect,vlselect}: enable caching of static contents from /vmui/static/ folder at client side
This should improve repated VMUI page load times on slow networks

See https://developer.chrome.com/docs/lighthouse/performance/uses-long-cache-ttl/
2023-10-12 09:33:40 +02:00
Nikolay
1f91f22b5f
app/vmselect: reduce lock contention for heavy aggregation requests (#5119)
reduce lock contention for heavy aggregation requests
previously lock contetion may happen on machine with big number of CPU due to enabled string interning. sync.Map was a choke point for all aggregation requests.
Now instead of interning, new string is created. It may increase CPU and memory usage for some cases.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-10 13:45:20 +02:00