Commit graph

1068 commits

Author SHA1 Message Date
Roman Khavronenko
53b2e6aa50
app/vmselect: limit the number of parallel workers by 32 (#5195)
* app/vmselect: limit the number of parallel workers by 32

The change should improve performance and memory usage during query processing
on machines with big number of CPU cores. The number of parallel workers for
query processing is controlled via `-search.maxWorkersPerQuery` command-line flag.
By default, the number of workers is limited by the number of available CPU cores,
but not more than 32. The limit can be increased via `-search.maxWorkersPerQuery`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

- The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage,
  so move it from the `resource usage limits` to `troubleshooting` chapter at docs/Single-server-VictoriaMetrics.md

- Make more clear the description for the `-search.maxWorkersPerQuery` command-line flag

- Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md

- Limit the maximum value, which can be passed to `-search.maxWorkersPerQuery`, to GOMAXPROCS,
  because bigger values may worsen query performance and increase CPU usage

- Improve the the description of the change at docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX,
  since it is closer to a feature than to a bugfix.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-10-26 11:50:23 +02:00
Aliaksandr Valialkin
26adfb3180
app/vmselect/promql: do not use unsafe conversion from bytes slice to string when storing a value by map key
The assigned map key shouldn't change over time, otherwise the map won't work properly.

This is a follow-up for 1f91f22b5f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-16 13:57:25 +02:00
Nikolay
d04464b76b
app/vmselect: reduce lock contention for heavy aggregation requests (#5119)
reduce lock contention for heavy aggregation requests
previously lock contetion may happen on machine with big number of CPU due to enabled string interning. sync.Map was a choke point for all aggregation requests.
Now instead of interning, new string is created. It may increase CPU and memory usage for some cases.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-16 02:01:07 +02:00
Aliaksandr Valialkin
7abc2cd709
app/vmselect/promql: completely substitute median_over_time() WITH template with regular median_over_time() rollup function
This is a follow-up for 34d7a670d0

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034
2023-09-25 15:54:56 +02:00
Zakhar Bessarab
857676421a
app/vmselect/promql: add implementation of median_over_time for rollup functions list (#5042)
`median_over_time` is handled by predefined WITH template in MetricsQL library which translates it to `quantile_over_time(0.5)`
This makes it impossble to use `median_over_time` as a usual rollup function for `aggr_over_time`.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-25 15:54:56 +02:00
Aliaksandr Valialkin
4353ed66c4
app/vmselect/promql: fix tests after the upgrade of github.com/VictoriaMetrics/metricsql from v0.64.0 to v0.65.0 in bfbd0b478e 2023-09-19 01:10:09 +02:00
Konstantin
a583d2df25
app/vmselect: return +Inf as null in graphite render api (#5009)
Signed-off-by: Konstantin Kulikov <k.kulikov2@gmail.com>
2023-09-18 16:42:16 +02:00
Aliaksandr Valialkin
730cb73109
app/vmselect/netstorage: run make fmt after 58326dbf25 2023-09-10 15:24:54 +02:00
Aliaksandr Valialkin
2804c18a7c
app/vmselect: return 503 status code when partial responses are denied and some of vmstorage nodes are temporarily unavailable
This should help detecting this case and automatic retrying the query at healthy cluster replica
in another availability zone.

This commit is needed as a preparation for automatic query retry at another backend at vmauth on 5xx errors
as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792#issuecomment-1674338561
2023-09-07 16:15:09 +02:00
Aliaksandr Valialkin
7db72dd7e6
lib/auth: add NewTokenPossibleMultitenant() for parsing auth token, which can be multitenant
Disallow parsing multitenant token at auth.NewToken().

Use auth.NewTokenPossibleMultitenant() at vminsert only. All the other callers should call auth.NewToken(),
since they do not support multitenant token.

This is a follow-up for f0c06b428e

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4910
2023-09-06 12:08:18 +02:00
Aliaksandr Valialkin
2388c3c192
Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 11:15:51 +02:00
Zakhar Bessarab
56c54bf968
app/vmselect: fix panic when using /select/multitenant endpoint (#4912)
app/vmselect: fix panic when using `/select/multitenant` endpoint

Such requests must be rejected as not found since vmselect does not support multitenant endpoint.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4910

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-08-30 14:21:12 +02:00
Aliaksandr Valialkin
5c80b11c15
app/vmselect: prevent from panic when lookbehind window inside rollup function is parsed into negative value
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4795
2023-08-12 04:49:56 -07:00
Aliaksandr Valialkin
37af7d4ed3
app/{vmselect,vlselect}: run make vmui-update vmui-logs-update after 86f1459ca6 2023-08-11 07:01:15 -07:00
Damon07
4c509c0b89
{app/vmselect,docs}: support share_eq_over_time#4441 (#4725)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4441

Co-authored-by: wangm <wangmm@tuya.com>
2023-07-31 07:51:09 -07:00
Aliaksandr Valialkin
16c343f882
app/{vmselect,vlselect}/vmui: run make vmui-update vmui-logs-update after b6ae325763 2023-07-24 17:15:26 -07:00
Aliaksandr Valialkin
c921bc0833
app/{vmselect,vlselect}: run make vmui-update vmui-logs-update after recent changes to VMUI
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4604
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4676
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4294
2023-07-20 21:53:51 -07:00
Aliaksandr Valialkin
a0b7def89d
app/vmselect/promql: fix tests after 781947a7e2 2023-07-20 21:25:30 -07:00
Aliaksandr Valialkin
0cbe5ccb4a
app/vmselect: rename promql.WriteActiveQueries() to promql.ActiveQueriesHandler()
This makes it more consistent with the rest of handlers inside app/vmselect/main.go

This is a follow-up for 6a96fd8ed5

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4598
2023-07-20 11:30:40 -07:00
Aliaksandr Valialkin
992c300ce9
all: replace atomic.Value with atomic.Pointer[T]
This eliminates the need in .(*T) casting for results obtained from Load()

Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map,
because map is already a pointer type.
2023-07-19 17:48:26 -07:00
Yury Molodov
3ad80e281f
vmui: add Active Queries page (#4653)
* feat: add page to display a list of active queries (#4598)

* app/vmagent: code formatting

* fix: remove console

---------

Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
2023-07-19 16:02:58 -07:00
Aliaksandr Valialkin
5ace0701d3
app/vmselect/promql: add the ability to copy all the labels from one side of group_left()/group_right() operation
This is performed by specifying `*` inside group_left()/group_right().
Also allow specifying prefix for the copied labels via `group_left(...) prefix "..."` and `group_right(...) prefix "..."` syntax.
For example, the following query adds all the namespace-related labels to pod info, and prefixes all the copied label names with "ns_" prefix:

  kube_pod_info * on(namespace) group_left(*) prefix "ns_" kube_namespace_labels

This resolves the following StackOverflow questions:

- https://stackoverflow.com/questions/76661818/how-to-add-namespace-labels-to-pod-labels-in-prometheus
- https://stackoverflow.com/questions/76653997/how-can-i-make-a-new-copy-of-kube-namespace-labels-metric-with-a-different-name
2023-07-17 16:58:30 -07:00
Aliaksandr Valialkin
cc54fa2a56
app/vmselect/promql: recommend to use (a op b) keep_metric_names instead of a op b keep_metric_names
The `a op b keep_metric_names` is ambigouos to `a op (b keep_metric_names)` when `b` is a transform or rollup function.
For example, `a + rate(b) keep_metric_names`. So it is better to use more clear syntax: `(a op b) keep_metric_names`

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3710
2023-07-16 23:47:15 -07:00
Zakhar Bessarab
781947a7e2
metricsql: add support of using keep_metric_names for binary operations (#4109)
* metricsql: add support of using keep_metric_names for binary operations

This should help to avoid confusion with queries like one in the issue #3710.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-07-16 03:01:27 -07:00
Aliaksandr Valialkin
a7fdc3fcc7
all: add support for or filters in series selectors
This commit adds ability to select series matching distinct filters via a single series selector.
For example, the following selector selects series with either {env="prod",job="a"}
or {env="dev",job="b"} labels:

  {env="prod",job="a" or env="dev",job="b"}

The `or` filter is supported in all the VictoriaMetrics tools now.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3997
Uses https://github.com/VictoriaMetrics/metricsql/pull/14
2023-07-15 23:56:18 -07:00
Aliaksandr Valialkin
f65153018b
app/{vmselect,vlselect}: run make vmui-update vmui-logs-update 2023-07-09 12:44:04 -07:00
Haleygo
ef8e3eb9b3
vmselect: fix result in Prometheus query when time is small (#4578)
vmselect: fix result in Prometheus query when time is small

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-09 12:33:29 -07:00
Aliaksandr Valialkin
e1a2404db5
app/vmselect/netstorage: follow-up after 173ccf4333
- Clarify docs about -replicationFactor command-line flag at vmselect
- Clarify description for -replicationFactor and -search.skipSlowReplicas command-line flags
- Fix the logic for returning responses if -search.skipSlowReplicas command-line flag
  is enabled. The logic was broken in the 173ccf4333,
  so it could return responses only if some of vmstorage nodes return error,
  while it should return when query results are successfully collected from more than
  (len(storageNodes) - replicationFactor) vmstorage nodes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
2023-07-09 11:58:22 -07:00
Haleygo
14e242d0b9
vmselect: fix result collect count (#4599) 2023-07-08 08:21:27 +02:00
Roman Khavronenko
173ccf4333
vmselect: introduce search.skipSlowReplicas cmd-line flag (#4538)
* vmselect: introduce `search.skipSlowReplicas` cmd-line flag

vmselect has two logical conditions during request processing when
`-replicationFactor` cmd-line flag is set:
1. If at least `len(storageNodes) - replicationFactor` responded, it could skip
waiting for the rest of nodes to respond. This could lead to problems described
here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207.
2. Mark response as partial if less than `len(storageNodes) - replicationFactor` responded
without an error.

The P1 showed itself error-prone and became the main reason why
`-replicationFactor` wasn't recommended to use at vmselect level.
However, this optimization could be still very useful in situations
when there are slow and fast replicas in cluster.

But P2 remains viable and important conditionless.
Hiding P1 behind the feature-flag `search.skipSlowReplicas`
should make `-replicationFactor` flag usable again. And let users
choose whether they want P1 to be respected.

Related issues
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docs: update changelog

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-07 11:50:26 +02:00
Aliaksandr Valialkin
4b10432435
app/vlselect: handle vmui at /select/vmui path instead of /vmui
This simplifies routing at auth proxies such as vmauth to vlselect component,
which serves VMUI - just route all the requests, which start with /select/, to vlselect.
2023-07-06 21:36:28 -07:00
Aliaksandr Valialkin
427ce69426
app/vmselect: move common http functionality from app/vmselect/searchutils to lib/httputils
While at it, move app/vmselect/bufferedwriter to lib/bufferedwriter, since it is going to be used in VictoriaLogs
2023-07-06 17:22:23 -07:00
Aliaksandr Valialkin
dff199a745
app/vmselect/graphite: follow-up after c7884f8686
- Consistently use -search.maxGraphiteTagValues for limiting tag values from auto-complete API
- Use -search.maxGraphiteSeries for limiting paths (aka series), which can be returned from Graphite series API
- Clarify the change in docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4339
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2841
2023-07-06 15:19:07 -07:00
Aliaksandr Valialkin
eb47ad4b69
app/vmselect/netstorage: remove runtime.Gosched() call from unpackWorker()
This should improve scalability of unpackWorker() on systems with many CPU cores.
This is a follow-up for a2ecf4fa4a and 16f3b279a2

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-07-06 10:07:42 -07:00
Aliaksandr Valialkin
ec75d9097d
app/vmselect/netstorage: follow-up after 11ac551d52
- Clarify the scope of the fix at docs/CHANGELOG.md
- Handle the case when -search.maxSamplesPerSeries limit is exceeded
  in the same way as the -search.maxSamplesPerQuery limit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4472
2023-07-05 21:13:34 -07:00
Aliaksandr Valialkin
643e99a157
app/vmselect/netstorage: improve code readability a bit after 6c84b61893
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4364
2023-07-05 20:48:38 -07:00
Roman Khavronenko
11ac551d52
app/vmselect/netstorage: properly process -search.maxSamplesPerQuery limit (#4472)
Properly return the error to user when `-search.maxSamplesPerQuery` limit is exceeded.
Before, user could have received a partial response instead.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-23 13:17:34 +02:00
Dmytro Kozlov
c5debee3f4
app/{graphite,netstorage,prometheus}: fix graphite search tags api limits, remove redudant limit from SeriesHandler handler (#4352)
* app/{graphite,netstorage,prometheus}: fix graphite search tags api limits, remove unused limit from SeriesHandler handler,

* app/{graphite,netstorage,prometheus}: use search.maxTagValues for Graphite

* app/{graphite,netstorage,prometheus}: update CHANGELOG.md

* app/{graphite,netstorage,prometheus}: use own flags for Graphite API

* app/{graphite,netstorage,prometheus}: cleanup

* app/{graphite,netstorage,prometheus}: cleanup

* app/{graphite,netstorage,prometheus}: update docs

---------

Co-authored-by: Nikolay <nik@victoriametrics.com>

(cherry picked from commit c7884f8686)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-09 10:39:12 +02:00
Nikolay
e3ce736ce2
app/vmselect/graphite: fixes tests for arm (#4348)
at arm based CPUs only 9 digits after comma matches for tests.
Especially at holtWinters functions. Since it only takes effect at tests
it makes no sense for changing float prescision at actual functions

(cherry picked from commit 228ea03bda)
2023-06-02 13:19:34 +02:00
Roman Khavronenko
576e59d82c
cluster: standardize default HTTP responses (#4368)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-01 10:26:52 +02:00
Haleygo
6c84b61893
vmselect:fix init sn take too much time (#4366)
* vmselect: descrease start time for vmselect

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4364
2023-05-30 13:04:31 +02:00
Aliaksandr Valialkin
934a7f485c
app/vmselect: log locations of sendPrometheusError() calls
Previously the location inside the sendPrometheusError() was logged.
This could make hard investigating error locations via `vm_log_messages_total` metric.
2023-05-18 20:39:50 -07:00
Aliaksandr Valialkin
1ff67bb036
app/vmselect/vmui: run make vmui-update after 39c1b0f8d1 2023-05-18 12:15:22 -07:00
Alexander Marshalov
d321ea91f2
fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 02:22:06 -07:00
Aliaksandr Valialkin
8703b2fa87
app/vmselect: small cleanup after 4f3f9950d0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3807
2023-05-09 22:45:02 -07:00
Aliaksandr Valialkin
fbc28810b1
app/vmselect: small cleanup after 68e31a6000
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3811
2023-05-09 22:43:59 -07:00
Aliaksandr Valialkin
5dbaffe2c6
app/{vmselect,vmctl}: move ParseTime() to lib/promutils
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4091

This is a follow-up for e2053baf32
2023-05-09 22:42:35 -07:00
Roman Khavronenko
5bc8d8f290
vmselect: exit early from queue on context cancel (#4223)
* vmselect: exit early from queue on context cancel

When `-search.maxConcurrentRequests` is reached, vmselect puts
request in the queue. It is expected, that requests in the queue
will be processed as soon as it would be enough capacity to do so.

However, it could happen that while request was waiting its turn,
the client could have already cancel it (close the connection,
or just close the tab with UI). In this case, we should de-queue
such requests to avoid spending extra resources on them.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmselect: address review comments

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 22:58:05 -07:00
Aliaksandr Valialkin
1a7794735e
app/vmselect: fix the build after fb8889820aba710508033cbf6826eb63a357532a 2023-05-08 17:32:18 -07:00
Yury Molodov
de35cbf251
vmui: Integrate WITH template playground (#3831)
* feat: add WithTemplate page

* app/vmselect/prometheus: enable json mode for expand with expr API

* app/vmselect/prometheus: enable CORS and add content type

* feat: add api for expand with templates

* fix: remove console from useExpandWithExprs

* app/vmselect/prometheus: fix escaping

* vmui:  integrate WITH template

* app/vmctl: check content type instead of form param

* fix: add content-type for fetch with-exprs

* fix: add a header to the server's response that allows the "Content-Type" header

* app/vmctl: added comment and cleanup

* app/vmctl: use format query param

---------

Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
2023-05-08 14:35:35 -07:00