Commit graph

1470 commits

Author SHA1 Message Date
Roman Khavronenko
debe1793bb
vmalert: follow-up after d4ac4b7813 (#4659)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-18 16:03:28 -07:00
hagen1778
456a2e70fd
docs: mention change from 6f3fee197e
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-18 16:01:59 -07:00
Aliaksandr Valialkin
8e42b2294c
docs/VictoriaLogs: add CHANGELOG.md 2023-07-17 23:14:23 -07:00
Aliaksandr Valialkin
5ace0701d3
app/vmselect/promql: add the ability to copy all the labels from one side of group_left()/group_right() operation
This is performed by specifying `*` inside group_left()/group_right().
Also allow specifying prefix for the copied labels via `group_left(...) prefix "..."` and `group_right(...) prefix "..."` syntax.
For example, the following query adds all the namespace-related labels to pod info, and prefixes all the copied label names with "ns_" prefix:

  kube_pod_info * on(namespace) group_left(*) prefix "ns_" kube_namespace_labels

This resolves the following StackOverflow questions:

- https://stackoverflow.com/questions/76661818/how-to-add-namespace-labels-to-pod-labels-in-prometheus
- https://stackoverflow.com/questions/76653997/how-can-i-make-a-new-copy-of-kube-namespace-labels-metric-with-a-different-name
2023-07-17 16:58:30 -07:00
Aliaksandr Valialkin
cc54fa2a56
app/vmselect/promql: recommend to use (a op b) keep_metric_names instead of a op b keep_metric_names
The `a op b keep_metric_names` is ambigouos to `a op (b keep_metric_names)` when `b` is a transform or rollup function.
For example, `a + rate(b) keep_metric_names`. So it is better to use more clear syntax: `(a op b) keep_metric_names`

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3710
2023-07-16 23:47:15 -07:00
Zakhar Bessarab
781947a7e2
metricsql: add support of using keep_metric_names for binary operations (#4109)
* metricsql: add support of using keep_metric_names for binary operations

This should help to avoid confusion with queries like one in the issue #3710.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-07-16 03:01:27 -07:00
Aliaksandr Valialkin
a7fdc3fcc7
all: add support for or filters in series selectors
This commit adds ability to select series matching distinct filters via a single series selector.
For example, the following selector selects series with either {env="prod",job="a"}
or {env="dev",job="b"} labels:

  {env="prod",job="a" or env="dev",job="b"}

The `or` filter is supported in all the VictoriaMetrics tools now.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3997
Uses https://github.com/VictoriaMetrics/metricsql/pull/14
2023-07-15 23:56:18 -07:00
Aliaksandr Valialkin
d721109961
docs/CHANGELOG.md: sync with master branch 2023-07-14 10:48:40 -07:00
Haleygo
5e5c805599
vmalert: fix evalTS after modify group interval (#4629) 2023-07-14 10:47:29 -07:00
Aliaksandr Valialkin
e1cf962bad
lib/storage: switch from global to per-day index for MetricName -> TSID mapping
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).

The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.

The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.

VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:

- If `storage/tsid` cache capacity isn't enough for active time series.
  Then just increase available memory for VictoriaMetrics or reduce the number of active time series
  ingested into VictoriaMetrics.

- If new time series is ingested into VictoriaMetrics. In this case it cannot find
  the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
  since it doesn't know that the index has no the corresponding entry too.
  This is a typical event under high churn rate, when old time series are constantly substituted
  with new time series.

Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.

Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.

This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.

The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do no change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.

This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .

At the same time the change fixes the issue, which could result in lost access to time series,
which stop receving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698

The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685

This is a follow-up for 1f28b46ae9
2023-07-13 17:03:50 -07:00
Aliaksandr Valialkin
df67b78f75
docs/CHANGELOG.md: clarify the description of the bugfix at 177a0c1ca9
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4555
2023-07-13 12:19:00 -07:00
Dmytro Kozlov
f31ac064f9
app/vmctl: fix panic --remote-read-filter-time-start flag not defined (#4605)
* app/vmctl: fix panic `--remote-read-filter-time-start` flag not defined

* app/vmctl: update CHANGELOG.md

---------

Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-07-13 12:13:21 -07:00
Dmytro Kozlov
555a0a9d57
app/vmctl: fix issue with adding many seconds (#4617)
* app/vmctl: fix issue with adding many seconds

* app/vmagent: add CHANGELOG.md
2023-07-13 12:09:54 -07:00
Roman Khavronenko
fdccb56620
vmalert: check for negative offset for missed rounds (#4628)
It could happen for low evaluation intervals and irregular
delays during execution that evaluation time would get
a negative offset. This could result into cumulative
discrepancy between the actual time and evaluation time for rules.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-13 12:05:52 -07:00
Aliaksandr Valialkin
b07a1c85b9
all: update Go builder from 1.20.5 to 1.20.6
See https://github.com/golang/go/issues?q=milestone%3AGo1.20.6+label%3ACherryPickApproved
2023-07-12 01:00:24 -07:00
Haleygo
ef8e3eb9b3
vmselect: fix result in Prometheus query when time is small (#4578)
vmselect: fix result in Prometheus query when time is small

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-09 12:33:29 -07:00
Haleygo
3c2308fd52
vmalert:fix query request using rfc3339 format (#4577)
vmalert: consistently use time.RFC3339 format for time in queries

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-09 11:03:10 -07:00
Roman Khavronenko
173ccf4333
vmselect: introduce search.skipSlowReplicas cmd-line flag (#4538)
* vmselect: introduce `search.skipSlowReplicas` cmd-line flag

vmselect has two logical conditions during request processing when
`-replicationFactor` cmd-line flag is set:
1. If at least `len(storageNodes) - replicationFactor` responded, it could skip
waiting for the rest of nodes to respond. This could lead to problems described
here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207.
2. Mark response as partial if less than `len(storageNodes) - replicationFactor` responded
without an error.

The P1 showed itself error-prone and became the main reason why
`-replicationFactor` wasn't recommended to use at vmselect level.
However, this optimization could be still very useful in situations
when there are slow and fast replicas in cluster.

But P2 remains viable and important conditionless.
Hiding P1 behind the feature-flag `search.skipSlowReplicas`
should make `-replicationFactor` flag usable again. And let users
choose whether they want P1 to be respected.

Related issues
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docs: update changelog

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-07 11:50:26 +02:00
Roman Khavronenko
109e55f865
vmalert: allow disabling of step param attached to instant queries (#4574)
vmalert: allow disabling of `step` param attached to instant queries

This might be useful for using vmalert with datasources that to not support this param,
unlike VictoriaMetrics.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4573

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 23:13:56 -07:00
Aliaksandr Valialkin
eea088d87f
docs/CHANGELOG.md: clarify description for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336 bugfix
This is a follow-up for 5eb5df96e2
2023-07-06 22:42:02 -07:00
Aliaksandr Valialkin
eeb53660b8
docs/CHANGELOG.md: use the proper link to the issue related to the commit 7a92263459
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4402
2023-07-06 22:41:43 -07:00
Aliaksandr Valialkin
67a8992798
docs/CHANGELOG.md: remove redundant info from the url to consulagent_sd_configs docs
This is a follow-up for 40d12be607
2023-07-06 22:41:23 -07:00
Aliaksandr Valialkin
40f1ccba67
docs/CHANGELOG.md: clarify the description of the bugfix at ce7141383d 2023-07-06 22:41:03 -07:00
Aliaksandr Valialkin
dc89e1f644
app/vmselect/graphite: follow-up after c7884f8686
- Consistently use -search.maxGraphiteTagValues for limiting tag values from auto-complete API
- Use -search.maxGraphiteSeries for limiting paths (aka series), which can be returned from Graphite series API
- Clarify the change in docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4339
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2841
2023-07-06 22:33:30 -07:00
Alexander Marshalov
eb611c3dc3
fix removing storage data dir before restoring from backup (#598)
* fix removing storage data dir before restoring from backup

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* fix review comment

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* fix review comment

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* fixes after merge with `enterprise-single-node` branch

Signed-off-by: Alexander Marshalov <_@marshalov.org>

---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-07-06 22:32:12 -07:00
Aliaksandr Valialkin
2f19ba0f75
app/vmselect/netstorage: follow-up after 11ac551d52
- Clarify the scope of the fix at docs/CHANGELOG.md
- Handle the case when -search.maxSamplesPerSeries limit is exceeded
  in the same way as the -search.maxSamplesPerQuery limit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4472
2023-07-06 22:26:47 -07:00
Roman Khavronenko
bd5abb74fd
vmctl: interrupt explore procedure in influx mode if no numeric fields were found (#4576)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 22:21:18 -07:00
Roman Khavronenko
41f0ed48eb
docs: follow-up after 9da638aa66 (#4572)
9da638aa66

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 22:18:54 -07:00
Dmytro Kozlov
dd412a3757
app/vmalert: show on UI groups error after reload config (#4543)
show on UI groups error after reload config

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4076

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 22:11:36 -07:00
Haleygo
b029286298
fix parse for invalid partial RFC3339 format (#4539)
The validation was needed for covering corner cases when storage is tested with data from 1970.
This resulted into unexpected search results, as year was parsed incorrectly from the given timestamp.


Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 22:09:35 -07:00
Nikolay
68879061be
docs: adds v1.91.3 release docs (#4561) 2023-07-06 22:06:58 -07:00
Yury Molodov
8c190ec8fb
vmui: fix app routing issues (#4408)
The change focuses on rectifying inconsistencies in the navigation behavior of the application
and eliminating issues encountered when manually altering the URL.

The key updates include:
- Refactoring of the routing mechanism to handle all possible routes and their states.
- Enhancement of the React Router usage to ensure a smoother navigation experience.
- Handling application state when the URL is manually changed.
2023-07-06 21:58:09 -07:00
Alexander Marshalov
677c8a5465
show backup progress percentage in vmbackup log during backup uploading and restoring progress percentage in vmrestore log during backup downloading (#4460) (#4530)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-07-06 21:56:54 -07:00
Roman Khavronenko
cf433c066a
vmauth: expose latency metrics per user (#4525)
expose `vmauth_user_request_duration_seconds`
and `vmauth_unauthorized_user_request_duration_seconds` summary metrics
for measuring requests latency per user.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 21:55:37 -07:00
Haleygo
9e49a9e924
vmalert: add vmalert_remotewrite_sent_duration_seconds_total metric (#4517)
add `vmalert_remotewrite_sent_duration_seconds_total` metric
2023-07-06 21:51:31 -07:00
Roman Khavronenko
d5e7ea5ef3
vmalert: update retry policy for pushing data to -remoteWrite.url (#4504)
By default, vmalert will make multiple retry attempts with exponential delay.
The total time spent during retry attempts shouldn't exceed `-remoteWrite.retryMaxTime` (default is 30s).
When retry time is exceeded vmalert drops the data dedicated for `-remoteWrite.url`.
Before, vmalert dropped data after 5 retry attempts with 1s delay between attempts (not configurable).

See `-remoteWrite.retryMinInterval` and `-remoteWrite.retryMaxTime` cmd-line flags.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-07-06 21:44:18 -07:00
Roman Khavronenko
311a81c7b0
vmalert: properly interrupt remotewrite retries on shutdown (#4505)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 21:43:04 -07:00
Zakhar Bessarab
7a000159d8
docs/changelog: followup for 830dac177f (#4499)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-07-06 21:41:36 -07:00
Roman Khavronenko
d4ee505f6f
vmalert: retry all errors except 4XX status codes (#4461)
vmalert: retry all errors except 4XX status codes

Retry all errors except 4XX status codes while pushing via remote-write
to the remote storage. Previously, errors like broken connection could
prevent vmalert from retrying the request.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 17:34:32 -07:00
Yury Molodov
0ad966a898
vmui: memory leak fix (#4455)
* fix: optimize the preparation of data for the graph

* fix: optimize tooltip rendering

* fix: optimize re-rendering of the chart

* vmui: memory leak fix
2023-07-06 17:33:54 -07:00
Aliaksandr Valialkin
46210c4d5e
lib/promutils.ParseTime(): add support for timestamps in milliseconds
See https://stackoverflow.com/questions/76437098/how-to-handle-time-unit-and-step-while-ingesting-or-querying-in-victoriametrics/76438405

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4459
2023-07-06 17:11:54 -07:00
Nikolay
dd7ebd6779
lib/storage: creates parts.json on start-up if it not exists. (#4450)
* lib/storage: creates parts.json on start-up if it not exists.
It fixes migrations from versions below v1.90.0.
Previously parts.json was created only after successful merge.
But if merge was interruped for some reason (OOM or shutdown), parts.json wasn't created and partitions left after interruped merge weren't properly deleted.
Since VM cannot check if it must be removed or not.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336

* Apply suggestions from code review

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* Update lib/storage/partition.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-06 17:10:26 -07:00
Dmytro Kozlov
b32a270da7
vmctl: increase retry backoff policy delay (#4447)
vmctl: update backoff policy on retries to reduce probability of overloading for `source` or `destination` databases
2023-07-06 17:00:06 -07:00
Dmytro Kozlov
2e81c5f740
vmctl: finish retries if context canceled (#4442)
vmctl: interrupt backoff retries if import context is cancelled

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-06 16:56:00 -07:00
Alexander Marshalov
4084dba9e4
fixed service name detection for consulagent service discovery in case of a difference in service name and service id (#4390) (#4439)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-07-06 16:53:29 -07:00
Roman Khavronenko
ecd7ec4832
Dashboard upd (#4438)
dashboards: update dashboard for single-node version
* add anonymous mem usage panel;
* add syscall rate panel;
* add location to logs panel;
* update legend for panels to reflect instance name;
* update queries to aggregate per instance.

dashboards: update dashboard for cluster version
* add syscall rate panel;
* add drilldown to logs panel.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 16:49:42 -07:00
Aliaksandr Valialkin
ed868f47f9
docs/CHANGELOG.md: remove the change regarding http2 support at vmagent
This is a follow-up for 8a07621a0c

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283
2023-07-06 16:06:44 -07:00
Aliaksandr Valialkin
dff199a745
app/vmselect/graphite: follow-up after c7884f8686
- Consistently use -search.maxGraphiteTagValues for limiting tag values from auto-complete API
- Use -search.maxGraphiteSeries for limiting paths (aka series), which can be returned from Graphite series API
- Clarify the change in docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4339
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2841
2023-07-06 15:19:07 -07:00
Aliaksandr Valialkin
ec75d9097d
app/vmselect/netstorage: follow-up after 11ac551d52
- Clarify the scope of the fix at docs/CHANGELOG.md
- Handle the case when -search.maxSamplesPerSeries limit is exceeded
  in the same way as the -search.maxSamplesPerQuery limit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4472
2023-07-05 21:13:34 -07:00
Roman Khavronenko
11ac551d52
app/vmselect/netstorage: properly process -search.maxSamplesPerQuery limit (#4472)
Properly return the error to user when `-search.maxSamplesPerQuery` limit is exceeded.
Before, user could have received a partial response instead.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-23 13:17:34 +02:00