Commit graph

241 commits

Author SHA1 Message Date
Roman Khavronenko
2b59fff526
vmalert: fix labels and annotations processing for alerts (#2403)
To improve compatibility with Prometheus alerting the order of
templates processing has changed.
Before, vmalert did all labels processing beforehand. It meant
all extra labels (such as `alertname`, `alertgroup` or rule labels)
were available in templating. All collisions were resolved in favour
of extra labels.
In Prometheus, only labels from the received metric are available in
templating, so no collisions are possible.
This change makes vmalert's behaviour similar to Prometheus.

For example, consider alerting rule which is triggered by time series
with `alertname` label. In vmalert, this label would be overriden
by alerting rule's name everywhere: for alert labels, for annotations, etc.
In Prometheus, it would be overriden for alert's labels only, but in annotations
the original label value would be available.

See more details here https://github.com/prometheus/compliance/issues/80

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-06 20:24:45 +02:00
Roman Khavronenko
70bb0d2708
vmalert: add flag for disabling long-lived connections (keepalive) (#2395)
The new flag `datasource.disableKeepAlive` allows disabling keepalive
connections. This may be useful if there are multiple datasource
replicas (e.g. vmselects) behind the HTTP balancer to avoid uneven
load spread because of long-lived connections.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-04 12:59:04 +03:00
Roman Khavronenko
1354e6d712
vmalert: protect executor's field from concurrent access (#2387)
Executor recently gain field for storing previously sent series.
Since the same executor object can be used in multiple goroutines,
the access to this field should be serialized.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-30 12:37:27 +02:00
Roman Khavronenko
0989649ad0
Vmalert compliance 2 (#2340)
* vmalert: split alert's `Start` field into `ActiveAt` and `Start`

The `ActiveAt` field identifies when alert becomes active for rules
with `for > 0`. Previously, this value was stored in field `Start`.

The field `Start` now identifies the moment alert became `FIRING`.

The split is needed in order to distinguish these two moments
in the API responses for alerts.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: support specific moment of time for rules evaluation

The Querier interface was extended to accept a new argument
used as a timestamp at which evaluation should be made.

It is needed to align rules execution time within the group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: mark disappeared series as stale

Series generated by alerting rules, which were sent to remote write
now will be marked as stale if they will disappear on the next
evaluation. This would make ALERTS and ALERTS_FOR_TIME series
more precise.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: evaluate rules at fixed timestamp

Before, time at which rules were evaluated was calculated
right before rule execution. The change makes sure
that timestamp is calculated only once per evalution round
and all rules are using the same timestamp.

It also updates the logic of resending of already resolved
alert notification.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: allow overridin `alertname` label value if it is present in response

Previously, `alertname` was always equal to the Alerting Rule name. Now,
its value can be overriden if series in response containt the different value
for this label.

The change is needed for improving compatibility with Prometheus.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: align rules evaluation in time

Now, evaluation timestamp for rules evaluates as if
there was no delay in rules evaluation. It means, that
rules will be evaluated at fixed timestamps+group_interval.
This way provides more consistent evaluation results and
improves compatibility with Prometheus,

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: add metric for missed iterations

New metric `vmalert_iteration_missed_total` will show
whether rules evaluation round was missed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: reduce delay before the initial rule evaluation in group

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: rollback alertname override

According to the spec:
```
The alert name from the alerting rule (HighRequestLatency from the example above) MUST be added to the labels of the alert with the label name as alertname. It MUST override any existing alertname label.
```

https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-3
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: throw err immediately on dedup detection

```
The execution of an alerting rule MUST error out immediately and MUST NOT send any alerts
or add samples to samples receiver if there is more than one alert with the same labels
```

https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-4
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: cleanup

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: use strings builder to reduce allocs

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-29 15:09:07 +02:00
Roman Khavronenko
56de8f0356
docs: fix typo in vmalert's API (#2380)
The API handler was changed in 1.75 but docs
still contain the old address.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2366
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-28 12:07:02 +02:00
Aliaksandr Valialkin
c8f356a6a8
app: sync Markdown changes from a8de1ab000 2022-03-22 14:11:18 +02:00
Aliaksandr Valialkin
09c6c7350b
docs/vmalert.md: sync after 11ae1ae924 2022-03-17 20:17:40 +02:00
Dmytro Kozlov
11ae1ae924
Added resendDelay for alerts (#2296)
* vmalert: add support of `resendDelay` flag for alerts

Co-authored-by: dmitryk-dk <dmitry.kozlov@brightlocal.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-03-16 15:26:33 +00:00
Roman Khavronenko
fb6eab03a2
Vmalert compliance improvements (#2320)
* vmalert: add support for `sortByLabel` template function

* vmalert: update API according to Prometheus conformance program

The changes to the API, field names and URL path has been made
according to the Prometheus specification for `alert_generator`
https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md

* vmalert: fix the timestamp of the evaluated rules

The timestamp used for alert's `EndsAt` was calculated
before sending the notification. While the correct way
is to use the timestamp taken right before rules evaluation.

* vmalert: add `-datasource.queryTimeAlignment` flag

The flag is supposed to provide ability to disable `time`
param alignment when executing rules. By default, this flag
is enabled, so it remains backward compatible.

The flag was introduced to achieve better compatibility
with Prometheus behaviour according to https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-15 11:54:53 +00:00
Roman Khavronenko
0fa7effc4b
docs: fix broken links (#2303)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-13 15:56:01 +02:00
Dmytro Kozlov
565bd08c43
Issue-1824: added flags and different auth types support (#2287)
* vmalert/notifier: added flags and different auth types support

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-03-10 13:09:12 +02:00
Bastien Dronneau
8b21f40217
docs(vmalert): typo in path (#2278) 2022-03-05 22:35:10 +02:00
Denys Holius
1685e181ae
Added minimal supported version of AlertManager (#2237)
* added minimal supported version of supported AlertManager

* docs: `make docs-sync`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-22 20:07:04 +02:00
Roman Khavronenko
69d1893f4c
Consul SD - update services on the watcher's start (#2202)
* lib/discovery/consul: update services on the watcher's start

Previously, watcher's start was only initing goroutines for discovery
but not waiting for the first iteration to end. It means first Consul
discovery wasn't returning discovered targets until the next iteration.

The change makes the watcher's start blocking until we get first discovery
iteration done and all registries updated.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: remove workarounds for consul SD

Now when consul SD lib properly updates services
on the first start, we don't need workarounds in vmalert.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/discovery/consul: update after review

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 15:32:45 +02:00
Aliaksandr Valialkin
ee5da826e9
docs: update -help output for VictoriaMetrics components 2022-02-15 21:08:22 +02:00
hagen1778
2efa46a11c vmalert: support $externalLabels and $externalURL in templates
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2193
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-15 17:33:52 +03:00
Nikolay
75e84144c7
adds release build for macos darwin amd64 and arm64 (#2185)
* adds release build for macos darwin amd64 and arm64

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1896
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1851

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-14 17:28:56 +02:00
Roman Khavronenko
e3adcbec6e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:00 +02:00
hagen1778
4e722c459b vmalert: fix bug with relative links in UI
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2167
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-09 12:18:39 +03:00
Aliaksandr Valialkin
9bb60ab00f
lib/promscrape: set -promscrape.config.strictParse to true by default
This allows detecting long-living silent errors in -promscrape.config
2022-02-08 15:41:43 +02:00
Aliaksandr Valialkin
ba1b3b8ef2
docs: cross-link downsampling docs from deduplication and vmalert docs 2022-02-04 11:57:05 +02:00
Aliaksandr Valialkin
6530bcedec
docs: updates after 5da71eb685
* Mention about the ability to configure vmalert notifiers via files in docs/CHANGELOG.md
* Mention about the ability to use Consul service discovery for vmalert notifiers in docs/CHANGELOG.md
* Run `make docs-sync` in order to sync app/vmalert/README.md to docs/vmalert.md
2022-02-02 23:26:18 +02:00
hagen1778
55e3bbd4cc vmalert: add support of -notifier.basicAuth.passwordFile flag for notifiers
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1567

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-02 18:58:54 +03:00
hagen1778
f57982eddc vmalert: remove trailing slash for static notifier addresses
This would make addresses `http://localhost:9093` and `http://localhost:9093/`
both to result into `http://localhost:9093/api/v2/alerts`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-02 18:58:17 +03:00
Roman Khavronenko
5da71eb685
vmalert: support configuration file for notifiers (#2127)
vmalert: support configuration file for notifiers

* vmalert notifiers now can be configured via file
see https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file
* add support of Consul service discovery for notifiers config
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947
* add UI section for currently loaded/discovered notifiers
* deprecate `-rule.configCheckInterval` in favour of `-configCheckInterval`
* add ability to suppress logs for duplicated targets for notifiers discovery
* change behaviour of `vmalert_alerts_send_errors_total` - it now accounts
for failed alerts, not HTTP calls.
2022-02-02 14:11:41 +02:00
Aliaksandr Valialkin
831b93a755
app/vmalert: add parseDuration function in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/8817
2022-01-13 23:30:41 +02:00
Aliaksandr Valialkin
80f966b80c
app/vmalert: add stripPort template function in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/10002
2022-01-13 22:53:42 +02:00
Andrey Afoninsky
77bfa8181d
chore: add vmalert_remotewrite_total metric (#2040)
Co-authored-by: Andrey Afoninsky <andrey.afoninsky@booking.com>
2022-01-07 16:15:34 +02:00
Denys Holius
ae89b4e818
Old links replaced for newest (#2033)
* replaced old links to the website

* fixed deletion main README.md file

* fix: added docs files after docs-sync
2022-01-05 16:30:13 +02:00
Roman Khavronenko
9bb7905d26
vmalert: check if remoteWrite is configured for replay mode (#1990)
* vmalert: check if remoteWrite is configured for replay mode

The purpose of `replay` mode is to backfill results of recording
or alerting rules. So `remoteWrite.url` should be required.
Otherwise, process can fail on attempt to send data.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update app/vmalert/main.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-21 20:25:47 +02:00
Roman Khavronenko
6814cc6809
vmalert: always convert step value to seconds for better compatibility (#1955)
When using `vmalert` with older Prometheus versions, the passed
`step=2m` may be parsed by Prometheus with an err: "cannot parse \"2m0s\" to a valid duration".
In order to improve compatibility vmalert will always convert step duration to seconds.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1943
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-17 15:26:34 +02:00
Roman Khavronenko
2851709745
vmalert: update the order of service labels attaching (#1922)
Service labels like `alertname` or `alertgroup` were attached
after template expanding for `labels` section. Because of this,
labels `alertname` or `alertgroup` weren't available for templating
in `labels` section of alert's definition.
This commit changes the order of labels attaching and adds a test
for verifying these labels availability.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1921
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-10 12:10:26 +02:00
Aliaksandr Valialkin
896fa9bb7c
app/vmalert/config: sort extra_filter labels before passing them to query args in order to get consistent order of query args across runs
This fixes TestGroupParams test - see https://github.com/VictoriaMetrics/VictoriaMetrics/runs/4432510244?check_suite_focus=true#step:5:288
2021-12-08 13:02:49 +02:00
Roman Khavronenko
0afd14a14a
vmalert: introduce additional HTTP URL params per-group configuration (#1892)
* vmalert: introduce additional HTTP URL params per-group configuration

The new group field `params` allows to configure custom HTTP URL params
per each group. These params will be applied to every request before
executing rule's expression. Hot config reload is also supported.

Field `extra_filter_labels` was deprecated in favour of `params` field.
vmalert will print deprecation log message if config file contains
the deprecated field.

`params` fields are supported by both Prometheus and Graphite datasource types.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: provide more examples for `params` field

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: set higher priority for `params` setting

If there would be a conflict between URL params set in `datasource.url` flag
and params in group definition the latter will have higher priority.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 14:45:08 +02:00
Roman Khavronenko
e5b451a66a
ci: bump go version to 1.17 (#1895)
The bump was required for `vmalert` package.
`vmalert` docs now also contain an updated description.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 14:42:25 +02:00
Roman Khavronenko
d052c8c81e
vmalert: adjust topologies docs in README (#1893)
Commit changes images width and order in topologies section
for better readability.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 11:27:46 +03:00
Roman Khavronenko
866b6a842b
Vmalert docs upd (#1890)
* vmalert: add topology examples in docs

* vmalert: docs typo fix
2021-12-01 18:33:06 +03:00
Roman Khavronenko
40fcf667b0
vmalert: continue to print errors for bad config during hot reload (#1871)
Previously, vmalert would print an err message and set vmalert_config_last_reload_successful=0
only once during a hot reload of a bad config. Such behaviour may result into non noticed
event of a bad config reload attempt

Now, it continues to print error messages and keep vmalert_config_last_reload_successful state
until successful attempt will be made or config state will be rolled back to prev state.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-11-30 01:23:49 +02:00
Aliaksandr Valialkin
10bd8b1d86
docs/CHANGELOG.md: document 852a895b70 2021-11-30 01:22:37 +02:00
Roman Khavronenko
852a895b70
vmalert: make notifier.Addr optional (#1870)
For a long time notifier.Addr flag was required. The assumption was that vmalert will
be always used for alerting. However, practice shows that some users need only
recording rules. In this case, requirement of notifier.Addr is ambigious.

The change verifies if loaded config contains recording or alerting rules and
if there are corresponding flags set. This is true for initial config load
and hot reload.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-11-30 01:18:48 +02:00
Aliaksandr Valialkin
fc534a1e7f
app/vmalert/README.md: sync with docs/vmalert.md
This is a follow-up after d8c70903ec
2021-11-17 00:56:04 +02:00
Aliaksandr Valialkin
e5ac9d8e57
all: consistently return application/json content-type without charset=utf-8
The `application/json` content-type has utf-8 encoding by default.
See https://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897
2021-11-09 18:04:44 +02:00
Aliaksandr Valialkin
5046efb94b
docs/vmalert.md: improve wording in Multitenancy chapter 2021-11-09 14:19:52 +02:00
Aliaksandr Valialkin
a67518fc6d
docs: mention that graphs on the official dashboards contain useful hints 2021-11-08 19:54:10 +02:00
Aliaksandr Valialkin
34b5414ba8
app/{vmalert,vmbackup}/README.md: sync with docs after the commit 47d1612bf8 2021-11-05 20:45:38 +02:00
Aliaksandr Valialkin
237885e0d2
docs/vmalert.md: document the addition of -defaultTenant.prometheus and -defaultTenant.graphite command-line options to enterprise version of vmalert 2021-11-05 20:04:09 +02:00
Aliaksandr Valialkin
24dce03aaa
app/vmalert/datasource: use plain string literals instead of constants
This removes the unneeded level of indirection and improves code readability.

The "prometheus" and "graphite" constants aren't going to change in the future, so there is no sense in hiding them behind constants.
2021-11-05 19:57:47 +02:00
Aliaksandr Valialkin
bf814320b0
app/vmalert: remove rule.type config, since it doesnt play well with the upcoming default tenants for -clusterMode
It is better from the consistency point of view to set up rule types at group level where tenant config is set up.
2021-11-05 19:52:32 +02:00
Aliaksandr Valialkin
cbfc7b7c92
app/{vminsert,vmagent}: hide passwords and auth tokens by default at /config page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1764
2021-11-05 14:41:16 +02:00
Aliaksandr Valialkin
6608705652
app/{vmalert,vmagent}: improve the distribution of scrape offsets among targets / rules
Previously only the lower part of 64-bit hash was used for calculating the offset.
This may give uneven distribution in some cases. So let's use all the available 64 bits from the hash
for calculating the offset.
2021-10-27 19:59:16 +03:00