* vmalert: add support of `resendDelay` flag for alerts
Co-authored-by: dmitryk-dk <dmitry.kozlov@brightlocal.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* vmalert: add support for `sortByLabel` template function
* vmalert: update API according to Prometheus conformance program
The changes to the API, field names and URL path has been made
according to the Prometheus specification for `alert_generator`
https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md
* vmalert: fix the timestamp of the evaluated rules
The timestamp used for alert's `EndsAt` was calculated
before sending the notification. While the correct way
is to use the timestamp taken right before rules evaluation.
* vmalert: add `-datasource.queryTimeAlignment` flag
The flag is supposed to provide ability to disable `time`
param alignment when executing rules. By default, this flag
is enabled, so it remains backward compatible.
The flag was introduced to achieve better compatibility
with Prometheus behaviour according to https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/discovery/consul: update services on the watcher's start
Previously, watcher's start was only initing goroutines for discovery
but not waiting for the first iteration to end. It means first Consul
discovery wasn't returning discovered targets until the next iteration.
The change makes the watcher's start blocking until we get first discovery
iteration done and all registries updated.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: remove workarounds for consul SD
Now when consul SD lib properly updates services
on the first start, we don't need workarounds in vmalert.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/discovery/consul: update after review
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/promscrape: support prometheus-like duration in scrape configs
The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/blockcache: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/promscrape: support prometheus-like duration in scrape configs
* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
* wip
* docs/CHANGELOG.md: document the feature
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* Mention about the ability to configure vmalert notifiers via files in docs/CHANGELOG.md
* Mention about the ability to use Consul service discovery for vmalert notifiers in docs/CHANGELOG.md
* Run `make docs-sync` in order to sync app/vmalert/README.md to docs/vmalert.md
vmalert: support configuration file for notifiers
* vmalert notifiers now can be configured via file
see https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file
* add support of Consul service discovery for notifiers config
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947
* add UI section for currently loaded/discovered notifiers
* deprecate `-rule.configCheckInterval` in favour of `-configCheckInterval`
* add ability to suppress logs for duplicated targets for notifiers discovery
* change behaviour of `vmalert_alerts_send_errors_total` - it now accounts
for failed alerts, not HTTP calls.
* vmalert: check if remoteWrite is configured for replay mode
The purpose of `replay` mode is to backfill results of recording
or alerting rules. So `remoteWrite.url` should be required.
Otherwise, process can fail on attempt to send data.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update app/vmalert/main.go
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
When using `vmalert` with older Prometheus versions, the passed
`step=2m` may be parsed by Prometheus with an err: "cannot parse \"2m0s\" to a valid duration".
In order to improve compatibility vmalert will always convert step duration to seconds.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1943
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Service labels like `alertname` or `alertgroup` were attached
after template expanding for `labels` section. Because of this,
labels `alertname` or `alertgroup` weren't available for templating
in `labels` section of alert's definition.
This commit changes the order of labels attaching and adds a test
for verifying these labels availability.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1921
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: introduce additional HTTP URL params per-group configuration
The new group field `params` allows to configure custom HTTP URL params
per each group. These params will be applied to every request before
executing rule's expression. Hot config reload is also supported.
Field `extra_filter_labels` was deprecated in favour of `params` field.
vmalert will print deprecation log message if config file contains
the deprecated field.
`params` fields are supported by both Prometheus and Graphite datasource types.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: provide more examples for `params` field
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: set higher priority for `params` setting
If there would be a conflict between URL params set in `datasource.url` flag
and params in group definition the latter will have higher priority.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The bump was required for `vmalert` package.
`vmalert` docs now also contain an updated description.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, vmalert would print an err message and set vmalert_config_last_reload_successful=0
only once during a hot reload of a bad config. Such behaviour may result into non noticed
event of a bad config reload attempt
Now, it continues to print error messages and keep vmalert_config_last_reload_successful state
until successful attempt will be made or config state will be rolled back to prev state.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
For a long time notifier.Addr flag was required. The assumption was that vmalert will
be always used for alerting. However, practice shows that some users need only
recording rules. In this case, requirement of notifier.Addr is ambigious.
The change verifies if loaded config contains recording or alerting rules and
if there are corresponding flags set. This is true for initial config load
and hot reload.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This removes the unneeded level of indirection and improves code readability.
The "prometheus" and "graphite" constants aren't going to change in the future, so there is no sense in hiding them behind constants.
Previously only the lower part of 64-bit hash was used for calculating the offset.
This may give uneven distribution in some cases. So let's use all the available 64 bits from the hash
for calculating the offset.
Prometheus allows to have groups with no rules, so we should support
it in vmalert as well for compatibility reasons.
It is also allowed to hot-reload empty groups by adding or removing rules.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, ID for alert entity was generated without alertname or groupname.
This led to collision, when multiple alerting rules within the same group
producing same labelsets. E.g. expr: `sum(metric1) by (job) > 0` and
expr: `sum(metric2) by (job) > 0` could result into same labelset `job: "job"`.
The issue affects only UI and Web API parts of vmalert, because alert ID is used
only for displaying and finding active alerts. It does not affect state restore
procedure, since this label was added right before pushing to remote storage.
The change now adds all extra labels right after receiving response from the datasource.
And removes adding extra labels before pushing to remote storage.
Additionally, change introduces a new flag `Restored` which will be displayed in UI
for alerts which have been restored from remote storage on restart.
Commit fixes potential race condition when group update
and generating of ID() happens simultaneously.
Signed-off-by: hagen1778 <roman@victoriametrics.com>