* vmalert: add support for the `resendDelay` flag for alerts
Co-authored-by: dmitryk-dk <dmitry.kozlov@brightlocal.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* vmalert: support configuration file for notifiers
* vmalert notifiers can now be configured via a file;
see https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file
* add support for Consul service discovery in the notifiers config;
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947
* add UI section for currently loaded/discovered notifiers
* deprecate `-rule.configCheckInterval` in favour of `-configCheckInterval`
* add the ability to suppress logs about duplicated targets during notifier discovery
* change the behaviour of `vmalert_alerts_send_errors_total`: it now counts
failed alerts, not failed HTTP calls.
Previously, the ID of an alert entity was generated without the alertname or group name.
This led to collisions when multiple alerting rules within the same group
produced the same labelsets. E.g. expr: `sum(metric1) by (job) > 0` and
expr: `sum(metric2) by (job) > 0` could result in the same labelset `job: "job"`.
The issue affects only the UI and Web API parts of vmalert, because the alert ID is used
only for displaying and finding active alerts. It does not affect the state restore
procedure, since these labels were added right before pushing to remote storage.
The change now adds all extra labels right after receiving the response from the datasource
and no longer adds them before pushing to remote storage.
Additionally, the change introduces a new flag `Restored`, which is displayed in the UI
for alerts that have been restored from remote storage on restart.
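A minimal sketch of why including the names fixes the collision: the ID is a hash over the group name, the alert name and the sorted labelset, so two rules producing `job: "job"` no longer share an ID. The `alertID` function is hypothetical, and FNV-1a from Go's `hash/fnv` is an assumed hash choice, not necessarily the one vmalert uses:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// alertID hashes the group name, the alert name and the sorted labelset.
// Including both names makes two rules in one group with equal labels distinct.
func alertID(group, alertname string, labels map[string]string) uint64 {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for a stable hash
	h := fnv.New64a()
	h.Write([]byte(group))
	h.Write([]byte{0xff}) // separator avoids ambiguous concatenations
	h.Write([]byte(alertname))
	for _, k := range keys {
		h.Write([]byte{0xff})
		h.Write([]byte(k))
		h.Write([]byte{0xff})
		h.Write([]byte(labels[k]))
	}
	return h.Sum64()
}

func main() {
	labels := map[string]string{"job": "job"}
	id1 := alertID("group", "Metric1Positive", labels)
	id2 := alertID("group", "Metric2Positive", labels)
	fmt.Println(id1 != id2) // true
}
```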
* vmalert: remove `vmalert_execution_duration_seconds` metric
The summary for the `vmalert_execution_duration_seconds` metric gives no additional
value compared to the `vmalert_iteration_duration_seconds` metric.
* vmalert: update config reload success metric properly
Previously, if an attempt to reload the config was unsuccessful and the config was then
rolled back to the previous version, the metric remained set to 0.
* vmalert: add a Grafana dashboard for an overview of application metrics
* docker: add the vmalert target to the list of scrape targets
* vmalert: extend notifier metrics with addr label
The change adds an `addr` label to the alerts_sent and alerts_send_errors metrics
to identify which exact notifier address is having issues.
A corresponding change was made to the vmalert dashboard.
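A minimal sketch of per-address counters, where a plain map keyed by the full series name stands in for a metrics registry. The full metric name `vmalert_alerts_sent_total` is an assumption based on the `alerts_sent` name above; `incSent` and `incSendErrors` are hypothetical helpers:

```go
package main

import "fmt"

// counters maps a full series name (metric plus labels) to its value,
// standing in for a metrics registry.
var counters = map[string]int{}

// incSent adds n to the sent-alerts counter of the given notifier address.
func incSent(addr string, n int) {
	counters[fmt.Sprintf("vmalert_alerts_sent_total{addr=%q}", addr)] += n
}

// incSendErrors adds n to the send-errors counter of the given notifier address.
func incSendErrors(addr string, n int) {
	counters[fmt.Sprintf("vmalert_alerts_send_errors_total{addr=%q}", addr)] += n
}

func main() {
	incSent("http://alertmanager-1:9093", 3)
	incSendErrors("http://alertmanager-2:9093", 1)
	for name, v := range counters {
		fmt.Println(name, v) // one series per address, so failures are attributable
	}
}
```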
* vmalert: update documentation and docker environment for vmalert's dashboard
Mention the Grafana dashboard in vmalert's README in a new section, #Monitoring.
Update the docker-compose environment to automatically add vmalert's dashboard.
Update the docker-compose README with additional info about services.
* vmalert: support rules backfilling (aka `replay`)
vmalert can `replay` configured rules over a time range in the past
and backfill the results via the remote write protocol.
It supports MetricsQL/PromQL storage as the data source
and can backfill data to any remote write compatible
storage.
Both recording and alerting rules can be replayed. See more
details in the README.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836
* vmalert: review fixes
* vmalert: readme fixes
* replace the vmalert Querier with a per-rule querier
This allows changing some parameters based on rule settings,
for instance the alert type, the tenant for the cluster version, or the event endpoint URL.
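A minimal sketch of the per-rule querier idea: a builder hands every rule its own querier configured with that rule's datasource type and tenant. All names here are hypothetical, and the URL layout (a `/select/<tenant>/...` prefix for the cluster version, `/render` for Graphite) is an assumption for illustration:

```go
package main

import "fmt"

// QuerierParams holds per-rule datasource settings (hypothetical fields).
type QuerierParams struct {
	DataSourceType string // "prometheus" or "graphite"
	Tenant         string // tenant for the cluster version; empty for single-node
}

// Querier is a datasource client bound to one rule's settings.
type Querier struct {
	baseURL string
	params  QuerierParams
}

// QuerierBuilder creates a separate Querier for every rule.
type QuerierBuilder struct{ baseURL string }

// BuildWithParams returns a Querier configured for a specific rule.
func (b QuerierBuilder) BuildWithParams(p QuerierParams) Querier {
	return Querier{baseURL: b.baseURL, params: p}
}

// queryURL returns the URL this querier would call (assumed path layout).
func (q Querier) queryURL() string {
	kind, path := "prometheus", "/api/v1/query"
	if q.params.DataSourceType == "graphite" {
		kind, path = "graphite", "/render"
	}
	if q.params.Tenant != "" {
		return fmt.Sprintf("%s/select/%s/%s%s", q.baseURL, q.params.Tenant, kind, path)
	}
	return q.baseURL + path
}

func main() {
	b := QuerierBuilder{baseURL: "http://vmselect:8481"}
	fmt.Println(b.BuildWithParams(QuerierParams{DataSourceType: "graphite", Tenant: "0"}).queryURL())
	fmt.Println(b.BuildWithParams(QuerierParams{DataSourceType: "prometheus"}).queryURL())
}
```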
* initial implementation for Graphite alerts
* add Graphite support to vmalert
* small fix
* change the vmalert Graphite API to use type
* updates tests
* small fix
* fix Graphite parsing
* fix Graphite `from` time
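A minimal sketch of parsing a Graphite response, assuming the standard Graphite render API JSON format (`/render?format=json`): a list of series, each with a `target` and `datapoints` given as `[value, timestamp]` pairs, where missing values are `null`. The `parseGraphite` function and `sample` type are hypothetical, not vmalert's parser:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// graphiteSeries mirrors one element of the render API JSON response.
type graphiteSeries struct {
	Target     string        `json:"target"`
	Datapoints [][2]*float64 `json:"datapoints"` // [value, unix timestamp]; value may be null
}

// sample is a single parsed datapoint.
type sample struct {
	Value     float64
	Timestamp int64
}

// parseGraphite decodes a render API response and drops null datapoints.
func parseGraphite(body []byte) (map[string][]sample, error) {
	var series []graphiteSeries
	if err := json.Unmarshal(body, &series); err != nil {
		return nil, err
	}
	out := make(map[string][]sample, len(series))
	for _, s := range series {
		for _, dp := range s.Datapoints {
			if dp[0] == nil || dp[1] == nil {
				continue // Graphite reports missing values as null
			}
			out[s.Target] = append(out[s.Target], sample{Value: *dp[0], Timestamp: int64(*dp[1])})
		}
	}
	return out, nil
}

func main() {
	body := []byte(`[{"target":"servers.web1.load","datapoints":[[1.5,1640995200],[null,1640995260],[2,1640995320]]}]`)
	res, err := parseGraphite(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(res["servers.web1.load"])
}
```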
The feature speeds up group rule execution by
evaluating the group's rules concurrently.
The change also updates the README to reflect the configuration
details.
* vmalert: Add recording rules support.
Recording rules support required additional service refactoring, since
supporting them wasn't planned from the very beginning. The list
of changes is as follows:
* a new entity, RecordingRule, was added for writing the results of MetricsQL
expressions into remote storage;
* the Rule interface now unites both recording and alerting rules;
* the configuration parser was moved to a separate package and now performs
stricter validation;
* a new endpoint for listing all groups and rules in JSON format was added;
* the evaluation interval may now be set per group;
* vmalert: uncomment tests
* vmalert: rm outdated TODO
* vmalert: fix typos in README