* vmalert: speed up state restore procedure on start
Alerts state restore procedure has been changed to become asynchronous.
It doesn't block groups start anymore which significantly improves vmalert's startup time.
Instead, state restore is called by each group in their goroutines after the first rules
evaluation.
While previously state restore attempt was made for all loaded alerting rules,
now it is called only for alerts which became active after the first evaluation.
This reduces the amount of API calls to the configured remote read URL.
This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now
and will have no effect if configured.
See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* make lint happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Apply suggestions from code review
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Allow configuring the default number of stored rule's update states in memory
via global `-rule.updateEntriesLimit` command-line flag or per-rule via rule's
`update_entries_limit` configuration param.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously the `quotesEscape` function was escaping only double quotes.
This wasn't enough, since the input string could contain other special chars,
which must be escaped when put inside JSON string. For example, carriage return and line feed chars (\n\r),
backslash char, etc. This led to the following issues, which were improperly fixed:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890 - this issue
was "fixed" by introducing the `crlfEscape` function, which led to unnecessary
complications in user templates, while not fixing various corner cases
such as backslash chars in the input string.
See 1de15ad490
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3139 - this issue
was "fixed" by urlencoding the whole string passed to -external.alert.source
command-line flag. This led to invalid urls, which couldn't be parsed by Grafana.
See 00c838353d
and 4bd0244599
This commit properly encodes the input string passed to `quotesEscape`, so it can be safely embedded inside JSON strings.
This commit deprecates crlfEscape template function and adds the following new template functions:
- strvalue and stripDomain - these functions are supported by Prometheus, so they were added
for compatibility purposes.
- jsonEscape and htmlEscape for converting the input string to valid quoted JSON string
and for html-escaping the input string, so it could be safely embedded as a plaintext
into html.
This commit also documents all supported template functions at https://docs.victoriametrics.com/vmalert.html#template-functions
The deprecated crlfEscape function isn't documented on purpose, since its usefulness is negative in general case.
This reverts commit 00c838353d.
Reason for revert: it incorrectly fixes the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3139 .
Now `-external.alert.source=explore?orgId=1&left=...` is converted to the following invalid url, which cannot be handled by Grafana:
https://grafana.example.com/explore%3ForgId%3D1%26left%3D...
The next commit will contain the correct fix of the issue - the `quotesEscape` function must
properly escape the string, so it could be embedded into JSON string. This function must
properly escape \n\r chars too. In this case the `crlfEscape` function becomes unnecessary.
Actually, the next commit makes the `crlfEscape` function deprecated.
The change is supposed to provide additional flexibility for generating alert's
source link based on label values.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
We switch default alert's source link to redirect user
to vmalert's UI instead of previous JSON object. While it breaks
compatibility, it also supposed to improve user's experience.
The old behavior can be achieved by updating `-external.alert.source`
command-line flag.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Other components, such as `vmagent`, mark these flags as sensitive and
hide them from the `/metrics` endpoint by default. This commit adds
similar handling to the `vmalert` component, hiding them by default, to
prevent logging of secrets inappropriately.
Showing of these values is controlled by an additional flag.
Follow up to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2947
* vmalert: deprecate alert's status link
Deprecate alert's status link `/api/v1/<groupID>/<alertID>/status` in favour of
`api/v1/alerts?group_id=<group_id>&alert_id=<alert_id>"`.
The change was needed for simplifying logic in vmselect for proxying vmalert's requests.
The old alert's status link will be still supported for a few versions but will be removed in the future.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2825
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: fix review comments
Signed-off-by: hagen1778 <roman@victoriametrics.com>
To improve compatibility with Prometheus alerting the order of
templates processing has changed.
Before, vmalert did all labels processing beforehand. It meant
all extra labels (such as `alertname`, `alertgroup` or rule labels)
were available in templating. All collisions were resolved in favour
of extra labels.
In Prometheus, only labels from the received metric are available in
templating, so no collisions are possible.
This change makes vmalert's behaviour similar to Prometheus.
For example, consider alerting rule which is triggered by time series
with `alertname` label. In vmalert, this label would be overriden
by alerting rule's name everywhere: for alert labels, for annotations, etc.
In Prometheus, it would be overriden for alert's labels only, but in annotations
the original label value would be available.
See more details here https://github.com/prometheus/compliance/issues/80
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: add support of `resendDelay` flag for alerts
Co-authored-by: dmitryk-dk <dmitry.kozlov@brightlocal.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
vmalert: support configuration file for notifiers
* vmalert notifiers now can be configured via file
see https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file
* add support of Consul service discovery for notifiers config
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947
* add UI section for currently loaded/discovered notifiers
* deprecate `-rule.configCheckInterval` in favour of `-configCheckInterval`
* add ability to suppress logs for duplicated targets for notifiers discovery
* change behaviour of `vmalert_alerts_send_errors_total` - it now accounts
for failed alerts, not HTTP calls.
* vmalert: check if remoteWrite is configured for replay mode
The purpose of `replay` mode is to backfill results of recording
or alerting rules. So `remoteWrite.url` should be required.
Otherwise, process can fail on attempt to send data.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update app/vmalert/main.go
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* vmalert: introduce additional HTTP URL params per-group configuration
The new group field `params` allows to configure custom HTTP URL params
per each group. These params will be applied to every request before
executing rule's expression. Hot config reload is also supported.
Field `extra_filter_labels` was deprecated in favour of `params` field.
vmalert will print deprecation log message if config file contains
the deprecated field.
`params` fields are supported by both Prometheus and Graphite datasource types.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: provide more examples for `params` field
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: set higher priority for `params` setting
If there would be a conflict between URL params set in `datasource.url` flag
and params in group definition the latter will have higher priority.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously, vmalert would print an err message and set vmalert_config_last_reload_successful=0
only once during a hot reload of a bad config. Such behaviour may result into non noticed
event of a bad config reload attempt
Now, it continues to print error messages and keep vmalert_config_last_reload_successful state
until successful attempt will be made or config state will be rolled back to prev state.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Regression was introduced during code refactoring. It potentially
could lead to situation when SIGHUP signals were ignored while
vmalert was still busy with initing group manager.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The source link is controlled by `external.url` and `external.alert.source`
flags, in the same way as for alertmanager notifications.
The source link is added to Alerts list view, and specific Alert view.
* vmalert: add flag to limit the max value for auto-resovle duration for alerts
The new flag `rule.maxResolveDuration` suppose to limit max value for
alert.End param, which is used by notifiers like Alertmanager for alerts auto resolve.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1586
* vmalert: allow extra GET params in datasource package
ExtraParams will be added as GET params to every HTTP request made by datasource.
The `roundDigits` param, for example, was substituted by corresponding extra param.
* vmalert: add nocache=1 param for replay process
The `nocache=1` param is VictoriaMetrics specific parameter which prevents it
from caching and boundaries aligning for queries. We set it to avoid cache
pollution in `replay` mode and also to avoid unnecessary time range boundaries
alignment.
* vmalert: mention nocache=1 in replay description
* vmalert: fix bug with unused param
* vmalert: remove `vmalert_execution_duration_seconds` metric
The summary for `vmalert_execution_duration_seconds` metric gives no additional
value comparing to `vmalert_iteration_duration_seconds` metric.
* vmalert: update config reload success metric properly
Previously, if there was unsuccessfull attempt to reload config and then
rollback to previous version - the metric remained set to 0.
* vmalert: add Grafana dashboard to overview application metrics
* docker: include vmalert target into list for scraping
* vmalert: extend notifier metrics with addr label
The change adds an `addr` label to metrics for alerts_sent and alerts_send_errors
to identify which exact address is having issues.
The according change was made to vmalert dashboard.
* vmalert: update documentation and docker environment for vmalert's dashboard
Mention Grafana's dashboard in vmalert's README in a new section #Monitoring.
Update docker-compose env to automatically add vmalert's dashboard.
Update docker-compose README with additional info about services.
* vmalert: support rules backfilling (aka `replay`)
vmalert can `replay` configured rules in the past
and backfill results via remote write protocol.
It supports MetricsQL/PromQL storage as data source,
and can backfill data to remote write compatible
storage.
Supports recording and alerting rules `replay`. See more
details in README.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836
* vmalert: review fixes
* vmalert: readme fixes
New flag `-rule.configCheckInterval` defines how often `vmalert` will re-read
config file. If it detects any changes, the config will be reloaded.
This behaviour is turned off by default.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/512
Alerting rules now can return specific error type ErrStateRestore to indicate
whether restore state procedure failed. Such errors were returned and logged
before as well. But now user can specify whether to just log these errors
(remoteRead.ignoreRestoreErrors=true) or to stop the process
(remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't
ready yet to serve queries from vmalert and it needs to wait.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
* changes vmalert Querier with per rule querier
it allows to changes some parametrs based on rule setting
for instance - alert type, tenant for cluster version or event endpoint url.
The commit adds a support for template function `query`,
`first` and `value`. The function `query` executes
a MetricsQL query for active alerts. In vmalert we
update templates on every evaluation for active alerts
to keep them up to date. With `query` func it may become
a perf issue since it will fire a query on every execution.
We should keep it in mind for now.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/539
Description for `-rule` flag uses as example specific chars like asterisks
which could be interpreted wrong by different shells. To avoid this, description
now contains quoted flag values.
See also #708