github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Roman Khavronenko	a086e48964	vmalert: remove notions of vmalert being compatible with VM only (#2954) vmalert can be successfully used with datasources compatible with Prometheus HTTP API. So we remove comments or notes in Readme which say the opposite. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-08-08 09:45:21 +02:00
Roman Khavronenko	2914ce5ca5	vmalert: remove dependency on datasource pkg from config (#2905) * vmalert: remove dependency on datasource pkg from config Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-07-22 10:44:55 +02:00
Roman Khavronenko	88edb3f6cf	vmalert: allow configuring custom headers per group (#2901) See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2860 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-07-21 15:59:55 +02:00
Roman Khavronenko	45f20ad1aa	vmalert: make `__name__` available for templating in alerts (#2783) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-06-27 09:57:56 +02:00
Roman Khavronenko	48a60eb593	vmalert: followup for `76f05f8670` (#2706) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-06-09 08:58:25 +02:00
Howie	76f05f8670	feat: rule limit (#2676) vmalert: support `limit` param in groups definition `limit` param limits number of time series samples produced by a single rule during execution. On reaching the limit, the rule will return an error. Signed-off-by: lihaowei <haoweili35@gmail.com>	2022-06-09 08:21:30 +02:00
Andrii Chubatiuk	a531a96193	added reusable templates support (#2532) Signed-off-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>	2022-05-14 11:38:44 +02:00
Roman Khavronenko	e9fa363480	Vmalert fix bugs in alerting evaluation (#2557) * vmalert: calculate time for firing alert based on the given timestamp Previously, current time was used for checking the `firing` threshold. This is not correct, since alerts are evaluated at specific timestamps. Hence, this specific timestamp is supposed to be used in the calculation. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: properly calculate evaluation timestamp for rules Timestamp for rules evaluation should be calculated after the artificial delay for groups start. Otherwise, evaluation timestamp can fall back too far in time. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-05-09 10:11:06 +02:00
Roman Khavronenko	2b59fff526	vmalert: fix labels and annotations processing for alerts (#2403) To improve compatibility with Prometheus alerting the order of templates processing has changed. Before, vmalert did all labels processing beforehand. It meant all extra labels (such as `alertname`, `alertgroup` or rule labels) were available in templating. All collisions were resolved in favour of extra labels. In Prometheus, only labels from the received metric are available in templating, so no collisions are possible. This change makes vmalert's behaviour similar to Prometheus. For example, consider alerting rule which is triggered by time series with `alertname` label. In vmalert, this label would be overridden by alerting rule's name everywhere: for alert labels, for annotations, etc. In Prometheus, it would be overridden for alert's labels only, but in annotations the original label value would be available. See more details here https://github.com/prometheus/compliance/issues/80 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-04-06 20:24:45 +02:00
Roman Khavronenko	0989649ad0	Vmalert compliance 2 (#2340) * vmalert: split alert's `Start` field into `ActiveAt` and `Start` The `ActiveAt` field identifies when alert becomes active for rules with `for > 0`. Previously, this value was stored in field `Start`. The field `Start` now identifies the moment alert became `FIRING`. The split is needed in order to distinguish these two moments in the API responses for alerts. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: support specific moment of time for rules evaluation The Querier interface was extended to accept a new argument used as a timestamp at which evaluation should be made. It is needed to align rules execution time within the group. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: mark disappeared series as stale Series generated by alerting rules, which were sent to remote write, now will be marked as stale if they disappear on the next evaluation. This would make ALERTS and ALERTS_FOR_TIME series more precise. Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: evaluate rules at fixed timestamp Before, time at which rules were evaluated was calculated right before rule execution. The change makes sure that timestamp is calculated only once per evaluation round and all rules are using the same timestamp. It also updates the logic of resending of already resolved alert notification. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: allow overriding `alertname` label value if it is present in response Previously, `alertname` was always equal to the Alerting Rule name. Now, its value can be overridden if series in response contain a different value for this label. The change is needed for improving compatibility with Prometheus. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: align rules evaluation in time Now, evaluation timestamp for rules evaluates as if there was no delay in rules evaluation. It means that rules will be evaluated at fixed timestamps+group_interval. This way provides more consistent evaluation results and improves compatibility with Prometheus. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: add metric for missed iterations New metric `vmalert_iteration_missed_total` will show whether rules evaluation round was missed. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: reduce delay before the initial rule evaluation in group Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: rollback alertname override According to the spec: ``` The alert name from the alerting rule (HighRequestLatency from the example above) MUST be added to the labels of the alert with the label name as alertname. It MUST override any existing alertname label. ``` https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-3 Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: throw err immediately on dedup detection ``` The execution of an alerting rule MUST error out immediately and MUST NOT send any alerts or add samples to samples receiver if there is more than one alert with the same labels ``` https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-4 Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: cleanup Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: use strings builder to reduce allocs Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-03-29 15:09:07 +02:00
Dmytro Kozlov	11ae1ae924	Added resendDelay for alerts (#2296) * vmalert: add support of `resendDelay` flag for alerts Co-authored-by: dmitryk-dk <dmitry.kozlov@brightlocal.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2022-03-16 15:26:33 +00:00
Roman Khavronenko	fb6eab03a2	Vmalert compliance improvements (#2320) * vmalert: add support for `sortByLabel` template function * vmalert: update API according to Prometheus conformance program The changes to the API, field names and URL path have been made according to the Prometheus specification for `alert_generator` https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md * vmalert: fix the timestamp of the evaluated rules The timestamp used for alert's `EndsAt` was calculated before sending the notification, while the correct way is to use the timestamp taken right before rules evaluation. * vmalert: add `-datasource.queryTimeAlignment` flag The flag is supposed to provide the ability to disable `time` param alignment when executing rules. By default, this flag is enabled, so it remains backward compatible. The flag was introduced to achieve better compatibility with Prometheus behaviour according to https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-03-15 11:54:53 +00:00
Roman Khavronenko	5da71eb685	vmalert: support configuration file for notifiers (#2127) vmalert: support configuration file for notifiers * vmalert notifiers now can be configured via file see https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file * add support of Consul service discovery for notifiers config see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947 * add UI section for currently loaded/discovered notifiers * deprecate `-rule.configCheckInterval` in favour of `-configCheckInterval` * add ability to suppress logs for duplicated targets for notifiers discovery * change behaviour of `vmalert_alerts_send_errors_total` - it now accounts for failed alerts, not HTTP calls.	2022-02-02 14:11:41 +02:00
Roman Khavronenko	2851709745	vmalert: update the order of service labels attaching (#1922) Service labels like `alertname` or `alertgroup` were attached after template expanding for `labels` section. Because of this, labels `alertname` or `alertgroup` weren't available for templating in `labels` section of alert's definition. This commit changes the order of labels attaching and adds a test for verifying these labels availability. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1921 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2021-12-10 12:10:26 +02:00
Roman Khavronenko	0afd14a14a	vmalert: introduce additional HTTP URL params per-group configuration (#1892) * vmalert: introduce additional HTTP URL params per-group configuration The new group field `params` allows configuring custom HTTP URL params for each group. These params will be applied to every request before executing rule's expression. Hot config reload is also supported. Field `extra_filter_labels` was deprecated in favour of `params` field. vmalert will print deprecation log message if config file contains the deprecated field. `params` fields are supported by both Prometheus and Graphite datasource types. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: provide more examples for `params` field Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: set higher priority for `params` setting If there is a conflict between URL params set in `datasource.url` flag and params in group definition, the latter will have higher priority. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2021-12-02 14:45:08 +02:00
Aliaksandr Valialkin	bf814320b0	app/vmalert: remove `rule.type` config, since it doesn't play well with the upcoming default tenants for -clusterMode It is better from the consistency point of view to set up rule types at group level where tenant config is set up.	2021-11-05 19:52:32 +02:00
Roman Khavronenko	43a7984cd8	vmalert: correctly calculate alert ID including extra labels (#1734) Previously, ID for alert entity was generated without alertname or groupname. This led to collisions when multiple alerting rules within the same group produced the same labelsets. E.g. expr: `sum(metric1) by (job) > 0` and expr: `sum(metric2) by (job) > 0` could result in the same labelset `job: "job"`. The issue affects only UI and Web API parts of vmalert, because alert ID is used only for displaying and finding active alerts. It does not affect state restore procedure, since this label was added right before pushing to remote storage. The change now adds all extra labels right after receiving response from the datasource, and removes adding extra labels before pushing to remote storage. Additionally, change introduces a new flag `Restored` which will be displayed in UI for alerts which have been restored from remote storage on restart.	2021-10-22 12:30:38 +03:00
Roman Khavronenko	8df3c569c7	vmalert: add Source link to alerts UI (#1701) The source link is controlled by `external.url` and `external.alert.source` flags, in the same way as for alertmanager notifications. The source link is added to Alerts list view, and specific Alert view.	2021-10-13 15:25:11 +03:00
Roman Khavronenko	21f022e5f0	vmalert: add initial UI implementation (#1602) New UI pages: / - welcome page with API handlers list; /groups - list of all rules per group; /alerts - list of all active alerts; /groupID/alertID/status - status of the active alert;	2021-09-07 22:39:22 +03:00
Roman Khavronenko	9ee3d0378f	vmalert: add flag `disableAlertgroupLabel` for disabling extra label added to series (#1534) The new label added in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611 may negatively impact deduplication in Alertmanager. The new flag is supposed to give an option to disable adding this label. To enable the flag, just add `-disableAlertgroupLabel` to binary execution command. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1532	2021-08-21 20:08:55 +03:00
Roman Khavronenko	7416fdaa8b	vmalert: expose new metrics for tracking number of produced samples during last evaluation (#1518) * vmalert: expose new metrics for tracking number of produced samples during last evaluation Two new metrics were added to track the number of samples produced during the last evaluation: * vmalert_recording_rules_last_evaluation_samples * vmalert_alerting_rules_last_evaluation_samples The gauge type is used to remain consistent with Prometheus metric `prometheus_rule_group_last_evaluation_samples` which is on the group level. However, the counter type was considered as well. Two metrics instead of one are used to make it easier to separate recording and alerting rules. It is likely that the number of samples produced by recording rules is more important, so people will refer to it more frequently. The expected usage of the new metric is the following: ``` - alert: RecordingRuleReturnsEmptyResults expr: sum(vmalert_recording_rules_last_evaluation_samples) by(recording) < 1 annotations: summary: Recording rule {{$labels.recording}} returns empty results. Please verify expression correctness. ``` Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1494 * vmalert: rename `vmalert_alerts_error` to `vmalert_alerting_rules_error` to remain consistent with recording rules metrics	2021-08-05 09:59:46 +03:00
Roman Khavronenko	2a259ef5e7	vmalert: support rules backfilling (aka `replay`) (#1358) * vmalert: support rules backfilling (aka `replay`) vmalert can `replay` configured rules in the past and backfill results via remote write protocol. It supports MetricsQL/PromQL storage as data source, and can backfill data to remote write compatible storage. Supports recording and alerting rules `replay`. See more details in README. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836 * vmalert: review fixes * vmalert: readme fixes	2021-06-09 12:20:38 +03:00
Roman Khavronenko	84cc0513e1	vmalert: support `extra_filter_labels` setting per-group (#1319) The new setting `extra_filter_labels` may be assigned to a group. If it is, then all rules within the group will automatically filter for configured labels. The feature is well-described here https://docs.victoriametrics.com#prometheus-querying-api-enhancements New setting is compatible only with VM datasource.	2021-05-23 00:26:01 +03:00
Roman Khavronenko	4247168a2d	vmalert: fix error when rule didn't start if restore failed (#1279) Previously, `startGroup` could exit on restore errors despite the `remoteRead.ignoreRestoreErrors` flag value. Now vmalert checks the flag value before deciding whether to return error or just log it.	2021-05-10 11:06:31 +03:00
Aliaksandr Valialkin	237e9f9fd7	app/vmalert: add missing comment for ErrStateRestore	2021-05-08 15:59:35 +03:00
Roman Khavronenko	9cdd4696fe	vmalert: add flag to control behaviour on startup for state restore errors (#1265) Alerting rules now can return specific error type ErrStateRestore to indicate whether restore state procedure failed. Such errors were returned and logged before as well. But now user can specify whether to just log these errors (remoteRead.ignoreRestoreErrors=true) or to stop the process (remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't ready yet to serve queries from vmalert and it needs to wait. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252	2021-05-05 10:07:19 +03:00
Roman Khavronenko	f3a048288e	Vmalert: adjust `time` param for datasource queries according to `evaluationInterval` (#1257) * Simplify arguments list for fn `queryDataSource` to improve readability * vmalert: adjust `time` param according to rule evaluation interval With this change, vmalert will start to use rule's evaluation interval for truncating the `time` param. This is mostly needed to produce consistent time series with timestamps unaffected by vmalert start time. Now, timestamp becomes predictable. Additionally, adjustment is similar to what Grafana does for plotting range graphs. Hence, recording rule series and recording rule expression plotted in Grafana are supposed to become similar in most cases.	2021-04-30 09:46:03 +03:00
Nikolay	15609ee447	changes vmalert Querier with per rule querier (#1249) * changes vmalert Querier with per rule querier It allows changing some parameters based on rule settings, for instance alert type, tenant for cluster version or even endpoint URL.	2021-04-28 21:41:15 +01:00
Nikolay	195341a7cf	Graphite vmalert wip (#112) * init implementation for graphite alerts * adds graphite support for vmalert * small fix * changes vmalert graphite api with type * updates tests * small fix * fixes graphite parse * Fixes graphite from time	2021-02-01 15:05:32 +02:00
Roman Khavronenko	404cbd1522	vmalert-974: fix order for labels templating (#975) The change fixes bug caused by `3adf8c5a6f`. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/974	2020-12-19 14:10:59 +02:00
Roman Khavronenko	6247884057	vmalert: add function "query", "first" and "value" to alert templates functions (#960) The commit adds support for template functions `query`, `first` and `value`. The function `query` executes a MetricsQL query for active alerts. In vmalert we update templates on every evaluation for active alerts to keep them up to date. With `query` func it may become a perf issue since it will fire a query on every execution. We should keep it in mind for now. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/539	2020-12-14 20:11:45 +02:00
Roman Khavronenko	3adf8c5a6f	vmalert: explicitly set extra labels to alert entities (#886) The previous implementation treated extra labels (global and rule labels) as separate label set to returned time series labels. Hence, time series always contained only original labels and alert ID was generated from sorted labels key-values. Extra labels didn't affect the generated ID and were applied on the following actions: - templating for Summary and Annotations; - persisting state via remote write; - restoring state via remote read. Such behaviour caused difficulties on restore procedure because extra labels had to be dropped before checking the alert ID, but that did not always work. Consider the case when expression returns the following time series `up{job="foo"}` and rule has extra label `job=bar`. This would mean that restored alert ID will be always different to the real time series because of collision. To solve the situation extra labels are now always applied beforehand and `vmalert` doesn't store original labels anymore. However, this could result in a new error situation. Consider the case when expression returns two time series `up{job="foo"}` and `up{job="baz"}`, while rule has extra label `job=bar`. In such case, applying extra labels will result in two identical time series and `vmalert` will return error: `result contains metrics with the same labelset after applying rule labels` https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870	2020-11-10 00:27:32 +02:00
Roman Khavronenko	f0bdc5716e	vmalert: skip automatically added labels on alerts restore (#871) Label `alertgroup` was introduced in #611 and automatically added to generated time series. By mistake, this new label wasn't correctly purged on restore event and affected alert's ID uniqueness. This commit removes `alertgroup` label in restore function. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870	2020-10-30 08:18:20 +00:00
Aliaksandr Valialkin	f4e8687c88	app/vmalert: accept days, weeks and years in `for:` part of config like Prometheus does Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817	2020-10-08 20:13:15 +03:00
Roman Khavronenko	6ad6480400	vmalert: add Group name as label to generated alerts and timeseries (#761) Solves #611	2020-09-11 20:52:56 +01:00
Roman Khavronenko	0be5b09fb4	app/vmalert: extend metrics set exported by `vmalert` #573 (#654) * app/vmalert: extend metrics set exported by `vmalert` #573 New metrics were added to improve observability: + vmalert_alerts_pending{alertname, group} - number of pending alerts per group per alert; + vmalert_alerts_acitve{alertname, group} - number of active alerts per group per alert; + vmalert_alerts_error{alertname, group} - is 1 if alertname ended up with error during prev execution, is 0 if no errors happened; + vmalert_recording_rules_error{recording, group} - is 1 if recording rule ended up with error during prev execution, is 0 if no errors happened; * vmalert_iteration_total{group, file} - now contains group and file name labels. This should improve control over specific groups; * vmalert_iteration_duration_seconds{group, file} - now contains group and file name labels. This should improve control over specific groups; Some collisions for alerts and recording rules are possible, because neither group name nor alert/recording rule name are unique for compatibility reasons. Commit contains list of TODOs for Unregistering metrics since groups and rules are ephemeral and could be removed without application restart. In order to unlock Unregistering feature corresponding PR was filed - https://github.com/VictoriaMetrics/metrics/pull/13 * app/vmalert: extend metrics set exported by `vmalert` #573 The changes are the following: * add an ID label to rules metrics, since `name` collisions within one group are a common case - see the k8s example alerts; * supports metrics unregistering on rule updates. Consider the case when one rule was added or removed from the group, or the whole group was added or removed. The change depends on https://github.com/VictoriaMetrics/metrics/pull/16 where race condition for Unregister method was fixed.	2020-08-09 09:41:29 +03:00
Roman Khavronenko	2f1e7298ce	app/vmalert: support `external.label` to specify global labelset for all rules #622 (#652) `external.label` flag is supposed to help distinguish the source of alerting or recording rules in situations when more than one `vmalert` runs for the same datasource or AlertManager.	2020-07-28 14:20:31 +03:00
Aliaksandr Valialkin	d5dddb0953	all: use %w instead of %s for wrapping errors in `fmt.Errorf` This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode. See https://blog.golang.org/go1.13-errors for details.	2020-06-30 23:05:11 +03:00
Roman Khavronenko	e91d758831	vmalert-537: allow name duplication for rules within one group. (#559) Uniqueness of rule is now defined by combination of its name, expression and labels. The hash of the combination is now used as rule ID and identifies rule within the group. Set of rules from coreos/kube-prometheus was added for testing purposes to verify compatibility. The check also showed that `vmalert` doesn't support `query` template function that was mentioned as limitation in README.	2020-06-15 20:15:47 +01:00
Aliaksandr Valialkin	d2f30e8d79	app/vmalert: fix comment for UpdateWith exported methods	2020-06-01 14:35:32 +03:00
Roman Khavronenko	270552fde4	vmalert: Add recording rules support. (#519) * vmalert: Add recording rules support. Recording rules support required additional service refactoring since it wasn't planned to support them from the very beginning. The list of changes is the following: * new entity RecordingRule was added for writing results of MetricsQL expressions into remote storage; * interface Rule now unites both recording and alerting rules; * configuration parser was moved to separate package and now performs more strict validation; * new endpoint for listing all groups and rules in json format was added; * evaluation interval may be set to every particular group; * vmalert: uncomment tests * vmalert: rm outdated TODO * vmalert: fix typos in README	2020-06-01 13:46:37 +03:00

41 commits