VictoriaMetrics/app
Roman Khavronenko 0be5b09fb4
app/vmalert: extend metrics set exported by vmalert #573 (#654)
* app/vmalert: extend metrics set exported by `vmalert` #573

New metrics were added to improve observability:
+ vmalert_alerts_pending{alertname, group} - number of pending alerts per group
per alert;
+ vmalert_alerts_active{alertname, group} - number of active alerts per group
per alert;
+ vmalert_alerts_error{alertname, group} - is 1 if the alert ended with an error
during the previous execution, 0 otherwise;
+ vmalert_recording_rules_error{recording, group} - is 1 if the recording rule
ended with an error during the previous execution, 0 otherwise;
* vmalert_iteration_total{group, file} - now contains group and file name labels.
This should make it easier to track specific groups;
* vmalert_iteration_duration_seconds{group, file} - now contains group and file
name labels. This should make it easier to track specific groups.

Some collisions between alerts and recording rules are possible, because, for
compatibility reasons, neither the group name nor the alert/recording rule name
is required to be unique.
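
To illustrate the collision described above, here is a minimal sketch. The `metricName` helper and the rule and group names are hypothetical and are not taken from vmalert; the sketch only assumes that a metric is identified by its name plus the `alertname` and `group` labels.

```go
package main

import "fmt"

// metricName renders a metric identifier with alertname and group labels,
// in the style used by the commit message. The helper is hypothetical and
// is not part of vmalert.
func metricName(base, alertname, group string) string {
	return fmt.Sprintf("%s{alertname=%q, group=%q}", base, alertname, group)
}

func main() {
	// Two distinct rules that happen to share a name within one group.
	a := metricName("vmalert_alerts_pending", "HighLatency", "api")
	b := metricName("vmalert_alerts_pending", "HighLatency", "api")
	// The same rule name in a different group.
	c := metricName("vmalert_alerts_pending", "HighLatency", "db")

	fmt.Println(a)
	fmt.Println("same name, same group collides:", a == b)
	fmt.Println("same name, other group collides:", a == c)
}
```

With only these two labels, nothing distinguishes two rules that share both a name and a group, so they would report into the same time series.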

The commit contains a list of TODOs for unregistering metrics, since groups and
rules are ephemeral and can be removed without an application restart. To enable
unregistering, a corresponding PR was filed: https://github.com/VictoriaMetrics/metrics/pull/13

* app/vmalert: extend metrics set exported by `vmalert` #573

The changes are as follows:
* add an ID label to rule metrics, since `name` collisions within one group are
a common case - see the k8s example alerts;
* support unregistering metrics on rule updates, e.g. when a rule is added to or
removed from a group, or a whole group is added or removed.

The change depends on https://github.com/VictoriaMetrics/metrics/pull/16,
where a race condition in the Unregister method was fixed.
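
A rough sketch of the two changes, using only the Go standard library. Everything here is an assumption made for illustration: the `registry` type stands in for a real metrics set, the label is called `id`, and the ID is derived by hashing the rule name and expression. None of this is claimed to match the actual vmalert or `metrics` package code.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// ruleID derives a numeric ID from the rule's name and expression.
// Hashing these two fields is an assumption made for this sketch.
func ruleID(name, expr string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(name))
	h.Write([]byte{0}) // separator so ("ab","c") differs from ("a","bc")
	h.Write([]byte(expr))
	return h.Sum64()
}

// registry is a hypothetical stand-in for a metrics set.
// The mutex guards concurrent register/unregister calls.
type registry struct {
	mu      sync.Mutex
	metrics map[string]float64
}

func newRegistry() *registry {
	return &registry{metrics: make(map[string]float64)}
}

// register adds the metric and reports whether the name was free.
func (r *registry) register(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.metrics[name]; ok {
		return false
	}
	r.metrics[name] = 0
	return true
}

// unregister removes the metric and reports whether it existed.
func (r *registry) unregister(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.metrics[name]
	delete(r.metrics, name)
	return ok
}

// errorMetric builds a metric identifier that includes the rule ID label.
func errorMetric(name, expr, group string) string {
	return fmt.Sprintf("vmalert_alerts_error{alertname=%q, group=%q, id=\"%d\"}",
		name, group, ruleID(name, expr))
}

func main() {
	reg := newRegistry()
	// Same rule name and group, different expressions.
	m1 := errorMetric("HighLatency", "latency > 1", "api")
	m2 := errorMetric("HighLatency", "latency > 5", "api")
	fmt.Println("registered:", reg.register(m1), reg.register(m2))

	// The rule behind m1 is removed on a config reload, then added back.
	fmt.Println("unregistered:", reg.unregister(m1))
	fmt.Println("registered again:", reg.register(m1))
}
```

The ID label keeps same-named rules apart, and removing the entry when a rule disappears lets the same rule be registered again after a later reload.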
2020-08-09 09:41:29 +03:00
..
victoria-metrics app/vmselect/prometheus: do not adjust last points in time series with timestamps exceeding the current time 2020-07-14 12:52:16 +03:00
vmagent app/vmagent: tune http client for sending data to remote storage in order to disable closing keep-alive connections 2020-08-04 21:00:29 +03:00
vmalert app/vmalert: extend metrics set exported by vmalert #573 (#654) 2020-08-09 09:41:29 +03:00
vmauth 401 Unauthorized HTTP error added (#681) 2020-08-09 09:38:41 +03:00
vmbackup all: add missing APP_NAME to vm*-GOARCH builds 2020-07-31 13:42:18 +03:00
vminsert app/{vmagent,vminsert}: properly preserve db tag from query string passed to Influx line protocol query 2020-07-28 21:25:19 +03:00
vmrestore all: add missing APP_NAME to vm*-GOARCH builds 2020-07-31 13:42:18 +03:00
vmselect app/vmselect/promql: properly handle -n^m like Prometheus does 2020-08-07 07:42:18 +03:00
vmstorage app/vmstorage: rename vm_cache_size_entries{type="storage/prefetchedMetricIDs"} to vm_cache_entries{type="storage/prefetchedMetricIDs"} to be consistent with other vm_cache_entries metrics 2020-08-06 16:34:24 +03:00