deployment/alerts: add TooManyMissedIterations alerting rule

The new vmalert rule is intended to detect groups that miss their
evaluations due to slow queries.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
hagen1778 2023-10-31 10:30:05 +01:00
parent 8874b525b7
commit 9866974a53
2 changed files with 14 additions and 0 deletions

@@ -51,6 +51,19 @@ groups:
            produces 0 samples over the last 30min. It might be caused by a misconfiguration
            or incorrect query expression."
      - alert: TooManyMissedIterations
        expr: sum(increase(vmalert_iteration_missed_total[5m])) by(job, instance, group) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "vmalert instance {{ $labels.instance }} is missing rules evaluations"
          description: "vmalert instance {{ $labels.instance }} is missing rules evaluations for group \"{{ $labels.group }}\".
            The group evaluation time takes longer than the configured evaluation interval. This may result in missed
            alerting notifications or recording rules samples. Try increasing evaluation interval or concurrency of
            group \"{{ $labels.group }}\". See https://docs.victoriametrics.com/vmalert.html#groups.
            If rule expressions are taking longer than expected, please see https://docs.victoriametrics.com/Troubleshooting.html#slow-queries."
      - alert: RemoteWriteErrors
        expr: sum(increase(vmalert_remotewrite_errors_total[5m])) by(job, instance) > 0
        for: 15m
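
The `TooManyMissedIterations` description suggests increasing the evaluation interval or the concurrency of the affected group. A minimal sketch of what that could look like in a vmalert rules file is shown here; the group name, the recording rule, and the chosen values are hypothetical, and only the `interval` and `concurrency` group fields come from the vmalert groups documentation linked in the rule.

```yaml
# Illustrative only: "slow-group" and the rule below are assumptions,
# not part of this commit.
groups:
  - name: slow-group
    # Evaluation interval for the group. Raise it when a single
    # evaluation of all rules takes longer than this value.
    interval: 1m
    # How many rules in the group may be evaluated at the same time.
    # A higher value can shorten the total group evaluation time.
    concurrency: 4
    rules:
      - record: job:http_requests:rate5m   # hypothetical recording rule
        expr: sum(rate(http_requests_total[5m])) by (job)
```

Which of the two settings to change depends on the workload: a longer `interval` lowers evaluation frequency, while a higher `concurrency` puts more parallel query load on the datasource.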

@@ -49,6 +49,7 @@ The sandbox cluster installation is running under the constant load generated by
* FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup.html): add `-deleteAllObjectVersions` command-line flag, which can be used for forcing removal of all object versions in remote object storage. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5121) issue and [these docs](https://docs.victoriametrics.com/vmbackup.html#permanent-deletion-of-objects-in-s3-compatible-storages) for the details.
* FEATURE: [Alerting rules for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts): account for `vmauth` component for alerts `ServiceDown` and `TooManyRestarts`.
* FEATURE: [Alerting rules for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts): make `TooHighMemoryUsage` more tolerant to spikes or near-the-threshold states. The change should reduce the number of false positives.
* FEATURE: [Alerting rules for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts): add `TooManyMissedIterations` alerting rule for vmalert to detect groups that miss their evaluations due to slow queries.
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): add support for functions, labels, values in autocomplete. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3006).
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): retain specified time interval when executing a query from `Top Queries`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5097).
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): improve repeated VMUI page load times by enabling caching of static js and css at web browser side according to [these recommendations](https://developer.chrome.com/docs/lighthouse/performance/uses-long-cache-ttl/).