mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git (synced 2024-11-21 14:44:00 +00:00)
alerts: simplify aggregation of alerting rules
This is a follow-up to 75196d7234.
It updates some of the alerting rules to remove unnecessary aggregations.
It keeps aggregations for expressions that use multiple time series
filters, to make sure their labels match.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
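
The distinction drawn in the commit message can be illustrated with a small rule file. This is only a sketch: the group name and the second alert name are hypothetical, and the expressions are copied from the hunks of this commit. In PromQL-style matching, a comparison between two instant vectors pairs series by their label sets, so when the two sides come from different selectors (here, one side is filtered by a `type` label), aggregating both sides with `sum()` is a simple way to guarantee they line up. A rule with a single selector compared against a scalar has no such constraint.

```yaml
groups:
  - name: example-aggregation-rules   # hypothetical group name
    rules:
      # Single selector compared with a scalar: every series keeps its own
      # labels, so no sum(...) by (...) is needed for the comparison to work.
      - alert: RowsRejectedOnIngestion
        expr: rate(vm_rows_ignored_total[5m]) > 0
        for: 15m
        labels:
          severity: warning

      # Two selectors compared with each other: both sides are aggregated
      # so that their (now empty) label sets are guaranteed to match.
      - alert: ExampleNewSeriesVsCacheEntries   # hypothetical alert name
        expr: |
          sum(increase(vm_new_timeseries_created_total[24h]))
          >
          (sum(vm_cache_entries{type="storage/hour_metric_ids"}) * 3)
        for: 15m
        labels:
          severity: warning
```

One consequence worth keeping in mind: an unaggregated rule fires once per matching series and carries all of that series' labels into the alert, whereas an aggregated rule carries only the labels that survive the aggregation.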
parent 75196d7234
commit 8fb68152e6
4 changed files with 13 additions and 13 deletions
@@ -81,7 +81,7 @@ groups:
       Possible reasons for errors are misconfiguration, overload, network blips or unreachable components."
 
 - alert: RowsRejectedOnIngestion
-  expr: sum(rate(vm_rows_ignored_total[5m])) by (instance, reason) > 0
+  expr: rate(vm_rows_ignored_total[5m]) > 0
   for: 15m
   labels:
     severity: warning
@@ -113,7 +113,7 @@ groups:
   expr: |
     sum(increase(vm_new_timeseries_created_total[24h]))
     >
-    (sum(vm_cache_entries{type="storage/hour_metric_ids"})* 3)
+    (sum(vm_cache_entries{type="storage/hour_metric_ids"}) * 3)
   for: 15m
   labels:
     severity: warning
@@ -155,7 +155,7 @@ groups:
       Consider to increase the limit as fast as possible."
 
 - alert: LabelsLimitExceededOnIngestion
-  expr: sum(increase(vm_metrics_with_dropped_labels_total[5m])) by (instance) > 0
+  expr: increase(vm_metrics_with_dropped_labels_total[5m]) > 0
   for: 15m
   labels:
     severity: warning

@@ -55,7 +55,7 @@ groups:
       Consider to either increase available CPU resources or decrease the load on the process."
 
 - alert: TooManyLogs
-  expr: sum(increase(vm_log_messages_total{level="error"}[5m])) by (job, instance) > 0
+  expr: sum(increase(vm_log_messages_total{level="error"}[5m])) without (app_version, location) > 0
   for: 15m
   labels:
     severity: warning
@@ -65,7 +65,7 @@ groups:
       Worth to check logs for specific error messages."
 
 - alert: TooManyTSIDMisses
-  expr: sum(rate(vm_missing_tsids_for_metric_id_total[5m])) by (job, instance) > 0
+  expr: rate(vm_missing_tsids_for_metric_id_total[5m]) > 0
   for: 10m
   labels:
     severity: critical

@@ -18,7 +18,7 @@ groups:
       Check vmalert's logs for detailed error message."
 
 - alert: AlertingRulesError
-  expr: sum(increase(vmalert_alerting_rules_errors_total[5m])) by(job, instance, group, file) > 0
+  expr: sum(increase(vmalert_alerting_rules_errors_total[5m])) without(alertname, id) > 0
   for: 5m
   labels:
     severity: warning
@@ -29,7 +29,7 @@ groups:
       Check vmalert's logs for detailed error message."
 
 - alert: RecordingRulesError
-  expr: sum(increase(vmalert_recording_rules_errors_total[5m])) by(job, instance, group, file) > 0
+  expr: sum(increase(vmalert_recording_rules_errors_total[5m])) without(recording, id) > 0
   for: 5m
   labels:
     severity: warning
@@ -40,7 +40,7 @@ groups:
       Check vmalert's logs for detailed error message."
 
 - alert: RecordingRulesNoData
-  expr: sum(vmalert_recording_rules_last_evaluation_samples) by(job, group, recording, file) < 1
+  expr: sum(vmalert_recording_rules_last_evaluation_samples) without(recording, id) < 1
   for: 30m
   labels:
     severity: info
@@ -52,7 +52,7 @@ groups:
       or incorrect query expression."
 
 - alert: TooManyMissedIterations
-  expr: sum(increase(vmalert_iteration_missed_total[5m])) by(job, instance, group, file) > 0
+  expr: increase(vmalert_iteration_missed_total[5m]) > 0
   for: 15m
   labels:
     severity: warning
@@ -65,7 +65,7 @@ groups:
       If rule expressions are taking longer than expected, please see https://docs.victoriametrics.com/Troubleshooting.html#slow-queries."
 
 - alert: RemoteWriteErrors
-  expr: sum(increase(vmalert_remotewrite_errors_total[5m])) by(job, instance) > 0
+  expr: increase(vmalert_remotewrite_errors_total[5m]) > 0
   for: 15m
   labels:
     severity: warning
@@ -75,7 +75,7 @@ groups:
       or recording rules to the configured remote write URL. Check vmalert's logs for detailed error message."
 
 - alert: AlertmanagerErrors
-  expr: sum(increase(vmalert_alerts_send_errors_total[5m])) by(job, instance, addr) > 0
+  expr: increase(vmalert_alerts_send_errors_total[5m]) > 0
   for: 15m
   labels:
     severity: warning

@@ -61,7 +61,7 @@ groups:
       Please verify if clients are sending correct requests."
 
 - alert: RowsRejectedOnIngestion
-  expr: sum(rate(vm_rows_ignored_total[5m])) by (instance, reason) > 0
+  expr: rate(vm_rows_ignored_total[5m]) > 0
   for: 15m
   labels:
     severity: warning
@@ -124,7 +124,7 @@ groups:
       See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3976#issuecomment-1476883183"
 
 - alert: LabelsLimitExceededOnIngestion
-  expr: sum(increase(vm_metrics_with_dropped_labels_total[5m])) by (instance) > 0
+  expr: increase(vm_metrics_with_dropped_labels_total[5m]) > 0
   for: 15m
   labels:
     severity: warning