From d71c745d19c5869e6c79a4018e941471462b8053 Mon Sep 17 00:00:00 2001
From: Roman Khavronenko
Date: Thu, 31 Oct 2024 14:03:08 +0100
Subject: [PATCH] deployment/alerts: add RemoteWriteDroppingData to vmalert rules (#7393)

### Describe Your Changes

Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778
(cherry picked from commit 3f0e2ab3b2128b9a5909ee9efca9775146063623)
---
 deployment/docker/rules/alerts-vmalert.yml | 11 +++++++++++
 docs/changelog/CHANGELOG.md                |  1 +
 2 files changed, 12 insertions(+)

diff --git a/deployment/docker/rules/alerts-vmalert.yml b/deployment/docker/rules/alerts-vmalert.yml
index 07d58fa8f..182e64337 100644
--- a/deployment/docker/rules/alerts-vmalert.yml
+++ b/deployment/docker/rules/alerts-vmalert.yml
@@ -74,6 +74,17 @@ groups:
           description: "vmalert instance {{ $labels.instance }} is failing to push metrics generated via alerting or recording rules
             to the configured remote write URL. Check vmalert's logs for detailed error message."
 
+      - alert: RemoteWriteDroppingData
+        expr: increase(vmalert_remotewrite_dropped_rows_total[5m]) > 0
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "vmalert instance {{ $labels.instance }} is dropping data sent to remote write URL"
+          description: "vmalert instance {{ $labels.instance }} is failing to send results of alerting or recording rules
+            to the configured remote write URL. This may result into gaps in recording rules or alerts state.
+            Check vmalert's logs for detailed error message."
+
       - alert: AlertmanagerErrors
         expr: increase(vmalert_alerts_send_errors_total[5m]) > 0
         for: 15m
diff --git a/docs/changelog/CHANGELOG.md b/docs/changelog/CHANGELOG.md
index 4f1e4f095..59eae084d 100644
--- a/docs/changelog/CHANGELOG.md
+++ b/docs/changelog/CHANGELOG.md
@@ -23,6 +23,7 @@ See also [LTS releases](https://docs.victoriametrics.com/lts-releases/).
 * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent/): support scraping from Kubernetes Native Sidecars. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7287).
 * FEATURE: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/cluster-victoriametrics/): add a separate cache type for storing sparse entries when performing large index scans. This significantly reduces memory usage when applying [downsampling filters](https://docs.victoriametrics.com/#downsampling) and [retention filters](https://docs.victoriametrics.com/#retention-filters) during background merge. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182) for the details.
 * FEATURE: [dashboards](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/dashboards) for VM single-node, cluster, vmalert, vmagent, VictoriaLogs: add `Restarts` panel to show the events of process restarts. This panel should help correlate events of restart with unexpected behavior of processes.
+* FEATURE: [alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/rules/alerts-vmalert.yml): add alerting rule `RemoteWriteDroppingData` to track number of dropped samples that weren't sent to remote write URL.
 * BUGFIX: [dashboards](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/dashboards) for Single-node VictoriaMetrics, cluster: The free disk space calculation now will subtract the size of the `-storage.minFreeDiskSpaceBytes` flag to correctly display the remaining available space of Single-node VictoriaMetrics/vmstorage rather than the actual available disk space, as well as the full ETA. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7334) for the details.
 * BUGFIX: [vmalert](https://docs.victoriametrics.com/vmalert): properly set `group_name` and `file` fields for recording rules in `/api/v1/rules`.