From 003ef3a51803824ae4bfcc2adb3f53519c5fa70c Mon Sep 17 00:00:00 2001
From: hagen1778
Date: Tue, 24 Oct 2023 09:39:46 +0200
Subject: [PATCH] deployment/alerts: make `TooHighMemoryUsage` more tolerable to spikes

Using `min_over_time` should reduce the amount of false positives
when component is running in near-the-threshold state.
Now it should trigger only if all collected samples were above
the threshold on 10m interval.

Signed-off-by: hagen1778
---
 deployment/docker/alerts-health.yml | 2 +-
 docs/CHANGELOG.md                   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/deployment/docker/alerts-health.yml b/deployment/docker/alerts-health.yml
index eb1e2987c..59aa7f050 100644
--- a/deployment/docker/alerts-health.yml
+++ b/deployment/docker/alerts-health.yml
@@ -35,7 +35,7 @@ groups:
             Consider to increase the limit as fast as possible."
 
       - alert: TooHighMemoryUsage
-        expr: (process_resident_memory_anon_bytes / vm_available_memory_bytes) > 0.8
+        expr: (min_over_time(process_resident_memory_anon_bytes[10m]) / vm_available_memory_bytes) > 0.8
         for: 5m
         labels:
           severity: critical
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index 6660cc0f2..653fc2717 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -43,6 +43,7 @@ The sandbox cluster installation is running under the constant load generated by
 * FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup.html): add `-filestream.disableFadvise` command-line flag, which can be used for disabling `fadvise` syscall during backup upload to the remote storage. By default `vmbackup` uses `fadvise` syscall in order to prevent from eviction of recently accessed data from the [OS page cache](https://en.wikipedia.org/wiki/Page_cache) when backing up large files. Sometimes the `fadvise` syscall may take significant amounts of CPU when the backup is performed with large value of `-concurrency` command-line flag on systems with big number of CPU cores. In this case it is better to manually disable `fadvise` syscall by passing `-filestream.disableFadvise` command-line flag to `vmbackup`. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5120) for details.
 * FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup.html): add `-deleteAllObjectVersions` command-line flag, which can be used for forcing removal of all object versions in remote object storage. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5121) issue and [these docs](https://docs.victoriametrics.com/vmbackup.html#permanent-deletion-of-objects-in-s3-compatible-storages) for the details.
 * FEATURE: [Alerting rules for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts): account for `vmauth` component for alerts `ServiceDown` and `TooManyRestarts`.
+* FEATURE: [Alerting rules for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts): make `TooHighMemoryUsage` more tolerable to spikes or near-the-threshold states. The change should reduce number of false positives.
 * FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): add support for functions, labels, values in autocomplete. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3006).
 * FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): retain specified time interval when executing a query from `Top Queries`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5097).
 * FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): improve repeated VMUI page load times by enabling caching of static js and css at web browser side according to [these recommendations](https://developer.chrome.com/docs/lighthouse/performance/uses-long-cache-ttl/).
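
Note on the `min_over_time` change in this patch: the commit message's reasoning can be illustrated with a small simulation. This is only a sketch, not how the alerting engine evaluates rules. It assumes one sample per minute (so a 10m window holds 10 samples), a constant `vm_available_memory_bytes` (so the ratio can be precomputed), and it ignores the `for: 5m` clause. The function names and the sample series are hypothetical.

```python
THRESHOLD = 0.8  # same threshold as in the alert expression


def instant_condition(ratios):
    """Old rule: the condition holds whenever the current sample exceeds the threshold."""
    return [r > THRESHOLD for r in ratios]


def min_over_time_condition(ratios, window):
    """New rule: the condition holds only if the minimum over the trailing window
    exceeds the threshold, i.e. every sample in the window is above it.
    Like a range function, it uses whatever samples are available in the window."""
    result = []
    for i in range(len(ratios)):
        start = max(0, i - window + 1)
        result.append(min(ratios[start:i + 1]) > THRESHOLD)
    return result


# Hypothetical memory usage ratios, one per minute (assumed scrape interval).
near_threshold = [0.79, 0.81, 0.82, 0.79, 0.81, 0.83, 0.81, 0.82, 0.79, 0.81, 0.82, 0.81]
single_spike = [0.5] * 9 + [0.95] + [0.5] * 2
sustained = [0.85] * 12

for name, series in [("near_threshold", near_threshold),
                     ("single_spike", single_spike),
                     ("sustained", sustained)]:
    old = any(instant_condition(series))
    new = any(min_over_time_condition(series, window=10))
    print(f"{name}: old rule condition met={old}, new rule condition met={new}")
```

Under these assumptions, the hovering and spiking series satisfy the old condition at some point but never the new one, while sustained high usage satisfies both.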