From d349d6a9ce0e37795b810554214c2f376064c2ea Mon Sep 17 00:00:00 2001
From: hagen1778 <roman@victoriametrics.com>
Date: Tue, 24 Oct 2023 09:39:46 +0200
Subject: [PATCH] deployment/alerts: make `TooHighMemoryUsage` more tolerant
 of spikes

Using `min_over_time` should reduce the number of false positives when
a component is running in a near-the-threshold state. The alert should
now trigger only if all samples collected over the 10m interval were
above the threshold.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 003ef3a51803824ae4bfcc2adb3f53519c5fa70c)
---
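Note: the effect of `min_over_time` described above can be illustrated
with a minimal Python sketch. It is not part of the change itself. The
function names, the sample values and the 1000-byte limit are
hypothetical, and the sketch ignores the rule's `for: 5m` clause; it
only contrasts comparing the latest sample with comparing the smallest
sample in the window.

```python
def fires_instant(samples, limit, threshold=0.8):
    """Old expression: only the latest sample is compared to the threshold."""
    return samples[-1] / limit > threshold


def fires_min_over_time(samples, limit, threshold=0.8):
    """New expression: the smallest sample in the window must exceed the threshold."""
    return min(samples) / limit > threshold


# Hypothetical memory samples (bytes) collected over one 10m window.
limit = 1000
near_threshold = [700, 850, 790, 860, 810]  # hovers around 80% of the limit
sustained = [820, 850, 830, 860, 810]       # stays above 80% the whole window

print(fires_instant(near_threshold, limit))        # latest sample is above 80%
print(fires_min_over_time(near_threshold, limit))  # one sample dipped to 70%
print(fires_min_over_time(sustained, limit))       # every sample is above 80%
```

In this sketch the near-the-threshold series satisfies the old condition
but not the new one, while the sustained series satisfies both.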
 deployment/docker/alerts-health.yml | 2 +-
 docs/CHANGELOG.md                   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/deployment/docker/alerts-health.yml b/deployment/docker/alerts-health.yml
index eb1e2987c3..59aa7f0509 100644
--- a/deployment/docker/alerts-health.yml
+++ b/deployment/docker/alerts-health.yml
@@ -35,7 +35,7 @@ groups:
           Consider to increase the limit as fast as possible."
 
       - alert: TooHighMemoryUsage
-        expr: (process_resident_memory_anon_bytes / vm_available_memory_bytes) > 0.8
+        expr: (min_over_time(process_resident_memory_anon_bytes[10m]) / vm_available_memory_bytes) > 0.8
         for: 5m
         labels:
           severity: critical
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index b6976d5bd8..529bc59a73 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -41,6 +41,7 @@ The sandbox cluster installation is running under the constant load generated by
 * FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup.html): add `-filestream.disableFadvise` command-line flag, which can be used for disabling `fadvise` syscall during backup upload to the remote storage. By default `vmbackup` uses `fadvise` syscall in order to prevent from eviction of recently accessed data from the [OS page cache](https://en.wikipedia.org/wiki/Page_cache) when backing up large files. Sometimes the `fadvise` syscall may take significant amounts of CPU when the backup is performed with large value of `-concurrency` command-line flag on systems with big number of CPU cores. In this case it is better to manually disable `fadvise` syscall by passing `-filestream.disableFadvise` command-line flag to `vmbackup`. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5120) for details.
 * FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup.html): add `-deleteAllObjectVersions` command-line flag, which can be used for forcing removal of all object versions in remote object storage. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5121) issue and [these docs](https://docs.victoriametrics.com/vmbackup.html#permanent-deletion-of-objects-in-s3-compatible-storages) for the details.
 * FEATURE: [Alerting rules for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts): account for `vmauth` component for alerts `ServiceDown` and `TooManyRestarts`.
+* FEATURE: [Alerting rules for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts): make `TooHighMemoryUsage` more tolerant of spikes or near-the-threshold states. The change should reduce the number of false positives.
 * FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): add support for functions, labels, values in autocomplete. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3006).
 * FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): retain specified time interval when executing a query from `Top Queries`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5097).
 * FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): improve repeated VMUI page load times by enabling caching of static js and css at web browser side according to [these recommendations](https://developer.chrome.com/docs/lighthouse/performance/uses-long-cache-ttl/).