Commit graph

12 commits

Author SHA1 Message Date
Aliaksandr Valialkin
ec40affb59
deployment/docker/alerts.yml: formatting fixes after 865a60f13e 2021-10-19 08:53:03 +03:00
Yurii Kravets
865a60f13e
Update alerts.yml
Added Series Limit day/hour alerts
2021-10-18 18:14:49 +03:00
Roman Khavronenko
0f4bcc00b2
Single dashboards upd (#1593)
* dashboard: replace `null` datasources

A `null` datasource value may confuse Grafana and make it drop the panel query in some
versions.

* docker: bump grafana image version

* dashboards: add URL variable selector to vmagent dashboard

* dashboards: add new panel `Remote write connection saturation` to vmagent dashboard

* alerts: add new alert for `Remote write connection saturation` panel of vmagent dashboard

* dashboards: add "Logging rate" panel to vmagent dashboard
2021-09-01 11:46:22 +03:00
Roman Khavronenko
408ba43092
Alerts single update (#1510)
* alerts: move `ProcessNearFDLimits` to `vm-health` group since it is relevant for all services

* alerts: add new `TooHighMemoryUsage` alerting rule
2021-08-02 15:51:24 +03:00
Roman Khavronenko
2f54559c89
alerts: sync alert expression for DiskRunsOutOfSpaceIn3Days with dashboard (#1436) 2021-07-07 10:31:09 +03:00
Roman Khavronenko
5e9f3777bf
alerts: add new alert LabelsLimitExceededOnIngestion (#1359) 2021-06-09 12:15:36 +03:00
k1rk
668165f53d
rename serviceHealth group name to vm-health (#1360)
this causes conflicts in `victoria-metrics-k8s-stack` chart =)
2021-06-08 23:34:38 +03:00
Roman Khavronenko
162681e60d
add new alerts (#1195)
* alerts: backport `DiskRunsOutOfSpace` alert and some other tweaks from cluster branch

* alerts: add `ServiceDown` alert to detect "dead" services
2021-04-08 18:24:25 +03:00
Roman Khavronenko
cfdb6762e6
deployment: add new alert TooHighChurnRate24h (#1154)
Alert `TooHighChurnRate24h` is supposed to cover cases when the churn rate
is low but still results in a number of new series several times higher
than the total number of active series.
2021-03-29 12:38:03 +03:00
Roman Khavronenko
b457739f87
Single dashboard (#1126)
* dashboard: update single node dashboard

* add panel `Open FDs` for file descriptors metrics;
* add panel `Disk writes/reads` to show the real read/write
load on storage layer;
* add `process_resident_memory_bytes` metric to memory usage panel;
* add stats panel to show available CPUs, memory and disk space;
* rm flags panel since it didn't prove its usefulness.

* alerts: add alert for reaching FDs limit
2021-03-15 12:04:24 +02:00
Roman Khavronenko
14f0f90507
docker-compose: provide the example list of alerting rules for vm components (#1005)
The list contains example alerting rules which might be executed
via `vmalert` to track the health state of VM components. It is assumed
that the list will be revised and calibrated for each system individually.
2021-01-11 13:03:15 +02:00
Artem Navoiev
4e391a5e39
[deployment] add vmalert + alertmanager to docker compose (#885) 2020-11-07 17:00:23 +02:00