github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
hagen1778	a5c002edef	deployment/alerts: fix copy&paste typo in TooHighGoroutineSchedulingLatency Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-09-24 11:48:19 +02:00
Roman Khavronenko	4d0b41e63b	deployment: add panel and alerts for displaying go scheduler latency (#7078) The panel and alerting rule should help to understand whether a VM component doesn't have enough CPU resources or gets throttled. The alert is applicable for all VM components. The panel was added to vmalert, vmagent, vmsingle, vm cluster and victorialogs dashboards. ------------------- This alerting rule would have helped us identify the resource shortage for sandbox vmagent - see [this link](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/?g0.range_input=23d13h25m25s424ms&g0.end_input=2024-09-23T14%3A11%3A00&g0.relative_time=none&g0.tab=0&g0.expr=histogram_quantile%280.99%2C+sum%28rate%28go_sched_latencies_seconds_bucket%7Bjob%3D%22vmagent-monitoring-vmagent%22%7D%5B5m%5D%29%29+by+%28le%2C+job%2C+instance%29%29+%3E+0.1) for example. We weren't aware of the resource shortage, because VM metrics assumed this vmagent had 1vCPU while in fact its limit was 0.2vCPU. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-09-23 16:54:42 +02:00
Artem Navoiev	508946ed9d	deployment docker: use line formatting in alerts-health fixes #6393 (#6394) Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-06-03 13:31:53 +04:00
hagen1778	1be1e9a7a4	deployment/alerts: add new alerting rules `TooLongLabelValues` and `TooLongLabelNames` to notify about truncation of label values or names respectively. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-24 15:09:52 +02:00
Nikolay	69d244e6fb	lib/mergeset: adds tracking for indexdb records drop (#6297) It allows creating an alert for possible item drops in indexdb. This may happen if the size of an ingested metric exceeds the max indexdb item size. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-05-24 14:55:20 +02:00
hagen1778	8fb68152e6	alerts: simplify aggregation of alerting rules This is a follow-up to `75196d7234`. It updates some of the alerting rules to remove unnecessary aggregations. It keeps aggregations for expressions that use multiple time series filters, to make sure their labels match. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-11 15:17:30 +01:00
hagen1778	4e0a779efe	deployment/alerts: update `TooHighMemoryUsage` annotation The memory usage isn't measured over a 5m interval anymore. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-24 09:53:44 +02:00
hagen1778	003ef3a518	deployment/alerts: make `TooHighMemoryUsage` more tolerant of spikes Using `min_over_time` should reduce the number of false positives when a component is running in a near-the-threshold state. Now it should trigger only if all samples collected over a 10m interval were above the threshold. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-24 09:39:46 +02:00
hagen1778	de651165bd	alerting: account for `vmauth` component for alerts `ServiceDown` and `TooManyRestarts` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-03 16:45:33 +02:00
hagen1778	2e4d0d0e41	alerts: move `ConcurrentFlushesHitTheLimit` alert to health alerts The `ConcurrentFlushesHitTheLimit` alert could be related to components like vminsert, vmstorage, vm-single-node and vmagent. Moving this alert to the `health` section of alerts will be beneficial for all components and will remove the duplicates from single/cluster alerts. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-08-03 10:46:26 +02:00
Haleygo	ee933541b2	add vmalertmanager filter for health alerts (#4665)	2023-07-19 20:29:45 +02:00
Roman Khavronenko	01520d3e5d	alerts: update TooHighMemoryUsage threshold (#4256) It appears that 90% anonymous memory usage is already concerning, so we are lowering the threshold to 80%. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-05-07 22:18:56 +02:00
Max Golionko	f3b829125e	add vmsingle filter for health alerts (#4238)	2023-05-02 20:54:42 +08:00
Max Golionko	5c955dd876	alerts: relax job filter to support job names created by VMOperator (#4203)	2023-04-26 15:32:25 +02:00
Roman Khavronenko	d3608be313	alerts: add `TooManyTSIDMisses` alerting rule (#3959) See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3502#issuecomment-1358374954 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-03-17 09:46:51 +01:00
Zakhar Bessarab	6711eec109	docker-compose: move `TooManyLogs` into `vm-health` alerts set (#3199)	2022-10-05 19:23:36 +02:00
Roman Khavronenko	5714a68ac6	deployment/docker: move cluster compose env to master branch (#3130) * deployment/docker: move cluster compose env to master branch The change is supposed to simplify maintenance of the single/cluster docker-compose envs, alerts and dashboards. It is also supposed to reduce confusion for users looking for cluster-related alerts/configs. Signed-off-by: hagen1778 <roman@victoriametrics.com> * deployment/docker: move cluster compose env to master branch Review updates. Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-09-21 11:48:38 +03:00

17 commits