github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-01 14:47:38 +00:00

Author	SHA1	Message	Date
Hui Wang	9616814728	vmalert: integrate with victorialogs (#7255) Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706. See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md. Related fix: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7254. Note: in this pull request, vmalert doesn't support [backfilling](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md#rules-backfilling) for rules with a customized time filter. It might be added in the future; see [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7289) for details. The feature can be tested with image `victoriametrics/vmalert:heads-vmalert-support-vlog-ds-0-g420629c-scratch`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `68bad22fd2`)	2024-10-29 16:32:00 +01:00
hagen1778	6bdd0489e7	deployment/alerts: fix copy&paste typo in TooHighGoroutineSchedulingLatency Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `a5c002edef`)	2024-09-24 16:58:14 +02:00
Roman Khavronenko	deb2f87074	deployment: add panel and alerts for displaying go scheduler latency (#7078) The panel and alerting rule should help to understand whether a VM component doesn't have enough CPU resources or gets throttled. The alert is applicable to all VM components. The panel was added to the vmalert, vmagent, vmsingle, vm cluster and victorialogs dashboards. ------------------- This alerting rule would have helped us identify a resource shortage for the sandbox vmagent - see [this link](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/?g0.range_input=23d13h25m25s424ms&g0.end_input=2024-09-23T14%3A11%3A00&g0.relative_time=none&g0.tab=0&g0.expr=histogram_quantile%280.99%2C+sum%28rate%28go_sched_latencies_seconds_bucket%7Bjob%3D%22vmagent-monitoring-vmagent%22%7D%5B5m%5D%29%29+by+%28le%2C+job%2C+instance%29%29+%3E+0.1) for example. We weren't aware of the resource shortage, because VM metrics assumed this vmagent had 1vCPU while in fact its limit was 0.2vCPU. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `4d0b41e63b`)	2024-09-24 16:58:14 +02:00
Artem Navoiev	ea2b32d2d4	deployment docker: use line formatting in alerts-health; fixes #6393 (#6394) ### Checklist The following checks are mandatory: - [x] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Artem Navoiev <tenmozes@gmail.com> (cherry picked from commit `508946ed9d`)	2024-06-03 11:53:42 +02:00
hagen1778	63b83d62e8	deployment/alerts: add new alerting rules `TooLongLabelValues` and `TooLongLabelNames` to notify about truncation of label values or names respectively. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `1be1e9a7a4`)	2024-05-24 16:08:41 +02:00
Nikolay	5025ede7bc	lib/mergeset: adds tracking for indexdb records drop (#6297) It allows creating an alert for possible item drops in indexdb. Drops may happen if the ingested metric size exceeds the max indexdb item size. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `69d244e6fb`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-24 16:08:34 +02:00
hagen1778	383dce7201	alerts: simplify aggregation of alerting rules. This is a follow-up to `75196d7234`. It updates some of the alerting rules to remove unnecessary aggregations. It keeps aggregations for expressions which use multiple time series filters, to make sure their labels match. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `8fb68152e6`)	2023-12-11 15:38:16 +01:00
hagen1778	d9118cdaab	deployment/alerts: update `TooHighMemoryUsage` annotation. The memory usage isn't measured over a 5m interval anymore. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `4e0a779efe`)	2023-10-25 14:39:48 +02:00
hagen1778	d349d6a9ce	deployment/alerts: make `TooHighMemoryUsage` more tolerant of spikes. Using `min_over_time` should reduce the number of false positives when a component is running in a near-the-threshold state. Now it should trigger only if all collected samples were above the threshold over a 10m interval. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `003ef3a518`)	2023-10-25 14:39:48 +02:00
hagen1778	297f63a01e	alerting: account for `vmauth` component for alerts `ServiceDown` and `TooManyRestarts` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-03 17:52:43 +02:00
hagen1778	f2b06484f2	alerts: move `ConcurrentFlushesHitTheLimit` alert to health alerts. The `ConcurrentFlushesHitTheLimit` alert could be related to components like vminsert, vmstorage, vm-single-node and vmagent. Moving this alert to the `health` section of alerts will be beneficial for all components and will remove the duplicates from single/cluster alerts. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-08-11 04:39:28 -07:00
Haleygo	1794a97ebe	add vmalertmanager filter for health alerts (#4665)	2023-07-19 14:50:06 -07:00
Roman Khavronenko	3049754575	alerts: update TooHighMemoryUsage threshold (#4256) It appears that 90% usage for anonymous memory is already concerning, so we are lowering the threshold to 80%. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-05-09 21:41:40 -07:00
Max Golionko	ef11e49d7b	add vmsingle filter for health alerts (#4238)	2023-05-08 22:00:43 -07:00
Max Golionko	0f14beff58	alerts: relax job filter to support job names created by VMOperator (#4203)	2023-05-08 15:52:31 -07:00
Roman Khavronenko	66c5ddf2ad	alerts: add `TooManyTSIDMisses` alerting rule (#3959) See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3502#issuecomment-1358374954. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-03-17 16:14:50 -07:00
Zakhar Bessarab	434b00cee8	docker-compose: move `TooManyLogs` into `vm-health` alerts set (#3199)	2022-10-05 22:42:31 +03:00
Roman Khavronenko	f772ee8326	deployment/docker: move cluster compose env to master branch (#3130) * deployment/docker: move cluster compose env to master branch. The change is supposed to simplify maintenance of the single/cluster docker-compose envs, alerts, and dashboards. It is also supposed to reduce confusion for users looking for cluster-related alerts/configs. Signed-off-by: hagen1778 <roman@victoriametrics.com> * deployment/docker: move cluster compose env to master branch. Review updates. Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-09-21 12:03:10 +03:00

18 commits