Mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git (synced 2024-11-21 14:44:00 +00:00)
Dashboards vmagent updates (#1973)
* dashboards/vmagent: shuffle panels for better visibility

More important error/dropped panels were moved higher on the main row.
The network usage panel was moved to the Resource usage row.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/vmagent: add Troubleshooting row to show top 5 instances/jobs by churn rate

New panels are supposed to show the top 5 jobs or targets that generate most of the churn rate.
They were placed into a new row "Troubleshooting".

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/vmagent: add panels for showing persistent queue saturation

New panels were added to the Troubleshooting row to show the persistent queue saturation.
The corresponding alerts were added and linked to these panels as well.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/vmagent: add alert "RejectedRemoteWriteDataBlocksAreDropped"

The new alert is supposed to send a notification when vmagent starts to drop data blocks
rejected by the configured remote write destination.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Parent: 36f4130cf1
Commit: bc79bdf68a
2 changed files with 1145 additions and 994 deletions
File diff suppressed because it is too large
@@ -216,6 +216,16 @@ groups:
           description: "Vmagent dropped {{ $value | humanize1024 }} from persistent queue
             on instance {{ $labels.instance }} for the last 10m."
 
+      - alert: RejectedRemoteWriteDataBlocksAreDropped
+        expr: sum(increase(vmagent_remotewrite_packets_dropped_total[5m])) by (job, instance) > 0
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=79&var-instance={{ $labels.instance }}"
+          summary: "Job \"{{ $labels.job }}\" on instance {{ $labels.instance }} drops the rejected by
+            remote-write server data blocks. Check the logs to find the reason for rejects."
+
       - alert: TooManyScrapeErrors
         expr: sum(increase(vm_promscrape_scrapes_failed_total[5m])) by (job, instance) > 0
         for: 15m
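The new `RejectedRemoteWriteDataBlocksAreDropped` rule fires when `sum(increase(vmagent_remotewrite_packets_dropped_total[5m])) by (job, instance)` stays above zero for 15 minutes. The Python sketch below is a simplified, non-authoritative model of those two parts for a single series.

- `simple_increase` sums counter growth across a window. It treats a drop in value as a counter reset.
- `fires` approximates the `for: 15m` hold using Prometheus-style pending/firing semantics.

The sketch rests on several assumptions:

- The sample values and the 60s evaluation interval are hypothetical.
- The real `increase()` in PromQL/MetricsQL handles window boundaries differently; for example, Prometheus extrapolates.
- The per-`(job, instance)` `sum` across several series is not modelled.

```python
from typing import List, Tuple

# A sample is (unix_timestamp_seconds, counter_value).
Sample = Tuple[float, float]


def simple_increase(samples: List[Sample]) -> float:
    """Simplified counter increase over a window of samples.

    Positive deltas between consecutive samples are summed. A decrease is
    treated as a counter reset (e.g. a restart that began again from 0), so
    the post-reset value itself counts as growth. No extrapolation to the
    window boundaries is performed, unlike Prometheus' increase().
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def fires(history: List[bool], eval_interval_s: int, for_s: int) -> bool:
    """Approximates the `for:` clause under Prometheus-style semantics.

    The alert becomes pending at the first evaluation where the condition
    is true and fires once it has stayed true continuously for `for_s`
    seconds. `history` holds one boolean per evaluation, oldest first.
    """
    streak = 0
    for ok in reversed(history):
        if not ok:
            break
        streak += 1
    return streak > 0 and (streak - 1) * eval_interval_s >= for_s


# Hypothetical 5m windows of vmagent_remotewrite_packets_dropped_total for
# one (job, instance) pair, sampled every 60s.
steady = [(0, 12.0), (60, 12.0), (120, 12.0), (180, 12.0), (240, 12.0), (300, 12.0)]
dropping = [(0, 12.0), (60, 12.0), (120, 15.0), (180, 15.0), (240, 2.0), (300, 4.0)]

print(simple_increase(steady) > 0)  # False: nothing dropped
print(simple_increase(dropping))    # 7.0 (3 before the reset, then 2 + 2 after it)

# With a 60s evaluation interval, `for: 15m` needs 16 consecutive true
# evaluations (the first one plus 15 minutes of holding).
print(fires([True] * 15, 60, 15 * 60))  # False: held for only 14 minutes
print(fires([True] * 16, 60, 15 * 60))  # True
```

The `for` hold is what keeps a single burst of rejected blocks from paging anyone. The condition has to persist across every evaluation in the 15-minute span.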
@@ -261,6 +271,30 @@ groups:
             This usually means that `-remoteWrite.queues` command-line flag must be increased in order to increase
             the number of connections per each remote storage."
 
+      - alert: PersistentQueueForWritesIsSaturated
+        expr: rate(vm_persistentqueue_write_duration_seconds_total[5m]) > 0.9
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=98&var-instance={{ $labels.instance }}"
+          summary: "Persistent queue writes for instance {{ $labels.instance }} are saturated"
+          description: "Persistent queue writes for vmagent \"{{ $labels.job }}\" (instance {{ $labels.instance }})
+            are saturated by more than 90% and vmagent won't be able to keep up with flushing data on disk.
+            In this case, consider to decrease load on the vmagent or improve the disk throughput."
+
+      - alert: PersistentQueueForReadsIsSaturated
+        expr: rate(vm_persistentqueue_read_duration_seconds_total[5m]) > 0.9
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=99&var-instance={{ $labels.instance }}"
+          summary: "Persistent queue reads for instance {{ $labels.instance }} are saturated"
+          description: "Persistent queue reads for vmagent \"{{ $labels.job }}\" (instance {{ $labels.instance }})
+            are saturated by more than 90% and vmagent won't be able to keep up with reading data from the disk.
+            In this case, consider to decrease load on the vmagent or improve the disk throughput."
+
       - alert: SeriesLimitHourReached
         expr: (vmagent_hourly_series_limit_current_series / vmagent_hourly_series_limit_max_series) > 0.9
         labels: