vlogs: added basic alerts (#7252)

### Describe Your Changes

Added basic VLogs alerts

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This commit is contained in:
Andrii Chubatiuk 2024-10-17 12:33:06 +03:00 committed by Zakhar Bessarab
parent 952fce152a
commit 7b49d4f5dc
No known key found for this signature in database
GPG key ID: 932B34D6FE062023
6 changed files with 95 additions and 0 deletions

View file

@ -165,6 +165,8 @@ The list of alerting rules is the following:
alerting rules related to [vmalert](https://docs.victoriametrics.com/vmalert/) component;
* [alerts-vmauth.yml](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts-vmauth.yml):
alerting rules related to [vmauth](https://docs.victoriametrics.com/vmauth/) component;
* [alerts-vlogs.yml](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts-vlogs.yml):
alerting rules related to [VictoriaLogs](https://docs.victoriametrics.com/victorialogs/);
Please, also see [how to monitor](https://docs.victoriametrics.com/single-server-victoriametrics/#monitoring)
VictoriaMetrics installations.

View file

@ -0,0 +1,43 @@
# File contains default list of alerts for VictoriaLogs single server.
# The alerts below are just recommendations and may require some updates
# and threshold calibration according to every specific setup.
groups:
- name: vlogs
interval: 30s
concurrency: 2
rules:
- alert: DiskRunsOutOfSpace
expr: |
sum(vl_data_size_bytes) by(job, instance) /
(
sum(vl_free_disk_space_bytes) by(job, instance) +
sum(vl_data_size_bytes) by(job, instance)
) > 0.8
for: 30m
labels:
severity: critical
annotations:
summary: "Instance {{ $labels.instance }} (job={{ $labels.job }}) will run out of disk space soon"
description: "Disk utilisation on instance {{ $labels.instance }} is more than 80%.\n
Having less than 20% of free disk space could cripple merge processes and overall performance.
Consider to limit the ingestion rate, decrease retention or scale the disk space if possible."
- alert: RequestErrorsToAPI
expr: increase(vl_http_errors_total[5m]) > 0
for: 15m
labels:
severity: warning
annotations:
summary: "Too many errors served for path {{ $labels.path }} (instance {{ $labels.instance }})"
description: "Requests to path {{ $labels.path }} are receiving errors.
Please verify if clients are sending correct requests."
- alert: RowsRejectedOnIngestion
expr: rate(vl_rows_dropped_total[5m]) > 0
for: 15m
labels:
severity: warning
annotations:
summary: "Some rows are rejected on \"{{ $labels.instance }}\" on ingestion attempt"
description: "VictoriaLogs is rejecting to ingest rows on \"{{ $labels.instance }}\" due to the
following reason: \"{{ $labels.reason }}\""

View file

@ -69,6 +69,48 @@ services:
- vm_net
restart: always
# vmalert executes alerting and recording rules
vmalert:
container_name: vmalert
image: victoriametrics/vmalert:v1.104.0
depends_on:
- "victoriametrics"
- "alertmanager"
ports:
- 8880:8880
volumes:
- ./alerts.yml:/etc/alerts/alerts.yml
- ./alerts-health.yml:/etc/alerts/alerts-health.yml
- ./alerts-vlogs.yml:/etc/alerts/alerts-vlogs.yml
- ./alerts-vmalert.yml:/etc/alerts/alerts-vmalert.yml
command:
- "--datasource.url=http://victoriametrics:8428/"
- "--remoteRead.url=http://victoriametrics:8428/"
- "--remoteWrite.url=http://victoriametrics:8428/"
- "--notifier.url=http://alertmanager:9093/"
- "--rule=/etc/alerts/*.yml"
# display source of alerts in grafana
- "--external.url=http://127.0.0.1:3000" #grafana outside container
- '--external.alert.source=explore?orgId=1&left={"datasource":"VictoriaMetrics","queries":[{"expr":{{.Expr|jsonEscape|queryEscape}},"refId":"A"}],"range":{"from":"{{ .ActiveAt.UnixMilli }}","to":"now"}}'
networks:
- vm_net
restart: always
# alertmanager receives alerting notifications from vmalert
# and distributes them according to --config.file.
alertmanager:
container_name: alertmanager
image: prom/alertmanager:v0.27.0
volumes:
- ./alertmanager.yml:/config/alertmanager.yml
command:
- "--config.file=/config/alertmanager.yml"
ports:
- 9093:9093
networks:
- vm_net
restart: always
volumes:
vmdata: {}
vldata: {}

View file

@ -5,6 +5,9 @@ scrape_configs:
- job_name: 'victoriametrics'
static_configs:
- targets: ['victoriametrics:8428']
- job_name: 'vmalert'
static_configs:
- targets: [ 'vmalert:8880' ]
- job_name: 'victorialogs'
static_configs:
- targets: ['victorialogs:9428']

View file

@ -15,6 +15,8 @@ according to [these docs](https://docs.victoriametrics.com/victorialogs/quicksta
## tip
* FEATURE: add basic [alerting rules](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts-vlogs.yml) for VictoriaLogs process. See details at [monitoring docs](https://docs.victoriametrics.com/victorialogs/index.html#monitoring).
## [v0.36.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v0.36.0-victorialogs)
Released at 2024-10-16

View file

@ -38,6 +38,9 @@ It is recommended to set up monitoring of these metrics via VictoriaMetrics
(see [these docs](https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter)),
vmagent (see [these docs](https://docs.victoriametrics.com/vmagent/#how-to-collect-metrics-in-prometheus-format)) or via Prometheus.
We recommend setting up [alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts-vlogs.yml)
via [vmalert](https://docs.victoriametrics.com/vmalert/) or via Prometheus.
VictoriaLogs emits its own logs to stdout. It is recommended to investigate these logs during troubleshooting.
## Upgrading