vlogs: added basic alerts (#7252)

### Describe Your Changes Added basic VLogs alerts Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-11-21 14:44:00 +00:00 · 2024-10-17 12:33:06 +03:00 · 2024-10-17 12:33:06 +03:00 · 7b49d4f5dc
commit 7b49d4f5dc
parent 952fce152a
6 changed files with 95 additions and 0 deletions
--- a/deployment/docker/README.md
+++ b/deployment/docker/README.md
@ -165,6 +165,8 @@ The list of alerting rules is the following:
  alerting rules related to [vmalert](https://docs.victoriametrics.com/vmalert/) component;
 * [alerts-vmauth.yml](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts-vmauth.yml):
  alerting rules related to [vmauth](https://docs.victoriametrics.com/vmauth/) component;
 * [alerts-vlogs.yml](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts-vlogs.yml):
    alerting rules related to [VictoriaLogs](https://docs.victoriametrics.com/victorialogs/);
 Please, also see [how to monitor](https://docs.victoriametrics.com/single-server-victoriametrics/#monitoring) 
 VictoriaMetrics installations.
--- a/deployment/docker/alerts-vlogs.yml
+++ b/deployment/docker/alerts-vlogs.yml
@ -0,0 +1,43 @@
 # File contains default list of alerts for VictoriaLogs single server.
 # The alerts below are just recommendations and may require some updates
 # and threshold calibration according to every specific setup.
 groups:
  - name: vlogs
    interval: 30s
    concurrency: 2
    rules:
      - alert: DiskRunsOutOfSpace
        expr: |
          sum(vl_data_size_bytes) by(job, instance) /
          (
           sum(vl_free_disk_space_bytes) by(job, instance) +
           sum(vl_data_size_bytes) by(job, instance)
          ) > 0.8
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} (job={{ $labels.job }}) will run out of disk space soon"
          description: "Disk utilisation on instance {{ $labels.instance }} is more than 80%.\n
            Having less than 20% of free disk space could cripple merge processes and overall performance.
            Consider to limit the ingestion rate, decrease retention or scale the disk space if possible."
      - alert: RequestErrorsToAPI
        expr: increase(vl_http_errors_total[5m]) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Too many errors served for path {{ $labels.path }} (instance {{ $labels.instance }})"
          description: "Requests to path {{ $labels.path }} are receiving errors.
            Please verify if clients are sending correct requests."
      - alert: RowsRejectedOnIngestion
        expr: rate(vl_rows_dropped_total[5m]) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Some rows are rejected on \"{{ $labels.instance }}\" on ingestion attempt"
          description: "VictoriaLogs is rejecting to ingest rows on \"{{ $labels.instance }}\" due to the
            following reason: \"{{ $labels.reason }}\""
--- a/deployment/docker/docker-compose-victorialogs.yml
+++ b/deployment/docker/docker-compose-victorialogs.yml
@ -69,6 +69,48 @@ services:
      - vm_net
    restart: always
  # vmalert executes alerting and recording rules
  vmalert:
    container_name: vmalert
    image: victoriametrics/vmalert:v1.104.0
    depends_on:
      - "victoriametrics"
      - "alertmanager"
    ports:
      - 8880:8880
    volumes:
      - ./alerts.yml:/etc/alerts/alerts.yml
      - ./alerts-health.yml:/etc/alerts/alerts-health.yml
      - ./alerts-vlogs.yml:/etc/alerts/alerts-vlogs.yml
      - ./alerts-vmalert.yml:/etc/alerts/alerts-vmalert.yml
    command:
      - "--datasource.url=http://victoriametrics:8428/"
      - "--remoteRead.url=http://victoriametrics:8428/"
      - "--remoteWrite.url=http://victoriametrics:8428/"
      - "--notifier.url=http://alertmanager:9093/"
      - "--rule=/etc/alerts/*.yml"
      # display source of alerts in grafana
      - "--external.url=http://127.0.0.1:3000" #grafana outside container
      - '--external.alert.source=explore?orgId=1&left={"datasource":"VictoriaMetrics","queries":[{"expr":{{.Expr|jsonEscape|queryEscape}},"refId":"A"}],"range":{"from":"{{ .ActiveAt.UnixMilli }}","to":"now"}}'
    networks:
      - vm_net
    restart: always
  # alertmanager receives alerting notifications from vmalert
  # and distributes them according to --config.file.
  alertmanager:
    container_name: alertmanager
    image: prom/alertmanager:v0.27.0
    volumes:
      - ./alertmanager.yml:/config/alertmanager.yml
    command:
      - "--config.file=/config/alertmanager.yml"
    ports:
      - 9093:9093
    networks:
      - vm_net
    restart: always
 volumes:
  vmdata: {}
  vldata: {}
--- a/deployment/docker/prometheus-victorialogs.yml
+++ b/deployment/docker/prometheus-victorialogs.yml
@ -5,6 +5,9 @@ scrape_configs:
  - job_name: 'victoriametrics'
    static_configs:
      - targets: ['victoriametrics:8428']
  - job_name: 'vmalert'
    static_configs:
      - targets: [ 'vmalert:8880' ]
  - job_name: 'victorialogs'
    static_configs:
      - targets: ['victorialogs:9428']
--- a/docs/VictoriaLogs/CHANGELOG.md
+++ b/docs/VictoriaLogs/CHANGELOG.md
@ -15,6 +15,8 @@ according to [these docs](https://docs.victoriametrics.com/victorialogs/quicksta
 ## tip
 * FEATURE: add basic [alerting rules](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts-vlogs.yml) for VictoriaLogs process. See details at [monitoring docs](https://docs.victoriametrics.com/victorialogs/index.html#monitoring).
 ## [v0.36.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v0.36.0-victorialogs)
 Released at 2024-10-16
--- a/docs/VictoriaLogs/README.md
+++ b/docs/VictoriaLogs/README.md
@ -38,6 +38,9 @@ It is recommended to set up monitoring of these metrics via VictoriaMetrics
 (see [these docs](https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter)),
 vmagent (see [these docs](https://docs.victoriametrics.com/vmagent/#how-to-collect-metrics-in-prometheus-format)) or via Prometheus.
 We recommend setting up [alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts-vlogs.yml)
 via [vmalert](https://docs.victoriametrics.com/vmalert/) or via Prometheus.
 VictoriaLogs emits its own logs to stdout. It is recommended to investigate these logs during troubleshooting.
 ## Upgrading