VictoriaMetrics/docs/operator/resources/vmalert.md

372 lines
14 KiB
Markdown

---
sort: 2
weight: 2
title: VMAlert
menu:
docs:
parent: "operator-custom-resources"
weight: 2
---
# VMAlert
`VMAlert` - executes a list of given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
or [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) rules against configured address.
The `VMAlert` CRD declaratively defines a desired [VMAlert](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmalert)
setup to run in a Kubernetes cluster.
It has few required config options - `datasource` and `notifier` are required, for other config parameters
check [doc](../api.md#vmalert).
For each `VMAlert` resource, the Operator deploys a properly configured `Deployment` in the same namespace.
The VMAlert `Pod`s are configured to mount a list of `Configmaps` prefixed with `<VMAlert-name>-number` containing
the configuration for alerting rules.
For each `VMAlert` resource, the Operator adds `Service` and `VMServiceScrape` in the same namespace prefixed with
name `<VMAlert-name>`.
## Specification
You can see the full actual specification of the `VMAlert` resource in the **[API docs -> VMAlert](../api.md#vmalert)**.
If you can't find necessary field in the specification of the custom resource,
see [Extra arguments section](./README.md#extra-arguments).
Also, you can check out the [examples](#examples) section.
## Rules
The CRD specifies which `VMRule`s should be covered by the deployed `VMAlert` instances based on label selection.
The Operator then generates a configuration based on the included `VMRule`s and updates the `Configmaps` containing
the configuration. It continuously does so for all changes that are made to `VMRule`s or to the `VMAlert` resource itself.
Alerting rules are filtered by selectors `ruleNamespaceSelector` and `ruleSelector` in `VMAlert` CRD definition.
For selecting rules from all namespaces you must specify it to empty value:
```yaml
spec:
ruleNamespaceSelector: {}
```
[VMRUle](./vmrule.md) objects generate part of `VMAlert` configuration.
For filtering rules `VMAlert` uses selectors `ruleNamespaceSelector` and `ruleSelector`.
It allows configuring rules access control across namespaces and different environments.
Specification of selectors you can see in [this doc](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#labelselector-v1-meta).
In addition to the above selectors, the filtering of objects in a cluster is affected by the field `selectAllByDefault` of `VMAlert` spec and environment variable `WATCH_NAMESPACE` for operator.
Following rules are applied:
- If `ruleNamespaceSelector` and `ruleSelector` both undefined, then by default select nothing. With option set - `spec.selectAllByDefault: true`, select all vmrules.
- If `ruleNamespaceSelector` defined, `ruleSelector` undefined, then all vmrules are matching at namespaces for given `ruleNamespaceSelector`.
- If `ruleNamespaceSelector` undefined, `ruleSelector` defined, then all vmrules at `VMAgent`'s namespaces are matching for given `ruleSelector`.
- If `ruleNamespaceSelector` and `ruleSelector` both defined, then only vmrules at namespaces matched `ruleNamespaceSelector` for given `ruleSelector` are matching.
Here's a more visual and more detailed view:
| `ruleNamespaceSelector` | `ruleSelector` | `selectAllByDefault` | `WATCH_NAMESPACE` | Selected rules |
|-------------------------|----------------|----------------------|-------------------|------------------------------------------------------------------------------------------------------|
| undefined | undefined | false | undefined | nothing |
| undefined | undefined | **true** | undefined | all vmrules in the cluster |
| **defined** | undefined | any | undefined | all vmrules are matching at namespaces for given `ruleNamespaceSelector` |
| undefined | **defined** | any | undefined | all vmrules only at `VMAlert`'s namespace are matching for given `ruleSelector` |
| **defined** | **defined** | any | undefined | all vmrules only at namespaces matched `ruleNamespaceSelector` for given `ruleSelector` are matching |
| any | undefined | any | **defined** | all vmrules only at `VMAlert`'s namespace |
| any | **defined** | any | **defined** | all vmrules only at `VMAlert`'s namespace for given `ruleSelector` are matching |
More details about `WATCH_NAMESPACE` variable you can read in [this doc](../configuration.md#namespaced-mode).
Here are some examples of `VMAlert` configuration with selectors:
```yaml
# select all rule objects in the cluster
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: vmalert-select-all
spec:
# ...
selectAllByDefault: true
---
# select all rule objects in specific namespace (my-namespace)
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: vmalert-select-ns
spec:
# ...
ruleNamespaceSelector:
matchLabels:
kubernetes.io/metadata.name: my-namespace
```
## High availability
`VMAlert` can be launched with multiple replicas without an additional configuration as far [alertmanager](./vmalertmanager.md) is responsible for alert deduplication.
Note, if you want to use `VMAlert` with high-available [`VMAlertmanager`](./vmalertmanager.md), which has more than 1 replica.
You have to specify all pod fqdns at `VMAlert.spec.notifiers.[url]`. Or you can use service discovery for notifier, examples:
- alertmanager:
```yaml
apiVersion: v1
kind: Secret
metadata:
name: vmalertmanager-example-alertmanager
labels:
app: vm-operator
type: Opaque
stringData:
alertmanager.yaml: |
global:
resolve_timeout: 5m
route:
group_by: ['job']
group_wait: 30s
group_interval: 5m
repeat_interval: 12h
receiver: 'webhook'
receivers:
- name: 'webhook'
webhook_configs:
- url: 'http://alertmanagerwh:30500/'
# ...
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlertmanager
metadata:
name: example
namespace: default
labels:
usage: dedicated
spec:
replicaCount: 2
configSecret: vmalertmanager-example-alertmanager
configSelector: {}
configNamespaceSelector: {}
# ...
```
- vmalert with fqdns:
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: example-ha
namespace: default
spec:
replicaCount: 2
datasource:
url: http://vmsingle-example.default.svc:8429
notifiers:
- url: http://vmalertmanager-example-0.vmalertmanager-example.default.svc:9093
- url: http://vmalertmanager-example-1.vmalertmanager-example.default.svc:9093
evaluationInterval: "10s"
ruleSelector: {}
# ...
```
- vmalert with service discovery:
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: example-ha
namespace: default
spec:
replicaCount: 2
datasource:
url: http://vmsingle-example.default.svc:8429
notifiers:
- selector:
namespaceSelector:
matchNames:
- default
labelSelector:
matchLabels:
usage: dedicated
evaluationInterval: "10s"
ruleSelector: {}
# ...
```
In addition, you need to specify `remoteWrite` and `remoteRead` urls for restoring alert states after restarts:
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: example-ha
namespace: default
spec:
replicaCount: 2
evaluationInterval: "10s"
selectAllByDefault: true
datasource:
url: http://vmselect-demo.vm.svc:8481/select/0/prometheus
notifiers:
- url: http://vmalertmanager-example-0.vmalertmanager-example.default.svc:9093
- url: http://vmalertmanager-example-1.vmalertmanager-example.default.svc:9093
remoteWrite:
url: http://vminsert-demo.vm.svc:8480/insert/0/prometheus
remoteRead:
url: http://vmselect-demo.vm.svc:8481/select/0/prometheus
```
More details about `remoteWrite` and `remoteRead` you can read in [vmalert docs](https://docs.victoriametrics.com/vmalert.html#alerts-state-on-restarts).
## Version management
To set `VMAlert` version add `spec.image.tag` name from [releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: example-vmalert
spec:
image:
repository: victoriametrics/victoria-metrics
tag: v1.93.4
pullPolicy: Always
# ...
```
Also, you can specify `imagePullSecrets` if you are pulling images from private repo:
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: example-vmalert
spec:
image:
repository: victoriametrics/victoria-metrics
tag: v1.93.4
pullPolicy: Always
imagePullSecrets:
- name: my-repo-secret
# ...
```
## Enterprise features
VMAlert supports features [Reading rules from object storage](https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage)
and [Multitenancy](https://docs.victoriametrics.com/vmalert.html#multitenancy)
from [VictoriaMetrics Enterprise](https://docs.victoriametrics.com/enterprise.html#victoriametrics-enterprise).
For using Enterprise version of [vmalert](https://docs.victoriametrics.com/vmalert.html)
you need to change version of `VMAlert` to version with `-enterprise` suffix using [Version management](#version-management).
All the enterprise apps require `-eula` command-line flag to be passed to them.
This flag acknowledges that your usage fits one of the cases listed on [this page](https://docs.victoriametrics.com/enterprise.html#victoriametrics-enterprise).
So you can use [extraArgs](./README.md#extra-arguments) for passing this flag to `VMAlert`:
### Reading rules from object storage
After that you can pass `-rule` command-line argument with `s3://` or `gs://`
to `VMAlert` with [extraArgs](./README.md#extra-arguments).
More details about reading rules from object storage you can read in [vmalert docs](https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage).
Here are complete example for [Reading rules from object storage](https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage):
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: vmalert-ent-example
spec:
# enabling enterprise features
image:
# enterprise version of vmalert
tag: v1.93.5-enterprise
extraArgs:
# should be true and means that you have the legal right to run a vmalert enterprise
# that can either be a signed contract or an email with confirmation to run the service in a trial period
# https://victoriametrics.com/legal/esa/
eula: true
# using enterprise features: Reading rules from object storage
# more details about reading rules from object storage you can read on https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
rule: s3://bucket/dir/alert.rules
# ...other fields...
```
### Multitenancy
After enabling enterprise version you can use [Multitenancy](https://docs.victoriametrics.com/vmalert.html#multitenancy)
feature in `VMAlert`.
For that you need to set `clusterMode` commad-line flag
with [extraArgs](./README.md#extra-arguments)
and specify `tenant` field for groups
in [VMRule](./vmrule.md#enterprise-features):
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: vmalert-ent-example
spec:
# enabling enterprise features
image:
# enterprise version of vmalert
tag: v1.93.5-enterprise
extraArgs:
# should be true and means that you have the legal right to run a vmalert enterprise
# that can either be a signed contract or an email with confirmation to run the service in a trial period
# https://victoriametrics.com/legal/esa/
eula: true
# using enterprise features: Multitenancy
# more details about multitenancy you can read on https://docs.victoriametrics.com/vmalert.html#multitenancy
clusterMode: true
# ...other fields...
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
name: vmrule-ent-example
spec:
groups:
- name: vmalert-1
rules:
# using enterprise features: Multitenancy
# more details about multitenancy you can read on https://docs.victoriametrics.com/vmalert.html#multitenancy
- tenant: 1
alert: vmalert config reload error
expr: delta(vmalert_config_last_reload_errors_total[5m]) > 0
for: 10s
labels:
severity: major
job: "{{ $labels.job }}"
annotations:
value: "{{ $value }}"
description: 'error reloading vmalert config, reload count for 5 min {{ $value }}'
```
## Examples
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
name: example-vmalert
spec:
replicaCount: 1
datasource:
url: "http://vmsingle-example-vmsingle-persisted.default.svc:8429"
notifier:
url: "http://vmalertmanager-example-alertmanager.default.svc:9093"
evaluationInterval: "30s"
selectAllByDefault: true
```