vmalert: add Troubleshooting section to docs (#3115)

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Roman Khavronenko 2022-09-15 16:15:39 +02:00 committed by Aliaksandr Valialkin
parent 8ca42b9bcb
commit 1c13cce5ed
GPG key ID: A72BEC6CD3D0DED1
10 changed files with 250 additions and 116 deletions

@ -638,6 +638,61 @@ Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/1495
If you have suggestions for improvements or have found a bug - please open an issue on github or add
a review to the dashboard.
## Troubleshooting
vmalert executes configured rules at fixed intervals. It expects that by the time a rule is executed,
the data is already present in the configured `-datasource.url`:
<img alt="vmalert expected evaluation" src="vmalert_ts_normal.gif">
Problems usually appear when data in `-datasource.url` is delayed or absent. In such cases, evaluations
may get an empty response from the datasource and produce empty recording rules or reset alert state:
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
Try the following recommendations in such cases:
* Always configure the group's `evaluationInterval` to be greater than or equal to the `scrape_interval` at which metrics
are delivered to the datasource;
* If you know in advance that data in the datasource is delayed, try changing vmalert's `-datasource.lookback`
command-line flag to add a time shift for evaluations;
* If time intervals between datapoints in the datasource are irregular, try changing vmalert's `-datasource.queryStep`
command-line flag to specify how far the search query can look back for the most recent datapoint. By default, this value
is equal to the group's `evaluationInterval`.
Sometimes it is not clear why a specific alert fired or didn't fire. It is important to remember that
alerts with `for: 0` fire immediately when their expression becomes true, while alerts with `for > 0` fire only
after multiple consecutive evaluations, and their expression must be true at each of them. If at least one evaluation
returns false, the alert's state resets to the initial state.
If the `-remoteWrite.url` command-line flag is configured, vmalert persists alert state in the form of time series
`ALERTS` and `ALERTS_FOR_STATE` to the specified destination. These time series can then be queried via
[vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui) or Grafana to track how alert state
changed over time.
vmalert also stores the last N state updates for each rule. To check the updates, click the `Details` link next to the rule's name
on the `/vmalert/groups` page and check the `Last updates` section:
<img alt="vmalert state" src="vmalert_state.png">
Rows in the section represent ordered rule evaluations and their results. The `curl` column contains an example of
the HTTP request sent by vmalert to the `-datasource.url` during evaluation. If a specific state shows that no
samples were returned while the curl command returns data, then it is very likely that there was no data in the datasource at the
moment when the rule was evaluated.
vmalert also allows configuring more detailed logging for a specific rule. Just set `debug: true` in the rule's configuration
and vmalert will start printing additional log messages:
```terminal
2022-09-15T13:35:41.155Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:41+02:00: query returned 0 samples (elapsed: 5.896041ms)
2022-09-15T13:35:56.149Z DEBUG datasource request: executing POST request with params "denyPartialResponse=true&query=sum%28vm_tcplistener_conns%7Binstance%3D%22localhost%3A8429%22%7D%29+by%28instance%29+%3E+0&step=15s&time=1663248945"
2022-09-15T13:35:56.178Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:56+02:00: query returned 1 samples (elapsed: 28.368208ms)
2022-09-15T13:35:56.178Z DEBUG datasource request: executing POST request with params "denyPartialResponse=true&query=sum%28vm_tcplistener_conns%7Binstance%3D%22localhost%3A8429%22%7D%29&step=15s&time=1663248945"
2022-09-15T13:35:56.179Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:56+02:00: alert 10705778000901301787 {alertgroup="TestGroup",alertname="Conns",cluster="east-1",instance="localhost:8429",replica="a"} created in state PENDING
...
2022-09-15T13:36:56.153Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:36:56+02:00: alert 10705778000901301787 {alertgroup="TestGroup",alertname="Conns",cluster="east-1",instance="localhost:8429",replica="a"} PENDING => FIRING: 1m0s since becoming active at 2022-09-15 15:35:56.126006 +0200 CEST m=+39.384575417
```
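The `params` string in the `datasource request` lines is URL-encoded, which makes the executed query hard to read. A minimal sketch of decoding it with Go's standard `net/url` package follows. `decodeParams` is a hypothetical helper, and the input is copied from the log output above.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeParams parses the URL-encoded params string printed in a
// "datasource request" debug line into individual values.
func decodeParams(params string) (url.Values, error) {
	return url.ParseQuery(params)
}

func main() {
	logged := "denyPartialResponse=true&query=sum%28vm_tcplistener_conns%7Binstance%3D%22localhost%3A8429%22%7D%29+by%28instance%29+%3E+0&step=15s&time=1663248945"
	v, err := decodeParams(logged)
	if err != nil {
		panic(err)
	}
	fmt.Println("query:", v.Get("query"))
	fmt.Println("step: ", v.Get("step"))
	fmt.Println("time: ", v.Get("time"))
}
```

The decoded `query` value can be pasted into vmui as-is to re-run the expression for the logged `time`.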
## Profiling
`vmalert` provides handlers for collecting the following [Go profiles](https://blog.golang.org/profiling-go-programs):

Binary file not shown. Size: 109 KiB
Binary file not shown. Size: 41 KiB
Binary file not shown. Size: 41 KiB

@ -384,6 +384,7 @@
</div> </div>
</div> </div>
</div> </div>
{% if rule.Type == "alerting" %}
<div class="container border-bottom p-2"> <div class="container border-bottom p-2">
<div class="row"> <div class="row">
<div class="col-2"> <div class="col-2">
@ -394,6 +395,7 @@
</div> </div>
</div> </div>
</div> </div>
{% endif %}
<div class="container border-bottom p-2"> <div class="container border-bottom p-2">
<div class="row"> <div class="row">
<div class="col-2"> <div class="col-2">
@ -406,6 +408,7 @@
</div> </div>
</div> </div>
</div> </div>
{% if rule.Type == "alerting" %}
<div class="container border-bottom p-2"> <div class="container border-bottom p-2">
<div class="row"> <div class="row">
<div class="col-2"> <div class="col-2">
@ -419,6 +422,7 @@
</div> </div>
</div> </div>
</div> </div>
{% endif %}
<div class="container border-bottom p-2"> <div class="container border-bottom p-2">
<div class="row"> <div class="row">
<div class="col-2"> <div class="col-2">

@ -1187,6 +1187,11 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
</div> </div>
</div> </div>
</div> </div>
`)
//line app/vmalert/web.qtpl:387
if rule.Type == "alerting" {
//line app/vmalert/web.qtpl:387
qw422016.N().S(`
<div class="container border-bottom p-2"> <div class="container border-bottom p-2">
<div class="row"> <div class="row">
<div class="col-2"> <div class="col-2">
@ -1194,13 +1199,18 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
</div> </div>
<div class="col"> <div class="col">
`) `)
//line app/vmalert/web.qtpl:393 //line app/vmalert/web.qtpl:394
qw422016.E().V(rule.Duration) qw422016.E().V(rule.Duration)
//line app/vmalert/web.qtpl:393 //line app/vmalert/web.qtpl:394
qw422016.N().S(` seconds qw422016.N().S(` seconds
</div> </div>
</div> </div>
</div> </div>
`)
//line app/vmalert/web.qtpl:398
}
//line app/vmalert/web.qtpl:398
qw422016.N().S(`
<div class="container border-bottom p-2"> <div class="container border-bottom p-2">
<div class="row"> <div class="row">
<div class="col-2"> <div class="col-2">
@ -1208,27 +1218,32 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
</div> </div>
<div class="col"> <div class="col">
`) `)
//line app/vmalert/web.qtpl:403 //line app/vmalert/web.qtpl:405
for _, k := range labelKeys { for _, k := range labelKeys {
//line app/vmalert/web.qtpl:403 //line app/vmalert/web.qtpl:405
qw422016.N().S(` qw422016.N().S(`
<span class="m-1 badge bg-primary">`) <span class="m-1 badge bg-primary">`)
//line app/vmalert/web.qtpl:404 //line app/vmalert/web.qtpl:406
qw422016.E().S(k) qw422016.E().S(k)
//line app/vmalert/web.qtpl:404 //line app/vmalert/web.qtpl:406
qw422016.N().S(`=`) qw422016.N().S(`=`)
//line app/vmalert/web.qtpl:404 //line app/vmalert/web.qtpl:406
qw422016.E().S(rule.Labels[k]) qw422016.E().S(rule.Labels[k])
//line app/vmalert/web.qtpl:404 //line app/vmalert/web.qtpl:406
qw422016.N().S(`</span> qw422016.N().S(`</span>
`) `)
//line app/vmalert/web.qtpl:405 //line app/vmalert/web.qtpl:407
} }
//line app/vmalert/web.qtpl:405 //line app/vmalert/web.qtpl:407
qw422016.N().S(` qw422016.N().S(`
</div> </div>
</div> </div>
</div> </div>
`)
//line app/vmalert/web.qtpl:411
if rule.Type == "alerting" {
//line app/vmalert/web.qtpl:411
qw422016.N().S(`
<div class="container border-bottom p-2"> <div class="container border-bottom p-2">
<div class="row"> <div class="row">
<div class="col-2"> <div class="col-2">
@ -1236,28 +1251,33 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
</div> </div>
<div class="col"> <div class="col">
`) `)
//line app/vmalert/web.qtpl:415 //line app/vmalert/web.qtpl:418
for _, k := range annotationKeys { for _, k := range annotationKeys {
//line app/vmalert/web.qtpl:415 //line app/vmalert/web.qtpl:418
qw422016.N().S(` qw422016.N().S(`
<b>`) <b>`)
//line app/vmalert/web.qtpl:416 //line app/vmalert/web.qtpl:419
qw422016.E().S(k) qw422016.E().S(k)
//line app/vmalert/web.qtpl:416 //line app/vmalert/web.qtpl:419
qw422016.N().S(`:</b><br> qw422016.N().S(`:</b><br>
<p>`) <p>`)
//line app/vmalert/web.qtpl:417 //line app/vmalert/web.qtpl:420
qw422016.E().S(rule.Annotations[k]) qw422016.E().S(rule.Annotations[k])
//line app/vmalert/web.qtpl:417 //line app/vmalert/web.qtpl:420
qw422016.N().S(`</p> qw422016.N().S(`</p>
`) `)
//line app/vmalert/web.qtpl:418 //line app/vmalert/web.qtpl:421
} }
//line app/vmalert/web.qtpl:418 //line app/vmalert/web.qtpl:421
qw422016.N().S(` qw422016.N().S(`
</div> </div>
</div> </div>
</div> </div>
`)
//line app/vmalert/web.qtpl:425
}
//line app/vmalert/web.qtpl:425
qw422016.N().S(`
<div class="container border-bottom p-2"> <div class="container border-bottom p-2">
<div class="row"> <div class="row">
<div class="col-2"> <div class="col-2">
@ -1265,17 +1285,17 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
</div> </div>
<div class="col"> <div class="col">
<a target="_blank" href="`) <a target="_blank" href="`)
//line app/vmalert/web.qtpl:428 //line app/vmalert/web.qtpl:432
qw422016.E().S(prefix) qw422016.E().S(prefix)
//line app/vmalert/web.qtpl:428 //line app/vmalert/web.qtpl:432
qw422016.N().S(`groups#group-`) qw422016.N().S(`groups#group-`)
//line app/vmalert/web.qtpl:428 //line app/vmalert/web.qtpl:432
qw422016.E().S(rule.GroupID) qw422016.E().S(rule.GroupID)
//line app/vmalert/web.qtpl:428 //line app/vmalert/web.qtpl:432
qw422016.N().S(`">`) qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:428 //line app/vmalert/web.qtpl:432
qw422016.E().S(rule.GroupID) qw422016.E().S(rule.GroupID)
//line app/vmalert/web.qtpl:428 //line app/vmalert/web.qtpl:432
qw422016.N().S(`</a> qw422016.N().S(`</a>
</div> </div>
</div> </div>
@ -1283,9 +1303,9 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
<br> <br>
<div class="display-6 pb-3">Last `) <div class="display-6 pb-3">Last `)
//line app/vmalert/web.qtpl:434 //line app/vmalert/web.qtpl:438
qw422016.N().D(len(rule.Updates)) qw422016.N().D(len(rule.Updates))
//line app/vmalert/web.qtpl:434 //line app/vmalert/web.qtpl:438
qw422016.N().S(` updates</span>:</div> qw422016.N().S(` updates</span>:</div>
<table class="table table-striped table-hover table-sm"> <table class="table table-striped table-hover table-sm">
<thead> <thead>
@ -1300,201 +1320,201 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
<tbody> <tbody>
`) `)
//line app/vmalert/web.qtpl:447 //line app/vmalert/web.qtpl:451
for _, u := range rule.Updates { for _, u := range rule.Updates {
//line app/vmalert/web.qtpl:447 //line app/vmalert/web.qtpl:451
qw422016.N().S(` qw422016.N().S(`
<tr`) <tr`)
//line app/vmalert/web.qtpl:448 //line app/vmalert/web.qtpl:452
if u.err != nil { if u.err != nil {
//line app/vmalert/web.qtpl:448 //line app/vmalert/web.qtpl:452
qw422016.N().S(` class="alert-danger"`) qw422016.N().S(` class="alert-danger"`)
//line app/vmalert/web.qtpl:448 //line app/vmalert/web.qtpl:452
} }
//line app/vmalert/web.qtpl:448 //line app/vmalert/web.qtpl:452
qw422016.N().S(`> qw422016.N().S(`>
<td> <td>
<span class="badge bg-primary rounded-pill me-3" title="Updated at">`) <span class="badge bg-primary rounded-pill me-3" title="Updated at">`)
//line app/vmalert/web.qtpl:450 //line app/vmalert/web.qtpl:454
qw422016.E().S(u.time.Format(time.RFC3339)) qw422016.E().S(u.time.Format(time.RFC3339))
//line app/vmalert/web.qtpl:450 //line app/vmalert/web.qtpl:454
qw422016.N().S(`</span> qw422016.N().S(`</span>
</td> </td>
<td class="text-center" wi>`) <td class="text-center" wi>`)
//line app/vmalert/web.qtpl:452 //line app/vmalert/web.qtpl:456
qw422016.N().D(u.samples) qw422016.N().D(u.samples)
//line app/vmalert/web.qtpl:452 //line app/vmalert/web.qtpl:456
qw422016.N().S(`</td> qw422016.N().S(`</td>
<td class="text-center">`) <td class="text-center">`)
//line app/vmalert/web.qtpl:453 //line app/vmalert/web.qtpl:457
qw422016.N().FPrec(u.duration.Seconds(), 3) qw422016.N().FPrec(u.duration.Seconds(), 3)
//line app/vmalert/web.qtpl:453 //line app/vmalert/web.qtpl:457
qw422016.N().S(`s</td> qw422016.N().S(`s</td>
<td class="text-center">`) <td class="text-center">`)
//line app/vmalert/web.qtpl:454 //line app/vmalert/web.qtpl:458
qw422016.E().S(u.at.Format(time.RFC3339)) qw422016.E().S(u.at.Format(time.RFC3339))
//line app/vmalert/web.qtpl:454 //line app/vmalert/web.qtpl:458
qw422016.N().S(`</td> qw422016.N().S(`</td>
<td> <td>
<textarea class="curl-area" rows="1" onclick="this.focus();this.select()">`) <textarea class="curl-area" rows="1" onclick="this.focus();this.select()">`)
//line app/vmalert/web.qtpl:456 //line app/vmalert/web.qtpl:460
qw422016.E().S(requestToCurl(u.req)) qw422016.E().S(requestToCurl(u.req))
//line app/vmalert/web.qtpl:456 //line app/vmalert/web.qtpl:460
qw422016.N().S(`</textarea> qw422016.N().S(`</textarea>
</td> </td>
</tr> </tr>
</li> </li>
`) `)
//line app/vmalert/web.qtpl:460 //line app/vmalert/web.qtpl:464
if u.err != nil { if u.err != nil {
//line app/vmalert/web.qtpl:460 //line app/vmalert/web.qtpl:464
qw422016.N().S(` qw422016.N().S(`
<tr`) <tr`)
//line app/vmalert/web.qtpl:461 //line app/vmalert/web.qtpl:465
if u.err != nil { if u.err != nil {
//line app/vmalert/web.qtpl:461 //line app/vmalert/web.qtpl:465
qw422016.N().S(` class="alert-danger"`) qw422016.N().S(` class="alert-danger"`)
//line app/vmalert/web.qtpl:461 //line app/vmalert/web.qtpl:465
} }
//line app/vmalert/web.qtpl:461 //line app/vmalert/web.qtpl:465
qw422016.N().S(`> qw422016.N().S(`>
<td colspan="4"> <td colspan="5">
<span class="alert-danger">`) <span class="alert-danger">`)
//line app/vmalert/web.qtpl:463 //line app/vmalert/web.qtpl:467
qw422016.E().V(u.err) qw422016.E().V(u.err)
//line app/vmalert/web.qtpl:463 //line app/vmalert/web.qtpl:467
qw422016.N().S(`</span> qw422016.N().S(`</span>
</td> </td>
</tr> </tr>
`) `)
//line app/vmalert/web.qtpl:466 //line app/vmalert/web.qtpl:470
} }
//line app/vmalert/web.qtpl:466 //line app/vmalert/web.qtpl:470
qw422016.N().S(` qw422016.N().S(`
`) `)
//line app/vmalert/web.qtpl:467 //line app/vmalert/web.qtpl:471
} }
//line app/vmalert/web.qtpl:467 //line app/vmalert/web.qtpl:471
qw422016.N().S(` qw422016.N().S(`
`) `)
//line app/vmalert/web.qtpl:469 //line app/vmalert/web.qtpl:473
tpl.StreamFooter(qw422016, r) tpl.StreamFooter(qw422016, r)
//line app/vmalert/web.qtpl:469 //line app/vmalert/web.qtpl:473
qw422016.N().S(` qw422016.N().S(`
`) `)
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
} }
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
func WriteRuleDetails(qq422016 qtio422016.Writer, r *http.Request, rule APIRule) { func WriteRuleDetails(qq422016 qtio422016.Writer, r *http.Request, rule APIRule) {
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
qw422016 := qt422016.AcquireWriter(qq422016) qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
StreamRuleDetails(qw422016, r, rule) StreamRuleDetails(qw422016, r, rule)
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
qt422016.ReleaseWriter(qw422016) qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
} }
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
func RuleDetails(r *http.Request, rule APIRule) string { func RuleDetails(r *http.Request, rule APIRule) string {
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
qb422016 := qt422016.AcquireByteBuffer() qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
WriteRuleDetails(qb422016, r, rule) WriteRuleDetails(qb422016, r, rule)
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
qs422016 := string(qb422016.B) qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
qt422016.ReleaseByteBuffer(qb422016) qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
return qs422016 return qs422016
//line app/vmalert/web.qtpl:470 //line app/vmalert/web.qtpl:474
} }
//line app/vmalert/web.qtpl:474 //line app/vmalert/web.qtpl:478
func streambadgeState(qw422016 *qt422016.Writer, state string) { func streambadgeState(qw422016 *qt422016.Writer, state string) {
//line app/vmalert/web.qtpl:474 //line app/vmalert/web.qtpl:478
qw422016.N().S(` qw422016.N().S(`
`) `)
//line app/vmalert/web.qtpl:476 //line app/vmalert/web.qtpl:480
badgeClass := "bg-warning text-dark" badgeClass := "bg-warning text-dark"
if state == "firing" { if state == "firing" {
badgeClass = "bg-danger" badgeClass = "bg-danger"
} }
//line app/vmalert/web.qtpl:480 //line app/vmalert/web.qtpl:484
qw422016.N().S(` qw422016.N().S(`
<span class="badge `) <span class="badge `)
//line app/vmalert/web.qtpl:481 //line app/vmalert/web.qtpl:485
qw422016.E().S(badgeClass) qw422016.E().S(badgeClass)
//line app/vmalert/web.qtpl:481 //line app/vmalert/web.qtpl:485
qw422016.N().S(`">`) qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:481 //line app/vmalert/web.qtpl:485
qw422016.E().S(state) qw422016.E().S(state)
//line app/vmalert/web.qtpl:481 //line app/vmalert/web.qtpl:485
qw422016.N().S(`</span> qw422016.N().S(`</span>
`) `)
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
} }
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
func writebadgeState(qq422016 qtio422016.Writer, state string) { func writebadgeState(qq422016 qtio422016.Writer, state string) {
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
qw422016 := qt422016.AcquireWriter(qq422016) qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
streambadgeState(qw422016, state) streambadgeState(qw422016, state)
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
qt422016.ReleaseWriter(qw422016) qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
} }
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
func badgeState(state string) string { func badgeState(state string) string {
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
qb422016 := qt422016.AcquireByteBuffer() qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
writebadgeState(qb422016, state) writebadgeState(qb422016, state)
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
qs422016 := string(qb422016.B) qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
qt422016.ReleaseByteBuffer(qb422016) qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
return qs422016 return qs422016
//line app/vmalert/web.qtpl:482 //line app/vmalert/web.qtpl:486
} }
//line app/vmalert/web.qtpl:484 //line app/vmalert/web.qtpl:488
func streambadgeRestored(qw422016 *qt422016.Writer) { func streambadgeRestored(qw422016 *qt422016.Writer) {
//line app/vmalert/web.qtpl:484 //line app/vmalert/web.qtpl:488
qw422016.N().S(` qw422016.N().S(`
<span class="badge bg-warning text-dark" title="Alert state was restored after the service restart from remote storage">restored</span> <span class="badge bg-warning text-dark" title="Alert state was restored after the service restart from remote storage">restored</span>
`) `)
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
} }
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
func writebadgeRestored(qq422016 qtio422016.Writer) { func writebadgeRestored(qq422016 qtio422016.Writer) {
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
qw422016 := qt422016.AcquireWriter(qq422016) qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
streambadgeRestored(qw422016) streambadgeRestored(qw422016)
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
qt422016.ReleaseWriter(qw422016) qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
} }
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
func badgeRestored() string { func badgeRestored() string {
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
qb422016 := qt422016.AcquireByteBuffer() qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
writebadgeRestored(qb422016) writebadgeRestored(qb422016)
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
qs422016 := string(qb422016.B) qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
qt422016.ReleaseByteBuffer(qb422016) qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
return qs422016 return qs422016
//line app/vmalert/web.qtpl:486 //line app/vmalert/web.qtpl:490
} }

@ -642,6 +642,61 @@ Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/1495
If you have suggestions for improvements or have found a bug - please open an issue on github or add
a review to the dashboard.
## Troubleshooting
vmalert executes configured rules at fixed intervals. It expects that by the time a rule is executed,
the data is already present in the configured `-datasource.url`:
<img alt="vmalert expected evaluation" src="vmalert_ts_normal.gif">
Problems usually appear when data in `-datasource.url` is delayed or absent. In such cases, evaluations
may get an empty response from the datasource and produce empty recording rules or reset alert state:
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
Try the following recommendations in such cases:
* Always configure the group's `evaluationInterval` to be greater than or equal to the `scrape_interval` at which metrics
are delivered to the datasource;
* If you know in advance that data in the datasource is delayed, try changing vmalert's `-datasource.lookback`
command-line flag to add a time shift for evaluations;
* If time intervals between datapoints in the datasource are irregular, try changing vmalert's `-datasource.queryStep`
command-line flag to specify how far the search query can look back for the most recent datapoint. By default, this value
is equal to the group's `evaluationInterval`.
Sometimes it is not clear why a specific alert fired or didn't fire. It is important to remember that
alerts with `for: 0` fire immediately when their expression becomes true, while alerts with `for > 0` fire only
after multiple consecutive evaluations, and their expression must be true at each of them. If at least one evaluation
returns false, the alert's state resets to the initial state.
If the `-remoteWrite.url` command-line flag is configured, vmalert persists alert state in the form of time series
`ALERTS` and `ALERTS_FOR_STATE` to the specified destination. These time series can then be queried via
[vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui) or Grafana to track how alert state
changed over time.
vmalert also stores the last N state updates for each rule. To check the updates, click the `Details` link next to the rule's name
on the `/vmalert/groups` page and check the `Last updates` section:
<img alt="vmalert state" src="vmalert_state.png">
Rows in the section represent ordered rule evaluations and their results. The `curl` column contains an example of
the HTTP request sent by vmalert to the `-datasource.url` during evaluation. If a specific state shows that no
samples were returned while the curl command returns data, then it is very likely that there was no data in the datasource at the
moment when the rule was evaluated.
vmalert also allows configuring more detailed logging for a specific rule. Just set `debug: true` in the rule's configuration
and vmalert will start printing additional log messages:
```terminal
2022-09-15T13:35:41.155Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:41+02:00: query returned 0 samples (elapsed: 5.896041ms)
2022-09-15T13:35:56.149Z DEBUG datasource request: executing POST request with params "denyPartialResponse=true&query=sum%28vm_tcplistener_conns%7Binstance%3D%22localhost%3A8429%22%7D%29+by%28instance%29+%3E+0&step=15s&time=1663248945"
2022-09-15T13:35:56.178Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:56+02:00: query returned 1 samples (elapsed: 28.368208ms)
2022-09-15T13:35:56.178Z DEBUG datasource request: executing POST request with params "denyPartialResponse=true&query=sum%28vm_tcplistener_conns%7Binstance%3D%22localhost%3A8429%22%7D%29&step=15s&time=1663248945"
2022-09-15T13:35:56.179Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:56+02:00: alert 10705778000901301787 {alertgroup="TestGroup",alertname="Conns",cluster="east-1",instance="localhost:8429",replica="a"} created in state PENDING
...
2022-09-15T13:36:56.153Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:36:56+02:00: alert 10705778000901301787 {alertgroup="TestGroup",alertname="Conns",cluster="east-1",instance="localhost:8429",replica="a"} PENDING => FIRING: 1m0s since becoming active at 2022-09-15 15:35:56.126006 +0200 CEST m=+39.384575417
```
## Profiling
`vmalert` provides handlers for collecting the following [Go profiles](https://blog.golang.org/profiling-go-programs):

BIN docs/vmalert_state.png (Normal file). Binary file not shown. Size: 109 KiB
Binary file not shown. Size: 41 KiB
BIN docs/vmalert_ts_normal.gif (Normal file). Binary file not shown. Size: 41 KiB