mirror of
https://github.com/VictoriaMetrics/VictoriaMetrics.git
synced 2025-01-20 15:16:42 +00:00
vmalert: add Troubleshooting section to docs (#3115)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
parent
8ca42b9bcb
commit
1c13cce5ed
10 changed files with 250 additions and 116 deletions
|
@ -638,6 +638,61 @@ Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/1495
|
|||
If you have suggestions for improvements or have found a bug, please open an issue on GitHub or add
a review to the dashboard.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
vmalert executes configured rules at regular intervals. It expects that, by the time a rule is executed,
the data is already present in the configured `-datasource.url`:
|
||||
|
||||
<img alt="vmalert expected evaluation" src="vmalert_ts_normal.gif">
|
||||
|
||||
Problems usually appear when data in `-datasource.url` is delayed or absent. In such cases, evaluations
may get an empty response from the datasource and produce empty recording rules or reset the alert's state:
|
||||
|
||||
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
|
||||
|
||||
Try the following recommendations in such cases:
|
||||
|
||||
* Always configure the group's `evaluationInterval` to be greater than or equal to the `scrape_interval` at which metrics
are delivered to the datasource;
* If you know in advance that data in the datasource is delayed, try changing vmalert's `-datasource.lookback`
command-line flag to add a time shift for evaluations;
* If time intervals between datapoints in the datasource are irregular, try changing vmalert's `-datasource.queryStep`
command-line flag to specify how far back a search query can look for the most recent datapoint. By default, this value
is equal to the group's `evaluationInterval`.
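
The effect of a time shift and of a wider step can be illustrated with a toy model. The Go sketch below is not vmalert's or the datasource's implementation; it assumes that an instant query evaluated at `evalTime` returns the most recent datapoint within the preceding `step` seconds. The helper `lastSample` and all timestamps are hypothetical.

```go
package main

import "fmt"

// lastSample is a hypothetical helper that models an instant query:
// it returns the newest timestamp in the window (evalTime-step, evalTime].
// The second result is false when the window contains no datapoints.
func lastSample(samples []int64, evalTime, step int64) (int64, bool) {
	var best int64
	found := false
	for _, ts := range samples {
		if ts > evalTime-step && ts <= evalTime && (!found || ts > best) {
			best, found = ts, true
		}
	}
	return best, found
}

func main() {
	// Datapoints every 15s (timestamps in seconds); the newest one
	// visible in the datasource is 45s behind "now" because of a delay.
	samples := []int64{0, 15, 30, 45, 60}
	now := int64(105)
	step := int64(15)

	// Evaluating at "now" with a 15s window finds nothing.
	_, ok := lastSample(samples, now, step)
	fmt.Println("no shift:", ok) // no shift: false

	// Shifting the evaluation time back by the delay finds the datapoint.
	_, ok = lastSample(samples, now-45, step)
	fmt.Println("shifted by 45s:", ok) // shifted by 45s: true

	// Widening the window instead also finds it.
	_, ok = lastSample(samples, now, 60)
	fmt.Println("wider step:", ok) // wider step: true
}
```

With datapoints arriving 45 seconds late, the unshifted evaluation gets an empty result, while shifting the evaluation time back by the delay or widening the step lets the query find the latest datapoint.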
|
||||
|
||||
Sometimes it is not clear why a specific alert fired or didn't fire. It is important to remember that
alerts with `for: 0` fire immediately when their expression becomes true. Alerts with `for > 0` fire only
after multiple consecutive evaluations, and their expression must be true at each of them. If the expression is false
at even one evaluation, the alert's state resets to the initial state.
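
The `for` behavior can be sketched as a small state machine. The Go code below is a simplified illustration under stated assumptions, not vmalert's actual code: the `Alert` type, the state names and the `Simulate` helper are hypothetical, and an alert is assumed to fire once its expression has been true for at least the `for` duration.

```go
package main

import (
	"fmt"
	"time"
)

// AlertState is a simplified, hypothetical model of an alert's lifecycle.
type AlertState string

const (
	StateInactive AlertState = "inactive"
	StatePending  AlertState = "pending"
	StateFiring   AlertState = "firing"
)

// Alert remembers when its expression first became true.
type Alert struct {
	State       AlertState
	ActiveSince time.Time
}

// Evaluate applies the result of one rule evaluation to the alert.
func (a *Alert) Evaluate(now time.Time, exprTrue bool, forDuration time.Duration) {
	if !exprTrue {
		// A single false evaluation resets the alert to its initial state.
		a.State = StateInactive
		a.ActiveSince = time.Time{}
		return
	}
	if a.ActiveSince.IsZero() {
		a.State = StatePending
		a.ActiveSince = now
	}
	// With forDuration == 0 this holds on the first true evaluation.
	if now.Sub(a.ActiveSince) >= forDuration {
		a.State = StateFiring
	}
}

// Simulate runs evaluations at a fixed interval and returns the state after each one.
func Simulate(results []bool, interval, forDuration time.Duration) []AlertState {
	start := time.Date(2022, 9, 15, 13, 35, 56, 0, time.UTC)
	a := &Alert{State: StateInactive}
	states := make([]AlertState, 0, len(results))
	for i, r := range results {
		a.Evaluate(start.Add(time.Duration(i)*interval), r, forDuration)
		states = append(states, a.State)
	}
	return states
}

func main() {
	results := []bool{true, true, false, true, true, true}
	// for: 1m, evaluated every 30s: the false result at the third evaluation resets the alert.
	fmt.Println(Simulate(results, 30*time.Second, time.Minute))
	// prints: [pending pending inactive pending pending firing]

	// for: 0 fires on the first true evaluation.
	fmt.Println(Simulate([]bool{true}, 30*time.Second, 0))
	// prints: [firing]
}
```

In this model, an alert with `for: 1m` evaluated every 30 seconds needs three consecutive true evaluations to fire, and one false evaluation in between starts the count again.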
|
||||
|
||||
If the `-remoteWrite.url` command-line flag is configured, vmalert persists the alert's state in the form of the time series
`ALERTS` and `ALERTS_FOR_STATE` to the specified destination. These time series can then be queried via
[vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui) or Grafana to track how the alert's state
changed over time.
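
The persisted series can also be fetched programmatically. The Go sketch below only builds a request URL for the Prometheus-compatible `/api/v1/query_range` endpoint; the base address `http://localhost:8428`, the alert name `Conns`, the time range and the helper `alertsRangeURL` are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// alertsRangeURL is a hypothetical helper that builds a range query URL
// for the ALERTS series of one alert. Times and step are in seconds.
// Any path already present in base is replaced.
func alertsRangeURL(base, alertname string, startUnix, endUnix, stepSeconds int64) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/query_range"

	q := url.Values{}
	q.Set("query", fmt.Sprintf("ALERTS{alertname=%q}", alertname))
	q.Set("start", strconv.FormatInt(startUnix, 10))
	q.Set("end", strconv.FormatInt(endUnix, 10))
	q.Set("step", strconv.FormatInt(stepSeconds, 10))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Assumed address of the storage that vmalert writes alert state to.
	u, err := alertsRangeURL("http://localhost:8428", "Conns", 1663248900, 1663249020, 15)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

The resulting URL can be opened with any HTTP client. Replacing `ALERTS` with `ALERTS_FOR_STATE` in the query selects the second series.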
|
||||
|
||||
vmalert also stores the last N state updates for each rule. To check the updates, click the `Details` link next to the rule's name
on the `/vmalert/groups` page and check the `Last updates` section:
|
||||
|
||||
<img alt="vmalert state" src="vmalert_state.png">
|
||||
|
||||
Rows in the section represent ordered rule evaluations and their results. The `curl` column contains an example of
the HTTP request sent by vmalert to the `-datasource.url` during evaluation. If a specific state shows that
no samples were returned, but running the curl command now returns data, then it is very likely that there was no data in the datasource at the
moment when the rule was evaluated.
|
||||
|
||||
vmalert also allows configuring more detailed logging for a specific rule. Just set `debug: true` in the rule's configuration
and vmalert will start printing additional log messages:
|
||||
```terminal
2022-09-15T13:35:41.155Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:41+02:00: query returned 0 samples (elapsed: 5.896041ms)
2022-09-15T13:35:56.149Z DEBUG datasource request: executing POST request with params "denyPartialResponse=true&query=sum%28vm_tcplistener_conns%7Binstance%3D%22localhost%3A8429%22%7D%29+by%28instance%29+%3E+0&step=15s&time=1663248945"
2022-09-15T13:35:56.178Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:56+02:00: query returned 1 samples (elapsed: 28.368208ms)
2022-09-15T13:35:56.178Z DEBUG datasource request: executing POST request with params "denyPartialResponse=true&query=sum%28vm_tcplistener_conns%7Binstance%3D%22localhost%3A8429%22%7D%29&step=15s&time=1663248945"
2022-09-15T13:35:56.179Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:56+02:00: alert 10705778000901301787 {alertgroup="TestGroup",alertname="Conns",cluster="east-1",instance="localhost:8429",replica="a"} created in state PENDING
...
2022-09-15T13:36:56.153Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:36:56+02:00: alert 10705778000901301787 {alertgroup="TestGroup",alertname="Conns",cluster="east-1",instance="localhost:8429",replica="a"} PENDING => FIRING: 1m0s since becoming active at 2022-09-15 15:35:56.126006 +0200 CEST m=+39.384575417
```
|
||||
|
||||
|
||||
## Profiling
|
||||
|
||||
`vmalert` provides handlers for collecting the following [Go profiles](https://blog.golang.org/profiling-go-programs):
|
||||
|
|
BIN app/vmalert/vmalert_state.png
Normal file, binary file not shown. Size: 109 KiB
BIN app/vmalert/vmalert_ts_data_delay.gif
Normal file, binary file not shown. Size: 41 KiB
BIN app/vmalert/vmalert_ts_normal.gif
Normal file, binary file not shown. Size: 41 KiB
|
@ -384,6 +384,7 @@
|
|||
</div>
|
||||
</div>
|
||||
</div>
|
||||
{% if rule.Type == "alerting" %}
|
||||
<div class="container border-bottom p-2">
|
||||
<div class="row">
|
||||
<div class="col-2">
|
||||
|
@ -394,6 +395,7 @@
|
|||
</div>
|
||||
</div>
|
||||
</div>
|
||||
{% endif %}
|
||||
<div class="container border-bottom p-2">
|
||||
<div class="row">
|
||||
<div class="col-2">
|
||||
|
@ -406,6 +408,7 @@
|
|||
</div>
|
||||
</div>
|
||||
</div>
|
||||
{% if rule.Type == "alerting" %}
|
||||
<div class="container border-bottom p-2">
|
||||
<div class="row">
|
||||
<div class="col-2">
|
||||
|
@ -419,6 +422,7 @@
|
|||
</div>
|
||||
</div>
|
||||
</div>
|
||||
{% endif %}
|
||||
<div class="container border-bottom p-2">
|
||||
<div class="row">
|
||||
<div class="col-2">
|
||||
|
|
|
@ -1187,6 +1187,11 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
|
|||
</div>
|
||||
</div>
|
||||
</div>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:387
|
||||
if rule.Type == "alerting" {
|
||||
//line app/vmalert/web.qtpl:387
|
||||
qw422016.N().S(`
|
||||
<div class="container border-bottom p-2">
|
||||
<div class="row">
|
||||
<div class="col-2">
|
||||
|
@ -1194,13 +1199,18 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
|
|||
</div>
|
||||
<div class="col">
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:393
|
||||
qw422016.E().V(rule.Duration)
|
||||
//line app/vmalert/web.qtpl:393
|
||||
qw422016.N().S(` seconds
|
||||
//line app/vmalert/web.qtpl:394
|
||||
qw422016.E().V(rule.Duration)
|
||||
//line app/vmalert/web.qtpl:394
|
||||
qw422016.N().S(` seconds
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:398
|
||||
}
|
||||
//line app/vmalert/web.qtpl:398
|
||||
qw422016.N().S(`
|
||||
<div class="container border-bottom p-2">
|
||||
<div class="row">
|
||||
<div class="col-2">
|
||||
|
@ -1208,27 +1218,32 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
|
|||
</div>
|
||||
<div class="col">
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:403
|
||||
//line app/vmalert/web.qtpl:405
|
||||
for _, k := range labelKeys {
|
||||
//line app/vmalert/web.qtpl:403
|
||||
//line app/vmalert/web.qtpl:405
|
||||
qw422016.N().S(`
|
||||
<span class="m-1 badge bg-primary">`)
|
||||
//line app/vmalert/web.qtpl:404
|
||||
//line app/vmalert/web.qtpl:406
|
||||
qw422016.E().S(k)
|
||||
//line app/vmalert/web.qtpl:404
|
||||
//line app/vmalert/web.qtpl:406
|
||||
qw422016.N().S(`=`)
|
||||
//line app/vmalert/web.qtpl:404
|
||||
//line app/vmalert/web.qtpl:406
|
||||
qw422016.E().S(rule.Labels[k])
|
||||
//line app/vmalert/web.qtpl:404
|
||||
//line app/vmalert/web.qtpl:406
|
||||
qw422016.N().S(`</span>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:405
|
||||
//line app/vmalert/web.qtpl:407
|
||||
}
|
||||
//line app/vmalert/web.qtpl:405
|
||||
//line app/vmalert/web.qtpl:407
|
||||
qw422016.N().S(`
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:411
|
||||
if rule.Type == "alerting" {
|
||||
//line app/vmalert/web.qtpl:411
|
||||
qw422016.N().S(`
|
||||
<div class="container border-bottom p-2">
|
||||
<div class="row">
|
||||
<div class="col-2">
|
||||
|
@ -1236,28 +1251,33 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
|
|||
</div>
|
||||
<div class="col">
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:415
|
||||
for _, k := range annotationKeys {
|
||||
//line app/vmalert/web.qtpl:415
|
||||
qw422016.N().S(`
|
||||
//line app/vmalert/web.qtpl:418
|
||||
for _, k := range annotationKeys {
|
||||
//line app/vmalert/web.qtpl:418
|
||||
qw422016.N().S(`
|
||||
<b>`)
|
||||
//line app/vmalert/web.qtpl:416
|
||||
qw422016.E().S(k)
|
||||
//line app/vmalert/web.qtpl:416
|
||||
qw422016.N().S(`:</b><br>
|
||||
//line app/vmalert/web.qtpl:419
|
||||
qw422016.E().S(k)
|
||||
//line app/vmalert/web.qtpl:419
|
||||
qw422016.N().S(`:</b><br>
|
||||
<p>`)
|
||||
//line app/vmalert/web.qtpl:417
|
||||
qw422016.E().S(rule.Annotations[k])
|
||||
//line app/vmalert/web.qtpl:417
|
||||
qw422016.N().S(`</p>
|
||||
//line app/vmalert/web.qtpl:420
|
||||
qw422016.E().S(rule.Annotations[k])
|
||||
//line app/vmalert/web.qtpl:420
|
||||
qw422016.N().S(`</p>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:418
|
||||
}
|
||||
//line app/vmalert/web.qtpl:418
|
||||
qw422016.N().S(`
|
||||
//line app/vmalert/web.qtpl:421
|
||||
}
|
||||
//line app/vmalert/web.qtpl:421
|
||||
qw422016.N().S(`
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:425
|
||||
}
|
||||
//line app/vmalert/web.qtpl:425
|
||||
qw422016.N().S(`
|
||||
<div class="container border-bottom p-2">
|
||||
<div class="row">
|
||||
<div class="col-2">
|
||||
|
@ -1265,17 +1285,17 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
|
|||
</div>
|
||||
<div class="col">
|
||||
<a target="_blank" href="`)
|
||||
//line app/vmalert/web.qtpl:428
|
||||
//line app/vmalert/web.qtpl:432
|
||||
qw422016.E().S(prefix)
|
||||
//line app/vmalert/web.qtpl:428
|
||||
//line app/vmalert/web.qtpl:432
|
||||
qw422016.N().S(`groups#group-`)
|
||||
//line app/vmalert/web.qtpl:428
|
||||
//line app/vmalert/web.qtpl:432
|
||||
qw422016.E().S(rule.GroupID)
|
||||
//line app/vmalert/web.qtpl:428
|
||||
//line app/vmalert/web.qtpl:432
|
||||
qw422016.N().S(`">`)
|
||||
//line app/vmalert/web.qtpl:428
|
||||
//line app/vmalert/web.qtpl:432
|
||||
qw422016.E().S(rule.GroupID)
|
||||
//line app/vmalert/web.qtpl:428
|
||||
//line app/vmalert/web.qtpl:432
|
||||
qw422016.N().S(`</a>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -1283,9 +1303,9 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
|
|||
|
||||
<br>
|
||||
<div class="display-6 pb-3">Last `)
|
||||
//line app/vmalert/web.qtpl:434
|
||||
//line app/vmalert/web.qtpl:438
|
||||
qw422016.N().D(len(rule.Updates))
|
||||
//line app/vmalert/web.qtpl:434
|
||||
//line app/vmalert/web.qtpl:438
|
||||
qw422016.N().S(` updates</span>:</div>
|
||||
<table class="table table-striped table-hover table-sm">
|
||||
<thead>
|
||||
|
@ -1300,201 +1320,201 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
|
|||
<tbody>
|
||||
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:447
|
||||
//line app/vmalert/web.qtpl:451
|
||||
for _, u := range rule.Updates {
|
||||
//line app/vmalert/web.qtpl:447
|
||||
//line app/vmalert/web.qtpl:451
|
||||
qw422016.N().S(`
|
||||
<tr`)
|
||||
//line app/vmalert/web.qtpl:448
|
||||
//line app/vmalert/web.qtpl:452
|
||||
if u.err != nil {
|
||||
//line app/vmalert/web.qtpl:448
|
||||
//line app/vmalert/web.qtpl:452
|
||||
qw422016.N().S(` class="alert-danger"`)
|
||||
//line app/vmalert/web.qtpl:448
|
||||
//line app/vmalert/web.qtpl:452
|
||||
}
|
||||
//line app/vmalert/web.qtpl:448
|
||||
//line app/vmalert/web.qtpl:452
|
||||
qw422016.N().S(`>
|
||||
<td>
|
||||
<span class="badge bg-primary rounded-pill me-3" title="Updated at">`)
|
||||
//line app/vmalert/web.qtpl:450
|
||||
//line app/vmalert/web.qtpl:454
|
||||
qw422016.E().S(u.time.Format(time.RFC3339))
|
||||
//line app/vmalert/web.qtpl:450
|
||||
//line app/vmalert/web.qtpl:454
|
||||
qw422016.N().S(`</span>
|
||||
</td>
|
||||
<td class="text-center" wi>`)
|
||||
//line app/vmalert/web.qtpl:452
|
||||
//line app/vmalert/web.qtpl:456
|
||||
qw422016.N().D(u.samples)
|
||||
//line app/vmalert/web.qtpl:452
|
||||
//line app/vmalert/web.qtpl:456
|
||||
qw422016.N().S(`</td>
|
||||
<td class="text-center">`)
|
||||
//line app/vmalert/web.qtpl:453
|
||||
//line app/vmalert/web.qtpl:457
|
||||
qw422016.N().FPrec(u.duration.Seconds(), 3)
|
||||
//line app/vmalert/web.qtpl:453
|
||||
//line app/vmalert/web.qtpl:457
|
||||
qw422016.N().S(`s</td>
|
||||
<td class="text-center">`)
|
||||
//line app/vmalert/web.qtpl:454
|
||||
//line app/vmalert/web.qtpl:458
|
||||
qw422016.E().S(u.at.Format(time.RFC3339))
|
||||
//line app/vmalert/web.qtpl:454
|
||||
//line app/vmalert/web.qtpl:458
|
||||
qw422016.N().S(`</td>
|
||||
<td>
|
||||
<textarea class="curl-area" rows="1" onclick="this.focus();this.select()">`)
|
||||
//line app/vmalert/web.qtpl:456
|
||||
//line app/vmalert/web.qtpl:460
|
||||
qw422016.E().S(requestToCurl(u.req))
|
||||
//line app/vmalert/web.qtpl:456
|
||||
//line app/vmalert/web.qtpl:460
|
||||
qw422016.N().S(`</textarea>
|
||||
</td>
|
||||
</tr>
|
||||
</li>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:460
|
||||
//line app/vmalert/web.qtpl:464
|
||||
if u.err != nil {
|
||||
//line app/vmalert/web.qtpl:460
|
||||
//line app/vmalert/web.qtpl:464
|
||||
qw422016.N().S(`
|
||||
<tr`)
|
||||
//line app/vmalert/web.qtpl:461
|
||||
//line app/vmalert/web.qtpl:465
|
||||
if u.err != nil {
|
||||
//line app/vmalert/web.qtpl:461
|
||||
//line app/vmalert/web.qtpl:465
|
||||
qw422016.N().S(` class="alert-danger"`)
|
||||
//line app/vmalert/web.qtpl:461
|
||||
//line app/vmalert/web.qtpl:465
|
||||
}
|
||||
//line app/vmalert/web.qtpl:461
|
||||
//line app/vmalert/web.qtpl:465
|
||||
qw422016.N().S(`>
|
||||
<td colspan="4">
|
||||
<td colspan="5">
|
||||
<span class="alert-danger">`)
|
||||
//line app/vmalert/web.qtpl:463
|
||||
//line app/vmalert/web.qtpl:467
|
||||
qw422016.E().V(u.err)
|
||||
//line app/vmalert/web.qtpl:463
|
||||
//line app/vmalert/web.qtpl:467
|
||||
qw422016.N().S(`</span>
|
||||
</td>
|
||||
</tr>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:466
|
||||
//line app/vmalert/web.qtpl:470
|
||||
}
|
||||
//line app/vmalert/web.qtpl:466
|
||||
//line app/vmalert/web.qtpl:470
|
||||
qw422016.N().S(`
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:467
|
||||
//line app/vmalert/web.qtpl:471
|
||||
}
|
||||
//line app/vmalert/web.qtpl:467
|
||||
//line app/vmalert/web.qtpl:471
|
||||
qw422016.N().S(`
|
||||
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:469
|
||||
//line app/vmalert/web.qtpl:473
|
||||
tpl.StreamFooter(qw422016, r)
|
||||
//line app/vmalert/web.qtpl:469
|
||||
//line app/vmalert/web.qtpl:473
|
||||
qw422016.N().S(`
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
}
|
||||
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
func WriteRuleDetails(qq422016 qtio422016.Writer, r *http.Request, rule APIRule) {
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
qw422016 := qt422016.AcquireWriter(qq422016)
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
StreamRuleDetails(qw422016, r, rule)
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
qt422016.ReleaseWriter(qw422016)
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
}
|
||||
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
func RuleDetails(r *http.Request, rule APIRule) string {
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
qb422016 := qt422016.AcquireByteBuffer()
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
WriteRuleDetails(qb422016, r, rule)
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
qs422016 := string(qb422016.B)
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
qt422016.ReleaseByteBuffer(qb422016)
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
return qs422016
|
||||
//line app/vmalert/web.qtpl:470
|
||||
//line app/vmalert/web.qtpl:474
|
||||
}
|
||||
|
||||
//line app/vmalert/web.qtpl:474
|
||||
//line app/vmalert/web.qtpl:478
|
||||
func streambadgeState(qw422016 *qt422016.Writer, state string) {
|
||||
//line app/vmalert/web.qtpl:474
|
||||
//line app/vmalert/web.qtpl:478
|
||||
qw422016.N().S(`
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:476
|
||||
//line app/vmalert/web.qtpl:480
|
||||
badgeClass := "bg-warning text-dark"
|
||||
if state == "firing" {
|
||||
badgeClass = "bg-danger"
|
||||
}
|
||||
|
||||
//line app/vmalert/web.qtpl:480
|
||||
//line app/vmalert/web.qtpl:484
|
||||
qw422016.N().S(`
|
||||
<span class="badge `)
|
||||
//line app/vmalert/web.qtpl:481
|
||||
//line app/vmalert/web.qtpl:485
|
||||
qw422016.E().S(badgeClass)
|
||||
//line app/vmalert/web.qtpl:481
|
||||
//line app/vmalert/web.qtpl:485
|
||||
qw422016.N().S(`">`)
|
||||
//line app/vmalert/web.qtpl:481
|
||||
//line app/vmalert/web.qtpl:485
|
||||
qw422016.E().S(state)
|
||||
//line app/vmalert/web.qtpl:481
|
||||
//line app/vmalert/web.qtpl:485
|
||||
qw422016.N().S(`</span>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
}
|
||||
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
func writebadgeState(qq422016 qtio422016.Writer, state string) {
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
qw422016 := qt422016.AcquireWriter(qq422016)
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
streambadgeState(qw422016, state)
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
qt422016.ReleaseWriter(qw422016)
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
}
|
||||
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
func badgeState(state string) string {
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
qb422016 := qt422016.AcquireByteBuffer()
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
writebadgeState(qb422016, state)
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
qs422016 := string(qb422016.B)
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
qt422016.ReleaseByteBuffer(qb422016)
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
return qs422016
|
||||
//line app/vmalert/web.qtpl:482
|
||||
//line app/vmalert/web.qtpl:486
|
||||
}
|
||||
|
||||
//line app/vmalert/web.qtpl:484
|
||||
//line app/vmalert/web.qtpl:488
|
||||
func streambadgeRestored(qw422016 *qt422016.Writer) {
|
||||
//line app/vmalert/web.qtpl:484
|
||||
//line app/vmalert/web.qtpl:488
|
||||
qw422016.N().S(`
|
||||
<span class="badge bg-warning text-dark" title="Alert state was restored after the service restart from remote storage">restored</span>
|
||||
`)
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
}
|
||||
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
func writebadgeRestored(qq422016 qtio422016.Writer) {
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
qw422016 := qt422016.AcquireWriter(qq422016)
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
streambadgeRestored(qw422016)
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
qt422016.ReleaseWriter(qw422016)
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
}
|
||||
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
func badgeRestored() string {
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
qb422016 := qt422016.AcquireByteBuffer()
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
writebadgeRestored(qb422016)
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
qs422016 := string(qb422016.B)
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
qt422016.ReleaseByteBuffer(qb422016)
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
return qs422016
|
||||
//line app/vmalert/web.qtpl:486
|
||||
//line app/vmalert/web.qtpl:490
|
||||
}
|
||||
|
|
|
@ -642,6 +642,61 @@ Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/1495
|
|||
If you have suggestions for improvements or have found a bug, please open an issue on GitHub or add
a review to the dashboard.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
vmalert executes configured rules at regular intervals. It expects that, by the time a rule is executed,
the data is already present in the configured `-datasource.url`:
|
||||
|
||||
<img alt="vmalert expected evaluation" src="vmalert_ts_normal.gif">
|
||||
|
||||
Problems usually appear when data in `-datasource.url` is delayed or absent. In such cases, evaluations
may get an empty response from the datasource and produce empty recording rules or reset the alert's state:
|
||||
|
||||
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
|
||||
|
||||
Try the following recommendations in such cases:
|
||||
|
||||
* Always configure the group's `evaluationInterval` to be greater than or equal to the `scrape_interval` at which metrics
are delivered to the datasource;
* If you know in advance that data in the datasource is delayed, try changing vmalert's `-datasource.lookback`
command-line flag to add a time shift for evaluations;
* If time intervals between datapoints in the datasource are irregular, try changing vmalert's `-datasource.queryStep`
command-line flag to specify how far back a search query can look for the most recent datapoint. By default, this value
is equal to the group's `evaluationInterval`.
|
||||
|
||||
Sometimes it is not clear why a specific alert fired or didn't fire. It is important to remember that
alerts with `for: 0` fire immediately when their expression becomes true. Alerts with `for > 0` fire only
after multiple consecutive evaluations, and their expression must be true at each of them. If the expression is false
at even one evaluation, the alert's state resets to the initial state.
|
||||
|
||||
If the `-remoteWrite.url` command-line flag is configured, vmalert persists the alert's state in the form of the time series
`ALERTS` and `ALERTS_FOR_STATE` to the specified destination. These time series can then be queried via
[vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui) or Grafana to track how the alert's state
changed over time.
|
||||
|
||||
vmalert also stores the last N state updates for each rule. To check the updates, click the `Details` link next to the rule's name
on the `/vmalert/groups` page and check the `Last updates` section:
|
||||
|
||||
<img alt="vmalert state" src="vmalert_state.png">
|
||||
|
||||
Rows in the section represent ordered rule evaluations and their results. The `curl` column contains an example of
the HTTP request sent by vmalert to the `-datasource.url` during evaluation. If a specific state shows that
no samples were returned, but running the curl command now returns data, then it is very likely that there was no data in the datasource at the
moment when the rule was evaluated.
|
||||
|
||||
vmalert also allows configuring more detailed logging for a specific rule. Just set `debug: true` in the rule's configuration
and vmalert will start printing additional log messages:
|
||||
```terminal
2022-09-15T13:35:41.155Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:41+02:00: query returned 0 samples (elapsed: 5.896041ms)
2022-09-15T13:35:56.149Z DEBUG datasource request: executing POST request with params "denyPartialResponse=true&query=sum%28vm_tcplistener_conns%7Binstance%3D%22localhost%3A8429%22%7D%29+by%28instance%29+%3E+0&step=15s&time=1663248945"
2022-09-15T13:35:56.178Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:56+02:00: query returned 1 samples (elapsed: 28.368208ms)
2022-09-15T13:35:56.178Z DEBUG datasource request: executing POST request with params "denyPartialResponse=true&query=sum%28vm_tcplistener_conns%7Binstance%3D%22localhost%3A8429%22%7D%29&step=15s&time=1663248945"
2022-09-15T13:35:56.179Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:56+02:00: alert 10705778000901301787 {alertgroup="TestGroup",alertname="Conns",cluster="east-1",instance="localhost:8429",replica="a"} created in state PENDING
...
2022-09-15T13:36:56.153Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:36:56+02:00: alert 10705778000901301787 {alertgroup="TestGroup",alertname="Conns",cluster="east-1",instance="localhost:8429",replica="a"} PENDING => FIRING: 1m0s since becoming active at 2022-09-15 15:35:56.126006 +0200 CEST m=+39.384575417
```
|
||||
|
||||
|
||||
## Profiling
|
||||
|
||||
`vmalert` provides handlers for collecting the following [Go profiles](https://blog.golang.org/profiling-go-programs):
|
||||
|
|
BIN docs/vmalert_state.png
Normal file, binary file not shown. Size: 109 KiB
BIN docs/vmalert_ts_data_delay.gif
Normal file, binary file not shown. Size: 41 KiB
BIN docs/vmalert_ts_normal.gif
Normal file, binary file not shown. Size: 41 KiB