mirror of
https://github.com/VictoriaMetrics/VictoriaMetrics.git
synced 2025-01-10 15:14:09 +00:00
vmalert: allow configuring the default number of stored rule's update states (#3556)
Allow configuring the default number of stored rule's update states in memory via global `-rule.updateEntriesLimit` command-line flag or per-rule via rule's `update_entries_limit` configuration param. Signed-off-by: hagen1778 <roman@victoriametrics.com>
This commit is contained in:
parent
f27bb19213
commit
5cf2998af8
19 changed files with 144 additions and 33 deletions
|
@ -191,6 +191,11 @@ expr: <string>
|
|||
# Is applicable to alerting rules only.
|
||||
[ debug: <bool> | default = false ]
|
||||
|
||||
# Defines the number of rule's updates entries stored in memory
|
||||
# and available for view on rule's Details page.
|
||||
# Overrides `rule.updateEntriesLimit` value for this specific rule.
|
||||
[ update_entries_limit: <integer> | default 0 ]
|
||||
|
||||
# Labels to add or overwrite for each alert.
|
||||
labels:
|
||||
[ <labelname>: <tmpl_string> ]
|
||||
|
@ -319,6 +324,12 @@ expr: <string>
|
|||
# Labels to add or overwrite before storing the result.
|
||||
labels:
|
||||
[ <labelname>: <labelvalue> ]
|
||||
|
||||
|
||||
# Defines the number of rule's updates entries stored in memory
|
||||
# and available for view on rule's Details page.
|
||||
# Overrides `rule.updateEntriesLimit` value for this specific rule.
|
||||
[ update_entries_limit: <integer> | default 0 ]
|
||||
```
|
||||
|
||||
For recording rules to work `-remoteWrite.url` must be specified.
|
||||
|
@ -695,7 +706,7 @@ may get empty response from datasource and produce empty recording rules or rese
|
|||
|
||||
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
|
||||
|
||||
By default recently written samples to VictoriaMetrics aren't visible for queries for up to 30s.
|
||||
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s.
|
||||
This behavior is controlled by `-search.latencyOffset` command-line flag and the `latency_offset` query ag at `vmselect`.
|
||||
Usually, this results into a 30s shift for recording rules results.
|
||||
Note that too small value passed to `-search.latencyOffset` or to `latency_offest` query arg may lead to incomplete query results.
|
||||
|
@ -721,8 +732,9 @@ If `-remoteWrite.url` command-line flag is configured, vmalert will persist aler
|
|||
[vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui) or Grafana to track how alerts state
|
||||
changed in time.
|
||||
|
||||
vmalert also stores last N state updates for each rule. To check updates, click on `Details` link next to rule's name
|
||||
on `/vmalert/groups` page and check the `Last updates` section:
|
||||
vmalert stores last `-rule.maxUpdateEntries` (or `update_entries_limit` [per-rule config](https://docs.victoriametrics.com/vmalert.html#alerting-rules))
|
||||
state updates for each rule. To check updates, click on `Details` link next to rule's name on `/vmalert/groups` page
|
||||
and check the `Last updates` section:
|
||||
|
||||
<img alt="vmalert state" src="vmalert_state.png">
|
||||
|
||||
|
@ -731,7 +743,7 @@ HTTP request sent by vmalert to the `-datasource.url` during evaluation. If spec
|
|||
no samples returned and curl command returns data - then it is very likely there was no data in datasource on the
|
||||
moment when rule was evaluated.
|
||||
|
||||
vmalert also alows configuring more detailed logging for specific rule. Just set `debug: true` in rule's configuration
|
||||
vmalert allows configuring more detailed logging for specific alerting rule. Just set `debug: true` in rule's configuration
|
||||
and vmalert will start printing additional log messages:
|
||||
```terminal
|
||||
2022-09-15T13:35:41.155Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:41+02:00: query returned 0 samples (elapsed: 5.896041ms)
|
||||
|
@ -890,6 +902,8 @@ The shortlist of configuration flags is the following:
|
|||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerJSONFields string
|
||||
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
|
@ -1092,6 +1106,8 @@ The shortlist of configuration flags is the following:
|
|||
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes. DEPRECATED - see '-configCheckInterval' instead
|
||||
-rule.maxResolveDuration duration
|
||||
Limits the maximum duration for automatic alert expiration, which is by default equal to 3 evaluation intervals of the parent group.
|
||||
-rule.maxUpdateEntries int
|
||||
Defines the max number of rule's state updates. (default 20)
|
||||
-rule.resendDelay duration
|
||||
Minimum amount of time to wait before resending an alert to notifier
|
||||
-rule.templates array
|
||||
|
|
|
@ -74,10 +74,15 @@ func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule
|
|||
Debug: cfg.Debug,
|
||||
}),
|
||||
alerts: make(map[uint64]*notifier.Alert),
|
||||
state: newRuleState(),
|
||||
metrics: &alertingRuleMetrics{},
|
||||
}
|
||||
|
||||
if cfg.UpdateEntriesLimit != nil {
|
||||
ar.state = newRuleState(*cfg.UpdateEntriesLimit)
|
||||
} else {
|
||||
ar.state = newRuleState(*ruleUpdateEntriesLimit)
|
||||
}
|
||||
|
||||
labels := fmt.Sprintf(`alertname=%q, group=%q, id="%d"`, ar.Name, group.Name, ar.ID())
|
||||
ar.metrics.pending = utils.GetOrCreateGauge(fmt.Sprintf(`vmalert_alerts_pending{%s}`, labels),
|
||||
func() float64 {
|
||||
|
@ -491,6 +496,7 @@ func (ar *AlertingRule) ToAPI() APIRule {
|
|||
State: "inactive",
|
||||
Alerts: ar.AlertsToAPI(),
|
||||
LastSamples: lastState.samples,
|
||||
MaxUpdates: ar.state.size(),
|
||||
Updates: ar.state.getAll(),
|
||||
|
||||
// encode as strings to avoid rounding in JSON
|
||||
|
|
|
@ -709,7 +709,6 @@ func TestAlertingRule_Template(t *testing.T) {
|
|||
"summary": `{{ $labels.alertname }}: Too high connection number for "{{ $labels.instance }}"`,
|
||||
},
|
||||
alerts: make(map[uint64]*notifier.Alert),
|
||||
state: newRuleState(),
|
||||
},
|
||||
[]datasource.Metric{
|
||||
metricWithValueAndLabels(t, 1, "instance", "foo"),
|
||||
|
@ -749,7 +748,6 @@ func TestAlertingRule_Template(t *testing.T) {
|
|||
"description": `{{ $labels.alertname}}: It is {{ $value }} connections for "{{ $labels.instance }}"`,
|
||||
},
|
||||
alerts: make(map[uint64]*notifier.Alert),
|
||||
state: newRuleState(),
|
||||
},
|
||||
[]datasource.Metric{
|
||||
metricWithValueAndLabels(t, 2, "__name__", "first", "instance", "foo", alertNameLabel, "override"),
|
||||
|
@ -789,7 +787,6 @@ func TestAlertingRule_Template(t *testing.T) {
|
|||
"summary": `Alert "{{ $labels.alertname }}({{ $labels.alertgroup }})" for instance {{ $labels.instance }}`,
|
||||
},
|
||||
alerts: make(map[uint64]*notifier.Alert),
|
||||
state: newRuleState(),
|
||||
},
|
||||
[]datasource.Metric{
|
||||
metricWithValueAndLabels(t, 1,
|
||||
|
@ -820,6 +817,7 @@ func TestAlertingRule_Template(t *testing.T) {
|
|||
fq := &fakeQuerier{}
|
||||
tc.rule.GroupID = fakeGroup.ID()
|
||||
tc.rule.q = fq
|
||||
tc.rule.state = newRuleState(10)
|
||||
fq.add(tc.metrics...)
|
||||
if _, err := tc.rule.Exec(context.TODO(), time.Now(), 0); err != nil {
|
||||
t.Fatalf("unexpected err: %s", err)
|
||||
|
@ -936,6 +934,6 @@ func newTestAlertingRule(name string, waitFor time.Duration) *AlertingRule {
|
|||
For: waitFor,
|
||||
EvalInterval: waitFor,
|
||||
alerts: make(map[uint64]*notifier.Alert),
|
||||
state: newRuleState(),
|
||||
state: newRuleState(10),
|
||||
}
|
||||
}
|
||||
|
|
|
@ -114,6 +114,9 @@ type Rule struct {
|
|||
Labels map[string]string `yaml:"labels,omitempty"`
|
||||
Annotations map[string]string `yaml:"annotations,omitempty"`
|
||||
Debug bool `yaml:"debug,omitempty"`
|
||||
// UpdateEntriesLimit defines max number of rule's state updates stored in memory.
|
||||
// Overrides `-rule.updateEntriesLimit`.
|
||||
UpdateEntriesLimit *int `yaml:"update_entries_limit,omitempty"`
|
||||
|
||||
// Catches all undefined fields and must be empty after parsing.
|
||||
XXX map[string]interface{} `yaml:",inline"`
|
||||
|
|
|
@ -550,6 +550,20 @@ rules:
|
|||
- alert: foo
|
||||
expr: sum by(job) (up == 1)
|
||||
debug: true
|
||||
`)
|
||||
})
|
||||
t.Run("`update_entries_limit` change", func(t *testing.T) {
|
||||
f(t, `
|
||||
name: TestGroup
|
||||
rules:
|
||||
- alert: foo
|
||||
expr: sum by(job) (up == 1)
|
||||
`, `
|
||||
name: TestGroup
|
||||
rules:
|
||||
- alert: foo
|
||||
expr: sum by(job) (up == 1)
|
||||
update_entries_limit: 33
|
||||
`)
|
||||
})
|
||||
}
|
||||
|
|
|
@ -12,6 +12,7 @@ groups:
|
|||
expr: vm_tcplistener_conns > 0
|
||||
for: 3m
|
||||
debug: true
|
||||
update_entries_limit: 40
|
||||
annotations:
|
||||
labels: "Available labels: {{ $labels }}"
|
||||
summary: Too high connection number for {{ $labels.instance }}
|
||||
|
@ -20,6 +21,7 @@ groups:
|
|||
{{ end }}
|
||||
description: "It is {{ $value }} connections for {{$labels.instance}}"
|
||||
- alert: ExampleAlertAlwaysFiring
|
||||
update_entries_limit: -1
|
||||
expr: sum by(job)
|
||||
(up == 1)
|
||||
labels:
|
||||
|
|
|
@ -7,6 +7,7 @@ groups:
|
|||
- alert: Conns
|
||||
expr: filterSeries(sumSeries(host.receiver.interface.cons),'last','>', 500)
|
||||
for: 3m
|
||||
|
||||
annotations:
|
||||
summary: Too high connection number for {{$labels.instance}}
|
||||
description: "It is {{ $value }} connections for {{$labels.instance}}"
|
||||
|
|
|
@ -460,7 +460,7 @@ func TestFaultyRW(t *testing.T) {
|
|||
|
||||
r := &RecordingRule{
|
||||
Name: "test",
|
||||
state: newRuleState(),
|
||||
state: newRuleState(10),
|
||||
q: fq,
|
||||
}
|
||||
|
||||
|
|
|
@ -56,7 +56,9 @@ absolute path to all .tpl files in root.`)
|
|||
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
|
||||
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maximum duration for automatic alert expiration, "+
|
||||
"which is by default equal to 3 evaluation intervals of the parent group.")
|
||||
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier")
|
||||
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier")
|
||||
ruleUpdateEntriesLimit = flag.Int("rule.updateEntriesLimit", 20, "Defines the max number of rule's state updates stored in-memory. "+
|
||||
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overriden per rule via update_entries_limit param.")
|
||||
|
||||
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier")
|
||||
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager `+
|
||||
|
|
|
@ -58,7 +58,6 @@ func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
|
|||
Labels: cfg.Labels,
|
||||
GroupID: group.ID(),
|
||||
metrics: &recordingRuleMetrics{},
|
||||
state: newRuleState(),
|
||||
q: qb.BuildWithParams(datasource.QuerierParams{
|
||||
DataSourceType: group.Type.String(),
|
||||
EvaluationInterval: group.Interval,
|
||||
|
@ -67,6 +66,12 @@ func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
|
|||
}),
|
||||
}
|
||||
|
||||
if cfg.UpdateEntriesLimit != nil {
|
||||
rr.state = newRuleState(*cfg.UpdateEntriesLimit)
|
||||
} else {
|
||||
rr.state = newRuleState(*ruleUpdateEntriesLimit)
|
||||
}
|
||||
|
||||
labels := fmt.Sprintf(`recording=%q, group=%q, id="%d"`, rr.Name, group.Name, rr.ID())
|
||||
rr.metrics.errors = utils.GetOrCreateGauge(fmt.Sprintf(`vmalert_recording_rules_error{%s}`, labels),
|
||||
func() float64 {
|
||||
|
@ -212,6 +217,7 @@ func (rr *RecordingRule) ToAPI() APIRule {
|
|||
EvaluationTime: lastState.duration.Seconds(),
|
||||
Health: "ok",
|
||||
LastSamples: lastState.samples,
|
||||
MaxUpdates: rr.state.size(),
|
||||
Updates: rr.state.getAll(),
|
||||
|
||||
// encode as strings to avoid rounding
|
||||
|
|
|
@ -19,7 +19,7 @@ func TestRecordingRule_Exec(t *testing.T) {
|
|||
expTS []prompbmarshal.TimeSeries
|
||||
}{
|
||||
{
|
||||
&RecordingRule{Name: "foo", state: newRuleState()},
|
||||
&RecordingRule{Name: "foo"},
|
||||
[]datasource.Metric{metricWithValueAndLabels(t, 10,
|
||||
"__name__", "bar",
|
||||
)},
|
||||
|
@ -30,7 +30,7 @@ func TestRecordingRule_Exec(t *testing.T) {
|
|||
},
|
||||
},
|
||||
{
|
||||
&RecordingRule{Name: "foobarbaz", state: newRuleState()},
|
||||
&RecordingRule{Name: "foobarbaz"},
|
||||
[]datasource.Metric{
|
||||
metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "foo"),
|
||||
metricWithValueAndLabels(t, 2, "__name__", "bar", "job", "bar"),
|
||||
|
@ -53,8 +53,7 @@ func TestRecordingRule_Exec(t *testing.T) {
|
|||
},
|
||||
{
|
||||
&RecordingRule{
|
||||
Name: "job:foo",
|
||||
state: newRuleState(),
|
||||
Name: "job:foo",
|
||||
Labels: map[string]string{
|
||||
"source": "test",
|
||||
}},
|
||||
|
@ -80,6 +79,7 @@ func TestRecordingRule_Exec(t *testing.T) {
|
|||
fq := &fakeQuerier{}
|
||||
fq.add(tc.metrics...)
|
||||
tc.rule.q = fq
|
||||
tc.rule.state = newRuleState(10)
|
||||
tss, err := tc.rule.Exec(context.TODO(), time.Now(), 0)
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected Exec err: %s", err)
|
||||
|
@ -198,7 +198,7 @@ func TestRecordingRuleLimit(t *testing.T) {
|
|||
metricWithValuesAndLabels(t, []float64{2, 3}, "__name__", "bar", "job", "bar"),
|
||||
metricWithValuesAndLabels(t, []float64{4, 5, 6}, "__name__", "baz", "job", "baz"),
|
||||
}
|
||||
rule := &RecordingRule{Name: "job:foo", state: newRuleState(), Labels: map[string]string{
|
||||
rule := &RecordingRule{Name: "job:foo", state: newRuleState(10), Labels: map[string]string{
|
||||
"source": "test_limit",
|
||||
}}
|
||||
var err error
|
||||
|
@ -216,7 +216,7 @@ func TestRecordingRuleLimit(t *testing.T) {
|
|||
func TestRecordingRule_ExecNegative(t *testing.T) {
|
||||
rr := &RecordingRule{
|
||||
Name: "job:foo",
|
||||
state: newRuleState(),
|
||||
state: newRuleState(10),
|
||||
Labels: map[string]string{
|
||||
"job": "test",
|
||||
},
|
||||
|
|
|
@ -37,6 +37,8 @@ type ruleState struct {
|
|||
sync.RWMutex
|
||||
entries []ruleStateEntry
|
||||
cur int
|
||||
// disabled defines whether ruleState tracks ruleStateEntry
|
||||
disabled bool
|
||||
}
|
||||
|
||||
type ruleStateEntry struct {
|
||||
|
@ -57,21 +59,36 @@ type ruleStateEntry struct {
|
|||
curl string
|
||||
}
|
||||
|
||||
const defaultStateEntriesLimit = 20
|
||||
|
||||
func newRuleState() *ruleState {
|
||||
func newRuleState(size int) *ruleState {
|
||||
if size < 1 {
|
||||
return &ruleState{disabled: true}
|
||||
}
|
||||
return &ruleState{
|
||||
entries: make([]ruleStateEntry, defaultStateEntriesLimit),
|
||||
entries: make([]ruleStateEntry, size),
|
||||
}
|
||||
}
|
||||
|
||||
func (s *ruleState) getLast() ruleStateEntry {
|
||||
if s.disabled {
|
||||
return ruleStateEntry{}
|
||||
}
|
||||
|
||||
s.RLock()
|
||||
defer s.RUnlock()
|
||||
return s.entries[s.cur]
|
||||
}
|
||||
|
||||
func (s *ruleState) size() int {
|
||||
s.RLock()
|
||||
defer s.RUnlock()
|
||||
return len(s.entries)
|
||||
}
|
||||
|
||||
func (s *ruleState) getAll() []ruleStateEntry {
|
||||
if s.disabled {
|
||||
return nil
|
||||
}
|
||||
|
||||
entries := make([]ruleStateEntry, 0)
|
||||
|
||||
s.RLock()
|
||||
|
@ -94,6 +111,10 @@ func (s *ruleState) getAll() []ruleStateEntry {
|
|||
}
|
||||
|
||||
func (s *ruleState) add(e ruleStateEntry) {
|
||||
if s.disabled {
|
||||
return
|
||||
}
|
||||
|
||||
s.Lock()
|
||||
defer s.Unlock()
|
||||
|
||||
|
|
|
@ -6,8 +6,27 @@ import (
|
|||
"time"
|
||||
)
|
||||
|
||||
func TestRule_stateDisabled(t *testing.T) {
|
||||
state := newRuleState(-1)
|
||||
e := state.getLast()
|
||||
if !e.at.IsZero() {
|
||||
t.Fatalf("expected entry to be zero")
|
||||
}
|
||||
|
||||
state.add(ruleStateEntry{at: time.Now()})
|
||||
if !e.at.IsZero() {
|
||||
t.Fatalf("expected entry to be zero")
|
||||
}
|
||||
|
||||
if len(state.getAll()) != 0 {
|
||||
t.Fatalf("expected for state to have %d entries; got %d",
|
||||
0, len(state.getAll()),
|
||||
)
|
||||
}
|
||||
}
|
||||
func TestRule_state(t *testing.T) {
|
||||
state := newRuleState()
|
||||
stateEntriesN := 20
|
||||
state := newRuleState(stateEntriesN)
|
||||
e := state.getLast()
|
||||
if !e.at.IsZero() {
|
||||
t.Fatalf("expected entry to be zero")
|
||||
|
@ -39,7 +58,7 @@ func TestRule_state(t *testing.T) {
|
|||
}
|
||||
|
||||
var last time.Time
|
||||
for i := 0; i < defaultStateEntriesLimit*2; i++ {
|
||||
for i := 0; i < stateEntriesN*2; i++ {
|
||||
last = time.Now()
|
||||
state.add(ruleStateEntry{at: last})
|
||||
}
|
||||
|
@ -50,9 +69,9 @@ func TestRule_state(t *testing.T) {
|
|||
e.at, last)
|
||||
}
|
||||
|
||||
if len(state.getAll()) != defaultStateEntriesLimit {
|
||||
if len(state.getAll()) != stateEntriesN {
|
||||
t.Fatalf("expected for state to have %d entries only; got %d",
|
||||
defaultStateEntriesLimit, len(state.getAll()),
|
||||
stateEntriesN, len(state.getAll()),
|
||||
)
|
||||
}
|
||||
}
|
||||
|
@ -61,7 +80,7 @@ func TestRule_state(t *testing.T) {
|
|||
// execution of state updates.
|
||||
// Should be executed with -race flag
|
||||
func TestRule_stateConcurrent(t *testing.T) {
|
||||
state := newRuleState()
|
||||
state := newRuleState(20)
|
||||
|
||||
const workers = 50
|
||||
const iterations = 100
|
||||
|
|
|
@ -440,7 +440,7 @@
|
|||
</div>
|
||||
|
||||
<br>
|
||||
<div class="display-6 pb-3">Last {%d len(rule.Updates) %} updates</span>:</div>
|
||||
<div class="display-6 pb-3">Last {%d len(rule.Updates) %}/{%d rule.MaxUpdates %} updates</span>:</div>
|
||||
<table class="table table-striped table-hover table-sm">
|
||||
<thead>
|
||||
<tr>
|
||||
|
|
|
@ -1345,6 +1345,10 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
|
|||
<div class="display-6 pb-3">Last `)
|
||||
//line app/vmalert/web.qtpl:443
|
||||
qw422016.N().D(len(rule.Updates))
|
||||
//line app/vmalert/web.qtpl:443
|
||||
qw422016.N().S(`/`)
|
||||
//line app/vmalert/web.qtpl:443
|
||||
qw422016.N().D(rule.MaxUpdates)
|
||||
//line app/vmalert/web.qtpl:443
|
||||
qw422016.N().S(` updates</span>:</div>
|
||||
<table class="table table-striped table-hover table-sm">
|
||||
|
|
|
@ -17,7 +17,7 @@ func TestHandler(t *testing.T) {
|
|||
alerts: map[uint64]*notifier.Alert{
|
||||
0: {State: notifier.StateFiring},
|
||||
},
|
||||
state: newRuleState(),
|
||||
state: newRuleState(10),
|
||||
}
|
||||
g := &Group{
|
||||
Name: "group",
|
||||
|
|
|
@ -121,6 +121,8 @@ type APIRule struct {
|
|||
// GroupID is an unique Group's ID
|
||||
GroupID string `json:"group_id"`
|
||||
|
||||
// MaxUpdates is the max number of recorded ruleStateEntry objects
|
||||
MaxUpdates int `json:"max_updates_entries"`
|
||||
// Updates contains the ordered list of recorded ruleStateEntry objects
|
||||
Updates []ruleStateEntry `json:"updates"`
|
||||
}
|
||||
|
|
|
@ -18,6 +18,7 @@ The following tip changes can be tested by building VictoriaMetrics components f
|
|||
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): add ability to explore metrics exported by a particular `job` / `instance`. See [these docs](https://docs.victoriametrics.com/#metrics-explorer) and [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3386).
|
||||
* FEATURE: allow passing partial `RFC3339` date/time to `time`, `start` and `end` query args at [querying APIs](https://docs.victoriametrics.com/#prometheus-querying-api-usage) and [export APIs](https://docs.victoriametrics.com/#how-to-export-time-series). For example, `2022` is equivalent to `2022-01-01T00:00:00Z`, while `2022-01-30T14` is equivalent to `2022-01-30T14:00:00Z`. See [these docs](https://docs.victoriametrics.com/#timestamp-formats).
|
||||
* FEATURE: [relabeling](https://docs.victoriametrics.com/vmagent.html#relabeling): add support for `keepequal` and `dropequal` relabeling actions, which are supported by Prometheus starting from [v2.41.0](https://github.com/prometheus/prometheus/releases/tag/v2.41.0). These relabeling actions are almost identical to `keep_if_equal` and `drop_if_equal` relabeling actions supported by VictoriaMetrics since `v1.38.0` - see [these docs](https://docs.victoriametrics.com/vmagent.html#relabeling-enhancements) - so it is recommended sticking to `keep_if_equal` and `drop_if_equal` actions instead of switching to `keepequal` and `dropequal`.
|
||||
* FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): allow configuring the default number of stored rule's update states in memory via global `-rule.updateEntriesLimit` command-line flag or per-rule via rule's `update_entries_limit` configuration param.
|
||||
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): properly update the `step` value in url after the `step` input field has been manually changed. This allows preserving the proper `step` when copy-n-pasting the url to another instance of web browser. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3513).
|
||||
|
||||
|
|
|
@ -195,6 +195,11 @@ expr: <string>
|
|||
# Is applicable to alerting rules only.
|
||||
[ debug: <bool> | default = false ]
|
||||
|
||||
# Defines the number of rule's updates entries stored in memory
|
||||
# and available for view on rule's Details page.
|
||||
# Overrides `rule.updateEntriesLimit` value for this specific rule.
|
||||
[ update_entries_limit: <integer> | default 0 ]
|
||||
|
||||
# Labels to add or overwrite for each alert.
|
||||
labels:
|
||||
[ <labelname>: <tmpl_string> ]
|
||||
|
@ -323,6 +328,12 @@ expr: <string>
|
|||
# Labels to add or overwrite before storing the result.
|
||||
labels:
|
||||
[ <labelname>: <labelvalue> ]
|
||||
|
||||
|
||||
# Defines the number of rule's updates entries stored in memory
|
||||
# and available for view on rule's Details page.
|
||||
# Overrides `rule.updateEntriesLimit` value for this specific rule.
|
||||
[ update_entries_limit: <integer> | default 0 ]
|
||||
```
|
||||
|
||||
For recording rules to work `-remoteWrite.url` must be specified.
|
||||
|
@ -699,7 +710,7 @@ may get empty response from datasource and produce empty recording rules or rese
|
|||
|
||||
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
|
||||
|
||||
By default recently written samples to VictoriaMetrics aren't visible for queries for up to 30s.
|
||||
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s.
|
||||
This behavior is controlled by `-search.latencyOffset` command-line flag and the `latency_offset` query ag at `vmselect`.
|
||||
Usually, this results into a 30s shift for recording rules results.
|
||||
Note that too small value passed to `-search.latencyOffset` or to `latency_offest` query arg may lead to incomplete query results.
|
||||
|
@ -725,8 +736,9 @@ If `-remoteWrite.url` command-line flag is configured, vmalert will persist aler
|
|||
[vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui) or Grafana to track how alerts state
|
||||
changed in time.
|
||||
|
||||
vmalert also stores last N state updates for each rule. To check updates, click on `Details` link next to rule's name
|
||||
on `/vmalert/groups` page and check the `Last updates` section:
|
||||
vmalert stores last `-rule.maxUpdateEntries` (or `update_entries_limit` [per-rule config](https://docs.victoriametrics.com/vmalert.html#alerting-rules))
|
||||
state updates for each rule. To check updates, click on `Details` link next to rule's name on `/vmalert/groups` page
|
||||
and check the `Last updates` section:
|
||||
|
||||
<img alt="vmalert state" src="vmalert_state.png">
|
||||
|
||||
|
@ -735,7 +747,7 @@ HTTP request sent by vmalert to the `-datasource.url` during evaluation. If spec
|
|||
no samples returned and curl command returns data - then it is very likely there was no data in datasource on the
|
||||
moment when rule was evaluated.
|
||||
|
||||
vmalert also alows configuring more detailed logging for specific rule. Just set `debug: true` in rule's configuration
|
||||
vmalert allows configuring more detailed logging for specific alerting rule. Just set `debug: true` in rule's configuration
|
||||
and vmalert will start printing additional log messages:
|
||||
```terminal
|
||||
2022-09-15T13:35:41.155Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:41+02:00: query returned 0 samples (elapsed: 5.896041ms)
|
||||
|
@ -894,6 +906,8 @@ The shortlist of configuration flags is the following:
|
|||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerJSONFields string
|
||||
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
|
@ -1096,6 +1110,8 @@ The shortlist of configuration flags is the following:
|
|||
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes. DEPRECATED - see '-configCheckInterval' instead
|
||||
-rule.maxResolveDuration duration
|
||||
Limits the maximum duration for automatic alert expiration, which is by default equal to 3 evaluation intervals of the parent group.
|
||||
-rule.maxUpdateEntries int
|
||||
Defines the max number of rule's state updates. (default 20)
|
||||
-rule.resendDelay duration
|
||||
Minimum amount of time to wait before resending an alert to notifier
|
||||
-rule.templates array
|
||||
|
|
Loading…
Reference in a new issue