Mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git (synced 2025-03-11 15:34:56 +00:00)
vmalert: add -rule.evalDelay flag and eval_delay as group attribute (#5185)
Also mark `-datasource.lookback` as soon to be deprecated, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.
parent 4e0a779efe
commit c9375cac5e
7 changed files with 93 additions and 31 deletions
|
@ -119,6 +119,12 @@ name: <string>
|
|||
# `eval_offset` can't be bigger than `interval`.
|
||||
[ eval_offset: <duration> ]
|
||||
|
||||
# Optional
|
||||
# Adjust the `time` parameter of group evaluation requests to compensate intentional query delay from datasource.
|
||||
# By default, the value of the `-rule.evalDelay` flag is used, which should be equal to `-search.latencyOffset` (a cmd-line flag configured for VictoriaMetrics single-node or vmselect). If the group has a `latency_offset` param whose value differs from `-search.latencyOffset`, set `eval_delay` equal to `latency_offset`.
|
||||
# See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.
|
||||
[ eval_delay: <duration> ]
|
||||
|
||||
# Limit the number of alerts an alerting rule and series a recording
|
||||
# rule can produce. 0 is no limit.
|
||||
[ limit: <int> | default = 0 ]
|
||||
|
@ -794,9 +800,7 @@ Try the following recommendations to reduce the chance of hitting the data delay
|
|||
[time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
|
||||
if expression is `rate(my_metric[2m]) > 0` then ensure that `my_metric` resolution is at least `1m` or better `30s`.
|
||||
If you use VictoriaMetrics as datasource, `[duration]` can be omitted and VictoriaMetrics will adjust it automatically.
|
||||
* If you know in advance, that data in datasource is delayed - try changing vmalerts `-datasource.lookback`
|
||||
command-line flag to add a time shift for evaluations. Or extend `[duration]` to tolerate the delay.
|
||||
For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
|
||||
* Extend `[duration]` in expr to help tolerate the delay. For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
|
||||
* If [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution)
|
||||
in datasource is inconsistent or `>=5min` - try changing vmalerts `-datasource.queryStep` command-line flag to specify
|
||||
how far search query can lookback for the recent datapoint. The recommendation is to have the step
|
||||
|
@ -805,8 +809,10 @@ at least two times bigger than the resolution.
|
|||
> Please note, data delay is inevitable in distributed systems. And it is better to account for it instead of ignoring.
|
||||
|
||||
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s
|
||||
(see `-search.latencyOffset` command-line flag at vmselect). Such delay is needed to eliminate risk of incomplete
|
||||
(see `-search.latencyOffset` command-line flag at vmselect, and it can be overridden by adding `latency_offset` to group's params). Such delay is needed to eliminate risk of incomplete
|
||||
data on the moment of querying, since metrics collectors won't be able to deliver the data in time.
|
||||
To compensate the latency in timestamps for produced evaluation results, `-rule.evalDelay` is also set to 30s by default.
|
||||
If you changed the `-search.latencyOffset` value (a cmd-line flag configured for VictoriaMetrics single-node or vmselect) and observe a delay in timestamps of produced evaluation results, try setting `-rule.evalDelay` to the same value as `-search.latencyOffset`.
|
||||
|
||||
### Alerts state
|
||||
|
||||
|
@ -1291,6 +1297,9 @@ The shortlist of configuration flags is the following:
|
|||
Limits the maximum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group.
|
||||
-rule.resendDelay duration
|
||||
Minimum amount of time to wait before resending an alert to notifier
|
||||
-rule.evalDelay duration
|
||||
Adjust the `time` parameter of rule evaluation requests to compensate intentional query delay from datasource. Normally should be equal to `-search.latencyOffset` (a cmd-line flag configured for VictoriaMetrics single-node or vmselect). (default 30s)
|
||||
See more details [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155).
|
||||
-rule.templates array
|
||||
Path or glob pattern to location with go template definitions
|
||||
for rules annotations templating. Flag can be specified multiple times.
|
||||
|
|
|
@ -19,11 +19,14 @@ import (
|
|||
// Group contains list of Rules grouped into
|
||||
// entity with one name and evaluation interval
|
||||
type Group struct {
|
||||
Type Type `yaml:"type,omitempty"`
|
||||
File string
|
||||
Name string `yaml:"name"`
|
||||
Interval *promutils.Duration `yaml:"interval,omitempty"`
|
||||
EvalOffset *promutils.Duration `yaml:"eval_offset,omitempty"`
|
||||
Type Type `yaml:"type,omitempty"`
|
||||
File string
|
||||
Name string `yaml:"name"`
|
||||
Interval *promutils.Duration `yaml:"interval,omitempty"`
|
||||
EvalOffset *promutils.Duration `yaml:"eval_offset,omitempty"`
|
||||
// EvalDelay will adjust the `time` parameter of rule evaluation requests to compensate intentional query delay from datasource.
|
||||
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155
|
||||
EvalDelay *promutils.Duration `yaml:"eval_delay,omitempty"`
|
||||
Limit int `yaml:"limit,omitempty"`
|
||||
Rules []Rule `yaml:"rules"`
|
||||
Concurrency int `yaml:"concurrency"`
|
||||
|
|
|
@ -43,7 +43,8 @@ var (
|
|||
oauth2TokenURL = flag.String("datasource.oauth2.tokenUrl", "", "Optional OAuth2 tokenURL to use for -datasource.url.")
|
||||
oauth2Scopes = flag.String("datasource.oauth2.scopes", "", "Optional OAuth2 scopes to use for -datasource.url. Scopes must be delimited by ';'")
|
||||
|
||||
lookBack = flag.Duration("datasource.lookback", 0, `Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
|
||||
lookBack = flag.Duration("datasource.lookback", 0, `Will be deprecated soon, please adjust "-search.latencyOffset" at datasource side or specify "latency_offset" in rule group's params. `+
|
||||
`Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
|
||||
queryStep = flag.Duration("datasource.queryStep", 5*time.Minute, "How far a value can fallback to when evaluating queries. "+
|
||||
"For example, if -datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query. "+
|
||||
"If set to 0, rule's evaluation interval will be used instead.")
|
||||
|
@ -85,6 +86,9 @@ func Init(extraParams url.Values) (QuerierBuilder, error) {
|
|||
if !*queryTimeAlignment {
|
||||
logger.Warnf("flag `datasource.queryTimeAlignment` is deprecated and will be removed in next releases, please use `eval_alignment` in rule group instead")
|
||||
}
|
||||
if *lookBack != 0 {
|
||||
logger.Warnf("flag `datasource.lookback` will be deprecated soon, please adjust `-search.latencyOffset`(a cmd-line flag configured for VictoriaMetrics single-node or vmselect) or specify `latency_offset` in rule group's params. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.")
|
||||
}
|
||||
|
||||
tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
|
||||
if err != nil {
|
||||
|
|
|
@ -32,6 +32,8 @@ var (
|
|||
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier")
|
||||
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maximum duration for automatic alert expiration, "+
|
||||
"which by default is 4 times evaluationInterval of the parent group.")
|
||||
evalDelay = flag.Duration("rule.evalDelay", 30*time.Second, "Adjust the `time` parameter of rule evaluation requests to compensate intentional data query delay from datasource. "+
|
||||
"Normally should be equal to `-search.latencyOffset` (a cmd-line flag configured for VictoriaMetrics single-node or vmselect)")
|
||||
disableAlertGroupLabel = flag.Bool("disableAlertgroupLabel", false, "Whether to disable adding group's Name as label to generated alerts and time series.")
|
||||
remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
|
||||
" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
|
||||
|
@ -39,13 +41,16 @@ var (
|
|||
|
||||
// Group is an entity for grouping rules
|
||||
type Group struct {
|
||||
mu sync.RWMutex
|
||||
Name string
|
||||
File string
|
||||
Rules []Rule
|
||||
Type config.Type
|
||||
Interval time.Duration
|
||||
EvalOffset *time.Duration
|
||||
mu sync.RWMutex
|
||||
Name string
|
||||
File string
|
||||
Rules []Rule
|
||||
Type config.Type
|
||||
Interval time.Duration
|
||||
EvalOffset *time.Duration
|
||||
// EvalDelay will adjust the `time` parameter of rule evaluation requests to compensate intentional query delay from datasource.
|
||||
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155
|
||||
EvalDelay *time.Duration
|
||||
Limit int
|
||||
Concurrency int
|
||||
Checksum string
|
||||
|
@ -139,6 +144,9 @@ func NewGroup(cfg config.Group, qb datasource.QuerierBuilder, defaultInterval ti
|
|||
if cfg.EvalOffset != nil {
|
||||
g.EvalOffset = &cfg.EvalOffset.D
|
||||
}
|
||||
if cfg.EvalDelay != nil {
|
||||
g.EvalDelay = &cfg.EvalDelay.D
|
||||
}
|
||||
for _, h := range cfg.Headers {
|
||||
g.Headers[h.Key] = h.Value
|
||||
}
|
||||
|
@ -581,17 +589,24 @@ func (g *Group) adjustReqTimestamp(timestamp time.Time) time.Time {
|
|||
// to 10:30, to the previous evaluationInterval.
|
||||
return ts.Add(-g.Interval)
|
||||
}
|
||||
// EvalOffset shouldn't interfere with evalAlignment,
|
||||
// so we return it immediately
|
||||
// when `eval_offset` is used, ts shouldn't be affected by `eval_alignment` and `eval_delay`
|
||||
// since it should be always aligned.
|
||||
return ts
|
||||
}
|
||||
if g.evalAlignment == nil || *g.evalAlignment {
|
||||
// align query time with interval to get similar result with grafana when plotting time series.
|
||||
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049
|
||||
// and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
|
||||
return timestamp.Truncate(g.Interval)
|
||||
return timestamp.Truncate(g.Interval).Add(-g.getEvalDelay())
|
||||
}
|
||||
return timestamp
|
||||
return timestamp.Add(-g.getEvalDelay())
|
||||
}
|
||||
|
||||
func (g *Group) getEvalDelay() time.Duration {
|
||||
if g.EvalDelay != nil {
|
||||
return *g.EvalDelay
|
||||
}
|
||||
return *evalDelay
|
||||
}
|
||||
|
||||
// executor contains group's notify and rw configs
|
||||
|
|
|
@ -628,6 +628,7 @@ func TestGroupStartDelay(t *testing.T) {
|
|||
|
||||
func TestGetPrometheusReqTimestamp(t *testing.T) {
|
||||
offset := 30 * time.Minute
|
||||
evalDelay := 1 * time.Minute
|
||||
disableAlign := false
|
||||
testCases := []struct {
|
||||
name string
|
||||
|
@ -635,24 +636,24 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
|
|||
originTS, expTS string
|
||||
}{
|
||||
{
|
||||
"with query align",
|
||||
"with query align + default evalDelay",
|
||||
&Group{
|
||||
Interval: time.Hour,
|
||||
},
|
||||
"2023-08-28T11:11:00+00:00",
|
||||
"2023-08-28T11:00:00+00:00",
|
||||
"2023-08-28T10:59:30+00:00",
|
||||
},
|
||||
{
|
||||
"without query align",
|
||||
"without query align + default evalDelay",
|
||||
&Group{
|
||||
Interval: time.Hour,
|
||||
evalAlignment: &disableAlign,
|
||||
},
|
||||
"2023-08-28T11:11:00+00:00",
|
||||
"2023-08-28T11:11:00+00:00",
|
||||
"2023-08-28T11:10:30+00:00",
|
||||
},
|
||||
{
|
||||
"with eval_offset, find previous offset point",
|
||||
"with eval_offset, find previous offset point + default evalDelay",
|
||||
&Group{
|
||||
EvalOffset: &offset,
|
||||
Interval: time.Hour,
|
||||
|
@ -661,7 +662,7 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
|
|||
"2023-08-28T10:30:00+00:00",
|
||||
},
|
||||
{
|
||||
"with eval_offset",
|
||||
"with eval_offset + default evalDelay",
|
||||
&Group{
|
||||
EvalOffset: &offset,
|
||||
Interval: time.Hour,
|
||||
|
@ -669,6 +670,25 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
|
|||
"2023-08-28T11:41:00+00:00",
|
||||
"2023-08-28T11:30:00+00:00",
|
||||
},
|
||||
{
|
||||
"with eval_delay",
|
||||
&Group{
|
||||
EvalDelay: &evalDelay,
|
||||
Interval: time.Hour,
|
||||
},
|
||||
"2023-08-28T11:41:00+00:00",
|
||||
"2023-08-28T10:59:00+00:00",
|
||||
},
|
||||
{
|
||||
"disable alignment with eval_delay",
|
||||
&Group{
|
||||
EvalDelay: &evalDelay,
|
||||
Interval: time.Hour,
|
||||
evalAlignment: &disableAlign,
|
||||
},
|
||||
"2023-08-28T11:41:00+00:00",
|
||||
"2023-08-28T11:40:00+00:00",
|
||||
},
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
originT, _ := time.Parse(time.RFC3339, tc.originTS)
|
||||
|
|
|
@ -29,6 +29,7 @@ The sandbox cluster installation is running under the constant load generated by
|
|||
## tip
|
||||
|
||||
**vmalert's cmd-line flag `datasource.queryTimeAlignment` was deprecated and will have no effect anymore. It will be completely removed in next releases. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049) and more detailed changes below.**
|
||||
**vmalert's cmd-line flag `datasource.lookback` will be deprecated soon, please use `-search.latencyOffset` in datasource or override it by adding `latency_offset` in rule group's params. It will have no effect in next release and be removed in future releases. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155).**
|
||||
|
||||
* SECURITY: upgrade Go builder from Go1.21.1 to Go1.21.3. See [the list of issues addressed in Go1.21.2](https://github.com/golang/go/issues?q=milestone%3AGo1.21.2+label%3ACherryPickApproved) and [the list of issues addressed in Go1.21.3](https://github.com/golang/go/issues?q=milestone%3AGo1.21.3+label%3ACherryPickApproved).
|
||||
|
||||
|
@ -38,6 +39,7 @@ The sandbox cluster installation is running under the constant load generated by
|
|||
* FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add `eval_alignment` attribute for [Groups](https://docs.victoriametrics.com/vmalert.html#groups), it will align group query requests timestamp with interval like `datasource.queryTimeAlignment` did.
|
||||
This also means that `datasource.queryTimeAlignment` command-line flag becomes deprecated now and will have no effect if configured. If `datasource.queryTimeAlignment` was set to `false` before, then `eval_alignment` has to be set to `false` explicitly under group.
|
||||
See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049).
|
||||
* FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add `-rule.evalDelay` flag and `eval_delay` attribute for [Groups](https://docs.victoriametrics.com/vmalert.html#groups). The new flag and param can be used to adjust the `time` parameter for rule evaluation requests to match intentional query delay from the datasource. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): support data ingestion from [NewRelic infrastructure agent](https://docs.newrelic.com/docs/infrastructure/install-infrastructure-agent). See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-newrelic-agent), [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3520) and [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4712).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): skip job with error logs if there is incorrect syntax under `scrape_configs`, previously will exit. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959) and [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5153).
|
||||
* FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup.html): add `-filestream.disableFadvise` command-line flag, which can be used for disabling `fadvise` syscall during backup upload to the remote storage. By default `vmbackup` uses `fadvise` syscall in order to prevent from eviction of recently accessed data from the [OS page cache](https://en.wikipedia.org/wiki/Page_cache) when backing up large files. Sometimes the `fadvise` syscall may take significant amounts of CPU when the backup is performed with large value of `-concurrency` command-line flag on systems with big number of CPU cores. In this case it is better to manually disable `fadvise` syscall by passing `-filestream.disableFadvise` command-line flag to `vmbackup`. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5120) for details.
|
||||
|
|
|
@ -130,6 +130,12 @@ name: <string>
|
|||
# `eval_offset` can't be bigger than `interval`.
|
||||
[ eval_offset: <duration> ]
|
||||
|
||||
# Optional
|
||||
# Adjust the `time` parameter of group evaluation requests to compensate intentional query delay from datasource.
|
||||
# By default, the value of the `-rule.evalDelay` flag is used, which should be equal to `-search.latencyOffset` (a cmd-line flag configured for VictoriaMetrics single-node or vmselect). If the group has a `latency_offset` param whose value differs from `-search.latencyOffset`, set `eval_delay` equal to `latency_offset`.
|
||||
# See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.
|
||||
[ eval_delay: <duration> ]
|
||||
|
||||
# Limit the number of alerts an alerting rule and series a recording
|
||||
# rule can produce. 0 is no limit.
|
||||
[ limit: <int> | default = 0 ]
|
||||
|
@ -805,9 +811,7 @@ Try the following recommendations to reduce the chance of hitting the data delay
|
|||
[time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
|
||||
if expression is `rate(my_metric[2m]) > 0` then ensure that `my_metric` resolution is at least `1m` or better `30s`.
|
||||
If you use VictoriaMetrics as datasource, `[duration]` can be omitted and VictoriaMetrics will adjust it automatically.
|
||||
* If you know in advance, that data in datasource is delayed - try changing vmalerts `-datasource.lookback`
|
||||
command-line flag to add a time shift for evaluations. Or extend `[duration]` to tolerate the delay.
|
||||
For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
|
||||
* Extend `[duration]` in expr to help tolerate the delay. For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
|
||||
* If [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution)
|
||||
in datasource is inconsistent or `>=5min` - try changing vmalerts `-datasource.queryStep` command-line flag to specify
|
||||
how far search query can lookback for the recent datapoint. The recommendation is to have the step
|
||||
|
@ -816,8 +820,10 @@ at least two times bigger than the resolution.
|
|||
> Please note, data delay is inevitable in distributed systems. And it is better to account for it instead of ignoring.
|
||||
|
||||
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s
|
||||
(see `-search.latencyOffset` command-line flag at vmselect). Such delay is needed to eliminate risk of incomplete
|
||||
(see `-search.latencyOffset` command-line flag at vmselect, and it can be overridden by adding `latency_offset` to group's params). Such delay is needed to eliminate risk of incomplete
|
||||
data on the moment of querying, since metrics collectors won't be able to deliver the data in time.
|
||||
To compensate the latency in timestamps for produced evaluation results, `-rule.evalDelay` is also set to 30s by default.
|
||||
If you changed the `-search.latencyOffset` value (a cmd-line flag configured for VictoriaMetrics single-node or vmselect) and observe a delay in timestamps of produced evaluation results, try setting `-rule.evalDelay` to the same value as `-search.latencyOffset`.
|
||||
|
||||
### Alerts state
|
||||
|
||||
|
@ -1302,6 +1308,9 @@ The shortlist of configuration flags is the following:
|
|||
Limits the maximum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group.
|
||||
-rule.resendDelay duration
|
||||
Minimum amount of time to wait before resending an alert to notifier
|
||||
-rule.evalDelay duration
|
||||
Adjust the `time` parameter of rule evaluation requests to compensate intentional query delay from datasource. Normally should be equal to `-search.latencyOffset` (a cmd-line flag configured for VictoriaMetrics single-node or vmselect). (default 30s)
|
||||
See more details [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155).
|
||||
-rule.templates array
|
||||
Path or glob pattern to location with go template definitions
|
||||
for rules annotations templating. Flag can be specified multiple times.
|
||||
|
|