vmalert: add -rule.evalDelay flag and eval_delay as group attribute (#5185)

Also mark `-datasource.lookback` as soon-to-be-deprecated. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.
2025-03-11 15:34:56 +00:00 · 2023-10-25 17:54:18 +08:00 · 2023-10-25 17:54:18 +08:00 · c9375cac5e
commit c9375cac5e
parent 4e0a779efe
7 changed files with 93 additions and 31 deletions
--- a/app/vmalert/README.md
+++ b/app/vmalert/README.md
@@ -119,6 +119,12 @@ name: <string>
 # `eval_offset` can't be bigger than `interval`.
 [ eval_offset: <duration> ]

+# Optional
+# Adjust the `time` parameter of group evaluation requests to compensate for the intentional query delay of the datasource.
+# By default, the `-rule.evalDelay` flag value is used; it should be equal to `-search.latencyOffset` (a command-line flag of VictoriaMetrics single-node or vmselect). If the group has a `latency_offset` param whose value differs from `-search.latencyOffset`, set `eval_delay` equal to `latency_offset`.
+# See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.
+[ eval_delay: <duration> ]
+
 # Limit the number of alerts an alerting rule and series a recording
 # rule can produce. 0 is no limit.
 [ limit: <int> | default = 0 ]
@@ -794,9 +800,7 @@ Try the following recommendations to reduce the chance of hitting the data delay
 [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
 if expression is `rate(my_metric[2m]) > 0` then ensure that `my_metric` resolution is at least `1m` or better `30s`. 
 If you use VictoriaMetrics as datasource, `[duration]` can be omitted and VictoriaMetrics will adjust it automatically.
-* If you know in advance, that data in datasource is delayed - try changing vmalerts `-datasource.lookback`
-command-line flag to add a time shift for evaluations. Or extend `[duration]` to tolerate the delay.
-For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
+* Extend `[duration]` in the expression to tolerate the delay. For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in the datasource for the last `9m`.
 * If [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution)
 in datasource is inconsistent or `>=5min` - try changing vmalerts `-datasource.queryStep` command-line flag to specify
 how far search query can lookback for the recent datapoint. The recommendation is to have the step 
@@ -805,8 +809,10 @@ at least two times bigger than the resolution.
 > Please note, data delay is inevitable in distributed systems. And it is better to account for it instead of ignoring.

 By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s
-(see `-search.latencyOffset` command-line flag at vmselect). Such delay is needed to eliminate risk of incomplete
+(see the `-search.latencyOffset` command-line flag at vmselect; it can be overridden per group by adding `latency_offset` to the group's params). Such delay is needed to eliminate risk of incomplete
 data on the moment of querying, since metrics collectors won't be able to deliver the data in time.
+To compensate for this latency in the timestamps of produced evaluation results, `-rule.evalDelay` also defaults to 30s.
+If you changed the value of `-search.latencyOffset` (a command-line flag of VictoriaMetrics single-node or vmselect) and observe a delay in the timestamps of produced evaluation results, set `-rule.evalDelay` to the same value as `-search.latencyOffset`.

 ### Alerts state

@@ -1291,6 +1297,9 @@ The shortlist of configuration flags is the following:
     Limits the maximum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group.
  -rule.resendDelay duration
     Minimum amount of time to wait before resending an alert to notifier
+  -rule.evalDelay duration
+     Adjusts the `time` parameter of rule evaluation requests to compensate for the intentional query delay of the datasource. Should normally be equal to `-search.latencyOffset` (a command-line flag of VictoriaMetrics single-node or vmselect). (default 30s)
+     See more details [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155).
  -rule.templates array
     Path or glob pattern to location with go template definitions
     	for rules annotations templating. Flag can be specified multiple times.
--- a/app/vmalert/config/config.go
+++ b/app/vmalert/config/config.go
@@ -19,11 +19,14 @@ import (
 // Group contains list of Rules grouped into
 // entity with one name and evaluation interval
 type Group struct {
-	Type        Type `yaml:"type,omitempty"`
-	File        string
-	Name        string              `yaml:"name"`
-	Interval    *promutils.Duration `yaml:"interval,omitempty"`
-	EvalOffset  *promutils.Duration `yaml:"eval_offset,omitempty"`
+	Type       Type `yaml:"type,omitempty"`
+	File       string
+	Name       string              `yaml:"name"`
+	Interval   *promutils.Duration `yaml:"interval,omitempty"`
+	EvalOffset *promutils.Duration `yaml:"eval_offset,omitempty"`
+	// EvalDelay adjusts the `time` parameter of rule evaluation requests to compensate for the intentional query delay of the datasource.
+	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155
+	EvalDelay   *promutils.Duration `yaml:"eval_delay,omitempty"`
 	Limit       int                 `yaml:"limit,omitempty"`
 	Rules       []Rule              `yaml:"rules"`
 	Concurrency int                 `yaml:"concurrency"`
--- a/app/vmalert/datasource/init.go
+++ b/app/vmalert/datasource/init.go
@@ -43,7 +43,8 @@ var (
 	oauth2TokenURL         = flag.String("datasource.oauth2.tokenUrl", "", "Optional OAuth2 tokenURL to use for -datasource.url.")
 	oauth2Scopes           = flag.String("datasource.oauth2.scopes", "", "Optional OAuth2 scopes to use for -datasource.url. Scopes must be delimited by ';'")

-	lookBack  = flag.Duration("datasource.lookback", 0, `Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
+	lookBack = flag.Duration("datasource.lookback", 0, `Will be deprecated soon; please adjust "-search.latencyOffset" at the datasource side or specify "latency_offset" in the rule group's params. `+
+		`Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
 	queryStep = flag.Duration("datasource.queryStep", 5*time.Minute, "How far a value can fallback to when evaluating queries. "+
 		"For example, if -datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query. "+
 		"If set to 0, rule's evaluation interval will be used instead.")
@@ -85,6 +86,9 @@ func Init(extraParams url.Values) (QuerierBuilder, error) {
 	if !*queryTimeAlignment {
 		logger.Warnf("flag `datasource.queryTimeAlignment` is deprecated and will be removed in next releases, please use `eval_alignment` in rule group instead")
 	}
+	if *lookBack != 0 {
+		logger.Warnf("flag `datasource.lookback` will be deprecated soon; please adjust `-search.latencyOffset` (a command-line flag of VictoriaMetrics single-node or vmselect) or specify `latency_offset` in the rule group's params. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.")
+	}

 	tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
 	if err != nil {
--- a/app/vmalert/rule/group.go
+++ b/app/vmalert/rule/group.go
@@ -32,6 +32,8 @@ var (
 	resendDelay        = flag.Duration("rule.resendDelay", 0, "MiniMum amount of time to wait before resending an alert to notifier")
 	maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maxiMum duration for automatic alert expiration, "+
 		"which by default is 4 times evaluationInterval of the parent ")
+	evalDelay = flag.Duration("rule.evalDelay", 30*time.Second, "Adjusts the 'time' param of rule evaluation requests to compensate for the intentional query delay of the datasource. "+
+		"Should normally be equal to -search.latencyOffset (a command-line flag of VictoriaMetrics single-node or vmselect)")
 	disableAlertGroupLabel = flag.Bool("disableAlertgroupLabel", false, "Whether to disable adding group's Name as label to generated alerts and time series.")
 	remoteReadLookBack     = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
 		" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
@@ -39,13 +41,16 @@ var (

 // Group is an entity for grouping rules
 type Group struct {
-	mu             sync.RWMutex
-	Name           string
-	File           string
-	Rules          []Rule
-	Type           config.Type
-	Interval       time.Duration
-	EvalOffset     *time.Duration
+	mu         sync.RWMutex
+	Name       string
+	File       string
+	Rules      []Rule
+	Type       config.Type
+	Interval   time.Duration
+	EvalOffset *time.Duration
+	// EvalDelay adjusts the `time` parameter of rule evaluation requests to compensate for the intentional query delay of the datasource.
+	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155
+	EvalDelay      *time.Duration
 	Limit          int
 	Concurrency    int
 	Checksum       string
@@ -139,6 +144,9 @@ func NewGroup(cfg config.Group, qb datasource.QuerierBuilder, defaultInterval ti
 	if cfg.EvalOffset != nil {
 		g.EvalOffset = &cfg.EvalOffset.D
 	}
+	if cfg.EvalDelay != nil {
+		g.EvalDelay = &cfg.EvalDelay.D
+	}
 	for _, h := range cfg.Headers {
 		g.Headers[h.Key] = h.Value
 	}
@@ -581,17 +589,24 @@ func (g *Group) adjustReqTimestamp(timestamp time.Time) time.Time {
 			// to 10:30, to the previous evaluationInterval.
 			return ts.Add(-g.Interval)
 		}
-		// EvalOffset shouldn't interfere with evalAlignment,
-		// so we return it immediately
+		// When `eval_offset` is used, ts shouldn't be affected by `eval_alignment` or `eval_delay`,
+		// since it must always stay aligned to the offset point.
 		return ts
 	}
 	if g.evalAlignment == nil || *g.evalAlignment {
 		// align query time with interval to get similar result with grafana when plotting time series.
 		// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049
 		// and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
-		return timestamp.Truncate(g.Interval)
+		return timestamp.Truncate(g.Interval).Add(-g.getEvalDelay())
 	}
-	return timestamp
+	return timestamp.Add(-g.getEvalDelay())
+}
+
+func (g *Group) getEvalDelay() time.Duration {
+	if g.EvalDelay != nil {
+		return *g.EvalDelay
+	}
+	return *evalDelay
 }

 // executor contains group's notify and rw configs
--- a/app/vmalert/rule/group_test.go
+++ b/app/vmalert/rule/group_test.go
@@ -628,6 +628,7 @@ func TestGroupStartDelay(t *testing.T) {

 func TestGetPrometheusReqTimestamp(t *testing.T) {
 	offset := 30 * time.Minute
+	evalDelay := 1 * time.Minute
 	disableAlign := false
 	testCases := []struct {
 		name            string
@@ -635,24 +636,24 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
 		originTS, expTS string
 	}{
 		{
-			"with query align",
+			"with query align + default evalDelay",
 			&Group{
 				Interval: time.Hour,
 			},
 			"2023-08-28T11:11:00+00:00",
-			"2023-08-28T11:00:00+00:00",
+			"2023-08-28T10:59:30+00:00",
 		},
 		{
-			"without query align",
+			"without query align + default evalDelay",
 			&Group{
 				Interval:      time.Hour,
 				evalAlignment: &disableAlign,
 			},
 			"2023-08-28T11:11:00+00:00",
-			"2023-08-28T11:11:00+00:00",
+			"2023-08-28T11:10:30+00:00",
 		},
 		{
-			"with eval_offset, find previous offset point",
+			"with eval_offset, find previous offset point + default evalDelay",
 			&Group{
 				EvalOffset: &offset,
 				Interval:   time.Hour,
@@ -661,7 +662,7 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
 			"2023-08-28T10:30:00+00:00",
 		},
 		{
-			"with eval_offset",
+			"with eval_offset + default evalDelay",
 			&Group{
 				EvalOffset: &offset,
 				Interval:   time.Hour,
@@ -669,6 +670,25 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
 			"2023-08-28T11:41:00+00:00",
 			"2023-08-28T11:30:00+00:00",
 		},
+		{
+			"with eval_delay",
+			&Group{
+				EvalDelay: &evalDelay,
+				Interval:  time.Hour,
+			},
+			"2023-08-28T11:41:00+00:00",
+			"2023-08-28T10:59:00+00:00",
+		},
+		{
+			"disable alignment with eval_delay",
+			&Group{
+				EvalDelay:     &evalDelay,
+				Interval:      time.Hour,
+				evalAlignment: &disableAlign,
+			},
+			"2023-08-28T11:41:00+00:00",
+			"2023-08-28T11:40:00+00:00",
+		},
 	}
 	for _, tc := range testCases {
 		originT, _ := time.Parse(time.RFC3339, tc.originTS)
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -29,6 +29,7 @@ The sandbox cluster installation is running under the constant load generated by
 ## tip

 **vmalert's cmd-line flag `datasource.queryTimeAlignment` was deprecated and will have no effect anymore. It will be completely removed in next releases. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049) and more detailed changes below.**
+**vmalert's cmd-line flag `datasource.lookback` will be deprecated soon. Please use `-search.latencyOffset` at the datasource side, or override it by adding `latency_offset` to the rule group's params. The flag will have no effect starting from the next release and will be removed in future releases. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155).**

 * SECURITY: upgrade Go builder from Go1.21.1 to Go1.21.3. See [the list of issues addressed in Go1.21.2](https://github.com/golang/go/issues?q=milestone%3AGo1.21.2+label%3ACherryPickApproved) and [the list of issues addressed in Go1.21.3](https://github.com/golang/go/issues?q=milestone%3AGo1.21.3+label%3ACherryPickApproved).

@@ -38,6 +39,7 @@ The sandbox cluster installation is running under the constant load generated by
 * FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add `eval_alignment` attribute for [Groups](https://docs.victoriametrics.com/vmalert.html#groups), it will align group query requests timestamp with interval like `datasource.queryTimeAlignment` did.
  This also means that `datasource.queryTimeAlignment` command-line flag becomes deprecated now and will have no effect if configured. If `datasource.queryTimeAlignment` was set to `false` before, then `eval_alignment` has to be set to `false` explicitly under group.
  See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049).
+* FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add `-rule.evalDelay` flag and `eval_delay` attribute for [Groups](https://docs.victoriametrics.com/vmalert.html#groups). The new flag and attribute can be used to adjust the `time` parameter of rule evaluation requests to match the intentional query delay of the datasource. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155).
 * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): support data ingestion from [NewRelic infrastructure agent](https://docs.newrelic.com/docs/infrastructure/install-infrastructure-agent). See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-newrelic-agent), [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3520) and [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4712).
 * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): skip job with error logs if there is incorrect syntax under `scrape_configs`, previously will exit. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959) and [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5153).
 * FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup.html): add `-filestream.disableFadvise` command-line flag, which can be used for disabling `fadvise` syscall during backup upload to the remote storage. By default `vmbackup` uses `fadvise` syscall in order to prevent from eviction of recently accessed data from the [OS page cache](https://en.wikipedia.org/wiki/Page_cache) when backing up large files. Sometimes the `fadvise` syscall may take significant amounts of CPU when the backup is performed with large value of `-concurrency` command-line flag on systems with big number of CPU cores. In this case it is better to manually disable `fadvise` syscall by passing `-filestream.disableFadvise` command-line flag to `vmbackup`. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5120) for details.
--- a/docs/vmalert.md
+++ b/docs/vmalert.md
@@ -130,6 +130,12 @@ name: <string>
 # `eval_offset` can't be bigger than `interval`.
 [ eval_offset: <duration> ]

+# Optional
+# Adjust the `time` parameter of group evaluation requests to compensate for the intentional query delay of the datasource.
+# By default, the `-rule.evalDelay` flag value is used; it should be equal to `-search.latencyOffset` (a command-line flag of VictoriaMetrics single-node or vmselect). If the group has a `latency_offset` param whose value differs from `-search.latencyOffset`, set `eval_delay` equal to `latency_offset`.
+# See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.
+[ eval_delay: <duration> ]
+
 # Limit the number of alerts an alerting rule and series a recording
 # rule can produce. 0 is no limit.
 [ limit: <int> | default = 0 ]
@@ -805,9 +811,7 @@ Try the following recommendations to reduce the chance of hitting the data delay
 [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
 if expression is `rate(my_metric[2m]) > 0` then ensure that `my_metric` resolution is at least `1m` or better `30s`. 
 If you use VictoriaMetrics as datasource, `[duration]` can be omitted and VictoriaMetrics will adjust it automatically.
-* If you know in advance, that data in datasource is delayed - try changing vmalerts `-datasource.lookback`
-command-line flag to add a time shift for evaluations. Or extend `[duration]` to tolerate the delay.
-For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
+* Extend `[duration]` in the expression to tolerate the delay. For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in the datasource for the last `9m`.
 * If [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution)
 in datasource is inconsistent or `>=5min` - try changing vmalerts `-datasource.queryStep` command-line flag to specify
 how far search query can lookback for the recent datapoint. The recommendation is to have the step 
@@ -816,8 +820,10 @@ at least two times bigger than the resolution.
 > Please note, data delay is inevitable in distributed systems. And it is better to account for it instead of ignoring.

 By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s
-(see `-search.latencyOffset` command-line flag at vmselect). Such delay is needed to eliminate risk of incomplete
+(see the `-search.latencyOffset` command-line flag at vmselect; it can be overridden per group by adding `latency_offset` to the group's params). Such delay is needed to eliminate risk of incomplete
 data on the moment of querying, since metrics collectors won't be able to deliver the data in time.
+To compensate for this latency in the timestamps of produced evaluation results, `-rule.evalDelay` also defaults to 30s.
+If you changed the value of `-search.latencyOffset` (a command-line flag of VictoriaMetrics single-node or vmselect) and observe a delay in the timestamps of produced evaluation results, set `-rule.evalDelay` to the same value as `-search.latencyOffset`.

 ### Alerts state

@@ -1302,6 +1308,9 @@ The shortlist of configuration flags is the following:
     Limits the maximum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group.
  -rule.resendDelay duration
     Minimum amount of time to wait before resending an alert to notifier
+  -rule.evalDelay duration
+     Adjusts the `time` parameter of rule evaluation requests to compensate for the intentional query delay of the datasource. Should normally be equal to `-search.latencyOffset` (a command-line flag of VictoriaMetrics single-node or vmselect). (default 30s)
+     See more details [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155).
  -rule.templates array
     Path or glob pattern to location with go template definitions
     	for rules annotations templating. Flag can be specified multiple times.