- Document the change at docs/CHANGELOG.md
- Add `Reading rules from object storage` section to docs/vmalert.md
- Add `s3` prefix to command-line flags related to the configuration of s3 and gcs clients
- Explicitly mention that reading rules from object storage is supported only in enterprise version
* vmalert: support object storage for rules
Support loading of alerting and recording rules from object
storages `gcs://`, `gs://`, `s3://`.
* review fixes
* vmalert: use group's ID in UI to avoid collisions
Identical group names are allowed. So we should used IDs
for various groupings and aggregations in UI.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: prevent disabling state updates tracking
The minimum number of update states to track is now set to 1.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: properly update `debug` and `update_entries_limit` params on hot-reload
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: display `debug` field for rule in UI
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: exclude `updates` field from json marhsaling
This field isn't correctly marshaled right now.
And implementing the correct marshaling for it doesn't
seem right, since json representation is mostly used
by systems like Grafana. And Grafana doesn't expect this
field to be present.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* fix test for disabled state
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* fix test for disabled state
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: speed up state restore procedure on start
Alerts state restore procedure has been changed to become asynchronous.
It doesn't block groups start anymore which significantly improves vmalert's startup time.
Instead, state restore is called by each group in their goroutines after the first rules
evaluation.
While previously state restore attempt was made for all loaded alerting rules,
now it is called only for alerts which became active after the first evaluation.
This reduces the amount of API calls to the configured remote read URL.
This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now
and will have no effect if configured.
See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* make lint happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Apply suggestions from code review
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Stress the importance of specifying of all Alertmanager
URLs in vmalert's `-notifier.url` or `notifier.config`
if it runs in cluster mode.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3547
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Allow configuring the default number of stored rule's update states in memory
via global `-rule.updateEntriesLimit` command-line flag or per-rule via rule's
`update_entries_limit` configuration param.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmbackupmanager: add metrics for better observability, include more information to `/api/v1/backups` API call response
* app/vmbackupmanager: drop old metrics before creating new ones
* app/vmbackupmanager: use `_total` postfix for counter metrics
* app/vmbackupmanager: remove `_total` postfix for gauge-like metrics
* app/vmbackupmanager: add `_last_run_failed` metrics for backups and retention
* app/vmbackupmanager: address review feedback
* app/vmbackupmanager: fix metric name
* app/vmbackupmanager: address review feedback, remove background updates of metrics, add restoring state of `_last_run_failed` metric from remote storage
* app/vmbackupmanager: improve performance for backup size calculation
* app/vmbackupmanager: refactor backup and retention runs to deduplicate each run logic
* {app/vmbackupmanager,lib/formatutil}: move HumanizeBytes into lib package
* app/vmbackupmanager: fix creating new metrics instead of reusing existing ones
* lit/formatutil: add comment to make linter happy
* app/vmbackupmanager: address review feedback
The system links are absolute, e.g. they start from `/`, so there are high chances
they won't work as expected when requested via proxy such as vmselect with -vmalert.proxyURL
command-line flag.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3424
http.Request was used as a part of state struct
for generating the curl command when viewing the rule's
state changes.
It appears, that holding a referencing is far more expensive
than generating the curl command immediately.
On the test with 40k rules, this change reduces memory
and CPU usage by 50%.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: correctly return error for RW failures
By mistake, in 0989649ad0 the error
for remote write failures weren't return to user.
This change fixes it.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Method `metrics()` now pre-allocates slices for labels
and results from query responses. This reduces the number
of allocations on the hot path for instant requests.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The recent change in modifying default value
of `datasource.queryStep` flag resulted in situation
where replay mode was always running queries with
step=`datasource.queryStep`. When it should always
use rule's evaluation interval.
The fix is related not to replay mode only, but
for all Range requests. Now step param is set
individually for each mode.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/vmalert: add `remoteWrite.sendTimeout` command-line flag to configure timeout for sending data to `remoteWrite.url`
* vmalert: remove WriteTimeout from clients Cfg
No need to have it as a part of configuration struct:
* the client isn't used by other packages;
* there are no internal tests to check the WriteTimeout.
* vmalert: remove DisablePathAppend from clients Cfg
No need to have it as a part of configuration struct:
* the client isn't used by other packages;
* there are no internal tests to check the DisablePathAppend.
Co-authored-by: hagen1778 <roman@victoriametrics.com>
- Return meta-labels for the discovered targets via promutils.Labels
instead of map[string]string. This improves the speed of generating
meta-labels for discovered targets by up to 5x.
- Remove memory allocations in hot paths during ScrapeWork generation.
The ScrapeWork contains scrape settings for a single discovered target.
This improves the service discovery speed by up to 2x.
* flag reference update
there is no flag `-datasource.disablePathAppend` and datasource actually checking for `-remoteRead.disablePathAppend`
* update source for doc as well
The default list of alerting rules contains the basic
rules for checking vmalert's health state and is recommended
to use for monitoring vmalert deployments.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Previously the `quotesEscape` function was escaping only double quotes.
This wasn't enough, since the input string could contain other special chars,
which must be escaped when put inside JSON string. For example, carriage return and line feed chars (\n\r),
backslash char, etc. This led to the following issues, which were improperly fixed:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890 - this issue
was "fixed" by introducing the `crlfEscape` function, which led to unnecessary
complications in user templates, while not fixing various corner cases
such as backslash chars in the input string.
See 1de15ad490
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3139 - this issue
was "fixed" by urlencoding the whole string passed to -external.alert.source
command-line flag. This led to invalid urls, which couldn't be parsed by Grafana.
See 00c838353d
and 4bd0244599
This commit properly encodes the input string passed to `quotesEscape`, so it can be safely embedded inside JSON strings.
This commit deprecates crlfEscape template function and adds the following new template functions:
- strvalue and stripDomain - these functions are supported by Prometheus, so they were added
for compatibility purposes.
- jsonEscape and htmlEscape for converting the input string to valid quoted JSON string
and for html-escaping the input string, so it could be safely embedded as a plaintext
into html.
This commit also documents all supported template functions at https://docs.victoriametrics.com/vmalert.html#template-functions
The deprecated crlfEscape function isn't documented on purpose, since its usefulness is negative in general case.
This reverts commit 00c838353d.
Reason for revert: it incorrectly fixes the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3139 .
Now `-external.alert.source=explore?orgId=1&left=...` is converted to the following invalid url, which cannot be handled by Grafana:
https://grafana.example.com/explore%3ForgId%3D1%26left%3D...
The next commit will contain the correct fix of the issue - the `quotesEscape` function must
properly escape the string, so it could be embedded into JSON string. This function must
properly escape \n\r chars too. In this case the `crlfEscape` function becomes unnecessary.
Actually, the next commit makes the `crlfEscape` function deprecated.
The message about dropped data still remains at `error` level.
The change supposed to make log message more clear about how
serious it is.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>