Uniqueness of rule is now defined by combination of its name, expression and
labels. The hash of the combination is now used as rule ID and identifies rule within the group.
Set of rules from coreos/kube-prometheus was added for testing purposes to
verify compatibility. The check also showed that `vmalert` doesn't support
`query` template function that was mentioned as limitation in README.
The feature allows to speed up group rules execution by
executing them concurrently.
Change also contains README changes to reflect configuration
details.
* vmalert: Add recording rules support.
Recording rules support required additional service refactoring since
it wasn't planned to support them from the very beginning. The list
of changes is following:
* new entity RecordingRule was added for writing results of MetricsQL
expressions into remote storage;
* interface Rule now unites both recording and alerting rules;
* configuration parser was moved to separate package and now performs
more strict validation;
* new endpoint for listing all groups and rules in json format was added;
* evaluation interval may be set to every particular group;
* vmalert: uncomment tests
* vmalert: rm outdated TODO
* vmalert: fix typos in README
Before the change we were sending notifications to notifier
if following conditions are met:
* alert is in Fire state
* alert is in Inactive state
We were sending Inactive notifications to resolve alert ASAP.
Unfortunately, we were sending resolves for Pending alerts that become
Inactive, which is wrong.
In this change we delete alert from the active list if
it was Pending and become Inactive. In this way we now
have Inactive alerts only if they were in state Fire before.
See test change for example.
During group's update rules deletion was causing slice
mutations while slice index was assumed to be unchanged.
This caused "slice bounds out of range" errors when multiple
rules were deleted sequentially.
The check for non-nil remoteRead was mistakenly dropped
during refactoring which caused panics when `vmalert`
wasn't configured with `remoteRead` flag.
* feat: add remoteWrite.maxQueueSize to reduce queue full
* rename remote(write|read) flags to remote(Write|Read) for the sake of consistency
Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
The change introduces new entity `manager` which replaces
`watchdog`, decouples requestHandler and groups. Manager
supposed to control life cycle of groups, rules and
config reloads.
Groups export an ID method which returns a hash
from filename and group name. ID supposed to be unique
identifier across all loaded groups.
Some tests were added to improve coverage.
Bug with wrong annotation value if $value is used in
templates after metrics being restored fixed.
Notifier interface was extended to accept context.
New set of metrics was introduced for config reload.
* app/vmalert: restore alerts state from datasource metrics
Vmalert will restore alerts state for rules that have `rule.For` > 0 from previously written timeseries via `remotewrite.url` flag.
* app/vmalert: mention remotewerite and remoteread configuration in README