Commit graph

239 commits

Author SHA1 Message Date
Roman Khavronenko
92212f04da
vmalert: avoid sending resolves for pending alerts (#498)
Before the change we were sending notifications to notifier
if following conditions are met:
* alert is in Fire state
* alert is in Inactive state

We were sending Inactive notifications to resolve alert ASAP. 
Unfortunately, we were sending resolves for Pending alerts that become
Inactive, which is wrong.

In this change we delete alert from the active list if
it was Pending and become Inactive. In this way we now
have Inactive alerts only if they were in state Fire before.
See test change for example.
2020-05-17 15:13:22 +01:00
Roman Khavronenko
de60ad0cd6
vmalert: fix potential race during configuration reloads (#497)
Configuration reload and rules evaluation can't be executed
in same time now. This may make reload time longer but
prevents from potential races.
2020-05-17 15:12:09 +01:00
Aliaksandr Valialkin
eac3da478e app/vmalert: run make quicktemplate-gen from the root dir of the repository 2020-05-16 22:46:02 +03:00
Aliaksandr Valialkin
93c87d28f6 all: print --help output to stdout instead of stderr
This is easier to grep and pipe
2020-05-16 11:59:33 +03:00
Roman Khavronenko
a249cd9d22
vmalert: fix the access to rules slice element by wrong index (#486)
During group's update rules deletion was causing slice
mutations while slice index was assumed to be unchanged.
This caused "slice bounds out of range" errors when multiple
rules were deleted sequentially.
2020-05-15 07:55:22 +01:00
hagen1778
ef0e37cb9e vmalert: update README 2020-05-15 09:17:28 +03:00
Aliaksandr Valialkin
0afd48d2ee lib: extract common code for returning fast unix timestamp into lib/fasttime 2020-05-14 23:02:07 +03:00
Roman Khavronenko
415b1ddfb5
vmalert: check if remoteRead object was initied before calling Restore (#473)
The check for non-nil remoteRead was mistakenly dropped
during refactoring which caused panics when `vmalert`
wasn't configured with `remoteRead` flag.
2020-05-13 19:32:58 +01:00
Roman Khavronenko
db7dd96346
vmalert: fix flag names and description in README (#475)
Change also adds the recommendation for `remotewrite`
queue error.
2020-05-13 19:32:21 +01:00
肖贝贝
ba48438b06
Feat/vmalert add max queue size (#472)
* feat: add remoteWrite.maxQueueSize to reduce queue full
* rename remote(write|read) flags to remote(Write|Read) for the sake of consistency

Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-05-13 18:58:56 +01:00
Roman Khavronenko
8c8ff5d0cb
vmalert: cleanup and restructure of code to improve maintainability (#471)
The change introduces new entity `manager` which replaces
`watchdog`, decouples requestHandler and groups. Manager
supposed to control life cycle of groups, rules and
config reloads.

Groups export an ID method which returns a hash
from filename and group name. ID supposed to be unique
identifier across all loaded groups.

Some tests were added to improve coverage.

Bug with wrong annotation value if $value is used in
 templates after metrics being restored fixed.

Notifier interface was extended to accept context.

New set of metrics was introduced for config reload.
2020-05-10 17:58:17 +01:00
Nikolay Khramchikhin
9e8733ff65
vmalert config reload
added config hot reload for vmalert with sighup and api call
2020-05-09 10:32:12 +01:00
Roman Khavronenko
0ba1b5c71b
app/vmalert: restore alerts state from datasource metrics (#461)
* app/vmalert: restore alerts state from datasource metrics

Vmalert will restore alerts state for rules that have `rule.For` > 0 from previously written timeseries via `remotewrite.url` flag.

* app/vmalert: mention remotewerite and remoteread configuration in README
2020-05-05 00:51:22 +03:00
Artem Navoiev
4487b454a8
Update README.md 2020-04-29 12:39:15 +03:00
Aliaksandr Valialkin
4e4f57b121 lib/metricsql: move it to a separate repository - github.com/VictoriaMetrics/metrics 2020-04-28 15:28:22 +03:00
Aliaksandr Valialkin
1397612117 app/vmalert: added missing comments for public entities 2020-04-28 11:21:07 +03:00
Roman Khavronenko
3bfa41a95c
app/vmalert: initial remote-write support for alerts state persistence. (#442)
* app/vmalert: initial remote-write support for alerts state persistence.

If `remotewrite.url` flag is set, vmalert will send alerts state  via remote-write protocol to remote storage. The sending is asynchronous to avoid blocking calls in rules evaluation loop.

* app/vmalert: merge with master

* app/vmalert: write both `instant` and `for` alerts timeseries states in remote storage.
2020-04-28 00:18:02 +03:00
Aliaksandr Valialkin
90670cb55e app/vmalert: include it into the next release 2020-04-28 00:10:12 +03:00
kreedom
2c18548e08
alert - rename validate function and flags (#440)
* alert - rename validate function and flags
2020-04-26 14:15:04 +03:00
kreedom
5f61d43db9
vmalert - validate template in labels (#439) 2020-04-26 13:53:57 +03:00
肖贝贝
eeadfccdc5
fix: fix vmalert template label not complete bug (#435)
Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-04-26 13:30:10 +03:00
hagen1778
2eed6c393f vmalert: prepare package for external usage
* update README according to changes
* add Makefile with basic commands
2020-04-12 15:32:42 +03:00
kreedom
948f8b6b5f [vmalert] fix linter issues 2020-04-12 15:08:11 +03:00
kreedom
8fca5f2819
[vmalert] add tests to webserver (#413) 2020-04-12 14:51:03 +03:00
Roman Khavronenko
7c9405f53d
Vmalert metrics (#412)
vmalert: add basic list of metrics
2020-04-11 20:42:01 +01:00
Roman Khavronenko
9f8cc8ae1b
Extend web responses for alerts: (#411)
vmalert: Extend web responses for alerts

* populate apiAlert object with additional fields
* return all active alerts, not only firing
* sort list of API alerts for deterministic output
* add helper for available path list
2020-04-11 16:49:23 +01:00
kreedom
90de3086b3
[vmalert] add webserver (#410)
* [vmalert] add webserver
2020-04-11 12:40:24 +03:00
Roman Khavronenko
b099d84271
Vmalert/rules eval (#400)
* Initial rules evaluation support.

Rules are now store alerts state in private field `alerts`. Every evaluation updates
the alerts and state. Every unique metric received from datastore represents a unique alert,
uniqueness is guaranteed by hashing ordered labelset.

* merge with master

* cleanup

* support endAt parameter as 3*evaluationInterval for active alerts

* make golint happy
2020-04-06 14:44:03 +03:00
kreedom
298eb0a0f8 [vmalert] improve external url handling 2020-04-01 22:29:11 +03:00
kreedom
12fe915b48
[vmalert] add prometheus template function (#396)
* [vmalert] add prometheus template function

* make linter be happy

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2020-04-01 18:17:53 +03:00
kreedom
bf6c24d0f4
[vmalert] config parser (#393)
* [vmalert] config parser

* make linter be happy

* fix test

* fix sprintf add test for rule validation
2020-03-29 01:48:30 +02:00
kreedom
b22da547a2
[vmalert] - parse template annotaions (#387)
* [vmalert] - parse template annotations
2020-03-27 18:31:16 +02:00
Aliaksandr Valialkin
4fc4912f0c app/vmalert/datasource: typo fix in docs: Labels -> Label 2020-03-13 12:22:33 +02:00
kreedom
a746cb62b6
vmalert add vm datasource, change alertmanager (#364)
* vmalert add vm datasource, change alertmanager

* make linter be happy

* make linter be happy.2

* PR comments

* PR comments.1
2020-03-13 12:19:31 +02:00
kreedom
49390b8dbc
[vmalert] integration with AlertManager (#325) 2020-02-21 23:15:05 +02:00
kreedom
3c06179184
basic vmalert backbone (#317)
* basic vmalert backbone

* Resolve code review comments for vmalert backbone

* Second review fixes for vmalert backbone
2020-02-16 20:59:02 +02:00
Artem Navoiev
a47f292295 [vmalert] add vmalert.png.2 2020-02-02 12:17:19 +02:00
Artem Navoiev
354232b62b [vmalert] add vmalert.png 2020-02-02 12:16:05 +02:00
Artem Navoiev
28778be0cc [vmalert] initial 2020-02-02 12:14:09 +02:00