Commit graph

599 commits

Author SHA1 Message Date
Aliaksandr Valialkin
0fdbe5de25 app/vmselect/netstorage: increase concurrency when processing small number of time series with big number of data points per each time series
Previously VictoriaMetrics was processing up to 32 time series in a single goroutine.
This could be slow if each time series contains big number of data points (10M+ or more), since only a single CPU core could be loaded with work,
while other CPU cores were idle. Fix this by launching GOMAXPROCS workers for time series processing.

This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/572
2020-06-23 22:45:57 +03:00
Aliaksandr Valialkin
3a444bb7bb lib/promrelabel: add support for keep_if_equal and drop_if_equal actions to relabel configs
These actions may be useful for filtering out unneeded targets and/or metrics if they contain equal label values.
For example, the following rule would leave the target only if __meta_kubernetes_annotation_prometheus_io_port
equals __meta_kubernetes_pod_container_port_number:

  - action: keep_if_equal
    source_labels: [__meta_kubernetes_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
2020-06-23 17:29:19 +03:00
kreedom
f227799c87 Support of custom URL path for alert (#560)
app/vmalert: Support custom URL for alerts source

Add flag `external.alert.source` for configuring custom URL
for alert's source. This may be handy to re-point default source
URL to other systems like Grafana.
Updates #517
2020-06-21 16:33:58 +03:00
Aliaksandr Valialkin
70bf8218bb app/vmselect/promql: properly override label values from group_left and group_right lists like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/577
2020-06-21 16:32:27 +03:00
Aliaksandr Valialkin
2fc2679a3f app/vminsert/netstorage: remove possible race condition when broken connection may be recovered before acquiring storageNode.bcLock 2020-06-20 16:38:08 +03:00
Aliaksandr Valialkin
9409a31c07 docs/vmauth.md: mention that we can provide custom integration with SAML 2020-06-19 13:13:53 +03:00
Aliaksandr Valialkin
4400700832 app/vminsert: properly replicate data for the last RF-1 storage nodes for -replicationFactor=RF
Previously the data for the last `RF-1` storage noes has been incorrectly replicated to the first storage node.
2020-06-19 12:40:22 +03:00
Aliaksandr Valialkin
4f673a5201 app/vminsert: export metrics for determining ingested rows with dropped or truncated labels
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/565
2020-06-19 01:12:44 +03:00
Aliaksandr Valialkin
6939e36fdd app/vmselect/promql: fill gaps on right side with values from left side of or operator in the same way as Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/552
2020-06-18 23:05:23 +03:00
Aliaksandr Valialkin
85c1ccb8b8 app/vminsert/netstorage: add missing return in storageNode.checkHealth on connection failure 2020-06-18 20:51:51 +03:00
Aliaksandr Valialkin
464682f380 app/vminsert/netstorage: periodically check for each -storageNode health, so it could be marked as healthy when it is ready to accept data
This fixes uneven data routing in cluster version when `-replicationFactor` is set to 1 (default value),
i.e. when the replication is disabled.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/546
2020-06-18 20:42:43 +03:00
Roman Khavronenko
1a01fe2cf2 vmalert-537: allow name duplication for rules within one group. (#559)
Uniqueness of rule is now defined by combination of its name, expression and
labels. The hash of the combination is now used as rule ID and identifies rule within the group.

Set of rules from coreos/kube-prometheus was added for testing purposes to
verify compatibility. The check also showed that `vmalert` doesn't support
`query` template function that was mentioned as limitation in README.
2020-06-18 18:54:35 +03:00
Aliaksandr Valialkin
87151e825e docs/vmbackup.md: mention that backups from single-node and cluster versions are incompatible 2020-06-18 18:54:34 +03:00
Aliaksandr Valialkin
cc2225cc49 app/vmselect: fix the error after 936f35920a 2020-06-12 22:00:45 +03:00
Aliaksandr Valialkin
936f35920a app/vmselect/prometheus: allow returning partial response from /api/v1/export if -search.denyPartialResponse=false
This makes `/api/v1/export` behaviour consistent with other `/api/v1/*` handlers.
2020-06-12 21:11:48 +03:00
Clémence Saussez
0b53e380cf app/vmalert: fix link to testdata (#547)
Fix broken link to vmalert test data
Signed-off-by: Clemence Saussez <clemence@zen.ly>
2020-06-10 19:37:21 +03:00
Roman Khavronenko
d71b6e6584 vmalert-491: allow to configure concurrent rules execution per group. (#542)
The feature allows to speed up group rules execution by
executing them concurrently.

Change also contains README changes to reflect configuration
details.
2020-06-09 15:22:11 +03:00
Roman Khavronenko
5c049bf4dd vmalert-521: allow to disable rules expression validation. (#536)
This feature may be useful for using `vmalert` with PromQL
compatible datasources like Loki.
2020-06-09 15:19:25 +03:00
Aliaksandr Valialkin
c1be462d42 app/vmauth: disable automatic response compression/uncompression, since it may work improperly in some cases
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
2020-06-05 20:14:07 +03:00
Aliaksandr Valialkin
7680b7155d app/vmauth: emit fatal errors instead of panics when incorrect command-line flags are set 2020-06-05 20:14:05 +03:00
Aliaksandr Valialkin
01719f4949 app/vmstorage/transport: simplify setupTfss in order to prevent the possibility of nil tfs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/534
2020-06-05 13:17:26 +03:00
Aliaksandr Valialkin
e4cef1b678 app/vmstorage: prevent from serving conns from vminsert and vmselect after the server is closed
Previously it was possible that the connection is served after the server is closed if the following
steps are performed:

1) Server accepts new connection.
2) Server.MustClose() is called and successfully finished.
3) Server starts processing the connection accepted at step 1. There could be various crashes
   like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/534 since the storage may be already closed.

Now the server closes the connection at step 3 without processing it.
2020-06-05 11:55:48 +03:00
Aliaksandr Valialkin
58069f5a6a app/vmalert: print brief usage info for vmalert -help 2020-06-05 10:43:24 +03:00
Aliaksandr Valialkin
3848ea3a4a app/vmauth: print brief usage info for vmauth -help 2020-06-05 10:40:11 +03:00
Aliaksandr Valialkin
8ad8ca350a app/vmagent: print brief usage info for vmagent -help 2020-06-05 10:40:10 +03:00
Aliaksandr Valialkin
3d0a0b3785 lib/fs: optimize MustGetFreeSpace performance by caching the results for up to 2 seconds 2020-06-04 13:14:04 +03:00
DexterZhang
fa103875a0
feat(vmselect): add tmp block dir size metrics vm_tmp_blocks_files_size_total (#527)
* feat(vmselect): add tmp block dir size metrics `vm_tmp_blocks_files_size_total`

* refactor(vmselect): use free space instead of used space in tmp block file metrics

* fix: add `bytes` suffix to tmp dir free space metric
2020-06-04 13:05:50 +03:00
Aliaksandr Valialkin
faea804b88 app/vmauth: log when -auth.config is reloaded in SIGHUP 2020-06-03 23:22:20 +03:00
Aliaksandr Valialkin
045b87c662 app/vmalert: fix comment for UpdateWith exported methods 2020-06-01 14:35:03 +03:00
Aliaksandr Valialkin
43b14b9569 app/vminsert/netstorage: free up unused memory in buffer after memory usage spikes 2020-06-01 14:33:35 +03:00
Roman Khavronenko
44c51c627f vmalert: Add recording rules support. (#519)
* vmalert: Add recording rules support.

Recording rules support required additional service refactoring since
it wasn't planned to support them from the very beginning. The list
of changes is following:
* new entity RecordingRule was added for writing results of MetricsQL
expressions into remote storage;
* interface Rule now unites both recording and alerting rules;
* configuration parser was moved to separate package and now performs
more strict validation;
* new endpoint for listing all groups and rules in json format was added;
* evaluation interval may be set to every particular group;

* vmalert: uncomment tests

* vmalert: rm outdated TODO

* vmalert: fix typos in README
2020-06-01 13:53:46 +03:00
Aliaksandr Valialkin
37aa4fe282 app/vmagent: reload -remoteWrite.relabelConfig and -remoteWrite.urlRelabelConfig on SIGHUP and on /-/reload
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/518
2020-05-30 14:37:02 +03:00
Aliaksandr Valialkin
a646131a33 app/vmagent: log fatal errors instead of panics when improper command-line flags are passed to vmagent 2020-05-30 14:22:38 +03:00
Aliaksandr Valialkin
f41a01332a app/vminsert/netstorage: evenly distribute rerouted rows among all the availalbe storage nodes
Previously such rows were distributed to the original storage node or to the next storage node.
This may result to uneven load among the remaining storage nodes.
2020-05-30 13:51:09 +03:00
Aliaksandr Valialkin
02b2064d8e app/vminsert/netstorage: do not increment vm_rpc_rows_lost_total when all the vmstorage nodes are unavailable, since vminsert retries sending the data instead of dropping it 2020-05-28 22:36:56 +03:00
Aliaksandr Valialkin
7a61357b5d app/vminsert/netstorage: make sure that the the data is always replicated among -replicationFactor vmstorage nodes
Previously vminsert could write multiple copies of the data to a single vmstorage node when the ingestion rate
exceeds the maximum throughput for connections to vmstorage nodes.
2020-05-28 19:59:07 +03:00
Aliaksandr Valialkin
77e5165e7b app/vminsert: add -replicationFactor command-line flag for enabling data replication among available -storageNode instances 2020-05-27 17:29:44 +03:00
Aliaksandr Valialkin
b4e3bffe4b app/vminsert/netstorage: emit warnings instead of errors when re-routing data to healthy storage nodes 2020-05-27 16:31:41 +03:00
Aliaksandr Valialkin
75f2f3b09d app/vminsert/netstorage: improve ingestion performance when a single vmstorage node is slower than other vmstorage nodes
Previously the ingestion performance has been limited by the slowest vmstorage node.
Now vminsert should re-route data from the slowest vmstorage node to the remaining nodes.
2020-05-27 15:08:22 +03:00
Aliaksandr Valialkin
9844845d79 app/vminsert: tune the maximum summary buffer size for pending data to 1/4 of available RAM, since 1/2 of RAM is too big considering GOGC overhead 2020-05-25 02:00:37 +03:00
Aliaksandr Valialkin
4a82631e44 app/vminsert: limit the summary buffer sizes for all the storage nodes to a half of the allowed memory 2020-05-25 01:39:33 +03:00
Aliaksandr Valialkin
4bd3d4b148 app/vminsert/netstorage: do not return error from storageNode.flushBufLocked when the buffer has been successfully re-routed to healthy nodes
This should reduce the number of false errors in the log and the number of falsely lost rows
2020-05-22 18:29:43 +03:00
Aliaksandr Valialkin
6edc33d9bb app/vminsert/netstorage: capture the first error instead of the last error when sending data to vmstorage
The first error has more chances to point to the real root cause of the issue.
2020-05-22 17:49:33 +03:00
Aliaksandr Valialkin
bb4a2bf1aa app/vmauth: fix make run-vmauth command 2020-05-22 16:45:19 +03:00
Aliaksandr Valialkin
dcbdc009f5 app/vmagent: check for error returned from flag.Set 2020-05-21 16:30:48 +03:00
Aliaksandr Valialkin
b59e089ac7 app/vmagent: add -dryRun option for checking all the configs mentioned in command-line flags without running vmagent
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/362
2020-05-21 15:23:18 +03:00
Aliaksandr Valialkin
901093279e app/vmstorage/transport: update stale comment - vmstorage now sends small ack packets to vminsert 2020-05-21 14:04:52 +03:00
kreedom
2752d6cb26 vmalert add quotes escape function (#510)
* vmalert add quotes escape function

Co-authored-by: kreedom
2020-05-21 12:10:35 +03:00
Aaron France
b26245c48b Update README.md 2020-05-21 12:10:33 +03:00
Aliaksandr Valialkin
d83c68ca03 app/vmselect/promql: add ascent_over_time(m[d]) and descent_over_time(m[d]) functions
These functions could be useful in GPS tracking apps for calculating the summary for height gain/loss
over the given duration `d`.
2020-05-21 12:06:34 +03:00
Aliaksandr Valialkin
8ff28f5b91 app/vmselect/promql: update numbers after the upgrade of github.com/VictoriaMetrics/metrics from v1.11.2 to v1.11.3 2020-05-20 03:07:07 +03:00
Aliaksandr Valialkin
ddc9e69bd6 docs/vmagent.md: mention an alternative to refresh_interval option in scrape configs 2020-05-19 23:10:16 +03:00
Aliaksandr Valialkin
7d46dd452a app/vmselect/promql: move common code from aggrFuncOutliersK and newAggrFuncRangeTopK into getRangeTopKTimeseries 2020-05-19 16:11:03 +03:00
Aliaksandr Valialkin
37068064dd app/vmselect/promql: fix outilersk calculations 2020-05-19 14:45:10 +03:00
Aliaksandr Valialkin
fc81ea38d4 app/vmselect/promql: add outliersk(N, m) aggregate function for anomaly detection across groups of similar time series 2020-05-19 13:52:44 +03:00
Aliaksandr Valialkin
9ca781b8f0 app/vmalert/notifier: go fmt 2020-05-19 13:00:18 +03:00
kreedom
27911ae179 vmalert - add expr to variables, add escape functions (#495)
* vmalert - add expr to variables, add escape functions

Co-authored-by: kreedom
2020-05-19 11:55:03 +03:00
Roman Khavronenko
c7f3e58032 vmalert: avoid sending resolves for pending alerts (#498)
Before the change we were sending notifications to notifier
if following conditions are met:
* alert is in Fire state
* alert is in Inactive state

We were sending Inactive notifications to resolve alert ASAP. 
Unfortunately, we were sending resolves for Pending alerts that become
Inactive, which is wrong.

In this change we delete alert from the active list if
it was Pending and become Inactive. In this way we now
have Inactive alerts only if they were in state Fire before.
See test change for example.
2020-05-19 11:55:00 +03:00
Roman Khavronenko
e5f5342e18 vmalert: fix potential race during configuration reloads (#497)
Configuration reload and rules evaluation can't be executed
in same time now. This may make reload time longer but
prevents from potential races.
2020-05-19 11:54:55 +03:00
Aliaksandr Valialkin
b99d03a956 app/vmalert: run make quicktemplate-gen from the root dir of the repository 2020-05-16 22:45:45 +03:00
Aliaksandr Valialkin
2784015a4d all: print --help output to stdout instead of stderr
This is easier to grep and pipe
2020-05-16 12:03:06 +03:00
Aliaksandr Valialkin
dbf8048134 app/vmrestore: document better that vmrestore works like rsync --delete, i.e. it deletes files in -storageDataPath, which are missing in the backup 2020-05-16 09:02:46 +03:00
Aliaksandr Valialkin
e544155a82 app/vmagent/Makefile: fix make run-vmagent rule 2020-05-15 19:35:16 +03:00
Aliaksandr Valialkin
6c43ba1cb1 app/vmagent/remotewrite: remove unused import after the commit 93267f143f 2020-05-15 17:42:31 +03:00
Aliaksandr Valialkin
1d71253653 app/vmagent/remotewrite: allow ingesting time series with multiple samples at once
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/481
2020-05-15 17:37:27 +03:00
Aliaksandr Valialkin
a853869e75 app/vmstorage/transport: prevent from uncontrolled memory usage growth when vminsert sends big packets with too long labels
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/490
2020-05-15 15:42:54 +03:00
Aliaksandr Valialkin
1e5c1d7eaa app/vmstorage: add vm_slow_metric_name_loads_total metric, which could be used as an indicator when more RAM is needed for improving query performance 2020-05-15 14:12:24 +03:00
Aliaksandr Valialkin
d6b9a49481 app/vmstorage: add vm_slow_row_inserts_total and vm_slow_per_day_index_inserts_total metrics for determining whether VictoriaMetrics required more RAM for the current number of active time series 2020-05-15 13:46:57 +03:00
Roman Khavronenko
e850bf0eff vmalert: fix the access to rules slice element by wrong index (#486)
During group's update rules deletion was causing slice
mutations while slice index was assumed to be unchanged.
This caused "slice bounds out of range" errors when multiple
rules were deleted sequentially.
2020-05-15 13:26:06 +03:00
hagen1778
d369450f27 vmalert: update README 2020-05-15 13:26:04 +03:00
Aliaksandr Valialkin
3845420a8f lib: extract common code for returning fast unix timestamp into lib/fasttime 2020-05-14 23:06:50 +03:00
Roman Khavronenko
e208e76222 vmalert: check if remoteRead object was initied before calling Restore (#473)
The check for non-nil remoteRead was mistakenly dropped
during refactoring which caused panics when `vmalert`
wasn't configured with `remoteRead` flag.
2020-05-13 22:57:26 +03:00
Roman Khavronenko
1523890742 vmalert: fix flag names and description in README (#475)
Change also adds the recommendation for `remotewrite`
queue error.
2020-05-13 22:57:20 +03:00
肖贝贝
8c3e9adf7f Feat/vmalert add max queue size (#472)
* feat: add remoteWrite.maxQueueSize to reduce queue full
* rename remote(write|read) flags to remote(Write|Read) for the sake of consistency

Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-05-13 22:57:16 +03:00
Aliaksandr Valialkin
bac9a684e8 docs/vmbackup.md: add a link to vmbackuper tool 2020-05-13 22:57:11 +03:00
Aliaksandr Valialkin
f3d9a5b0ec app/vmselect/promql: suppress "SA4006: this value of dstValues is never used" error in golangci-lint 2020-05-13 11:46:05 +03:00
Aliaksandr Valialkin
3b0f66a227 app/vmagent: fix a bug with improper relabeling when multiple -remoteWrite.urlRelableConfig args are set
This bug could result in incorrect relabeling and metrics' drop.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/467
2020-05-12 22:03:45 +03:00
Aliaksandr Valialkin
18a0caee43 app/vmselect/promql: fix any(..) calculations - return all the data points instead of the first one 2020-05-12 20:36:49 +03:00
Aliaksandr Valialkin
3d3f41b961 app/vmstorage/transport: fix panic during server stop on 32-bit arches
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/212
2020-05-12 20:21:40 +03:00
Aliaksandr Valialkin
81b8811cf4 app/vmselect/promql: remove -search.maxPointsPerTimeseries command-line flag
Limit the estimated time series count after aggregation with grouping by the number of source time series.
2020-05-12 19:54:44 +03:00
Aliaksandr Valialkin
408ade27a9 app/vmselect/promql: add any(x) by (y) aggregate function, which returns any time series from q for each group y 2020-05-12 19:50:29 +03:00
Aliaksandr Valialkin
21c2982ac8 app/vmselect/promql: support for sum(x) by (y) limit N syntax in order to limit the number of output time series after aggregation 2020-05-12 19:50:12 +03:00
Aliaksandr Valialkin
f341c6fcc4 Revert "app/vmselect: add -search.estimatedSeriesCountAfterAggregation command-line flag for tuning the probability of OOMs or false-positive not enough memory errors"
This reverts commit fbb7986dd2380fce2fc8633b7eda8b67f419e74c.

Reason for revert: this commit has been removed from single-node version
2020-05-12 19:50:08 +03:00
Aliaksandr Valialkin
d54a93fc81 app/vmagent: fix scraping mTLS targets, which has been broken in v1.35.1
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2020-05-12 17:23:43 +03:00
Aliaksandr Valialkin
405cf44aed app/vmagent,lib/promscrape: do not set HostClient.DialDualStack, since it isnt used if HostClient.Dial is set 2020-05-12 15:24:53 +03:00
Aliaksandr Valialkin
da6a84e147 app/vmagent/remotewrite: properly dial TCP6 addresses set via -remoteWrite.url
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/469
2020-05-12 15:24:50 +03:00
Aliaksandr Valialkin
4e237b4670 app/vminsert/influx: support passing AccountID and ProjectID via plain TCP and UDP
Now `vminsert` accepts AccountID and ProjectID via `VictoriaMetrics_AccountID` and `VictoriaMetrics_ProjectID` tags
when reading Influx line protocol data via plain TCP or UDP (i.e. when `-influxListenAddr` is set).
2020-05-12 13:13:04 +03:00
Aliaksandr Valialkin
f7753b1469 lib/storage: gradually pre-populate per-day inverted index for the next day
This should prevent from CPU usage spikes at 00:00 UTC every day when
inverted index for new day must be quickly created for all the active time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
2020-05-12 12:13:32 +03:00
Roman Khavronenko
0157566fdb vmalert: cleanup and restructure of code to improve maintainability (#471)
The change introduces new entity `manager` which replaces
`watchdog`, decouples requestHandler and groups. Manager
supposed to control life cycle of groups, rules and
config reloads.

Groups export an ID method which returns a hash
from filename and group name. ID supposed to be unique
identifier across all loaded groups.

Some tests were added to improve coverage.

Bug with wrong annotation value if $value is used in
 templates after metrics being restored fixed.

Notifier interface was extended to accept context.

New set of metrics was introduced for config reload.
2020-05-11 14:35:55 +03:00
Nikolay Khramchikhin
0e8c345ffb vmalert config reload
added config hot reload for vmalert with sighup and api call
2020-05-11 14:35:50 +03:00
Aliaksandr Valialkin
6646b380ef docs/vmauth.md: fix a link to docker images 2020-05-08 14:11:10 +03:00
Aliaksandr Valialkin
28ad350a31 app/vmagent: return 200 from /-/reload endpoint as Prometheus does 2020-05-07 19:29:48 +03:00
Aliaksandr Valialkin
3052b479b7 lib/httpserver: reduce typical duration for http server graceful shutdown
Previously the duration for graceful shutdown for http server could take more than a minute
because of imporperly set timeouts in setNetworkTimeout.
Now typical duration for graceful shutdown should be reduced to less than 5 seconds.
2020-05-07 14:16:38 +03:00
Aliaksandr Valialkin
dc04040781 docs/{vmagent,vmauth}: small clarifications in the docs 2020-05-07 12:55:06 +03:00
Aliaksandr Valialkin
2b403d3f42 app/vmauth: prevent from attacks with .. in path for accessing resources outside the configured url_prefix 2020-05-07 12:55:04 +03:00
Aliaksandr Valialkin
20538a2a5d app/vmagent: allow setting independent auth configs per each configured -remoteWrite.url 2020-05-06 16:52:32 +03:00
Aliaksandr Valialkin
12dbb9e22c app/vmagent: properly set client-side TLS certificates for -remoteWrite.url. Previously they were mistakenly set as server-side 2020-05-06 16:50:37 +03:00
Aliaksandr Valialkin
8665c2edb1 docs/vmagent.md: small fixes 2020-05-06 14:49:25 +03:00
Aliaksandr Valialkin
8ab5e47b5c lib/promscrape: add Prometheus-compatible DNS-based service discovery aka dns_sd_configs 2020-05-06 00:02:41 +03:00
Aliaksandr Valialkin
21b91599c2 docs/{vmauth,vmagent}: fix ports for profiling 2020-05-05 20:16:09 +03:00
Aliaksandr Valialkin
309700ab8c docs/vmauth.md: mention that we can help creating customized proxy 2020-05-05 12:34:08 +03:00
Aliaksandr Valialkin
20e958789a docs/{vmagent,vmauth}: add Profiling section 2020-05-05 11:45:29 +03:00
Aliaksandr Valialkin
1153f30fee docs: add vmauth.md 2020-05-05 11:17:45 +03:00
Aliaksandr Valialkin
782fb30cd0 app/vmauth: build fixes 2020-05-05 11:03:25 +03:00
Aliaksandr Valialkin
de31d16154 app/vmauth: add initial version of vmauth. See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmauth/README.md for details 2020-05-05 10:56:20 +03:00
Aliaksandr Valialkin
61df59b9ea docs/vmagent.md: /targets page doesnt expose infomration about imporperly configured scrape configs now. It is written in error log instead 2020-05-05 10:56:18 +03:00
Roman Khavronenko
abce2b092f app/vmalert: restore alerts state from datasource metrics (#461)
* app/vmalert: restore alerts state from datasource metrics

Vmalert will restore alerts state for rules that have `rule.For` > 0 from previously written timeseries via `remotewrite.url` flag.

* app/vmalert: mention remotewerite and remoteread configuration in README
2020-05-05 00:52:19 +03:00
Aliaksandr Valialkin
89aa6dbf56 lib/promscrape: add Prometheus-compatible service discovery for Consul aka consul_sd_configs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/330
2020-05-04 20:53:06 +03:00
Aliaksandr Valialkin
b21b73115a app/vminsert: add /-/reload handler in the same way as for vmagent 2020-04-30 02:18:08 +03:00
DexterZhang
ae215e5538 feat(vmagent): add promscrap config reload suppport via http (#450)
* feat(vmagent): add promscrap config reload suppport via http endpoint `/-/reload`

* fix: typo fix
2020-04-30 02:18:01 +03:00
Artem Navoiev
121f7e1d56 Update README.md 2020-04-29 17:41:04 +03:00
Aliaksandr Valialkin
b6d88bac04 vendor: use github.com/VictoriaMetrics/fasthttp instead of github.com/fasthttp/fasthttp
The upstream fasthttp may contain issues like 996610f021 ,
plus a code that isn't used by VictoriaMetrics. So let's use a private copy under our control instead.
2020-04-29 16:43:09 +03:00
Aliaksandr Valialkin
9ed4951ec8 lib/metricsql: move it to a separate repository - github.com/VictoriaMetrics/metrics 2020-04-28 15:30:06 +03:00
Aliaksandr Valialkin
cd1145e5f4 app/vmselect: add -search.estimatedSeriesCountAfterAggregation command-line flag for tuning the probability of OOMs or false-positive not enough memory errors 2020-04-28 12:51:48 +03:00
Aliaksandr Valialkin
a858b7e393 app/vmalert: added missing comments for public entities 2020-04-28 11:19:48 +03:00
Aliaksandr Valialkin
716bbe79d4 app/vminsert/netstorage: increase timeout for waiting for ack message after sending big data block to vmstorage 2020-04-28 11:19:46 +03:00
Aliaksandr Valialkin
50af16baf2 app/vmalert: fix build 2020-04-28 00:34:01 +03:00
Aliaksandr Valialkin
e3db2c73a6 app/vmalert: sync with master branch 2020-04-28 00:19:42 +03:00
Aliaksandr Valialkin
7644f40763 app/vmalert: include it into the next release 2020-04-28 00:11:41 +03:00
Aliaksandr Valialkin
86a1d9cb0c lib/promscrape: add initial support for Prometheus-compatible service discovery for Amazon EC2 aka ec2_sd_configs 2020-04-27 19:29:22 +03:00
Aliaksandr Valialkin
0daa37fa02 lib/promscrape/discovery/gce: allow empty project and zone for gce_sd_config 2020-04-27 11:45:45 +03:00
Aliaksandr Valialkin
989d84cf3f app/{vminsert,vmstorage}: wait for ack from vmstorage after each packet sent to it from vminsert
This should protect from possible data loss when `vmstorage` is stopped while the packet is sent from `vminsert`.

This commit switches to new protocol between vminsert and vmstorage, which is incompatible
with the previous protocol. So it is required that both vminsert and vmstorage nodes are updated.
2020-04-27 09:53:26 +03:00
Aliaksandr Valialkin
e933cbac16 lib/storage: postpone reading data from blocks during search
This eliminates the need for storing block data into temporary files on a single-node VictoriaMetrics
during heavy queries, which touch big number of time series over long time ranges.

This improves single-node VM performance on heavy queries by up to 2x.
2020-04-27 08:44:01 +03:00
Aliaksandr Valialkin
23a310cc68 app/vmselect/netstorage: substitute sorting packedTimeseries with the natural order of the fetched blocks
This should minimize the number of disk seeks when reading data from temporary file.
2020-04-26 16:46:17 +03:00
Aliaksandr Valialkin
31861c5b8e lib/promscrape/discovery/gce: allow empty zone arg in gce_sd_config - in this case zones for the given project are automatically discovered 2020-04-26 14:37:38 +03:00
Aliaksandr Valialkin
d9bdda408c docs/{vmbackup,vmrestore}.md: update -help output 2020-04-24 22:44:45 +03:00
Jason Gardner
7a6b2839b4 app/vmbackup: added ability to create and delete snapshots during backup (#428)
* app/vmbackup: added ability to create and delete snapshots during backup

Resolves: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/422

* Add snapshot create and delete url flags

* Fixed errcheck warnings in build
2020-04-24 22:35:50 +03:00
Aliaksandr Valialkin
32b3f959fc app/vmselect: fix description for -search.resetCacheAuthKey 2020-04-24 19:44:35 +03:00
Aliaksandr Valialkin
069690e3bd lib/promscrape: initial implementation for gce_sd_configs aga Prometheus-compatible service discovery for Google Compute Engine 2020-04-24 17:53:43 +03:00
Aliaksandr Valialkin
f9526809e5 app/vmselect: add /api/v1/status/tsdb page with useful stats for locating root cause for high cardinality issues
See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268
2020-04-22 22:03:23 +03:00
Aliaksandr Valialkin
b59f1f1504 app/vmselect: add -search.minStalenessInterval command-line flag for removing gaps on graphs built from time series with irregular duration between samples
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/426
2020-04-20 19:42:41 +03:00
Aliaksandr Valialkin
603d4c9217 app/vmselect: merge -search.maxLookback and -search.maxStalenessInterval flags, since it has been appeared they have identical purpose :(
Leave both flags for backwards compatibility reasons.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/209
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/426
2020-04-20 19:28:28 +03:00
Aliaksandr Valialkin
db5fe03170 deployment/docker: allow building docker images on top of any base image set via ROOT_IMAGE environment var
For example, the following command will build VictoriaMetrics docker image on top of alpine image:

    ROOT_IMAGE=alpine make package-victoria-metrics
2020-04-20 01:16:21 +03:00
Aliaksandr Valialkin
1b911f6965 app/vmagent/remotewrite: retry sending data if the server closes keep-alive connection
This should fix the following error when sending data to remote storage:

couldn't send a block with size XX bytes to "YYY": the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection
2020-04-17 15:53:17 +03:00
Aliaksandr Valialkin
9105f72f17 docs/vmagent.md: typo fix: unvailable -> unavailable 2020-04-17 13:12:13 +03:00
Aliaksandr Valialkin
d46311fd93 app/vmagent/README.md: mention about prodmscrape.suppressScrapeErrors 2020-04-17 13:09:08 +03:00
Aliaksandr Valialkin
b9b5641c2f app/vmselect: properly apply -search.maxLookback to queries sent to /api/v1/query 2020-04-17 12:31:18 +03:00
Aliaksandr Valialkin
d4bc60d63c lib/logger: add WARN level for logging expected errors such as invalid user queries 2020-04-15 20:50:45 +03:00
Aliaksandr Valialkin
a873b553cf app/vmselect: handle timestamp(metric offset X) the same way as Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/415
2020-04-15 12:01:05 +03:00
Aliaksandr Valialkin
6ec582acb9 lib/promscrape: show information on improperly configured scrape targets at the bottom of /targets page
This is a common error whith improperly configured target autodiscovery and/or relabeling.
This error leads to duplicate scraping of the same targets with the same set of labels, which leads
to duplicate samples in time series.
2020-04-14 14:55:13 +03:00
Aliaksandr Valialkin
755f649c72 docs/vmagent.md: mention that vmagent supports kubernetes_sd_configs now 2020-04-13 21:07:00 +03:00
Aliaksandr Valialkin
38256bd66d docs: update minimum supported Go version from 1.12 to 1.13 2020-04-07 13:39:15 +03:00
Aliaksandr Valialkin
2b4d3effad app/vmagent/remotewrite: add "X-Prometheus-Remote-Write-Version: 0.1.0" http header to remote_write request
This header is required by Cortex (and, probably, other remote storage systems).
See 9c1f44d090/docs/apis.md (remote-api) .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/399
2020-04-04 16:24:47 +03:00
Aliaksandr Valialkin
87da127fbf app/victoria-metrics: remove accidentally added testdata for single-node VM 2020-04-04 16:09:08 +03:00
Aliaksandr Valialkin
a012f6fe70 app/vmselect/promql: keep metric name after applying first_over_time and last_over_time functions 2020-04-04 14:54:02 +03:00
Aliaksandr Valialkin
a53e332a93 app/vmstorage: add missing shutdown for http server on graceful shutdown
This could result in the following panic during graceful shutdown when `/metrics` page is requested:

http: panic serving 10.101.66.5:57366: runtime error: invalid memory address or nil pointer dereference
goroutine 2050 [running]:
net/http.(*conn).serve.func1(0xc00ef22000)
	net/http/server.go:1772 +0x139
panic(0xa0fc00, 0xe91d80)
	runtime/panic.go:973 +0x3e3
github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache.(*Cache).UpdateStats(0x0, 0xc0000516c8)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache/cache.go:224 +0x37
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*indexDB).UpdateMetrics(0xc00b931d00, 0xc02c41acf8)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/index_db.go:258 +0x9f
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*Storage).UpdateMetrics(0xc0000bc7e0, 0xc02c41ac00)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/storage.go:413 +0x4c5
main.registerStorageMetrics.func1(0x0)
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:186 +0xd9
main.registerStorageMetrics.func3(0xc00008c380)
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:196 +0x26
main.registerStorageMetrics.func7(0xc)
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:211 +0x26
github.com/VictoriaMetrics/metrics.(*Gauge).marshalTo(0xc000010148, 0xaa407d, 0x20, 0xb50d60, 0xc005319890)
	github.com/VictoriaMetrics/metrics@v1.11.2/gauge.go:38 +0x3f
github.com/VictoriaMetrics/metrics.(*Set).WritePrometheus(0xc000084300, 0x7fd56809c940, 0xc005319860)
	github.com/VictoriaMetrics/metrics@v1.11.2/set.go:51 +0x1e1
github.com/VictoriaMetrics/metrics.WritePrometheus(0x7fd56809c940, 0xc005319860, 0xa16f01)
	github.com/VictoriaMetrics/metrics@v1.11.2/metrics.go:42 +0x41
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.writePrometheusMetrics(0x7fd56809c940, 0xc005319860)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/metrics.go:16 +0x44
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper(0xb5a120, 0xc005319860, 0xc005018f00, 0xc00002cc90)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:154 +0x58d
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.gzipHandler.func1(0xb5a120, 0xc005319860, 0xc005018f00)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:119 +0x8e
net/http.HandlerFunc.ServeHTTP(0xc00002d110, 0xb5a660, 0xc0044141c0, 0xc005018f00)
	net/http/server.go:2012 +0x44
net/http.serverHandler.ServeHTTP(0xc004414000, 0xb5a660, 0xc0044141c0, 0xc005018f00)
	net/http/server.go:2807 +0xa3
net/http.(*conn).serve(0xc00ef22000, 0xb5bf60, 0xc010532080)
	net/http/server.go:1895 +0x86c
created by net/http.(*Server).Serve
	net/http/server.go:2933 +0x35c
2020-04-02 21:09:55 +03:00
Aliaksandr Valialkin
3b744f3c32 app/vmstorage: typo fix 2020-04-01 23:43:09 +03:00
Aliaksandr Valialkin
f838cdc86e app/vmstorage: add vm_free_disk_space_bytes metric for monitoring the remaining disk space at -storageDataPath 2020-04-01 23:10:44 +03:00
Aliaksandr Valialkin
5270b7a097 app/victoria-metrics/testdata: add a test for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/395 2020-03-31 12:51:46 +03:00
Aliaksandr Valialkin
d450249955 lib/storage: properly handle {label=~"foo|"} filters as Prometheus does
Such filters must match all the time series with `label="foo"` plus all the time series without `label`

Previously only time series with `label="foo"` were matched.
2020-03-30 20:21:47 +03:00