Commit graph

624 commits

Author SHA1 Message Date
Aliaksandr Valialkin
3d0a0b3785 lib/fs: optimize MustGetFreeSpace performance by caching the results for up to 2 seconds 2020-06-04 13:14:04 +03:00
DexterZhang
fa103875a0
feat(vmselect): add tmp block dir size metrics vm_tmp_blocks_files_size_total (#527)
* feat(vmselect): add tmp block dir size metrics `vm_tmp_blocks_files_size_total`

* refactor(vmselect): use free space instead of used space in tmp block file metrics

* fix: add `bytes` suffix to tmp dir free space metric
2020-06-04 13:05:50 +03:00
Aliaksandr Valialkin
faea804b88 app/vmauth: log when -auth.config is reloaded in SIGHUP 2020-06-03 23:22:20 +03:00
Aliaksandr Valialkin
045b87c662 app/vmalert: fix comment for UpdateWith exported methods 2020-06-01 14:35:03 +03:00
Aliaksandr Valialkin
43b14b9569 app/vminsert/netstorage: free up unused memory in buffer after memory usage spikes 2020-06-01 14:33:35 +03:00
Roman Khavronenko
44c51c627f vmalert: Add recording rules support. (#519)
* vmalert: Add recording rules support.

Recording rules support required additional service refactoring since
it wasn't planned to support them from the very beginning. The list
of changes is following:
* new entity RecordingRule was added for writing results of MetricsQL
expressions into remote storage;
* interface Rule now unites both recording and alerting rules;
* configuration parser was moved to separate package and now performs
more strict validation;
* new endpoint for listing all groups and rules in json format was added;
* evaluation interval may be set to every particular group;

* vmalert: uncomment tests

* vmalert: rm outdated TODO

* vmalert: fix typos in README
2020-06-01 13:53:46 +03:00
Aliaksandr Valialkin
37aa4fe282 app/vmagent: reload -remoteWrite.relabelConfig and -remoteWrite.urlRelabelConfig on SIGHUP and on /-/reload
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/518
2020-05-30 14:37:02 +03:00
Aliaksandr Valialkin
a646131a33 app/vmagent: log fatal errors instead of panics when improper command-line flags are passed to vmagent 2020-05-30 14:22:38 +03:00
Aliaksandr Valialkin
f41a01332a app/vminsert/netstorage: evenly distribute rerouted rows among all the availalbe storage nodes
Previously such rows were distributed to the original storage node or to the next storage node.
This may result to uneven load among the remaining storage nodes.
2020-05-30 13:51:09 +03:00
Aliaksandr Valialkin
02b2064d8e app/vminsert/netstorage: do not increment vm_rpc_rows_lost_total when all the vmstorage nodes are unavailable, since vminsert retries sending the data instead of dropping it 2020-05-28 22:36:56 +03:00
Aliaksandr Valialkin
7a61357b5d app/vminsert/netstorage: make sure that the the data is always replicated among -replicationFactor vmstorage nodes
Previously vminsert could write multiple copies of the data to a single vmstorage node when the ingestion rate
exceeds the maximum throughput for connections to vmstorage nodes.
2020-05-28 19:59:07 +03:00
Aliaksandr Valialkin
77e5165e7b app/vminsert: add -replicationFactor command-line flag for enabling data replication among available -storageNode instances 2020-05-27 17:29:44 +03:00
Aliaksandr Valialkin
b4e3bffe4b app/vminsert/netstorage: emit warnings instead of errors when re-routing data to healthy storage nodes 2020-05-27 16:31:41 +03:00
Aliaksandr Valialkin
75f2f3b09d app/vminsert/netstorage: improve ingestion performance when a single vmstorage node is slower than other vmstorage nodes
Previously the ingestion performance has been limited by the slowest vmstorage node.
Now vminsert should re-route data from the slowest vmstorage node to the remaining nodes.
2020-05-27 15:08:22 +03:00
Aliaksandr Valialkin
9844845d79 app/vminsert: tune the maximum summary buffer size for pending data to 1/4 of available RAM, since 1/2 of RAM is too big considering GOGC overhead 2020-05-25 02:00:37 +03:00
Aliaksandr Valialkin
4a82631e44 app/vminsert: limit the summary buffer sizes for all the storage nodes to a half of the allowed memory 2020-05-25 01:39:33 +03:00
Aliaksandr Valialkin
4bd3d4b148 app/vminsert/netstorage: do not return error from storageNode.flushBufLocked when the buffer has been successfully re-routed to healthy nodes
This should reduce the number of false errors in the log and the number of falsely lost rows
2020-05-22 18:29:43 +03:00
Aliaksandr Valialkin
6edc33d9bb app/vminsert/netstorage: capture the first error instead of the last error when sending data to vmstorage
The first error has more chances to point to the real root cause of the issue.
2020-05-22 17:49:33 +03:00
Aliaksandr Valialkin
bb4a2bf1aa app/vmauth: fix make run-vmauth command 2020-05-22 16:45:19 +03:00
Aliaksandr Valialkin
dcbdc009f5 app/vmagent: check for error returned from flag.Set 2020-05-21 16:30:48 +03:00
Aliaksandr Valialkin
b59e089ac7 app/vmagent: add -dryRun option for checking all the configs mentioned in command-line flags without running vmagent
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/362
2020-05-21 15:23:18 +03:00
Aliaksandr Valialkin
901093279e app/vmstorage/transport: update stale comment - vmstorage now sends small ack packets to vminsert 2020-05-21 14:04:52 +03:00
kreedom
2752d6cb26 vmalert add quotes escape function (#510)
* vmalert add quotes escape function

Co-authored-by: kreedom
2020-05-21 12:10:35 +03:00
Aaron France
b26245c48b Update README.md 2020-05-21 12:10:33 +03:00
Aliaksandr Valialkin
d83c68ca03 app/vmselect/promql: add ascent_over_time(m[d]) and descent_over_time(m[d]) functions
These functions could be useful in GPS tracking apps for calculating the summary for height gain/loss
over the given duration `d`.
2020-05-21 12:06:34 +03:00
Aliaksandr Valialkin
8ff28f5b91 app/vmselect/promql: update numbers after the upgrade of github.com/VictoriaMetrics/metrics from v1.11.2 to v1.11.3 2020-05-20 03:07:07 +03:00
Aliaksandr Valialkin
ddc9e69bd6 docs/vmagent.md: mention an alternative to refresh_interval option in scrape configs 2020-05-19 23:10:16 +03:00
Aliaksandr Valialkin
7d46dd452a app/vmselect/promql: move common code from aggrFuncOutliersK and newAggrFuncRangeTopK into getRangeTopKTimeseries 2020-05-19 16:11:03 +03:00
Aliaksandr Valialkin
37068064dd app/vmselect/promql: fix outilersk calculations 2020-05-19 14:45:10 +03:00
Aliaksandr Valialkin
fc81ea38d4 app/vmselect/promql: add outliersk(N, m) aggregate function for anomaly detection across groups of similar time series 2020-05-19 13:52:44 +03:00
Aliaksandr Valialkin
9ca781b8f0 app/vmalert/notifier: go fmt 2020-05-19 13:00:18 +03:00
kreedom
27911ae179 vmalert - add expr to variables, add escape functions (#495)
* vmalert - add expr to variables, add escape functions

Co-authored-by: kreedom
2020-05-19 11:55:03 +03:00
Roman Khavronenko
c7f3e58032 vmalert: avoid sending resolves for pending alerts (#498)
Before the change we were sending notifications to notifier
if following conditions are met:
* alert is in Fire state
* alert is in Inactive state

We were sending Inactive notifications to resolve alert ASAP. 
Unfortunately, we were sending resolves for Pending alerts that become
Inactive, which is wrong.

In this change we delete alert from the active list if
it was Pending and become Inactive. In this way we now
have Inactive alerts only if they were in state Fire before.
See test change for example.
2020-05-19 11:55:00 +03:00
Roman Khavronenko
e5f5342e18 vmalert: fix potential race during configuration reloads (#497)
Configuration reload and rules evaluation can't be executed
in same time now. This may make reload time longer but
prevents from potential races.
2020-05-19 11:54:55 +03:00
Aliaksandr Valialkin
b99d03a956 app/vmalert: run make quicktemplate-gen from the root dir of the repository 2020-05-16 22:45:45 +03:00
Aliaksandr Valialkin
2784015a4d all: print --help output to stdout instead of stderr
This is easier to grep and pipe
2020-05-16 12:03:06 +03:00
Aliaksandr Valialkin
dbf8048134 app/vmrestore: document better that vmrestore works like rsync --delete, i.e. it deletes files in -storageDataPath, which are missing in the backup 2020-05-16 09:02:46 +03:00
Aliaksandr Valialkin
e544155a82 app/vmagent/Makefile: fix make run-vmagent rule 2020-05-15 19:35:16 +03:00
Aliaksandr Valialkin
6c43ba1cb1 app/vmagent/remotewrite: remove unused import after the commit 93267f143f 2020-05-15 17:42:31 +03:00
Aliaksandr Valialkin
1d71253653 app/vmagent/remotewrite: allow ingesting time series with multiple samples at once
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/481
2020-05-15 17:37:27 +03:00
Aliaksandr Valialkin
a853869e75 app/vmstorage/transport: prevent from uncontrolled memory usage growth when vminsert sends big packets with too long labels
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/490
2020-05-15 15:42:54 +03:00
Aliaksandr Valialkin
1e5c1d7eaa app/vmstorage: add vm_slow_metric_name_loads_total metric, which could be used as an indicator when more RAM is needed for improving query performance 2020-05-15 14:12:24 +03:00
Aliaksandr Valialkin
d6b9a49481 app/vmstorage: add vm_slow_row_inserts_total and vm_slow_per_day_index_inserts_total metrics for determining whether VictoriaMetrics required more RAM for the current number of active time series 2020-05-15 13:46:57 +03:00
Roman Khavronenko
e850bf0eff vmalert: fix the access to rules slice element by wrong index (#486)
During group's update rules deletion was causing slice
mutations while slice index was assumed to be unchanged.
This caused "slice bounds out of range" errors when multiple
rules were deleted sequentially.
2020-05-15 13:26:06 +03:00
hagen1778
d369450f27 vmalert: update README 2020-05-15 13:26:04 +03:00
Aliaksandr Valialkin
3845420a8f lib: extract common code for returning fast unix timestamp into lib/fasttime 2020-05-14 23:06:50 +03:00
Roman Khavronenko
e208e76222 vmalert: check if remoteRead object was initied before calling Restore (#473)
The check for non-nil remoteRead was mistakenly dropped
during refactoring which caused panics when `vmalert`
wasn't configured with `remoteRead` flag.
2020-05-13 22:57:26 +03:00
Roman Khavronenko
1523890742 vmalert: fix flag names and description in README (#475)
Change also adds the recommendation for `remotewrite`
queue error.
2020-05-13 22:57:20 +03:00
肖贝贝
8c3e9adf7f Feat/vmalert add max queue size (#472)
* feat: add remoteWrite.maxQueueSize to reduce queue full
* rename remote(write|read) flags to remote(Write|Read) for the sake of consistency

Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-05-13 22:57:16 +03:00
Aliaksandr Valialkin
bac9a684e8 docs/vmbackup.md: add a link to vmbackuper tool 2020-05-13 22:57:11 +03:00
Aliaksandr Valialkin
f3d9a5b0ec app/vmselect/promql: suppress "SA4006: this value of dstValues is never used" error in golangci-lint 2020-05-13 11:46:05 +03:00
Aliaksandr Valialkin
3b0f66a227 app/vmagent: fix a bug with improper relabeling when multiple -remoteWrite.urlRelableConfig args are set
This bug could result in incorrect relabeling and metrics' drop.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/467
2020-05-12 22:03:45 +03:00
Aliaksandr Valialkin
18a0caee43 app/vmselect/promql: fix any(..) calculations - return all the data points instead of the first one 2020-05-12 20:36:49 +03:00
Aliaksandr Valialkin
3d3f41b961 app/vmstorage/transport: fix panic during server stop on 32-bit arches
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/212
2020-05-12 20:21:40 +03:00
Aliaksandr Valialkin
81b8811cf4 app/vmselect/promql: remove -search.maxPointsPerTimeseries command-line flag
Limit the estimated time series count after aggregation with grouping by the number of source time series.
2020-05-12 19:54:44 +03:00
Aliaksandr Valialkin
408ade27a9 app/vmselect/promql: add any(x) by (y) aggregate function, which returns any time series from q for each group y 2020-05-12 19:50:29 +03:00
Aliaksandr Valialkin
21c2982ac8 app/vmselect/promql: support for sum(x) by (y) limit N syntax in order to limit the number of output time series after aggregation 2020-05-12 19:50:12 +03:00
Aliaksandr Valialkin
f341c6fcc4 Revert "app/vmselect: add -search.estimatedSeriesCountAfterAggregation command-line flag for tuning the probability of OOMs or false-positive not enough memory errors"
This reverts commit fbb7986dd2380fce2fc8633b7eda8b67f419e74c.

Reason for revert: this commit has been removed from single-node version
2020-05-12 19:50:08 +03:00
Aliaksandr Valialkin
d54a93fc81 app/vmagent: fix scraping mTLS targets, which has been broken in v1.35.1
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2020-05-12 17:23:43 +03:00
Aliaksandr Valialkin
405cf44aed app/vmagent,lib/promscrape: do not set HostClient.DialDualStack, since it isnt used if HostClient.Dial is set 2020-05-12 15:24:53 +03:00
Aliaksandr Valialkin
da6a84e147 app/vmagent/remotewrite: properly dial TCP6 addresses set via -remoteWrite.url
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/469
2020-05-12 15:24:50 +03:00
Aliaksandr Valialkin
4e237b4670 app/vminsert/influx: support passing AccountID and ProjectID via plain TCP and UDP
Now `vminsert` accepts AccountID and ProjectID via `VictoriaMetrics_AccountID` and `VictoriaMetrics_ProjectID` tags
when reading Influx line protocol data via plain TCP or UDP (i.e. when `-influxListenAddr` is set).
2020-05-12 13:13:04 +03:00
Aliaksandr Valialkin
f7753b1469 lib/storage: gradually pre-populate per-day inverted index for the next day
This should prevent from CPU usage spikes at 00:00 UTC every day when
inverted index for new day must be quickly created for all the active time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
2020-05-12 12:13:32 +03:00
Roman Khavronenko
0157566fdb vmalert: cleanup and restructure of code to improve maintainability (#471)
The change introduces new entity `manager` which replaces
`watchdog`, decouples requestHandler and groups. Manager
supposed to control life cycle of groups, rules and
config reloads.

Groups export an ID method which returns a hash
from filename and group name. ID supposed to be unique
identifier across all loaded groups.

Some tests were added to improve coverage.

Bug with wrong annotation value if $value is used in
 templates after metrics being restored fixed.

Notifier interface was extended to accept context.

New set of metrics was introduced for config reload.
2020-05-11 14:35:55 +03:00
Nikolay Khramchikhin
0e8c345ffb vmalert config reload
added config hot reload for vmalert with sighup and api call
2020-05-11 14:35:50 +03:00
Aliaksandr Valialkin
6646b380ef docs/vmauth.md: fix a link to docker images 2020-05-08 14:11:10 +03:00
Aliaksandr Valialkin
28ad350a31 app/vmagent: return 200 from /-/reload endpoint as Prometheus does 2020-05-07 19:29:48 +03:00
Aliaksandr Valialkin
3052b479b7 lib/httpserver: reduce typical duration for http server graceful shutdown
Previously the duration for graceful shutdown for http server could take more than a minute
because of imporperly set timeouts in setNetworkTimeout.
Now typical duration for graceful shutdown should be reduced to less than 5 seconds.
2020-05-07 14:16:38 +03:00
Aliaksandr Valialkin
dc04040781 docs/{vmagent,vmauth}: small clarifications in the docs 2020-05-07 12:55:06 +03:00
Aliaksandr Valialkin
2b403d3f42 app/vmauth: prevent from attacks with .. in path for accessing resources outside the configured url_prefix 2020-05-07 12:55:04 +03:00
Aliaksandr Valialkin
20538a2a5d app/vmagent: allow setting independent auth configs per each configured -remoteWrite.url 2020-05-06 16:52:32 +03:00
Aliaksandr Valialkin
12dbb9e22c app/vmagent: properly set client-side TLS certificates for -remoteWrite.url. Previously they were mistakenly set as server-side 2020-05-06 16:50:37 +03:00
Aliaksandr Valialkin
8665c2edb1 docs/vmagent.md: small fixes 2020-05-06 14:49:25 +03:00
Aliaksandr Valialkin
8ab5e47b5c lib/promscrape: add Prometheus-compatible DNS-based service discovery aka dns_sd_configs 2020-05-06 00:02:41 +03:00
Aliaksandr Valialkin
21b91599c2 docs/{vmauth,vmagent}: fix ports for profiling 2020-05-05 20:16:09 +03:00
Aliaksandr Valialkin
309700ab8c docs/vmauth.md: mention that we can help creating customized proxy 2020-05-05 12:34:08 +03:00
Aliaksandr Valialkin
20e958789a docs/{vmagent,vmauth}: add Profiling section 2020-05-05 11:45:29 +03:00
Aliaksandr Valialkin
1153f30fee docs: add vmauth.md 2020-05-05 11:17:45 +03:00
Aliaksandr Valialkin
782fb30cd0 app/vmauth: build fixes 2020-05-05 11:03:25 +03:00
Aliaksandr Valialkin
de31d16154 app/vmauth: add initial version of vmauth. See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmauth/README.md for details 2020-05-05 10:56:20 +03:00
Aliaksandr Valialkin
61df59b9ea docs/vmagent.md: /targets page doesnt expose infomration about imporperly configured scrape configs now. It is written in error log instead 2020-05-05 10:56:18 +03:00
Roman Khavronenko
abce2b092f app/vmalert: restore alerts state from datasource metrics (#461)
* app/vmalert: restore alerts state from datasource metrics

Vmalert will restore alerts state for rules that have `rule.For` > 0 from previously written timeseries via `remotewrite.url` flag.

* app/vmalert: mention remotewerite and remoteread configuration in README
2020-05-05 00:52:19 +03:00
Aliaksandr Valialkin
89aa6dbf56 lib/promscrape: add Prometheus-compatible service discovery for Consul aka consul_sd_configs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/330
2020-05-04 20:53:06 +03:00
Aliaksandr Valialkin
b21b73115a app/vminsert: add /-/reload handler in the same way as for vmagent 2020-04-30 02:18:08 +03:00
DexterZhang
ae215e5538 feat(vmagent): add promscrap config reload suppport via http (#450)
* feat(vmagent): add promscrap config reload suppport via http endpoint `/-/reload`

* fix: typo fix
2020-04-30 02:18:01 +03:00
Artem Navoiev
121f7e1d56 Update README.md 2020-04-29 17:41:04 +03:00
Aliaksandr Valialkin
b6d88bac04 vendor: use github.com/VictoriaMetrics/fasthttp instead of github.com/fasthttp/fasthttp
The upstream fasthttp may contain issues like 996610f021 ,
plus a code that isn't used by VictoriaMetrics. So let's use a private copy under our control instead.
2020-04-29 16:43:09 +03:00
Aliaksandr Valialkin
9ed4951ec8 lib/metricsql: move it to a separate repository - github.com/VictoriaMetrics/metrics 2020-04-28 15:30:06 +03:00
Aliaksandr Valialkin
cd1145e5f4 app/vmselect: add -search.estimatedSeriesCountAfterAggregation command-line flag for tuning the probability of OOMs or false-positive not enough memory errors 2020-04-28 12:51:48 +03:00
Aliaksandr Valialkin
a858b7e393 app/vmalert: added missing comments for public entities 2020-04-28 11:19:48 +03:00
Aliaksandr Valialkin
716bbe79d4 app/vminsert/netstorage: increase timeout for waiting for ack message after sending big data block to vmstorage 2020-04-28 11:19:46 +03:00
Aliaksandr Valialkin
50af16baf2 app/vmalert: fix build 2020-04-28 00:34:01 +03:00
Aliaksandr Valialkin
e3db2c73a6 app/vmalert: sync with master branch 2020-04-28 00:19:42 +03:00
Aliaksandr Valialkin
7644f40763 app/vmalert: include it into the next release 2020-04-28 00:11:41 +03:00
Aliaksandr Valialkin
86a1d9cb0c lib/promscrape: add initial support for Prometheus-compatible service discovery for Amazon EC2 aka ec2_sd_configs 2020-04-27 19:29:22 +03:00
Aliaksandr Valialkin
0daa37fa02 lib/promscrape/discovery/gce: allow empty project and zone for gce_sd_config 2020-04-27 11:45:45 +03:00
Aliaksandr Valialkin
989d84cf3f app/{vminsert,vmstorage}: wait for ack from vmstorage after each packet sent to it from vminsert
This should protect from possible data loss when `vmstorage` is stopped while the packet is sent from `vminsert`.

This commit switches to new protocol between vminsert and vmstorage, which is incompatible
with the previous protocol. So it is required that both vminsert and vmstorage nodes are updated.
2020-04-27 09:53:26 +03:00
Aliaksandr Valialkin
e933cbac16 lib/storage: postpone reading data from blocks during search
This eliminates the need for storing block data into temporary files on a single-node VictoriaMetrics
during heavy queries, which touch big number of time series over long time ranges.

This improves single-node VM performance on heavy queries by up to 2x.
2020-04-27 08:44:01 +03:00
Aliaksandr Valialkin
23a310cc68 app/vmselect/netstorage: substitute sorting packedTimeseries with the natural order of the fetched blocks
This should minimize the number of disk seeks when reading data from temporary file.
2020-04-26 16:46:17 +03:00
Aliaksandr Valialkin
31861c5b8e lib/promscrape/discovery/gce: allow empty zone arg in gce_sd_config - in this case zones for the given project are automatically discovered 2020-04-26 14:37:38 +03:00
Aliaksandr Valialkin
d9bdda408c docs/{vmbackup,vmrestore}.md: update -help output 2020-04-24 22:44:45 +03:00
Jason Gardner
7a6b2839b4 app/vmbackup: added ability to create and delete snapshots during backup (#428)
* app/vmbackup: added ability to create and delete snapshots during backup

Resolves: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/422

* Add snapshot create and delete url flags

* Fixed errcheck warnings in build
2020-04-24 22:35:50 +03:00
Aliaksandr Valialkin
32b3f959fc app/vmselect: fix description for -search.resetCacheAuthKey 2020-04-24 19:44:35 +03:00
Aliaksandr Valialkin
069690e3bd lib/promscrape: initial implementation for gce_sd_configs aga Prometheus-compatible service discovery for Google Compute Engine 2020-04-24 17:53:43 +03:00
Aliaksandr Valialkin
f9526809e5 app/vmselect: add /api/v1/status/tsdb page with useful stats for locating root cause for high cardinality issues
See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268
2020-04-22 22:03:23 +03:00
Aliaksandr Valialkin
b59f1f1504 app/vmselect: add -search.minStalenessInterval command-line flag for removing gaps on graphs built from time series with irregular duration between samples
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/426
2020-04-20 19:42:41 +03:00
Aliaksandr Valialkin
603d4c9217 app/vmselect: merge -search.maxLookback and -search.maxStalenessInterval flags, since it has been appeared they have identical purpose :(
Leave both flags for backwards compatibility reasons.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/209
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/426
2020-04-20 19:28:28 +03:00
Aliaksandr Valialkin
db5fe03170 deployment/docker: allow building docker images on top of any base image set via ROOT_IMAGE environment var
For example, the following command will build VictoriaMetrics docker image on top of alpine image:

    ROOT_IMAGE=alpine make package-victoria-metrics
2020-04-20 01:16:21 +03:00
Aliaksandr Valialkin
1b911f6965 app/vmagent/remotewrite: retry sending data if the server closes keep-alive connection
This should fix the following error when sending data to remote storage:

couldn't send a block with size XX bytes to "YYY": the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection
2020-04-17 15:53:17 +03:00
Aliaksandr Valialkin
9105f72f17 docs/vmagent.md: typo fix: unvailable -> unavailable 2020-04-17 13:12:13 +03:00
Aliaksandr Valialkin
d46311fd93 app/vmagent/README.md: mention about prodmscrape.suppressScrapeErrors 2020-04-17 13:09:08 +03:00
Aliaksandr Valialkin
b9b5641c2f app/vmselect: properly apply -search.maxLookback to queries sent to /api/v1/query 2020-04-17 12:31:18 +03:00
Aliaksandr Valialkin
d4bc60d63c lib/logger: add WARN level for logging expected errors such as invalid user queries 2020-04-15 20:50:45 +03:00
Aliaksandr Valialkin
a873b553cf app/vmselect: handle timestamp(metric offset X) the same way as Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/415
2020-04-15 12:01:05 +03:00
Aliaksandr Valialkin
6ec582acb9 lib/promscrape: show information on improperly configured scrape targets at the bottom of /targets page
This is a common error whith improperly configured target autodiscovery and/or relabeling.
This error leads to duplicate scraping of the same targets with the same set of labels, which leads
to duplicate samples in time series.
2020-04-14 14:55:13 +03:00
Aliaksandr Valialkin
755f649c72 docs/vmagent.md: mention that vmagent supports kubernetes_sd_configs now 2020-04-13 21:07:00 +03:00
Aliaksandr Valialkin
38256bd66d docs: update minimum supported Go version from 1.12 to 1.13 2020-04-07 13:39:15 +03:00
Aliaksandr Valialkin
2b4d3effad app/vmagent/remotewrite: add "X-Prometheus-Remote-Write-Version: 0.1.0" http header to remote_write request
This header is required by Cortex (and, probably, other remote storage systems).
See 9c1f44d090/docs/apis.md (remote-api) .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/399
2020-04-04 16:24:47 +03:00
Aliaksandr Valialkin
87da127fbf app/victoria-metrics: remove accidentally added testdata for single-node VM 2020-04-04 16:09:08 +03:00
Aliaksandr Valialkin
a012f6fe70 app/vmselect/promql: keep metric name after applying first_over_time and last_over_time functions 2020-04-04 14:54:02 +03:00
Aliaksandr Valialkin
a53e332a93 app/vmstorage: add missing shutdown for http server on graceful shutdown
This could result in the following panic during graceful shutdown when `/metrics` page is requested:

http: panic serving 10.101.66.5:57366: runtime error: invalid memory address or nil pointer dereference
goroutine 2050 [running]:
net/http.(*conn).serve.func1(0xc00ef22000)
	net/http/server.go:1772 +0x139
panic(0xa0fc00, 0xe91d80)
	runtime/panic.go:973 +0x3e3
github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache.(*Cache).UpdateStats(0x0, 0xc0000516c8)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache/cache.go:224 +0x37
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*indexDB).UpdateMetrics(0xc00b931d00, 0xc02c41acf8)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/index_db.go:258 +0x9f
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*Storage).UpdateMetrics(0xc0000bc7e0, 0xc02c41ac00)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/storage.go:413 +0x4c5
main.registerStorageMetrics.func1(0x0)
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:186 +0xd9
main.registerStorageMetrics.func3(0xc00008c380)
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:196 +0x26
main.registerStorageMetrics.func7(0xc)
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:211 +0x26
github.com/VictoriaMetrics/metrics.(*Gauge).marshalTo(0xc000010148, 0xaa407d, 0x20, 0xb50d60, 0xc005319890)
	github.com/VictoriaMetrics/metrics@v1.11.2/gauge.go:38 +0x3f
github.com/VictoriaMetrics/metrics.(*Set).WritePrometheus(0xc000084300, 0x7fd56809c940, 0xc005319860)
	github.com/VictoriaMetrics/metrics@v1.11.2/set.go:51 +0x1e1
github.com/VictoriaMetrics/metrics.WritePrometheus(0x7fd56809c940, 0xc005319860, 0xa16f01)
	github.com/VictoriaMetrics/metrics@v1.11.2/metrics.go:42 +0x41
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.writePrometheusMetrics(0x7fd56809c940, 0xc005319860)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/metrics.go:16 +0x44
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper(0xb5a120, 0xc005319860, 0xc005018f00, 0xc00002cc90)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:154 +0x58d
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.gzipHandler.func1(0xb5a120, 0xc005319860, 0xc005018f00)
	github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:119 +0x8e
net/http.HandlerFunc.ServeHTTP(0xc00002d110, 0xb5a660, 0xc0044141c0, 0xc005018f00)
	net/http/server.go:2012 +0x44
net/http.serverHandler.ServeHTTP(0xc004414000, 0xb5a660, 0xc0044141c0, 0xc005018f00)
	net/http/server.go:2807 +0xa3
net/http.(*conn).serve(0xc00ef22000, 0xb5bf60, 0xc010532080)
	net/http/server.go:1895 +0x86c
created by net/http.(*Server).Serve
	net/http/server.go:2933 +0x35c
2020-04-02 21:09:55 +03:00
Aliaksandr Valialkin
3b744f3c32 app/vmstorage: typo fix 2020-04-01 23:43:09 +03:00
Aliaksandr Valialkin
f838cdc86e app/vmstorage: add vm_free_disk_space_bytes metric for monitoring the remaining disk space at -storageDataPath 2020-04-01 23:10:44 +03:00
Aliaksandr Valialkin
5270b7a097 app/victoria-metrics/testdata: add a test for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/395 2020-03-31 12:51:46 +03:00
Aliaksandr Valialkin
d450249955 lib/storage: properly handle {label=~"foo|"} filters as Prometheus does
Such filters must match all the time series with `label="foo"` plus all the time series without `label`

Previously only time series with `label="foo"` were matched.
2020-03-30 20:21:47 +03:00
Aliaksandr Valialkin
c6cbc0bd19 app/vmselect/prometheus: allow passing relative time to start, end and time args of /api/v1/* queries 2020-03-29 21:56:52 +03:00
Aliaksandr Valialkin
cb8696699a app/vmselect/prometheus: code simplification: (d.Seconds()/1e3) -> d.Milliseconds() 2020-03-29 21:50:35 +03:00
Aliaksandr Valialkin
ceb6d1459f docs/vmagent.md: add prometheus remote_write proxy use case 2020-03-28 23:17:41 +02:00
Dmitry Naumov
b84071fc25
Rootless docker images by default (#358)
* Rootless docker images by default

* Migrate to rootless base image

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2020-03-27 21:18:32 +02:00
Aliaksandr Valialkin
58cb7fc476 app/vmselect: adjust label_map() handling for corner cases
The following corner cases now supported:
* label_map(q, "label", "", "foo") - adds `label="foo"` to series with missing `label`
* label_map(q, "label", "foo", "") - removes `label="foo"` from series

All the unmatched labels are kept unchanged.
2020-03-13 18:41:52 +02:00
Aliaksandr Valialkin
0e7a71a245 app/vmselect: add label_map(q, label, srcValue1, dstValue1, ... srcValueN, dstValueN) function to MetricsQL
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/369
2020-03-12 19:13:56 +02:00
Aliaksandr Valialkin
50555d89d3 app/vmselect: add -search.maxStalenessInterval for tuning Prometheus data model closer to Influx-style data model 2020-03-11 16:44:03 +02:00
Aliaksandr Valialkin
b46af9678e app/vmagent: mention that vmagent can filter data 2020-03-11 16:23:10 +02:00
Aliaksandr Valialkin
8939c19281 app/vmstorage: return 500 status code instead of 200 status code on internal errors inside /snapshot/* handlers 2020-03-10 23:54:27 +02:00
Aliaksandr Valialkin
f6410ff2bf app/vmselect: add optional max_rows_per_line query arg to /api/v1/export
This arg allows limiting the number of data points that may be exported on a single line.
2020-03-10 21:47:43 +02:00
Aliaksandr Valialkin
2f0a36044c app/{vmagent,vminsert}: add support for importing csv data via /api/v1/import/csv 2020-03-10 21:17:40 +02:00
Aliaksandr Valialkin
3fc6599aa2 app/vmagent: properly apply -remoteWrite.sendTimeout to fasthttp.HostClient 2020-03-09 13:31:22 +02:00
Aliaksandr Valialkin
47e986c26f app/vmagent: properly add labels set via -remoteWrite.label to metrics before sending them to -remoteWrite.url 2020-03-06 19:28:14 +02:00
Aliaksandr Valialkin
0d893eff36 Makefile: add build and test rules with enabled race detector. These rules have -race suffix
Fix also `unsafe pointer conversion` errors detected by Go1.14. See https://golang.org/doc/go1.14#compiler .
2020-03-05 12:05:16 +02:00
Aliaksandr Valialkin
0176fc4206 app/vmagent/README.md: small fixes 2020-03-04 18:15:24 +02:00
Aliaksandr Valialkin
ac03be5a2c app/vmagent/README.md: typo fix 2020-03-04 18:05:43 +02:00
Aliaksandr Valialkin
9354b9177a app/vmagent/README.md: clarification 2020-03-04 18:04:06 +02:00
Aliaksandr Valialkin
4302555228 app/vmagent/README.md: add iot and edge monitoring use case 2020-03-04 18:01:40 +02:00
Aliaksandr Valialkin
ea5904fd76 app/vmagent/README.md: add use cases section 2020-03-04 17:42:57 +02:00
Aliaksandr Valialkin
f01d1bf4a8 app/vmagent: add -remoteWrite.maxDiskUsagePerURL for limiting the maximum disk usage for each -remoteWrite.url buffer
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/352
2020-03-03 19:49:20 +02:00
Aliaksandr Valialkin
808c17e250 app/vmagent/remotewrite: do not reset empty relabelCtx 2020-03-03 15:01:21 +02:00
Aliaksandr Valialkin
af19ca2483 app/vmagent: add -remoteWrite.urlRelabelConfig for applying individual relabeling for each -remoteWrite.url
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/320
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/308
2020-03-03 13:13:06 +02:00
Aliaksandr Valialkin
d23df53ba2 app/vmselect/prometheus: do not add __name__!= filter when searching for all the matching metric names via /api/v1/label/__name__/values with non-empty label filter
This should reduce query time.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/343
2020-02-28 23:36:38 +02:00
Aliaksandr Valialkin
0eed71c7f4 app/vmagent/remotewrite: yet another typo fix 2020-02-28 20:07:00 +02:00
Aliaksandr Valialkin
6cdc97a53f app/vmagent/remotewrite: typo fix 2020-02-28 19:05:11 +02:00
Aliaksandr Valialkin
cc39c9d74b app/vmagent/remotewrite: limit memory usage when big scrape blocks are pushed to remote storage 2020-02-28 18:58:13 +02:00
Aliaksandr Valialkin
45d21d18a8 docs: add a doc for vmagent 2020-02-28 12:23:44 +02:00
Aliaksandr Valialkin
8fa1cd24d8 app/vmselect/prometheus: properly pass filter for labelName=__name__ in labelValuesWithMatches
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/343
2020-02-28 12:17:30 +02:00
Aliaksandr Valialkin
cf9aee4ec3 all: properly split vm_deduplicated_samples_total among cluster components
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/345
2020-02-27 23:47:51 +02:00
Aliaksandr Valialkin
1286cead75 app/vminsert: properly initialize InsertCtx
This should prevent from panic described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/339
2020-02-26 21:21:02 +02:00
Aliaksandr Valialkin
0597f1e39a app/vmagent: allow setting -httpListenAddr to empty string in order to disable listening for http requests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/340
2020-02-26 20:58:26 +02:00
Aliaksandr Valialkin
266101feb4 app/vmagent/README.md: list service discovery mechanisms, which will be added soon
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/334
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/330
2020-02-26 19:28:19 +02:00
Aliaksandr Valialkin
c6c7843e93 app/vmagent: add -remoteWrite.maxBlockSize command-line flag for limiting the maximum size of unpacked block to send to remote storage 2020-02-25 19:58:11 +02:00
Aliaksandr Valialkin
c4194020ef app/vmagent: do not allow sending unpacked requests with sizes exceeding -maxInsertRequestSize 2020-02-25 19:35:43 +02:00
Aliaksandr Valialkin
2471340e0d app/vmagent: add ability to accept Influx line protocol data via TCP and UDP
Just set `-influxListenAddr` command-line flag

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/333
2020-02-25 19:18:01 +02:00
Aliaksandr Valialkin
f96fb93ca5 app/vmagent: add a counter for /targets handler calls 2020-02-25 18:17:25 +02:00
Aliaksandr Valialkin
25c570dae7 app/vmagent/README.md: mention that vmagent exposes target statuses at /targets page 2020-02-25 18:16:08 +02:00
Aliaksandr Valialkin
ca28a3e805 app/vmagent: logo fix 2020-02-25 00:09:55 +02:00
Aliaksandr Valialkin
777a39f7a1 app/vmagent: update docs 2020-02-25 00:09:53 +02:00
Aliaksandr Valialkin
61e67b8922 app/vmagent/README.md: small fixes 2020-02-24 21:26:12 +02:00
Aliaksandr Valialkin
6ca1e58d98 app/vmselect/promql: properly take into account the first datapoint when calculating rollup_candlestick 2020-02-24 13:25:07 +02:00
Aliaksandr Valialkin
b58e3fc8a9 app/vmselect/promql: do not take into account values outside the current window in rollup_candlestick
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/309
2020-02-23 18:06:26 +02:00
Yaroslav
c69d4b01f0 fix rollupOpen(), rollupHigh(), rollupLow() functions (#328) 2020-02-23 18:06:25 +02:00
Aliaksandr Valialkin
7ee7614e90 app/vmagent: initial implementation for vmagent 2020-02-23 17:31:54 +02:00
Aliaksandr Valialkin
f22aefdb16 app/vmselect/promql: log when rollupResult cache is cleared 2020-02-21 20:06:53 +02:00
Aliaksandr Valialkin
d5c2a0ce64 app/vmselect: add -search.cacheTimestampOffset command-line flag
This flag can be used for removing gaps on graphs if the difference between the current time
and the timestamps from the ingested data exceeds 5 minutes.

This is the case when the time between data sources and VictoriaMetrics is improperly synchronized.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/312
2020-02-21 14:02:15 +02:00
Aliaksandr Valialkin
c70822db50 app/vmselect: add /internl/resetRollupResultCache handler for resetting response cache
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/312
2020-02-21 14:02:12 +02:00
Aliaksandr Valialkin
afecb34491 app/vmstorage: limit the maximum error message size before sending it to client
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/315
2020-02-13 17:33:12 +02:00
Aliaksandr Valialkin
846d7fa7e9 app/vmselect: add sort_by_label(q, label) and sort_by_label_desc(q, label) functions
This is implementation of https://github.com/prometheus/prometheus/pull/1533 for VictoriaMetrics.
2020-02-13 17:01:50 +02:00
Aliaksandr Valialkin
347aaba79d lib/{storage,mergeset}: use time.Ticker instead of time.Timer where appropriate
It has been appeared that time.Timer was used in places where time.Ticker must be used instead.
This could result in blocked goroutines as in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/316 .
2020-02-13 13:21:48 +02:00
Aliaksandr Valialkin
6e0013ca39
app/vmselect/prometheus: typo fix in -latencyOffset description 2020-02-12 14:00:38 +02:00
Edouard Hur
e8f92a4ee8
Cluster - prometheus metrics fix (#314)
* add missing '/{}' in prom query range requests

* fix missing leading '/' on prom lavelValuesErrors path
2020-02-10 22:15:21 +02:00
Aliaksandr Valialkin
1010a57882 all: allow setting flags via environment vars
Now flags can be set via environment vars with the same names as flags.
Command-line flags override flags set via env vars.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/311
2020-02-10 13:31:21 +02:00
Aliaksandr Valialkin
ea66212c93 lib/storage: move -dedup.minScrapeInterval flag outside lib/storage, so it doesnt show up in vminsert in cluster version 2020-02-10 13:07:25 +02:00
Aliaksandr Valialkin
e6d9ea3094 app/vmselect/promql: do not add step to range end, since this hack became obsolete since commit 9e1119dab8 2020-02-05 21:23:44 +02:00
Aliaksandr Valialkin
4a1de7fee9 app/vmselect/promql: properly adjust time range for data to select
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/309
2020-02-05 21:23:43 +02:00
Aliaksandr Valialkin
8e77b54846 app/vmselect: unconditionally offset -step to rollup_candlestick. This makes results more consistent 2020-02-04 23:31:47 +02:00
Aliaksandr Valialkin
ce38b176bc app/vmselect/promql: automatically apply offset -step to rollup_candlestick function in order to obtain the expected OHLC results
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/309
2020-02-04 23:24:04 +02:00
Aliaksandr Valialkin
4f7116d1ee app/vmselect/promql: adjust rollup_candlestick calculations to the exepcted results
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/309
2020-02-04 22:43:28 +02:00
Aliaksandr Valialkin
ccd3aa4f15 app/vmselect: take into account the time the requests wait in the queue if -search.maxConcurrentRequests is exceeded
This will prevent from excess CPU usage for timed out queries.
2020-02-04 16:20:48 +02:00
Aliaksandr Valialkin
e6bf88a4d4 app/vmselect: add a placeholder for /api/v1/metadata, which could be requested by Grafana
See https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata

VictoriaMetrics doesn't collect any metadata for metrics, so just return empty response.
2020-02-04 15:56:01 +02:00
Aliaksandr Valialkin
7cde594696 all: do not clash flag description with back-quoted flag types
See https://golang.org/pkg/flag/#PrintDefaults for more details.
2020-02-04 15:56:01 +02:00
Aliaksandr Valialkin
45bc6c62f2 app/vmselect/promql: adjust and and unless binary operator handling to be consistent with Prometheus 2020-01-31 18:52:51 +02:00
Aliaksandr Valialkin
e3adc095bd all: add -dedup.minScrapeInterval command-line flag for data de-duplication
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/86
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/278
2020-01-31 01:18:54 +02:00
Aliaksandr Valialkin
cb5c39ee70 lib/fs: optimize small reads for ReaderAt.MustReadAt by reading from memory-mapped space instead of reading from file descriptor
This should improve performance when reading many small blocks.
2020-01-30 15:16:16 +02:00
Aliaksandr Valialkin
4ed5e9a7ce lib/storage: pre-fetch metricNames for the found metricIDs in Search.Init
This should speed up Search.NextMetricBlock loop for big number of found time series.
2020-01-30 15:16:16 +02:00
Aliaksandr Valialkin
170c1c3a4e app/vmselect/promql: add keep_next_value(q) for filling gaps with the next non-empty value 2020-01-29 00:48:14 +02:00
Aliaksandr Valialkin
a9c1d5b351 app/vminsert: moved -maxInsertRequestSize command-line flag out of lib/prompb in order to prevent its inclusion in vmselect and vmstorage apps 2020-01-28 22:53:50 +02:00
Aliaksandr Valialkin
b28c9a3944 app/vmselect/promql: return expected results from increase() over the beginning of time series, which start from big value
Examples for such counters: OS-level counters for network or cpu stats.
2020-01-28 16:31:05 +02:00
Aliaksandr Valialkin
3e304890a6 app/vmselect/promql: fix panic on a single zero vmrange bucket in prometheus_buckets() function
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/296
2020-01-27 18:05:12 +02:00
Aliaksandr Valialkin
4d70a81e18 app/vminsert: do not drop pending rows if all the vmstorage backends are unavailable
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/294
2020-01-24 22:10:10 +02:00
Aliaksandr Valialkin
0cda6afa8e app/vminsert: move ingestion protocol parsers to lib/protoparser, so they could be re-used in the upcoming vmagent 2020-01-24 16:55:18 +02:00
Aliaksandr Valialkin
ea53a21b02 all: consistently log durations in seconds with millisecond precision
This should improve logs readability
2020-01-22 18:35:24 +02:00
Aliaksandr Valialkin
e1a264173a app/vmselect: mention the original query and time range in error messages
This should simplify debugging invalid or heavy queries.
2020-01-22 17:34:35 +02:00
Aliaksandr Valialkin
e127173984 app/vmselect: mention command-line flag, which could be used for adjusting query timeouts, in timeout errors 2020-01-22 15:53:42 +02:00