Commit graph

1143 commits

Author SHA1 Message Date
Aliaksandr Valialkin
e3f61d540b lib/promscrape: limit scrape_timeout by scrape_interval like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1281
2021-05-13 16:10:42 +03:00
匠心零度
d5285ecaf0 fix vagent imbalance problem (#1292)
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=2 -promscrape.config=/path/to/config.yml ...

Co-authored-by: lirenzuo <lirenzuo@shein.com>
2021-05-13 11:19:30 +03:00
Aliaksandr Valialkin
f13585dc5d vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.14 to v1.0.15
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:47:09 +03:00
Aliaksandr Valialkin
d13906bf1f lib/promscrape: exponentially increase retry interval on unsuccesful requests to scrape targets or to service discovery services
This should reduce CPU load at vmagent and at remote side when the remote side doesn't accept HTTP requests.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:47:07 +03:00
Aliaksandr Valialkin
66c6976723 lib/cgroup: document the ability to detect cgroup v2 memory and cpu limits. This is follow-up for b50024812e 2021-05-13 09:27:35 +03:00
Nikolay
8743bf541f adds cgroupsv2 support (#1283)
* adds cgroupv2 limits support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1269

* small fix

* changes Atoi to ParseUint
2021-05-13 09:27:33 +03:00
Aliaksandr Valialkin
2839055513 lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss 2021-05-13 09:01:05 +03:00
Aliaksandr Valialkin
008ae25b3a lib/storage: merge getTSDBStatusForDate with getTSDBStatusWithFiltersForDate
These functions are non-trivial, while their code has minimal differences.
It is better from maintainability PoV to merge these functions into a single function.
2021-05-12 18:01:08 +03:00
Nikolay
be87be34a4 Adds tsdb match filters (#1282)
* init work on filters

* init propose for status filters

* fixes tsdb status
adds test

* fix bug

* removes checks from test
2021-05-12 17:16:58 +03:00
Aliaksandr Valialkin
027607db3e lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices scrape targets every 5 seconds, since they may depend on changed service and pod objects
This should make endpoints and endpointslices scrape targets eventually consistent with the maximum delay of 5 seconds after the related service or pod object changes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-12 14:12:43 +03:00
Aliaksandr Valialkin
1d32b008c6 lib/httpserver: add new X-Server-Hostname header instead of overwriting already exsiting header
This makes possible tracking origins of chained requests over multiple hops.
2021-05-11 23:47:19 +03:00
Aliaksandr Valialkin
f1317f7c6c lib/httpserver: return X-Server-Hostname http header in all the responses for better debuggability 2021-05-11 22:04:41 +03:00
Aliaksandr Valialkin
4e59cf4380 lib/storage: properly apply time range when matching an empty filter
It must match all the time series on the given time range.
Previously it was matched to all the time series without the restriction on the given time range.
2021-05-11 01:09:35 +03:00
Aliaksandr Valialkin
326cf83eb4 lib/storage: remove dead code after the commit 3ccf7ea20c 2021-05-08 20:15:59 +03:00
Aliaksandr Valialkin
9c505d27dd lib/ingestserver: properly close incoming connections during graceful shutdown 2021-05-08 19:53:45 +03:00
Aliaksandr Valialkin
4a5f45c77e app/vminsert: add support for data ingestion via other vminsert nodes 2021-05-08 19:53:45 +03:00
Aliaksandr Valialkin
e6c19cb09d lib/promscrape/discovery/kubernetes: start watchers for pods and services before starting watchers for endpoints
This should eliminate possible race when an update on endpoints depends on pods and/or services, which are missing in the cache yet.
This could result in missing targets based on endpoints or endpointslices.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-05 12:23:16 +03:00
Aliaksandr Valialkin
43c52ff77a lib/storage: use WARNING instead of INFO level for logging dropped labels 2021-05-03 13:57:28 +03:00
Aliaksandr Valialkin
ec6becd3f5 lib/httpserver: stop the process on panics in request handlers
Panics may leave the process in inconsistent state. That's why it is better to stop the process after the panic
instead of recovering from the panic. Unfortunately, the standard net/http.Server recovers panics in request handlers.
See https://github.com/golang/go/issues/16542 . That's lib/httpserver must stop the process on itself after the panic.
2021-05-03 12:00:44 +03:00
Nikolay
62d58324dd adds stalePartsRemover (#1261)
for new created partitions
2021-05-03 11:34:33 +03:00
Aliaksandr Valialkin
60ffbcbb99 lib/promrelabel: add tests for removing the specified {label="value"} pair 2021-05-03 11:26:58 +03:00
Aliaksandr Valialkin
b43ba6d85f lib/storage: log dropped labels if the number of labels in a metric exceeds -maxLabelsPerTimeseries command-line flag value
This should improve debuggability for this case.
2021-05-01 09:29:56 +03:00
Aliaksandr Valialkin
8be1cb297b app/vmagent: list user-visible endpoints at http://vmagent:8429/
While at it, use common WriteAPIHelp function for the listing in vmagent, vmalert and victoria-metrics
2021-04-30 09:38:23 +03:00
Aliaksandr Valialkin
421a92983a lib/promscrape/discovery/kubernetes: remove a mutex at urlWatcher - use groupWatcher mutex for accessing all the urlWatcher children
This simplifies the code a bit and reduces the probability of improper mutex handling and deadlocks.
2021-04-29 10:17:45 +03:00
Nikolay
535b3ff618 vmagent kubernetes_sd tests (#1253)
* first part of tests for kubernetes sd

* makes linter happy

* added more test cases

* adds pub/sub for tests
2021-04-29 10:17:45 +03:00
Aliaksandr Valialkin
e37e1b1e34 lib/{storage,mergeset}: fix unaligned 64-bit atomic operation panic for 32-bit architectures
The panic has been introduced in 56b6b893ce
2021-04-27 16:42:19 +03:00
Aliaksandr Valialkin
2d1d60118d lib/mergeset: split rows ingestion among multiple shards
This improves rows ingestion on systems with many CPU cores by reducing lock contention.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244

Thanks to @waldoweng for the original idea and draft implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1243
2021-04-27 15:45:11 +03:00
Aliaksandr Valialkin
b3da457629 lib/promscrape/discovery/kubernetes: fix a deadlock introduced in eddba29664
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240

Thanks to @f41gh7 for providing the initial idea for deadlock fix at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1248
2021-04-27 14:59:56 +03:00
Aliaksandr Valialkin
cba2d13456 lib/storage: typo fix in info message when deleting the part outside the configured retention
Previously the message was displaying incorrect retention time
2021-04-27 13:33:36 +03:00
Aliaksandr Valialkin
f14412321b lib/persistentqueue: eliminate possible data race when obtaining vm_persistentqueue_bytes_pending metric value 2021-04-27 00:26:32 +03:00
Aliaksandr Valialkin
320983f650 lib/promscrape: apply scrape_timeout on receiving the first response byte for stream_parse: true scrape targets
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017#issuecomment-767235047
2021-04-23 22:05:00 +03:00
Aliaksandr Valialkin
34321e5f8d lib/promscrape/discovery/kubernetes: refresh role: endpoints targets on service object removal as Prometheus does
This is a follow-up for ae37cfd528

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:27:29 +03:00
Aliaksandr Valialkin
db27dbab5e lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices targets on service object update like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:12:22 +03:00
Aliaksandr Valialkin
ab8008d6d7 lib/{storage,mergeset}: remove empty directories on startup. Such directories can be left after unclean shutdown on NFS storage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1142
2021-04-22 13:03:29 +03:00
Aliaksandr Valialkin
6dc5d3b357 all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com 2021-04-20 20:20:01 +03:00
Aliaksandr Valialkin
1088113110 lib/uint64set: remove memory allocation in getOrCreateSmallPool()
This partially reverts fb82c4b9fa

It has been appeared that the additional memory allocation may result in higher GC pauses.
It is better to spend CPU time on copying bigger bucket16 structs instead of increasing query latencies due to higher GC pauses
2021-04-14 12:31:53 +03:00
Aliaksandr Valialkin
72c41323fa lib/storage: code clarification: remove caching the found metricName in searchMetricName 2021-04-13 10:20:35 +03:00
Artem Navoiev
c3dcfdef8c improve docs for cli flags (#1202)
* improve docs for cli flags

* improve docs for cli flags.2
2021-04-12 12:28:36 +03:00
Aliaksandr Valialkin
3b1f0cb3f6 lib/promscrape: create a single swosFunc per scrape_config 2021-04-08 09:31:38 +03:00
Aliaksandr Valialkin
276dbc2133 lib/promscrape: do not spend CPU time on constructing scrapeWork key if clustering is disabled 2021-04-07 21:55:06 +03:00
Aliaksandr Valialkin
59ccc43e3a lib/storage: properly handle big time ranges passed to /api/v1/labels and /api/v1/label/<labelName>/values
It should be faster querying all the labels and/or all the values instead of querying per-day labels/values on time ranges exceeding maxDaysForPerDaySearch
2021-04-07 13:33:10 +03:00
Aliaksandr Valialkin
02b83e0957 lib/promscrape/discovery: remove superflouos check in registerPendingAPIWatchers
The check `_, ok := uw.aws[aw]; !ok` isn't needed, since aw cannot exist in uw.aws
because of the check inside subscribeAPIWatcher
2021-04-07 13:10:04 +03:00
Aliaksandr Valialkin
db56ee0e28 lib/promscrape/discovery/kubernetes: register pending apiWatchers in uw.aws 2021-04-06 11:11:53 +03:00
Aliaksandr Valialkin
edd66b7e82 lib/promscrape/discovery/kubernetes: remove superflouos mustStart and mustStop functions 2021-04-05 22:43:49 +03:00
Lu Jiajing
4ee6def68b fix access to nil *url.URL (#1180)
* fix access to nil *url.URL

Signed-off-by: Megrez Lu <lujiajing1126@gmail.com>

* Update lib/promscrape/discovery/kubernetes/api_watcher.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-05 22:26:43 +03:00
Aliaksandr Valialkin
7eca60694e lib/promscrape/discovery/kubernetes: reduce CPU time spent on registering big number of Kubernetes objects shared among big number of scrape jobs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 22:05:02 +03:00
Aliaksandr Valialkin
9da2ef3d8f lib/promscrape/discovery/kubernetes: load objects missing in local cache from api seriver in getObjectByRole()
This should fix possible race for `role: endpoints` and `role: endpointslices` service discovery,
when the referred `pod` and `service` objects aren't propagated to urlWatcher cache yet.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182#issuecomment-813353359 for details.
2021-04-05 20:31:22 +03:00
Aliaksandr Valialkin
4d6a3aba4d lib/persistentqueue: delete corrupted persistent queue instead of throwing a fatal error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1030
2021-04-05 19:26:51 +03:00
Aliaksandr Valialkin
fe084fdd33 lib/promscrape/discovery/kubernetes: synchronously load Kubernetes objects on first access
Remove async registration of apiWatchers, since it breaks discovering `role: endpoints` and `role: endpointslices` targets,
which depend on pod and service objects.

There is no need in reloading `endpoints` and `endpointslices` targets if the referenced `pod` or `service` objects change,
since in this case the corresponding `endpoints` and `endpointslices` objects should also change because they contain
ResourceVersion of the referenced `pod` or `service` objects, which is modified on object update.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 14:37:07 +03:00
Aliaksandr Valialkin
c8b7c32e23 lib/proxy: typo fix after a5c5b54c22 2021-04-05 14:37:06 +03:00
Aliaksandr Valialkin
e0b687171b lib/proxy: add support for socks5 over tls proxy 2021-04-05 13:00:42 +03:00
Aliaksandr Valialkin
90da332ea0 lib/promscrape: pass X-Prometheus-Scrape-Timeout-Seconds header to scrape targets as Prometheus does 2021-04-05 12:15:14 +03:00
Nikolay
d0b664454b adds socks5 support for fasthttp client (#1178)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1177

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-04 01:43:59 +03:00
Aliaksandr Valialkin
dd19fab7c9 lib/promscrape: properly send full url in GET request via simple HTTP proxy
This is a follow-up for a0ae0f86666a75ec57b45eab2429da7ab4a7b250

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 01:20:43 +03:00
Aliaksandr Valialkin
ab9e1eb41f lib/promscrape: support for simple HTTP proxies without CONNECT method support such as https://github.com/prometheus-community/PushProx
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 00:40:58 +03:00
Aliaksandr Valialkin
27d8a5d2c0 lib/promscrape: add tests for authorization config, which has been added in df148f48b7 2021-04-03 22:14:03 +03:00
Aliaksandr Valialkin
ae5c20a34c lib/proxy: log response body on non-200 response code
This should improve debuggability for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-03 03:03:41 +03:00
Aliaksandr Valialkin
87700f1259 lib/promscrape: add support for authorization config in -promscrape.config as Prometheus 2.26 does
See https://github.com/prometheus/prometheus/pull/8512
2021-04-02 21:20:37 +03:00
Aliaksandr Valialkin
2825a1e86d lib/promscrape: add follow_redirect option to scrape_configs section like Prometheus does
See https://github.com/prometheus/prometheus/pull/8546
2021-04-02 21:20:37 +03:00
Aliaksandr Valialkin
245eba8896 lib/promscrape/discovery/kubernetes: properly track objects with the same names in multiple namespaces
This is a follow-up for 12e4785fe8

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:46:34 +03:00
Aliaksandr Valialkin
eee860f83d lib/promscrape/discovery/kubernetes: properly discover targets in multiple namespaces
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:29:24 +03:00
Nikolay
00157f3ec6 Adds aws ECS credentials support (#1175) 2021-04-02 13:11:03 +03:00
Aliaksandr Valialkin
512addc608 app/{vminsert,vmagent}: add -sortLabels command-line option for sorting time series labels before ingesting them in the storage
This option can be useful when samples for the same time series are ingested with distinct order of labels.
For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.
2021-03-31 23:27:21 +03:00
Aliaksandr Valialkin
ae1c653d55 lib/storage: reduce memory usage when ingesting samples for the same time series with distinct order of labels 2021-03-31 21:22:40 +03:00
Aliaksandr Valialkin
887e67f13b lib/uint64set: improve Set.Has() performance scalability on multi-CPU system
Do not update bucket32.hint on Set.Has() call, since it leads to memory ping-pong between CPU cores multi-CPU system
2021-03-29 12:34:19 +03:00
Aliaksandr Valialkin
940a547116 lib/storage: do not update b.nextIdx if no samples are removed because of retention 2021-03-29 12:13:38 +03:00
Aliaksandr Valialkin
7d87d42a91 lib/promscrape/discovery/kubernetes: typo fix in error message 2021-03-26 12:46:33 +02:00
Aliaksandr Valialkin
a920e71809 lib/promscrape/discovery/kubernetes: properly handle too old resource version error message from Kubernetes watch API 2021-03-26 12:28:35 +02:00
Aliaksandr Valialkin
9c2be144cf app/vmselect: log the metric which trigger rollup result cache reset
This should help finding the source of stale metrics
2021-03-25 21:32:28 +02:00
Aliaksandr Valialkin
f971fe86cd lib/storage: tune loopsCountPerMetricNameMatch according to production workload 2021-03-25 13:48:17 +02:00
Aliaksandr Valialkin
6b1f807418 app/vmagent: add -promscrape.consul.waitTime command-line flag for configuring Consul service discovery wait time
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1144
2021-03-23 19:34:12 +02:00
Aliaksandr Valialkin
9947c65df3 lib/storage: do not reload metricName for the same metricID in Search.NextMetricBlock
This should speed up Search.NextMetricBlock a bit
2021-03-23 17:59:34 +02:00
Nikolay
19a40faf8e changes consul_service label value (#1143)
according to prometheus discovery.
 It should mitigate issue with case sensetive services
https://github.com/hashicorp/consul/issues/5707
2021-03-23 15:37:06 +02:00
Aliaksandr Valialkin
12ca0efc19 lib/storage: respect the deadline passed to Storage.SearchMetricNames 2021-03-22 23:03:00 +02:00
Aliaksandr Valialkin
40e47935e7 lib/storage: improve Search.NextMetricBlock performance by using MetricID->MetricName cache 2021-03-22 23:02:59 +02:00
Aliaksandr Valialkin
1618b0ca6d lib/storage: tune loopsCountPerMetricNameMatch 2021-03-22 12:57:54 +02:00
Aliaksandr Valialkin
7503111feb lib/storage: small code simplification after 6cee5338b2 2021-03-18 15:22:39 +02:00
Aliaksandr Valialkin
4443254fb9 lib/storage: prevent from infinite loop if {__graphite__="..."} filter matches a metric name with *, [ or { chars
The idea has been borrowed from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1137
2021-03-18 14:57:39 +02:00
Aliaksandr Valialkin
e03233f441 lib/fs: reduce the frequency of failed to remove directory ... due to NFS lock log warnings
Log `failed to remove directory ... due to NFS lock` warning only if the directory cannot be removed in one second.
2021-03-18 13:23:43 +02:00
Aliaksandr Valialkin
b859fe7879 lib/storage: faster move heavy filters to the end of list 2021-03-17 15:11:56 +02:00
Aliaksandr Valialkin
5e77a939c2 all: make go vet happy 2021-03-17 00:48:44 +02:00
Aliaksandr Valialkin
41fe707bec lib/storage: limit loops count in order to reduce max CPU usage during filter search 2021-03-17 00:48:44 +02:00
Aliaksandr Valialkin
e2a0c8bd72 lib/storage: do not modify filterLoopsCount stats with loopsCount stats
Such a modification can result in incorrect filter sorting later
2021-03-17 00:48:44 +02:00
Aliaksandr Valialkin
b997f4a418 all: make golangci-lint happy after the commit 6378205415 2021-03-17 00:24:31 +02:00
Aliaksandr Valialkin
8005ba26b9 lib/netutil: enable IPv6 UDP listening if -enableTCP6 command-line flag is passed to VictoriaMetrics
This is a follow-up for 18cfc4be7b

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1131
2021-03-17 00:19:30 +02:00
Nikolay
b293f81eb8 Adds udp6 support for ingest servers (#1134)
with flag -enableUDP6  https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1131
2021-03-17 00:19:29 +02:00
Aliaksandr Valialkin
1cfe3f872f lib/uint64set: optimize bucket16.addMulti a bit 2021-03-16 21:08:38 +02:00
Aliaksandr Valialkin
727ded9d4e lib/storage: time series search optimization according to production workload profiling
Do not pass filter metric ids to getMetricIDsForTagFilter, since it has been appeared that this slows down
the function by multiple times when it finds big number of metricIDs (tens of millions).
2021-03-16 20:08:43 +02:00
Aliaksandr Valialkin
f4a44d6c0d lib/storage: further tuning for time series search 2021-03-16 18:47:29 +02:00
Aliaksandr Valialkin
d074326970 app/vmstorage: add -logNewSeries command-line flag for determining the source of series churn rate 2021-03-15 22:40:28 +02:00
Aliaksandr Valialkin
c4d0c20de2 lib/influxutils: return response compatible with InfluxDB 1.8.4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124
2021-03-15 22:20:38 +02:00
f41gh7
b7c9f3676e typo fix 2021-03-15 22:20:37 +02:00
Aliaksandr Valialkin
e2717d84c0 all: various fixes in command-line flag descriptions 2021-03-15 22:03:49 +02:00
Aliaksandr Valialkin
776b8b32ca app/{vminsert,vmagent}: a follow-up for b1aa8c3d8f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124
2021-03-15 22:03:49 +02:00
Aliaksandr Valialkin
fc902734d9 lib/storage: further tuning for time series selector code 2021-03-15 20:32:37 +02:00
Aliaksandr Valialkin
dc70e7de76 lib/uint64set: reduce the size of bucket16 by storing smallPool by pointer.
This reduces CPU time spent on bucket16 copying inside bucket13.addBucketAtPos.
2021-03-15 17:27:03 +02:00
Aliaksandr Valialkin
cc6b1a5941 lib/uint64set: optimize Set.AddMulti for large sorted sets 2021-03-15 17:10:54 +02:00
Aliaksandr Valialkin
f023e92f1a lib/uint64set: optimize bucket16.add and bucket16.addMulti a bit 2021-03-15 16:58:24 +02:00
Aliaksandr Valialkin
1c26020080 lib/storage: tune per-day index search 2021-03-15 13:36:36 +02:00
Aliaksandr Valialkin
7f52aae20c lib/promscrape: an attempt to reduce memory usage when vmagent scrapes targets with varying number of metrics
Do not cache too big byte buffers and too big writeRequestCtx objects,
since it is cheaper to re-create them instead of wasting RAM for their caching.

This reverts 7f6f350ee1
2021-03-15 11:49:29 +02:00
Aliaksandr Valialkin
33cd6c26d3 lib/promscrape: return back the logic for flushing big buffers to storage from the commit 3fd8653b40
This should reduce memory usage when vmagent scrapes targets with big number of metrics and `-promscrape.streamParse` isn't enabled
2021-03-14 22:25:37 +02:00
Aliaksandr Valialkin
894246176f lib/promscrape/discovery/kubernetes: do not start object watcher until initial objects are loaded 2021-03-14 21:56:16 +02:00
Aliaksandr Valialkin
9e55db4a53 lib/promscrape: retry service discovery in a few seconds if it starts returning 0 targets
This should reduce recovery time from temporary issues during service discovery
2021-03-14 21:56:16 +02:00
Aliaksandr Valialkin
3b46ae1c05 lib/promscrape: remove duplicate target word in error message 2021-03-14 21:56:16 +02:00
Aliaksandr Valialkin
b0b28eeb93 lib/promscrape/discovery/kubernetes: further optimize kubernetes service discovery for the case with many scrape jobs
Do not re-calculate labels per each scrape job - reuse them instead for scrape jobs with identical Kubernetes role
2021-03-14 21:16:41 +02:00
Aliaksandr Valialkin
620f05cd2c lib/promscrape/discovery: fixes after 133b288681
- Removed a deadlock in addAPIWatcher
- Do not create unused ScrapeWork objects
- Do not spend CPU resources on creating objectByKey map in addAPIWatcher

This work is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1125
2021-03-13 15:22:38 +02:00
Aliaksandr Valialkin
54f902467d lib/proxy: there is no need in cloning tlsCfg, which has been created two lines above 2021-03-12 10:48:01 +02:00
Aliaksandr Valialkin
72a8fa484b lib/proxy: set proxy address in tls.Config.ServerName instead of the target address
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-12 10:41:25 +02:00
Aliaksandr Valialkin
60e0280a94 lib/promscrape: add ability to configure proxy options via proxy_tls_config, proxy_basic_auth, proxy_bearer_token and proxy_bearer_token_file
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-12 03:36:11 +02:00
Aliaksandr Valialkin
b2732575f7 lib/storage: further tune filters sorting logic 2021-03-12 00:51:35 +02:00
Aliaksandr Valialkin
8fc29ffc67 lib/promscrape/discovery/kubernetes: use a single watcher per apiURL
Previously multiple scrape jobs could create multiple watchers for the same apiURL. Now only a single watcher is used.
This should reduce load on Kubernetes API server when many scrape job configs use Kubernetes service discovery.
2021-03-11 17:04:14 +02:00
Aliaksandr Valialkin
8b8d4cbcfe lib/proxy: do not show inline basic auth passwords when logging errors related to proxy_url 2021-03-11 13:44:14 +02:00
Aliaksandr Valialkin
41f641b132 lib/promscrape/discovery/kubernetes: localize Bookmark parsing code
This is a follow-up for e772d1c920
2021-03-11 13:08:56 +02:00
Aliaksandr Valialkin
6c9cd3f7c1 lib/promscrape/discovery/kubernetes: reduce load on Kubernetes API server by using watch bookmarks
This allows continuing object watch from the last bookbark instead of reloading all the objects
on watch errors or timeouts.

See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
2021-03-10 15:08:40 +02:00
Aliaksandr Valialkin
bd8b7a88a7 lib/httpserver: export vm_available_memory_bytes and vm_available_cpu_cores metrics
These metrics are useful for tracking the available memory and CPU cores for VictoriaMetrics apps.
2021-03-10 12:08:26 +02:00
Aliaksandr Valialkin
e15f3f4f2a lib/proxy: pass proxy hostname in Host header of the CONNECT request
This should resolve the following issue when connecting to tls proxy:

  cannot validate certificate for ... because it doesn't contain any IP SANs
2021-03-09 20:41:18 +02:00
Aliaksandr Valialkin
9d8223eafb lib/proxy: set missing ServerName in TLS config for proxy_url.
While at it, allow setting Proxy-Authorization for `proxy_url` via `basic_auth` and `bearer_token` configs.
2021-03-09 19:01:14 +02:00
Nikolay
1310f84122 Changes tlsConfig init for proxy connections (#1121)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-09 19:01:13 +02:00
Aliaksandr Valialkin
0554430d7e lib/promscrape: apply sample_limit after metric relabeling is applied as Prometheus does
See the description for `sample_limit` option from Prometheus docs:

Per-scrape limit on number of scraped samples that will be accepted.
If more than this number of samples are present after metric relabeling
the entire scrape will be treated as failed. 0 means no limit.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
2021-03-09 15:52:41 +02:00
Aliaksandr Valialkin
7b66c8cbf8 lib/promscrape/discovery/kubernetes: remove too verbose logs about starting and stopping the watchers
Log the number of objects loaded per each watch url

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-09 15:07:12 +02:00
John Belmonte
edf39aa225 spelling fix: adjacent (#1115) 2021-03-09 09:19:16 +02:00
Aliaksandr Valialkin
502fab797a lib/promscrape: add scrape_offset option to scrape_config
This option can be used for specifying the particular offset per each scrape interval for target scraping
2021-03-08 11:59:32 +02:00
Aliaksandr Valialkin
c4a0bd5eac lib/storage: go fmt 2021-03-08 11:59:31 +02:00
Aliaksandr Valialkin
c76a904bb0 lib/storage: tune loopsCount estimations in getMetricIDsForTagFilterSlow
The adjusted estmations give up to 2x lower median response times on 200qps /api/v1/query_range workload
2021-03-07 21:17:48 +02:00
Aliaksandr Valialkin
c04505e585 lib/promscrape/discovery/kubernetes: reduce memory usage further when big number of scrape jobs are configured for the same kubernetes_sd_config role
Serialize reloading per-role objects, so they don't occupy too much memory when objects for many scrape jobs are simultaneously refreshed.
Do not reload per-role objects if they were already refreshed by concurrent goroutines. This should reduce load on Kubernetes API server
when big number of scrape jobs are configured for the same Kubernetes role.

This is a follow-up for 17b87725ed

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-07 20:03:22 +02:00
Aliaksandr Valialkin
175466bb41 lib/decimal: prevent exponent overflow when processing values close to zero
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1114
2021-03-05 18:53:41 +02:00
Aliaksandr Valialkin
5807ff57f3 lib/promscrape/discovery/kubernetes: reduce memory usage when Kubernetes service discovery is configured on a big number of scrape jobs
Previously vmagent was creating a separate Kubernetes object cache per each scrape job.
This could result in increased memory usage when monitoring a Kubernetes cluster with big number of objects (pods / nodes / services, etc.)
as seen at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113

Now it uses a shared map of scrape objects across multiple scrape jobs.
2021-03-05 17:32:33 +02:00
Aliaksandr Valialkin
92ddb8f197 lib/promscrape/discovery/kubernetes: move apiWatcher code to a separate file 2021-03-05 17:32:32 +02:00
Aliaksandr Valialkin
02c0959380 lib/promscrape: make cluster membership calculations consistent across 32-bit and 64-bit architectures 2021-03-05 09:06:08 +02:00
Aliaksandr Valialkin
133fb9fc00 lib/promscrape: add -promscrape.cluster.replicationFactor command-line flag for replicating scrape targets among vmagent instances in the cluster 2021-03-04 10:21:27 +02:00
Aliaksandr Valialkin
bae7a1b47a lib/promscrape/discovery/kubernetes: fix tests after e154f4a644 2021-03-03 22:42:04 +02:00
Nikolay
7d92ef3acd Fix ingress discovery api (#1110) 2021-03-03 10:45:50 +02:00
Aliaksandr Valialkin
25f453ce1a lib/promscrape/discovery/kubernetes: properly check for nil pointer inside interface
See https://mangatmodi.medium.com/go-check-nil-interface-the-right-way-d142776edef1

This fixes a panic when the ScrapeWork is filtered out in swcFunc.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1108
2021-03-03 10:42:54 +02:00
Aliaksandr Valialkin
fea27b9845 lib/promscrape: go fmt 2021-03-02 21:20:23 +02:00
Aliaksandr Valialkin
c67a07b469 lib/handshake: log read/write operation duration on connection errors
This improve debuggability of network errors
2021-03-02 21:20:20 +02:00
Aliaksandr Valialkin
c8dde1fd6b lib/storage: typo fix: umarshal -> unmarshal 2021-03-02 20:48:44 +02:00
Aliaksandr Valialkin
3fbe2bf1c8 lib/promscrape: pre-allocate space for a map in mergeLabels
This should reduce the number of memory allocations when discovering big number of targets
2021-03-02 18:42:44 +02:00
Aliaksandr Valialkin
ac5c47a9f5 lib/promscrape/discovery: properly track vm_promscrape_discovery_kubernetes_objects_removed_total metric 2021-03-02 18:33:29 +02:00
Aliaksandr Valialkin
62d7e07ff7 lib/promrelabel: remove unneded optimizations for labeldrop and labelkeep actions
These optimizations may slow down code execution by matching the same label against regexp two times instead of a single time
2021-03-02 18:01:08 +02:00
Aliaksandr Valialkin
f9c1fe3852 lib/promscrape/discovery/kubernetes: cache ScrapeWork objects as soon as the corresponding k8s objects are changed
This should reduce CPU usage and memory usage when Kubernetes contains tens of thousands of objects
2021-03-02 16:44:19 +02:00
Aliaksandr Valialkin
f686174329 lib/promscrape/discovery/ec2: follow-up after f6114345de 2021-03-02 13:47:35 +02:00
Nikolay
fd8ca7df50 Adds webIndentity token for aws (#1099)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1080
2021-03-02 13:47:34 +02:00
Aliaksandr Valialkin
e45c399467 lib/protoparser/prometheus: properly unescape label values in Prometheus exposition format
Unescape only `\n`, `\"` and `\\` sequences as Prometheus does. Other escape sequences shouldn't be unescaped.
2021-03-02 13:22:10 +02:00
Aliaksandr Valialkin
f4969a624d lib/protoparser/graphite: fix parsing of a Graphite line with empty tags such as foo; 1 2
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1100
2021-03-01 17:17:01 +02:00
Aliaksandr Valialkin
b89a4fac2f lib/promscrape/discovery/kubernetes: deflake tests; a follow-up for 05fb08713c 2021-03-01 14:31:44 +02:00
Aliaksandr Valialkin
c3bf72992f lib/promscrape: explicitly stop and cleanup service discovery routines when new config is read from -promscrape.config
This should reduce memory usage when `-promscrape.config` file frequently changes
2021-03-01 14:15:16 +02:00
Aliaksandr Valialkin
8af9370bf2 lib/promscrape: use target arg in ScrapeWork cache 2021-03-01 12:31:11 +02:00
Aliaksandr Valialkin
5328506c3e lib/promscrape: typo fix, which prevented from caching ScrapeWork entries 2021-03-01 12:13:13 +02:00
Aliaksandr Valialkin
d5058caccc lib/promscrape: add vm_promscrape_scrapework_cache_* metrics for tracking ScrapeWork cache effectiveness 2021-03-01 12:05:58 +02:00
Aliaksandr Valialkin
ee5d26a546 lib/httpserver: make make errcheck happy after the commit 9fc7726d84 2021-03-01 00:35:30 +02:00