Commit graph

324 commits

Author SHA1 Message Date
Aliaksandr Valialkin
320983f650 lib/promscrape: apply scrape_timeout on receiving the first response byte for stream_parse: true scrape targets
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017#issuecomment-767235047
2021-04-23 22:05:00 +03:00
Aliaksandr Valialkin
34321e5f8d lib/promscrape/discovery/kubernetes: refresh role: endpoints targets on service object removal as Prometheus does
This is a follow-up for ae37cfd528

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:27:29 +03:00
Aliaksandr Valialkin
db27dbab5e lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices targets on service object update like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:12:22 +03:00
Aliaksandr Valialkin
6dc5d3b357 all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com 2021-04-20 20:20:01 +03:00
Aliaksandr Valialkin
3b1f0cb3f6 lib/promscrape: create a single swosFunc per scrape_config 2021-04-08 09:31:38 +03:00
Aliaksandr Valialkin
276dbc2133 lib/promscrape: do not spend CPU time on constructing scrapeWork key if clustering is disabled 2021-04-07 21:55:06 +03:00
Aliaksandr Valialkin
02b83e0957 lib/promscrape/discovery: remove superflouos check in registerPendingAPIWatchers
The check `_, ok := uw.aws[aw]; !ok` isn't needed, since aw cannot exist in uw.aws
because of the check inside subscribeAPIWatcher
2021-04-07 13:10:04 +03:00
Aliaksandr Valialkin
db56ee0e28 lib/promscrape/discovery/kubernetes: register pending apiWatchers in uw.aws 2021-04-06 11:11:53 +03:00
Aliaksandr Valialkin
edd66b7e82 lib/promscrape/discovery/kubernetes: remove superflouos mustStart and mustStop functions 2021-04-05 22:43:49 +03:00
Lu Jiajing
4ee6def68b fix access to nil *url.URL (#1180)
* fix access to nil *url.URL

Signed-off-by: Megrez Lu <lujiajing1126@gmail.com>

* Update lib/promscrape/discovery/kubernetes/api_watcher.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-05 22:26:43 +03:00
Aliaksandr Valialkin
7eca60694e lib/promscrape/discovery/kubernetes: reduce CPU time spent on registering big number of Kubernetes objects shared among big number of scrape jobs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 22:05:02 +03:00
Aliaksandr Valialkin
9da2ef3d8f lib/promscrape/discovery/kubernetes: load objects missing in local cache from api seriver in getObjectByRole()
This should fix possible race for `role: endpoints` and `role: endpointslices` service discovery,
when the referred `pod` and `service` objects aren't propagated to urlWatcher cache yet.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182#issuecomment-813353359 for details.
2021-04-05 20:31:22 +03:00
Aliaksandr Valialkin
fe084fdd33 lib/promscrape/discovery/kubernetes: synchronously load Kubernetes objects on first access
Remove async registration of apiWatchers, since it breaks discovering `role: endpoints` and `role: endpointslices` targets,
which depend on pod and service objects.

There is no need in reloading `endpoints` and `endpointslices` targets if the referenced `pod` or `service` objects change,
since in this case the corresponding `endpoints` and `endpointslices` objects should also change because they contain
ResourceVersion of the referenced `pod` or `service` objects, which is modified on object update.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 14:37:07 +03:00
Aliaksandr Valialkin
90da332ea0 lib/promscrape: pass X-Prometheus-Scrape-Timeout-Seconds header to scrape targets as Prometheus does 2021-04-05 12:15:14 +03:00
Aliaksandr Valialkin
dd19fab7c9 lib/promscrape: properly send full url in GET request via simple HTTP proxy
This is a follow-up for a0ae0f86666a75ec57b45eab2429da7ab4a7b250

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 01:20:43 +03:00
Aliaksandr Valialkin
ab9e1eb41f lib/promscrape: support for simple HTTP proxies without CONNECT method support such as https://github.com/prometheus-community/PushProx
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 00:40:58 +03:00
Aliaksandr Valialkin
27d8a5d2c0 lib/promscrape: add tests for authorization config, which has been added in df148f48b7 2021-04-03 22:14:03 +03:00
Aliaksandr Valialkin
87700f1259 lib/promscrape: add support for authorization config in -promscrape.config as Prometheus 2.26 does
See https://github.com/prometheus/prometheus/pull/8512
2021-04-02 21:20:37 +03:00
Aliaksandr Valialkin
2825a1e86d lib/promscrape: add follow_redirect option to scrape_configs section like Prometheus does
See https://github.com/prometheus/prometheus/pull/8546
2021-04-02 21:20:37 +03:00
Aliaksandr Valialkin
245eba8896 lib/promscrape/discovery/kubernetes: properly track objects with the same names in multiple namespaces
This is a follow-up for 12e4785fe8

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:46:34 +03:00
Aliaksandr Valialkin
eee860f83d lib/promscrape/discovery/kubernetes: properly discover targets in multiple namespaces
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:29:24 +03:00
Nikolay
00157f3ec6 Adds aws ECS credentials support (#1175) 2021-04-02 13:11:03 +03:00
Aliaksandr Valialkin
7d87d42a91 lib/promscrape/discovery/kubernetes: typo fix in error message 2021-03-26 12:46:33 +02:00
Aliaksandr Valialkin
a920e71809 lib/promscrape/discovery/kubernetes: properly handle too old resource version error message from Kubernetes watch API 2021-03-26 12:28:35 +02:00
Aliaksandr Valialkin
6b1f807418 app/vmagent: add -promscrape.consul.waitTime command-line flag for configuring Consul service discovery wait time
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1144
2021-03-23 19:34:12 +02:00
Nikolay
19a40faf8e changes consul_service label value (#1143)
according to prometheus discovery.
 It should mitigate issue with case sensetive services
https://github.com/hashicorp/consul/issues/5707
2021-03-23 15:37:06 +02:00
Aliaksandr Valialkin
5e77a939c2 all: make go vet happy 2021-03-17 00:48:44 +02:00
Aliaksandr Valialkin
b997f4a418 all: make golangci-lint happy after the commit 6378205415 2021-03-17 00:24:31 +02:00
Aliaksandr Valialkin
8005ba26b9 lib/netutil: enable IPv6 UDP listening if -enableTCP6 command-line flag is passed to VictoriaMetrics
This is a follow-up for 18cfc4be7b

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1131
2021-03-17 00:19:30 +02:00
Aliaksandr Valialkin
e2717d84c0 all: various fixes in command-line flag descriptions 2021-03-15 22:03:49 +02:00
Aliaksandr Valialkin
7f52aae20c lib/promscrape: an attempt to reduce memory usage when vmagent scrapes targets with varying number of metrics
Do not cache too big byte buffers and too big writeRequestCtx objects,
since it is cheaper to re-create them instead of wasting RAM for their caching.

This reverts 7f6f350ee1
2021-03-15 11:49:29 +02:00
Aliaksandr Valialkin
33cd6c26d3 lib/promscrape: return back the logic for flushing big buffers to storage from the commit 3fd8653b40
This should reduce memory usage when vmagent scrapes targets with big number of metrics and `-promscrape.streamParse` isn't enabled
2021-03-14 22:25:37 +02:00
Aliaksandr Valialkin
894246176f lib/promscrape/discovery/kubernetes: do not start object watcher until initial objects are loaded 2021-03-14 21:56:16 +02:00
Aliaksandr Valialkin
9e55db4a53 lib/promscrape: retry service discovery in a few seconds if it starts returning 0 targets
This should reduce recovery time from temporary issues during service discovery
2021-03-14 21:56:16 +02:00
Aliaksandr Valialkin
3b46ae1c05 lib/promscrape: remove duplicate target word in error message 2021-03-14 21:56:16 +02:00
Aliaksandr Valialkin
b0b28eeb93 lib/promscrape/discovery/kubernetes: further optimize kubernetes service discovery for the case with many scrape jobs
Do not re-calculate labels per each scrape job - reuse them instead for scrape jobs with identical Kubernetes role
2021-03-14 21:16:41 +02:00
Aliaksandr Valialkin
620f05cd2c lib/promscrape/discovery: fixes after 133b288681
- Removed a deadlock in addAPIWatcher
- Do not create unused ScrapeWork objects
- Do not spend CPU resources on creating objectByKey map in addAPIWatcher

This work is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1125
2021-03-13 15:22:38 +02:00
Aliaksandr Valialkin
60e0280a94 lib/promscrape: add ability to configure proxy options via proxy_tls_config, proxy_basic_auth, proxy_bearer_token and proxy_bearer_token_file
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-12 03:36:11 +02:00
Aliaksandr Valialkin
8fc29ffc67 lib/promscrape/discovery/kubernetes: use a single watcher per apiURL
Previously multiple scrape jobs could create multiple watchers for the same apiURL. Now only a single watcher is used.
This should reduce load on Kubernetes API server when many scrape job configs use Kubernetes service discovery.
2021-03-11 17:04:14 +02:00
Aliaksandr Valialkin
41f641b132 lib/promscrape/discovery/kubernetes: localize Bookmark parsing code
This is a follow-up for e772d1c920
2021-03-11 13:08:56 +02:00
Aliaksandr Valialkin
6c9cd3f7c1 lib/promscrape/discovery/kubernetes: reduce load on Kubernetes API server by using watch bookmarks
This allows continuing object watch from the last bookbark instead of reloading all the objects
on watch errors or timeouts.

See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
2021-03-10 15:08:40 +02:00
Aliaksandr Valialkin
9d8223eafb lib/proxy: set missing ServerName in TLS config for proxy_url.
While at it, allow setting Proxy-Authorization for `proxy_url` via `basic_auth` and `bearer_token` configs.
2021-03-09 19:01:14 +02:00
Nikolay
1310f84122 Changes tlsConfig init for proxy connections (#1121)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-09 19:01:13 +02:00
Aliaksandr Valialkin
0554430d7e lib/promscrape: apply sample_limit after metric relabeling is applied as Prometheus does
See the description for `sample_limit` option from Prometheus docs:

Per-scrape limit on number of scraped samples that will be accepted.
If more than this number of samples are present after metric relabeling
the entire scrape will be treated as failed. 0 means no limit.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
2021-03-09 15:52:41 +02:00
Aliaksandr Valialkin
7b66c8cbf8 lib/promscrape/discovery/kubernetes: remove too verbose logs about starting and stopping the watchers
Log the number of objects loaded per each watch url

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-09 15:07:12 +02:00
Aliaksandr Valialkin
502fab797a lib/promscrape: add scrape_offset option to scrape_config
This option can be used for specifying the particular offset per each scrape interval for target scraping
2021-03-08 11:59:32 +02:00
Aliaksandr Valialkin
c04505e585 lib/promscrape/discovery/kubernetes: reduce memory usage further when big number of scrape jobs are configured for the same kubernetes_sd_config role
Serialize reloading per-role objects, so they don't occupy too much memory when objects for many scrape jobs are simultaneously refreshed.
Do not reload per-role objects if they were already refreshed by concurrent goroutines. This should reduce load on Kubernetes API server
when big number of scrape jobs are configured for the same Kubernetes role.

This is a follow-up for 17b87725ed

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-07 20:03:22 +02:00
Aliaksandr Valialkin
5807ff57f3 lib/promscrape/discovery/kubernetes: reduce memory usage when Kubernetes service discovery is configured on a big number of scrape jobs
Previously vmagent was creating a separate Kubernetes object cache per each scrape job.
This could result in increased memory usage when monitoring a Kubernetes cluster with big number of objects (pods / nodes / services, etc.)
as seen at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113

Now it uses a shared map of scrape objects across multiple scrape jobs.
2021-03-05 17:32:33 +02:00
Aliaksandr Valialkin
92ddb8f197 lib/promscrape/discovery/kubernetes: move apiWatcher code to a separate file 2021-03-05 17:32:32 +02:00
Aliaksandr Valialkin
02c0959380 lib/promscrape: make cluster membership calculations consistent across 32-bit and 64-bit architectures 2021-03-05 09:06:08 +02:00
Aliaksandr Valialkin
133fb9fc00 lib/promscrape: add -promscrape.cluster.replicationFactor command-line flag for replicating scrape targets among vmagent instances in the cluster 2021-03-04 10:21:27 +02:00
Aliaksandr Valialkin
bae7a1b47a lib/promscrape/discovery/kubernetes: fix tests after e154f4a644 2021-03-03 22:42:04 +02:00
Nikolay
7d92ef3acd Fix ingress discovery api (#1110) 2021-03-03 10:45:50 +02:00
Aliaksandr Valialkin
25f453ce1a lib/promscrape/discovery/kubernetes: properly check for nil pointer inside interface
See https://mangatmodi.medium.com/go-check-nil-interface-the-right-way-d142776edef1

This fixes a panic when the ScrapeWork is filtered out in swcFunc.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1108
2021-03-03 10:42:54 +02:00
Aliaksandr Valialkin
fea27b9845 lib/promscrape: go fmt 2021-03-02 21:20:23 +02:00
Aliaksandr Valialkin
3fbe2bf1c8 lib/promscrape: pre-allocate space for a map in mergeLabels
This should reduce the number of memory allocations when discovering big number of targets
2021-03-02 18:42:44 +02:00
Aliaksandr Valialkin
ac5c47a9f5 lib/promscrape/discovery: properly track vm_promscrape_discovery_kubernetes_objects_removed_total metric 2021-03-02 18:33:29 +02:00
Aliaksandr Valialkin
f9c1fe3852 lib/promscrape/discovery/kubernetes: cache ScrapeWork objects as soon as the corresponding k8s objects are changed
This should reduce CPU usage and memory usage when Kubernetes contains tens of thousands of objects
2021-03-02 16:44:19 +02:00
Aliaksandr Valialkin
f686174329 lib/promscrape/discovery/ec2: follow-up after f6114345de 2021-03-02 13:47:35 +02:00
Nikolay
fd8ca7df50 Adds webIndentity token for aws (#1099)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1080
2021-03-02 13:47:34 +02:00
Aliaksandr Valialkin
b89a4fac2f lib/promscrape/discovery/kubernetes: deflake tests; a follow-up for 05fb08713c 2021-03-01 14:31:44 +02:00
Aliaksandr Valialkin
c3bf72992f lib/promscrape: explicitly stop and cleanup service discovery routines when new config is read from -promscrape.config
This should reduce memory usage when `-promscrape.config` file frequently changes
2021-03-01 14:15:16 +02:00
Aliaksandr Valialkin
8af9370bf2 lib/promscrape: use target arg in ScrapeWork cache 2021-03-01 12:31:11 +02:00
Aliaksandr Valialkin
5328506c3e lib/promscrape: typo fix, which prevented from caching ScrapeWork entries 2021-03-01 12:13:13 +02:00
Aliaksandr Valialkin
d5058caccc lib/promscrape: add vm_promscrape_scrapework_cache_* metrics for tracking ScrapeWork cache effectiveness 2021-03-01 12:05:58 +02:00
Aliaksandr Valialkin
109bfaadad lib/promscrape: reduce CPU usage an memory allocations when constructing scrapeWorkKey 2021-02-28 22:30:36 +02:00
Aliaksandr Valialkin
9a2bf65134 lib/promscrape: add ability to spread scrape targets among multiple vmagent instances
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1084
2021-02-28 18:40:42 +02:00
Aliaksandr Valialkin
3e44d9947e lib/promscrape/discovery/kubernetes: properly account the number of objects when watcher is stopped
A follow-up for b21b110b7a
2021-02-28 17:06:49 +02:00
Aliaksandr Valialkin
0ef7a94056 lib/promscrape/discovery/kubernetes: add vm_promscrape_discovery_kubernetes_* metrics for monitoring internal state of k8s service discovery 2021-02-28 16:58:45 +02:00
Aliaksandr Valialkin
f52bdbe2a3 lib/promscrape/discovery/kubernetes: remove resourceVersionMatch=NotOlderThan query arg when watching for k8s object changes, since it cannot be used when watch=1 query arg is passed 2021-02-28 16:08:43 +02:00
Aliaksandr Valialkin
6f8866a40a lib/promscrape: fix possible deadlock in parallel execution of target relabeling 2021-02-28 16:05:32 +02:00
Aliaksandr Valialkin
5c9e657808 lib/promscrape/discovery/kubernetes: fix deadlock in startWatcherForURL
reloadObjects must be called without holding aw.mu lock
2021-02-28 15:25:33 +02:00
Aliaksandr Valialkin
e77f2f8630 lib/promscrape/discovery/kubernetes: typo fix after 241ffd1f3b 2021-02-28 15:15:27 +02:00
Aliaksandr Valialkin
82441537ff lib/promscrape/discovery/kubernetes: pre-populate labelsByKey in reloadObject() 2021-02-28 15:09:43 +02:00
Aliaksandr Valialkin
e003453941 lib/promscrape/discovery/kubernetes: compare sorted sets of labels in tests
This should deflake tests where the order of labels isn't stable
2021-02-28 14:12:32 +02:00
Aliaksandr Valialkin
6a21ef87b7 lib/promscrape: add missing startWatchersForRole() call at the beginning of apiWatcher.getLabelsForRole 2021-02-28 14:00:00 +02:00
Aliaksandr Valialkin
6d0e7fb8b0 lib/promscrape/discovery/kubernetes: reload k8s resources on every error
This is needed for obtaining fresh resourceVersion
2021-02-27 01:46:59 +02:00
Aliaksandr Valialkin
44975e28fe lib/promscrape: yet another typo fix after ed8441ec52 2021-02-26 23:36:34 +02:00
Aliaksandr Valialkin
50e74c439c lib/promscrape: typo fix after ed8441ec52 2021-02-26 23:04:40 +02:00
Aliaksandr Valialkin
fa3ce450fb lib/promscrape: cache ScrapeWork
This should reduce the time needed for updating big number of scrape targets.
2021-02-26 21:43:41 +02:00
Aliaksandr Valialkin
efcdf613c2 lib/promscrape/discovery/kubernetes: cache target labels
This should reduce CPU usage on repeated SDConfig.GetLabels() calls.
2021-02-26 20:24:29 +02:00
Aliaksandr Valialkin
22822feea3 lib/promscrape/discovery/kubernetes: errcheck fix 2021-02-26 19:09:12 +02:00
Aliaksandr Valialkin
dc8c045378 lib/promscrape: cleanup after 9b2246c29b
Main points:

* Revert changes outside lib/promscrape/discovery/kuberntes . These changes can be applied later in a separate commit
* Minimize changes in lib/promscrape/discovery/kubernetes compared to a93e644001
* Corner case fixes.
2021-02-26 19:09:12 +02:00
Nikolay
cf9262b01f vmagent kubernetes watch stream discovery. (#1082)
* started work on sd for k8s

* continue work on watch sd

* fixes

* continue work

* continue work on sd k8s

* disable gzip

* fixes typos

* log errror

* minor fix

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-02-26 19:09:12 +02:00
Aliaksandr Valialkin
4cfac70cde lib/promscrape: remove duplicate code a bit 2021-02-26 15:54:07 +02:00
Aliaksandr Valialkin
55eb983193 lib/promscrape: reduce processing time for big number of discovered targets by processing them in parallel 2021-02-26 12:48:48 +02:00
Aliaksandr Valialkin
197ecca426 lib/promrelabel: add more optimizations for relabeling for common cases 2021-02-22 16:36:54 +02:00
Aliaksandr Valialkin
8b87398333 lib/promscrape: export vm_promscrape_target_relabel_duration_seconds metric 2021-02-21 23:21:35 +02:00
Aliaksandr Valialkin
f09b46b2b1 lib/promscrape: typo fix after the commit f26162ec99 2021-02-19 00:34:18 +02:00
Aliaksandr Valialkin
502d0e2524 lib/promscrape: add scrape_align_interval config option into scrape config
This option allows aligning scrapes to a particular intervals.
2021-02-18 23:53:04 +02:00
Aliaksandr Valialkin
fccb481de2 lib/promscrape/discovery/kubernetes: add __meta_kubernetes_endpoints_label_* and __meta_kuberntes_endpoints_annotation_* labels to role: endpoints
This syncs kubernetes SD with Prometheus 2.25
See 617c56f55a
2021-02-15 02:51:36 +02:00
Aliaksandr Valialkin
18c2075159 lib/promscrape: remove vm_promscrape_scrapes_failed_per_url_total and vm_promscrape_scrapes_skipped_by_sample_limit_per_url_total metrics
These metrics may result in big number of time series when vmagent scrapes thousands of targets and these targets constantly changes.

* It is better using `up == 0` query for determining failing targets.
* It is better using the following query for determining targets with exceeded limit on the number of metrics:

  scrape_samples_scraped > 0 if up == 0
2021-02-12 05:23:27 +02:00
Nikolay
6fc2a8e544 fixes dockerswarm (#1053)
fixes improper usage of host network services
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1028
2021-02-04 15:57:41 +02:00
Aliaksandr Valialkin
755b0998ce lib/promscrape: add vm_promscrape_service_discovery_duration_seconds metric 2021-02-02 16:16:51 +02:00
Aliaksandr Valialkin
4c59dbc127 lib/promscrape: add vm_promscrape_scrape_retries_total, vm_promscrape_discovery_retries_total and vm_promscrape_discovery_requests_total metrics 2021-02-01 20:06:16 +02:00
Aliaksandr Valialkin
ffec5131ae lib/promscrape: export vm_promscrape_scrapes_failed_per_url_total and vm_promscrape_scrapes_skipped_by_sample_limit_per_url_total metrics
These metrics could be useful for determining imporperly working scrape targets.
Note that these metrics are exported only for failing scrape targets. They aren't exposed for normally working targets.
2021-01-27 00:40:39 +02:00
Aliaksandr Valialkin
4b324da947 all: consistently use timers from timerpool 2021-01-27 00:40:39 +02:00
Aliaksandr Valialkin
fdced59278 lib/promscrape: retry scrape and service discovery requests when the remote server closes http keep-alive connection 2021-01-22 13:22:59 +02:00
Nikolay
9f0a4fd00e Fixes error handling for promscrape.streamParse (#1009)
properly return error if client cannot read data,
properly suppress scraper errors
2021-01-12 13:35:09 +02:00
Aliaksandr Valialkin
c97681b45c lib/promscrape: properly show scrape duration on /targets page
Previously it has been shown as 0.000s for any scrape duration.
2021-01-11 21:15:50 +02:00