VictoriaMetrics/lib/promscrape/discovery
Hui Wang 49fa92c1d0
lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557)
* lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice

Previously the groupWatcher could be mistakenly stopped when requests for pod or service resources took too long.

* remove misleading comment

* docs/sd_configs.md: mention the -promscrape.kubernetes.attachNodeMetadataAll flag in the description of the attach_metadata section

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

* wip

* lib/promscrape/kubernetes: prevent groupWatcher from being stopped while there are in-flight apiWatcher.mustStart() calls

groupWatcher is stopped if it has had zero registered apiWatchers for 14 seconds.
But such a groupWatcher can still be in use if an apiWatcher for `role: endpoints` or `role: endpointslice`
is being registered and the discovery of the associated `pod` and/or `service` objects takes longer
than 14 seconds. See the beginning of the groupWatcher.startWatchersForRole() function for details.

Track the number of in-flight calls to apiWatcher.mustStart() and do not stop the associated groupWatcher
while this number is non-zero.
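
The idea can be illustrated with a minimal Go sketch. It is not the code from this commit: the `watcherGroup` type, its fields, and the `start` and `canStop` methods are hypothetical names chosen for illustration, and the real groupWatcher may track this state differently. The sketch only shows that a group with zero registered watchers must not be considered idle while a start call is still in progress.

```go
package main

import (
	"fmt"
	"sync"
)

// watcherGroup is a hypothetical stand-in for groupWatcher.
// It is an illustration only, not the VictoriaMetrics type.
type watcherGroup struct {
	mu             sync.Mutex
	registered     int // watchers that finished registration
	inFlightStarts int // start calls that have begun but not finished
}

// start models a slow registration: the call counts as in-flight
// until discover returns, and only then the watcher is registered.
func (g *watcherGroup) start(discover func()) {
	g.mu.Lock()
	g.inFlightStarts++
	g.mu.Unlock()

	discover() // may take longer than the idle timeout

	g.mu.Lock()
	g.inFlightStarts--
	g.registered++
	g.mu.Unlock()
}

// canStop reports whether the group is idle: it has no registered
// watchers and no start calls in progress.
func (g *watcherGroup) canStop() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.registered == 0 && g.inFlightStarts == 0
}

func main() {
	g := &watcherGroup{}
	fmt.Println(g.canStop()) // true: nothing registered, nothing in flight

	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})
	go func() {
		g.start(func() {
			close(started)
			<-release // simulate slow discovery of dependent objects
		})
		close(done)
	}()

	<-started
	fmt.Println(g.canStop()) // false: a start call is still in flight
	close(release)
	<-done
	fmt.Println(g.canStop()) // false: the watcher is now registered
}
```

Without the `inFlightStarts` check, the second `canStop()` call would return true, which corresponds to the premature stop described above.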

P.S. Postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles
isn't the best solution, since it slows down the initial discovery of `endpoints` and `endpointslice` targets.

* typo fix

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-22 01:33:17 +02:00
| Name | Last commit | Date |
|------|-------------|------|
| .. | | |
| azure | lib/promscrape/discovery: close unused HTTP connections to service discovery servers | 2023-07-27 14:47:55 -07:00 |
| consul | do not print redundant error logs when failed to scrape consul or no… (#5239) | 2023-10-27 14:18:47 +02:00 |
| consulagent | lib/promscrape/discovery: close unused HTTP connections to service discovery servers | 2023-07-27 14:47:55 -07:00 |
| digitalocean | all: consistently use %w instead of %s in when error is passed to fmt.Errorf() | 2023-10-26 09:44:40 +02:00 |
| dns | lib/promauth: follow-up for e16d3f5639 | 2023-10-26 09:55:47 +02:00 |
| docker | lib/promscrape/discovery: close unused HTTP connections to service discovery servers | 2023-07-27 14:47:55 -07:00 |
| dockerswarm | lib/promauth: follow-up for e16d3f5639 | 2023-10-26 09:55:47 +02:00 |
| ec2 | Makefile: update golangci-lint from v1.51.2 to v1.54.2 | 2023-09-01 10:25:49 +02:00 |
| eureka | lib/promscrape/discovery: close unused HTTP connections to service discovery servers | 2023-07-27 14:47:55 -07:00 |
| gce | Makefile: update golangci-lint from v1.51.2 to v1.54.2 | 2023-09-01 10:25:49 +02:00 |
| hetzner | lib/promscrape/discovery/hetzner: follow-up after 03a97dc678 | 2024-01-22 00:53:23 +02:00 |
| http | all: consistently use %w instead of %s in when error is passed to fmt.Errorf() | 2023-10-26 09:44:40 +02:00 |
| kubernetes | lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557) | 2024-01-22 01:33:17 +02:00 |
| kuma | lib/promscrape/discovery: close unused HTTP connections to service discovery servers | 2023-07-27 14:47:55 -07:00 |
| nomad | do not print redundant error logs when failed to scrape consul or no… (#5239) | 2023-10-27 14:18:47 +02:00 |
| openstack | lib/promauth: follow-up for e16d3f5639 | 2023-10-26 09:55:47 +02:00 |
| yandexcloud | lib/promauth: follow-up for e16d3f5639 | 2023-10-26 09:55:47 +02:00 |