github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-11 14:53:49 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	f03e81c693	lib/promauth: follow-up for `e16d3f5639` - Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup don't prevent from processing the corresponding scrape targets after the file becomes correct, without the need to restart vmagent. Previously scrape targets with invalid TLS CA file or TLS client certificate files were permanently dropped after the first attempt to initialize them, and they didn't appear until the next vmagent reload or the next change in other places of the loaded scrape configs. - Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent. Previously the old TLS CA was used until vmagent restart. - Properly handle errors during http request creation for the second attempt to send data to remote system at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing, since the returned request is nil on error. - Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent. Previously it could miss details on the source of the request. - Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header of every http request issued by vmagent during service discovery or target scraping. Re-use the HTTP client instead until the corresponding scrape config changes. - Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached, e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server when auth header cannot be obtained because of temporary error. - Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config. Cache the loaded certificate and the error for one second. This should significantly reduce CPU load when scraping big number of targets with the same tls_config. 
- Allow loading TLS certificates from HTTP and HTTPS urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`. - Improve test coverage at lib/promauth - Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later. Previously vmagent was exiting in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959	2023-10-26 09:55:47 +02:00
Hui Wang	d7dd7614eb	fix inconsistent behaviors with prometheus when scraping (#5153) * fix inconsistent behaviors with prometheus when scraping 1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting; 2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously will print error logs and do scrape without auth header; 3. don't send requests if there are wrong auth configs in: 1. vmagent remoteWrite; 2. vmalert datasource/remoteRead/remoteWrite/notifier. * add changelogs * address review comments * fix ut	2023-10-26 08:56:54 +02:00
Zakhar Bessarab	0be8960875	lib/promscrape/discovery/kubernetes: suppress context.Cancelled error in logs (#5048) lib/promscrape/discovery/kubernetes: suppress context.Cancelled error in logs It is possible that context.Cancelled will appear after k8s watcher was closed due to reload (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850). Logging an error misinforms user and looks like vmagent discovery will stop working even though this does not affect discovery. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> (cherry picked from commit `8d99c12a7d`)	2023-09-22 13:02:57 +02:00
Aliaksandr Valialkin	de2b3ff9b0	lib/promscrape/discovery/kubernetes: follow-up after `03fece44e0` - Properly update vm_promscrape_discovery_kubernetes_url_watchers and vm_promscrape_discovery_kubernetes_group_watchers metrics after config changes - Properly stop goroutine responsible for recreating scrapeWorks after the corresponding urlWatcher is stopped - Log the event when urlWatcher is stopped in order to simplify debugging Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861	2023-09-19 00:44:21 +02:00
Aliaksandr Valialkin	705b31c351	lib/promscrape/discovery/kubernetes: wait for 10 seconds before checking whether the urlWatcher must be stopped This should prevent from excess urlWatcher churn on config reload, since it leads to removal of all the apiWatchers before creating new apiWatchers. So, every config reload would lead to stopping of all the previous urlWatchers and starting new urlWatchers. The new logic gives 10 seconds for config reload before stopping unused urlWatchers. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861	2023-09-19 00:43:26 +02:00
Aliaksandr Valialkin	fe24523e19	lib/promscrape/discovery/kubernetes: follow-up after `eeb862f3ff` - Move the bugfix description to the correct place in docs/CHANGELOG.md - Prevent from logging of 'context canceled' errors after the url watcher is stopped, since these errors are expected and may confuse users. - Remove unused urlWatcher.refCount field. - Remove unused urlWatcher.close() method. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850	2023-09-19 00:41:29 +02:00
Zakhar Bessarab	55d25fb844	lib/promscrape/discovery/kubernetes: fix leaking api watcher (#4861) * lib/promscrape/discovery/kubernetes: fix leaking api watcher goroutine which was polling k8s API had no execution control. This led to leaking goroutines during config reload. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: use reference counting for urlWatcher cleanup Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: remove waitgroup sync for goroutines polling API server This is unnecessary since context is cancelled and new requests will not be sent. Also, using waitgroup will increase time required to perform reload which might result in missed scrapes. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: clarify comment Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * Apply suggestions from code review * lib/promscrape/discovery/kubernetes: address review feedback Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-09-18 17:13:55 +02:00
Nikolay	ae85b20c5b	lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch … (#4901) * lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch requests it must reduce load for kubernetes ETCD servers. Since requests without resourceVersion performs force cache sync at kubernetes API server with ETCD more info at https://kubernetes.io/docs/reference/using-api/api-concepts/#semantics-for-watch https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4855 * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-08-30 16:04:14 +02:00
my-git9	7d86c5c94a	chore: Use http constants to replace numbers (#3846) Signed-off-by: xin.li <xin.li@daocloud.io>	2023-02-22 18:59:32 -08:00
Oleksandr Redko	0e1c395609	app,lib: fix typos in comments (#3804)	2023-02-13 09:32:35 -08:00
Aliaksandr Valialkin	be6da5053f	lib/promscrape: optimize service discovery speed - Return meta-labels for the discovered targets via promutils.Labels instead of map[string]string. This improves the speed of generating meta-labels for discovered targets by up to 5x. - Remove memory allocations in hot paths during ScrapeWork generation. The ScrapeWork contains scrape settings for a single discovered target. This improves the service discovery speed by up to 2x.	2022-11-29 21:26:23 -08:00
Nikolay	ea0596d9d8	lib/promscrape/discovery/kubernetes: correctly wrap error (#3250) * lib/promscrape/discovery/kubernetes: correctly wrap error follow-up after `1304824201` * Update docs/CHANGELOG.md Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-10-18 20:40:37 +03:00
Aliaksandr Valialkin	592612b63f	lib/promscrape/discovery/kubernetes: add more context on WatchEvent parse error This should improve debugging issues with Kubernetes API server	2022-09-13 19:37:40 +03:00
Aliaksandr Valialkin	1905618d10	all: substitute ioutil.ReadAll with io.ReadAll ioutil.ReadAll is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil VictoriaMetrics requires at least Go1.18, so it is OK to switch from ioutil.ReadAll to io.ReadAll. This is a follow-up for `02ca2342ab`	2022-08-22 00:16:04 +03:00
Aliaksandr Valialkin	06f6de6d47	all: use os.{Read|Write}File instead of ioutil.{Read|Write}File The ioutil.{Read|Write}File is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil VictoriaMetrics needs at least Go1.18, so it is safe to remove ioutil usage from source code. This is a follow-up for `02ca2342ab`	2022-08-21 23:55:20 +03:00
Aliaksandr Valialkin	c4cc45d7f8	lib/promscrape/discovery/kubernetes: allow attaching node-level labels to `role: endpoints` and `role: endpointslice` targets in the same way as Prometheus does See https://github.com/prometheus/prometheus/pull/10759	2022-07-07 00:36:24 +03:00
Aliaksandr Valialkin	3ae6300497	lib/promauth: add ability to send additional http headers in requests to scrape targets This solves https://stackoverflow.com/questions/66032498/prometheus-scrape-metric-with-custom-header	2022-06-22 20:40:50 +03:00
Roman Khavronenko	7406665fc3	lib/promscrape/discovery/kubernetes: fixes kubernetes service discovery (#2615) * lib/promscrape/discovery/kubernetes: properly updates discovered scrape works previously, added or updated scrapeworks may override previously discovered. it happens because swosByKey may contain small subset of kubernetes objects with its labels. It happens for objectsUpdated and objectsAdded maps, which include only changed elements * Properly calculate vm_promscrape_discovery_kubernetes_scrape_works Co-authored-by: f41gh7 <nik@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-05-21 01:17:21 +03:00
Aliaksandr Valialkin	810dd74fb9	lib/promscrape: properly implement ScrapeConfig.clone() Previously ScrapeConfig.clone() was improperly copying promauth.Secret fields - their contents was replaced with `<secret>` value. This led to inability to use passwords and secrets in `-promscrape.config` file. The bug has been introduced in v1.77.0 in the commit `67b10896d2` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2551	2022-05-07 00:06:19 +03:00
Aliaksandr Valialkin	c2b13e6a04	lib/promscrape/discovery/kubernetes: limit the minimum sleep time between updating dependent ScrapeWork objects Previously the sleep time could be dropped to nanoseconds, which could result in CPU time waste	2022-04-22 23:15:34 +03:00
Aliaksandr Valialkin	a89e31b304	lib/promscrape/discovery/kubernetes: allow attaching node-level labels and annotations to discovered pod targets in the same way as Prometheus 2.35 does See https://github.com/prometheus/prometheus/issues/9510 and https://github.com/prometheus/prometheus/pull/10080	2022-04-22 20:15:34 +03:00
Aliaksandr Valialkin	cc6eae6992	lib/promscrape/discovery/kubernetes: improve the performance of urlWatcher.reloadObjects() on multi-CPU systems Parallelize the generation of ScrapeWork objects there. Previously they were generated in a single goroutine.	2022-04-22 13:23:39 +03:00
Aliaksandr Valialkin	fea9d1e6ee	lib/promscrape/discovery/kubernetes: properly update endpoints and endpointslice objects when the related pod or service objects are updated Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240 This is a follow-up for `2341bd48d7`	2022-04-21 13:06:49 +03:00
Aliaksandr Valialkin	e9f08b1e6a	lib/promscrape/discovery/kubernetes: do not pre-allocate memory for ScrapeWork objects There is high chance that ScrapeWork objects won't be generated because of relabeling	2022-04-20 16:42:41 +03:00
Aliaksandr Valialkin	909a3ee0e4	lib/promscrape: follow-up after `91e290a8ff`	2022-04-20 16:12:26 +03:00
Nikolay	429848a67d	lib/promscrape: reduce latency for k8s GetLabels (#2454) replaces internStringMap with sync.Map - it greatly reduces lock contention concurrently reload scrape work for api watcher - each object labels added by dedicated CPU changes can be tested with following script https://gist.github.com/f41gh7/6f8f8d8719786aff1f18a85c23aebf70	2022-04-20 16:12:25 +03:00
Aliaksandr Valialkin	d0bac8e224	all: typo fix: Kuberntes -> Kubernetes	2022-04-20 10:51:41 +03:00
Aliaksandr Valialkin	bc18368c15	lib/promscrape/discovery/kubernetes: add the ability to limit service discovery to the current namespace See https://github.com/prometheus/prometheus/issues/9782 and https://github.com/prometheus/prometheus/pull/9881	2022-01-13 22:44:59 +02:00
Aliaksandr Valialkin	146c14d879	lib/promscrape/discovery/kubernetes: return back support `role: endpointslices`, since it is used by VictoriaMetrics operator This is a follow up commit after `31b42b30b6`	2021-08-29 12:37:36 +03:00
Aliaksandr Valialkin	ca61d7c82b	lib/promscrape/discovery/kubernetes: rename `role: endpointslices` to `role: endpointslice` to be consistent with Prometheus See `2ec6c7dbb8/discovery/kubernetes/kubernetes.go (L99)`	2021-08-29 11:23:59 +03:00
Aliaksandr Valialkin	327034b54f	lib/promscrape/discovery/kubernetes: use v1 API instead of v1beta1 API for `role: ingress` and `role: endpointslices` This should fix service discovery for these roles in Kubernetes v1.22 and newer versions. See https://kubernetes.io/docs/reference/using-api/deprecation-guide/#ingress-v122 The corresponding change in Prometheus - https://github.com/prometheus/prometheus/pull/9205	2021-08-29 11:23:58 +03:00
Aliaksandr Valialkin	110a888e39	lib/promscrape/discovery/kubernetes: make `golangci-lint` happy by removing empty branches	2021-05-20 12:00:17 +03:00
Aliaksandr Valialkin	9d97f44772	lib/promscrape/discovery/kubernetes: reload objects on object parse error Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240	2021-05-18 23:27:24 +03:00
Aliaksandr Valialkin	c507faec0b	lib/promscrape/discovery/kubernetes: simplify the reload logic for urlWatcher.objectsByKey	2021-05-18 15:41:51 +03:00
Aliaksandr Valialkin	0f54c0121b	lib/promscrape/discovery/kubernetes: properly update vm_promscrape_discovery_kubernetes_scrape_works metric Previously it wasn't decreased during config update.	2021-05-18 15:41:51 +03:00
Aliaksandr Valialkin	9f62d348db	lib/promscrape/discovery/kubernetes: log errors and stop service discovery when unexpected updates are received from Kubernetes API server Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240	2021-05-18 15:41:51 +03:00
Aliaksandr Valialkin	8764b0ae21	lib/promscrape/discovery/kubernetes: key ScrapeWork objects by urlWatcher instead of namespace This makes the code less fragile if urlWatcher would depend on properties in addition to namespace. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240	2021-05-17 23:49:48 +03:00
Aliaksandr Valialkin	e08287f017	lib/promscrape: reload auth tokens from files every second Previously auth tokens were loaded at startup and couldn't be updated without vmagent restart. Now there is no need in vmagent restart. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1297	2021-05-14 20:03:35 +03:00
Aliaksandr Valialkin	a6cb4f10a7	app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300	2021-05-14 18:13:34 +03:00
Aliaksandr Valialkin	027607db3e	lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices scrape targets every 5 seconds, since they may depend on changed service and pod objects This should make endpoints and endpointslices scrape targets eventually consistent with the maximum delay of 5 seconds after the related service or pod object changes. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240	2021-05-12 14:12:43 +03:00
Aliaksandr Valialkin	e6c19cb09d	lib/promscrape/discovery/kubernetes: start watchers for pods and services before starting watchers for endpoints This should eliminate possible race when an update on endpoints depends on pods and/or services, which are missing in the cache yet. This could result in missing targets based on endpoints or endpointslices. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240	2021-05-05 12:23:16 +03:00
Aliaksandr Valialkin	421a92983a	lib/promscrape/discovery/kubernetes: remove a mutex at urlWatcher - use groupWatcher mutex for accessing all the urlWatcher children This simplifies the code a bit and reduces the probability of improper mutex handling and deadlocks.	2021-04-29 10:17:45 +03:00
Aliaksandr Valialkin	b3da457629	lib/promscrape/discovery/kubernetes: fix a deadlock introduced in `eddba29664` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240 Thanks to @f41gh7 for providing the initial idea for deadlock fix at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1248	2021-04-27 14:59:56 +03:00
Aliaksandr Valialkin	34321e5f8d	lib/promscrape/discovery/kubernetes: refresh `role: endpoints` targets on service object removal as Prometheus does This is a follow-up for `ae37cfd528` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240	2021-04-23 20:27:29 +03:00
Aliaksandr Valialkin	db27dbab5e	lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices targets on service object update like Prometheus does Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240	2021-04-23 20:12:22 +03:00
Aliaksandr Valialkin	02b83e0957	lib/promscrape/discovery: remove superfluous check in registerPendingAPIWatchers The check `_, ok := uw.aws[aw]; !ok` isn't needed, since aw cannot exist in uw.aws because of the check inside subscribeAPIWatcher	2021-04-07 13:10:04 +03:00
Aliaksandr Valialkin	db56ee0e28	lib/promscrape/discovery/kubernetes: register pending apiWatchers in uw.aws	2021-04-06 11:11:53 +03:00
Lu Jiajing	4ee6def68b	fix access to nil url.URL (#1180) fix access to nil url.URL Signed-off-by: Megrez Lu <lujiajing1126@gmail.com> Update lib/promscrape/discovery/kubernetes/api_watcher.go Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>	2021-04-05 22:26:43 +03:00
Aliaksandr Valialkin	7eca60694e	lib/promscrape/discovery/kubernetes: reduce CPU time spent on registering big number of Kubernetes objects shared among big number of scrape jobs Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182	2021-04-05 22:05:02 +03:00
Aliaksandr Valialkin	9da2ef3d8f	lib/promscrape/discovery/kubernetes: load objects missing in local cache from api server in getObjectByRole() This should fix possible race for `role: endpoints` and `role: endpointslices` service discovery, when the referred `pod` and `service` objects aren't propagated to urlWatcher cache yet. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182#issuecomment-813353359 for details.	2021-04-05 20:31:22 +03:00

65 commits