github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	7e1dd8ab9d	lib: consistently use atomic.* types instead of atomic.* functions See `ea9e2b19a5`	2024-02-24 02:07:53 +02:00
Aliaksandr Valialkin	5a092e161c	lib/promscrape/discovery/kuma: add support for `client_id` option See https://github.com/prometheus/prometheus/pull/13278	2024-02-18 19:19:40 +02:00
Aliaksandr Valialkin	ac5b740750	lib/promscrape/discovery/kubernetes: typo fix in the comment for ContainerStateTerminated struct This is a follow-up for `ef12598ad4`	2024-01-24 15:06:46 +02:00
Aliaksandr Valialkin	ef12598ad4	lib/promscrape/discovery/kubernetes: do not generate targets for already terminated pods and containers Already terminated pods and containers cannot be scraped and will never resurrect, so there is zero sense in creating scrape targets for them.	2024-01-24 14:57:53 +02:00
Aliaksandr Valialkin	3449d563bd	all: add up to 10% random jitter to the interval between periodic tasks performed by various components This should smooth CPU and RAM usage spikes related to these periodic tasks, by reducing the probability that multiple concurrent periodic tasks are performed at the same time.	2024-01-22 18:40:32 +02:00
Hui Wang	4e3242b02d	lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557) * lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice Previously the groupWatcher could be mistakenly stopped when requests for pod or service resources take too long. * remove misleading comment * docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640 * wip * lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls groupWatcher is stopped if it has zero registered apiWatchers for 14 seconds. But such a groupWatcher can still be in use if apiWatcher for `role: endpoints` or `role: endpointslice` is being registered and the discovery of the associated `pod` and/or `service` objects takes longer than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details. Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher if the number of in-flight calls is non-zero. P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets. * typo fix --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-21 23:13:15 +02:00
Aliaksandr Valialkin	1f105dde98	all: allow dynamically reading *AuthKey flag values from files and urls Examples: 1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath 2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath 3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url The flag value is automatically updated when the file contents change.	2024-01-21 22:03:38 +02:00
Aliaksandr Valialkin	7fba73ce11	lib/promscrape/discovery/kubernetes: add -promscrape.kubernetes.attachNodeMetadataAll command-line flag This flag allows setting attach_metadata.node=true for all the kubernetes_sd_configs defined at -promscrape.config Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640 Thanks to wasim-nihal for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5593	2024-01-21 03:13:56 +02:00
Aliaksandr Valialkin	74448a7e57	lib/promscrape/discovery/hetzner: follow-up after `03a97dc678` - docs/sd_configs.md: moved hetzner_sd_configs docs to the correct place according to alphabetical order of SD names, document missing __meta_hetzner_role label. - lib/promscrape/config.go: added missing MustStop() call for Hetzner SD, and moved the code to the correct place according to alphabetical order of SD names. - lib/promscrape/discovery/hetzner: properly handle pagination for hcloud API responses, populate missing __meta_hetzner_role label like Prometheus does. - Properly populate __meta_hetzner_public_ipv6_network label like Prometheus does. - Remove unused SDConfig.Token. - Remove "omitempty" annotation from SDConfig.Role field, since this field is mandatory. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3154	2024-01-20 17:01:53 +02:00
Aliaksandr Valialkin	4b42c8abbb	lib/promscrape/discovery/hetzner: fix golangci-lint warnings after `03a97dc678` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550	2024-01-15 17:12:40 +02:00
Aleksandr Stepanov	03a97dc678	vmagent: added hetzner sd config (#5550 ) * added hetzner robot and hetzner cloud sd configs * remove gettoken fun and update docs * Updated CHANGELOG and vmagent docs * Updated CHANGELOG and vmagent docs --------- Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-01-15 10:13:22 +01:00
Aliaksandr Valialkin	613b545dfd	lib/promscrape/discovery/kubernetes: propagate possible errors at newAPIWatcher() to the caller This allows substituting FATAL panics with recoverable runtime errors such as missing or invalid TLS CA file and/or missing/invalid /var/run/secrets/kubernetes.io/serviceaccount/namespace file. Now these errors are logged instead of PANIC'ing, so they can be fixed by updating the corresponding files without the need to restart vmagent. This is a follow-up for `90427abc65` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5243	2023-10-27 20:24:46 +02:00
Hui Wang	90427abc65	lib/promscrape/discovery/kubernetes: avoid possible panic if the caFile given under kubernetes.SDConfig.HTTPClientConfig does not exist (#5243) Follow-up for `d5a599badc`	2023-10-27 20:20:22 +02:00
Aliaksandr Valialkin	632d788b63	lib/promscrape/discovery/kubernetes: stop all the url watchers, which belong to a particular groupWatcher, at once Previously url watchers for pod, service and node objects could be mistakenly closed when service discovery was set up only for endpoints and endpointslice roles, since watchers for these roles may start pod, service and node url watchers with nil apiWatcher passed to groupWatcher.startWatchersForRole(). Now all the url watchers, which belong to a particular groupWatcher, are stopped at once when this groupWatcher has no apiWatcher subscribers. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5216 The issue has been introduced in v1.93.5 when addressing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850	2023-10-27 13:51:35 +02:00
Hui Wang	7c90ce39cb	do not print redundant error logs when failed to scrape consul or no… (#5239 ) * do not print redundant error logs when failed to scrape consul or nomad target prometheus performs the same because it uses consul lib which just drops the error(`1806bcb38c/api/api.go (L1134)`)	2023-10-27 13:31:55 +08:00
Aliaksandr Valialkin	d5a599badc	lib/promauth: follow-up for `e16d3f5639` - Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup don't prevent from processing the corresponding scrape targets after the file becomes correct, without the need to restart vmagent. Previously scrape targets with invalid TLS CA file or TLS client certificate files were permanently dropped after the first attempt to initialize them, and they didn't appear until the next vmagent reload or the next change in other places of the loaded scrape configs. - Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent. Previously the old TLS CA was used until vmagent restart. - Properly handle errors during http request creation for the second attempt to send data to remote system at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing, since the returned request is nil on error. - Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent. Previously it could miss details on the source of the request. - Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header of every http request issued by vmagent during service discovery or target scraping. Re-use the HTTP client instead until the corresponding scrape config changes. - Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached, e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server when auth header cannot be obtained because of temporary error. - Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config. Cache the loaded certificate and the error for one second. This should significantly reduce CPU load when scraping big number of targets with the same tls_config. 
- Allow loading TLS certificates from HTTP and HTTPS urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`. - Improve test coverage at lib/promauth - Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later. Previously vmagent was exiting in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959	2023-10-25 23:19:37 +02:00
Aliaksandr Valialkin	c22e3e7b1d	lib/promscrape/discovery/kubernetes/kubeconfig_test.go: make TestParseKubeConfigSuccess test code easier to follow	2023-10-25 23:17:18 +02:00
Aliaksandr Valialkin	eed5206376	lib/promauth: properly parse string contents for ca, cert and key fields at tls_config Previously yaml parser wasn't accepting string values for these fields, because it was mistakenly expecting a list of uint8 values instead.	2023-10-25 23:12:21 +02:00
Aliaksandr Valialkin	42dd71bb63	all: consistently use %w instead of %s when an error is passed to fmt.Errorf() This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.	2023-10-25 21:24:03 +02:00
Hui Wang	e16d3f5639	fix inconsistent behaviors with prometheus when scraping (#5153) * fix inconsistent behaviors with prometheus when scraping 1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting; 2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously vmagent printed error logs and scraped without the auth header; 3. don't send requests if there are wrong auth configs in: 1. vmagent remoteWrite; 2. vmalert datasource/remoteRead/remoteWrite/notifier. * add changelogs * address review comments * fix ut	2023-10-17 17:58:19 +08:00
Zakhar Bessarab	8d99c12a7d	lib/promscrape/discovery/kubernetes: suppress context.Canceled error in logs (#5048) lib/promscrape/discovery/kubernetes: suppress context.Canceled error in logs It is possible that context.Canceled will appear after the k8s watcher is closed due to a reload (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850). Logging it as an error misinforms the user and suggests that vmagent discovery has stopped working, even though discovery is not affected. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-09-22 13:01:33 +02:00
Aliaksandr Valialkin	30a645cd82	lib/promscrape/discovery/kubernetes: follow-up after `03fece44e0` - Properly update vm_promscrape_discovery_kubernetes_url_watchers and vm_promscrape_discovery_kubernetes_group_watchers metrics after config changes - Properly stop goroutine responsible for recreating scrapeWorks after the corresponding urlWatcher is stopped - Log the event when urlWatcher is stopped in order to simplify debugging Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861	2023-09-18 23:23:45 +02:00
Aliaksandr Valialkin	03fece44e0	lib/promscrape/discovery/kubernetes: wait for 10 seconds before checking whether the urlWatcher must be stopped This should prevent from excess urlWatcher churn on config reload, since it leads to removal of all the apiWatchers before creating new apiWatchers. So, every config reload would lead to stopping of all the previous urlWatchers and starting new urlWatchers. The new logic gives 10 seconds for config reload before stopping unused urlWatchers. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861	2023-09-18 17:45:12 +02:00
Aliaksandr Valialkin	76af32d869	lib/promscrape/discovery/kubernetes: follow-up after `eeb862f3ff` - Move the bugfix description to the correct place in docs/CHANGELOG.md - Prevent from logging of 'context canceled' errors after the url watcher is stopped, since these errors are expected and may confuse users. - Remove unused urlWatcher.refCount field. - Remove unused urlWatcher.close() method. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850	2023-09-18 17:06:39 +02:00
Zakhar Bessarab	eeb862f3ff	lib/promscrape/discovery/kubernetes: fix leaking api watcher (#4861) * lib/promscrape/discovery/kubernetes: fix leaking api watcher The goroutine which was polling the k8s API had no execution control. This led to leaking goroutines during config reload. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: use reference counting for urlWatcher cleanup Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: remove waitgroup sync for goroutines polling API server This is unnecessary since the context is cancelled and new requests will not be sent. Also, using a waitgroup would increase the time required to perform a reload, which might result in missed scrapes. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: clarify comment Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * Apply suggestions from code review * lib/promscrape/discovery/kubernetes: address review feedback Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-09-15 19:40:13 +02:00
Aliaksandr Valialkin	edee262ecc	Makefile: update golangci-lint from v1.51.2 to v1.54.2 See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2	2023-09-01 10:16:42 +02:00
Nikolay	00685b627f	lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch … (#4901) * lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch requests This should reduce load on Kubernetes ETCD servers, since requests without resourceVersion force a cache sync between the Kubernetes API server and ETCD. More info at https://kubernetes.io/docs/reference/using-api/api-concepts/#semantics-for-watch https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4855 * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-08-30 16:03:41 +02:00
Aliaksandr Valialkin	3d73640815	lib/promscrape/discovery: close unused HTTP connections to service discovery servers This should prevent from connection leaks See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4724	2023-07-27 14:48:56 -07:00
Aliaksandr Valialkin	140e7b6b74	all: replace atomic.Value with atomic.Pointer[T] This eliminates the need in .(*T) casting for results obtained from Load() Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map, because map is already a pointer type.	2023-07-19 17:42:06 -07:00
Aliaksandr Valialkin	8a07621a0c	lib/promscrape: disable support for service discovery and metrics scrape via http2 Reasons for disabling http2: - http2 is used very rarely comparing to http for Prometheus metrics exposition and service discovery - http2 is much harder to debug than http - http2 has very bad security record because of its complexity - see https://portswigger.net/research/http2 VictoriaMetrics components are compiled with nethttpomithttp2 tag because of these issues. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4274 This is a follow-up for `72c3cd47eb`	2023-07-06 16:03:37 -07:00
Alexander Marshalov	40d12be607	fixed service name detection for consulagent service discovery in case of a difference in service name and service id (#4390 ) (#4439 ) Signed-off-by: Alexander Marshalov <_@marshalov.org>	2023-06-12 16:16:43 +02:00
Haleygo	72c3cd47eb	vmagent:scrape config support enable_http2 (#4295 ) app/vmagent: support `enable_http2` in scrape config This change adds HTTP2 support for scrape config and improves compatibility with Prometheus config. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283	2023-06-05 15:56:49 +02:00
Haleygo	b3d0ff463a	vmagent:support follow_redirects on SD level (#4286 ) * vmagent:support follow_redirects on SD level * fix follow_redirects on sd level https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4282	2023-05-26 09:39:45 +02:00
Aliaksandr Valialkin	b9bb64ce55	lib/promscrape/discovery/consulagent: substitute metaPrefix with the `__meta_consulagent_` plaintext string This simplifies future code navigation and search for the specific meta-label starting from __meta_consulagent_* prefix. For example, `grep __meta_consulagent_namespace` finds the exact place where this label is defined. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3953 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4217	2023-05-08 23:40:13 -07:00
Aliaksandr Valialkin	74155afb71	docs: clarify docs after `5ee344824f` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4183	2023-05-08 16:11:44 -07:00
Zakhar Bessarab	4e71003620	lib/promscrape/discovery/kubernetes: follow-up for `d5e94721db` (#4255 ) - add changelog reference to an author - fix tests - add metadata to match Prometheus behavior Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-05-05 14:41:17 +02:00
Vasilchenko Anton	22e65402af	Add endpoint labels for pod targets discovered from endpoints but having different ports (#4253) Signed-off-by: Vasilchenko Anton <vasilchenko-as@yandex.ru>	2023-05-05 15:46:07 +04:00
Alexander Marshalov	56b84140a9	added new consulagent service discovery (#3953 ) (#4217 )	2023-05-04 11:36:21 +02:00
Zakhar Bessarab	bf3b6732bd	lib/promscrape/discovery/kubernetes: add common labels to all ports discovered from endpoints (#4235 ) * lib/promscrape/discovery/kubernetes: add common labels to all ports discovered from endpoints Sets `__meta_kubernetes_endpoints_name` and `__meta_kubernetes_namespace` labels to all ports of pod. Prometheus sets those labels to all ports in pod (`0ab9553611/discovery/kubernetes/endpoints.go (L267C15-L269)`) even if port is not matching any service. See: #4154 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: fix test for updated discovery logic Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-05-03 02:17:33 +02:00
Nikolay	5ee344824f	lib/promscrape: adds filter for consul_sd_configs: (#4184 ) * lib/promscrape: adds filter for consul_sd_configs: it allows advanced filtering for consul service discovery requests https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4183 * typo fix * removes deprecation mentions since it's not relevant * Update docs/CHANGELOG.md Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-04-26 19:16:27 +02:00
Aliaksandr Valialkin	f7ef80aaad	.golangci.yml: properly enable `revive` linter and fix all the warnings it detects	2023-02-26 12:18:59 -08:00
Aliaksandr Valialkin	e688121de8	lib/promscrape/discovery/kuma: substitute blocking HTTP call with non-blocking HTTP call at discoveryutils.Client	2023-02-23 15:13:08 -08:00
Mattias Ängehov	6d019a3c37	Azure Service Discovery - Fix token fetch for Container Apps/App Services (#3832) * Modify API version when running in Container App * Handle `expires_on` from token response The response from IMDS does not always contain the `expires_in` value, which is currently used to get the token expiry time. Examples of resources that don't provide it are Container Apps and App Service. Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com> * Fix client id parameter for user assigned identity * Apply suggestions from code review --------- Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com> Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>	2023-02-22 19:19:53 -08:00
Aliaksandr Valialkin	510f78a96b	all: consistently use http.Method{Get,Post,Put} across the codebase This is a follow-up after `9dec3c8f80`	2023-02-22 18:58:46 -08:00
my-git9	9dec3c8f80	chore: Use http constants to replace numbers (#3846 ) Signed-off-by: xin.li <xin.li@daocloud.io>	2023-02-22 18:53:05 -08:00
Aliaksandr Valialkin	9fbd45a22f	lib/promscrape/discovery/kuma: follow-up for `317fef95f9` - Do not generate __meta_server label, since it is unavailable in Prometheus. - Add a link to https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs to docs/CHANGELOG.md, so users could click it and read the docs without the need to search the corresponding docs. - Remove kumaTarget struct, since it is easier generating labels for discovered targets directly from the response returned by Kuma. This simplifies the code. - Store the generated labels for discovered targets inside atomic.Value. This allows reading them from concurrent goroutines without the need to use a mutex. - Use synchronous requests to Kuma instead of long polling, since there is little sense in the long polling when the Kuma server may return 304 Not Modified response every -promscrape.kumaSDCheckInterval. - Remove -promscrape.kuma.waitTime command-line flag, since it is no longer needed when long polling isn't used. - Set default value for -promscrape.kumaSDCheckInterval to 30s in order to be consistent with Prometheus. - Remove unnecessary indirections for string literals, which are used only once, in order to improve code readability. - Remove unused fields from discoveryRequest and discoveryResponse. - Update tests. - Document why fetch_timeout and refresh_interval options are missing in kuma_sd_config. - Add docs to discoveryutils.RequestCallback and discoveryutils.ResponseCallback, since these are public types. Side notes: it is weird that Prometheus implementation for kuma_sd_configs sets `instance` label, since usually this label is set by the Prometheus itself to __address__ after the relabeling phase. 
See https://www.robustperception.io/life-of-a-label/ Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3389 See https://github.com/prometheus/prometheus/issues/7919 and https://github.com/prometheus/prometheus/pull/8844 as a reference implementation in Prometheus	2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin	eb08579452	lib/promscrape/discovery: add a comment explaining why duplicates are removed from the generated target labels	2023-02-22 17:51:51 -08:00
Alexander Marshalov	317fef95f9	add kuma_sd_config for Kuma Control Plane targets discovery (#3389 ) (#3840 )	2023-02-22 13:59:56 +01:00
Oleksandr Redko	9fff48c3e3	app,lib: fix typos in comments (#3804 )	2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin	f9b3409ee3	lib/promscrape/discovery/openstack: use port 80 for the discovered target by default if it isn't specified in the config	2023-02-11 14:41:58 -08:00


280 commits