github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	15dda54e79	lib/promscrape/discovery/kubernetes: propagate possible errors at newAPIWatcher() to the caller This allows substituting FATAL panics with recoverable runtime errors such as missing or invalid TLS CA file and/or missing/invalid /var/run/secrets/kubernetes.io/serviceaccount/namespace file. Now these errors are logged instead of PANIC'ing, so they can be fixed by updating the corresponding files without the need to restart vmagent. This is a follow-up for `90427abc65` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5243	2023-10-27 20:27:58 +02:00
Hui Wang	a37125d043	lib/promscrape/discovery/kubernetes: avoid possible panic if the caFile given under kubernetes.SDConfig.HTTPClientConfig does not exist (#5243) follow up `d5a599badc`	2023-10-27 20:27:58 +02:00
Aliaksandr Valialkin	20aeb8b65d	lib/promscrape/discovery/kubernetes: stop all the url watchers, which belong to a particular groupWatcher, at once Previously url watchers for pod, service and node objects could be mistakenly closed when service discovery was set up only for endpoints and endpointslice roles, since watchers for these roles may start pod, service and node url watchers with nil apiWatcher passed to groupWatcher.startWatchersForRole(). Now all the url watchers, which belong to a particular groupWatcher, are stopped at once when this groupWatcher has no apiWatcher subscribers. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5216 The issue has been introduced in v1.93.5 when addressing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850	2023-10-27 14:34:25 +02:00
Hui Wang	69f4a58f76	do not print redundant error logs when failed to scrape consul or no… (#5239 ) * do not print redundant error logs when failed to scrape consul or nomad target prometheus performs the same because it uses consul lib which just drops the error(`1806bcb38c/api/api.go (L1134)`)	2023-10-27 14:18:47 +02:00
Aliaksandr Valialkin	8fbe5a0893	lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4	2023-10-27 14:06:49 +02:00
Dima Lazerka	1e48ad486e	Revert "lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4" It broke CI (lint) This reverts commit `5464376d16`.	2023-10-27 14:06:31 +02:00
Aliaksandr Valialkin	46dd504d81	lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4	2023-10-26 09:56:55 +02:00
Aliaksandr Valialkin	af6dc9c963	lib/promscrape: properly track the number of updated service discovery routines inside Config.mustRestart() This is a follow-up for `d5a599badc`	2023-10-26 09:56:36 +02:00
Aliaksandr Valialkin	f03e81c693	lib/promauth: follow-up for `e16d3f5639` - Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup don't prevent from processing the corresponding scrape targets after the file becomes correct, without the need to restart vmagent. Previously scrape targets with invalid TLS CA file or TLS client certificate files were permanently dropped after the first attempt to initialize them, and they didn't appear until the next vmagent reload or the next change in other places of the loaded scrape configs. - Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent. Previously the old TLS CA was used until vmagent restart. - Properly handle errors during http request creation for the second attempt to send data to remote system at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing, since the returned request is nil on error. - Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent. Previously it could miss details on the source of the request. - Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header of every http request issued by vmagent during service discovery or target scraping. Re-use the HTTP client instead until the corresponding scrape config changes. - Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached, e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server when auth header cannot be obtained because of temporary error. - Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config. Cache the loaded certificate and the error for one second. This should significantly reduce CPU load when scraping big number of targets with the same tls_config. 
- Allow loading TLS certificates from HTTP and HTTPS urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`. - Improve test coverage at lib/promauth - Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later. Previously vmagent was exiting in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959	2023-10-26 09:55:47 +02:00
Aliaksandr Valialkin	8c9e3b7b50	lib/promscrape/discovery/kubernetes/kubeconfig_test.go: make TestParseKubeConfigSuccess test code easier to follow	2023-10-26 09:54:40 +02:00
Aliaksandr Valialkin	02684a0b29	lib/promauth: properly parse string contents for ca, cert and key fields at tls_config Previously yaml parser wasn't accepting string values for these fields, because it was mistakenly expecting a list of uint8 values instead.	2023-10-26 09:54:18 +02:00
Aliaksandr Valialkin	194deeea1b	lib/promscrape: move duplicate code from functions, which collect ScrapeWork lists for distinct SD types into Config.getScrapeWorkGeneric() This removes more than 200 lines of duplicate code	2023-10-26 09:53:59 +02:00
Aliaksandr Valialkin	36a1fdca6c	all: consistently use %w instead of %s in when error is passed to fmt.Errorf() This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.	2023-10-26 09:44:40 +02:00
Hui Wang	d7dd7614eb	fix inconsistent behaviors with prometheus when scraping (#5153 ) * fix inconsistent behaviors with prometheus when scraping 1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting; 2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously will print error logs and do scrape without auth header; 3. don't send requests if there are wrong auth configs in: 1. vmagent remoteWrite; 2. vmalert datasource/remoteRead/remoteWrite/notifier. * add changelogs * address review comments * fix ut	2023-10-26 08:56:54 +02:00
Aliaksandr Valialkin	2a0f77aaf7	lib/promscrape: add a link to https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets in descriptions for -promscrape.cluster.* command-line flags This should help users figuring out the purpose of -promscrape.cluster.* command-line flags	2023-10-16 14:47:38 +02:00
Roman Khavronenko	1f2cb594d9	lib/promscrape: make concurrency control optional (#5073) * lib/promscrape: make concurrency control optional Before, `-maxConcurrentInserts` was limiting all calls to `promscrape.Parse` function: during ingestion and scraping. This behavior is incorrect. Cmd-line flag `-maxConcurrentInserts` should have effect only on ingestion. Since both pipelines use the same `promscrape.Parse` function, we extend it to make concurrency limiter optional. So caller can decide whether concurrency should be limited or not. This commit makes `c53b5788b4` obsolete. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Revert "dashboards: move `Concurrent inserts` panel to Troubleshooting section" This reverts commit `c53b5788b4`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-02 21:34:41 +02:00
Aliaksandr Valialkin	d80ccf52a0	Revert "lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total` (#5074 )" This reverts commit `74301cdbf5`. Reason for revert: vmagent already provides better approach for detecting slow scrape targets via the following query: scrape_duration_seconds / scrape_timeout_seconds > 1 This query depends on automatically generated per-target metrics. See https://docs.victoriametrics.com/vmagent.html#automatically-generated-metrics for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5074	2023-10-02 21:08:13 +02:00
Roman Khavronenko	0df0b0f29e	lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total` (#5074 ) * lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total` add metric `vm_promscrape_scrapes_skipped_total`to show whether vmagent skips the scrapes. This could happen if vmagent is overloaded or target is responding too slow for configured `scrape_interval`. The follow-up commit should add a corresponding alerting rule and panel to vmagent dashboard. Signed-off-by: hagen1778 <roman@victoriametrics.com> * deployment/docker: add `TooManyScrapeSkips` alerting rule for vmagent Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards: add panels `Scrape duration 0.99 quantile` and `Skipped scrapes` to vmagent dashboard Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-02 20:38:23 +02:00
Zakhar Bessarab	0be8960875	lib/promscrape/discovery/kubernetes: suppress context.Canceled error in logs (#5048) lib/promscrape/discovery/kubernetes: suppress context.Canceled error in logs It is possible that context.Canceled will appear after k8s watcher was closed due to reload (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850). Logging an error misinforms user and looks like vmagent discovery will stop working even though this does not affect discovery. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> (cherry picked from commit `8d99c12a7d`)	2023-09-22 13:02:57 +02:00
Aliaksandr Valialkin	de2b3ff9b0	lib/promscrape/discovery/kubernetes: follow-up after `03fece44e0` - Properly update vm_promscrape_discovery_kubernetes_url_watchers and vm_promscrape_discovery_kubernetes_group_watchers metrics after config changes - Properly stop goroutine responsible for recreating scrapeWorks after the corresponding urlWatcher is stopped - Log the event when urlWatcher is stopped in order to simplify debugging Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861	2023-09-19 00:44:21 +02:00
Aliaksandr Valialkin	705b31c351	lib/promscrape/discovery/kubernetes: wait for 10 seconds before checking whether the urlWatcher must be stopped This should prevent from excess urlWatcher churn on config reload, since it leads to removal of all the apiWatchers before creating new apiWatchers. So, every config reload would lead to stopping of all the previous urlWatchers and starting new urlWatchers. The new logic gives 10 seconds for config reload before stopping unused urlWatchers. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861	2023-09-19 00:43:26 +02:00
Aliaksandr Valialkin	fe24523e19	lib/promscrape/discovery/kubernetes: follow-up after `eeb862f3ff` - Move the bugfix description to the correct place in docs/CHANGELOG.md - Prevent from logging of 'context canceled' errors after the url watcher is stopped, since these errors are expected and may confuse users. - Remove unused urlWatcher.refCount field. - Remove unused urlWatcher.close() method. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850	2023-09-19 00:41:29 +02:00
Zakhar Bessarab	55d25fb844	lib/promscrape/discovery/kubernetes: fix leaking api watcher (#4861) * lib/promscrape/discovery/kubernetes: fix leaking api watcher goroutine which was polling k8s API had no execution control. This led to leaking goroutines during config reload. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: use reference counting for urlWatcher cleanup Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: remove waitgroup sync for goroutines polling API server This is unnecessary since the context is cancelled and new requests will not be sent. Also, using waitgroup will increase time required to perform reload which might result in missed scrapes. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: clarify comment Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * Apply suggestions from code review * lib/promscrape/discovery/kubernetes: address review feedback Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-09-18 17:13:55 +02:00
Aliaksandr Valialkin	d8afd7fe98	Makefile: update golangci-lint from v1.51.2 to v1.54.2 See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2	2023-09-01 10:25:49 +02:00
Nikolay	ae85b20c5b	lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch … (#4901 ) * lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch requests it must reduce load for kubernetes ETCD servers. Since requests without resourceVersion performs force cache sync at kubernetes API server with ETCD more info at https://kubernetes.io/docs/reference/using-api/api-concepts/\#semantics-for-watch https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4855 * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-08-30 16:04:14 +02:00
hagen1778	d70b346623	lib/promscrape: follow-up after `eabcfc9bcd` `-promscrape.cluster.membersCount` by default should be `1`, like every single vmagent is a cluster of one member on its own. The change additionally validates that user can't set `-promscrape.cluster.membersCount` to value lower than `1`. `eabcfc9bcd` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-08-29 11:18:12 +02:00
Haleygo	685c3d95e7	fix clusterMembersCount check (#4900 )	2023-08-29 11:15:50 +02:00
Zakhar Bessarab	46e86add2f	lib/promscrape/client: sync timeout for HostClient and http.Client (#4889) Initially, stream parse mode was reading data from response and parsing it on the fly. This was causing longer delay to read the whole response and required increasing timeout value to allow data processing while reading. So that `908e35affd` increased timeout value to fix this. But after `74c00a8762` response in stream parse mode is saved into memory and then parsed, eliminating the necessity of having a timeout value higher than for a usual scrape. Updates: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4847 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> (cherry picked from commit `6e8611f301`)	2023-08-27 09:06:00 +02:00
hagen1778	b18e9b5bb0	app/vmagent: follow-up after `6788704152` https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4884 Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `4ebe8bb1d5`)	2023-08-27 09:05:22 +02:00
Zakhar Bessarab	1242460fa6	lib/promscrape/client: make User-Agent consistent between fasthttp and native client (#4886 ) User agent was not set for native client which resulted in using one provided by Golang. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4884 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> (cherry picked from commit `6788704152`)	2023-08-27 09:05:08 +02:00
Aliaksandr Valialkin	c813b5e4b1	lib/promscrape: add -promscrape.cluster.memberLabel command-line flag This flag allows specifying an additional label to add to all the scraped metrics. The flag must contain label name to add. The label value will be equal to -promscrape.cluster.memberNum. This functionality can help when there is a need to differentiate metrics scraped by distinct vmagent instances in the cluster according to https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247 See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247#issuecomment-1692279393	2023-08-24 22:04:34 +02:00
Aliaksandr Valialkin	3e62c71e8c	lib/promscrape: add a comment why `honor_timestamps` is set to false by default This should prevent from returning it back to true in the future Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697	2023-07-28 21:36:55 -07:00
Aliaksandr Valialkin	ee98f9ae66	lib/promscrape: use local scrape timestamp for scraped metrics unless `honor_timestamps: true` is set explicitly This fixes the case with gaps for metrics collected from cadvisor, which exports invalid timestamps, which break staleness detection at VictoriaMetrics side. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697 , https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1654614799 and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1656540535 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1773	2023-07-28 21:11:46 -07:00
Aliaksandr Valialkin	1f30f53df2	lib/promscrape/discovery: close unused HTTP connections to service discovery servers This should prevent from connection leaks See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4724	2023-07-27 14:47:55 -07:00
Aliaksandr Valialkin	324a3c5288	lib/promscrape: follow-up after `6aa50ca954` - Improve docs - Hide `debug relabeling` column when -promscrape.dropOriginalLabels command-line flag is set - Inline the code from the added template functions, since the code is harder to follow with the template functions, especially when these functions have misleading names. Also, these functions are used only in one place, i.e. they do not reduce the amount of code. - Hide `click to show original labels` title at `labels` column when original labels aren't available. - Show the reason why original labels aren't available at /service-discovery page. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4597	2023-07-20 21:54:09 -07:00
Dmytro Kozlov	f0d8f77e6d	app/vmagent: fix creating target id if `--promscrape.dropOriginalLabels` flag was used (#4616 ) * app/vmagent: fix creating target id if `--promscrape.dropOriginalLabels` flag was used * app/vmagent: hide links if OriginalLabels was dropped * app/vmagent: update CHANGELOG.md and added information to the docs * app/vmagent: fix comments	2023-07-20 19:21:41 -07:00
Aliaksandr Valialkin	992c300ce9	all: replace atomic.Value with atomic.Pointer[T] This eliminates the need in .(*T) casting for results obtained from Load() Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map, because map is already a pointer type.	2023-07-19 17:48:26 -07:00
Alexander Marshalov	4084dba9e4	fixed service name detection for consulagent service discovery in case of a difference in service name and service id (#4390 ) (#4439 ) Signed-off-by: Alexander Marshalov <_@marshalov.org>	2023-07-06 16:53:29 -07:00
Aliaksandr Valialkin	5b8095a30a	lib/promscrape: disable support for service discovery and metrics scrape via http2 Reasons for disabling http2: - http2 is used very rarely comparing to http for Prometheus metrics exposition and service discovery - http2 is much harder to debug than http - http2 has very bad security record because of its complexity - see https://portswigger.net/research/http2 VictoriaMetrics components are compiled with nethttpomithttp2 tag because of these issues. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4274 This is a follow-up for `72c3cd47eb`	2023-07-06 16:04:31 -07:00
Aliaksandr Valialkin	6a3cee5c2c	lib/promscrape/discoveryutils: re-use checkRedirect function for both client and blockingClient Also document follow_redirects option at https://docs.victoriametrics.com/sd_configs.html#http-api-client-options This is a follow-up for `b3d0ff463a` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4282	2023-07-06 10:52:13 -07:00
Roman Khavronenko	d677c2a5a6	lib/promscrape/discoveryutils: properly check for net.ErrClosed (#4426 ) This error may be wrapped in another error, and should normally be tested using `errors.Is(err, net.ErrClosed)`. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `dfe53a36fc`)	2023-06-09 10:41:07 +02:00
Roman Khavronenko	fb9b8f6b1b	app/vmagent: mention `enable_http2` in changelog (#4403 ) Follow-up after `72c3cd47eb` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `3305a6901c`)	2023-06-09 10:40:24 +02:00
Haleygo	6edf94c4b9	vmagent:scrape config support enable_http2 (#4295 ) app/vmagent: support `enable_http2` in scrape config This change adds HTTP2 support for scrape config and improves compatibility with Prometheus config. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283 (cherry picked from commit `72c3cd47eb`)	2023-06-09 10:40:17 +02:00
Haleygo	73a8f763a0	vmagent:support follow_redirects on SD level (#4286 ) * vmagent:support follow_redirects on SD level * fix follow_redirects on sd level https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4282 (cherry picked from commit `b3d0ff463a`)	2023-06-02 13:19:35 +02:00
Alexander Marshalov	d321ea91f2	fixed typos in documentation and commandline flags descriptions (#4275 )	2023-05-10 02:22:06 -07:00
Aliaksandr Valialkin	a47b9e55ac	lib/promscrape/discovery/consulagent: substitute metaPrefix with the `__meta_consulagent_` plaintext string This simplifies future code navigation and search for the specific meta-label starting from __meta_consulagent_* prefix. For example, `grep __meta_consulagent_namespace` finds the exact place where this label is defined. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3953 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4217	2023-05-09 22:58:08 -07:00
Aliaksandr Valialkin	e2358d3bd5	docs: clarify docs after `5ee344824f` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4183	2023-05-09 22:49:13 -07:00
Aliaksandr Valialkin	8703b2fa87	app/vmselect: small cleanup after `4f3f9950d0` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3807	2023-05-09 22:45:02 -07:00
Alexander Marshalov	de68e94c91	fixed `vm_promscrape_config_last_reload_successful` metric value recovery after successful reloading with unchanged content (#4260 ) (#4268 ) Signed-off-by: Alexander Marshalov <_@marshalov.org>	2023-05-09 22:17:27 -07:00
Zakhar Bessarab	370a421ef4	lib/promscrape/discovery/kubernetes: follow-up for `d5e94721db` (#4255 ) - add changelog reference to an author - fix tests - add metadata to match Prometheus behavior Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-05-09 21:29:27 -07:00

632 commits