github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	c2373a8109	lib/promscrape: fix BenchmarkScrapeWorkScrapeInternal, which has been broken by the commit `65bc460323`	2024-01-30 16:06:06 +02:00
Aliaksandr Valialkin	ac5b740750	lib/promscrape/discovery/kubernetes: typo fix in the comment for ContainerStateTerminated struct This is a follow-up for `ef12598ad4`	2024-01-24 15:06:46 +02:00
Aliaksandr Valialkin	ef12598ad4	lib/promscrape/discovery/kubernetes: do not generate targets for already terminated pods and containers Already terminated pods and containers cannot be scraped and will never resurrect, so there is zero sense in creating scrape targets for them.	2024-01-24 14:57:53 +02:00
Roman Khavronenko	89e3c70ccd	lib/promscrape: respect `0` value for `series_limit` param (#5663 ) * lib/promscrape: respect `0` value for `series_limit` param Respect `0` value for `series_limit` param in `scrape_config` even if global limit was set via `-promscrape.seriesLimitPerTarget`. Previously, `0` value will be ignored in favor of `-promscrape.seriesLimitPerTarget`. This behavior aligns with possibility to override `series_limit` value via relabeling with `__series_limit__` label. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Update docs/CHANGELOG.md --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-23 13:09:14 +02:00
Aliaksandr Valialkin	3449d563bd	all: add up to 10% random jitter to the interval between periodic tasks performed by various components This should smooth CPU and RAM usage spikes related to these periodic tasks, by reducing the probability that multiple concurrent periodic tasks are performed at the same time.	2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin	d3ee3e0ef5	Revert "lib/promscrape: do not store last scrape response when stale markers … (#5577 )" This reverts commit `cfec258803`. Reason for revert: the original code already doesn't store the last scrape response when stale markers are disabled. The scrapeWork.areIdenticalSeries() function always returns true is stale markers are disabled. This prevents from storing the last response at scrapeWork.processScrapedData(). It looks like the reverted commit could also return back the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5577	2024-01-22 00:43:48 +02:00
Hui Wang	4e3242b02d	lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557 ) * lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long. * remove mislead comment * docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640 * wip * lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls groupWatcher is stopped if it has zero registered apiWatchers during 14 seconds. But such a groupWatcher can be still in use if apiWatcher for `role: endpoints` or `role: endpointslice` is being registered and the discovery of the associated `pod` and/or `service` objects takes longer than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details. Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher if the number of in-flight calls is non-zero. P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets. * typo fix --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-21 23:13:15 +02:00
Aliaksandr Valialkin	1f105dde98	all: allow dynamically reading *AuthKey flag values from files and urls Examples: 1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath 2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath 3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url The flag value is automatically updated when the file contents changes.	2024-01-21 22:03:38 +02:00
Aliaksandr Valialkin	4eb9926125	lib/promscrape: code cleanup: send stale markers immediately after generating automatic metrics This cleanup has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5557/files#diff-6b205cf6637d7b65a5c45d9417d08822d4efad94227268cb196f61aa2a0fc0f7	2024-01-21 05:18:22 +02:00
Aliaksandr Valialkin	12f2c5679b	all: consistently clear prompbmarshal.Label by assigning an empty struct instead of zeroing Name and Value individually	2024-01-21 05:11:05 +02:00
Aliaksandr Valialkin	7fba73ce11	lib/promscrape/discovery/kubernetes: add -promscrape.kubernetes.attachNodeMetadataAll command-line flag This flag allows setting attach_metadata.node=true for all the kubernetes_sd_configs defined at -promscrape.config Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640 Thanks to wasim-nihal for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5593	2024-01-21 03:13:56 +02:00
Aliaksandr Valialkin	74448a7e57	lib/promscrape/discovery/hetzner: follow-up after `03a97dc678` - docs/sd_configs.md: moved hetzner_sd_configs docs to the correct place according to alphabetical order of SD names, document missing __meta_hetzner_role label. - lib/promscrape/config.go: added missing MustStop() call for Hetzner SD, and moved the code to the correct place according to alphabetical order of SD names. - lib/promscrape/discovery/hetzner: properly handle pagination for hloud API responses, populate missing __meta_hetzner_role label like Prometheus does. - Properly populate __meta_hetzner_public_ipv6_network label like Prometheus does. - Remove unused SDConfig.Token. - Remove "omitempty" annotation from SDConfig.Role field, since this field is mandatory. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3154	2024-01-20 17:01:53 +02:00
Hui Wang	cfec258803	lib/promscrape: do not store last scrape response when stale markers … (#5577 ) * lib/promscrape: do not store last scrape response when stale markers are disabled * update changelog	2024-01-20 00:53:41 +08:00
Aliaksandr Valialkin	41932db848	lib/promscrape: cosmetic changes after `3ac44baebe` - Rename mustLoadScrapeConfigFiles() to loadScrapeConfigFiles(), since now it may return error. - Split too long line with the error message into two lines in order to improve readability a bit. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5508 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5560	2024-01-16 22:29:09 +02:00
Hui Wang	3ac44baebe	exit vmagent if there is config syntax error in `scrape_config_files` when `-promscrape.config.strictParse=true` (#5560 )	2024-01-16 17:30:02 +08:00
Aliaksandr Valialkin	4b42c8abbb	lib/promscrape/discovery/hetzner: fix golangci-lint warnings after `03a97dc678` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550	2024-01-15 17:12:40 +02:00
Aleksandr Stepanov	03a97dc678	vmagent: added hetzner sd config (#5550 ) * added hetzner robot and hetzner cloud sd configs * remove gettoken fun and update docs * Updated CHANGELOG and vmagent docs * Updated CHANGELOG and vmagent docs --------- Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-01-15 10:13:22 +01:00
Aliaksandr Valialkin	d2c94a0663	lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto	2024-01-14 23:04:45 +02:00
Aliaksandr Valialkin	5a88bc973f	all: use Gauge instead of Counter for `*_config_last_reload_successful` metrics This allows exposing the correct TYPE metadata for these labels when the app runs with -metrics.exposeMetadata command-line flag. See https://github.com/VictoriaMetrics/metrics/pull/61#issuecomment-1860085508 for more details. This is follow-up for `326a77c697`	2023-12-20 14:23:42 +02:00
hagen1778	e0fc5ef140	lib/promscrape: comsetic changes after `e373bb84d5` * fix typos in docs * add `shard-` prefix to generated links when `-promscrape.cluster.memberURLTemplate` is enabled Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-12 11:28:18 +01:00
Aliaksandr Valialkin	b05e1512d4	lib/promscrape: add a wraning when the /service-discovery page contains incomplete list of dropped targets	2023-12-08 19:03:51 +02:00
Aliaksandr Valialkin	e373bb84d5	lib/promscrape: add `-promscrape.cluster.memberURLTemplate` command-line flag for creating direct links to vmagent instances at /service-discovery page See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4018#issuecomment-1843811569	2023-12-07 16:04:21 +02:00
Aliaksandr Valialkin	7cb8ed8271	lib/promscrape: show -promscrape.cluster.memberNum values for vmagent instances, which scrape the given dropped target at /service-discovery page The /service-discovery page contains the list of all the discovered targets after the commit `487f6380d0` on all the vmagent instances in cluster mode ( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ). This commit improves debuggability of targets in cluster mode by providing a list of -promscrape.cluster.memberNum values per each target at /service-discovery page, which has been dropped becasue of sharding, e.g. if this target is scraped by other vmagent instances in the cluster. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4018	2023-12-07 00:05:32 +02:00
Aliaksandr Valialkin	67468a0c46	lib/promscrape: show `never scraped` message for never scraped targets at /targets page	2023-12-06 22:33:39 +02:00
Aliaksandr Valialkin	65bc460323	lib/promscrape: follow-up for `97373b7786` Substitute O(N^2) algorithm for exposing the `vm_promscrape_scrape_pool_targets` metric with O(N) algorithm, where N is the number of scrape jobs. The previous algorithm could slow down /metrics exposition significantly when -promscrape.config contains thousands of scrape jobs. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5311 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5335	2023-12-06 17:35:50 +02:00
Hui Wang	97373b7786	vmagent: add `vm_promscrape_scrape_pool_targets` for scrape jobs like… (#5335 ) * vmagent: export `vm_promscrape_scrape_pool_targets` metric to track the number of targets that each scrape_job discovers * add extra panel for new metric	2023-12-06 15:44:39 +08:00
Aliaksandr Valialkin	487f6380d0	lib/promscrape: show dropped targets because of sharding at /service-discovery page Previously the /service-discovery page didn't show targets dropped because of sharding ( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ). Show also the reason why every target is dropped at /service-discovery page. This should improve debuging why particular targets are dropped. While at it, do not remove dropped targets from the list at /service-discovery page until the total number of targets exceeds the limit passed to -promscrape.maxDroppedTargets . Previously the list was cleaned up every 10 minutes from the entries, which weren't updated for the last minute. This could complicate debugging of dropped targets. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389	2023-12-01 16:48:48 +02:00
Aliaksandr Valialkin	5034aa0773	app/vmagent: follow-up for `090cb2c9de` - Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check for the result returned from these functions. - Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to peristent queue. - Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case. - Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage cannot keep up with the data ingestion rate. - Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples. - Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push data to persistent queue when -remoteWrite.disableOnDiskQueue is set. - Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics, because they are replaced with vmagent_remotewrite_samples_dropped_total metric. - Update 'Disabling on-disk persistence' docs at docs/vmagent.md - Update stale comments in the code Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110	2023-11-25 12:09:44 +02:00
Nikolay	090cb2c9de	app/vmagent: allow to disabled on-disk persistence (#5088 ) * app/vmagent: allow to disabled on-disk queue Previously, it wasn't possible to build data processing pipeline with a chain of vmagents. In case when remoteWrite for the last vmagent in the chain wasn't accessible, it persisted data only when it has enough disk capacity. If disk queue is full, it started to silently drop ingested metrics. New flags allows to disable on-disk persistent and immediatly return an error if remoteWrite is not accessible anymore. It blocks any writes and notify client, that data ingestion isn't possible. Main use case for this feature - use external queue such as kafka for data persistence. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110 * adds test, updates readme * apply review suggestions * update docs for vmagent * makes linter happy --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-24 13:42:11 +01:00
Aliaksandr Valialkin	613b545dfd	lib/promscrape/discovery/kubernetes: propagate possible errors at newAPIWatcher() to the caller This allows substituting FATAL panics with recoverable runtime errors such as missing or invalid TLS CA file and/or missing/invalid /var/run/secrets/kubernetes.io/serviceaccount/namespace file. Now these errors are logged instead of PANIC'ing, so they can be fixed by updating the corresponding files without the need to restart vmagent. This is a follow-up for `90427abc65` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5243	2023-10-27 20:24:46 +02:00
Hui Wang	90427abc65	lib/promscrape/discovery/kubernetes: avoid possible panic if given caFile under kubernetes.SDConfig.HTTPClientConfig is not exist (#5243 ) follow up `d5a599badc`	2023-10-27 20:20:22 +02:00
Aliaksandr Valialkin	632d788b63	lib/promscrape/discovery/kubernetes: stop all the url watchers, which belong to a particular groupWatcher, at once Previously url watchers for pod, service and node objects could be mistakenly closed when service discovery was set up only for endpoints and endpointslice roles, since watchers for these roles may start start pod, service and node url watchers with nil apiWatcher passed to groupWatcher.startWatchersForRole(). Now all the url watchers, which belong to a particular groupWatcher, are stopped at once when this groupWatcher has no apiWatcher subscribers. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5216 The issue has been introduced in v1.93.5 when addressing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850	2023-10-27 13:51:35 +02:00
Hui Wang	7c90ce39cb	do not print redundant error logs when failed to scrape consul or no… (#5239 ) * do not print redundant error logs when failed to scrape consul or nomad target prometheus performs the same because it uses consul lib which just drops the error(`1806bcb38c/api/api.go (L1134)`)	2023-10-27 13:31:55 +08:00
Aliaksandr Valialkin	cdbc06a639	lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4	2023-10-25 17:57:56 -07:00
Dima Lazerka	8b41b506c2	Revert "lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4" It broke CI (lint) This reverts commit `5464376d16`.	2023-10-25 16:24:31 -07:00
Aliaksandr Valialkin	5464376d16	lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4	2023-10-26 00:29:51 +02:00
Aliaksandr Valialkin	ac933cc423	lib/promscrape: properly track the number of updated service discovery routines inside Config.mustRestart() This is a follow-up for `d5a599badc`	2023-10-26 00:06:29 +02:00
Aliaksandr Valialkin	d5a599badc	lib/promauth: follow-up for `e16d3f5639` - Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup don't prevent from processing the corresponding scrape targets after the file becomes correct, without the need to restart vmagent. Previously scrape targets with invalid TLS CA file or TLS client certificate files were permanently dropped after the first attempt to initialize them, and they didn't appear until the next vmagent reload or the next change in other places of the loaded scrape configs. - Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent. Previously the old TLS CA was used until vmagent restart. - Properly handle errors during http request creation for the second attempt to send data to remote system at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing, since the returned request is nil on error. - Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent. Previously it could miss details on the source of the request. - Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header of every http request issued by vmagent during service discovery or target scraping. Re-use the HTTP client instead until the corresponding scrape config changes. - Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached, e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server when auth header cannot be obtained because of temporary error. - Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config. Cache the loaded certificate and the error for one second. This should significantly reduce CPU load when scraping big number of targets with the same tls_config. - Allow loading TLS certificates from HTTP and HTTPs urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`. - Improve test coverage at lib/promauth - Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later. Previously vmagent was exitting in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959	2023-10-25 23:19:37 +02:00
Aliaksandr Valialkin	c22e3e7b1d	lib/promscrape/discovery/kubernetes/kubeconfig_test.go: make TestParseKubeConfigSuccess test code easier to follow	2023-10-25 23:17:18 +02:00
Aliaksandr Valialkin	eed5206376	lib/promauth: properly parse string contents for ca, cert and key fields at tls_config Previously yaml parser wasn't accepting string values for these fields, because it was mistakenly expecting a list of uint8 values instead.	2023-10-25 23:12:21 +02:00
Aliaksandr Valialkin	4afcb2a689	lib/promscrape: move duplicate code from functions, which collect ScrapeWork lists for distinct SD types into Config.getScrapeWorkGeneric() This removes more than 200 lines of duplicate code	2023-10-25 23:03:40 +02:00
Aliaksandr Valialkin	42dd71bb63	all: consistently use %w instead of %s in when error is passed to fmt.Errorf() This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.	2023-10-25 21:24:03 +02:00
Hui Wang	e16d3f5639	fix inconsistent behaviors with prometheus when scraping (#5153 ) * fix inconsistent behaviors with prometheus when scraping 1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting; 2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously will print error logs and do scrape without auth header; 3. don't send requests if there are wrong auth configs in: 1. vmagent remoteWrite; 2. vmalert datasource/remoteRead/remoteWrite/notifier. * add changelogs * address review comments * fix ut	2023-10-17 17:58:19 +08:00
Aliaksandr Valialkin	b8c267075e	lib/promscrape: add a link to https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets in descriptions for -promscrape.cluster.* command-line flags This should help users figuring out the purpose of -promscrape.cluster.* command-line flags	2023-10-16 14:46:22 +02:00
Roman Khavronenko	a4bd73ec7e	lib/promscrape: make concurrency control optional (#5073 ) * lib/promscrape: make concurrency control optional Before, `-maxConcurrentInserts` was limiting all calls to `promscrape.Parse` function: during ingestion and scraping. This behavior is incorrect. Cmd-line flag `-maxConcurrentInserts` should have effect onl on ingestion. Since both pipelines use the same `promscrape.Parse` function, we extend it to make concurrency limiter optional. So caller can decide whether concurrency should be limited or not. This commit makes `c53b5788b4` obsolete. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Revert "dashboards: move `Concurrent inserts` panel to Troubleshooting section" This reverts commit `c53b5788b4`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-02 21:32:11 +02:00
Aliaksandr Valialkin	859977d591	Revert "lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total` (#5074 )" This reverts commit `74301cdbf5`. Reason for revert: vmagent already provides better approach for detecting slow scrape targets via the following query: scrape_duration_seconds / scrape_timeout_seconds > 1 This query depends on automatically generated per-target metrics. See https://docs.victoriametrics.com/vmagent.html#automatically-generated-metrics for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5074	2023-10-02 20:59:56 +02:00
Roman Khavronenko	74301cdbf5	lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total` (#5074 ) * lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total` add metric `vm_promscrape_scrapes_skipped_total`to show whether vmagent skips the scrapes. This could happen if vmagent is overloaded or target is responding too slow for configured `scrape_interval`. The follow-up commit should add a corresponding alerting rule and panel to vmagent dashboard. Signed-off-by: hagen1778 <roman@victoriametrics.com> * deployment/docker: add `TooManyScrapeSkips` alerting rule for vmagent Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards: add panels `Scrape duration 0.99 quantile` and `Skipped scrapes` to vmagent dashboard Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-02 17:12:12 +02:00
Zakhar Bessarab	8d99c12a7d	lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs (#5048 ) lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs It is possible that context.Cancelled will appear after k8s watcher was closed due to reload(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850). Logging an error misinforms user and looks like vmagent discovery will stop working even though this does not affect discovery. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-09-22 13:01:33 +02:00
Aliaksandr Valialkin	30a645cd82	lib/promscrape/discovery/kubernetes: follow-up after `03fece44e0` - Properly update vm_promscrape_discovery_kubernetes_url_watchers and vm_promscrape_discovery_kubernetes_group_watchers metrics after config changes - Properly stop goroutine responsible for recreating scrapeWorks after the corresponding urlWatcher is stopped - Log the event when urlWatcher is stopped in order to simplify debugging Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861	2023-09-18 23:23:45 +02:00
Aliaksandr Valialkin	03fece44e0	lib/promscrape/discovery/kubernetes: wait for 10 seconds before checking whether the urlWatcher must be stopped This should prevent from excess urlWatcher churn on config reload, since it leads to removal of all the apiWatchers before creating new apiWatchers. So, every config reload would lead to stopping of all the previous urlWatchers and starting new urlWatchers. The new logic gives 10 seconds for config reload before stopping unused urlWatchers. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861	2023-09-18 17:45:12 +02:00
Aliaksandr Valialkin	76af32d869	lib/promscrape/discovery/kubernetes: follow-up after `eeb862f3ff` - Move the bugfix description to the correct place in docs/CHANGELOG.md - Prevent from logging of 'context canceled' errors after the url watcher is stopped, since these errors are expected and may confuse users. - Remove unused urlWatcher.refCount field. - Remove unused urlWatcher.close() method. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850	2023-09-18 17:06:39 +02:00
Zakhar Bessarab	eeb862f3ff	lib/promscrape/discovery/kubernetes: fix leaking api watcher (#4861 ) * lib/promscrape/discovery/kubernetes: fix leaking api watcher goroutine which was polling k8s API had no execution control. This leaded to leaking goroutines during config reload. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: use reference counting for urlWatcher cleanup Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: remove waitgroup sync for goroutines polling API server This is unnecessary since context will is cancelled and new requests will not be sent. Also, using waitgroup will increase time required to perform reload which might result in missed scrapes. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: clarify comment Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * Apply suggestions from code review * lib/promscrape/discovery/kubernetes: address review feedback Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-09-15 19:40:13 +02:00
Aliaksandr Valialkin	edee262ecc	Makefile: update golangci-lint from v1.51.2 to v1.54.2 See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2	2023-09-01 10:16:42 +02:00
Nikolay	00685b627f	lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch … (#4901 ) * lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch requests it must reduce load for kubernetes ETCD servers. Since requests without resourceVersion performs force cache sync at kubernetes API server with ETCD more info at https://kubernetes.io/docs/reference/using-api/api-concepts/\#semantics-for-watch https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4855 * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-08-30 16:03:41 +02:00
hagen1778	5d848363f0	lib/promscrape: follow-up after `eabcfc9bcd` `-promscrape.cluster.membersCount` by default should be `1`, like every single vmagent is a cluster of one member on its own. The change additionally validates that user can't set `-promscrape.cluster.membersCount` to value lower than `1`. `eabcfc9bcd` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-08-29 10:04:57 +02:00
Haleygo	eabcfc9bcd	fix clusterMembersCount check (#4900 )	2023-08-29 15:58:24 +08:00
Zakhar Bessarab	6e8611f301	lib/promscrape/client: sync timeout for HostClient and http.Client (#4889 ) Initially, stream parse mode was reading data from response and parsing it on flight. This was causing longer delay to read the whole response and required increasing timeout value to allow data processing while reading. So that `908e35affd` increased timeout value to fix this. But after `74c00a8762` response in stream parse mode is saved into memory and then parsed eliminating necessity of having timeout value higher that for usual scrape. Updates: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4847 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-08-25 15:47:11 +02:00
Aliaksandr Valialkin	f1c2508243	lib/promscrape: add -promscrape.cluster.memberLabel command-line flag This flag allows specifying an additional label to add to all the scraped metrics. The flag must contain label name to add. The label value will be equal to -promscrape.cluster.memberNum. This functionality can help when there is a need to differentiate metrics scraped by distinct vmagent instances in the cluster according to https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247 See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247#issuecomment-1692279393	2023-08-24 22:03:54 +02:00
hagen1778	4ebe8bb1d5	app/vmagent: follow-up after `6788704152` https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4884 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-08-24 11:36:42 +02:00
Zakhar Bessarab	6788704152	lib/promscrape/client: make User-Agent consistent between fasthttp and native client (#4886 ) User agent was not set for native client which resulted in using one provided by Golang. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4884 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-08-24 11:31:13 +02:00
Aliaksandr Valialkin	d18ff993e6	lib/promscrape: add a comment why `honor_timestamps` is set to false by default This should prevent from returning it back to true in the future Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697	2023-07-28 21:36:32 -07:00
Aliaksandr Valialkin	e3ef3df938	lib/promscrape: use local scrape timestamp for scraped metrics unless `honor_timestamps: true` is set explicitly This fixes the case with gaps for metrics collected from cadvisor, which exports invalid timestamps, which break staleness detection at VictoriaMetrics side. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697 , https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1654614799 and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1656540535 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1773	2023-07-28 21:11:26 -07:00
Aliaksandr Valialkin	3d73640815	lib/promscrape/discovery: close unused HTTP connections to service discovery servers This should prevent from connection leaks See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4724	2023-07-27 14:48:56 -07:00
Aliaksandr Valialkin	49bd2905fa	lib/promscrape: follow-up after `6aa50ca954` - Improve docs - Hide `debug relabeling` column when -promscrape.dropOriginalLabels command-line flag is set - Inline the code from the added template functions, since the code is harder to follow with the template functions, especially when these functions have misleading names. Also, these functions are used only in one place, e.g. they do not reduce the amounts of code. - Hide `click to show original labels` title at `labels` column when original labels aren't available. - Show the reason on whey original labels aren't available at /service-discovery page. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4597	2023-07-20 19:14:33 -07:00
Dmytro Kozlov	6aa50ca954	app/vmagent: fix creating target id if `--promscrape.dropOriginalLabels` flag was used (#4616 ) * app/vmagent: fix creating target id if `--promscrape.dropOriginalLabels` flag was used * app/vmagent: hide links if OriginalLabels was dropped * app/vmagent: update CHANGELOG.md and added information to the docs * app/vmagent: fix comments	2023-07-20 10:13:39 +02:00
Aliaksandr Valialkin	140e7b6b74	all: replace atomic.Value with atomic.Pointer[T] This eliminates the need in .(*T) casting for results obtained from Load() Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map, because map is already a pointer type.	2023-07-19 17:42:06 -07:00
Aliaksandr Valialkin	8a07621a0c	lib/promscrape: disable support for service discovery and metrics scrape via http2 Reasons for disabling http2: - http2 is used very rarely comparing to http for Prometheus metrics exposition and service discovery - http2 is much harder to debug than http - http2 has very bad security record because of its complexity - see https://portswigger.net/research/http2 VictoriaMetrics components are compiled with nethttpomithttp2 tag because of these issues. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4274 This is a follow-up for `72c3cd47eb`	2023-07-06 16:03:37 -07:00
Aliaksandr Valialkin	792860db10	lib/promscrape/discoveryutils: re-use checkRedirect function for both client and blockingClient Also document follow_redirects option at https://docs.victoriametrics.com/sd_configs.html#http-api-client-options This is a follow-up for `b3d0ff463a` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4282	2023-07-06 10:51:33 -07:00
Alexander Marshalov	40d12be607	fixed service name detection for consulagent service discovery in case of a difference in service name and service id (#4390 ) (#4439 ) Signed-off-by: Alexander Marshalov <_@marshalov.org>	2023-06-12 16:16:43 +02:00
Roman Khavronenko	dfe53a36fc	lib/promscrape/discoveryutils: properly check for net.ErrClosed (#4426 ) This error may be wrapped in another error, and should normally be tested using `errors.Is(err, net.ErrClosed)`. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-09 09:26:33 +02:00
Roman Khavronenko	3305a6901c	app/vmagent: mention `enable_http2` in changelog (#4403 ) Follow-up after `72c3cd47eb` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-05 16:31:58 +02:00
Haleygo	72c3cd47eb	vmagent:scrape config support enable_http2 (#4295 ) app/vmagent: support `enable_http2` in scrape config This change adds HTTP2 support for scrape config and improves compatibility with Prometheus config. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283	2023-06-05 15:56:49 +02:00
Haleygo	b3d0ff463a	vmagent:support follow_redirects on SD level (#4286 ) * vmagent:support follow_redirects on SD level * fix follow_redirects on sd level https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4282	2023-05-26 09:39:45 +02:00
Alexander Marshalov	2e494e2375	fixed typos in documentation and commandline flags descriptions (#4275 )	2023-05-10 09:50:41 +02:00
Aliaksandr Valialkin	b9bb64ce55	lib/promscrape/discovery/consulagent: substitute metaPrefix with the `__meta_consulagent_` plaintext string This simplifies future code navigation and search for the specific meta-label starting from __meta_consulagent_* prefix. For example, `grep __meta_consulagent_namespace` finds the exact place where this label is defined. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3953 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4217	2023-05-08 23:40:13 -07:00
Aliaksandr Valialkin	74155afb71	docs: clarify docs after `5ee344824f` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4183	2023-05-08 16:11:44 -07:00
Aliaksandr Valialkin	ec3943d14a	app/vmselect: small cleanup after `4f3f9950d0` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3807	2023-05-08 14:57:11 -07:00
Alexander Marshalov	8225a48b56	fixed `vm_promscrape_config_last_reload_successful` metric value recovery after successful reloading with unchanged content (#4260 ) (#4268 ) Signed-off-by: Alexander Marshalov <_@marshalov.org>	2023-05-08 13:32:51 +02:00
Zakhar Bessarab	4e71003620	lib/promscrape/discovery/kubernetes: follow-up for `d5e94721db` (#4255 ) - add changelog reference to an author - fix tests - add metadata to match Prometheus behavior Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-05-05 14:41:17 +02:00
Vasilchenko Anton	22e65402af	Add endpoint labels for pod targets discovered form endpoint but has different ports (#4253 ) Signed-off-by: Vasilchenko Anton <vasilchenko-as@yandex.ru>	2023-05-05 15:46:07 +04:00
Alexander Marshalov	56b84140a9	added new consulagent service discovery (#3953 ) (#4217 )	2023-05-04 11:36:21 +02:00
Zakhar Bessarab	bf3b6732bd	lib/promscrape/discovery/kubernetes: add common labels to all ports discovered from endpoints (#4235 ) * lib/promscrape/discovery/kubernetes: add common labels to all ports discovered from endpoints Sets `__meta_kubernetes_endpoints_name` and `__meta_kubernetes_namespace` labels to all ports of pod. Prometheus sets those labels to all ports in pod (`0ab9553611/discovery/kubernetes/endpoints.go (L267C15-L269)`) even if port is not matching any service. See: #4154 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape/discovery/kubernetes: fix test for updated discovery logic Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-05-03 02:17:33 +02:00
Nikolay	5ee344824f	lib/promscrape: adds filter for consul_sd_configs: (#4184 ) * lib/promscrape: adds filter for consul_sd_configs: it allows advanced filtering for consul service discovery requests https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4183 * typo fix * removes deprecation mentions since it's not relevant * Update docs/CHANGELOG.md Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-04-26 19:16:27 +02:00
Yury Molodov	4f3f9950d0	vmui: add metric relabel debug (#3889 ) * feat: add metric relabel debug (#3807) * fix: add link to relabeling cookbook * lib/promrelabel: merge, fix conflicts * lib/promrelabel: fix diff * docs/vmui: add metric relabel playground --------- Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>	2023-04-26 11:53:29 +03:00
Aliaksandr Valialkin	f638496298	lib/promscrape: do not re-use previously loaded scrape targets on failed attempt to load updated scrape targets at file_sd_configs The logic employed for re-using the previously loaded scrape target was broken initially. The commit `cc0427897c` tried to fix it, but the new logic became too complex and fragile. So it is better to just remove this logic, since the targets from temporarily broken file should be eventually loaded on next attempts every -promscrape.fileSDCheckInterval This also allows removing fragile hacks around __vm_filepath label. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3989	2023-04-02 21:05:28 -07:00
Dmytro Kozlov	cc0427897c	lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read (#4027 ) * lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read * lib/promscrape: clarified comment * lib/promscrape: made better approach to handle a problem with growing []ScrapeWork on each error when loading config lib/promscrape: added CHANGELOG.md * Update docs/CHANGELOG.md --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-04-02 20:26:13 -07:00
Aliaksandr Valialkin	d577657fb7	lib/streamaggr: follow-up for `ff72ca14b9` - Make sure that the last successfully loaded config is used on hot-reload failure - Properly cleanup resources occupied by already initialized aggregators when the current aggregator fails to be initialized - Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config This should simplify monitoring and debugging failed reloads - Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa could be in use at realoadSaConfig() - Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push(). - Remove fine-grained aggregator reload - reload all the aggregators on config change instead. This simplifies the code a bit. The fine-grained aggregator reload may be returned back if there will be demand from real users for it. - Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag - Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639	2023-03-31 22:30:38 -07:00
Dmytro Kozlov	e79cd24807	lib/promrelabel: make target url from labels on target relabel page (#3882 ) * lib/promrelabel: make target url from labels on target relabel page * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-03-20 22:07:52 -07:00
Haleygo	d056be710b	fix some typo (#3898 )	2023-03-03 11:02:13 +01:00
Zakhar Bessarab	5fadd58cf6	lib/promscrape: correctly register `vm_promscrape_config_` metrics (#3876 ) lib/promscrape: set `vm_promscrape_config_last_reload_successful` to 1 if there was no promscrape config provided Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape: register `vm_promscrape_config_*` metrics only in case promscrape config is used Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-02-27 11:53:53 -08:00
Aliaksandr Valialkin	f7ef80aaad	.golangci.yml: properly enable `revive` linter and fix all the warnings it detects	2023-02-26 12:18:59 -08:00
Aliaksandr Valialkin	c6ad3692ad	lib/promscrape: follow-up for `43e104a83f` - Return immediately on context cancel during the backoff sleep. This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747 - Add a comment describing why the second attempt to obtain the response from remote side is perfromed immediately after the first attempt. - Remove fasthttp dependency from lib/promscrape/discoveryutils - Set context deadline before calling doRequestWithPossibleRetry(). This simplifies the doRequestWithPossibleRetry() a bit. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3293	2023-02-24 12:20:42 -08:00
Zakhar Bessarab	43e104a83f	fix: do not use exponential backoff for first retry of scrape request (#3824 ) * fix: do not use exponential backoff for first retry of scrape request (#3293) * lib/promscrape: refactor `doRequestWithPossibleRetry` backoff to simplify logic Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * Update lib/promscrape/client.go Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> * lib/promscrape: refactor `doRequestWithPossibleRetry` to make it more straightforward Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-02-24 11:39:56 -08:00
Aliaksandr Valialkin	e688121de8	lib/promscrape/discovery/kuma: substitute blocking HTTP call with non-blocking HTTP call at discoveryutils.Client	2023-02-23 15:13:08 -08:00
Mattias Ängehov	6d019a3c37	Azure Service Discovery - Fix token fetch for Container Apps/App Services (#3832 ) * Modify API version when running in Container App * Handle expires on from token response Response from IMDS does not always contain expires in value which is currently used to get the token expiry time. An example resources that doesn't provide it are Container Apps and App Service. Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com> * Fix client id parameter for user assigned identity * Apply suggestions from code review --------- Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com> Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>	2023-02-22 19:19:53 -08:00
Aliaksandr Valialkin	510f78a96b	all: consistently use http.Method{Get,Post,Put} across the codebase This is a follow-up after `9dec3c8f80`	2023-02-22 18:58:46 -08:00
my-git9	9dec3c8f80	chore: Use http constants to replace numbers (#3846 ) Signed-off-by: xin.li <xin.li@daocloud.io>	2023-02-22 18:53:05 -08:00
Aliaksandr Valialkin	9fbd45a22f	lib/promscrape/discovery/kuma: follow-up for `317fef95f9` - Do not generate __meta_server label, since it is unavailable in Prometheus. - Add a link to https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs to docs/CHANGELOG.md, so users could click it and read the docs without the need to search the corresponding docs. - Remove kumaTarget struct, since it is easier generating labels for discovered targets directly from the response returned by Kuma. This simplifies the code. - Store the generated labels for discovered targets inside atomic.Value. This allows reading them from concurrent goroutines without the need to use mutex. - Use synchronouse requests to Kuma instead of long polling, since there is a little sense in the long polling when the Kuma server may return 304 Not Modified response every -promscrape.kumaSDCheckInterval. - Remove -promscrape.kuma.waitTime command-line flag, since it is no longer needed when long polling isn't used. - Set default value for -promscrape.kumaSDCheckInterval to 30s in order to be consistent with Prometheus. - Remove unnecessary indirections for string literals, which are used only once, in order to improve code readability. - Remove unused fields from discoveryRequest and discoveryResponse. - Update tests. - Document why fetch_timeout and refresh_interval options are missing in kuma_sd_config. - Add docs to discoveryutils.RequestCallback and discoveryutils.ResponseCallback, since these are public types. Side notes: it is weird that Prometheus implementation for kuma_sd_configs sets `instance` label, since usually this label is set by the Prometheus itself to __address__ after the relabeling phase. See https://www.robustperception.io/life-of-a-label/ Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3389 See https://github.com/prometheus/prometheus/issues/7919 and https://github.com/prometheus/prometheus/pull/8844 as a reference implementation in Prometheus	2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin	eb08579452	lib/promscrape/discovery: add a comment explaining why duplicates are removed from the generated target labels	2023-02-22 17:51:51 -08:00
Zakhar Bessarab	d2b92d3264	lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (#3853 ) * lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (see #3747) Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape: fix order of params for `doRequestWithPossibleRetry` to follow codestyle Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promscrape: accept deadline explicitly and extend passed context for local use Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-02-22 17:05:16 +01:00

1 2 3 4 5 ...

710 commits