Commit graph

290 commits

Author SHA1 Message Date
Aliaksandr Valialkin
83edbb7cab lib/promscrape: retry service discovery in a few seconds if it starts returning 0 targets
This should reduce recovery time from temporary issues during service discovery
2021-03-14 21:53:23 +02:00
Aliaksandr Valialkin
bf15d6a6a2 lib/promscrape: remove duplicate target word in error message 2021-03-14 21:52:02 +02:00
Aliaksandr Valialkin
d409898515 lib/promscrape/discovery/kubernetes: further optimize kubernetes service discovery for the case with many scrape jobs
Do not re-calculate labels per each scrape job - reuse them instead for scrape jobs with identical Kubernetes role

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-14 21:14:53 +02:00
Aliaksandr Valialkin
7a16e8e3a2 lib/promscrape/discovery: fixes after 133b288681
- Removed a deadlock in addAPIWatcher
- Do not create unused ScrapeWork objects
- Do not spend CPU resources on creating objectByKey map in addAPIWatcher

This work is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1125
2021-03-13 15:18:51 +02:00
Aliaksandr Valialkin
def014eb75 lib/promscrape/discovery/kubernetes: remove debug lines left after the commit 133b288681 2021-03-12 11:22:33 +02:00
Aliaksandr Valialkin
a6a71ef861 lib/promscrape: add ability to configure proxy options via proxy_tls_config, proxy_basic_auth, proxy_bearer_token and proxy_bearer_token_file
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-12 03:36:19 +02:00
Aliaksandr Valialkin
133b288681 lib/promscrape/discovery/kubernetes: use a single watcher per apiURL
Previously multiple scrape jobs could create multiple watchers for the same apiURL. Now only a single watcher is used.
This should reduce load on Kubernetes API server when many scrape job configs use Kubernetes service discovery.
2021-03-11 16:43:04 +02:00
Aliaksandr Valialkin
bebcb8130c lib/promscrape/discovery/kubernetes: localize Bookmark parsing code
This is a follow-up for e772d1c920
2021-03-11 13:08:08 +02:00
Aliaksandr Valialkin
e772d1c920 lib/promscrape/discovery/kubernetes: reduce load on Kubernetes API server by using watch bookmarks
This allows continuing object watch from the last bookbark instead of reloading all the objects
on watch errors or timeouts.

See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
2021-03-10 15:06:35 +02:00
Aliaksandr Valialkin
36fd007247 lib/proxy: set missing ServerName in TLS config for proxy_url.
While at it, allow setting Proxy-Authorization for `proxy_url` via `basic_auth` and `bearer_token` configs.
2021-03-09 18:58:18 +02:00
Nikolay
ad34f42467
Changes tlsConfig init for proxy connections (#1121)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-09 18:51:00 +02:00
Aliaksandr Valialkin
3fd8653b40 lib/promscrape: apply sample_limit after metric relabeling is applied as Prometheus does
See the description for `sample_limit` option from Prometheus docs:

Per-scrape limit on number of scraped samples that will be accepted.
If more than this number of samples are present after metric relabeling
the entire scrape will be treated as failed. 0 means no limit.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
2021-03-09 15:47:18 +02:00
Aliaksandr Valialkin
4b18a4f026 lib/promscrape/discovery/kubernetes: remove too verbose logs about starting and stopping the watchers
Log the number of objects loaded per each watch url

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-09 15:05:14 +02:00
Aliaksandr Valialkin
14a399dd06 lib/promscrape: add scrape_offset option to scrape_config
This option can be used for specifying the particular offset per each scrape interval for target scraping
2021-03-08 12:03:33 +02:00
Aliaksandr Valialkin
ab4f090c63 lib/promscrape/discovery/kubernetes: reduce memory usage further when big number of scrape jobs are configured for the same kubernetes_sd_config role
Serialize reloading per-role objects, so they don't occupy too much memory when objects for many scrape jobs are simultaneously refreshed.
Do not reload per-role objects if they were already refreshed by concurrent goroutines. This should reduce load on Kubernetes API server
when big number of scrape jobs are configured for the same Kubernetes role.

This is a follow-up for 17b87725ed

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-07 19:51:03 +02:00
Aliaksandr Valialkin
17b87725ed lib/promscrape/discovery/kubernetes: reduce memory usage when Kubernetes service discovery is configured on a big number of scrape jobs
Previously vmagent was creating a separate Kubernetes object cache per each scrape job.
This could result in increased memory usage when monitoring a Kubernetes cluster with big number of objects (pods / nodes / services, etc.)
as seen at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113

Now it uses a shared map of scrape objects across multiple scrape jobs.
2021-03-05 17:29:55 +02:00
Aliaksandr Valialkin
c9a25e931b lib/promscrape/discovery/kubernetes: move apiWatcher code to a separate file 2021-03-05 12:36:05 +02:00
Aliaksandr Valialkin
444fca89f8 lib/promscrape: make cluster membership calculations consistent across 32-bit and 64-bit architectures 2021-03-05 09:06:17 +02:00
Aliaksandr Valialkin
423cd981fb lib/promscrape: add -promscrape.cluster.replicationFactor command-line flag for replicating scrape targets among vmagent instances in the cluster 2021-03-04 10:20:15 +02:00
Aliaksandr Valialkin
9a48c1b53d lib/promscrape/discovery/kubernetes: fix tests after e154f4a644 2021-03-03 22:41:30 +02:00
Nikolay
e154f4a644
Fix ingress discovery api (#1110) 2021-03-03 10:43:39 +02:00
Aliaksandr Valialkin
7906316741 lib/promscrape/discovery/kubernetes: properly check for nil pointer inside interface
See https://mangatmodi.medium.com/go-check-nil-interface-the-right-way-d142776edef1

This fixes a panic when the ScrapeWork is filtered out in swcFunc.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1108
2021-03-03 10:33:17 +02:00
Aliaksandr Valialkin
03dceb700d lib/promscrape: go fmt 2021-03-02 21:20:43 +02:00
Aliaksandr Valialkin
062211c61c lib/promscrape: pre-allocate space for a map in mergeLabels
This should reduce the number of memory allocations when discovering big number of targets
2021-03-02 18:41:58 +02:00
Aliaksandr Valialkin
d1d34664b5 lib/promscrape/discovery: properly track vm_promscrape_discovery_kubernetes_objects_removed_total metric 2021-03-02 18:32:54 +02:00
Aliaksandr Valialkin
6a7ef768ff lib/promscrape/discovery/kubernetes: cache ScrapeWork objects as soon as the corresponding k8s objects are changed
This should reduce CPU usage and memory usage when Kubernetes contains tens of thousands of objects
2021-03-02 16:42:55 +02:00
Aliaksandr Valialkin
22b1941cfc lib/promscrape/discovery/ec2: follow-up after f6114345de 2021-03-02 13:46:26 +02:00
Nikolay
f6114345de
Adds webIndentity token for aws (#1099)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1080
2021-03-02 13:27:09 +02:00
Aliaksandr Valialkin
1a3689af9a lib/promscrape/discovery/kubernetes: deflake tests; a follow-up for 05fb08713c 2021-03-01 14:32:12 +02:00
Aliaksandr Valialkin
62ebf5c88e lib/promscrape: explicitly stop and cleanup service discovery routines when new config is read from -promscrape.config
This should reduce memory usage when `-promscrape.config` file frequently changes
2021-03-01 14:14:00 +02:00
Aliaksandr Valialkin
e32ad9e923 lib/promscrape: use target arg in ScrapeWork cache 2021-03-01 12:29:09 +02:00
Aliaksandr Valialkin
f5d77a7081 lib/promscrape: typo fix, which prevented from caching ScrapeWork entries 2021-03-01 12:12:56 +02:00
Aliaksandr Valialkin
e84153d5ca lib/promscrape: add vm_promscrape_scrapework_cache_* metrics for tracking ScrapeWork cache effectiveness 2021-03-01 12:05:45 +02:00
Aliaksandr Valialkin
530e9904af lib/promscrape: reduce CPU usage an memory allocations when constructing scrapeWorkKey 2021-02-28 22:29:58 +02:00
Aliaksandr Valialkin
e5ca8ac0db lib/promscrape: add ability to spread scrape targets among multiple vmagent instances
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1084
2021-02-28 18:41:08 +02:00
Aliaksandr Valialkin
e02d1ef93c lib/promscrape/discovery/kubernetes: properly account the number of objects when watcher is stopped
A follow-up for b21b110b7a
2021-02-28 17:06:02 +02:00
Aliaksandr Valialkin
b21b110b7a lib/promscrape/discovery/kubernetes: add vm_promscrape_discovery_kubernetes_* metrics for monitoring internal state of k8s service discovery 2021-02-28 16:57:40 +02:00
Aliaksandr Valialkin
c459600346 lib/promscrape/discovery/kubernetes: remove resourceVersionMatch=NotOlderThan query arg when watching for k8s object changes, since it cannot be used when watch=1 query arg is passed 2021-02-28 16:07:14 +02:00
Aliaksandr Valialkin
59a31171e3 lib/promscrape: fix possible deadlock in parallel execution of target relabeling 2021-02-28 16:05:13 +02:00
Aliaksandr Valialkin
68a0f5ce12 lib/promscrape/discovery/kubernetes: fix deadlock in startWatcherForURL
reloadObjects must be called without holding aw.mu lock
2021-02-28 15:26:30 +02:00
Aliaksandr Valialkin
b523e0369c lib/promscrape/discovery/kubernetes: typo fix after 241ffd1f3b 2021-02-28 15:12:17 +02:00
Aliaksandr Valialkin
241ffd1f3b lib/promscrape/discovery/kubernetes: pre-populate labelsByKey in reloadObject() 2021-02-28 15:09:49 +02:00
Aliaksandr Valialkin
05fb08713c lib/promscrape/discovery/kubernetes: compare sorted sets of labels in tests
This should deflake tests where the order of labels isn't stable
2021-02-28 14:10:19 +02:00
Aliaksandr Valialkin
af8b7e8391 lib/promscrape: add missing startWatchersForRole() call at the beginning of apiWatcher.getLabelsForRole 2021-02-28 14:00:17 +02:00
Aliaksandr Valialkin
422b31de40 lib/promscrape/discovery/kubernetes: reload k8s resources on every error
This is needed for obtaining fresh resourceVersion
2021-02-27 01:47:27 +02:00
Aliaksandr Valialkin
a78948ae8b lib/promscrape: yet another typo fix after ed8441ec52 2021-02-26 23:35:47 +02:00
Aliaksandr Valialkin
9fa2632ac3 lib/promscrape: typo fix after ed8441ec52 2021-02-26 23:04:05 +02:00
Aliaksandr Valialkin
ed8441ec52 lib/promscrape: cache ScrapeWork
This should reduce the time needed for updating big number of scrape targets.
2021-02-26 21:43:22 +02:00
Aliaksandr Valialkin
815666e6a6 lib/promscrape/discovery/kubernetes: cache target labels
This should reduce CPU usage on repeated SDConfig.GetLabels() calls.
2021-02-26 20:23:28 +02:00
Aliaksandr Valialkin
19712fc2bd lib/promscrape/discovery/kubernetes: errcheck fix 2021-02-26 17:00:08 +02:00
Aliaksandr Valialkin
c8f2f9b2e8 lib/promscrape: cleanup after 9b2246c29b
Main points:

* Revert changes outside lib/promscrape/discovery/kuberntes . These changes can be applied later in a separate commit
* Minimize changes in lib/promscrape/discovery/kubernetes compared to a93e644001
* Corner case fixes.
2021-02-26 16:54:05 +02:00
Nikolay
9b2246c29b
vmagent kubernetes watch stream discovery. (#1082)
* started work on sd for k8s

* continue work on watch sd

* fixes

* continue work

* continue work on sd k8s

* disable gzip

* fixes typos

* log errror

* minor fix

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-02-26 16:46:13 +02:00
Aliaksandr Valialkin
a93e644001 lib/promscrape: remove duplicate code a bit 2021-02-26 16:39:56 +02:00
Aliaksandr Valialkin
f7b242540b lib/promscrape: reduce processing time for big number of discovered targets by processing them in parallel 2021-02-26 16:39:56 +02:00
Aliaksandr Valialkin
d136081040 lib/promrelabel: add more optimizations for relabeling for common cases 2021-02-22 16:33:55 +02:00
Aliaksandr Valialkin
ff5bbc4b88 lib/promscrape: export vm_promscrape_target_relabel_duration_seconds metric 2021-02-21 23:21:42 +02:00
Aliaksandr Valialkin
2cfb376945 lib/promscrape: typo fix after the commit f26162ec99 2021-02-19 00:33:37 +02:00
Aliaksandr Valialkin
f26162ec99 lib/promscrape: add scrape_align_interval config option into scrape config
This option allows aligning scrapes to a particular intervals.
2021-02-18 23:53:44 +02:00
Aliaksandr Valialkin
38d7e96602 lib/promscrape/discovery/kubernetes: add __meta_kubernetes_endpoints_label_* and __meta_kuberntes_endpoints_annotation_* labels to role: endpoints
This syncs kubernetes SD with Prometheus 2.25
See 617c56f55a
2021-02-15 02:51:16 +02:00
Aliaksandr Valialkin
80dc74dbc1 lib/promscrape: remove vm_promscrape_scrapes_failed_per_url_total and vm_promscrape_scrapes_skipped_by_sample_limit_per_url_total metrics
These metrics may result in big number of time series when vmagent scrapes thousands of targets and these targets constantly changes.

* It is better using `up == 0` query for determining failing targets.
* It is better using the following query for determining targets with exceeded limit on the number of metrics:

  scrape_samples_scraped > 0 if up == 0
2021-02-12 05:26:04 +02:00
Nikolay
48c8c5093b
fixes dockerswarm (#1053)
fixes improper usage of host network services
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1028
2021-02-04 15:56:42 +02:00
Aliaksandr Valialkin
4068f8d590 lib/promscrape: add vm_promscrape_service_discovery_duration_seconds metric 2021-02-02 16:15:25 +02:00
Aliaksandr Valialkin
bd11fd8f1d lib/promscrape: add vm_promscrape_scrape_retries_total, vm_promscrape_discovery_retries_total and vm_promscrape_discovery_requests_total metrics 2021-02-01 20:06:27 +02:00
Aliaksandr Valialkin
fc5b26d856 lib/promscrape: export vm_promscrape_scrapes_failed_per_url_total and vm_promscrape_scrapes_skipped_by_sample_limit_per_url_total metrics
These metrics could be useful for determining imporperly working scrape targets.
Note that these metrics are exported only for failing scrape targets. They aren't exposed for normally working targets.
2021-01-27 00:39:26 +02:00
Aliaksandr Valialkin
de3c662e8a all: consistently use timers from timerpool 2021-01-27 00:39:26 +02:00
Aliaksandr Valialkin
8cea3c3cc4 lib/promscrape: retry scrape and service discovery requests when the remote server closes http keep-alive connection 2021-01-22 13:22:33 +02:00
Nikolay
7976c22797
Fixes error handling for promscrape.streamParse (#1009)
properly return error if client cannot read data,
properly suppress scraper errors
2021-01-12 13:31:47 +02:00
Aliaksandr Valialkin
2c44f9989a lib/promscrape: properly show scrape duration on /targets page
Previously it has been shown as 0.000s for any scrape duration.
2021-01-11 21:14:46 +02:00
Nikolay
b21e16ad0c
fixes kubernetes_sd (#983)
* fixes kubernetes_sd,
adds missing service metadata for pod ports without endpoint
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/982

* fix test
2020-12-24 11:26:14 +02:00
Aliaksandr Valialkin
820669da69 lib/promscrape: code prettifying for 8dd03ecf19 2020-12-24 10:56:10 +02:00
Nikolay
8dd03ecf19
adds proxy_url support, (#980)
* adds proxy_url support,
adds proxy_url to the dockerswarm, eureka, kubernetes and consul service discovery,
adds proxy_url to the scrape_config for targets scrapping,
http based proxy is supported atm,
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/503

* fixes imports
2020-12-24 10:52:37 +02:00
Aliaksandr Valialkin
2dfa746c91 lib/promscrape: remove ID field from ScrapeWork struct. Use a pointer to ScrapeWork as a key in targetStatusMap
This simplifies the code a bit.
2020-12-17 14:32:56 +02:00
Aliaksandr Valialkin
a4c7fcb5e1 lib/promscrape: properly remove deleted target from /targets page
Previously `sw` variable wasn't captured correctly by the started goroutine.
2020-12-15 20:57:09 +02:00
Aliaksandr Valialkin
b730fc2667 lib/promscrape: properly handle scrape errors when stream parsing is enabled
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/967
2020-12-15 14:08:28 +02:00
Aliaksandr Valialkin
60fcac4878 lib/promscrape: add bootstrap styles to /targets html page 2020-12-15 12:37:56 +02:00
Aliaksandr Valialkin
5af2a9ca0e lib/promscrape: formatting fixes for /tarets page 2020-12-15 11:59:04 +02:00
Aliaksandr Valialkin
020917949b lib/promscrape: formatting fixes for /targets page 2020-12-15 11:24:18 +02:00
Aliaksandr Valialkin
ae972429c7 lib/promscrape: add missing whitespace between duration and ago word at /targets page 2020-12-14 14:19:58 +02:00
Aliaksandr Valialkin
069c9ade52 app/{vmagent,vminsert}: follow-up for ce8c2dd1f1: return /targets page in HTML when requested via web browser 2020-12-14 14:06:00 +02:00
Nikolay
ce8c2dd1f1
Changes targets api (#961)
* changes /targets api
adds html response if requester accepts text/html,
adds quick template for /targets api,
fixes pathPrefix for / requests

* changes namings

* renamed targets file

* Update app/victoria-metrics/main.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>

* adds trimspace to qtpl,
moves content-type for targets response closer to writer

* fixes bug with prefix

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2020-12-14 13:36:48 +02:00
Aliaksandr Valialkin
5f9d88a3cb lib/promscrape/discovery/consul: reduce load on Consul API server by increasing timeout for blocking requests from 50 seconds to 9 minutes
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/574
2020-12-11 17:24:13 +02:00
Aliaksandr Valialkin
fd9fd191b9 lib/promscrape/discovery/consul: properly pass Datacenter filter to Consul API server
Previously it has been passed as `sdc` query arg, while it should be passed as `dc` query arg.
See https://www.consul.io/api-docs/health#list-nodes-for-service for details.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/574#issuecomment-740454170
2020-12-08 21:52:42 +02:00
Aliaksandr Valialkin
364f30a6e7 lib/promscrape: store ScrapeWork items by pointer in the slice returned from get*ScrapeWork()
This should prevent from possible 'memory leaks' when a pointer to ScrapeWork item stored in the slice
could prevent from releasing memory occupied by all the ScrapeWork items stored in the slice when they
are no longer used.

See the related commit e205975716 and the related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825
2020-12-08 17:50:05 +02:00
Aliaksandr Valialkin
08b71d2067 lib/promscrape: re-use strings for labels stored in ScrapeWork
This should reduce memory usage when working with big number of scrape targets.
2020-12-08 12:22:59 +02:00
Aliaksandr Valialkin
0f1b969aa6 lib/promscrape: export vm_promscrape_scrapers_{started|stopped}_total metrics for monitoring target churn rate 2020-12-08 11:57:52 +02:00
Aliaksandr Valialkin
c7ac7c1807 lib/promscrape: store targetStatus entries in targetStatusMap by pointer instead of by value
This guarantees that GC frees memory occupied by targetStatus after it is unregistered from targetStatusMap.
2020-12-08 11:50:48 +02:00
Aliaksandr Valialkin
05813259dc lib/promscrape: export vm_promscrape_active_scrapers{type="<sd_type>"} metric for tracking the number of active scrapers per each service discovery type 2020-12-08 01:54:23 +02:00
Aliaksandr Valialkin
9c1c9d8e76 lib/promscrape: do not enable strict config parsing when -promscrape.config.dryRun command-line flag is passed
Strict parsing for -promscrape.config can be enabled by passing `-promscrape.config.strictParse` command-line flag.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/944
2020-12-07 13:18:50 +02:00
Aliaksandr Valialkin
82972a8f2a lib/promscrape: mention in scrape error message that scrape errors can be disabled by -promscrape.suppressScrapeErrors command-line flag 2020-12-06 23:27:58 +02:00
Aliaksandr Valialkin
299a35948c lib/promscrape: clarify error message on failed connection to scrape target when -enableTCP6 command-line flag isn't set 2020-12-06 13:18:39 +02:00
Aliaksandr Valialkin
52915c8f7e lib/promscrape/discoveryutils: remove limit on the number of concurrently running blocking queries
Too low limit could result in unexpected errors when performing big number of blocking queries.
2020-12-05 12:15:52 +02:00
Aliaksandr Valialkin
de0643fab5 lib/promscrape/discovery/consul: log the time needed for stoppig Consul service watcher 2020-12-03 20:14:55 +02:00
Aliaksandr Valialkin
9cd8eb92f1 lib/promscrape/discovery/consul: make sure that block response contains X-Consul-Index header 2020-12-03 20:05:23 +02:00
Aliaksandr Valialkin
5009b25a03 lib/promscrape: code cleanup after c6dee6c52d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/574
2020-12-03 19:50:53 +02:00
Nikolay
c6dee6c52d
Changes consul discovery api (#921)
* adds consul watch api,
it must reduce load on consul service with blocking wait requests,
changed discoveryClient api with fetchResponseMeta callback.

* small fix

* fix after master merge

* adds watch client at discovery utils

* fixes consul watcher,
changes namings,
fixes data race

* small typo fix

* sanity fix

* fix naming and service node update
2020-12-03 19:47:40 +02:00
Aliaksandr Valialkin
32869e4c0f lib/promscrape: fix failing tests after a906b3862f 2020-11-29 01:26:03 +02:00
Aliaksandr Valialkin
3a32789352 lib/promscrape: reduce memory allocations when unpacking gzipped responses received from scrape targets 2020-11-26 18:32:06 +02:00
Aliaksandr Valialkin
2cea4d403f all: typo fix: thouthand->thousand 2020-11-26 13:33:46 +02:00
Aliaksandr Valialkin
b0a5c382ee lib/promscrape: release http response non-200 status code 2020-11-26 13:25:17 +02:00
Aliaksandr Valialkin
b7fcdb528d app/{vmagent,victoria-metrics}: add -dryRun option and make more clear handling for -promscrape.config.dryRun 2020-11-25 22:59:13 +02:00