Aliaksandr Valialkin
17b87725ed
lib/promscrape/discovery/kubernetes: reduce memory usage when Kubernetes service discovery is configured on a big number of scrape jobs
...
Previously vmagent was creating a separate Kubernetes object cache per each scrape job.
This could result in increased memory usage when monitoring a Kubernetes cluster with big number of objects (pods / nodes / services, etc.)
as seen at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
Now it uses a shared map of scrape objects across multiple scrape jobs.
2021-03-05 17:29:55 +02:00
Aliaksandr Valialkin
c9a25e931b
lib/promscrape/discovery/kubernetes: move apiWatcher code to a separate file
2021-03-05 12:36:05 +02:00
Aliaksandr Valialkin
444fca89f8
lib/promscrape: make cluster membership calculations consistent across 32-bit and 64-bit architectures
2021-03-05 09:06:17 +02:00
Aliaksandr Valialkin
423cd981fb
lib/promscrape: add -promscrape.cluster.replicationFactor
command-line flag for replicating scrape targets among vmagent
instances in the cluster
2021-03-04 10:20:15 +02:00
Aliaksandr Valialkin
9a48c1b53d
lib/promscrape/discovery/kubernetes: fix tests after e154f4a644
2021-03-03 22:41:30 +02:00
Nikolay
e154f4a644
Fix ingress discovery api ( #1110 )
2021-03-03 10:43:39 +02:00
Aliaksandr Valialkin
7906316741
lib/promscrape/discovery/kubernetes: properly check for nil pointer inside interface
...
See https://mangatmodi.medium.com/go-check-nil-interface-the-right-way-d142776edef1
This fixes a panic when the ScrapeWork is filtered out in swcFunc.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1108
2021-03-03 10:33:17 +02:00
Aliaksandr Valialkin
03dceb700d
lib/promscrape: go fmt
2021-03-02 21:20:43 +02:00
Aliaksandr Valialkin
062211c61c
lib/promscrape: pre-allocate space for a map in mergeLabels
...
This should reduce the number of memory allocations when discovering big number of targets
2021-03-02 18:41:58 +02:00
Aliaksandr Valialkin
d1d34664b5
lib/promscrape/discovery: properly track vm_promscrape_discovery_kubernetes_objects_removed_total metric
2021-03-02 18:32:54 +02:00
Aliaksandr Valialkin
6a7ef768ff
lib/promscrape/discovery/kubernetes: cache ScrapeWork objects as soon as the corresponding k8s objects are changed
...
This should reduce CPU usage and memory usage when Kubernetes contains tens of thousands of objects
2021-03-02 16:42:55 +02:00
Aliaksandr Valialkin
22b1941cfc
lib/promscrape/discovery/ec2: follow-up after f6114345de
2021-03-02 13:46:26 +02:00
Nikolay
f6114345de
Adds webIndentity token for aws ( #1099 )
...
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1080
2021-03-02 13:27:09 +02:00
Aliaksandr Valialkin
1a3689af9a
lib/promscrape/discovery/kubernetes: deflake tests; a follow-up for 05fb08713c
2021-03-01 14:32:12 +02:00
Aliaksandr Valialkin
62ebf5c88e
lib/promscrape: explicitly stop and cleanup service discovery routines when new config is read from -promscrape.config
...
This should reduce memory usage when `-promscrape.config` file frequently changes
2021-03-01 14:14:00 +02:00
Aliaksandr Valialkin
e32ad9e923
lib/promscrape: use target arg in ScrapeWork cache
2021-03-01 12:29:09 +02:00
Aliaksandr Valialkin
f5d77a7081
lib/promscrape: typo fix, which prevented from caching ScrapeWork entries
2021-03-01 12:12:56 +02:00
Aliaksandr Valialkin
e84153d5ca
lib/promscrape: add vm_promscrape_scrapework_cache_* metrics for tracking ScrapeWork cache effectiveness
2021-03-01 12:05:45 +02:00
Aliaksandr Valialkin
530e9904af
lib/promscrape: reduce CPU usage an memory allocations when constructing scrapeWorkKey
2021-02-28 22:29:58 +02:00
Aliaksandr Valialkin
e5ca8ac0db
lib/promscrape: add ability to spread scrape targets among multiple vmagent instances
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1084
2021-02-28 18:41:08 +02:00
Aliaksandr Valialkin
e02d1ef93c
lib/promscrape/discovery/kubernetes: properly account the number of objects when watcher is stopped
...
A follow-up for b21b110b7a
2021-02-28 17:06:02 +02:00
Aliaksandr Valialkin
b21b110b7a
lib/promscrape/discovery/kubernetes: add vm_promscrape_discovery_kubernetes_*
metrics for monitoring internal state of k8s service discovery
2021-02-28 16:57:40 +02:00
Aliaksandr Valialkin
c459600346
lib/promscrape/discovery/kubernetes: remove resourceVersionMatch=NotOlderThan
query arg when watching for k8s object changes, since it cannot be used when watch=1
query arg is passed
2021-02-28 16:07:14 +02:00
Aliaksandr Valialkin
59a31171e3
lib/promscrape: fix possible deadlock in parallel execution of target relabeling
2021-02-28 16:05:13 +02:00
Aliaksandr Valialkin
68a0f5ce12
lib/promscrape/discovery/kubernetes: fix deadlock in startWatcherForURL
...
reloadObjects must be called without holding aw.mu lock
2021-02-28 15:26:30 +02:00
Aliaksandr Valialkin
b523e0369c
lib/promscrape/discovery/kubernetes: typo fix after 241ffd1f3b
2021-02-28 15:12:17 +02:00
Aliaksandr Valialkin
241ffd1f3b
lib/promscrape/discovery/kubernetes: pre-populate labelsByKey in reloadObject()
2021-02-28 15:09:49 +02:00
Aliaksandr Valialkin
05fb08713c
lib/promscrape/discovery/kubernetes: compare sorted sets of labels in tests
...
This should deflake tests where the order of labels isn't stable
2021-02-28 14:10:19 +02:00
Aliaksandr Valialkin
af8b7e8391
lib/promscrape: add missing startWatchersForRole()
call at the beginning of apiWatcher.getLabelsForRole
2021-02-28 14:00:17 +02:00
Aliaksandr Valialkin
422b31de40
lib/promscrape/discovery/kubernetes: reload k8s resources on every error
...
This is needed for obtaining fresh resourceVersion
2021-02-27 01:47:27 +02:00
Aliaksandr Valialkin
a78948ae8b
lib/promscrape: yet another typo fix after ed8441ec52
2021-02-26 23:35:47 +02:00
Aliaksandr Valialkin
9fa2632ac3
lib/promscrape: typo fix after ed8441ec52
2021-02-26 23:04:05 +02:00
Aliaksandr Valialkin
ed8441ec52
lib/promscrape: cache ScrapeWork
...
This should reduce the time needed for updating big number of scrape targets.
2021-02-26 21:43:22 +02:00
Aliaksandr Valialkin
815666e6a6
lib/promscrape/discovery/kubernetes: cache target labels
...
This should reduce CPU usage on repeated SDConfig.GetLabels() calls.
2021-02-26 20:23:28 +02:00
Aliaksandr Valialkin
19712fc2bd
lib/promscrape/discovery/kubernetes: errcheck fix
2021-02-26 17:00:08 +02:00
Aliaksandr Valialkin
c8f2f9b2e8
lib/promscrape: cleanup after 9b2246c29b
...
Main points:
* Revert changes outside lib/promscrape/discovery/kuberntes . These changes can be applied later in a separate commit
* Minimize changes in lib/promscrape/discovery/kubernetes compared to a93e644001
* Corner case fixes.
2021-02-26 16:54:05 +02:00
Nikolay
9b2246c29b
vmagent kubernetes watch stream discovery. ( #1082 )
...
* started work on sd for k8s
* continue work on watch sd
* fixes
* continue work
* continue work on sd k8s
* disable gzip
* fixes typos
* log errror
* minor fix
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-02-26 16:46:13 +02:00
Aliaksandr Valialkin
a93e644001
lib/promscrape: remove duplicate code a bit
2021-02-26 16:39:56 +02:00
Aliaksandr Valialkin
f7b242540b
lib/promscrape: reduce processing time for big number of discovered targets by processing them in parallel
2021-02-26 16:39:56 +02:00
Aliaksandr Valialkin
d136081040
lib/promrelabel: add more optimizations for relabeling for common cases
2021-02-22 16:33:55 +02:00
Aliaksandr Valialkin
ff5bbc4b88
lib/promscrape: export vm_promscrape_target_relabel_duration_seconds metric
2021-02-21 23:21:42 +02:00
Aliaksandr Valialkin
2cfb376945
lib/promscrape: typo fix after the commit f26162ec99
2021-02-19 00:33:37 +02:00
Aliaksandr Valialkin
f26162ec99
lib/promscrape: add scrape_align_interval config option into scrape config
...
This option allows aligning scrapes to a particular intervals.
2021-02-18 23:53:44 +02:00
Aliaksandr Valialkin
38d7e96602
lib/promscrape/discovery/kubernetes: add __meta_kubernetes_endpoints_label_*
and __meta_kuberntes_endpoints_annotation_*
labels to role: endpoints
...
This syncs kubernetes SD with Prometheus 2.25
See 617c56f55a
2021-02-15 02:51:16 +02:00
Aliaksandr Valialkin
80dc74dbc1
lib/promscrape: remove vm_promscrape_scrapes_failed_per_url_total and vm_promscrape_scrapes_skipped_by_sample_limit_per_url_total metrics
...
These metrics may result in big number of time series when vmagent scrapes thousands of targets and these targets constantly changes.
* It is better using `up == 0` query for determining failing targets.
* It is better using the following query for determining targets with exceeded limit on the number of metrics:
scrape_samples_scraped > 0 if up == 0
2021-02-12 05:26:04 +02:00
Nikolay
48c8c5093b
fixes dockerswarm ( #1053 )
...
fixes improper usage of host network services
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1028
2021-02-04 15:56:42 +02:00
Aliaksandr Valialkin
4068f8d590
lib/promscrape: add vm_promscrape_service_discovery_duration_seconds metric
2021-02-02 16:15:25 +02:00
Aliaksandr Valialkin
bd11fd8f1d
lib/promscrape: add vm_promscrape_scrape_retries_total
, vm_promscrape_discovery_retries_total
and vm_promscrape_discovery_requests_total
metrics
2021-02-01 20:06:27 +02:00
Aliaksandr Valialkin
fc5b26d856
lib/promscrape: export vm_promscrape_scrapes_failed_per_url_total
and vm_promscrape_scrapes_skipped_by_sample_limit_per_url_total
metrics
...
These metrics could be useful for determining imporperly working scrape targets.
Note that these metrics are exported only for failing scrape targets. They aren't exposed for normally working targets.
2021-01-27 00:39:26 +02:00
Aliaksandr Valialkin
de3c662e8a
all: consistently use timers from timerpool
2021-01-27 00:39:26 +02:00