Commit graph

614 commits

Author SHA1 Message Date
Aliaksandr Valialkin
682d9dae57
lib/promscrape/discovery/kubernetes: stop all the url watchers, which belong to a particular groupWatcher, at once
Previously url watchers for pod, service and node objects could be mistakenly closed
when service discovery was set up only for endpoints and endpointslice roles,
since watchers for these roles may start start pod, service and node url watchers
with nil apiWatcher passed to groupWatcher.startWatchersForRole().

Now all the url watchers, which belong to a particular groupWatcher, are stopped at once
when this groupWatcher has no apiWatcher subscribers.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5216

The issue has been introduced in v1.93.5 when addressing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
2023-10-27 14:37:45 +02:00
Aliaksandr Valialkin
eb2a7ced9c
lib/promauth: properly parse string contents for ca, cert and key fields at tls_config
Previously yaml parser wasn't accepting string values for these fields,
because it was mistakenly expecting a list of uint8 values instead.
2023-10-27 14:23:15 +02:00
Hui Wang
836cccdbd8
do not print redundant error logs when failed to scrape consul or no… (#5239)
* do not print redundant error logs when failed to scrape consul or nomad target
prometheus performs the same because it uses consul lib which just drops the error(1806bcb38c/api/api.go (L1134))
2023-10-27 14:20:21 +02:00
Zakhar Bessarab
a3afce1af8
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs (#5048)
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs

It is possible that context.Cancelled will appear after k8s watcher was closed due to reload(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850).

Logging an error misinforms user and looks like vmagent discovery will stop working even though this does not affect discovery.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-22 13:15:18 +02:00
Aliaksandr Valialkin
d7a57471dd
lib/promscrape/discovery/kubernetes: follow-up after 03fece44e0
- Properly update vm_promscrape_discovery_kubernetes_url_watchers
  and vm_promscrape_discovery_kubernetes_group_watchers metrics after config changes

- Properly stop goroutine responsible for recreating scrapeWorks after the corresponding urlWatcher is stopped

- Log the event when urlWatcher is stopped in order to simplify debugging

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861
2023-09-19 00:44:42 +02:00
Aliaksandr Valialkin
f48e515c6b
lib/promscrape/discovery/kubernetes: wait for 10 seconds before checking whether the urlWatcher must be stopped
This should prevent from excess urlWatcher churn on config reload, since it leads to removal of all the apiWatchers
before creating new apiWatchers. So, every config reload would lead to stopping of all the previous urlWatchers
and starting new urlWatchers.

The new logic gives 10 seconds for config reload before stopping unused urlWatchers.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861
2023-09-19 00:43:48 +02:00
Aliaksandr Valialkin
e7e96589af
lib/promscrape/discovery/kubernetes: follow-up after eeb862f3ff
- Move the bugfix description to the correct place in docs/CHANGELOG.md
- Prevent from logging of 'context canceled' errors after the url watcher is stopped,
  since these errors are expected and may confuse users.
- Remove unused urlWatcher.refCount field.
- Remove unused urlWatcher.close() method.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
2023-09-19 00:42:34 +02:00
Zakhar Bessarab
669ad1ca55
lib/promscrape/discovery/kubernetes: fix leaking api watcher (#4861)
* lib/promscrape/discovery/kubernetes: fix leaking api watcher

goroutine which was polling k8s API had no execution control. This leaded to leaking goroutines during config reload.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: use reference counting for urlWatcher cleanup

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: remove waitgroup sync for goroutines polling API server

This is unnecessary since context will is cancelled and new requests will not be sent. Also, using waitgroup will increase time required to perform reload which might result in missed scrapes.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: clarify comment

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Apply suggestions from code review

* lib/promscrape/discovery/kubernetes: address review feedback

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-18 17:14:15 +02:00
Aliaksandr Valialkin
2388c3c192
Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 11:15:51 +02:00
Nikolay
dd9d64680a
lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch … (#4901)
* lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch requests
it must reduce load for kubernetes ETCD servers. Since requests without resourceVersion performs force cache sync at kubernetes API server with ETCD
more info at https://kubernetes.io/docs/reference/using-api/api-concepts/\#semantics-for-watch
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4855

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-08-30 16:13:38 +02:00
Zakhar Bessarab
4f03fcc4da
lib/promscrape/client: sync timeout for HostClient and http.Client (#4889)
Initially, stream parse mode was reading data from response and parsing it on flight. This was causing longer delay to read the whole response and required increasing timeout value to allow data processing while reading. So that 908e35affd increased timeout value to fix this.

But after 74c00a8762 response in stream parse mode is saved into memory and then parsed eliminating necessity of having timeout value higher that for usual scrape.

Updates: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4847
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-08-29 10:50:15 +02:00
hagen1778
99fd06deff
app/vmagent: follow-up after 6788704152
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4884
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-29 09:54:31 +02:00
Zakhar Bessarab
a3e7434d2e
lib/promscrape/client: make User-Agent consistent between fasthttp and native client (#4886)
User agent was not set for native client which resulted in using one provided by Golang.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4884

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-08-29 09:53:00 +02:00
Aliaksandr Valialkin
3e62c71e8c
lib/promscrape: add a comment why honor_timestamps is set to false by default
This should prevent from returning it back to true in the future

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697
2023-07-28 21:36:55 -07:00
Aliaksandr Valialkin
ee98f9ae66
lib/promscrape: use local scrape timestamp for scraped metrics unless honor_timestamps: true is set explicitly
This fixes the case with gaps for metrics collected from cadvisor,
which exports invalid timestamps, which break staleness detection at VictoriaMetrics side.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697 ,
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1654614799
and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1656540535

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1773
2023-07-28 21:11:46 -07:00
Aliaksandr Valialkin
1f30f53df2
lib/promscrape/discovery: close unused HTTP connections to service discovery servers
This should prevent from connection leaks

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4724
2023-07-27 14:47:55 -07:00
Aliaksandr Valialkin
324a3c5288
lib/promscrape: follow-up after 6aa50ca954
- Improve docs
- Hide `debug relabeling` column when -promscrape.dropOriginalLabels command-line flag is set
- Inline the code from the added template functions, since the code is harder to follow
  with the template functions, especially when these functions have misleading names.
  Also, these functions are used only in one place, e.g. they do not reduce the amounts of code.
- Hide `click to show original labels` title at `labels` column when original labels aren't available.
- Show the reason on whey original labels aren't available at /service-discovery page.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4597
2023-07-20 21:54:09 -07:00
Dmytro Kozlov
f0d8f77e6d
app/vmagent: fix creating target id if --promscrape.dropOriginalLabels flag was used (#4616)
* app/vmagent: fix creating target id if `--promscrape.dropOriginalLabels` flag was used

* app/vmagent: hide links if OriginalLabels was dropped

* app/vmagent: update CHANGELOG.md and added information to the docs

* app/vmagent: fix comments
2023-07-20 19:21:41 -07:00
Aliaksandr Valialkin
992c300ce9
all: replace atomic.Value with atomic.Pointer[T]
This eliminates the need in .(*T) casting for results obtained from Load()

Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map,
because map is already a pointer type.
2023-07-19 17:48:26 -07:00
Alexander Marshalov
4084dba9e4
fixed service name detection for consulagent service discovery in case of a difference in service name and service id (#4390) (#4439)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-07-06 16:53:29 -07:00
Aliaksandr Valialkin
5b8095a30a
lib/promscrape: disable support for service discovery and metrics scrape via http2
Reasons for disabling http2:

- http2 is used very rarely comparing to http for Prometheus metrics exposition and service discovery
- http2 is much harder to debug than http
- http2 has very bad security record because of its complexity - see https://portswigger.net/research/http2

VictoriaMetrics components are compiled with nethttpomithttp2 tag because of these issues.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4274

This is a follow-up for 72c3cd47eb
2023-07-06 16:04:31 -07:00
Aliaksandr Valialkin
6a3cee5c2c
lib/promscrape/discoveryutils: re-use checkRedirect function for both client and blockingClient
Also document follow_redirects option at https://docs.victoriametrics.com/sd_configs.html#http-api-client-options

This is a follow-up for b3d0ff463a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4282
2023-07-06 10:52:13 -07:00
Roman Khavronenko
d677c2a5a6
lib/promscrape/discoveryutils: properly check for net.ErrClosed (#4426)
This error may be wrapped in another error, and should normally be tested using
`errors.Is(err, net.ErrClosed)`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit dfe53a36fc)
2023-06-09 10:41:07 +02:00
Roman Khavronenko
fb9b8f6b1b
app/vmagent: mention enable_http2 in changelog (#4403)
Follow-up after
72c3cd47eb

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 3305a6901c)
2023-06-09 10:40:24 +02:00
Haleygo
6edf94c4b9
vmagent:scrape config support enable_http2 (#4295)
app/vmagent: support `enable_http2` in scrape config

This change adds HTTP2 support for scrape config
and improves compatibility with Prometheus config.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283

(cherry picked from commit 72c3cd47eb)
2023-06-09 10:40:17 +02:00
Haleygo
73a8f763a0
vmagent:support follow_redirects on SD level (#4286)
* vmagent:support follow_redirects on SD level

* fix follow_redirects on sd level

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4282
(cherry picked from commit b3d0ff463a)
2023-06-02 13:19:35 +02:00
Alexander Marshalov
d321ea91f2
fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 02:22:06 -07:00
Aliaksandr Valialkin
a47b9e55ac
lib/promscrape/discovery/consulagent: substitute metaPrefix with the __meta_consulagent_ plaintext string
This simplifies future code navigation and search for the specific meta-label starting from __meta_consulagent_* prefix.
For example, `grep __meta_consulagent_namespace` finds the exact place where this label is defined.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3953
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4217
2023-05-09 22:58:08 -07:00
Aliaksandr Valialkin
e2358d3bd5
docs: clarify docs after 5ee344824f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4183
2023-05-09 22:49:13 -07:00
Aliaksandr Valialkin
8703b2fa87
app/vmselect: small cleanup after 4f3f9950d0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3807
2023-05-09 22:45:02 -07:00
Alexander Marshalov
de68e94c91
fixed vm_promscrape_config_last_reload_successful metric value recovery after successful reloading with unchanged content (#4260) (#4268)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-05-09 22:17:27 -07:00
Zakhar Bessarab
370a421ef4
lib/promscrape/discovery/kubernetes: follow-up for d5e94721db (#4255)
- add changelog reference to an author
- fix tests
- add metadata to match Prometheus behavior

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-05-09 21:29:27 -07:00
Vasilchenko Anton
866dbee4e3
Add endpoint labels for pod targets discovered form endpoint but has different ports (#4253)
Signed-off-by: Vasilchenko Anton <vasilchenko-as@yandex.ru>
2023-05-09 21:25:56 -07:00
Alexander Marshalov
26fc4afff8
added new consulagent service discovery (#3953) (#4217) 2023-05-08 23:43:59 -07:00
Zakhar Bessarab
1b06af321f
lib/promscrape/discovery/kubernetes: add common labels to all ports discovered from endpoints (#4235)
* lib/promscrape/discovery/kubernetes: add common labels to all ports discovered from endpoints

Sets
`__meta_kubernetes_endpoints_name` and `__meta_kubernetes_namespace` labels to all ports of pod.
Prometheus sets those labels to all ports in pod (0ab9553611/discovery/kubernetes/endpoints.go (L267C15-L269)) even if port is not matching any service.

See: #4154

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: fix test for updated discovery logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-05-08 22:15:37 -07:00
Nikolay
cfa058dfec
lib/promscrape: adds filter for consul_sd_configs: (#4184)
* lib/promscrape: adds filter for consul_sd_configs:
it allows advanced filtering for consul service discovery requests
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4183

* typo fix

* removes deprecation mentions since it's not relevant

* Update docs/CHANGELOG.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-05-08 16:14:15 -07:00
Yury Molodov
ddc5197bce
vmui: add metric relabel debug (#3889)
* feat: add metric relabel debug (#3807)

* fix: add link to relabeling cookbook

* lib/promrelabel: merge, fix conflicts

* lib/promrelabel: fix diff

* docs/vmui: add metric relabel playground

---------

Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
2023-05-08 14:59:35 -07:00
Aliaksandr Valialkin
02ceebccc0
lib/promscrape: do not re-use previously loaded scrape targets on failed attempt to load updated scrape targets at file_sd_configs
The logic employed for re-using the previously loaded scrape target was broken initially.
The commit cc0427897c tried to fix it, but the new logic
became too complex and fragile. So it is better to just remove this logic,
since the targets from temporarily broken file should be eventually loaded on next
attempts every -promscrape.fileSDCheckInterval

This also allows removing fragile hacks around __vm_filepath label.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3989
2023-04-02 21:11:12 -07:00
Dmytro Kozlov
6f0512a81c
lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read (#4027)
* lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read

* lib/promscrape: clarified comment

* lib/promscrape: made better approach to handle a problem with growing []*ScrapeWork on each error when loading config

* lib/promscrape: added CHANGELOG.md

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-04-02 21:11:10 -07:00
Aliaksandr Valialkin
dad13c0a91
lib/streamaggr: follow-up for ff72ca14b9
- Make sure that the last successfully loaded config is used on hot-reload failure
- Properly cleanup resources occupied by already initialized aggregators
  when the current aggregator fails to be initialized
- Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config
  This should simplify monitoring and debugging failed reloads
- Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa
  could be in use at realoadSaConfig()
- Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability
  on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push().
- Remove fine-grained aggregator reload - reload all the aggregators on config change instead.
  This simplifies the code a bit. The fine-grained aggregator reload may be returned back
  if there will be demand from real users for it.
- Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag
- Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639
2023-03-31 22:54:10 -07:00
Dmytro Kozlov
85b01c4aa7
lib/promrelabel: make target url from labels on target relabel page (#3882)
* lib/promrelabel: make target url from labels on target relabel page

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-20 22:08:39 -07:00
Haleygo
b301455150
fix some typo (#3898) 2023-03-08 00:32:57 -08:00
Zakhar Bessarab
1db010797e
lib/promscrape: correctly register vm_promscrape_config_* metrics (#3876)
* lib/promscrape: set `vm_promscrape_config_last_reload_successful` to 1 if there was no promscrape config provided

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: register `vm_promscrape_config_*` metrics only in case promscrape config is used

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-27 12:06:49 -08:00
Aliaksandr Valialkin
18dd0d1dbf
.golangci.yml: properly enable revive linter and fix all the warnings it detects 2023-02-26 12:19:58 -08:00
Aliaksandr Valialkin
aed2dbe45e
lib/promscrape: follow-up for 43e104a83f
- Return immediately on context cancel during the backoff sleep.
  This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747

- Add a comment describing why the second attempt to obtain the response from remote side
  is perfromed immediately after the first attempt.

- Remove fasthttp dependency from lib/promscrape/discoveryutils

- Set context deadline before calling doRequestWithPossibleRetry().
  This simplifies the doRequestWithPossibleRetry() a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3293
2023-02-24 12:25:36 -08:00
Zakhar Bessarab
5ea6d71cb3
fix: do not use exponential backoff for first retry of scrape request (#3824)
* fix: do not use exponential backoff for first retry of scrape request (#3293)

* lib/promscrape: refactor `doRequestWithPossibleRetry` backoff to simplify logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update lib/promscrape/client.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* lib/promscrape: refactor `doRequestWithPossibleRetry` to make it more straightforward

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-02-24 12:25:35 -08:00
Aliaksandr Valialkin
bb5a3dc153
lib/promscrape/discovery/kuma: substitute blocking HTTP call with non-blocking HTTP call at discoveryutils.Client 2023-02-23 15:14:00 -08:00
Mattias Ängehov
3904b8959e
Azure Service Discovery - Fix token fetch for Container Apps/App Services (#3832)
* Modify API version when running in Container App

* Handle expires on from token response

Response from IMDS does not always contain expires in value which is
currently used to get the token expiry time. An example resources that
doesn't provide it are Container Apps and App Service.

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>

* Fix client id parameter for user assigned identity

* Apply suggestions from code review

---------

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2023-02-22 19:24:23 -08:00
Aliaksandr Valialkin
0c60e4a30a
all: consistently use http.Method{Get,Post,Put} across the codebase
This is a follow-up after 9dec3c8f80
2023-02-22 19:01:09 -08:00
my-git9
7d86c5c94a
chore: Use http constants to replace numbers (#3846)
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-02-22 18:59:32 -08:00