Commit graph

89 commits

Author SHA1 Message Date
hagen1778
73c9981335
chore: follow-up after c740a8042e
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6d8e02f278)
2024-06-03 11:53:37 +02:00
Hui Wang
5b8c3fc9d0
app/vmalert: support DNS SRV record in -remoteWrite.url (#6299)
part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053,
supports [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) address in
`-remoteWrite.url` command-line option.

(cherry picked from commit d7b5062917)
2024-05-22 10:53:22 +02:00
viperstars
ab78f3c89d
app/vmagent/remotewrite: skip sending empty block to downstream server (#6241)
Occasionally, vmagent sends empty blocks to downstream servers. If a
downstream server returns an unexpected response, vmagent gets stuck in
a retry loop. While vmagent handles 400 and 409 errors, there are
various prometheus remote write implementations that return different
error codes. For example, vector returns a 422 error. To mitigate the
risk of vmagent getting stuck in a retry loop, it is advisable to skip
sending empty blocks to downstream servers.

Co-authored-by: hao.peng <hao.peng@smartx.com>
Co-authored-by: Zhu Jiekun <jiekun.dev@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 3661373cc2)
2024-05-17 14:57:07 +02:00
Aliaksandr Valialkin
0211a04a52
all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/ 2024-04-18 01:32:57 +02:00
Aliaksandr Valialkin
ecd782c75e
app/vmagent: follow-up for b3b29ba6ac
- Automatically reload changed TLS root CA pointed by -remoteWrite.tlsCAFile command-line flag
- Automatically reload changed TLS root CA configured via oauth2.tsl_config.ca_file option at -promscrape.config
- Document the change as a feature instead of a bug at docs/CHANGELOG.md
- Simplify the code at lib/promauth, which is responsible for reloading changed TLS root CA files.
- Simplify the usage of lib/promauth.Config.NewRoundTripper() - now it accepts the base http.Transport
  instead of a callback, which can change the internal http.Transport.
- Reuse the default tls config if lib/promauth.Config doesn't contain tls-specific configs.
  This should reduce memory usage a bit when tls isn't used for scraping big number of targets.
- Do not re-read TLS root CA files on every processed request. Re-read them once per second.
  This should reduce CPU usage when scraping big number of targets over https.
- Do not store cert.pem and key.pem files in TestTLSConfigWithCertificatesFilesUpdate, since they can be loaded
  from byte slices via crypto/tls.X509KeyPair().
- Remove obsolete comparisons of string representations for authConfig and proxyAuthConfig at areEqualScrapeConfigs().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5725
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171
2024-04-04 01:26:38 +03:00
Aliaksandr Valialkin
7edb5f77f1
app/vmagent: properly shutdown when -maxIngestionRate limit is reached
The remotewrite.Stop() expects that there are no pending calls to TryPush().
This means that the ingestionRateLimiter.Register() must be unblocked inside TryPush() when calling remotewrite.Stop().
Provide remotewrite.StopIngestionRateLimiter() function for unblocking the rate limiter before calling the remotewrite.Stop().

While at it, move the rate limiter into lib/ratelimiter package, since it has two users.
Also move the description of the feature to the correct place at docs/CHANGELOG.md.
Also cross-reference -remoteWrite.rateLimit and -maxIngestionRate command-line flags.

This is a follow-up for 02bccd1eb9
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5900
2024-04-03 02:41:11 +03:00
Alexander Marshalov
0b3744effa
[vmagent] added ingestion rate limiting with new flag -maxIngestionRate (#5900)
* [vmagent] added ingestion rate limiting with new flag `-maxIngestionRate`. This flag can be used to limit the number of samples ingested by vmagent per second. If the limit is exceeded, the ingestion rate will be throttled.

* fix changelog

* fix review comment

(cherry picked from commit 02bccd1eb9)
2024-03-25 15:42:52 +01:00
Aliaksandr Valialkin
67091537ae
app/vmagent/remotewrite: add -remoteWrite.tlsHandshakeTimeout command-line flag for tuning tls handshake timeout to -remoteWrite.url
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699
2024-02-13 02:46:24 +02:00
Aliaksandr Valialkin
d52fd73f18
all: add up to 10% random jitter to the interval between periodic tasks performed by various components
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
2024-01-22 18:39:16 +02:00
Aliaksandr Valialkin
3a9cf13aaa
app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags
This is a follow-up for 5ebd5a0d7b

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427
2023-12-20 21:38:16 +02:00
Morgan
64e96fccd9
Expose OAuth2 Endpoint Parameters to cli (#5427)
The user may which to control the endpoint parameters for instance to
set the audience when requesting an access token. Exposing the
parameters as a map allows for additional use cases without requiring
modification.
2023-12-20 21:38:13 +02:00
Nikolay
25ac2aac31
app/vmagent: allow to disabled on-disk persistence (#5088)
* app/vmagent: allow to disabled on-disk queue
Previously, it wasn't possible to build data processing pipeline with a
chain of vmagents. In case when remoteWrite for the last vmagent in the
chain wasn't accessible, it persisted data only when it has enough disk
capacity. If disk queue is full, it started to silently drop ingested
metrics.

New flags allows to disable on-disk persistent and immediatly return an
error if remoteWrite is not accessible anymore. It blocks any writes and
notify client, that data ingestion isn't possible.

Main use case for this feature - use external queue such as kafka for
data persistence.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110

* adds test, updates readme

* apply review suggestions

* update docs for vmagent

* makes linter happy

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-25 12:12:29 +02:00
Aliaksandr Valialkin
f03e81c693
lib/promauth: follow-up for e16d3f5639
- Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup
  don't prevent from processing the corresponding scrape targets after the file becomes correct,
  without the need to restart vmagent.
  Previously scrape targets with invalid TLS CA file or TLS client certificate files
  were permanently dropped after the first attempt to initialize them, and they didn't
  appear until the next vmagent reload or the next change in other places of the loaded scrape configs.

- Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent.
  Previously the old TLS CA was used until vmagent restart.

- Properly handle errors during http request creation for the second attempt to send data to remote system
  at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing,
  since the returned request is nil on error.

- Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent.
  Previously it could miss details on the source of the request.

- Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header
  of every http request issued by vmagent during service discovery or target scraping.
  Re-use the HTTP client instead until the corresponding scrape config changes.

- Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached,
  e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server
  when auth header cannot be obtained because of temporary error.

- Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config.
  Cache the loaded certificate and the error for one second. This should significantly reduce CPU load
  when scraping big number of targets with the same tls_config.

- Allow loading TLS certificates from HTTP and HTTPs urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`.

- Improve test coverage at lib/promauth

- Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later.
  Previously vmagent was exitting in this case.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959
2023-10-26 09:55:47 +02:00
Hui Wang
d7dd7614eb
fix inconsistent behaviors with prometheus when scraping (#5153)
* fix inconsistent behaviors with prometheus when scraping

1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting;
2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously will print error logs and do scrape without auth header;
3. don't send requests if there are wrong auth configs in:
    1. vmagent remoteWrite;
    2. vmalert datasource/remoteRead/remoteWrite/notifier.

* add changelogs

* address review comments

* fix ut
2023-10-26 08:56:54 +02:00
Aliaksandr Valialkin
1159b31270
app/vmagent/remotewrite: do not retry request immediately on io.ErrUnexpectedEOF, since this error isn't returned on stale connection
Also, mention the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139 in comments to the code
in order to simplify further maintenance of this code.

This is a follow-up for 992a1c0a3a
2023-08-29 09:48:49 +02:00
hagen1778
33bf28e1bd
app/vmagent: fix comment typo after 992a1c0a3a
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 757ae4275b)
2023-08-27 09:05:04 +02:00
Roman Khavronenko
b9a2512ac3
vmagent: retry failed write request on the closed connection (#4857)
* vmagent: retry failed write request on the closed connection

 Retry failed write request on the closed connection immediately,
 without waiting for backoff. This should improve data delivery speed
 and reduce amount of error logs emitted by vmagent when using idle connections.

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmagent: retry failed write request on the closed connection

Re-instantinate request before retry as body could have been already spoiled.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
(cherry picked from commit 992a1c0a3a)
2023-08-27 09:04:59 +02:00
Aliaksandr Valialkin
0ee8a9120a
lib/flagutil: add defaultValue arg to NewArray{Int,Bytes,Duration} functions
The defaultValue is printed in the flag description when passing -help to the app.

This is a follow-up for aef31f201a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4776
2023-08-12 04:19:34 -07:00
Aliaksandr Valialkin
d02fb47c2d
app/vmagent/remotewrite: keep in sync the default value for -remoteWrite.sendTimeout option in the description with the actually used timeout
This is a follow-up for aef31f201a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4776
2023-08-11 05:46:27 -07:00
Alexander Marshalov
d90dae2a68
add info about remoteWrite.sendTimeout default value (#4776)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-08-11 04:53:16 -07:00
Zakhar Bessarab
e42f856b56
app/vmagent/remotewrite: fix error message for auth config (#4545)
Error message will be present for any auth error, but message claims an error is about OAuth2 configuration which is confusing.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-07-06 22:10:13 -07:00
Alexander Marshalov
d321ea91f2
fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 02:22:06 -07:00
Aliaksandr Valialkin
b60594a548
app/vmagent/remotewrite: follow-up for e3a756d82869f8c357b072f6e635ebfc7d65dd2c
- Document the fix
- Move the detection of VictoriaMetrics remoteWrite protocol from client.init() to newHTTPClient()
  This simplifies the fix to the following diff:

diff --git a/app/vmagent/remotewrite/client.go b/app/vmagent/remotewrite/client.go
index 099899c19..70b904af4 100644
--- a/app/vmagent/remotewrite/client.go
+++ b/app/vmagent/remotewrite/client.go
@@ -151,10 +151,6 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
        }
        c.sendBlock = c.sendBlockHTTP

-       return c
-}
-
-func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
        useVMProto := forceVMProto.GetOptionalArg(argIdx)
        usePromProto := forcePromProto.GetOptionalArg(argIdx)
        if useVMProto && usePromProto {
@@ -173,6 +169,10 @@ func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
        }
        c.useVMProto = useVMProto

+       return c
+}
+
+func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
2023-03-08 01:35:31 -08:00
Dmytro Kozlov
2cf6797a24
app/vmagent: fix panic if auth config not defined (#530) 2023-03-08 01:35:04 -08:00
Aliaksandr Valialkin
1e156ac3c3
app/vmagent: use the provided auth options when checking whether the remote storage supports VictoriaMetrics remote write protocol
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3847
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-26 12:19:53 -08:00
Aliaksandr Valialkin
f579cac297
app/vmagent: automatically detect whether the remote storage supports VictoriaMetrics remote write protocol
Substitute -remoteWrite.useVMProto with -remoteWrite.forcePromProto command-line flag,
which can be used for forcing Prometheus remote write protocol in cases when the remote storage
supports VictoriaMetrics remote write protocol.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3847
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-23 17:38:47 -08:00
my-git9
7d86c5c94a
chore: Use http constants to replace numbers (#3846)
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-02-22 18:59:32 -08:00
Aliaksandr Valialkin
80c6d1e24c
app/vmagent: add support for VictoriaMetrics remote write protocol, which allows saving up to 10x on network bandwidth costs under high load
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-20 18:40:40 -08:00
Aliaksandr Valialkin
6f9ce3f6d6
lib/flagutil: rename Array to ArrayString
This makes the ArrayString more consistent with other Array* types.

While at it, add ArrayBytes type, which will be used for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3071
2022-10-01 18:28:19 +03:00
Aliaksandr Valialkin
1905618d10
all: subsitute ioutil.ReadAll with io.ReadAll
ioutil.ReadAll is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics requires at least Go1.18, so it is OK to switch from ioutil.ReadAll to io.ReadAll.

This is a follow-up for 02ca2342ab
2022-08-22 00:16:04 +03:00
Aliaksandr Valialkin
ecbe1ddf1b
lib/promscrape/discovery/ec2: properly handle custom endpoint option in ec2_sd_configs
This option was ignored since d289ecded1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1287
2022-08-05 18:52:37 +03:00
Roman Khavronenko
23e85e0fc5
vmagent: expose metric vmagent_remotewrite_queues (#2871)
The new metric `vmagent_remotewrite_queues` exports a static value of
number of configured remote write queus. This metric is useful to
calculate total saturation per each configured URL with given number
of queues. See corresponding changes to vmagent alerts and dashboard.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-18 14:41:04 +03:00
Boris Petersen
cbf01ee4ae
fix typo introduced in pr #2604 (#2866)
Signed-off-by: Boris Petersen <boris.petersen@idealo.de>
2022-07-13 23:45:42 +03:00
Aliaksandr Valialkin
ecc11dc32d
lib/promauth: refactor NewConfig in order to improve maintainability
1. Split NewConfig into smaller functions
2. Introduce Options struct for simplifying construction of the Config with various options

This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2684
2022-07-04 14:31:43 +03:00
Aliaksandr Valialkin
1e6b0a1f54
app/vmagent/remotewrite: do not shadow headers global variable in getAuthConfig 2022-06-30 20:18:43 +03:00
Aliaksandr Valialkin
0f6c17ed06
app/vmagent/remotewrite: clarify descriptions for -remoteWrite.* options, which must be set per each -remoteWrite.url 2022-06-30 20:18:42 +03:00
Aliaksandr Valialkin
7fc03a1deb
app/vmagent/remotewrite: add -remoteWrite.header command-line flag for setting additional http headers to send to -remoteWrite.url
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2805
2022-06-30 20:00:59 +03:00
Aliaksandr Valialkin
6c66804fd3
all: locate throttled loggers via logger.WithThrottler() only once and then use them
This reduces the contention on logThrottlerRegistryMu mutex when logger.WithThrottler()
is called frequently from concurrent goroutines.
2022-06-27 12:34:30 +03:00
Aliaksandr Valialkin
3ae6300497
lib/promauth: add ability to send additional http headers in requests to scrape targets
This solves https://stackoverflow.com/questions/66032498/prometheus-scrape-metric-with-custom-header
2022-06-22 20:40:50 +03:00
Boris Petersen
3a8b4fab97
Add ability to sign requests for all AWS services (#2604)
This adds the ability to utilize sigv4 signing for all AWS services not
just "aps". When the newly introduced property "service" is not set it
will default to "aps".

Signed-off-by: Boris Petersen <boris.petersen@idealo.de>
2022-05-20 14:20:00 +03:00
Aliaksandr Valialkin
9c89ea1c90
app/vmagent: rename vmagent_remote_write_rate_limit_reached_total to vmagent_remotewrite_rate_limit_reached_total for the sake of consistency with other vmagent_remotewrite_ metrics 2022-05-06 15:02:18 +03:00
Aliaksandr Valialkin
873f55bac5
lib/awsapi: pass filtersQueryString arg to GetEC2APIResponse() function, so the caller could decide whether to use the filters during the AWS API query
The filters shouldn't be passed to DescribeAvailabilityZones API call.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1626
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1287

Related commits:
0e09fdb8b0
d289ecded1
2022-05-05 10:29:47 +03:00
Aliaksandr Valialkin
9fea63d1c8
app/vmagent: rename -remoteWrite.useSigv4 command-line flag to -remoteWrite.aws.useSigv4, so its name is consistent with the other -remoteWrite.aws.* command-line flags 2022-05-04 20:41:45 +03:00
Nikolay
7e58cba6cf
{lib/promscrape,app/vmagent}: adds sigv4 support for vmagent remoteWrite (#2458)
* {lib/promscrape,app/vmagent}: adds sigv4 support for vmagent remoteWrite
moves aws related code into separate lib from lib/promscrape
it allows to write data from vmagent to the AWS managed prometheus (cortex)

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1287

* Apply suggestions from code review

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-05-04 20:28:37 +03:00
Gard Rimestad
ceadcb7f8e
app/vmagent add metric for rate limit (#2521)
This adds a metric for the rate limit.
The limit is present as a flag currently:
`flag{name="remoteWrite.rateLimit", value="500000", is_set="true"} 1`

We are running many instances of vmagent and when creating alerts it is harder than it needs to be when extracting the value from the flag.

With this change it should be easier to monitor how close to the limit we are.

`((100/vmagent_remotewrite_rate_limit{account="account"})*sum (rate(vmagent_remotewrite_conn_bytes_written_total{account="account"}))) and ON (account) flag{name="remoteWrite.rateLimit"} == 1`
2022-05-02 22:25:47 +03:00
Aliaksandr Valialkin
f082e64e0c
app/vmagent: reduce the probability of TLS handshake timeout when dialing the remote storage
The following actions are taken:

- Increase the TLS hashdshake timeout from 5 seconds to 10 seconds
- Increase dial timeout from 5 seconds to 30 seconds
- Specify DialContext instead of Dial in http.Transport. This allows properly handling
  the Context arg during dialing the remote storage

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699
2022-04-06 12:35:14 +03:00
Roman Khavronenko
23e1de06ee
vmagent: add error log for skipped data block when rejected by receiv… (#1956)
* vmagent: add error log for skipped data block when rejected by receiving side

Previously, rejected data blocks were silently dropped - only metrics were update.
From operational perspective, having an additional logging for such cases is preferable.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1911

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmagent: throttle log messages about skipped blocks

The new type of logger was added to logger pacakge.
This new type supposed to control number of logged messages
by time.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/logger: make LogThrottler public, so its methods can be inspected by external packages

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-21 16:42:38 +02:00
Aliaksandr Valialkin
053e85ff3d
all: typo fix: unexected -> unexpected 2021-12-20 17:40:13 +02:00
Aliaksandr Valialkin
847004fa77
app/{vminsert,vmagent}: hide passwords and auth tokens by default at /config page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1764
2021-11-05 14:42:13 +02:00
Aliaksandr Valialkin
ad445a06cd
lib/promscrape: properly show proxy_url option value at /config page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1755
2021-10-26 21:24:22 +03:00