* vmagent: retry failed write request on the closed connection
Retry failed write request on the closed connection immediately,
without waiting for backoff. This should improve data delivery speed
and reduce amount of error logs emitted by vmagent when using idle connections.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmagent: retry failed write request on the closed connection
Re-instantinate request before retry as body could have been already spoiled.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
Error message will be present for any auth error, but message claims an error is about OAuth2 configuration which is confusing.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
ioutil.ReadAll is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics requires at least Go1.18, so it is OK to switch from ioutil.ReadAll to io.ReadAll.
This is a follow-up for 02ca2342ab
The new metric `vmagent_remotewrite_queues` exports a static value of
number of configured remote write queus. This metric is useful to
calculate total saturation per each configured URL with given number
of queues. See corresponding changes to vmagent alerts and dashboard.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This adds the ability to utilize sigv4 signing for all AWS services not
just "aps". When the newly introduced property "service" is not set it
will default to "aps".
Signed-off-by: Boris Petersen <boris.petersen@idealo.de>
* {lib/promscrape,app/vmagent}: adds sigv4 support for vmagent remoteWrite
moves aws related code into separate lib from lib/promscrape
it allows to write data from vmagent to the AWS managed prometheus (cortex)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1287
* Apply suggestions from code review
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This adds a metric for the rate limit.
The limit is present as a flag currently:
`flag{name="remoteWrite.rateLimit", value="500000", is_set="true"} 1`
We are running many instances of vmagent and when creating alerts it is harder than it needs to be when extracting the value from the flag.
With this change it should be easier to monitor how close to the limit we are.
`((100/vmagent_remotewrite_rate_limit{account="account"})*sum (rate(vmagent_remotewrite_conn_bytes_written_total{account="account"}))) and ON (account) flag{name="remoteWrite.rateLimit"} == 1`
The following actions are taken:
- Increase the TLS hashdshake timeout from 5 seconds to 10 seconds
- Increase dial timeout from 5 seconds to 30 seconds
- Specify DialContext instead of Dial in http.Transport. This allows properly handling
the Context arg during dialing the remote storage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699
* vmagent: add error log for skipped data block when rejected by receiving side
Previously, rejected data blocks were silently dropped - only metrics were update.
From operational perspective, having an additional logging for such cases is preferable.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1911
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmagent: throttle log messages about skipped blocks
The new type of logger was added to logger pacakge.
This new type supposed to control number of logged messages
by time.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/logger: make LogThrottler public, so its methods can be inspected by external packages
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This metric can be used for determining high saturation of every connection to remote storage with
an alerting query `rate(vmagent_remotewrite_send_duration_seconds_total) > 0.9s`.
This query triggers when a connection is satureated by more than 90%
* adds blocks drop at 400 BadRequest status code
recieved from remote storage,
not expected that remote storage will be able to handle it on retry
* removes error logging for dropped blocks,
its expected error