app/vmagent: reduce the probability of TLS handshake timeout when dialing the remote storage

The following actions are taken:

- Increase the TLS hashdshake timeout from 5 seconds to 10 seconds
- Increase dial timeout from 5 seconds to 30 seconds
- Specify DialContext instead of Dial in http.Transport. This allows properly handling
  the Context arg during dialing the remote storage

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699
This commit is contained in:
Aliaksandr Valialkin 2022-04-06 12:28:54 +03:00
parent 087609a12b
commit f082e64e0c
No known key found for this signature in database
GPG key ID: A72BEC6CD3D0DED1
3 changed files with 23 additions and 4 deletions

View file

@ -92,9 +92,9 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
}
tlsCfg := authCfg.NewTLSConfig()
tr := &http.Transport{
Dial: statDial,
DialContext: statDial,
TLSClientConfig: tlsCfg,
TLSHandshakeTimeout: 5 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
MaxConnsPerHost: 2 * concurrency,
MaxIdleConnsPerHost: 2 * concurrency,
IdleConnTimeout: time.Minute,

View file

@ -1,7 +1,9 @@
package remotewrite
import (
"context"
"net"
"sync"
"sync/atomic"
"time"
@ -9,9 +11,25 @@ import (
"github.com/VictoriaMetrics/metrics"
)
func statDial(networkUnused, addr string) (conn net.Conn, err error) {
func getStdDialer() *net.Dialer {
stdDialerOnce.Do(func() {
stdDialer = &net.Dialer{
Timeout: 30 * time.Second,
KeepAlive: 30 * time.Second,
DualStack: netutil.TCP6Enabled(),
}
})
return stdDialer
}
var (
stdDialer *net.Dialer
stdDialerOnce sync.Once
)
func statDial(ctx context.Context, networkUnused, addr string) (conn net.Conn, err error) {
network := netutil.GetTCPNetwork()
conn, err = net.DialTimeout(network, addr, 5*time.Second)
conn, err = stdDialer.DialContext(ctx, network, addr)
dialsTotal.Inc()
if err != nil {
dialErrors.Inc()

View file

@ -33,6 +33,7 @@ Previously the `-search.maxUniqueTimeseries` command-line flag was used as a glo
When using [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html), these command-line flags (including `-search.maxUniqueTimeseries`) must be passed to `vmselect` instead of `vmstorage`.
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html) and [vmauth](https://docs.victoriametrics.com/vmauth.html): reduce the probability of `TLS handshake error from XX.XX.XX.XX: EOF` errors when `-remoteWrite.url` points to HTTPS url at `vmauth`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699).
* BUGFIX: return `Content-Type: text/html` response header when requesting `/` HTTP path at VictoriaMetrics components. Previously `text/plain` response header was returned, which could lead to broken page formatting. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2323).
* BUGFIX: [Graphite Render API](https://docs.victoriametrics.com/#graphite-render-api-usage): accept floating-point values for [maxDataPoints](https://graphite.readthedocs.io/en/stable/render_api.html#maxdatapoints) query arg, since some clients send floating-point values instead of integer values for this arg.