app/vmagent/remotewrite: fix vmagent panic on shutdown (#4407)

app/vmagent/remotewrite: fix vmagent panic on shutdown

Currently, when vmagent is stopping it first flushes pending series in remote write context and proceeds to stop streaming aggregation. This leads to streaming aggregation being unable to write results into pending timeseries (since it is already nil) and panic.
This can lead to losing some aggregation results being lost almost silently.

The fix is reordering flow to first stop streaming aggregation and flush all pending time series after that.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This commit is contained in:
Zakhar Bessarab 2023-06-07 17:45:43 +04:00 committed by GitHub
parent 96b40b044c
commit ce7141383d
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 6 additions and 3 deletions

View file

@ -587,6 +587,11 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
} }
func (rwctx *remoteWriteCtx) MustStop() { func (rwctx *remoteWriteCtx) MustStop() {
// sas must be stopped before rwctx is closed
// because sas can write pending series to rwctx.pss if there are any
sas := rwctx.sas.Swap(nil)
sas.MustStop()
for _, ps := range rwctx.pss { for _, ps := range rwctx.pss {
ps.MustStop() ps.MustStop()
} }
@ -596,9 +601,6 @@ func (rwctx *remoteWriteCtx) MustStop() {
rwctx.c.MustStop() rwctx.c.MustStop()
rwctx.c = nil rwctx.c = nil
sas := rwctx.sas.Swap(nil)
sas.MustStop()
rwctx.fq.MustClose() rwctx.fq.MustClose()
rwctx.fq = nil rwctx.fq = nil

View file

@ -31,6 +31,7 @@ The following tip changes can be tested by building VictoriaMetrics components f
`--search.maxGraphiteTagKeys` for limiting the number of tag keys returned from Graphite `/tags`, `/tags/autoComplete/*`, `/tags/findSeries` API. `--search.maxGraphiteTagKeys` for limiting the number of tag keys returned from Graphite `/tags`, `/tags/autoComplete/*`, `/tags/findSeries` API.
`--search.maxGraphiteTagValues` for limiting the number of tag values returned Graphite `/tags/<tag_name>` API. `--search.maxGraphiteTagValues` for limiting the number of tag values returned Graphite `/tags/<tag_name>` API.
Remove redundant limit from [Prometheus api/v1/series](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#prometheus-querying-api-usage). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4339). Remove redundant limit from [Prometheus api/v1/series](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#prometheus-querying-api-usage). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4339).
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): fix panic on vmagent shutdown which could lead to loosing aggregation results which were not flushed to remote yet. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4407) for details.
## [v1.91.2](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.91.2) ## [v1.91.2](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.91.2)