From 12cd32fd75706969f972a328e8583ca1da9e68c3 Mon Sep 17 00:00:00 2001 From: Aliaksandr Valialkin Date: Mon, 13 Nov 2023 21:19:06 +0100 Subject: [PATCH] lib/protoparser/promremotewrite: fall back to Snappy decoding if zstd decoding fails This case is possible after the following steps: 1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether the remote storage supports zstd-compressed data. 2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression for the data sent to the remote storage. 3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk. 4. The remote storage becomes available. 5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data. 6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage, while falsely advertizing it sends zstd-compressed data. 7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd. The solution is to just fall back to Snappy decompression if zstd decompression fails. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301 --- docs/CHANGELOG.md | 1 + lib/protoparser/promremotewrite/stream/streamparser.go | 9 ++++++++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md index 86acc5609..68f02b0f6 100644 --- a/docs/CHANGELOG.md +++ b/docs/CHANGELOG.md @@ -105,6 +105,7 @@ The sandbox cluster installation is running under the constant load generated by * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): properly drop samples if `-streamAggr.dropInput` command-line flag is set and `-remoteWrite.streamAggr.config` contains an empty file. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5207). * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): do not print redundant error logs when failed to scrape consul or nomad target. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5239). * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): generate proper link to the main page and to `favicon.ico` at http pages served by `vmagent` such as `/targets` or `/service-discovery` when `vmagent` sits behind an http proxy with custom http path prefixes. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5306). +* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): properly decode Snappy-encoded data blocks received via [VictoriaMetrics remote_write protocol](https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301). * BUGFIX: [vmstorage](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html): prevent deleted series to be searchable via `/api/v1/series` API if they were re-ingested with staleness markers. This situation could happen if user deletes the series from the target and from VM, and then vmagent sends stale markers for absent series. Thanks to @ilyatrefilov for the [issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069) and [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5174). * BUGFIX: [vmstorage](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html): log warning about switching to ReadOnly mode only on state change. Before, vmstorage would log this warning every 1s. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159) for details. * BUGFIX: [vmauth](https://docs.victoriametrics.com/vmauth.html): show browser authorization window for unauthorized requests to unsupported paths if the `unauthorized_user` section is specified. This allows properly authorizing the user. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5236) for details. diff --git a/lib/protoparser/promremotewrite/stream/streamparser.go b/lib/protoparser/promremotewrite/stream/streamparser.go index 4b751b1ad..bbd8a1fab 100644 --- a/lib/protoparser/promremotewrite/stream/streamparser.go +++ b/lib/protoparser/promremotewrite/stream/streamparser.go @@ -42,7 +42,14 @@ func Parse(r io.Reader, isVMRemoteWrite bool, callback func(tss []prompb.TimeSer if isVMRemoteWrite { bb.B, err = zstd.Decompress(bb.B[:0], ctx.reqBuf.B) if err != nil { - return fmt.Errorf("cannot decompress zstd-encoded request with length %d: %w", len(ctx.reqBuf.B), err) + // Fall back to Snappy decompression, since vmagent may send snappy-encoded messages + // with 'Content-Encoding: zstd' header if they were put into persistent queue before vmagent restart. + // See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301 + zstdErr := err + bb.B, err = snappy.Decode(bb.B[:cap(bb.B)], ctx.reqBuf.B) + if err != nil { + return fmt.Errorf("cannot decompress zstd-encoded request with length %d: %w", len(ctx.reqBuf.B), zstdErr) + } } } else { bb.B, err = snappy.Decode(bb.B[:cap(bb.B)], ctx.reqBuf.B)