fix: vmselect multi-level setup panic (#3738)

* app/vmselect/netstorage: fix panic for multi-level cluster setup when `replicationFactor` was set and request contained `trace` parameter (#3734)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmselect/netstorage: use correct context for retry

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-03-11 15:34:56 +00:00 · 2023-02-01 20:56:36 +04:00 · 2023-02-01 20:56:36 +04:00 · 1a8f6d98c7
commit 1a8f6d98c7
parent 840f4e3383
2 changed files with 5 additions and 4 deletions
--- a/app/vmselect/netstorage/netstorage.go
+++ b/app/vmselect/netstorage/netstorage.go
@@ -1650,7 +1650,7 @@ func (sn *storageNode) processSearchQuery(qt *querytracer.Tracer, requestData []
 func (sn *storageNode) execOnConnWithPossibleRetry(qt *querytracer.Tracer, funcName string, f func(bc *handshake.BufferedConn) error, deadline searchutils.Deadline) error {
 	qtChild := qt.NewChild("rpc call %s()", funcName)
 	err := sn.execOnConn(qtChild, funcName, f, deadline)
-	qtChild.Done()
+	defer qtChild.Done()
 	if err == nil {
 		return nil
 	}
@@ -1661,9 +1661,9 @@ func (sn *storageNode) execOnConnWithPossibleRetry(qt *querytracer.Tracer, funcN
 		return err
 	}
 	// Repeat the query in the hope the error was temporary.
-	qtChild = qt.NewChild("retry rpc call %s() after error", funcName)
-	err = sn.execOnConn(qtChild, funcName, f, deadline)
-	qtChild.Done()
+	qtRetry := qtChild.NewChild("retry rpc call %s() after error", funcName)
+	err = sn.execOnConn(qtRetry, funcName, f, deadline)
+	qtRetry.Done()
 	return err
 }

--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -17,6 +17,7 @@ The following tip changes can be tested by building VictoriaMetrics components f

 * BUGFIX: fix a bug, which could prevent background merges for the previous partitions until restart if the storage didn't have enough disk space for final deduplication and down-sampling.
 * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): update API version for [ec2_sd_configs](https://docs.victoriametrics.com/sd_configs.html#ec2_sd_configs) to fix [the issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3700) with missing `__meta_ec2_availability_zone_id` attribute.
+* BUGFIX: [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html): fix panic on top-level vmselect nodes of [multi-level setup](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multi-level-cluster-setup) when the `-replicationFactor` flag is set and request contains `trace` query parameter. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3734).
 * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): [dockerswarm_sd_configs](https://docs.victoriametrics.com/sd_configs.html#dockerswarm_sd_configs): apply `filters` only to objects of the specified `role`. Previously filters were applied to all the objects, which could cause errors when different types of objects were used with filters that were not compatible with them. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3579).
 * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): suppress all the scrape errors when `-promscrape.suppressScrapeErrors` is enabled. Previously some scrape errors were logged even if `-promscrape.suppressScrapeErrors` flag was set.
 * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): consistently put the scrape url with scrape target labels to all error logs for failed scrapes. Previously some failed scrapes were logged without this information.