lib/fs: wait for a while before giving up on NFS file removal if the removal queue is full

This should reduce the probability of the `the removal queue is full` panic on a highly loaded
VictoriaMetrics instance that accepts millions of samples per second.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
Aliaksandr Valialkin 2021-05-21 17:17:44 +03:00
parent e9a63a5942
commit 23355ca34c
2 changed files with 12 additions and 1 deletions


@@ -33,6 +33,7 @@ sort: 15
 * BUGFIX: vmctl: properly import InfluxDB rows if they have a field and a tag with identical names. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1299).
 * BUGFIX: properly reload configs if a `SIGHUP` signal arrives during service initialization. Previously such a `SIGHUP` signal could be ignored and configs weren't reloaded.
 * BUGFIX: vmalert: properly import default rules from OpenShift. See [this issue](https://github.com/VictoriaMetrics/operator/issues/243).
+* BUGFIX: reduce the probability of `the removal queue is full` panic when highly loaded VictoriaMetrics stores data on NFS. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313).
 ## [v1.59.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.59.0)


@@ -8,6 +8,7 @@ import (
 	"time"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
 	"github.com/VictoriaMetrics/metrics"
 )
@@ -34,7 +35,16 @@ func mustRemoveAll(path string, done func()) bool {
 	select {
 	case removeDirCh <- w:
 	default:
-		logger.Panicf("FATAL: cannot schedule %s for removal, since the removal queue is full (%d entries)", path, cap(removeDirCh))
+		// Wait for a while in the hope that entries are drained from removeDirCh.
+		// The queue can fill up on a highly loaded system with a high ingestion rate,
+		// as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
+		t := timerpool.Get(10 * time.Second)
+		select {
+		case removeDirCh <- w:
+			timerpool.Put(t)
+		case <-t.C:
+			logger.Panicf("FATAL: cannot schedule %s for removal, since the removal queue is full (%d entries)", path, cap(removeDirCh))
+		}
 	}
 	return false
 }