
71 commits

Author SHA1 Message Date
Aliaksandr Valialkin
b9616c017f lib/{mergeset,storage}: remove transaction files only after the mentioned dirs are really removed
This should fix the issue on NFS where incompletely removed dirs could be left behind
after an unclean shutdown (OOM, kill -9, hard reset, etc.) while the corresponding transaction
files were already removed.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162
2019-12-02 21:34:37 +02:00
Aliaksandr Valialkin
87b39222be Revert "lib/fs: do not postpone directory removal on NFS error"
This reverts commit 21aeb02b46649ac9906cb37733f7b155a77a0db9.
2019-11-12 16:29:50 +02:00
Oleg Kovalov
74ba42d111 fix misspelled words (#229) 2019-11-12 00:18:24 +02:00
Aliaksandr Valialkin
5f52eb7653 lib/fs: do not postpone directory removal on NFS error
Keep retrying removal of an NFS directory on temporary errors for up to a minute instead of postponing it.

The previous async removal process breaks in the following case during VictoriaMetrics startup:

- VictoriaMetrics opens the index, finds incomplete merge transactions and starts replaying them.
- The transaction instructs removing old directories for parts that were already merged into a bigger part.
- VictoriaMetrics removes these directories, but their removal is delayed due to NFS errors.
- VictoriaMetrics scans the partition directory after all the incomplete merge transactions are finished
  and finds directories that should have been removed but still weren't due to NFS errors.
- VictoriaMetrics panics when it finds an unexpected empty directory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162
2019-11-10 13:27:16 +02:00
Aliaksandr Valialkin
f581b2736a lib/fs: typo fix in comment to WriteFileAtomically 2019-10-29 11:31:34 +02:00
Aliaksandr Valialkin
2c654258ef lib/fs: add MustStopDirRemover for waiting until pending directories are removed on graceful shutdown
This patch is mainly required for laggy NFS. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162
2019-09-05 11:17:17 +03:00
Aliaksandr Valialkin
82bfe818d0 lib/fs: try harder with directory removal on NFS in the event of a temporary lock
Do not give up after 11 attempts of directory removal on laggy NFS.

Add `vm_nfs_dir_remove_failed_attempts_total` metric for counting the number of failed attempts
on directory removal.

Log failed attempts on directory removal after long sleep times.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162
2019-09-04 12:24:41 +03:00
Aliaksandr Valialkin
604a4312f9 all: port to FreeBSD on GOARCH=amd64 2019-08-28 01:46:09 +03:00
Aliaksandr Valialkin
51263b1a45 lib/fs: add test for IsTemporaryFileName 2019-08-13 21:33:54 +03:00
Aliaksandr Valialkin
39f3f3a517 lib: move common code for creating flock.lock file into fs.CreateFlockFile 2019-08-13 01:46:20 +03:00
Aliaksandr Valialkin
73f866d874 lib/fs: atomically create file with the given contents on WriteFileAtomically
This should prevent corruption of `transaction` and `metadata.json` files
on unclean shutdown such as OOM, `kill -9`, power loss, etc.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/148
2019-08-12 15:02:04 +03:00
Aliaksandr Valialkin
41f512af1c all: add vm_data_size_bytes metrics for easy monitoring of on-disk data size and on-disk inverted index size 2019-07-04 19:43:04 +03:00
Aliaksandr Valialkin
b0b93e3d50 lib/fs: sync parent dir in MustRemoveAll only if it exists
The parent directory may not exist when the directory being deleted
didn't exist before the MustRemoveAll call.
2019-06-12 02:16:15 +03:00
Aliaksandr Valialkin
18d6f293f7 lib/fs: consolidate *RemoveAll* funcs into a single MustRemoveAll func
The func syncs the parent dir in order to persist the directory removal
in the event of power loss.
2019-06-12 01:55:18 +03:00
Aliaksandr Valialkin
28d9904efc lib/fs: panic with fatal error when directories cannot be removed
Unremoved directories may leave the data directory in an inconsistent state,
so VictoriaMetrics would fail to start next time.

So panic on the first error when trying to remove a directory in order
to simplify the recovery process.
2019-06-12 01:20:10 +03:00
Aliaksandr Valialkin
d897bc3f08 lib/fs: attempt #2 to work around NFS issue with directory removal
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/61
2019-06-12 01:07:29 +03:00
Aliaksandr Valialkin
51e2e255a6 lib/fs: rename SyncPath -> MustSyncPath for consistency, since it doesn't return an error 2019-06-11 23:13:45 +03:00
Aliaksandr Valialkin
3fa4c28f6b lib/fs: make sure the created directory remains visible in the fs in the event of power loss
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/63
2019-06-11 23:08:17 +03:00
Aliaksandr Valialkin
0b7f751f60 lib/fs: use filepath.Dir instead of filepath.Split, since the filename is unused 2019-06-11 22:54:23 +03:00
Aliaksandr Valialkin
3437c30180 all: try hard to remove directories with contents
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/61
2019-06-11 01:58:08 +03:00
Aliaksandr Valialkin
1836c415e6 all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00