When WriteFileAndSync fails, then the caller eventually logs the error message
and exits. The error message returned by WriteFileAndSync already contains the path
to the file, which couldn't be created. This information alongside the call stack
is enough for debugging the issue. So just use log.Panicf("FATAL: ...") inside MustWriteAndSync().
This simplifies error handling at caller side a bit.
This is a follow-up after 42bba64aa7
Previously the part directory listing was fsync'ed implicitly inside partHeader.WriteMetadata()
by calling fs.WriteFileAtomically(). Now it must be fsync'ed explicitly.
There is no need in fsync'ing the parent directory, since it is fsync'ed by the caller
when updating parts.json file.
Previously the created part directory listing was fsynced implicitly
when storing metadata.json file in it.
Also remove superflouous fsync for part directory listing,
which was called at blockStreamWriter.MustClose().
After that the metadata.json file is created, so an additional fsync
for the directory contents is needed.
Improperly configured -bigMergeConcurrency command-line flag usually leads to uncontrolled
growth of unmerged parts, which, in turn, increases CPU usage and query durations.
So it is better deprecating this flag. In rare cases -smallMergeConcurrency command-line flag
can be used instead for controlling the concurrency of background merges.
This makes it easier to understand exact point in time which is included in this backup.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
The logic employed for re-using the previously loaded scrape target was broken initially.
The commit cc0427897c tried to fix it, but the new logic
became too complex and fragile. So it is better to just remove this logic,
since the targets from temporarily broken file should be eventually loaded on next
attempts every -promscrape.fileSDCheckInterval
This also allows removing fragile hacks around __vm_filepath label.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3989
* lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read
* lib/promscrape: clarified comment
* lib/promscrape: made better approach to handle a problem with growing []*ScrapeWork on each error when loading config
* lib/promscrape: added CHANGELOG.md
* Update docs/CHANGELOG.md
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/storage: check for free disk space before opening tables
We check for free disk space before call to `openTable`,
so `Storage` can be set to ReadOnly before mergeWorkers start.
Before the change, there was a chance that merges will start
even if Storage has to start in ReadOnly mode because of
`-storage.minFreeDiskSpaceBytes` limit.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4023
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/storage: chore
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update lib/storage/storage.go
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- Make sure that the last successfully loaded config is used on hot-reload failure
- Properly cleanup resources occupied by already initialized aggregators
when the current aggregator fails to be initialized
- Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config
This should simplify monitoring and debugging failed reloads
- Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa
could be in use at realoadSaConfig()
- Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability
on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push().
- Remove fine-grained aggregator reload - reload all the aggregators on config change instead.
This simplifies the code a bit. The fine-grained aggregator reload may be returned back
if there will be demand from real users for it.
- Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag
- Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639
Verifying status code helps to avoid misleading errors caused by attempt to parse unsuccessful response.
Related issue: #4034
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
- Compare directory names instead of paths to directory when determining which persistent queues must be deleted
This is less error-prone solution, since paths to the same directory can differ, which could lead
to accidental directory removal for the existing -remoteWrite.url
- Log the `removed %d dangling queues` message when at least a single queue has been removed
- Consistently use filepath.Join() for creating paths to persistent queues.
This is needed for Windows support (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70 )
- Clarify the description of the change at docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
- Use windows.FlushFileBuffers() instead of windows.Fsync() at streamTracker.adviseDontNeed()
for consistency with implementations for other architectures.
- Use filepath.Base() instead of filepath.Split(), since the dir part isn't used.
This simplifies the code a bit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
This is a follow-up for 43b24164ef
* lib/fs: adds memory map for windows
it should improve performance for file reading
* lib/storage: replace '/' with os specific separator
it must fix an errors for windows
* lib/fs: mention windows fsync support
* lib/filestream: adds fdatasync for windows writes
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
* allowed using dashes and dots in environment variables names for templating config files with envtemplate (#3999)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
* Apply suggestions from code review
---------
Signed-off-by: Alexander Marshalov <_@marshalov.org>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/netutil: log only parsing errors for proxy-protocol
Previosly every error was logged. With configured TCP health checks at load-balancer or kubernetes, vmauth spams a lot of false positive error message into logs
* Update docs/CHANGELOG.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* Update lib/netutil/tcplistener.go
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
This commit changes background merge algorithm, so it becomes compatible with Windows file semantics.
The previous algorithm for background merge:
1. Merge source parts into a destination part inside tmp directory.
2. Create a file in txn directory with instructions on how to atomically
swap source parts with the destination part.
3. Perform instructions from the file.
4. Delete the file with instructions.
This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since the remaining files with instructions is replayed on the next restart,
after that the remaining contents of the tmp directory is deleted.
Unfortunately this algorithm doesn't work under Windows because
it disallows removing and moving files, which are in use.
So the new algorithm for background merge has been implemented:
1. Merge source parts into a destination part inside the partition directory itself.
E.g. now the partition directory may contain both complete and incomplete parts.
2. Atomically update the parts.json file with the new list of parts after the merge,
e.g. remove the source parts from the list and add the destination part to the list
before storing it to parts.json file.
3. Remove the source parts from disk when they are no longer used.
This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since incomplete partitions from step 1 or old source parts from step 3 are removed
on the next startup by inspecting parts.json file.
This algorithm should work under Windows, since it doesn't remove or move files in use.
This algorithm has also the following benefits:
- It should work better for NFS.
- It fits object storage semantics.
The new algorithm changes data storage format, so it is impossible to downgrade
to the previous versions of VictoriaMetrics after upgrading to this algorithm.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
lib{mergset,storage}: prevent possible race condition with logging stats for merges
Previously partwrapper could be release by background process and reference for part may be invalid
during logging stats. It will lead to panic at vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897
- Use flag.Duration instead of flagutil.Duration for -snapshotCreateTimeout,
since the flagutil.Duration is intended mostly for big durations, e.g. days, months and years,
while the -snapshotCreateTimeout is usually smaller than one hour.
- Add links to https://docs.victoriametrics.com/#how-to-work-with-snapshots in docs/CHANGELOG.md,
so readers could easily find the corresponding docs when reading the changelog.
- Properly remove all the created directories on unsuccessful attempt to create
snapshot in Storage.CreateSnapshot().
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
* lib/{fs,mergeset,storage}: skip `.must-remove.` dirs when creating snapshot (#3858)
* lib/{mergeset,storage}: add timeout configuration for snapshots creation, remove incomplete snapshots from storage
* docs: fix formatting
* app/vmstorage: add metrics to track status of snapshots
* app/vmstorage: use `vm_http_requests_total` metric for snapshot endpoints metrics, rename new flag to make name more clear
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmstorage: update flag name in docs
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmstorage: reflect new metrics names change in docs
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/promscrape: set `vm_promscrape_config_last_reload_successful` to 1 if there was no promscrape config provided
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promscrape: register `vm_promscrape_config_*` metrics only in case promscrape config is used
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>