The logic employed for re-using the previously loaded scrape target was broken initially.
The commit cc0427897c tried to fix it, but the new logic
became too complex and fragile. So it is better to just remove this logic,
since the targets from temporarily broken file should be eventually loaded on next
attempts every -promscrape.fileSDCheckInterval
This also allows removing fragile hacks around __vm_filepath label.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3989
* lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read
* lib/promscrape: clarified comment
* lib/promscrape: made better approach to handle a problem with growing []*ScrapeWork on each error when loading config
* lib/promscrape: added CHANGELOG.md
* Update docs/CHANGELOG.md
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/storage: check for free disk space before opening tables
We check for free disk space before call to `openTable`,
so `Storage` can be set to ReadOnly before mergeWorkers start.
Before the change, there was a chance that merges will start
even if Storage has to start in ReadOnly mode because of
`-storage.minFreeDiskSpaceBytes` limit.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4023
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/storage: chore
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update lib/storage/storage.go
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- Make sure that the last successfully loaded config is used on hot-reload failure
- Properly cleanup resources occupied by already initialized aggregators
when the current aggregator fails to be initialized
- Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config
This should simplify monitoring and debugging failed reloads
- Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa
could be in use at realoadSaConfig()
- Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability
on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push().
- Remove fine-grained aggregator reload - reload all the aggregators on config change instead.
This simplifies the code a bit. The fine-grained aggregator reload may be returned back
if there will be demand from real users for it.
- Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag
- Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639
Verifying status code helps to avoid misleading errors caused by attempt to parse unsuccessful response.
Related issue: #4034
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
- Compare directory names instead of paths to directory when determining which persistent queues must be deleted
This is less error-prone solution, since paths to the same directory can differ, which could lead
to accidental directory removal for the existing -remoteWrite.url
- Log the `removed %d dangling queues` message when at least a single queue has been removed
- Consistently use filepath.Join() for creating paths to persistent queues.
This is needed for Windows support (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70 )
- Clarify the description of the change at docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
- Use windows.FlushFileBuffers() instead of windows.Fsync() at streamTracker.adviseDontNeed()
for consistency with implementations for other architectures.
- Use filepath.Base() instead of filepath.Split(), since the dir part isn't used.
This simplifies the code a bit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
This is a follow-up for 43b24164ef
* lib/fs: adds memory map for windows
it should improve performance for file reading
* lib/storage: replace '/' with os specific separator
it must fix an errors for windows
* lib/fs: mention windows fsync support
* lib/filestream: adds fdatasync for windows writes
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
* allowed using dashes and dots in environment variables names for templating config files with envtemplate (#3999)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
* Apply suggestions from code review
---------
Signed-off-by: Alexander Marshalov <_@marshalov.org>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/netutil: log only parsing errors for proxy-protocol
Previosly every error was logged. With configured TCP health checks at load-balancer or kubernetes, vmauth spams a lot of false positive error message into logs
* Update docs/CHANGELOG.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* Update lib/netutil/tcplistener.go
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
This commit changes background merge algorithm, so it becomes compatible with Windows file semantics.
The previous algorithm for background merge:
1. Merge source parts into a destination part inside tmp directory.
2. Create a file in txn directory with instructions on how to atomically
swap source parts with the destination part.
3. Perform instructions from the file.
4. Delete the file with instructions.
This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since the remaining files with instructions is replayed on the next restart,
after that the remaining contents of the tmp directory is deleted.
Unfortunately this algorithm doesn't work under Windows because
it disallows removing and moving files, which are in use.
So the new algorithm for background merge has been implemented:
1. Merge source parts into a destination part inside the partition directory itself.
E.g. now the partition directory may contain both complete and incomplete parts.
2. Atomically update the parts.json file with the new list of parts after the merge,
e.g. remove the source parts from the list and add the destination part to the list
before storing it to parts.json file.
3. Remove the source parts from disk when they are no longer used.
This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since incomplete partitions from step 1 or old source parts from step 3 are removed
on the next startup by inspecting parts.json file.
This algorithm should work under Windows, since it doesn't remove or move files in use.
This algorithm has also the following benefits:
- It should work better for NFS.
- It fits object storage semantics.
The new algorithm changes data storage format, so it is impossible to downgrade
to the previous versions of VictoriaMetrics after upgrading to this algorithm.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
lib{mergset,storage}: prevent possible race condition with logging stats for merges
Previously partwrapper could be release by background process and reference for part may be invalid
during logging stats. It will lead to panic at vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897
- Use flag.Duration instead of flagutil.Duration for -snapshotCreateTimeout,
since the flagutil.Duration is intended mostly for big durations, e.g. days, months and years,
while the -snapshotCreateTimeout is usually smaller than one hour.
- Add links to https://docs.victoriametrics.com/#how-to-work-with-snapshots in docs/CHANGELOG.md,
so readers could easily find the corresponding docs when reading the changelog.
- Properly remove all the created directories on unsuccessful attempt to create
snapshot in Storage.CreateSnapshot().
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
* lib/{fs,mergeset,storage}: skip `.must-remove.` dirs when creating snapshot (#3858)
* lib/{mergeset,storage}: add timeout configuration for snapshots creation, remove incomplete snapshots from storage
* docs: fix formatting
* app/vmstorage: add metrics to track status of snapshots
* app/vmstorage: use `vm_http_requests_total` metric for snapshot endpoints metrics, rename new flag to make name more clear
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmstorage: update flag name in docs
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmstorage: reflect new metrics names change in docs
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/promscrape: set `vm_promscrape_config_last_reload_successful` to 1 if there was no promscrape config provided
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promscrape: register `vm_promscrape_config_*` metrics only in case promscrape config is used
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- Return immediately on context cancel during the backoff sleep.
This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747
- Add a comment describing why the second attempt to obtain the response from remote side
is perfromed immediately after the first attempt.
- Remove fasthttp dependency from lib/promscrape/discoveryutils
- Set context deadline before calling doRequestWithPossibleRetry().
This simplifies the doRequestWithPossibleRetry() a bit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3293
* fix: do not use exponential backoff for first retry of scrape request (#3293)
* lib/promscrape: refactor `doRequestWithPossibleRetry` backoff to simplify logic
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Update lib/promscrape/client.go
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* lib/promscrape: refactor `doRequestWithPossibleRetry` to make it more straightforward
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* Modify API version when running in Container App
* Handle expires on from token response
Response from IMDS does not always contain expires in value which is
currently used to get the token expiry time. An example resources that
doesn't provide it are Container Apps and App Service.
Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>
* Fix client id parameter for user assigned identity
* Apply suggestions from code review
---------
Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
- Do not generate __meta_server label, since it is unavailable in Prometheus.
- Add a link to https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs to docs/CHANGELOG.md,
so users could click it and read the docs without the need to search the corresponding docs.
- Remove kumaTarget struct, since it is easier generating labels for discovered targets
directly from the response returned by Kuma. This simplifies the code.
- Store the generated labels for discovered targets inside atomic.Value. This allows reading them
from concurrent goroutines without the need to use mutex.
- Use synchronouse requests to Kuma instead of long polling, since there is a little sense
in the long polling when the Kuma server may return 304 Not Modified response every -promscrape.kumaSDCheckInterval.
- Remove -promscrape.kuma.waitTime command-line flag, since it is no longer needed when long polling isn't used.
- Set default value for -promscrape.kumaSDCheckInterval to 30s in order to be consistent with Prometheus.
- Remove unnecessary indirections for string literals, which are used only once, in order to improve code readability.
- Remove unused fields from discoveryRequest and discoveryResponse.
- Update tests.
- Document why fetch_timeout and refresh_interval options are missing in kuma_sd_config.
- Add docs to discoveryutils.RequestCallback and discoveryutils.ResponseCallback,
since these are public types.
Side notes: it is weird that Prometheus implementation for kuma_sd_configs sets `instance` label,
since usually this label is set by the Prometheus itself to __address__ after the relabeling phase.
See https://www.robustperception.io/life-of-a-label/
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3389
See https://github.com/prometheus/prometheus/issues/7919
and https://github.com/prometheus/prometheus/pull/8844
as a reference implementation in Prometheus
`lib/protoparser/prometheus` is used by various applications,
such as `app/vmalert`. The recent change to the
`lib/protoparser/prometheus` package introduced a new dependency
of `lib/writeconcurrencylimiter` which exposes some metrics.
Because of the dependency, now all applications which have this
dependency also expose these metrics.
Creating a new `lib/protoparser/prometheus/stream` package helps
to remove these metrics from apps which use `lib/protoparser/prometheus`
as dependency.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761
Signed-off-by: hagen1778 <roman@victoriametrics.com>
previously historical data backfilling may trigger force merge for previous month every hour
it consumes cpu, disk io and decrease cluster performance.
Following commit fixes it by applying deduplication for InMemoryParts
The limit has been increased from 300 bytes to 500 bytes according to the collected production stats.
This allows reducing CPU usage without significant increase of RAM usage in most practical cases.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
Switch to the actual API version `2016-11-15`,
since the old version doesn't provide access to all
the fields which implementation expects.
For example, old API missing `zone_id` field
in `DescribeAvailabilityZonesResponse` response.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3700
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Allow users fine-tuning the maximum string length for interning via -internStringMaxLen command-line flag.
This may be used for fine-tuning RAM vs CPU usage for certain workloads.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
- Call httpserver.GetQuotedRemoteAddr() and httpserver.GetRequestURI() only when the error occurs.
This saves CPU time on fast path when there are no parsing errors.
- Create a helper function - httpserver.LogError() - for logging the error with the request uri and remote addr context.
Stale series are sent when there is a difference between current
and previous scrapes. Those series which disappeared in the current scrape
are marked as stale and sent to the remote storage.
Sending stale series requires memory allocation and in case when too many
series disappear in the same it could result in noticeable memory spike.
For example, re-deploy of a big fleet of service can result into
excessive memory usage for vmagent, because all the series with old
pod name will be marked as stale and sent to the remote write storage.
This change limits the number of stale series which can be sent at once,
so memory usage remains steady.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3675
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This prevents vmbackup from leaking passwords into logs like shown below.
2023-01-11T15:00:01.050Z info VictoriaMetrics/lib/logger/flag.go:12 build version: vmbackup-20221214-211706-tags-v1.85.1-0-g09a70d3e9
2023-01-11T15:00:01.050Z info VictoriaMetrics/lib/logger/flag.go:13 command-line flags
2023-01-11T15:00:01.050Z info VictoriaMetrics/lib/logger/flag.go:20 -dst="fs:///vm-backups/latest"
2023-01-11T15:00:01.050Z info VictoriaMetrics/lib/logger/flag.go:20 -snapshot.createURL="http://user:super_sercret123@victoriametricspshot/create"
2023-01-11T15:00:01.050Z info VictoriaMetrics/lib/logger/flag.go:20 -storageDataPath="/storage"
2023-01-11T15:00:01.050Z info VictoriaMetrics/app/vmbackup/main.go:53 Snapshot create url http://user:super_sercret123@victoriametrics:8428/snapshot/create
2023-01-11T15:00:01.050Z info VictoriaMetrics/app/vmbackup/main.go:60 Snapshot delete url http://user:super_sercret123@victoriametrics:8428/snapshot/delete
Assisted merges are intended to be performed by goroutines, which accept the incoming samples,
in order to limit the data ingestion rate.
The worker, which converts pending samples to parts, shouldn't be penalized by assisted merges,
since this may result in increased number of pending rows as seen at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647#issuecomment-1385039142
when the assisted merge takes too much time.