The issue was in the `labels := dst[offset:]` line in the beginning of appendExtraLabels() function.
The `dst` may be re-allocated when adding extra labels to it. In this case the addition of `exported_`
prefix to labels inside `labels` slice become invisible in the returned `dst` labels.
While at it, properly handle some corner cases:
- Add additional `exported_` prefix to clashing metric labels with already existing `exported_` prefix.
- Store scraped metric names in `exported___name__` label if scrape target contains `__name__` label.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3278
Thanks to @jplanckeel for the initial attempt to fix this issue
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3281
Previously netstorage.MustStop() call didn't free up all the resources,
so the subsequent call to nestorage.Init() would panic.
This allows writing tests, which call nestorage.Init() + nestorage.MustStop() in a loop.
Blocks outside the configured retention are eventually deleted during background merge.
But such blocks may reside in the storage for long time until background merge.
Previously VictoriaMetrics could spend additional CPU time on processing such blocks
during search queries. Now these blocks are skipped.
The searchTSIDs function was searching for metricIDs matching the the given tag filters
and then was locating the corresponding TSID entries for the found metricIDs.
The TSID entries aren't needed when searching for time series names (aka MetricName),
so this commit removes the uneeded TSID search from the implementation of /api/v1/series API.
This improves perfromance of /api/v1/series calls.
This commit also improves performance a bit for /api/v1/query and /api/v1/query_range calls,
since now these calls cache small metricIDs instead of big TSID entries
in the indexdb/tagFilters cache (now this cache is named indexdb/tagFiltersToMetricIDs)
without the need to compress the saved entries in order to save cache space.
This commit also removes concurrency limiter during searching for matching time series,
which was introduced in 8f16388428, since the concurrency
for all the read queries is already limited with -search.maxConcurrentRequests command-line flag.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
This increases the maximum time for cache population with new entries from 20 minutes to 40 minutes.
This
This change shouldn't increase memory usage for caches, since the prev cache cleaner
should free up memory by deleting unused prev cache as soon as possible.
See 08ca45d238 for details on prev cache cleaner.
The panic has been introduced in 68f3a02589
While at it, add padding to shard structs in order to avoid false sharing on mordern CPUs
This should improve scalability on systems with many CPU cores
This means that the majority of requests are successfully served from the current cache,
so the previous cache can be reset in order to free up memory.
* lib/backup: set s3 default region to us-west-2
it should fix an error with region detection for bucket, if AWS_REGION env var is not set
* Update lib/backup/s3remote/s3.go
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Sort labels explicitly after calling the ParsedConfigs.Apply() when needed.
This reduces CPU usage when performing metric-level relabeling, where labels' sorting isn't needed.
Previously the `__address__` label could contain only `host:port` part of the target url,
while the scheme and metrics path were obtained from `__scheme__` and `__metrics_path__`
labels. Now it is possible to set the full url in `__address__` label.
This makes valid the following scrape config, which is frequently used by novice users:
scrape_configs:
- job_name: foo
static_configs:
- targets:
- http://host1/metrics1
- https://host2/metrics2
* Optimize fast path for /api/v1/import when importing numeric values
* Move the docs about the change from features to bugfixes at docs/CHANGELOG.md
* Update tests at lib/protoparser/vmimport
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3161
* app/vmselect: properly work when export import json from `api/v1/{export, import}` API
* app/vmselect: update convert function
* app/vmselect: export null if `math.IsNaN(v)`
* app/vmselect: get float from json
* lib/protoparser: add test
* docs: add change log
* lib/protoparser: make export import api compatible
Incorrect 301 redirects can be cached by user agents such as web browsers.
This can complicate recovery procedure after the incorrect redirect is fixed,
e.g. web browser cache must be reset.
The related issue - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1752
* app/vminsert: allows parsing tenant id from labels
it should help mitigate issues with vmagent's multiTenant mode, which works incorrectly at heavy load
and it cannot handle more then 100 different tenants.
This functional hidden with flag and do not change vminsert default behaviour
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2970
* Update docs/Cluster-VictoriaMetrics.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* wip
* app/vminsert/netstorage: clean remaining labels in order to free up GC
* docs/Cluster-VictoriaMetrics.md: typo fix
* wip
* wip
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Cache `action: replace` results for non-trivial regexs and return them next time
instead of performing CPU-intensive regex replacement.
Optimize also `action: labelmap_all` and `action: replace_all` in the same way.
* lib/{httpserver,netutil}: allow to define min and max TLS version of the http server
* lib/httpserver: added descriptions about tls supported versions
* lib/netutil: check minimal tls version, added supported tls versions to error
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This reduces scrape duration for targets returning big responses.
The response body was already read into memory in stream parsing mode before this change,
so this commit shouldn't increase memory usage.
Cache sanitized label names and return them next time.
This reduces the number of allocations and speeds up the SanitizeLabelName()
function for common case when the number of unique label names is smaller than 100k
The following regex patterns are optimized:
- literal string match, e.g. "foo"
- prefix match, e.g. "foo.*" and "foo.+"
- substring match, e.g. ".*foo.*" and ".+foo.+"
- alternate values match, e.g. "foo|bar|baz"
This is OK, since the anchors are implicitly applied to the whole regexp.
This optimization should improve the speed for regexp series filters with explicit $ and ^ anchors.
For example, `{label="^(foo|bar)$"}`
These metrics allow alerting when the number of unique series approach the limit.
For example, the following query alerts when the number of series reaches 90% of the configured limit:
vm_hourly_series_limit_current_series / vm_hourly_series_limit_max_series > 0.9
ioutil.ReadAll is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics requires at least Go1.18, so it is OK to switch from ioutil.ReadAll to io.ReadAll.
This is a follow-up for 02ca2342ab
The ioutil.ReadDir is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics requires at least Go1.18, so it is time to switch from io.ReadDir to os.ReadDir
This is a follow-up for 02ca2342ab
The ioutil.{Read|Write}File is deprecated since Go1.16 -
see https://tip.golang.org/doc/go1.16#ioutil
VictoriaMetrics needs at least Go1.18, so it is safe to remove ioutil usage
from source code.
This is a follow-up for 02ca2342ab
* lib/storage: bump max merge concurrency for small parts to 15
The change is based on the feedback from users on github.
Thier examples show, that limit of 8 sometimes become a
bottleneck. Users report that without limit concurrency
can climb up to 15-20 merges at once.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update lib/storage/partition.go
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The 429 status code means that the server is overwhelmed with requests.
The client can retry the request after some wait time.
Implement this strategy for service discovery and scrape requests.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2940
* lib/storage: prevent excessive loops when storage is in RO
Returning nil error when storage is in RO mode results
into excessive loops and function calls which could
result into CPU exhaustion. Returning an err instead
will trigger delays in the for loop and save some resources.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* document the change
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/auth: add tests from NewToken function
* lib/auth: update test, fix problem with type conversion
* lib/auth: update test description
* lib/auth: simplify failure tests
Previously the time series could be put into dateMetricIDCache without
registering in the per-day inverted index if GetOrCreateTSIDByName
finds TSID entry in the global index. This could lead to missing
series in query results.
The issue has been introduced in the commit 55e7afae3a,
which has been included in VictoriaMetrics v1.78.0
- show dates in human-readable format, e.g. 2022-05-07, instead of a numeric value
- limit the maximum length of queries and filters shown in trace messages
Previously SearchMetricNames was returning unmarshaled metric names.
This wasn't great for vmstorage, which should spend additional CPU time
for marshaling the metric names before sending them to vmselect.
While at it, remove possible duplicate metric names, which could occur when
multiple samples for new time series are ingested via concurrent requests.
Also sort the metric names before returning them to the client.
This simplifies debugging of the returned metric names across repeated requests to /api/v1/series
This opens doors for implementing vmselect api server at vmselect level,
so top-level vmselect could query lower-level vmselect nodes in the same way
as it queries vmstorage nodes.
This will create the ability to create highly available querying architecture
when multiple independent VictoriaMetrics clusters with the same data
are located in distinct availability zones. In this case we can use top-level
vmselect instead of Promxy for simultaneous querying of all the clusters
in all the AZs.
querytracer has been added to the following storage.Storage methods:
- RegisterMetricNames
- DeleteMetrics
- SearchTagValueSuffixes
- SearchGraphitePaths
If the previous dial attempt was unsuccessful, then all the new dial attempts are skipped
until the background goroutine determines that the given address can be successfully dialed.
This reduces query latency when some of vmstorage nodes are unavailable and dialing them is slow.
This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
This commit is based on ideas from the https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2756
The main differences are:
- The check for healthy/unhealthy storage nodes is moved one level lower from app/vmselect/netstorage to lib/netutil.ConnPool.
This makes possible re-using this feature everywhere lib/netutil.ConnPool is used.
- The check doesn't take into account handshake errors for already established connections.
Handshake errors usually mean improperly configured VictoriaMetrics cluster, so they shouldn't be ignored.
The commit 5fb45173ae takes into account only newly registered series
when applying cardinality limits. This means that the cardinality limit could be exceeded with already registered series.
This commit returns back accounting for already registered series when applying cardinality limits.
Previously the creation of per-day indexes and global indexes
for the newly registered time series was decoupled.
Now global indexes and per-day indexes for the current day are created toghether for new time series.
This should speed up registering new time series a bit.
* lib/httpserver: backport changes from master branch
adds basicAuth
adds authKey check for /metrics and /debug/pprof requests
it should improve security for cluster components
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
vmalert: support `limit` param in groups definition
`limit` param limits number of time series samples produced by a single rule
during execution.
On reaching the limit rule will return an err.
Signed-off-by: lihaowei <haoweili35@gmail.com>
The change updates histogram for registering SD update duration
only SD is considered as `active`. SD is active if at least
one scraper for this SD has started.
This change supposed to reduce metrics cardinality produced
by duration histogram which gets updated even if SD isn't configured.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2671
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Workers count for merges affects the max part size during merges. Such behaviour
protects storage from running out of disk space for scenario when all workers
are merging parts with the max size.
This works very well for most cases. But for systems where high number of CPUs
is allocated for vmstorage components this could significantly impact the max
part size and result in more unmerged parts than expected.
While checking multiple production highly loaded setups it was discovered that
`max_over_time(vm_active_merges{type="storage/big}[1h]}"` rarely exceeds 2,
and `max_over_time(vm_active_merges{type="storage/small}[1h]}"` rarely exceeds 4.
The change in this commit limits the max value for concurrency accordingly.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Remove unused js bloatware from /targets page. This strips down binary size by more than 100Kb
- Add /service-discovery page for API compatibility with Prometheus
- Properly load bootstrap.min.css from /prometheus/targets
- Serve static contents for /targets page from app/vminsert instead of app/vmselect, because /targets page is served from there
- Take into account `ca`, `key` and `cert` values when generating string representation of TLSConfig.
Print hashes instead of real values because of security considerations.
- Properly update Config.tlsCertDigets when `key` and `cert` values are set.
This allows properly updating scrape targets after these values are updated in configs.
- Do not re-generate certificate from `key` and `cert` values per each call to getTLSCert,
because these values are immutable.
- Do not set `ca` value from `ca_file` value, so it isn't exposed at `/config` page.
- Generate proper error messages on incorrect `key`, `cert` or `ca` values.
The default size of `indexdb/tagFilters` now can be overridden via
`storage.cacheSizeIndexDBTagFilters` flag.
Please, be careful with changing default size since it may
lead to inefficient work of the vmstorage or OOM exceptions.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2663
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
* promrelabel: add support of `lowercase` and `uppercase` relabeling actions
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2664
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/storage: make golangci-lint happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
* lib/promscrape/discovery/kubernetes: properly updates discovered scrape works
previously, added or updated scrapeworks may override previuosly
discovered.
it happens because swosByKey may contain small subset of kubernetes
objects with it's labels.
It happens for objectsUpdated and objectsAdded maps, which include only changed elements
* Properly calculate vm_promscrape_discovery_kubernetes_scrape_works
Co-authored-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This adds the ability to utilize sigv4 signing for all AWS services not
just "aps". When the newly introduced property "service" is not set it
will default to "aps".
Signed-off-by: Boris Petersen <boris.petersen@idealo.de>
This should reduce potential spikes in the number of established connections in the following cases:
- when the connection establishing procedure becomes temporarily slow
- after a temporary spike in the rate of ConnPool.Get() calls
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2552
Previously ScrapeConfig.clone() was improperly copying promauth.Secret fields -
their contents was replaced with `<secret>` value.
This led to inability to use passwords and secrets in `-promscrape.config` file.
The bug has been introduced in v1.77.0 in the commit 67b10896d2
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2551
* {lib/promscrape,app/vmagent}: adds sigv4 support for vmagent remoteWrite
moves aws related code into separate lib from lib/promscrape
it allows to write data from vmagent to the AWS managed prometheus (cortex)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1287
* Apply suggestions from code review
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/{storage,flagutil} - Add option for snapshot autoremoval
- add prometheus-like duration as command flag
- add option to delete stale snapshots
- update duration.go flag to re-use own code
* wip
* lib/flagutil: re-use Duration.Set() call in NewDuration
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>