github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	b4ba8d0d76	lib/protoparser: add missing /datadog/ prefix to the /api/v2/series path in the description for -datadog.maxInsertRequestSize command-line flag	2023-12-21 21:04:53 +02:00
Aliaksandr Valialkin	fb90a56de2	app/{vminsert,vmagent}: preliminary support for /api/v2/series ingestion from new versions of DataDog Agent This commit adds only JSON support - https://docs.datadoghq.com/api/latest/metrics/#submit-metrics , while recent versions of DataDog Agent send data to /api/v2/series in undocumented Protobuf format. The support for this format will be added later. Thanks to @AndrewChubatiuk for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451	2023-12-21 20:50:55 +02:00
Aliaksandr Valialkin	01f9edda64	lib/promauth: add more context to errors returned by Options.NewConfig() in order to simplify troubleshooting	2023-12-20 21:58:12 +02:00
Aliaksandr Valialkin	160cc9debd	app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags This is a follow-up for `5ebd5a0d7b` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427	2023-12-20 21:35:28 +02:00
Morgan	5ebd5a0d7b	Expose OAuth2 Endpoint Parameters to cli (#5427 ) The user may wish to control the endpoint parameters, for instance to set the audience when requesting an access token. Exposing the parameters as a map allows for additional use cases without requiring modification.	2023-12-20 20:16:43 +02:00
Nikolay	7cfde237ec	lib/awsapi: properly assume role with webIdentity token (#5495 ) * lib/awsapi: properly assume role with webIdentity token introduce new irsaRoleArn param for config. It's only needed for authorization with webIdentity token. First credentials obtained with irsa role and the next sts assume call for an actual roleArn made with those credentials. Common use case for it - cross AWS accounts authorization https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3822 * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-12-20 19:05:39 +02:00
Aliaksandr Valialkin	5a88bc973f	all: use Gauge instead of Counter for `*_config_last_reload_successful` metrics This allows exposing the correct TYPE metadata for these labels when the app runs with -metrics.exposeMetadata command-line flag. See https://github.com/VictoriaMetrics/metrics/pull/61#issuecomment-1860085508 for more details. This is follow-up for `326a77c697`	2023-12-20 14:23:42 +02:00
Aliaksandr Valialkin	326a77c697	all: add -metrics.exposeMetadata command-line flag, which can be used for adding TYPE and HELP metadata for metrics exposed at /metrics page This may be needed for systems, which require this metadata such as Google Cloud Managed Prometheus. See https://cloud.google.com/stackdriver/docs/managed-prometheus/troubleshooting#missing-metric-type	2023-12-19 03:20:40 +02:00
Aliaksandr Valialkin	4b529562ce	lib/pushmetrics: add -pushmetrics.header and -pushmetrics.disableCompression command-line flags	2023-12-17 19:56:46 +02:00
Aliaksandr Valialkin	0379a0eb82	lib/protoparser/opentelemetry: allow ingesting metrics without resource labels Some clients may ingest samples via OpenTelemetry protocol without Resource labels. Previously VictoriaMetrics was silently dropping such samples. The commit `317834f876` added vm_protoparser_rows_dropped_total{type="opentelemetry",reason="resource_not_set"} counter for tracking of such dropped samples. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5459 It is better from usability PoV to accept such samples instead of dropping them and incrementing the corresponding counter.	2023-12-17 19:12:58 +02:00
Zakhar Bessarab	317834f876	lib/protoparser/opentelemetry: add metric to track skipped rows without resource (#5459 ) Currently, it is impossible to understand why metrics are not ingested when resource is not set by OTEL exporter. Adding a metric should simplify debugging. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-12-15 11:16:25 +01:00
Aliaksandr Valialkin	72dbd24b22	lib/fs: remove unused IsEmptyDir() This function became unused after the commit `43b24164ef` The unused function has been found with deadcode tool - https://go.dev/blog/deadcode	2023-12-14 19:38:53 +02:00
Aliaksandr Valialkin	0f91f83639	app/vmselect: add support for vmstorage groups with independent -replicationFactor per group Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5197 See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#vmstorage-groups-at-vmselect Thanks to @zekker6 for the initial pull request at https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/718	2023-12-13 00:14:45 +02:00
hagen1778	e0fc5ef140	lib/promscrape: cosmetic changes after `e373bb84d5` * fix typos in docs * add `shard-` prefix to generated links when `-promscrape.cluster.memberURLTemplate` is enabled Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-12 11:28:18 +01:00
Aliaksandr Valialkin	51df2248f0	vendor: run `make vendor-update`	2023-12-11 10:48:36 +02:00
Aliaksandr Valialkin	042267541f	app/vmauth: add support for `hot standby` mode via `first_available` load balancing policy vmauth in `hot standby` mode sends requests to the first url_prefix while it is available. If the first url_prefix becomes unavailable, then vmauth falls back to the next url_prefix. This allows building highly available setup as described at https://docs.victoriametrics.com/vmauth.html#high-availability Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792	2023-12-08 23:31:07 +02:00
Aliaksandr Valialkin	b05e1512d4	lib/promscrape: add a warning when the /service-discovery page contains incomplete list of dropped targets	2023-12-08 19:03:51 +02:00
noodles2hg	8efe694160	lib/streamaggr/streamaggr.go: fix link in error message (#5439 )	2023-12-08 16:55:05 +03:00
Aliaksandr Valialkin	e373bb84d5	lib/promscrape: add `-promscrape.cluster.memberURLTemplate` command-line flag for creating direct links to vmagent instances at /service-discovery page See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4018#issuecomment-1843811569	2023-12-07 16:04:21 +02:00
Aliaksandr Valialkin	7cb8ed8271	lib/promscrape: show -promscrape.cluster.memberNum values for vmagent instances, which scrape the given dropped target at /service-discovery page The /service-discovery page contains the list of all the discovered targets after the commit `487f6380d0` on all the vmagent instances in cluster mode ( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ). This commit improves debuggability of targets in cluster mode by providing a list of -promscrape.cluster.memberNum values per each target at /service-discovery page, which has been dropped because of sharding, e.g. if this target is scraped by other vmagent instances in the cluster. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4018	2023-12-07 00:05:32 +02:00
Aliaksandr Valialkin	67468a0c46	lib/promscrape: show `never scraped` message for never scraped targets at /targets page	2023-12-06 22:33:39 +02:00
Aliaksandr Valialkin	65bc460323	lib/promscrape: follow-up for `97373b7786` Substitute O(N^2) algorithm for exposing the `vm_promscrape_scrape_pool_targets` metric with O(N) algorithm, where N is the number of scrape jobs. The previous algorithm could slow down /metrics exposition significantly when -promscrape.config contains thousands of scrape jobs. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5311 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5335	2023-12-06 17:35:50 +02:00
Hui Wang	97373b7786	vmagent: add `vm_promscrape_scrape_pool_targets` for scrape jobs like… (#5335 ) * vmagent: export `vm_promscrape_scrape_pool_targets` metric to track the number of targets that each scrape_job discovers * add extra panel for new metric	2023-12-06 15:44:39 +08:00
Aliaksandr Valialkin	06c73df55a	Revert "add datadog /api/v2/series and /api/beta/sketches support (#5094 )" This reverts commit `543f218fe9`. Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080	2023-12-05 02:26:22 +02:00
Aliaksandr Valialkin	bc550e22d7	Revert "lib/protoparser/datadog: follow-up after 543f218fe96574b9b2189c8350bb09afa349e3bb" This reverts commit `98d0f81f21`. Reason for revert: see https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080	2023-12-05 02:19:29 +02:00
Aliaksandr Valialkin	0160435802	app/vmagent: code cleanup for Kafka and Google PubSub consumers / producers - Add links to relevant docs into descriptions for every -kafka.* and -gcp.pubsub.* command-line flags. - Wait until message processing goroutines are stopped before returning from gcppubsub.Stop(). - Prevent from multiple calls to Init() without Stop(). - Drop message if tenantID cannot be parsed properly. - Take into account tenantID for all the supported message formats. - Support gzip-compressed messages for graphite format. - Use exponential backoff sleep when the message cannot be pushed to remote storage systems because of disabled on-disk persistence - https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence - Unblock from sleep as soon as Stop() is called. Previously the sleep could take up to 2 seconds after Stop() is called. - Remove unused globalCtx and initContext from app/vmagent/remotewrite/gcppubsub - Mention Google PubSub support at docs/enterprise.md - Make Google PubSub docs more clear at docs/vmagent.md This is a follow-up for commits 115245924a5f096c5a3383d6cc8e8b6fbd421984 and e6eab781ce42285a6a1750dc01eba6801dd35516 . Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/717 Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/713	2023-12-04 22:46:28 +02:00
Aliaksandr Valialkin	f5c4fcc250	lib/backup: consistently use path.Join() when constructing paths for s3, gs and azblob E.g. replace `fs.Dir + filePath` with `path.Join(fs.Dir, filePath)` The fs.Dir is guaranteed to end with slash - see Init() functions. The filePath may start with slash. If it starts with slash, then `fs.Dir + filePath` constructs an incorrect path with double slashes. path.Join() properly substitutes duplicate slashes with a single slash in this case. While at it, also substitute incorrect usage of filepath.Join() with path.Join() for constructing paths to object storage systems, which expect forward slashes in paths. filepath.Join() substitutes forward slashes with backslashes on Windows, so this may break creating or managing backups from Windows. This is a follow-up for 0399367be602b577baf6a872ca81bf0f99ba401b Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/719	2023-12-04 10:34:39 +02:00
Aliaksandr Valialkin	487f6380d0	lib/promscrape: show dropped targets because of sharding at /service-discovery page Previously the /service-discovery page didn't show targets dropped because of sharding ( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ). Show also the reason why every target is dropped at /service-discovery page. This should improve debugging why particular targets are dropped. While at it, do not remove dropped targets from the list at /service-discovery page until the total number of targets exceeds the limit passed to -promscrape.maxDroppedTargets . Previously the list was cleaned up every 10 minutes from the entries, which weren't updated for the last minute. This could complicate debugging of dropped targets. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389	2023-12-01 16:48:48 +02:00
Aliaksandr Valialkin	ac65c6b178	lib/promrelabel: add `keep_if_contains` and `drop_if_contains` relabeling actions	2023-11-29 12:22:43 +02:00
Nikolay	41f7940f97	lib/streamaggr: properly reference slice with labels (#5406 ) * lib/streamaggr: properly reference slice with labels by limiting slice capacity. It must fix issues with slice modification: in case of append, a new slice will be allocated instead of modifying the referenced slice https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5402 * Reduce memory allocations when output_relabel_configs adds new labels to output samples --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-29 10:03:04 +02:00
hagen1778	98d0f81f21	lib/protoparser/datadog: follow-up after `543f218fe9` * prevent /api/v1 from panic on parsing rows * add tests for Extract function for v1 and v2 api's * separate request types in different pools to prevent different objects mixing * add changelog line `543f218fe9` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-28 15:04:15 +01:00
Andrii Chubatiuk	543f218fe9	add datadog /api/v2/series and /api/beta/sketches support (#5094 ) Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com> Co-authored-by: Nikolay <https://github.com/f41gh7> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-11-28 14:52:29 +01:00
Aliaksandr Valialkin	5034aa0773	app/vmagent: follow-up for `090cb2c9de` - Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check for the result returned from these functions. - Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to persistent queue. - Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case. - Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage cannot keep up with the data ingestion rate. - Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples. - Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push data to persistent queue when -remoteWrite.disableOnDiskQueue is set. - Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics, because they are replaced with vmagent_remotewrite_samples_dropped_total metric. - Update 'Disabling on-disk persistence' docs at docs/vmagent.md - Update stale comments in the code Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110	2023-11-25 12:09:44 +02:00
Nikolay	090cb2c9de	app/vmagent: allow disabling on-disk persistence (#5088 ) * app/vmagent: allow disabling on-disk queue Previously, it wasn't possible to build data processing pipeline with a chain of vmagents. In case when remoteWrite for the last vmagent in the chain wasn't accessible, it persisted data only when it had enough disk capacity. If the disk queue was full, it started to silently drop ingested metrics. New flags allow disabling on-disk persistence and immediately returning an error if remoteWrite is not accessible anymore. It blocks any writes and notifies the client that data ingestion isn't possible. Main use case for this feature - use external queue such as kafka for data persistence. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110 * adds test, updates readme * apply review suggestions * update docs for vmagent * makes linter happy --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-24 13:42:11 +01:00
Roman Khavronenko	0cf55ded34	lib/protoparser: decrease `import.maxLineLen` from 100MB to 10MB (#5364 ) Tests showed that importing a single line with 70MB size takes 5.3GiB RSS memory for VictoriaMetrics single-node. In the scenario when user exports and imports data from one VM to another, it could possibly lead to OOM exception for destination VM. Importing a single line with 16MB size takes 1.3GiB RSS memory. Hence, the limit for `import.maxLineLen` was decreased from 100MB to 10MB to improve reliability of VictoriaMetrics during imports. Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-24 12:53:04 +02:00
hagen1778	d493da562e	lib/storage: fix typo Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-21 11:20:43 +01:00
hagen1778	e96b4410a1	lib/storage: fix typo Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-21 10:52:53 +01:00
Hui Wang	ae3107153c	lib/protoparser/promremotewrite: fall back to zstd decoding if Snappy-decoding fails (#5344 ) This case is possible after the following steps: 1. vmagent successfully performed handshake with the -remoteWrite.url and the remote storage supports zstd-compressed data. 2. remote storage became unavailable or slow to ingest data, vmagent compressed the collected data into blocks with zstd and puts these blocks to persistent queue on disk. 3. vmagent restarts and the remote storage is unavailable during the handshake, then vmagent falls back to Snappy compression. 4. vmagent starts sending zstd-compressed data from persistent queue to the remote storage, while falsely advertizing it sends Snappy-compressed data. 5. The remote storage receives zstd-compressed data and fails unpacking it with Snappy. The solution is the same as `12cd32fd75`, just fall back to zstd decompression if Snappy decompression fails.	2023-11-17 15:51:09 +01:00
Aliaksandr Valialkin	d9a7dea9a1	lib/querytracer: add missing blank comment line after `3121d76bee`	2023-11-15 16:10:43 +01:00
Aliaksandr Valialkin	3076c1f400	lib/ingestserver: properly log the number of closed connections Previously there was off-by-one error, which resulted in logging len(conns-1) connections instead of len(conns) Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922	2023-11-14 21:53:24 +01:00
Nikolay	3121d76bee	lib/querytracer: makes package concurrent safe to use (#5322 ) * lib/querytracer: makes package concurrent safe to use it must fix various issues with concurrent code usage. Especially, when it's not reasonable to wait for all goroutines to be finished * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-14 20:59:08 +01:00
Aliaksandr Valialkin	cb106bdf39	lib/logger: increase default -loggerMaxArgLen command-line flag value from 500 to 1000 The 500 chars limit for the maximum arg lengths during logging appeared to be too low for some cases	2023-11-14 19:52:27 +01:00
Aliaksandr Valialkin	f9bd265249	lib/ingestserver: typo fix after `f7834767c1` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922	2023-11-14 03:26:26 +01:00
Zakhar Bessarab	37997abd14	vmcluster: re-routing enhancement (#5293 ) * app/vmstorage: close vminsert connections gradually before stopping storage Implements graceful shutdown approach suggested here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1768146878 Test results for this can be found here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1790640274 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmstorage: update graceful shutdown logic - close connections from vminsert in deterministic order - update flag description - lower default timeout to 25 seconds. 25 seconds value was chosen because the lowest default value used in default configuration deployments is 30s (default value in Kubernetes and ansible-playbooks). Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/cluster: add information about re-routing enhancement during restart Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/changelog: add entry for new command-line flag Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * {app/vmstorage,lib/ingestserver}: address review feedback Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/cluster: add note to update workload scheduler timeout Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * wip --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-14 01:03:44 +01:00
Aliaksandr Valialkin	cef7a39ba3	lib/logstorage: always check the previous indexBlockHeader for blocks with matching tenantID and/or streamID The previous indexBlockHeader may contain blocks for the matching tenantID and/or streamID, so it must be scanned unconditionally during the search. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5295 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4856 This is a follow-up for `89dcbc2fe7`	2023-11-13 23:13:53 +01:00
XLONG96	89dcbc2fe7	lib/logstorage: fix streamID and tenantID search (#4856 ) (#5295 )	2023-11-13 23:09:39 +01:00
Aliaksandr Valialkin	0feaeca3c1	lib/protoparser/promremotewrite: fall back to Snappy decoding if zstd decoding fails This case is possible after the following steps: 1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether the remote storage supports zstd-compressed data. 2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression for the data sent to the remote storage. 3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk. 4. The remote storage becomes available. 5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data. 6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage, while falsely advertizing it sends zstd-compressed data. 7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd. The solution is to just fall back to Snappy decompression if zstd decompression fails. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301	2023-11-13 21:19:08 +01:00
Aliaksandr Valialkin	8af56ea2ed	lib/htmlcomponents: use relative links for the top page and for favicon.ico This allows hiding VictoriaMetrics components behind proxies with arbitrary path prefixes. For example, vmagent HTTP handlers can be served via /vmagent/ path prefix: - http://proxy/vmagent/targets - http://proxy/vmagent/service-discovery The path prefix can be arbitrary. For example, below are vmagent urls for /tenantID/vmagent/ path prefix: - http://proxy/tenantID/vmagent/targets - http://proxy/tenantID/vmagent/service-discovery While at it, consistently serve favicon.ico from any path directory. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5306 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5307	2023-11-13 20:29:05 +01:00
Aliaksandr Valialkin	cf23dc6480	all: cleanup: remove `// +build ...` lines, since they are no longer needed after Go1.17, and the minimum supported Go version for VictoriaMetrics source code is Go1.20	2023-11-13 19:12:51 +01:00
Aliaksandr Valialkin	3e93fa61ad	lib/regexutil: properly handle alternate regexps surrounded by .+ or .* Previously the following regexps were improperly handled: .+foo\|bar.+ .foo\|bar. This could lead to unexpected regexp match results. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5297 Thanks to @Haleygo for the initial attempt to fix the issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5308	2023-11-13 18:23:38 +01:00
Aliaksandr Valialkin	6340911d38	lib/stringsutil: add tests for LimitStringLen() function	2023-11-13 10:32:33 +01:00
Dmytro Kozlov	4722b70c89	lib/stringsutil: fix failing test (#5313 ) We have failed test on master branch. ``` --- FAIL: TestFormatLogMessage (0.00s) logger_test.go:24: unexpected result; got "foo: abcde, \"foo bar baz\", xx" want "foo: a..e, \"f..z\", xx" ``` if failed because maxArgs maxLen <= 4 in the `LimitStringLen` in that case we always will return the income string but in the test we limit the maxLen by value 4 ``` f("foo: %s, %q, %s", []interface{}{"abcde", fmt.Errorf("foo bar baz"), "xx"}, 4, `foo: a..e, "f..z", xx`)	2023-11-13 09:51:49 +01:00
Aliaksandr Valialkin	230230cf0b	lib/logger: add `-loggerMaxArgLen` command-line flag for fine-tuning the maximum length of logged args	2023-11-11 12:30:08 +01:00
Aliaksandr Valialkin	010dc15d16	lib/blockcache: do not cache entries, which were attempted to be accessed 1 or 2 times Previously entries which were accessed only 1 time weren't cached. It appeared that some rarely executed heavy queries may read indexdb block twice in a row instead of once. There is no need in caching such a block then. This change should eliminate cache size spikes for indexdb/dataBlocks when such heavy queries are executed. Expose -blockcache.missesBeforeCaching command-line flag, which can be used for fine-tuning the number of cache misses needed before storing the block in the cache.	2023-11-10 22:28:03 +01:00
Aliaksandr Valialkin	d407d13e7b	Makefile: update golangci-lint version from v1.54.2 to v1.55.1 See https://github.com/golangci/golangci-lint/releases/tag/v1.55.1	2023-11-10 20:23:48 +01:00
Aliaksandr Valialkin	815fda8995	docs: update -help output after recent changes to VictoriaMetrics components	2023-11-02 20:27:10 +01:00
Aliaksandr Valialkin	65db6609eb	docs/CHANGELOG.md: update the description of the optimization for SLO/SLI-like queries according to latest changes See commits `4497a08e3d` and `92826b0b4a`	2023-11-02 20:05:05 +01:00
Aliaksandr Valialkin	714af89b13	lib/httpserver: follow-up for `0638bbe69c` - Replace spaces with underscores in the `reason` label value for the vm_http_request_errors_total metric in order to be consistent with Prometheus-like naming - Clarify the description for the change at docs/CHANGELOG.md Updates https://github.com/victoriaMetrics/victoriaMetrics/issues/4590 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5166	2023-10-31 18:52:39 +01:00
Aliaksandr Valialkin	98699f203b	lib/persistentqueue: properly re-create flock.lock file inside directory if persistent queue is broken. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5249 Thanks to @Sniper91 for the bugreport and initial fix at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5233	2023-10-31 18:38:32 +01:00
Aliaksandr Valialkin	efb6ac27c2	lib/httpserver: call Request.Header() only once instead of calling it each time a new request header is set This is a follow-up for `ad839aa492`	2023-10-31 18:38:32 +01:00
Aliaksandr Valialkin	7ac49162c6	lib/storage: follow-up for `29cebd82fb` Use atomic.CompareAndSwapUint32() instead of atomic.LoadUint32() followed by atomic.StoreUint32(). This makes the code more clear. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159	2023-10-31 16:08:54 +01:00
venkatbvc	0638bbe69c	vmauth: add counter metrics for auth successes and failures (#5166 ) New labels `reason="wrong basic auth creds"` and `reason="wrong auth key"` were added to metric `vm_http_request_errors_total` to help identify auth errors. https://github.com/victoriaMetrics/victoriaMetrics/issues/4590 Co-authored-by: Rao, B V Chalapathi <b_v_chalapathi.rao@nokia.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-10-31 12:48:02 +01:00
Dima Lazerka	ad839aa492	lib/httpserver: add flags to specify HSTS / Frame-Options / CSP headers for httpserver (#5111 ) support `Strict-Transport-Security`, `Content-Security-Policy` and `X-Frame-Options` HTTP headers in all VictoriaMetrics components. The values for headers can be specified by users via the following flags: `-http.header.hsts`, `-http.header.csp` and `-http.header.frameOptions`. Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-10-30 11:33:38 +01:00
Roman Khavronenko	29cebd82fb	lib/storage: log warning about RO mode only on state change (#5191 ) Before, vmstorage would log the same message each second producing excessive amount of logs. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-30 10:52:57 +01:00
Aliaksandr Valialkin	613b545dfd	lib/promscrape/discovery/kubernetes: propagate possible errors at newAPIWatcher() to the caller This allows substituting FATAL panics with recoverable runtime errors such as missing or invalid TLS CA file and/or missing/invalid /var/run/secrets/kubernetes.io/serviceaccount/namespace file. Now these errors are logged instead of PANIC'ing, so they can be fixed by updating the corresponding files without the need to restart vmagent. This is a follow-up for `90427abc65` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5243	2023-10-27 20:24:46 +02:00
Hui Wang	90427abc65	lib/promscrape/discovery/kubernetes: avoid possible panic if given caFile under kubernetes.SDConfig.HTTPClientConfig does not exist (#5243 ) follow up `d5a599badc`	2023-10-27 20:20:22 +02:00
Aliaksandr Valialkin	632d788b63	lib/promscrape/discovery/kubernetes: stop all the url watchers, which belong to a particular groupWatcher, at once Previously url watchers for pod, service and node objects could be mistakenly closed when service discovery was set up only for endpoints and endpointslice roles, since watchers for these roles may start pod, service and node url watchers with nil apiWatcher passed to groupWatcher.startWatchersForRole(). Now all the url watchers, which belong to a particular groupWatcher, are stopped at once when this groupWatcher has no apiWatcher subscribers. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5216 The issue has been introduced in v1.93.5 when addressing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850	2023-10-27 13:51:35 +02:00
Hui Wang	7c90ce39cb	do not print redundant error logs when failed to scrape consul or no… (#5239 ) * do not print redundant error logs when failed to scrape consul or nomad target prometheus performs the same because it uses consul lib which just drops the error(`1806bcb38c/api/api.go (L1134)`)	2023-10-27 13:31:55 +08:00
Aliaksandr Valialkin	cdbc06a639	lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4	2023-10-25 17:57:56 -07:00
Dima Lazerka	8b41b506c2	Revert "lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4" It broke CI (lint) This reverts commit `5464376d16`.	2023-10-25 16:24:31 -07:00
Aliaksandr Valialkin	5464376d16	lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4	2023-10-26 00:29:51 +02:00
Aliaksandr Valialkin	ac933cc423	lib/promscrape: properly track the number of updated service discovery routines inside Config.mustRestart() This is a follow-up for `d5a599badc`	2023-10-26 00:06:29 +02:00
Aliaksandr Valialkin	612dcf231a	lib/promauth: typo fix in the error message after `d5a599badc`: obtaine -> obtain	2023-10-25 23:38:00 +02:00
Aliaksandr Valialkin	d5a599badc	lib/promauth: follow-up for `e16d3f5639` - Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup don't prevent from processing the corresponding scrape targets after the file becomes correct, without the need to restart vmagent. Previously scrape targets with invalid TLS CA file or TLS client certificate files were permanently dropped after the first attempt to initialize them, and they didn't appear until the next vmagent reload or the next change in other places of the loaded scrape configs. - Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent. Previously the old TLS CA was used until vmagent restart. - Properly handle errors during http request creation for the second attempt to send data to remote system at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing, since the returned request is nil on error. - Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent. Previously it could miss details on the source of the request. - Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header of every http request issued by vmagent during service discovery or target scraping. Re-use the HTTP client instead until the corresponding scrape config changes. - Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached, e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server when auth header cannot be obtained because of temporary error. - Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config. Cache the loaded certificate and the error for one second. This should significantly reduce CPU load when scraping big number of targets with the same tls_config. 
- Allow loading TLS certificates from HTTP and HTTPs urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`. - Improve test coverage at lib/promauth - Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later. Previously vmagent was exiting in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959	2023-10-25 23:19:37 +02:00
Aliaksandr Valialkin	c22e3e7b1d	lib/promscrape/discovery/kubernetes/kubeconfig_test.go: make TestParseKubeConfigSuccess test code easier to follow	2023-10-25 23:17:18 +02:00
Aliaksandr Valialkin	eed5206376	lib/promauth: properly parse string contents for ca, cert and key fields at tls_config Previously yaml parser wasn't accepting string values for these fields, because it was mistakenly expecting a list of uint8 values instead.	2023-10-25 23:12:21 +02:00
Aliaksandr Valialkin	4afcb2a689	lib/promscrape: move duplicate code from functions, which collect ScrapeWork lists for distinct SD types into Config.getScrapeWorkGeneric() This removes more than 200 lines of duplicate code	2023-10-25 23:03:40 +02:00
Aliaksandr Valialkin	42dd71bb63	all: consistently use %w instead of %s when an error is passed to fmt.Errorf() This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.	2023-10-25 21:24:03 +02:00
Aliaksandr Valialkin	305c96e384	lib/workingsetcache: fix outdated comments for Load() and New() functions	2023-10-25 21:04:20 +02:00
Alexander Marshalov	33484d3365	lib/streamaggr: respect `streamAgg.dropInput` with empty stream aggr config (#5213 ) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5207	2023-10-20 15:55:58 +02:00
hagen1778	fd2d07ba33	lib/storage: follow-up after `188cfe3a85` `188cfe3a85` See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-17 15:45:14 +02:00
Ilya Trefilov	188cfe3a85	lib/storage: do not create tsid if metric contains stale marker(#5069 ) (#5174 ) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069	2023-10-17 15:30:58 +02:00
Hui Wang	e16d3f5639	fix inconsistent behaviors with prometheus when scraping (#5153 ) * fix inconsistent behaviors with prometheus when scraping 1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting; 2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously will print error logs and do scrape without auth header; 3. don't send requests if there are wrong auth configs in: 1. vmagent remoteWrite; 2. vmalert datasource/remoteRead/remoteWrite/notifier. * add changelogs * address review comments * fix ut	2023-10-17 17:58:19 +08:00
Aliaksandr Valialkin	b8c267075e	lib/promscrape: add a link to https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets in descriptions for -promscrape.cluster.* command-line flags This should help users figuring out the purpose of -promscrape.cluster.* command-line flags	2023-10-16 14:46:22 +02:00
Aliaksandr Valialkin	fc98b62760	lib/promutils, app/vmalert-tool/unittest: move promutils.Duration.ParseTime() to app/vmalert-tool/unittest.durationToTime() The ParseTime() function looks strange, since it converts relative duration to absolute time since Unix Epoch. In most scenarios such a conversion is used by mistake. It is better to do not expose such a function for public use and hide it inside the package where it is needed, e.g. inside app/vmalert-tool/unittest. This is a follow-up for `dc28196237` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4789	2023-10-16 14:19:31 +02:00
Alexander Marshalov	b248413a07	fixed error when creating a full backup using the `-origin` flag (#5180 ) * fixed error when creating a full backup using the `-origin` flag (#5144) * Update docs/CHANGELOG.md --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-10-16 12:02:51 +02:00
Haleygo	8b6ccad41d	fix ingesting stale point, follow up `fe8cc573d1` (#5179 )	2023-10-16 09:05:37 +02:00
Aliaksandr Valialkin	2c334ed953	app/{vmagent,vminsert}: follow-up for NewRelic data ingestion protocol support This is a follow-up for `f60c08a7bd` Changes: - Make sure all the urls related to NewRelic protocol start from /newrelic . Previously some urls were started from /api/v1/newrelic - Remove /api/v1 part from NewRelic urls, since it has no sense - Remove automatic transformation from CamelCase to snake_case for NewRelic labels and metric names, since it may complicate the transition from NewRelic to VictoriaMetrics. Preserve all the metric names and label names, so users could query metrics and labels by the same names which are used in NewRelic. The automatic transformation from CamelCase to snake_case can be added later as a special action for relabeling rules if needed. - Properly update per-tenant data ingestion stats at app/vmagent/newrelic/request_handler.go . Previously it was always zero. - Fix NewRelic urls in vmagent when multitenant data ingestion is enabled. Previously they were mistakenly started from `/`. - Document NewRelic data ingestion url at docs/Cluster-VictoriaMetrics.md - Remove superfluous memory allocations at lib/protoparser/newrelic - Improve tests at lib/protoparser/newrelic/* Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3520 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4712	2023-10-16 00:25:25 +02:00
hagen1778	fe8cc573d1	docs: remove extra `/` in the end of the link Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-14 07:43:40 +02:00
Haleygo	dc28196237	vmalert-tool: implement unittest (#4789 ) 1. split package rule under /app/vmalert, expose needed objects 2. add vmalert-tool with unittest subcmd https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945	2023-10-13 13:54:33 +02:00
Zakhar Bessarab	2fc7e9f47e	lib/backup: add `-deleteAllObjectVersions` command-line flag (#5147 ) New flag enforces removal of all versions of the object in remote object storage. See: - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5121 - https://docs.victoriametrics.com/vmbackup.html#permanent-deletion-of-objects-in-s3-compatible-storages	2023-10-10 14:13:23 +02:00
Dmytro Kozlov	f60c08a7bd	app/(vminsert\|vmagent): add support for new relic infrastructure agent (#4712 ) Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-10-05 14:39:51 +02:00
Aliaksandr Valialkin	75dd7b30ba	lib/filestream: add `-filestream.disableFadvise` command-line flag for unconditional disabling of `fadvise` syscall This may be needed in rare cases when performing backups on systems with big number of CPU cores and big value passed to -concurrency command-line flag. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5120	2023-10-04 16:19:46 +02:00
Zakhar Bessarab	b296c8e95a	lib/logstorage: fix free space check (#5113 ) Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-10-03 12:39:41 +02:00
Roman Khavronenko	a4bd73ec7e	lib/promscrape: make concurrency control optional (#5073 ) * lib/promscrape: make concurrency control optional Before, `-maxConcurrentInserts` was limiting all calls to `promscrape.Parse` function: during ingestion and scraping. This behavior is incorrect. Cmd-line flag `-maxConcurrentInserts` should have effect only on ingestion. Since both pipelines use the same `promscrape.Parse` function, we extend it to make concurrency limiter optional. So caller can decide whether concurrency should be limited or not. This commit makes `c53b5788b4` obsolete. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Revert "dashboards: move `Concurrent inserts` panel to Troubleshooting section" This reverts commit `c53b5788b4`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-02 21:32:11 +02:00
Aliaksandr Valialkin	859977d591	Revert "lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total` (#5074 )" This reverts commit `74301cdbf5`. Reason for revert: vmagent already provides better approach for detecting slow scrape targets via the following query: scrape_duration_seconds / scrape_timeout_seconds > 1 This query depends on automatically generated per-target metrics. See https://docs.victoriametrics.com/vmagent.html#automatically-generated-metrics for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5074	2023-10-02 20:59:56 +02:00
Aliaksandr Valialkin	8dce4eb189	lib/logstorage: follow-up for `94627113db` - Move uniqueFields from rows to blockStreamMerger struct. This allows localizing all the references to uniqueFields inside blockStreamMerger.mustWriteBlock(), which should improve readability and maintainability of the code. - Remove logging of the event when blocks cannot be merged because they contain more than maxColumnsPerBlock, since the provided logging didn't provide the solution for the issue with too many columns. I couldn't figure out the proper solution, which could be helpful for end user, so decided to remove the logging until we find the solution. This commit also contains the following additional changes: - It truncates field names longer than 128 chars during logs ingestion. This should prevent from ingesting bogus field names. This also should prevent from too big columnsHeader blocks, which could negatively affect search query performance, since columnsHeader is read on every scan of the corresponding data block. - It limits the maximum length of const column value to 256. Longer values are stored in an ordinary columns. This helps limiting the size of columnsHeader blocks and improving search query performance by avoiding reading too long const columns on every scan of the corresponding data block. - It deduplicates columns with identical names during data ingestion and background merging. Previously it was possible to pass columns with duplicate names to block.mustInitFromRows(), and they were stored as is in the block. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4969	2023-10-02 19:19:08 +02:00
Roman Khavronenko	74301cdbf5	lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total` (#5074 ) * lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total` add metric `vm_promscrape_scrapes_skipped_total`to show whether vmagent skips the scrapes. This could happen if vmagent is overloaded or target is responding too slow for configured `scrape_interval`. The follow-up commit should add a corresponding alerting rule and panel to vmagent dashboard. Signed-off-by: hagen1778 <roman@victoriametrics.com> * deployment/docker: add `TooManyScrapeSkips` alerting rule for vmagent Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards: add panels `Scrape duration 0.99 quantile` and `Skipped scrapes` to vmagent dashboard Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-02 17:12:12 +02:00
Aliaksandr Valialkin	7b33a27874	lib/logstorage: follow-up for `8a23d08c21` - Compare the actual free disk space to the value provided via -storage.minFreeDiskSpaceBytes directly inside the Storage.IsReadOnly(). This should work fast in most cases. This simplifies the logic at lib/storage. - Do not take into account -storage.minFreeDiskSpaceBytes during background merges, since it results in uncontrolled growth of small parts when the free disk space approaches -storage.minFreeDiskSpaceBytes. The background merge logic uses another mechanism for determining whether there is enough disk space for the merge - it reserves the needed disk space before the merge and releases it after the merge. This prevents from out of disk space errors during background merge. - Properly handle corner cases for flushing in-memory data to disk when the storage enters read-only mode. This is better than losing the in-memory data. - Return back Storage.MustAddRows() instead of Storage.AddRows(), since the only case when AddRows() can return error is when the storage is in read-only mode. This case must be handled by the caller by calling Storage.IsReadOnly() before adding rows to the storage. This simplifies the code a bit, since the caller of Storage.MustAddRows() shouldn't handle errors returned by Storage.AddRows(). - Properly store parsed logs to Storage if parts of the request contain invalid log lines. Previously the parsed logs could be lost in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4945	2023-10-02 16:52:23 +02:00
Aliaksandr Valialkin	10d9214980	lib/logstorage: run up to GOMAXPROCS flushers of old in-memory parts to disk One flusher isn't enough under high data ingestion rate. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775	2023-10-02 16:20:59 +02:00

2236 commits