github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	74affa3aec	lib/protoparser/graphite: follow-up for `476faf5578` - Clarify the description of the -graphite.sanitizeMetricName command-line flag at README.md - Do not sanitize tag values - only metric names and tag names must be sanitized, since they are treated specially by Grafana. Grafana doesn't apply any restrictions on tag values. - Properly replace more than two consecutive dots with a single dot. - Disallow unicode letters in metric names and tag names, since neither Prometheus nor Grafana supports them. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6489 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6077	2024-07-17 12:41:55 +02:00
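The name sanitization described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from `lib/protoparser/graphite`. The helper name `sanitizeGraphiteName` and the allowed character set `[a-zA-Z0-9_:.]` are hypothetical. Tag values would bypass the helper entirely, because only metric names and tag names are sanitized.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed set of allowed characters: ASCII letters, digits, '_', ':' and '.'.
// Everything else, including unicode letters, is replaced with '_'.
var unsupportedChars = regexp.MustCompile(`[^a-zA-Z0-9_:.]`)

// Two or more consecutive dots collapse into a single dot.
var repeatedDots = regexp.MustCompile(`\.{2,}`)

// sanitizeGraphiteName is a hypothetical helper for metric names and tag names.
// Tag values are intentionally not passed through it.
func sanitizeGraphiteName(s string) string {
	s = unsupportedChars.ReplaceAllLiteralString(s, "_")
	return repeatedDots.ReplaceAllLiteralString(s, ".")
}

func main() {
	fmt.Println(sanitizeGraphiteName("foo...bar baz")) // foo.bar_baz
	fmt.Println(sanitizeGraphiteName("héllo.wörld"))   // h_llo.w_rld
}
```

The character class is matched per rune, so a multi-byte letter such as `é` becomes a single `_`.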
Aliaksandr Valialkin	58a757cd01	lib: consistently use regexp.Regexp.ReplaceAllLiteralString instead of regexp.Regexp.ReplaceAllString in places where the replacement cannot contain matching group placeholders	2024-07-17 12:41:54 +02:00
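The reason to prefer `ReplaceAllLiteralString` becomes clear as soon as a replacement string contains `$`. Below is a short standard-library illustration. The pattern and inputs are arbitrary examples and are not taken from the VictoriaMetrics code.

```go
package main

import (
	"fmt"
	"regexp"
)

// word captures a run of word characters into group 1.
var word = regexp.MustCompile(`(\w+)`)

func main() {
	// ReplaceAllString expands $1 into the text of the first capture group.
	fmt.Println(word.ReplaceAllString("foo", "<$1>")) // <foo>

	// ReplaceAllLiteralString inserts the replacement verbatim.
	fmt.Println(word.ReplaceAllLiteralString("foo", "<$1>")) // <$1>

	// A "$" meant literally is silently lost with ReplaceAllString:
	// $price refers to a non-existent group named "price" and expands to "".
	fmt.Printf("%q\n", word.ReplaceAllString("foo", "$price"))        // ""
	fmt.Printf("%q\n", word.ReplaceAllLiteralString("foo", "$price")) // "$price"
}
```

`ReplaceAllString` parses `$price` as a reference to a capture group named `price`. That group does not exist, so it expands to an empty string. `ReplaceAllLiteralString` inserts the text as is.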
rtm0	bdc0e688e8	Fix inconsistent error handling in Storage.AddRows() (#6583) ### Describe Your Changes `Storage.AddRows()` returns an error only in one case: when `Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But the same error is treated as a warning when it happens inside `Storage.add()` or is returned by `Storage.prefillNextIndexDB()`. This commit fixes this inconsistency by treating the error returned by `Storage.updatePerDateData()` as a warning as well. As a result, `Storage.add()` no longer needs a return value, and neither does `Storage.AddRows()`. Additionally, this commit adds a unit test that checks all cases that result in a row not being added to the storage. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-07-17 12:07:14 +02:00
Aliaksandr Valialkin	c1e32f4517	lib/promrelabel: add test for IfExpression.String() function While at it, simplify this function a bit after the commit `861852f262` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462	2024-07-16 18:31:05 +02:00
Aliaksandr Valialkin	4304950391	lib/promscrape/discovery/yandexcloud: follow-up for `070abe5c71` - Obtain IAM token via GCE-like API instead of Amazon EC2 IMDSv2 API, since it looks like IMDSv2 API isn't supported by Yandex Cloud according to https://yandex.cloud/en/docs/security/standard/authentication#aws-token : > So far, Yandex Cloud does not support version 2, so it is strongly recommended > to technically disable getting a service account token via the Amazon EC2 metadata service. - Try obtaining IAM token via GCE-like API first and then fall back to the deprecated Amazon EC2 IMDSv1. This should prevent auth errors for instances with disabled GCE-like auth API. This addresses @ITD27M01's concern at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513#issuecomment-1867794884 - Make the description of the change at docs/CHANGELOG.md clearer and add a reference to the related issue. P.S. This change wasn't tested in prod because I have no access to Yandex Cloud. It is recommended that @ITD27M01 and @vmazgo, who filed the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513 , test this change. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6524	2024-07-16 17:58:40 +02:00
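The token-fetching order described above could look roughly like the sketch below. It is not the `yandexcloud` discovery code. The metadata paths, the `Metadata-Flavor: Google` header, the JSON field names `access_token` and `Token`, and all function names are assumptions that should be verified against the Yandex Cloud documentation. The base URL is a parameter, so the sketch can run against a local fake server instead of `169.254.169.254`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Assumed metadata paths; check them against the Yandex Cloud docs before use.
const (
	gceTokenPath = "/computeMetadata/v1/instance/service-accounts/default/token"
	ec2TokenPath = "/latest/meta-data/iam/security-credentials/default/"
)

// tokenResponse accepts both assumed field names: access_token (GCE-like) and Token (EC2-like).
type tokenResponse struct {
	AccessToken string `json:"access_token"`
	Token       string `json:"Token"`
}

// fetchToken sends a GET request to url with the given headers and extracts the token.
func fetchToken(client *http.Client, url string, headers map[string]string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return "", fmt.Errorf("unexpected status %d from %s: %q", resp.StatusCode, url, body)
	}
	var tr tokenResponse
	if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil {
		return "", err
	}
	if tr.AccessToken != "" {
		return tr.AccessToken, nil
	}
	if tr.Token != "" {
		return tr.Token, nil
	}
	return "", fmt.Errorf("no token in response from %s", url)
}

// getIAMToken tries the GCE-like API first and falls back to the
// EC2 IMDSv1-style API (a plain GET without a session token).
func getIAMToken(client *http.Client, baseURL string) (string, error) {
	tok, gceErr := fetchToken(client, baseURL+gceTokenPath, map[string]string{"Metadata-Flavor": "Google"})
	if gceErr == nil {
		return tok, nil
	}
	tok, ec2Err := fetchToken(client, baseURL+ec2TokenPath, nil)
	if ec2Err != nil {
		return "", fmt.Errorf("GCE-like API: %v; EC2 IMDSv1 API: %v", gceErr, ec2Err)
	}
	return tok, nil
}

// newFakeMetadataServer emulates a metadata service; gceEnabled toggles the GCE-like API.
func newFakeMetadataServer(gceEnabled bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch {
		case r.URL.Path == gceTokenPath && gceEnabled && r.Header.Get("Metadata-Flavor") == "Google":
			io.WriteString(w, `{"access_token":"gce-token","expires_in":3600}`)
		case r.URL.Path == ec2TokenPath:
			io.WriteString(w, `{"Token":"ec2-token"}`)
		default:
			http.NotFound(w, r)
		}
	}))
}

func main() {
	srv := newFakeMetadataServer(false) // GCE-like API disabled => fallback is used
	defer srv.Close()
	tok, err := getIAMToken(srv.Client(), srv.URL)
	fmt.Println(tok, err) // ec2-token <nil>
}
```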
Aliaksandr Valialkin	57000f5105	lib/promscrape: follow-up for `1e83598be3` - Clarify that the -promscrape.maxScrapeSize value is used for limiting the maximum scrape size if the max_scrape_size option isn't set at https://docs.victoriametrics.com/sd_configs/#scrape_configs - Fix the query example for the scrape_response_size_bytes metric at https://docs.victoriametrics.com/vmagent/#automatically-generated-metrics - Mention the max_scrape_size option in the -help description for the -promscrape.maxScrapeSize command-line flag - Treat a zero value for the max_scrape_size option as 'no scrape size limit' - Change float64 to int type for scrapeResponseSize struct fields and function args, since response size cannot be fractional - Optimize the isAutoMetric() function a bit - Sort auto metrics in alphabetical order in the isAutoMetric() and scrapeWork.addAutoMetrics() functions for better maintainability in the future Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6434 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429	2024-07-16 12:38:21 +02:00
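The sketch below shows how the per-job option and the command-line flag could combine. It assumes that a `nil` pointer represents an absent `max_scrape_size` option and that `0` means no limit. The function names are hypothetical, and the real config parsing in `lib/promscrape` differs. Sizes are `int`, matching the switch away from `float64` described above.

```go
package main

import "fmt"

// effectiveMaxScrapeSize returns the response size limit in bytes for a scrape job.
// perJob is nil when max_scrape_size is absent from the scrape config (assumed representation).
func effectiveMaxScrapeSize(perJob *int, flagValue int) int {
	if perJob == nil {
		return flagValue // fall back to -promscrape.maxScrapeSize
	}
	return *perJob // an explicit 0 means "no scrape size limit"
}

// exceedsLimit reports whether a response of size bytes must be rejected.
// In this sketch a zero limit disables the check.
func exceedsLimit(size, limit int) bool {
	return limit > 0 && size > limit
}

func main() {
	zero := 0
	fmt.Println(effectiveMaxScrapeSize(nil, 16<<20))   // 16777216
	fmt.Println(effectiveMaxScrapeSize(&zero, 16<<20)) // 0
	fmt.Println(exceedsLimit(1<<30, 0))                // false
}
```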
Aliaksandr Valialkin	7a3394bbe1	Revert "lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451)" This reverts commit `cd1aca217c`. Reason for revert: this commit makes no sense, since the firehose response has the application/json content-type, so it must contain JSON-encoded timestamp and requestId fields according to https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#responseformat . HTML-escaping the requestId field may break the response, so the client may fail to recognize the HTML-escaped requestId. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6451	2024-07-16 09:49:19 +02:00
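The reasoning behind the revert can be illustrated with `encoding/json`. JSON encoding alone already produces a safe body for an `application/json` response, and the client gets back exactly the original `requestId`. Applying HTML escaping first changes the decoded value. The struct mirrors the `requestId` and `timestamp` fields from the AWS response-format page cited above. The helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
	"time"
)

// firehoseResponse mirrors the response body fields: requestId and a timestamp in milliseconds.
type firehoseResponse struct {
	RequestID string `json:"requestId"`
	Timestamp int64  `json:"timestamp"`
}

// buildFirehoseResponse JSON-encodes the response; no extra HTML escaping is applied.
func buildFirehoseResponse(requestID string, timestampMillis int64) ([]byte, error) {
	return json.Marshal(firehoseResponse{RequestID: requestID, Timestamp: timestampMillis})
}

// decodeRequestID shows what a JSON client sees after decoding the body.
func decodeRequestID(body []byte) (string, error) {
	var r firehoseResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	return r.RequestID, nil
}

func main() {
	id := `req-<1>&"2"`
	body, _ := buildFirehoseResponse(id, time.Now().UnixMilli())
	got, _ := decodeRequestID(body)
	fmt.Println(got == id) // true: the client recovers the original requestId

	// Pre-escaping with HTML rules changes what the client decodes.
	body, _ = buildFirehoseResponse(html.EscapeString(id), time.Now().UnixMilli())
	got, _ = decodeRequestID(body)
	fmt.Println(got) // req-&lt;1&gt;&amp;&#34;2&#34;
}
```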
Aliaksandr Valialkin	233e5f0a9e	lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag() This is a follow-up for `61dce6f2a1` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329	2024-07-16 01:00:45 +02:00
Aliaksandr Valialkin	784327ea30	lib/uint64set: optimize Set.Has() for nil Set - it should be inlined now. This makes the checkDeleted variable at lib/storage/index_db.go unnecessary. This is a follow-up for `b984f4672e` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6342	2024-07-15 23:59:20 +02:00
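Below is a simplified sketch of a nil-safe `Has` method. The `Set` type here is a hypothetical map-based stand-in; the real `lib/uint64set.Set` uses a different internal layout. Checking the nil receiver inside the method lets callers pass a nil `*Set` when there is nothing to check, instead of guarding the call with a separate boolean such as `checkDeleted`. You can check whether the Go compiler inlines a given method with `go build -gcflags=-m`.

```go
package main

import "fmt"

// Set is a hypothetical, map-based stand-in for lib/uint64set.Set.
type Set struct {
	m map[uint64]struct{}
}

// Add inserts x, allocating the map lazily.
func (s *Set) Add(x uint64) {
	if s.m == nil {
		s.m = make(map[uint64]struct{})
	}
	s.m[x] = struct{}{}
}

// Has reports whether x is in s. A nil *Set is treated as an empty set,
// so callers don't need an extra flag to skip the lookup.
func (s *Set) Has(x uint64) bool {
	if s == nil {
		return false
	}
	_, ok := s.m[x]
	return ok
}

func main() {
	var empty *Set // nil set: no allocation needed
	s := &Set{}
	s.Add(7)
	fmt.Println(empty.Has(7), s.Has(7)) // false true
}
```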
Aliaksandr Valialkin	832e088659	lib/mergeset: properly update TableMetrics.TooLongItemsDroppedTotal inside Table.UpdateMetrics Substitute '+=' with '=', since tooLongItemsTotal is a global counter, which doesn't belong to the Table struct. This is a follow-up for `69d244e6fb` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6297	2024-07-15 23:39:10 +02:00
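Assignment is correct here because the counter is global. Each `UpdateMetrics` call should copy the current value instead of adding to it. If several calls (for example, for several tables) fill the same `TableMetrics`, `+=` would count the global total more than once. Below is a minimal sketch with simplified types. Only the names `TableMetrics`, `TooLongItemsDroppedTotal`, `UpdateMetrics` and `tooLongItemsTotal` come from the commit message; the rest is hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tooLongItemsTotal is a process-wide counter shared by every Table (simplified stand-in).
var tooLongItemsTotal atomic.Uint64

type TableMetrics struct {
	TooLongItemsDroppedTotal uint64
}

type Table struct{}

// UpdateMetrics copies the global counter into m with '='.
// With '+=', every call would add the whole global total again.
func (tb *Table) UpdateMetrics(m *TableMetrics) {
	m.TooLongItemsDroppedTotal = tooLongItemsTotal.Load()
}

func main() {
	tooLongItemsTotal.Add(3)
	var m TableMetrics
	tb := &Table{}
	tb.UpdateMetrics(&m)
	tb.UpdateMetrics(&m)
	fmt.Println(m.TooLongItemsDroppedTotal) // 3
}
```

With `+=`, the two calls above would report 6 instead of 3.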
Aliaksandr Valialkin	a468a6e985	lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc - Rename GetStatDialFunc to NewStatDialFunc, since it returns a new function on every call - NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil - Simplify the implementation of NewStatDialFunc by removing sync.Map from there. - Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils - Use gauge instead of counter type for the *_conns metric This is a follow-up for `d7b5062917` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299	2024-07-15 23:02:34 +02:00
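Below is a sketch of a stat-collecting dial wrapper in the spirit of `NewStatDialFunc`. Each call returns a new function. The `dials` value only grows, so it is a counter. The `conns` value tracks currently open connections and must go down on `Close`, which is why it is a gauge. All names and the `connStats` struct are hypothetical. `net.Pipe` stands in for a real network dial, so the example runs offline.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sync"
	"sync/atomic"
)

// dialFunc matches the signature of net.Dialer.DialContext.
type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// connStats holds hypothetical metrics: dials is a counter, conns is a gauge.
type connStats struct {
	dials atomic.Int64 // total dial attempts; only grows
	conns atomic.Int64 // currently open connections; goes up and down
}

// statConn decrements the conns gauge exactly once, even if Close is called twice.
type statConn struct {
	net.Conn
	stats *connStats
	once  sync.Once
}

func (c *statConn) Close() error {
	c.once.Do(func() { c.stats.conns.Add(-1) })
	return c.Conn.Close()
}

// newStatDialFunc returns a new dial function on every call; it wraps dial and updates stats.
func newStatDialFunc(dial dialFunc, stats *connStats) dialFunc {
	return func(ctx context.Context, network, addr string) (net.Conn, error) {
		stats.dials.Add(1)
		conn, err := dial(ctx, network, addr)
		if err != nil {
			return nil, err
		}
		stats.conns.Add(1)
		return &statConn{Conn: conn, stats: stats}, nil
	}
}

// pipeDial is an in-memory stand-in for a real dialer.
func pipeDial(ctx context.Context, network, addr string) (net.Conn, error) {
	client, server := net.Pipe()
	server.Close()
	return client, nil
}

// dialAndClose dials once, closes the connection twice and reports the gauge after each step.
func dialAndClose(stats *connStats) (afterDial, afterClose int64, err error) {
	dial := newStatDialFunc(pipeDial, stats)
	conn, err := dial(context.Background(), "tcp", "example.invalid:80")
	if err != nil {
		return 0, 0, err
	}
	afterDial = stats.conns.Load()
	conn.Close()
	conn.Close() // second Close must not decrement the gauge again
	return afterDial, stats.conns.Load(), nil
}

func main() {
	var stats connStats
	afterDial, afterClose, err := dialAndClose(&stats)
	fmt.Println(stats.dials.Load(), afterDial, afterClose, err) // 1 1 0 <nil>
}
```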
Aliaksandr Valialkin	ad367c17bf	lib/streamaggr/streamaggr.go: typo fix after `5e29ef5ed5`: IgnoredNaNSamples -> ignoredNaNSamples	2024-07-15 21:58:56 +02:00
Aliaksandr Valialkin	db557b86ee	app/vmagent/remotewrite: follow-up for `f153f54d11` - Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go . This improves code maintainability a bit. - Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs(). This prevents potential resource leaks. - Use separate functions for initializing and reloading global stream aggregation and per-remoteWrite.url stream aggregation. This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions. - Add the ability to specify the `name` option in every stream aggregation config. This option is used as the `name` label in metrics exposed by stream aggregation at the /metrics page. This simplifies investigation of the exposed metrics. - Add the `path` label in addition to the `name`, `url` and `position` labels to metrics exposed by streaming aggregation. This label should simplify investigation of the exposed metrics. - Remove the `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability: it is hard to use these labels in query filters and aggregation functions. - Rename the metric `vm_streamaggr_flushed_samples_total` to the less misleading `vm_streamaggr_output_samples_total` . This metric shows the number of samples generated by the corresponding streaming aggregation rule. This metric was added in the commit `861852f262` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice. This metric was added in the commit `861852f262` .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove the Alias and aggrID fields from the streamaggr.Options struct, since these fields aren't optional params that could modify the behaviour of the constructed streaming aggregator. Convert the Alias field to a regular argument passed to the LoadFromFile() function, since this argument is mandatory. - Pass the Options arg to the LoadFromFile() function by reference, since this structure is quite big. This also allows passing nil instead of Options when the default options are enough. - Add the `name`, `path`, `url` and `position` labels to the `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics, so their label set is consistent with the rest of the streaming aggregation metrics. - Convert the aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding `aggrState` plus all the related metrics (currently only the `vm_streamaggr_output_samples_total` metric is exposed, with the corresponding `output` label for each configured output function). This simplifies and speeds up the code responsible for updating per-output metrics. This is a follow-up for the commit `2eb1bc4f81` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604 - Add the missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users figure out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for an unknown reason in the commit `2eb1bc4f81` . - Fix the incorrect update of the `vm_streamaggr_output_samples_total` metric in the flushCtx.appendSeriesWithExtraLabel() function. While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268	2024-07-15 20:24:01 +02:00
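The 10K-per-flush limit from the last item can be sketched as a buffer that is pushed and reused whenever it fills up. This bounds memory to one batch, regardless of how many output samples an interval produces. The output counter is incremented by the number of samples actually pushed. The `batchFlusher` type and its fields are hypothetical and do not reproduce `flushCtx`.

```go
package main

import "fmt"

type sample struct {
	name  string
	value float64
}

// maxSamplesPerFlush is the 10K limit mentioned in the commit message.
const maxSamplesPerFlush = 10_000

// batchFlusher buffers output samples and pushes them in batches of at most limit samples.
type batchFlusher struct {
	limit         int
	buf           []sample
	push          func([]sample) // must not retain the slice: it is reused after the call
	outputSamples int            // stand-in for an output-samples counter
}

func (bf *batchFlusher) append(s sample) {
	bf.buf = append(bf.buf, s)
	if len(bf.buf) >= bf.limit {
		bf.flush()
	}
}

// flush pushes the buffered samples and counts exactly the samples pushed.
func (bf *batchFlusher) flush() {
	if len(bf.buf) == 0 {
		return
	}
	bf.push(bf.buf)
	bf.outputSamples += len(bf.buf)
	bf.buf = bf.buf[:0]
}

func main() {
	batches := 0
	bf := &batchFlusher{limit: maxSamplesPerFlush, push: func(b []sample) { batches++ }}
	for i := 0; i < 25_000; i++ {
		bf.append(sample{name: "m", value: float64(i)})
	}
	bf.flush() // push the final partial batch
	fmt.Println(batches, bf.outputSamples) // 3 25000
}
```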
Aliaksandr Valialkin	202e5704e6	vendor: update github.com/VictoriaMetrics/metrics from v1.34.1 to v1.35.0 Fix potential memory leaks across the VictoriaMetrics codebase after a metrics.UnregisterSet(s) call because of a missing s.UnregisterAllMetrics() call. This is a follow-up for `6a6e34ab8e` . It is OK if some vmauth metrics aren't visible for a few microseconds when the previous metrics are unregistered and the new metrics haven't been registered yet. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6252 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5805	2024-07-15 10:43:37 +02:00
Aliaksandr Valialkin	c995ccad93	lib/{storage,mergeset}: do not allow setting dataFlushInterval to values smaller than pending{Items,Rows}FlushInterval Pending rows and items unconditionally remain in memory for up to pending{Items,Rows}FlushInterval, so it makes no sense to set dataFlushInterval (the interval for guaranteed flush of in-memory data to disk) to values smaller than pending{Items,Rows}FlushInterval, since this doesn't affect the interval for flushing pending rows and items from memory to disk. This is a follow-up for `4c80b17027` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6221	2024-07-15 10:08:15 +02:00
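The reasoning above amounts to a lower bound on `dataFlushInterval`. The minimal sketch below assumes the value is clamped rather than rejected; the commit message does not say which. It also uses an assumed 2s value for the pending-rows interval.

```go
package main

import (
	"fmt"
	"time"
)

// pendingRowsFlushInterval is an assumed value chosen for illustration.
const pendingRowsFlushInterval = 2 * time.Second

// effectiveDataFlushInterval raises d to pending: pending rows stay in memory
// for up to pending anyway, so a smaller d cannot flush them any sooner.
func effectiveDataFlushInterval(d, pending time.Duration) time.Duration {
	if d < pending {
		return pending
	}
	return d
}

func main() {
	fmt.Println(effectiveDataFlushInterval(500*time.Millisecond, pendingRowsFlushInterval)) // 2s
	fmt.Println(effectiveDataFlushInterval(5*time.Second, pendingRowsFlushInterval))        // 5s
}
```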
Aliaksandr Valialkin	48ec66883a	lib/streamaggr: consistently use alphabetical order of benchmarked stream aggregation outputs	2024-07-15 09:53:19 +02:00
Aliaksandr Valialkin	5354374b62	lib/streamaggr: follow-up for `9c3d44c8c9` - Consistently enumerate stream aggregation outputs in alphabetical order across the source code and docs. This should simplify future maintenance of the corresponding code and docs. - Fix the link to `rate_sum()` in the `see also` section of the `rate_avg()` docs. - Make the docs for the `rate_sum()` and `rate_avg()` outputs clearer. - Encapsulate the output metric suffix inside rateAggrState. This eliminates possible bugs related to incorrect suffix passing to newRateAggrState(). - Rename the rateAggrState.total field to the less misleading rateAggrState.increase name, since it calculates the counter increase in the current aggregation window. - Set rateLastValueState.prevTimestamp on the first sample in a time series instead of the second sample. This makes the code logic clearer. - Move the code for removing outdated entries at rateAggrState into the removeOldEntries() function. This makes the code logic inside rateAggrState.flushState() clearer. - Do not write an output sample with zero value if there are no input series that could be used for calculating the rate, e.g. if only a single sample is registered for every input series. - Do not take into account input series with a single registered sample when calculating rate_avg(), since this leads to incorrect results. - Move the {rate,total}AggrState.flushState() functions to the end of the rate.go and total.go files, so they look more similar. This should simplify future maintenance. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6243	2024-07-15 08:40:09 +02:00
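Below is a sketch of the `rate_sum()` and `rate_avg()` semantics described above. Per-series rates are computed from counter increases. Series with a single sample are skipped, and no output is produced when nothing qualifies. This is not `rateAggrState`. The types, the counter-reset handling and the millisecond timestamps are assumptions made for illustration.

```go
package main

import "fmt"

// point is one counter sample: timestamp in milliseconds and value.
type point struct {
	ts    int64
	value float64
}

// seriesRate returns the per-second increase of one input series inside the window.
// ok is false for series with fewer than two samples.
func seriesRate(pts []point) (rate float64, ok bool) {
	if len(pts) < 2 {
		return 0, false
	}
	increase := 0.0
	for i := 1; i < len(pts); i++ {
		d := pts[i].value - pts[i-1].value
		if d < 0 {
			d = pts[i].value // counter reset: assume it restarted from zero
		}
		increase += d
	}
	seconds := float64(pts[len(pts)-1].ts-pts[0].ts) / 1000
	if seconds <= 0 {
		return 0, false
	}
	return increase / seconds, true
}

// rateSumAvg returns the sum and the average of per-series rates.
// ok is false when no series has enough samples, so no zero-valued output is written.
func rateSumAvg(series [][]point) (sum, avg float64, ok bool) {
	n := 0
	for _, pts := range series {
		r, good := seriesRate(pts)
		if !good {
			continue // single-sample series would skew rate_avg
		}
		sum += r
		n++
	}
	if n == 0 {
		return 0, 0, false
	}
	return sum, sum / float64(n), true
}

func main() {
	series := [][]point{
		{{0, 10}, {10_000, 30}}, // 20 over 10s => 2/s
		{{0, 0}, {10_000, 10}},  // 10 over 10s => 1/s
		{{0, 5}},                // single sample: ignored
	}
	fmt.Println(rateSumAvg(series)) // 3 1.5 true
}
```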
Aliaksandr Valialkin	0145b65f25	app/vmagent/remotewrite: follow-up for `87fd400dfc` - Drop samples and return true from remotewrite.TryPush() at the fast path when all the remote storage systems are configured with the disabled on-disk queue, every in-memory queue is full and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common, so it should be optimized. Previously, additional CPU time was spent on per-remoteWriteCtx relabeling and other processing in this case. - Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped(). Previously, dropped samples were counted only if the -remoteWrite.dropSamplesOnOverload flag was set. In reality, the samples are dropped when they couldn't be sent to the queue because the in-memory queue is full and the on-disk queue is disabled. The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage processes pending data, so it drops aggregated samples in this case. - Clarify the description of the -remoteWrite.disableOnDiskQueue command-line flag in the -help output, so it is clear that this flag can be set individually per -remoteWrite.url. - Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems are configured with the disabled on-disk queue, then it makes no sense to keep samples on some of these systems while dropping samples on the remaining systems, since this will result in a global stall on the remote storage system with the disabled on-disk queue and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false from remotewrite.TryPush() in this case. This will result in infinite duplicate samples written to the remaining remote storage systems. That's why -remoteWrite.dropSamplesOnOverload is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
This allows proceeding with newly scraped / pushed samples by sending them to the remaining remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set. - Verify that remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test. - Mention in the vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per -remoteWrite.url. See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065	2024-07-13 02:25:19 +02:00
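The two `-remoteWrite.dropSamplesOnOverload` rules above can be sketched as small functions. The `rwQueue` type and both helpers are hypothetical simplifications. The real `remoteWriteCtx` logic also handles relabeling, streaming aggregation and per-URL queues.

```go
package main

import "fmt"

// effectiveDropSamplesOnOverload forces dropping on when more than one
// -remoteWrite.url has its on-disk queue disabled.
func effectiveDropSamplesOnOverload(disableOnDiskQueue []bool, dropFlag bool) bool {
	n := 0
	for _, disabled := range disableOnDiskQueue {
		if disabled {
			n++
		}
	}
	if n > 1 {
		return true
	}
	return dropFlag
}

// rwQueue is a hypothetical, simplified view of one remote storage queue.
type rwQueue struct {
	diskQueueDisabled bool
	inMemoryFull      bool
	dropped           int
}

// tryPushFastPath drops and counts samples without any relabeling when dropping is
// enabled and every queue is diskless and full. It returns false when the regular
// push path must be taken instead.
func tryPushFastPath(qs []*rwQueue, drop bool, nSamples int) (handled bool) {
	if !drop {
		return false
	}
	for _, q := range qs {
		if !q.diskQueueDisabled || !q.inMemoryFull {
			return false
		}
	}
	for _, q := range qs {
		q.dropped += nSamples
	}
	return true // TryPush would report success: the samples were consumed
}

func main() {
	drop := effectiveDropSamplesOnOverload([]bool{true, true}, false)
	qs := []*rwQueue{{diskQueueDisabled: true, inMemoryFull: true}, {diskQueueDisabled: true, inMemoryFull: true}}
	handled := tryPushFastPath(qs, drop, 100)
	fmt.Println(drop, handled, qs[0].dropped) // true true 100
}
```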
Aliaksandr Valialkin	0078399788	app/vmalert: switch from table-driven tests to f-tests This makes the test code clearer and reduces the number of code lines by 500. This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error* requires more boilerplate code, which can result in additional bugs inside tests. While t.Error* allows logging multiple errors in the same test, this doesn't simplify fixing broken tests most of the time. This is a follow-up for `a9525da8a4`	2024-07-12 22:41:11 +02:00
hagen1778	2f65956259	lib/streamaggr: add missing test cases Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-07-12 11:06:45 +02:00
Hui Wang	2eb1bc4f81	vmagent: fix `vm_streamaggr_flushed_samples_total` counter (#6604) We use `vm_streamaggr_flushed_samples_total` to show the number of samples produced by the aggregation rule; previously it was overcounted and didn't account for `output_relabel_configs`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-07-12 10:56:07 +02:00
hagen1778	03e4c5c19c	lib/backup/azremote: follow-up after `5fd3aef549` Simplify tests by converting them to f-tests. `5fd3aef549` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-07-10 13:06:27 +02:00
justinrush	5fd3aef549	lib/backup: add support for Azure Managed Identity (#6518) ### Describe Your Changes These changes support using Azure Managed Identity for the `vmbackup` utility. It adds two new environment variables: * `AZURE_USE_DEFAULT_CREDENTIAL`: Instructs the `vmbackup` utility to build a connection using the [Azure Default Credential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.5.2#NewDefaultAzureCredential) mode. This causes the Azure SDK to check for a variety of environment variables to try and make a connection. By default, it tries to use managed identity if that is set up. This will close https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984 ### Checklist The following checks are mandatory: - [x] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). ### Testing However you normally test the `vmbackup` utility using Azure Blob should continue to work without any changes. The setup for that is environment-specific and not listed here. Once regression testing has been done, you can set up [Azure Managed Identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview) so your resource (AKS, VM, etc.) can use that credential method. Once it is set up, update your environment variables according to the updated documentation. I added unit tests to the `FS.Init` function, then made my changes, then updated the unit tests to capture the new branches. I tested this in our environment with both SAS token auth and managed identity, and it works as expected. --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Justin Rush <jarush@epic.com> Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-07-10 11:52:05 +02:00
Aliaksandr Valialkin	ac06569c49	app/vlinsert/loki: use easyproto for parsing Loki protobuf messages	2024-07-10 03:05:17 +02:00
Aliaksandr Valialkin	aa9bb99527	lib/logstorage: drop all the pipes from the query when calculating the number of matching logs at /select/logsql/hits API	2024-07-10 00:39:28 +02:00
Aliaksandr Valialkin	3c02937a34	all: consistently use 'any' instead of 'interface{}' 'any' type is supported starting from Go1.18. Let's consistently use it instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.	2024-07-10 00:20:37 +02:00
Aliaksandr Valialkin	a9525da8a4	lib: consistently use f-tests instead of table-driven tests This makes these tests easier to read and debug. This also reduces the test line count by 15%, from 3K to 2.5K. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e While at it, consistently use t.Fatal* instead of t.Error*, since t.Error* usually leads to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.	2024-07-09 22:40:50 +02:00
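Below is an f-test in the style referenced above. A local closure `f` takes the inputs and the expected result, and each case is a single `f(...)` call. Because of `t.Helper()`, a failure is reported at the line of the failing `f(...)` call, and `t.Fatalf` stops at the first failure. The function under test, `normalizeName`, is a hypothetical example.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// normalizeName is a hypothetical function under test.
func normalizeName(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// TestNormalizeName shows the f-test pattern: one helper closure, one line per case.
func TestNormalizeName(t *testing.T) {
	f := func(s, resultExpected string) {
		t.Helper() // report failures at the f(...) call site
		result := normalizeName(s)
		if result != resultExpected {
			t.Fatalf("unexpected result for normalizeName(%q); got %q; want %q", s, result, resultExpected)
		}
	}

	f("", "")
	f("Foo", "foo")
	f("  BAR ", "bar")
}

func main() {
	fmt.Println(normalizeName("  Foo ")) // foo
}
```

Compared with a table of structs, each case sits on its own line with no field names to repeat, and a debugger breakpoint can be set on the exact failing case.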
Aliaksandr Valialkin	35b3b95cbc	lib/promscrape/discovery/vultr: follow-up after `17e3d019d2` - Sort the discovered labels in alphabetical order at https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs - Rename VultrConfigs to VultrSDConfigs to be consistent with the naming for other SD configs. - Prepare query arg filters for `list instances API` at newAPIConfig() instead of passing them in a separate listParams struct. This simplifies the code a bit. - Return error when bearer token isn't set at vultr_sd_configs, since this token is mandatory according to https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs - Remove unused fields from the parsed response from Vultr list instances API in order to simplify the code a bit. - Remove double logging of errors inside getInstances() function, since these errors must be already logged by the caller. - Simplify tests, so they are easier to maintain. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6068	2024-07-05 17:40:03 +02:00
Aliaksandr Valialkin	c0caa69939	lib/logstorage: use quicktemplate.AppendJSONString instead of strconv.AppendQuote for encoding JSON strings The strconv.AppendQuote improperly encodes special chars such as \x1b . They must be encoded as \u001b . See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24	2024-07-05 01:22:23 +02:00
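The encoding difference is easy to reproduce with the standard library. `strconv.Quote` produces Go string syntax (`\x1b`), which JSON does not allow, while a JSON encoder emits `\u001b`. Here `encoding/json` stands in for `quicktemplate.AppendJSONString`. The helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// quoteGo uses Go string-literal syntax, which is not always valid JSON.
func quoteGo(s string) string {
	return strconv.Quote(s)
}

// quoteJSON uses encoding/json, which escapes control characters as \u00XX.
func quoteJSON(s string) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // marshaling a plain string does not fail
	}
	return string(b)
}

func isValidJSON(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	s := "\x1b[31mred\x1b[0m" // ANSI color codes, as often seen in logs
	fmt.Println(quoteGo(s), isValidJSON(quoteGo(s)))     // "\x1b[31mred\x1b[0m" false
	fmt.Println(quoteJSON(s), isValidJSON(quoteJSON(s))) // "\u001b[31mred\u001b[0m" true
}
```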
Aliaksandr Valialkin	2da7dfc754	Revert `c6c5a5a186` and `b2765c45d0` Reason for revert: There are many existing statsd servers: - https://github.com/statsd/statsd - classical statsd server - https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DataDog Agent ( https://docs.datadoghq.com/agent/ ) - https://github.com/avito-tech/bioyino - high-performance statsd server - https://github.com/atlassian/gostatsd - statsd server in Go - https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics These servers can be used for efficient aggregation of statsd data and sending it to VictoriaMetrics according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd ( the https://github.com/prometheus/statsd_exporter can be scraped as a usual Prometheus target according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ). Adding support for the statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides significant advantages over the existing statsd servers while having no significant drawbacks compared to them. The main advantage of a statsd server built into VictoriaMetrics and vmagent is getting rid of the additional statsd server. The main drawback is the non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics ( see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers. So you need to manually translate the configs of the used statsd server to stream aggregation configs when migrating from a standalone statsd server to the statsd server built into VictoriaMetrics (or vmagent).
Another important drawback is that it is very easy to shoot yourself in the foot when using the built-in statsd server with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation. In this case the ingested statsd metrics will be stored in VictoriaMetrics as is, without any aggregation. This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed during querying. P.S. A built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out a more ergonomic specialized configuration for aggregating statsd metrics. The main requirements for this configuration: - easy to write, read and update (ideally it should work out of the box for most cases without additional configuration) - hard to misconfigure (e.g. hard to shoot yourself in the foot) It would be great if this configuration were compatible with the configuration of the most widely used statsd server. In the meantime, it is recommended to continue using an external statsd server. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600	2024-07-03 23:51:56 +02:00
Aliaksandr Valialkin	d8c7cc266b	lib/promscrape: use prompbmarshal.MustParsePromMetrics function at parseData() test function The prompbmarshal.MustParsePromMetrics function has been added in the commit `cc4d57d650`	2024-07-03 16:08:13 +02:00
Aliaksandr Valialkin	bb00bae353	Revert "Exemplar support (#5982)" This reverts commit `5a3abfa041`. Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below). Adding support for exemplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature, even though the source code of the reverted commit has excellent quality. See https://docs.victoriametrics.com/goals/ . Issues with Prometheus exemplars: - Prometheus still has only experimental support for exemplars more than three years after they were introduced. It stores exemplars in memory, so they are lost after a Prometheus restart. This doesn't look like a production-ready feature. See `0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)` and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage - It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have a very hard-to-use API for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus . - It looks like exemplars are supported for Histogram metric types only - see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar . Exemplars aren't supported for Counter, Gauge and Summary metric types. - Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query contains the histogram_quantile() function.
It queries exemplars via a special Prometheus API - https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.) and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work in production in most cases when histogram_quantile() is calculated over thousands of histogram buckets exposed by a large number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph. This makes the graph unusable, since it is literally filled with thousands of exemplar dots. Neither the Prometheus API nor Grafana provides the ability to filter out unneeded exemplars. - Exemplars are usually connected to traces. While traces are good for some I doubt exemplars will become production-ready in the near future because of the issues outlined above. Alternative to exemplars: Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs - just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry! This doesn't work as expected in production, as shown above. Are there better solutions, which work in production? Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job` and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry by these labels on the time range with the anomaly on the metrics' graph.	2024-07-03 15:30:21 +02:00
Aliaksandr Valialkin	cc4d57d650	app/vmagent/remotewrite,lib/streamaggr: re-use common code in tests after `879771808b` - Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package. This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries. - Move common code for mustParsePromMetrics() function into lib/prompbmarshal package, so it could be used in tests for building []prompbmarshal.TimeSeries from string. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206	2024-07-03 15:21:36 +02:00
Aliaksandr Valialkin	f17b408643	lib/streamaggr: follow-up for the commit `c0e4ccb7b5` - Clarify the docs for the `Ignore aggregation intervals on start` feature. - Make the code dealing with ignoreFirstIntervals in the aggregator.runFlusher() function clearer. From a readability and maintainability PoV, it is better to use distinct a.flush() calls for distinct cases instead of merging them into a single a.flush() call. - Take into account the first incomplete interval when tracking the number of skipped aggregation intervals, since this behaviour is easier for end users to understand. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6137	2024-07-02 21:24:50 +02:00
Andrii Chubatiuk	476faf5578	lib/protoparser/graphite: added -graphite.sanitizeMetricName flag (#6489) ### Describe Your Changes Added a flag to sanitize Graphite metrics. Fixes #6077 ### Checklist The following checks are mandatory: - [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-07-02 14:56:41 +02:00
Aliaksandr Valialkin	3b6c78c26c	lib/logstorage: allow writing `after N` in front of `before N` at `stream_context` pipe	2024-07-02 01:38:20 +02:00
Andrii Chubatiuk	861852f262	lib/streamaggr: added stale samples metric, added metrics labels (#6462) ### Describe Your Changes - added stale metrics counters for input and output samples - added labels for aggregator metrics => `name="{rwctx}:{aggrId}:{aggrSuffix}"` - rwctx - global or number starting from 1 - aggrId - aggregator id starting from 1 - aggrSuffix - <interval>_(by|without)_label1_label2_labeln e.g: `name="global:1:1m_without_instance_pod"` ### Checklist The following checks are mandatory: - [ ] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-07-01 14:56:17 +02:00
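The label format from this commit can be reproduced with a small helper. The function name and parameters are hypothetical. The later follow-up `db557b86ee` in this log reworked these labels: it added a configurable `name` option and a `path` label alongside `url` and `position`.

```go
package main

import (
	"fmt"
	"strings"
)

// aggrNameLabel builds "{rwctx}:{aggrId}:{aggrSuffix}", where aggrSuffix is
// "<interval>_(by|without)_label1_..._labelN".
func aggrNameLabel(rwctx string, aggrID int, interval string, without bool, labels []string) string {
	mode := "by"
	if without {
		mode = "without"
	}
	parts := append([]string{interval, mode}, labels...)
	return fmt.Sprintf("%s:%d:%s", rwctx, aggrID, strings.Join(parts, "_"))
}

func main() {
	fmt.Println(aggrNameLabel("global", 1, "1m", true, []string{"instance", "pod"}))
	// global:1:1m_without_instance_pod
}
```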
Aliaksandr Valialkin	6bb66cb3e9	lib/logstorage: properly search for the surrounding logs in `stream_context` pipe The set of log fields in the found logs may differ from the set of log fields present in the log stream. So compare only the log fields in the found logs when searching for the matching log entry in the log stream. While at it, return _stream field in the delimiter log entry, since this field is used by VictoriaLogs Web UI for grouping logs by log streams.	2024-07-01 02:29:50 +02:00
Aliaksandr Valialkin	bb0deb7ac4	lib/logstorage: add ability to store sorted log position into a separate field with `sort ... rank <fieldName>` syntax	2024-07-01 01:44:17 +02:00
Aliaksandr Valialkin	dc291d8980	lib/logstorage: add delimiter between log chunks returned from `| stream_context` pipe	2024-07-01 01:30:37 +02:00
Aliaksandr Valialkin	d4ca651547	lib/logstorage: add `stream_context` pipe, which allows selecting surrounding logs for the matching logs	2024-06-28 19:14:29 +02:00
Aliaksandr Valialkin	0730f1324d	lib/logstorage: it is safe to use `| unroll` pipe in live tailing. The `| unroll` pipe can make multiple copies of rows from the input row. This doesn't break live tailing, so allow `| unroll` pipe in live tailing.	2024-06-27 19:44:57 +02:00
Aliaksandr Valialkin	7c8c040502	app/vlselect: properly return live tailing results	2024-06-27 15:05:57 +02:00
Aliaksandr Valialkin	87f1c8bd6c	lib/logstorage: work-in-progress	2024-06-27 14:20:43 +02:00
Andrii Chubatiuk	070abe5c71	added IMDSv2 for YC SD (#6524 ) ### Describe Your Changes Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-06-26 18:03:21 +02:00
rtm0	a42bd59ee4	Fix Date metricid cache consistency under concurrent use (#6534 ) ### Describe Your Changes Fix Date metricid cache consistency under concurrent use. When one goroutine calls Has() and does not find the cache entry in the immutable map, it acquires a lock and checks the mutable map. It is possible that, before that lock is acquired, another goroutine moves the entry from the mutable map to the immutable map, causing a cache miss. The fix is to check the immutable map again once the lock is acquired. ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-06-26 17:33:38 +02:00
Aliaksandr Valialkin	dff5008392	app/vlstorage: add -retention.maxDiskSpaceUsageBytes command-line flag for limiting the retention at VictoriaLogs by disk space usage	2024-06-25 17:30:33 +02:00
Aliaksandr Valialkin	3eacd43fff	lib/logstorage: parse syslog structured data into separate fields in order to simplify further querying of this data	2024-06-25 14:53:39 +02:00
Aliaksandr Valialkin	9e1c037249	lib/logstorage: properly parse timezone offset at TryParseTimestampRFC3339Nano() The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets. While at it, make tryParseTimestampISO8601 function private in order to prevent improper usage of this function from outside the lib/logstorage package. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508	2024-06-25 14:53:38 +02:00
Aliaksandr Valialkin	7252c5d258	lib/logstorage: make golangci-lint happy	2024-06-25 03:04:21 +02:00
Aliaksandr Valialkin	82d639411d	lib/httpserver: revert `9b7e532172` Reason for revert: this commit doesn't resolve real security issues, while it complicates the resulting code in subtle ways (aka security circus). Comparison of two strings (passwords, auth keys) takes a few nanoseconds. This comparison is performed in a non-trivial HTTP handler, which takes thousands of nanoseconds, and the request handler timing is non-deterministic because of Go runtime, Go GC and other concurrently executed goroutines. The request handler timing is even more non-deterministic when the application is executed in shared environments such as Kubernetes, where many other applications may run on the same host and use shared resources of this host (CPU, RAM bandwidth, network bandwidth). Additionally, it is expected that the passwords and auth keys are passed via TLS-encrypted connections. Establishing TLS connections takes additional non-trivial time (millions of nanoseconds), which depends on many factors such as network latency, network congestion, etc. This makes it impossible to conduct a timing attack on passwords and auth keys in VictoriaMetrics components. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6423/files Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392	2024-06-25 01:36:12 +02:00
Aliaksandr Valialkin	de7450b7e0	lib/logstorage: work-in-progress	2024-06-24 23:27:12 +02:00
Andrii Chubatiuk	1e83598be3	app/vmagent: add max_scrape_size to scrape config (#6434 ) Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-20 13:58:42 +02:00
Slava Bobik	d236604d39	Fixed a typo in the FastQueue mutex comment (#6514 ) ### Describe Your Changes Fixed a small typo in a comment about the mutex inside the FastQueue struct ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-06-20 02:30:36 -07:00
Aliaksandr Valialkin	7229dd8c33	lib/logstorage: work-in-progress	2024-06-20 03:10:08 +02:00
Zakhar Bessarab	201fd6de1e	lib/fs/fscore: do not trim content from path (#6503 ) ### Describe Your Changes Trimming content which is loaded from an external path leads to obscure issues when user-defined input contains trimmed chars. For example, a user-defined password "foo\n" becomes "foo", while the user expects it to contain a new line. --- For example, a user defines a password which ends with `\n`. This often happens when the user relies on Kubernetes secrets and manually encodes the value as a base64-encoded string. In this case, vmauth configuration might look like: ``` users: - url_prefix: - http://vminsert:8480/insert/0/prometheus/api/v1/write name: foo username: foo password: "foobar\n" ``` vmagent configuration for this setup will use the following flags: ``` -remoteWrite.url=http://vmauth:8427/ -remoteWrite.basicAuth.passwordFile=/tmp/vmagent-password -remoteWrite.basicAuth.username="foo" ``` Where `/tmp/vmagent-password` is a file with `foobar\n` password. Before this change, such a configuration resulted in a `401 Unauthorized` response received by vmagent, since the file content was trimmed to `foobar`. --- An example with Kubernetes operator which uses a secret to reference the same password in multiple configurations.
<details> <summary>See full manifests</summary> `Secret`: ``` apiVersion: v1 data: name: Zm9v # foo password: Zm9vYmFyCg== # foobar\n username: Zm9v # foo kind: Secret metadata: name: vmagent ``` `VMUser`: ``` apiVersion: operator.victoriametrics.com/v1beta1 kind: VMUser metadata: name: vmagents spec: generatePassword: false name: vmagents targetRefs: - crd: kind: VMAgent name: some-other-agent namespace: example username: foo # note - the secret above is referenced to provide password passwordRef: name: vmagent key: password ``` `VMAgent`: ``` apiVersion: operator.victoriametrics.com/v1beta1 kind: VMAgent metadata: name: example spec: selectAllByDefault: true scrapeInterval: 5s replicaCount: 1 remoteWrite: - url: "http://vmauth-vmauth-example:8427/api/v1/write" # note - the secret above is referenced as well basicAuth: username: name: vmagent key: username password: name: vmagent key: password ``` </details> Since both configs target exactly the same `Secret` object, this is expected to work, but the result is a `401 Unauthorized` error. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-19 10:31:48 +02:00
Nihal	9b7e532172	victoria-metrics: constant-time comparison of credentials like authkeys and basic auth credentials (#6423 ) Changes for constant-time comparison of credentials like authkeys and basic auth credentials. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392 --------- Signed-off-by: Syed Nihal <syed.nihal@nokia.com>	2024-06-19 09:36:56 +02:00
Aliaksandr Valialkin	e498fa6960	app/vlinsert/syslog: allow accepting syslog messages with different configs at different ports	2024-06-17 23:16:34 +02:00
hagen1778	34771ab293	lib/streamaggr: remove accidentally committed changes Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-17 14:24:54 +02:00
Roman Khavronenko	6149adbe10	app/vmselect/promql: check for ranged vectors in aggr funcs if implicit conversions are disabled (#6450 ) Check for ranged vector arguments in aggregate expressions when `-search.disableImplicitConversion` or `-search.logImplicitConversion` are enabled. For example, `sum(up[5m])` will fail to execute if these flags are set. ### Checklist The following checks are mandatory: - [*] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-17 14:21:16 +02:00
Aliaksandr Valialkin	2b6a634ec0	lib/logstorage: work-in-progress	2024-06-17 12:13:18 +02:00
Andrii Chubatiuk	faf67aa8b5	lib/flagutil: use month limit for duration flag for parsed duration assessment (#6486 ) use maxMonths limit for parsed duration flag value https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6330 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-14 15:20:21 +02:00
Andrii Chubatiuk	e678a9aa51	lib/backup/s3remote: fixed credsFilePath flag (#6488 ) properly use credsFilePath flag value https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6353 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-14 14:13:02 +02:00
Roman Khavronenko	51d19485bb	lib/streamaggr: prevent `rate_sum` and `rate_avg` from producing NaNs (#6482 ) ### Describe Your Changes * check if `lastValue` was seen at least twice with different timestamps. Otherwise, the difference between the last timestamp and the previous timestamp could be `0`, which would result in a `NaN` calculation * check if there are items left in the lastValue map after staleness cleanup. Otherwise, `rate_avg` could produce a `NaN` result. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-14 10:06:22 +02:00
Aliaksandr Valialkin	1c094d928c	lib/leveledbytebufferpool: do not pool byte slices bigger than 2^18 bytes Previously byte slices up to 2^20 bytes (i.e. 1MiB) were cached because of a typo in the commit `c14dafce43`. This could result in increased memory usage when vmagent scrapes many regular targets, which expose a relatively small number of metrics (e.g. up to a few thousand per target), and a few large targets such as kube-state-metrics, which expose more than 10 thousand metrics. This is a common case for Kubernetes monitoring. While at it, remove pools for very small byte slices, since they are rarely used during scraping.	2024-06-13 16:56:25 +02:00
Aliaksandr Valialkin	d54840f2f2	lib/bytesutil: optimize internStringMap cleanup - Run it in a separate goroutine, so it doesn't slow down regular intern() calls. - Do not lock internStringMap.mutableLock during the cleanup routine, since now it is called from a single goroutine and reads only the readonly part of the internStringMap. This should prevent regular intern() calls for new strings from being blocked during cleanups. - Add jitter to the cleanup interval in order to prevent a synchronous increase in resource usage during cleanups. - Run the cleanup twice per -internStringCacheExpireDuration . This should save 30% CPU time spent on cleanup compared to the previous code, which was running the cleanup 3 times per -internStringCacheExpireDuration .	2024-06-13 15:06:51 +02:00
Zakhar Bessarab	34071ac660	lib/promscrape: increase default value for promscrape.maxDroppedTargets to 10_000 (#6459 ) ### Describe Your Changes This limit can be increased since after `4513893ead` tracking of dropped targets uses much less memory per entry. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6381#issuecomment-2156708228 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-06-12 16:34:18 +02:00
LHHDZ	3a45bbb4e0	app/vmauth: fix discovering backend IPs when `url_prefix` contains hostname with `srv+` prefix (#6401 ) This change fixes the following panic: ``` 2024-06-04T11:16:52.899Z warn app/vmauth/auth_config.go:353 cannot discover backend SRV records for http://srv+localhost:8080: lookup localhost on 10.100.10.4:53: server misbehaving; use it literally panic: runtime error: integer divide by zero goroutine 9 [running]: github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper.func1() /Users/lhhdz/wd/projects/go/VictoriaMetrics/lib/httpserver/httpserver.go:291 +0x58 panic({0x103115100?, 0x10338d700?}) /Users/lhhdz/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-arm64/src/runtime/panic.go:770 +0x124 main.getLeastLoadedBackendURL({0x0?, 0x22?, 0x1400014757b?}, 0x1400013c120?) /Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:473 +0x210 main.(*URLPrefix).getBackendURL(0x140000aa080) /Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:312 +0xb8 ``` --------- Co-authored-by: Haley Wang <haley@victoriametrics.com>	2024-06-12 12:30:44 +02:00
Aliaksandr Valialkin	8f5dc966f6	lib/logstorage: work-in-progress	2024-06-11 17:50:32 +02:00
Aliaksandr Valialkin	65a97317e4	lib/streamaggr: prevent a data race inside dedupAggrShard, where samplesBuf can be updated in pushSamples() while its values are read in the flush() loop without holding the das.mu lock. This issue has been introduced in the commit `253c0cffbe`	2024-06-11 17:31:16 +02:00
Aliaksandr Valialkin	0521e58a09	lib/logstorage: work-in-progress	2024-06-10 18:42:19 +02:00
Aliaksandr Valialkin	bf2d299420	lib/streamaggr: return back string interning to dedupAggr after 78953723200f15ffc417064d1912bdbb7551505c It should reduce memory allocation rate during stream deduplication	2024-06-10 18:05:42 +02:00
Aliaksandr Valialkin	6a0a36aa93	lib/bytesutil: reduce the number of memory allocations per each interned string in bytesutil.InternString() from 5 to 1 This should reduce GC overhead when tens of millions of strings are interned (for example, during stream deduplication of millions of active time series).	2024-06-10 18:05:41 +02:00
Roman Khavronenko	cd1aca217c	lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451 ) All user input should be sanitized before rendering. This should prevent possible attacks. See https://github.com/VictoriaMetrics/VictoriaMetrics/security/code-scanning/203 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-10 16:55:59 +02:00
Aliaksandr Valialkin	253c0cffbe	lib/streamaggr: reduce memory allocations by using dedupAggrSample buffer per each dedupAggrShard	2024-06-10 16:38:42 +02:00
Aliaksandr Valialkin	a1e8003754	lib/streamaggr: reduce the number of duplicates per each sample in BenchmarkDedupAggr from 100 to 2 This is closer to typical production setups, where deduplication is used for de-duplicating 2 samples per series.	2024-06-10 16:38:41 +02:00
Aliaksandr Valialkin	0b7c47a40c	lib/streamaggr: use strings.Clone() instead of bytesutil.InternString() for creating series key in dedupAggr Our internal testing shows that this reduces GC overhead when deduplicating tens of millions of active series.	2024-06-10 16:08:34 +02:00
Aliaksandr Valialkin	e8bb4359bb	lib/streamaggr: improve performance for dedupAggr.sizeBytes() and dedupAggr.itemsCount() These functions are called every time `/metrics` page is scraped, so it would be great if they could be sped up for the cases when dedupAggr tracks tens of millions of active time series.	2024-06-10 15:59:37 +02:00
Aliaksandr Valialkin	f45d02a243	lib/streamaggr: remove flushState arg at dedupAggr.flush(), since it is always set to true in production	2024-06-10 15:59:33 +02:00
Hui Wang	61dce6f2a1	lib/httpserver: allow reloadAuthKey and configAuthKey to override htt… (#6338 ) …pAuth.* address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329, makes `reloadAuthKey`, `configAuthKey`, `flagsAuthKey`, `pprofAuthKey` behave the same way, but keys like `-snapshotAuthKey`, `-forceMergeAuthKey` are still protected by httpAuth.*. All the available keys are listed in https://docs.victoriametrics.com/single-server-victoriametrics/#security. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-10 12:09:47 +02:00
Aliaksandr Valialkin	36be090cd5	lib/streamaggr: follow-up for `7cb894a777` - Use bytesutil.InternString() instead of strings.Clone() for inputKey and outputKey in aggregator.pushSamples(). This should reduce string allocation rate, since strings can be re-used between aggrState flushes. - Reduce memory allocations at dedupAggrShard by storing dedupAggrSample by value in the active series map. - Remove duplicate call to bytesutil.InternBytes() at Deduplicator, since it is already called inside dedupAggr.pushSamples(). - Add missing string interning at rateAggrState.pushSamples(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6402	2024-06-07 16:27:26 +02:00
Roman Khavronenko	7cb894a777	lib/streamaggr: reduce number of inuse objects (#6402 ) The main change is getting rid of interning of sample key. It was discovered that for cases with many unique time series aggregated by vmagent interned keys could grow up to hundreds of millions of objects. This has negative impact on the following aspects: 1. It slows down garbage collection cycles, as GC has to scan all inuse objects periodically. The higher the number of inuse objects, the longer it takes/the more CPU it takes. 2. It slows down the hot path of samples aggregation where each key needs to be looked up in the map first. The change makes code more fragile, but is supposed to provide a performance optimization for heavy-loaded vmagents with stream aggregation enabled. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-06-07 15:45:52 +02:00
Roman Khavronenko	5f46f8a11d	lib/promrelabel: speedup label match by `__name__` (#6432 ) The change adds a fastpath for `equalValue` comparisons against `__name__` label by avoiding calls to `toCanonicalLabelName` func. This speeds up matches by metric name like `'foo'`. See bench stats below: ``` benchcmp old.txt new.txt benchmark old ns/op new ns/op delta BenchmarkIfExpression/equal_label:_last-10 35.6 35.1 -1.18% BenchmarkIfExpression/equal_label:_middle-10 18.3 17.3 -5.41% BenchmarkIfExpression/equal_label:_first-10 1.20 1.24 +2.74% BenchmarkIfExpression/equal___name__:_last-10 10.1 4.96 -50.75% BenchmarkIfExpression/equal___name__:_middle-10 5.79 3.16 -45.41% BenchmarkIfExpression/equal___name__:_first-10 1.17 1.05 -9.76% ``` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-07 15:44:48 +02:00
Andrii Chubatiuk	185fac03b3	lib/streamaggr: metrics to track dropped, nan samples and samples lag (#6358 ) ### Describe Your Changes Added streamaggr metrics to: - `vm_streamaggr_samples_lag_seconds` - samples lag - `vm_streamaggr_ignored_samples_total{reason="nan"}` - ignored NaN samples - `vm_streamaggr_ignored_samples_total{reason="too_old"}` - ignored old samples	2024-06-06 14:06:11 +02:00
Aliaksandr Valialkin	55d8379ae6	lib/logstorage: work-in-progress	2024-06-06 12:27:05 +02:00
Aliaksandr Valialkin	80a7c65ab7	lib/logstorage: allow using `eval` keyword instead of `math` keyword in `math` pipe	2024-06-05 10:07:49 +02:00
Aliaksandr Valialkin	43cf221681	lib/logstorage: work-in-progress	2024-06-05 03:18:12 +02:00
pludov	3ddae77c63	lib/fs: support NFS implementations that return EEXIST instead of ENOTEMPTY (#6398 ) ### Describe Your Changes Fix for issue #6396: according to rmdir manpage, ENOTEMPTY and EEXIST should be treated equally https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6396 ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Co-authored-by: Ludovic Pollet <ludovic.pollet@exfo.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-04 15:17:38 +02:00
Aliaksandr Valialkin	96c29ab403	lib/logstorage: allow typing `asc` in `sort` pipe for the sake of consistency with `desc`	2024-06-04 02:29:10 +02:00
Aliaksandr Valialkin	539fce9227	lib/logstorage: work-in-progress	2024-06-04 01:49:02 +02:00
Aliaksandr Valialkin	b30e80b071	lib/logstorage: work-in-progress	2024-05-30 16:19:23 +02:00
Roman Khavronenko	b984f4672e	lib/storage: filter deleted label names and values from `/api/v1/labe… (#6342 ) …ls` and `/api/v1/label/.../values` Check for deleted metrics when `match[]` filter matches small number of time series (optimized path). The issue was introduced in [v1.81.0](https://docs.victoriametrics.com/changelog_2022/#v1810). Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6300 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-29 14:07:44 +02:00
Aliaksandr Valialkin	1de187bcb7	lib/logstorage: work-in-progress	2024-05-29 01:52:13 +02:00
Aliaksandr Valialkin	0aafca29be	lib/logstorage: work-in-progress	2024-05-28 19:29:41 +02:00
Aliaksandr Valialkin	99138e15c0	lib/logstorage: fix golangci-lint warnings	2024-05-26 02:01:32 +02:00
Aliaksandr Valialkin	1e203f35f7	lib/logstorage: work-in-progress	2024-05-26 01:55:21 +02:00
Aliaksandr Valialkin	7ac529c235	lib/logstorage: work-in-progress	2024-05-25 22:59:13 +02:00
Aliaksandr Valialkin	0b629ce5a5	lib/logstorage: re-use per-shard fields across processed blocks in pipePackJSON and pipeUnroll	2024-05-25 22:13:32 +02:00
Aliaksandr Valialkin	dc55146752	lib/logstorage: work-in-progress	2024-05-25 21:36:16 +02:00
Aliaksandr Valialkin	e2590f0485	lib/logstorage: work-in-progress	2024-05-25 00:30:58 +02:00
Nikolay	69d244e6fb	lib/mergeset: adds tracking for indexdb records drop (#6297 ) It allows creating an alert for possible item drops at indexdb. Drops may happen if the ingested metric size exceeds the max indexdb item size. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-05-24 14:55:20 +02:00
Aliaksandr Valialkin	4b458370c1	lib/logstorage: work-in-progress	2024-05-24 03:06:55 +02:00
Nikolay	a5d1013042	lib/storage: change default value for maxLabelValueLen to 1024 (#6313 ) * It should reduce memory usage for misbehaving clients, since VictoriaMetrics stores the sparse index in memory. * Reduce disk space usage for indexdb. * Prevent possible indexDB items drops. * It may trigger slow inserts and new time series registrations due to the change of the flag's default value https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6176 --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-05-22 21:53:53 +02:00
Alexander Marshalov	7da541360e	[vmlogs] fixed time parsing with millisecond precision time (#6293 ) (#6295 ) fix for #6293 Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-05-22 21:46:50 +02:00
Aliaksandr Valialkin	22107421eb	lib/logstorage: work-in-progress	2024-05-22 21:01:20 +02:00
Roman Khavronenko	ac836bcf6c	lib/backup: add `-s3TLSInsecureSkipVerify` command-line flag (#6318 ) * The new flag can be used for skipping TLS certificates verification when connecting to S3 endpoint. Affects vmbackup, vmrestore, vmbackupmanager. * replace deprecated `EndpointResolver` with `BaseEndpoint` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1056 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-22 13:58:39 +02:00
Hui Wang	d7b5062917	app/vmalert: support DNS SRV record in `-remoteWrite.url` (#6299 ) part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053, supports [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) address in `-remoteWrite.url` command-line option.	2024-05-22 10:52:51 +02:00
Roman Khavronenko	7ce052b32d	lib/streamaggr: skip empty aggregators (#6307 ) Prevent excessive resource usage when stream aggregation config file contains no matchers by preventing data from being pushed into the Aggregators object. Before this change a lot of extra work was invoked without reason. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-20 14:03:28 +02:00
Aliaksandr Valialkin	bc4a0b8f37	lib/logstorage: fix golangci-lint warnings	2024-05-20 11:04:12 +02:00
Aliaksandr Valialkin	ad505a7a9a	lib/logstorage: work-in-progress	2024-05-20 04:08:30 +02:00
Andrii Chubatiuk	f153f54d11	app/vmagent: add global aggregator (#6268 ) Add global stream aggregation for VMAgent https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467	2024-05-17 14:00:47 +02:00
Nikolay	b2765c45d0	follow-up for `c6c5a5a186` (#6265 ) * adds datadog extensions for statsd: - multiple packed values (v1.1) - additional types distribution, histogram * adds type check and append metric type to the labels with special tag name `__statsd_metric_type__`. It simplifies streaming aggregation config. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-05-16 09:25:42 +02:00
Aliaksandr Valialkin	0aa19a2837	lib/logstorage: work-in-progress	2024-05-15 04:55:44 +02:00
Aliaksandr Valialkin	b617dc9c0b	lib/streamaggr: properly return output key from getOutputKey The bug has been introduced in `cc2647d212`	2024-05-14 17:47:21 +02:00
Aliaksandr Valialkin	da3af090c6	lib/logstorage: work-in-progress	2024-05-14 03:05:03 +02:00
Aliaksandr Valialkin	cb35e62e04	lib/logstorage: work-in-progress Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6258	2024-05-14 01:49:23 +02:00
Aliaksandr Valialkin	cc2647d212	lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit Change the return values for these functions - now they return the unmarshaled result plus the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling. This improves performance of these functions in hot loops of VictoriaLogs a bit.	2024-05-14 01:23:54 +02:00
Aliaksandr Valialkin	707f3a69db	lib/stringsutil: add LessNatural() function for natural sorting Natural sorting is needed for sort_by_label_natural() and sort_by_label_natural_desc() functions in MetricsQL - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6192 and https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6256 Natural sorting will also be used by `| sort ...` pipe in VictoriaLogs - see https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe	2024-05-13 16:56:47 +02:00
Hui Wang	4c80b17027	storage: correctly apply `-inmemoryDataFlushInterval` when it's set t… (#6221 ) …o minimum supported value 1s pendingRowsFlushInterval was bumped to 2s in `73f0a805e2`	2024-05-13 16:44:30 +02:00
Andrii Chubatiuk	ce25d68b45	lib/streamaggr: added rate_sum and rate_avg to benchmarks, lint fix (#6264 ) fixed lint for rate outputs	2024-05-13 16:40:37 +02:00
Andrii Chubatiuk	9c3d44c8c9	lib/streamaggr: added rate and rate_avg output (#6243 ) Added `rate` and `rate_avg` output Resource usage is the same as for increase output, tested on a benchmark --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-05-13 15:39:49 +02:00
hagen1778	17283fab6c	lib/logstorage: make linter happy Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-13 15:35:11 +02:00
Aliaksandr Valialkin	9dbd0f9085	lib/logstorage: initial implementation of pipes in LogsQL See https://docs.victoriametrics.com/victorialogs/logsql/#pipes	2024-05-12 16:33:31 +02:00
Aliaksandr Valialkin	e66465cb03	lib/encoding: optimizing UnmarshalVarUint64 and UnmarshalVarInt64 a bit	2024-05-12 16:32:11 +02:00
Aliaksandr Valialkin	590160ddbb	lib/slicesutil: add helper functions for setting slice length and extending its capacity The added helper functions - SetLength() and ExtendCapacity() - replace error-prone code with simple function calls.	2024-05-12 11:32:17 +02:00
Aliaksandr Valialkin	f20d452196	lib/storage: remove outdated misleading comments	2024-05-12 10:24:04 +02:00
Roman Khavronenko	87fd400dfc	Feature allow configuring disableOnDiskQueue and dropSamplesOnOverload per url (#6248 ) * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): allow configuring `-remoteWrite.disableOnDiskQueue` and `-remoteWrite.dropSamplesOnOverload` cmd-line flags per each `-remoteWrite.url`. See this [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065). Thanks to @rbizos for implementation! * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add labels `path` and `url` to metrics `vmagent_remotewrite_push_failures_total` and `vmagent_remotewrite_samples_dropped_total`. Now number of failed pushes and dropped samples can be tracked per `-remoteWrite.url`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Raphael Bizos <r.bizos@criteo.com>	2024-05-10 12:09:21 +02:00
Roman Khavronenko	8a03e987cb	lib/streamaggr: set correct suffix `<output>_prometheus` (#6228 ) Set correct suffix `<output>_prometheus` for aggregation outputs `increase_prometheus` and `total_prometheus` Before, outputs `total` and `total_prometheus` or `increase` and `increase_prometheus` had the same suffix. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-08 13:11:30 +02:00
Andrii Chubatiuk	a9283e06a3	streamaggr: made labels compressor shared (#6173 ) Though labels compressor is quite resource intensive, each aggregator and deduplicator instance has its own compressor. Made it shared across all aggregators to consume fewer resources while using multiple aggregators. Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>	2024-05-08 13:10:53 +02:00
Zhu Jiekun	17e3d019d2	feature: [vmagent] Add service discovery support for Vultr (#6068 ) ### Describe Your Changes related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041 #### Added - Added service discovery support for Vultr. #### Docs - `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated. #### Note - Useful links: - Vultr API: https://www.vultr.com/api/#tag/instances/operation/list-instances - Vultr client SDK: https://github.com/vultr/govultr - Prometheus SD: https://github.com/prometheus/prometheus/tree/main/discovery/vultr --- ### Checklist The following checks are mandatory: - [X] I have read the [Contributing Guidelines](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/CONTRIBUTING.md) - [x] All commits are signed and include `Signed-off-by` line. Use `git commit -s` to include `Signed-off-by` your commits. See this [doc](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work) about how to sign your commits. - [x] Tests are passing locally. Use `make test` to run all tests locally. - [x] Linting is passing locally. Use `make check-all` to run all linters locally. Further checks are optional for External Contributions: - [X] Include a link to the GitHub issue in the commit message, if issue exists. - [x] Mention the change in the [Changelog](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/CHANGELOG.md). Explain what has changed and why. If there is a related issue or documentation change - link them as well. Tips for writing a good changelog message:: * Write a human-readable changelog message that describes the problem and solution. * Include a link to the issue or pull request in your changelog message. * Use specific language identifying the fix, such as an error message, metric name, or flag name. * Provide a link to the relevant documentation for any new features you add or modify. 
- [ ] After your pull request is merged, please add a message to the issue with instructions for how to test the fix or try the feature you added. Here is an [example](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4048#issuecomment-1546453726) - [x] Do not close the original issue before the change is released. Please note, in some cases Github can automatically close the issue once PR is merged. Re-open the issue in such case. - [x] If the change somehow affects public interfaces (a new flag was added or updated, or some behavior has changed) - add the corresponding change to documentation. Signed-off-by: Jiekun <jiekun.dev@gmail.com>	2024-05-08 10:01:48 +02:00
Oleg	c6c5a5a186	Statsd protocol compatibility (#5053 ) In this PR I added compatibility with the [statsd protocol](https://github.com/b/statsd_spec) with tags, so that metrics can be sent directly from statsd clients to vmagent or directly to VM. For example, it's compatible with the [statsd-instrument](https://github.com/Shopify/statsd-instrument) and [dogstatsd-ruby](https://github.com/DataDog/dogstatsd-ruby) gems. Related issues: #5052, #206, #4600	2024-05-07 21:46:08 +02:00
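A simplified parser for one tagged statsd line, included to illustrate the wire format this change accepts. It assumes the DogStatsD-style `name:value|type|#key:value,...` layout. `StatsdSample` and `parseStatsdLine` are hypothetical names. Sample rates are skipped, and this is not the vmagent parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// StatsdSample is a hypothetical parsed form of one statsd line.
type StatsdSample struct {
	Name  string
	Value float64
	Type  string            // e.g. "c" (counter), "g" (gauge), "ms" (timer)
	Tags  map[string]string // DogStatsD-style "#key:value" tags
}

// parseStatsdLine parses lines like "page.views:1|c|#env:prod,region:eu".
// Sketch only: sample rates ("@0.5") are ignored, multi-value lines are unsupported.
func parseStatsdLine(line string) (*StatsdSample, error) {
	nameValue, rest, ok := strings.Cut(line, "|")
	if !ok {
		return nil, fmt.Errorf("missing metric type in %q", line)
	}
	name, valueStr, ok := strings.Cut(nameValue, ":")
	if !ok || name == "" {
		return nil, fmt.Errorf("missing metric value in %q", line)
	}
	value, err := strconv.ParseFloat(valueStr, 64)
	if err != nil {
		return nil, fmt.Errorf("cannot parse value %q: %w", valueStr, err)
	}
	parts := strings.Split(rest, "|")
	s := &StatsdSample{Name: name, Value: value, Type: parts[0], Tags: map[string]string{}}
	for _, p := range parts[1:] {
		if !strings.HasPrefix(p, "#") {
			continue // e.g. "@0.5" sample rate, ignored in this sketch
		}
		for _, tag := range strings.Split(p[1:], ",") {
			k, v, _ := strings.Cut(tag, ":")
			if k != "" {
				s.Tags[k] = v
			}
		}
	}
	return s, nil
}

func main() {
	s, err := parseStatsdLine("page.views:1|c|#env:prod,region:eu")
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Name, s.Value, s.Type, s.Tags["env"], s.Tags["region"]) // page.views 1 c prod eu
}
```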
Ted Possible	5a3abfa041	Exemplar support (#5982 ) This code adds Exemplars to VMagent and the promscrape parser adhering to OpenMetrics Specifications. This will allow forwarding of exemplars to Prometheus and other third party apps that support OpenMetrics specs. --------- Signed-off-by: Ted Possible <ted_possible@cable.comcast.com>	2024-05-07 12:09:44 +02:00
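For reference, an OpenMetrics exemplar is appended to a sample line after ` # ` as a label set followed by a value and an optional timestamp. The sketch below splits such a line under simplifying assumptions: no ` # ` and no `}` inside label values. `splitExemplar` and `parseExemplar` are hypothetical helpers, not the promscrape parser.

```go
package main

import (
	"fmt"
	"strings"
)

// splitExemplar splits an OpenMetrics sample line into the sample part and the
// optional exemplar after " # ". Assumes " # " never occurs inside label values.
func splitExemplar(line string) (sample, exemplar string) {
	sample, exemplar, _ = strings.Cut(line, " # ")
	return strings.TrimSpace(sample), strings.TrimSpace(exemplar)
}

// parseExemplar splits an exemplar such as `{trace_id="abc"} 0.31 1700000000.123`
// into its label set and the "value [timestamp]" fields.
// Assumes label values contain no "}" characters.
func parseExemplar(exemplar string) (labels string, fields []string) {
	end := strings.IndexByte(exemplar, '}')
	if !strings.HasPrefix(exemplar, "{") || end < 0 {
		return "", nil
	}
	return exemplar[1:end], strings.Fields(exemplar[end+1:])
}

func main() {
	line := `http_request_duration_seconds_bucket{le="0.5"} 8 # {trace_id="abc123"} 0.31 1700000000.123`
	sample, exemplar := splitExemplar(line)
	labels, fields := parseExemplar(exemplar)
	fmt.Println(sample)         // http_request_duration_seconds_bucket{le="0.5"} 8
	fmt.Println(labels, fields) // trace_id="abc123" [0.31 1700000000.123]
}
```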
Zakhar Bessarab	329c3cbdf0	lib/mergeset: improve test coverage (#6118 ) Add a test to cover the code path with overflowing shard buffers that triggers a merge to the partition. This test covers the code path which led to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-04-30 10:21:37 +02:00
hagen1778	679844feaf	Revert "app/vmbackup: introduce new flag type URL (#6152 )" This reverts commit `029060af60`.	2024-04-24 13:47:57 +02:00
Roman Khavronenko	029060af60	app/vmbackup: introduce new flag type URL (#6152 ) The new flag type is supposed to be used for specifying URL values which could contain sensitive information such as auth tokens in GET params or HTTP basic authentication. The URL flag also allows loading its value from files if the `file://` prefix is specified. As an example, the new flag type was used in app/vmbackup as it requires specifying `authKey` param for making the snapshot. See related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973 Thanks to @wasim-nihal for initial implementation https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6060 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-04-24 10:57:54 +02:00
hagen1778	bae3874e6a	app/streamaggr: follow-up after `c0e4ccb7b5` * rm vmagent mentions from vminsert flags * improve documentation wording, add links to related sections * mention `ignore_first_intervals` in the stream aggr options * update flags description * add basic test for config parsing validation Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-04-22 14:22:59 +02:00
Andrii Chubatiuk	c0e4ccb7b5	lib/streamaggr: add option to ignore first N aggregation intervals (#6137 ) Stream aggregation may yield inaccurate results if it processes incomplete data. This issue can arise when data is sourced from clients that maintain a queue of unsent data, such as Prometheus or vmagent. If the queue isn't fully cleared within the aggregation interval, only a portion of the time series may be included in that period, leading to distorted calculations. To mitigate this, we add an option to ignore the first N aggregation intervals. It is expected that client queues will be cleared while aggregation ignores the first N intervals, and that all subsequent aggregations will be correct.	2024-04-22 13:52:04 +02:00
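A minimal sketch of the idea behind ignoring the first N intervals, assuming a sum-like aggregation. Every flush resets the state, but the results of the first N flushes are discarded. `intervalAggregator` is a hypothetical type, not lib/streamaggr code.

```go
package main

import "fmt"

// intervalAggregator drops the output of the first ignoreFirstIntervals flushes,
// giving client-side queues time to drain before any result is emitted.
type intervalAggregator struct {
	ignoreFirstIntervals int
	flushes              int
	sum                  float64
}

func (a *intervalAggregator) push(v float64) { a.sum += v }

// flush returns the value for the finished interval and whether it should be
// emitted. The state is reset in both cases.
func (a *intervalAggregator) flush() (float64, bool) {
	v := a.sum
	a.sum = 0
	a.flushes++
	if a.flushes <= a.ignoreFirstIntervals {
		return 0, false
	}
	return v, true
}

func main() {
	a := &intervalAggregator{ignoreFirstIntervals: 2}
	for i := 1; i <= 3; i++ {
		a.push(float64(i))
		v, ok := a.flush()
		fmt.Println(i, v, ok)
	}
	// 1 0 false
	// 2 0 false
	// 3 3 true
}
```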
Aliaksandr Valialkin	4770294732	lib/protoparser: substitute hybrid channel-based pools with plain sync.Pool Using plain sync.Pool simplifies the code without increasing memory usage and CPU usage. So it is better to use plain sync.Pool from readability and maintainability PoV. This is a follow-up for `8942f290eb`	2024-04-20 21:59:51 +02:00
Aliaksandr Valialkin	7531e9084a	all: use clear() built-in Go function for clearing []prompbmarshal.TimeSeries and []prompbmarshal.Label slices This makes the code a bit clearer.	2024-04-20 21:00:03 +02:00
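A small sketch of the pattern, using a stand-in `Label` type rather than the real `prompbmarshal.Label`. The built-in `clear` (Go 1.21+) zeroes every element of a slice and replaces a manual loop. Truncating to `[:0]` afterwards keeps the backing array available for reuse.

```go
package main

import "fmt"

// Label is a stand-in with the shape of a name/value label,
// not the real prompbmarshal.Label type.
type Label struct {
	Name, Value string
}

// resetLabels prepares a slice for reuse. clear sets every element to its zero
// value, which is equivalent to the older loop
//
//	for i := range labels { labels[i] = Label{} }
//
// and drops references to the strings so the GC can free them.
func resetLabels(labels []Label) []Label {
	clear(labels)
	return labels[:0]
}

func main() {
	labels := []Label{{"job", "api"}, {"instance", "host1"}}
	backing := labels
	labels = resetLabels(labels)
	fmt.Println(len(labels), cap(labels), backing[0] == Label{}) // 0 2 true
}
```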
Aliaksandr Valialkin	6b1cc9b946	lib/storage: search for all the values for the given label before applying filters and limits It is incorrect to apply the limit on the number of values to search without applying filters, since the returned subset of label values may miss the label values matching the given filters. This is a follow-up for `66630c7960`	2024-04-18 20:29:36 +02:00
Aliaksandr Valialkin	2e3580905f	all: replace old https://docs.victoriametrics.com/relabeling.html url with the new one - https://docs.victoriametrics.com/relabeling/	2024-04-18 03:22:22 +02:00
Aliaksandr Valialkin	e9642e99f2	all: replace old https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html url with the new one - https://docs.victoriametrics.com/single-server-victoriametrics/	2024-04-18 03:11:03 +02:00
Aliaksandr Valialkin	828e78ceb4	all: replace old https://docs.victoriametrics.com/sd_configs.html url with the new one - https://docs.victoriametrics.com/sd_configs/	2024-04-18 02:27:47 +02:00
Aliaksandr Valialkin	4d2b9fe6b2	all: replace old https://docs.victoriametrics.com/stream-aggregation.html url with the new one - https://docs.victoriametrics.com/stream-aggregation/	2024-04-18 02:19:11 +02:00
Aliaksandr Valialkin	6e6bae3e8d	all: replace old https://docs.victoriametrics.com/vmbackup.html url with the new one - https://docs.victoriametrics.com/vmbackup/	2024-04-18 01:57:04 +02:00
Aliaksandr Valialkin	c81a633b02	all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/	2024-04-18 01:31:37 +02:00
Aliaksandr Valialkin	66630c7960	lib/storage: improve performance for /api/v1/label/labelName/values when match[] contains only a single filter on labelName This speeds up auto-suggestion for metric names in VMUI and Grafana, which use the following query when the user types `some_value` in the query input field: `/api/v1/label/__name__/values?match[]={__name__=~".*some_value.*"}`	2024-04-18 01:15:20 +02:00
Aliaksandr Valialkin	50ac22df78	lib/httpserver: add support for automatic issuing of TLS certificates via the Let's Encrypt service Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5949	2024-04-17 23:50:57 +02:00
Aliaksandr Valialkin	bd454f5063	lib/netutil: move creation of GetCertificate callback into a separate function This improves code readability a bit	2024-04-17 22:10:43 +02:00
Aliaksandr Valialkin	dc326f70b4	app/vmagent: support for DNS SRV urls at -remoteWrite.url, scrape target urls and service discovery urls Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053	2024-04-17 20:54:39 +02:00
Aliaksandr Valialkin	b426d10847	app/vmauth: add support for configuring backends via DNS SRV urls	2024-04-17 20:46:22 +02:00
Aliaksandr Valialkin	e3a26c0db6	lib/promscrape/discovery/consul: typo fix in the comment: enteprise -> enterprise	2024-04-16 19:34:18 +02:00
Aliaksandr Valialkin	85d09e5a2d	lib/{mergeset,storage}: log deleting directories inside partitions if they are missing in parts.json This should improve debuggability of unexpected deletion of directories inside partitions. While at it, log the proper path to parts.json when the directory for big part is missing in the partition. parts.json is located inside directory with small parts, and there is no parts.json file inside directory with big parts.	2024-04-16 19:11:32 +02:00
Aliaksandr Valialkin	6bcc6c938b	lib/storage: improve comments inside functions responsible for creating indexes for newly registered time series	2024-04-16 19:11:32 +02:00
Zakhar Bessarab	2205de2391	lib/mergeset: fix flushing incorrect set of inmemoryBlocks (#6089 ) Follow-up for `bace9a2501` Related: - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6069 - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-04-11 09:26:06 +02:00
wanshuangcheng	83216e956c	chore: fix function names in comment (#6076 ) Signed-off-by: wanshuangcheng <wanshuangcheng@outlook.com>	2024-04-08 01:11:12 -07:00
Aliaksandr Valialkin	f8d10a7106	lib/streamaggr: update the minimum allowed timestamp for incoming samples before flushing the samples to the storage This should prevent dropping samples with old timestamps during long flushes. This is a follow-up for `1cedaf61cb`	2024-04-04 02:25:51 +03:00
Aliaksandr Valialkin	967d5496cf	app/vmagent: follow-up for `b3b29ba6ac` - Automatically reload changed TLS root CA pointed to by -remoteWrite.tlsCAFile command-line flag - Automatically reload changed TLS root CA configured via oauth2.tls_config.ca_file option at -promscrape.config - Document the change as a feature instead of a bug at docs/CHANGELOG.md - Simplify the code at lib/promauth, which is responsible for reloading changed TLS root CA files. - Simplify the usage of lib/promauth.Config.NewRoundTripper() - now it accepts the base http.Transport instead of a callback, which can change the internal http.Transport. - Reuse the default tls config if lib/promauth.Config doesn't contain tls-specific configs. This should reduce memory usage a bit when tls isn't used for scraping a big number of targets. - Do not re-read TLS root CA files on every processed request. Re-read them once per second. This should reduce CPU usage when scraping a big number of targets over https. - Do not store cert.pem and key.pem files in TestTLSConfigWithCertificatesFilesUpdate, since they can be loaded from byte slices via crypto/tls.X509KeyPair(). - Remove obsolete comparisons of string representations for authConfig and proxyAuthConfig at areEqualScrapeConfigs(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5725 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171	2024-04-04 01:27:35 +03:00
Zakhar Bessarab	f80ac120f3	lib/promscrape/config: fix missing timeout for http client (#6063 ) Follow-up for `b3b29ba6` Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-04-03 18:18:48 +02:00
Zakhar Bessarab	b3b29ba6ac	lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk (#5725 ) * lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk Added a custom `http.RoundTripper` implementation which checks for root CA content changes and updates `tls.Config` used by `http.RoundTripper` after detecting CA change. Client certificate changes are not tracked by this implementation since `tls.Config` already supports passing certificate dynamically by overriding `tls.Config.GetClientCertificate`. This change implements dynamic reload of root CA only for streaming client used for scraping. Blocking client (`fasthttp.HostClient`) does not support using custom transport so can't use this implementation. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promauth/config: update NewRoundTripper API Update API to allow user to update only parameters required for transport. Add warning log when reloading Root CA failed. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promauth/config: fix mutex acquire logic Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promauth/config: replace RWMutex with regular mutex to simplify the code - remove additional mutex used for getRootCABytes - require callee to use mutex - replace RWMutex with regular mutex Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promauth/config: refactor - hold the mutex lock to avoid round tripper being re-created twice - move recreation logic into separate func to simplify the code Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-04-03 10:01:43 +02:00
Aliaksandr Valialkin	fb42380ef3	lib/protoparser/opentelemetry: follow-up after `47892b4a4c` - Rename -opentelemetry.sanitizeMetrics command-line flag to the clearer -opentelemetry.usePrometheusNaming - Clarify the description of the change at docs/CHANGELOG.md - Rename promrelabel.SanitizeLabelNameParts to the clearer promrelabel.SplitMetricNameToTokens - Properly split metric names at '_' char in promrelabel.SplitMetricNameToTokens. - Add tests for various edge cases for Prometheus metric names' normalization according to the code at `b865505850/pkg/translator/prometheus/normalize_name.go` - Extract the code responsible for Prometheus metric names' normalization into a separate file (sanitize.go) Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6035	2024-04-03 02:25:29 +03:00
Aliaksandr Valialkin	918cccaddf	all: fix golangci-lint(revive) warnings after `0c0ed61ce7` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001	2024-04-02 23:16:29 +03:00
Aliaksandr Valialkin	c3a72b6cdb	lib/storage: consistently use stopCh instead of stop	2024-04-02 21:24:57 +03:00
Aliaksandr Valialkin	904e95fc69	app/vmagent: simplify code after `509df44d03` - Simplify the code in order to improve its maintenance - Properly pass tenant ID when processing multi-tenant opentelemetry request at vmagent Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6016	2024-04-02 17:58:13 +03:00
Aliaksandr Valialkin	c79bf3925c	Revert "app/vmselect: make vmselect resilient to absence of cache folder (#5987 )" This reverts commit `cb23685681`. Reason for revert: the "fix" may hide programming bugs related to incorrect creation of folders before their use. This may complicate detecting and fixing such bugs in the future. There are the following fixes for the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 : - Configure the OS not to drop data from the system-wide temporary directory (aka /tmp). - Run VictoriaMetrics with the -cacheDataPath command-line flag pointing to a directory that cannot be removed automatically by the OS. The case when the user accidentally deletes the directory with files created by VictoriaMetrics shouldn't be considered expected, so VictoriaMetrics shouldn't try to resolve it automatically. From an operational and debuggability PoV, it is much better to crash with the clear `directory doesn't exist` error in this case.	2024-03-30 07:29:24 +02:00
Aliaksandr Valialkin	830b871baf	app/vmagent: properly shutdown when -maxIngestionRate limit is reached The remotewrite.Stop() expects that there are no pending calls to TryPush(). This means that the ingestionRateLimiter.Register() must be unblocked inside TryPush() when calling remotewrite.Stop(). Provide remotewrite.StopIngestionRateLimiter() function for unblocking the rate limiter before calling the remotewrite.Stop(). While at it, move the rate limiter into lib/ratelimiter package, since it has two users. Also move the description of the feature to the correct place at docs/CHANGELOG.md. Also cross-reference -remoteWrite.rateLimit and -maxIngestionRate command-line flags. This is a follow-up for `02bccd1eb9` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5900	2024-03-30 06:43:48 +02:00
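A sketch of why a stop hook is needed, assuming a simple per-second budget: a caller blocked waiting for budget would otherwise keep the shutdown waiting. `Limiter`, `Register` and `Stop` here are hypothetical and only loosely mirror the description above. They are not the lib/ratelimiter API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Limiter allows up to limit units per second. Register blocks while the budget
// is exhausted; Stop unblocks waiting Register calls so shutdown cannot hang.
type Limiter struct {
	limit  int
	mu     sync.Mutex
	budget int
	reset  time.Time
	stopCh chan struct{}
	once   sync.Once
}

func NewLimiter(limit int) *Limiter {
	return &Limiter{
		limit:  limit,
		budget: limit,
		reset:  time.Now().Add(time.Second),
		stopCh: make(chan struct{}),
	}
}

// Register consumes n units of budget. It returns false if Stop was called
// while it was waiting for the next second.
func (l *Limiter) Register(n int) bool {
	for {
		l.mu.Lock()
		now := time.Now()
		if !now.Before(l.reset) {
			l.budget = l.limit // a new second has started: refill the budget
			l.reset = now.Add(time.Second)
		}
		if l.budget > 0 {
			l.budget -= n
			l.mu.Unlock()
			return true
		}
		wait := l.reset.Sub(now)
		l.mu.Unlock()
		select {
		case <-l.stopCh:
			return false
		case <-time.After(wait):
		}
	}
}

// Stop unblocks pending and future waiting Register calls. Safe to call twice.
func (l *Limiter) Stop() {
	l.once.Do(func() { close(l.stopCh) })
}

func main() {
	l := NewLimiter(2)
	fmt.Println(l.Register(1), l.Register(1)) // true true
	done := make(chan bool)
	go func() { done <- l.Register(1) }() // budget exhausted: waits for the next second
	l.Stop()                              // unblock before waiting for pending pushes
	fmt.Println(<-done)                   // false
}
```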
Zakhar Bessarab	af3922b1df	lib/storage: add ability to use downsampling for the given series filter (#733 ) * lib/storage: add ability to use downsampling for the given series filter Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs: add information about downsampling filters Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs: fix MetricsQL filter Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/downsampling: treat missing downsampling filter as a bug Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/part_header: verify correctness of downsampling filters when opening partition Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/downsampling: save only applicable rules in part metadata Filter and save only rules which are applicable to partition based on MinTimestamp of stored data. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/downsampling: update log messages for final dedup Properly specify the reason for re-running deduplication for partition. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage: consistently use MaxTimestamp to determine deduplication/downsampling rules Using MinTimestamp leads to applying downsampling to parts which are only partially covered by downsampling rule. For example, a partition covers the range [1000-2000]. At t=2100 with rule offset 500, data older than t=2100-500=1600 must be downsampled. The range check against MinTimestamp evaluates to true even though the partition contains the range [1600:2000], which must not be downsampled. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * Follow-up - Apply the first matching downsampling period if multiple filters match the given time series. This allows fine-tuning the downsampling config for the specific needs. - Take into account downsampling filters during search queries. 
- Reduce the difference between community and enterprise branches. This should simplify further maintenance of these branches. - Properly parse series filters with colons inside them. - Document the feature at docs/CHANGELOG.md. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4960 --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-03-30 04:12:23 +02:00
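The MinTimestamp/MaxTimestamp example from the commit message above, written out as code. `partitionRange` and `shouldDownsample` are hypothetical names, and the strict `<` comparison at the boundary is an assumption.

```go
package main

import "fmt"

// partitionRange is a hypothetical [MinTimestamp, MaxTimestamp] range of a partition.
type partitionRange struct {
	MinTimestamp, MaxTimestamp int64
}

// shouldDownsample reports whether the whole partition is older than now-offset.
// Checking MinTimestamp instead would also match partially covered partitions.
func shouldDownsample(p partitionRange, now, offset int64) bool {
	return p.MaxTimestamp < now-offset
}

func main() {
	p := partitionRange{MinTimestamp: 1000, MaxTimestamp: 2000}
	now, offset := int64(2100), int64(500)
	threshold := now - offset                     // 1600
	fmt.Println(p.MinTimestamp < threshold)       // true: the MinTimestamp check wrongly matches
	fmt.Println(shouldDownsample(p, now, offset)) // false: [1600, 2000] is still too fresh
}
```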
Aliaksandr Valialkin	131f357098	lib/storage/table.go: reduce the difference with enterprise branch	2024-03-30 03:22:51 +02:00
Aliaksandr Valialkin	4001ca36b8	lib/storage/partition.go: reduce code difference a bit with enterprise branch	2024-03-30 01:39:27 +02:00
Nikolay	a05303eaa0	lib/storage: adds metrics for downsampling (#382 ) * lib/storage: adds metrics for downsampling vm_downsampling_partitions_scheduled - shows the number of parts that must be downsampled vm_downsampling_partitions_scheduled_size_bytes - shows the total size in bytes of parts that must be downsampled These two metrics answer the questions: is downsampling running? How many parts are scheduled for downsampling, and how many of them are currently being downsampled? How much storage space do they occupy? https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612 * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-03-30 01:11:49 +02:00
Andrii Chubatiuk	47892b4a4c	opentelemetry: added cmd flag to sanitize metric names (#6035 )	2024-03-29 13:51:24 +01:00
Aliaksandr Valialkin	4a359d5f67	lib/storage: follow-up for `76f00cea6b` Store the deadline when the metricID entries must be deleted from indexdb if metricID->metricName entry isn't found after the deadline. This should make the code clearer compared to the previous version, where the timestamp of the first metricID->metricName lookup miss was stored in missingMetricIDs. Remove the misleading comment about the importance of the order for creating entries in the inverted index when registering new time series. The order doesn't matter, since any subset of the created entries can become visible for search before any other subset after registering in indexdb. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959	2024-03-27 11:41:28 +02:00
Zakhar Bessarab	51f5ac1929	lib/storage/table: wait for merges to be completed when closing a table (#5965 ) * lib/storage/table: properly wait for force merges to be completed during shutdown Properly keep track of running background merges and wait for merges completion when closing the table. Previously, force merge was not in sync with overall storage shutdown which could lead to holding ptw ref. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs: add changelog entry Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-03-26 13:49:09 +01:00
Andrii Chubatiuk	509df44d03	app/{vmagent,vminsert}: fixed firehose response (#6016 )	2024-03-26 13:20:41 +01:00
Roman Khavronenko	cb23685681	app/vmselect: make vmselect resilient to absence of cache folder (#5987 ) vmselect uses a cache folder in the file system for two purposes: 1. Storing rollup cache results on shutdown; 2. Storing temporary search results from vmstorage during query executions. It could happen that the cache folder is deleted accidentally by the user, or by the OS during cleanup routines. This would cause vmselect to: 1. panic on /metrics call, because `MustGetFreeSpace` will fail; 2. return a query error to the user, as it won't be able to store temporary search results. The changes in this commit are the following: 1. Make `MustGetFreeSpace` try re-creating the cache folder if it is missing; 2. Make vmselect try re-creating the cache folder if it can't persist tmp search results. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-03-26 12:59:50 +01:00
hagen1778	e6dd52b04c	lib/promauth: follow-up `b577413d3b` Convert test result expectations to canonical form. Starting from `b577413d3b` specified header keys are forced into canonical form https://pkg.go.dev/net/http#CanonicalHeaderKey Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-03-18 11:12:45 +01:00
Aliaksandr Valialkin	4553521f9a	lib/streamaggr: ignore out of order samples for `last` output This is a follow-up for `6a465f6e29` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931	2024-03-18 01:03:36 +02:00
Aliaksandr Valialkin	76f00cea6b	lib/storage: wait for up to 60 seconds before deciding to delete metricID entries from indexdb if metricID->metricName entry is missing during search The metricID->metricName entry can remain invisible for search for some time after registering a new metricName. This is an expected condition. So wait for up to 60 seconds in the hope that the metricID->metricName entry will become visible before deleting all the entries from indexdb, which are associated with the given metricID. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948 See also `20812008a7`	2024-03-18 00:34:32 +02:00
Aliaksandr Valialkin	729b263670	lib/httputils: rename CAFile -> caFile in order to be consistent with local var naming in Go This is a follow-up for `83e55456e2`	2024-03-17 23:19:52 +02:00
Aliaksandr Valialkin	1cedaf61cb	app/{vmagent,vminsert}: add an ability to ignore input samples outside the current aggregation interval for stream aggregation See https://docs.victoriametrics.com/stream-aggregation.html#ignoring-old-samples	2024-03-17 23:03:47 +02:00
Aliaksandr Valialkin	6a465f6e29	lib/streamaggr: ignore out of order samples when calculating increase, increase_prometheus, total and total_prometheus outputs Out of order samples may result in unexpected spikes for these outputs. So it is better to ignore such samples. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931	2024-03-17 22:03:03 +02:00
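A simplified sketch of a `total`-like output that skips out-of-order samples, assuming a per-series map of the last accepted sample. In this sketch the first sample of a series only initializes state, and a decrease is treated as a counter reset. `totalState` is hypothetical and not the lib/streamaggr implementation. Without the timestamp check, the late sample in `main` would be mistaken for a counter reset.

```go
package main

import "fmt"

// lastSample is the last accepted sample of one input series.
type lastSample struct {
	value     float64
	timestamp int64
}

// totalState sums counter increases and ignores samples older than the last
// accepted sample of the same series.
type totalState struct {
	last  map[string]lastSample
	total float64
}

func newTotalState() *totalState { return &totalState{last: make(map[string]lastSample)} }

func (s *totalState) push(series string, value float64, ts int64) {
	prev, ok := s.last[series]
	if ok && ts < prev.timestamp {
		return // out-of-order sample: ignore it to avoid spikes
	}
	if ok {
		d := value - prev.value
		if d < 0 {
			d = value // counter reset
		}
		s.total += d
	}
	s.last[series] = lastSample{value: value, timestamp: ts}
}

func main() {
	s := newTotalState()
	s.push("req", 10, 1000)
	s.push("req", 15, 2000)
	s.push("req", 12, 1500) // late: without the check it would look like a reset and add 12
	s.push("req", 20, 3000)
	fmt.Println(s.total) // 10
}
```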
Aliaksandr Valialkin	cbd80efcc1	lib/streamaggr: follow-up for `15e33d56f1` - Properly set pushSample.timestamp when flushing de-duplicated samples to stream aggregation This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931 - Re-classify this change as feature instead of bugfix at docs/CHANGELOG.md - Verify de-duplication logic for samples with different timestamps Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5939	2024-03-17 21:37:16 +02:00
Aliaksandr Valialkin	b577413d3b	lib/promauth: properly set `Host` header in requests to scrape targets. The `Host` header must be set via net/http.Request.Host field, since net/http.Client ignores this header if it is set via Request.Header.Set(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5969 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5970	2024-03-17 20:22:54 +02:00
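A minimal illustration of the `net/http` behavior described above: `http.NewRequest` initializes `req.Host` from the URL, and overriding `req.Host` is how a different `Host` header is sent. `newScrapeRequest` is a hypothetical helper, not the lib/promauth code.

```go
package main

import (
	"fmt"
	"net/http"
)

// newScrapeRequest builds a GET request whose Host header may differ from the
// URL host. net/http.Client sends req.Host (falling back to req.URL.Host) and
// ignores a "Host" entry in req.Header, so the override goes into req.Host.
func newScrapeRequest(url, host string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if host != "" {
		req.Host = host
	}
	return req, nil
}

func main() {
	req, err := newScrapeRequest("http://10.0.0.5:8080/metrics", "app.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.Host, req.Host) // 10.0.0.5:8080 app.example.com
}
```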
Andrii Chubatiuk	15e33d56f1	lib/streamaggr: pick sample with bigger timestamp or value on deduplicator (#5939 ) Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication This would require more memory for deduplication, since we need to track timestamp for each record. However, deduplication should become more consistent. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643 --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-03-12 22:47:29 +01:00
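The selection rule can be sketched as a comparison function: the sample with the bigger timestamp wins, and on equal timestamps the sample with the bigger value wins. The `sample` type and `dedup` helper are hypothetical. Handling of NaN values and staleness markers is omitted.

```go
package main

import "fmt"

type sample struct {
	value     float64
	timestamp int64
}

// isBetter reports whether candidate should replace current within one
// dedup interval: bigger timestamp wins, then bigger value on equal timestamps.
func isBetter(candidate, current sample) bool {
	if candidate.timestamp != current.timestamp {
		return candidate.timestamp > current.timestamp
	}
	return candidate.value > current.value
}

// dedup keeps one sample per series key for a single interval.
func dedup(keys []string, samples []sample) map[string]sample {
	out := make(map[string]sample)
	for i, k := range keys {
		cur, ok := out[k]
		if !ok || isBetter(samples[i], cur) {
			out[k] = samples[i]
		}
	}
	return out
}

func main() {
	keys := []string{"a", "a", "a", "b"}
	samples := []sample{{1, 100}, {5, 90}, {3, 100}, {7, 50}}
	res := dedup(keys, samples)
	fmt.Println(res["a"], res["b"]) // {3 100} {7 50}
}
```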
Aliaksandr Valialkin	d1d2771bee	lib/storage: optimize /api/v1/labels and /api/v1/label/.../values when match[] contains metric name Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055	2024-03-12 02:43:16 +02:00
Aliaksandr Valialkin	d46d87a9e0	lib/storage: move the conversion of tag filters to composite tag filters into indexSearch.searchMetricIDsInternal This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now. While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal() in order to quickly terminate search of time series in the old indexdb for new time ranges. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055 This is a follow-up for `2d31fd7855`	2024-03-11 20:40:28 +02:00
Aliaksandr Valialkin	2d31fd7855	lib/storage: use composite indexes (metricName, label=value) when searching for matching time series at /api/v1/labels, /api/v1/label/.../values and /api/v1/status/tsdb This should improve query performance when match[], extra_filters[] or extra_label args are passed to these APIs Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055	2024-03-10 12:57:34 +02:00
Aliaksandr Valialkin	cb259116b4	lib/promauth: set the Host header to tlsServerName if it isn't empty If tlsServerName isn't empty, then it is likely that the https request is sent to an IP address instead of a hostname. In this case the request will fail, since Go automatically sets the Host header to the IP instead of the desired hostname at tlsServerName. So set the Host header to tlsServerName if it isn't empty. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5802	2024-03-07 01:22:01 +02:00
Aliaksandr Valialkin	5582a24ecf	lib/streamaggr: add tests for keep_metric_names and drop_input_labels options	2024-03-06 18:34:04 +02:00
Aliaksandr Valialkin	b4b38f782c	app/vmagent/remotewrite: clarify the reason behind the default value for -remoteWrite.queues in the same way as the reason for -maxConcurrentInserts is defined at `73f5fb0f0c`	2024-03-06 13:43:08 +02:00
hagen1778	73f5fb0f0c	lib/writeconcurrencylimiter: mention dependency on CPU cores for `-maxConcurrentInserts` flag The change also removes misleading `default` value from README for `maxConcurrentInserts` cmd-line flag. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-03-05 18:55:38 +01:00
Aliaksandr Valialkin	da611ad628	app/{vmagent,vminsert}: add `-streamAggr.dropInputLabels` command-line flag for dropping the specified labels from input samples before deduplication and streaming aggregation	2024-03-05 02:15:01 +02:00
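A sketch of dropping configured label names from an input sample before deduplication, so that samples which differ only in those labels (for example, a replica label) collapse into one series. `dropLabels` and the label names used here are hypothetical examples, not the vmagent code.

```go
package main

import "fmt"

type Label struct {
	Name, Value string
}

// dropLabels returns labels without the names listed in drop. The result reuses
// the input's backing array, so the input slice must not be used afterwards.
func dropLabels(labels []Label, drop map[string]struct{}) []Label {
	out := labels[:0]
	for _, l := range labels {
		if _, ok := drop[l.Name]; !ok {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	drop := map[string]struct{}{"replica": {}, "pod": {}}
	labels := []Label{{"__name__", "up"}, {"replica", "a"}, {"job", "api"}, {"pod", "x-1"}}
	fmt.Println(dropLabels(labels, drop)) // [{__name__ up} {job api}]
}
```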
Aliaksandr Valialkin	ed523b5bbc	app/{vminsert,vmagent}: allow using -streamAggr.dedupInterval without -streamAggr.config This allows performing online de-duplication of incoming samples	2024-03-05 00:45:30 +02:00
Aliaksandr Valialkin	22d63ac7cd	lib/streamaggr: do not reset aggregation state after the aggregation took longer than the configured interval From the user's PoV, it is better to preserve this state until the next flush	2024-03-04 20:03:06 +02:00
Aliaksandr Valialkin	32653db7d5	lib/streamaggr: add missing "s" suffix in the warning message when the de-duplication or aggregation couldn't be finished in a timely manner	2024-03-04 19:37:58 +02:00
Aliaksandr Valialkin	6319d029a8	lib/streamaggr: benchmark only flush routines in BenchmarkDedupAggrFlushSerial and BenchmarkAggregatorsFlushSerial	2024-03-04 19:12:28 +02:00
Aliaksandr Valialkin	074abd5bee	Revert "lib/streamaggr: do not flush dedup shards in parallel" This reverts commit `eb40395a1c`. Reason for revert: it turned out that the performance gain on multiple CPU cores wasn't visible because the benchmark was generating incorrect pushSample.key. See a207e0bf687d65f5198207477248d70c69284296	2024-03-04 19:12:28 +02:00
Aliaksandr Valialkin	e70177c5fb	lib/streamaggr: properly generate pushSample.key in benchmarks	2024-03-04 19:12:27 +02:00
Aliaksandr Valialkin	b232968bb4	lib/streamaggr: reduce the number of pointers at "total" aggregation state This should reduce load on GC when scanning heap objects.	2024-03-04 19:12:27 +02:00
Aliaksandr Valialkin	d42667fc41	lib/streamaggr: use multiple job label values in BenchmarkAggregatorsPush instead of single value This should make the benchmark closer to production cases	2024-03-04 19:12:26 +02:00

2712 commits