github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-01 14:47:38 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	54d3abc092	lib: consistently use regexp.Regexp.ReplaceAllLiteralString instead of regexp.Regexp.ReplaceAllString in places where the replacement cannot contain matching group placeholders	2024-07-17 12:57:43 +02:00
rtm0	1b03d7e6de	Fix inconsistent error handling in Storage.AddRows() (#6583 ) `Storage.AddRows()` returns an error only in one case: when `Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But the same error is treated as a warning when it happens inside `Storage.add()` or returned by `Storage.prefillNextIndexDB()`. This commit fixes this inconsistency by treating the error returned by `Storage.updatePerDateData()` as a warning as well. As a result `Storage.add()` does not need a return value anymore and so doesn't `Storage.AddRows()`. Additionally, this commit adds a unit test that checks all cases that result in a row not being added to the storage. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-07-17 12:55:07 +02:00
Aliaksandr Valialkin	5168d0f754	lib/promrelabel: add test for IfExpression.String() function While at it, simplify this function a bit after the commit `861852f262` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462	2024-07-16 18:32:34 +02:00
Aliaksandr Valialkin	617a7b4db6	lib/promscrape/discovery/yandexcloud: follow-up for `070abe5c71` - Obtain IAM token via GCE-like API instead of Amazon EC2 IMDSv2 API, since it looks like IMDBSv2 API isn't supported by Yandex Cloud according to https://yandex.cloud/en/docs/security/standard/authentication#aws-token : > So far, Yandex Cloud does not support version 2, so it is strongly recommended > to technically disable getting a service account token via the Amazon EC2 metadata service. - Try obtaining IAM token via GCE-like API at first and then fall back to the deprecated Amazon EC2 IMDBSv1. This should prevent from auth errors for instances with disabled GCE-like auth API. This addresses @ITD27M01 concern at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513#issuecomment-1867794884 - Make more clear the description of the change at docs/CHANGELOG.md , add reference to the related issue. P.S. This change wasn't tested in prod because I have no access to Yandex Cloud. It is recommended to test this change by @ITD27M01 and @vmazgo , who filed the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6524	2024-07-16 18:06:33 +02:00
Aliaksandr Valialkin	6d237da3f3	lib/promscrape: follow-up for `1e83598be3` - Clarify that the -promscrape.maxScrapeSize value is used for limiting the maximum scrape size if max_scrape_size option isn't set at https://docs.victoriametrics.com/sd_configs/#scrape_configs - Fix query example for scrape_response_size_bytes metric at https://docs.victoriametrics.com/vmagent/#automatically-generated-metrics - Mention about max_scrape_size option at the -help description for -promscrape.maxScrapeSize command-line flag - Treat zero value for max_scrape_size option as 'no scrape size limit' - Change float64 to int type for scrapeResponseSize struct fields and function args, since response size cannot be fractional - Optimize isAutoMetric() function a bit - Sort auto metrics in alphabetical order in isAutoMetric() and in scrapeWork.addAutoMetrics() functions for better maintainability in the future Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6434 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429	2024-07-16 12:38:41 +02:00
Aliaksandr Valialkin	893a555051	Revert "lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451 )" This reverts commit `cd1aca217c`. Reason for revert: this commit has no sense, since the firehose response has application/json content-type, so it must contain JSON-encoded timestamp and requestId fields according to https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#responseformat . HTML-escaping the requestId field may break the response, so the client couldn't correctly recognize the html-escaped requestId. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6451	2024-07-16 09:50:16 +02:00
Aliaksandr Valialkin	8b76a40715	lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag() This is a follow-up for `61dce6f2a1` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329	2024-07-16 01:08:41 +02:00
Aliaksandr Valialkin	0cf18a6f63	lib/uint64set: optimize Set.Has() for nil Set - it should be inlined now This makes unnecessary the checkDeleted variable at lib/storage/index_db.go This is a follow-up for `b984f4672e` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6342	2024-07-16 00:00:46 +02:00
Aliaksandr Valialkin	cbbc6e7141	lib/mergeset: properly update TableMetrics.TooLongItemsDroppedTotal inside Table.UpdateMetrics Substitute '+=' with '=', since tooLongItemsTotal is global counter, which doesn't belong to the Table struct. This is a follow-up for `69d244e6fb` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6297	2024-07-15 23:41:26 +02:00
Aliaksandr Valialkin	476bf400ac	lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc - Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call - NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil - Simplify the implementation of NewStatDialFunc by removing sync.Map from there. - Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils - Use gauge instead of counter type for *_conns metric This is a follow-up for `d7b5062917` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299	2024-07-15 23:05:46 +02:00
Aliaksandr Valialkin	f4dce57ebe	lib/streamaggr/streamaggr.go: typo fix after `5e29ef5ed5`: IgnoredNaNSamples -> ignoredNaNSamples	2024-07-15 21:59:03 +02:00
Aliaksandr Valialkin	cbc637d1dd	app/vmagent/remotewrite: follow-up for `f153f54d11` - Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go . This improves code maintainability a bit. - Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs(). This prevents from potential resource leaks. - Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation. This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions. - Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics. - Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation. This label should simplify investigation of the exposed metrics. - Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability: it is hard to use these labels in query filters and aggregation functions. - Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` . This metric shows the number of samples generated by the corresponding streaming aggregation rule. This metric has been added in the commit `861852f262` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice. This metric has been added in the commit `861852f262` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params, which could modify the behaviour of the constructed streaming aggregator. Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory. - Pass Options arg to LoadFromFile() function by reference, since this structure is quite big. This also allows passing nil instead of Options when default options are enough. - Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics, so they have consistent set of labels comparing to the rest of streaming aggregation metrics. - Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output metrics. This is a follow-up for the commit `2eb1bc4f81` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604 - Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason in the commit `2eb1bc4f81` . - Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function. While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268	2024-07-15 20:25:36 +02:00
Aliaksandr Valialkin	a8356f3a26	vendor: update github.com/VictoriaMetrics/metrics from v1.34.1 to v1.35.0 Fix potential memory leaks across VictoriaMetrics codebase after metrics.UnregisterSet(s) call because of missing s.UnregisterAllMetrics() call. This is a follow-up for `6a6e34ab8e` . It is OK if some vmauth metrics aren't visible for a few microseconds when the previous metrics are unregistered and new metrics weren't registered yet. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6252 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5805	2024-07-15 10:45:39 +02:00
Aliaksandr Valialkin	4878152678	lib/{storage,mergeset}: do not allow setting dataFlushInterval to values smaller than pending{Items,Rows}FlushInterval Pending rows and items unconditionally remain in memory for up to pending{Items,Rows}FlushInterval, so there is no any sense in setting dataFlushInterval (the interval for guaranteed flush of in-memory data to disk) to values smaller than pending{Items,Rows}FlushInterval, since this doesn't affect the interval for flushing pending rows and items from memory to disk. This is a follow-up for `4c80b17027` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6221	2024-07-15 10:11:23 +02:00
Aliaksandr Valialkin	c3d2351948	lib/streamaggr: consistently use alphabetical order of benchmarked stream aggregation outputs	2024-07-15 09:53:26 +02:00
Aliaksandr Valialkin	21f049e211	lib/streamaggr: follow-up for `9c3d44c8c9` - Consistently enumerate stream aggregation outputs in alphabetical order across the source code and docs. This should simplify future maintenance of the corresponding code and docs. - Fix the link to `rate_sum()` at `see also` section of `rate_avg()` docs. - Make more clear the docs for `rate_sum()` and `rate_avg()` outputs. - Encapsulate output metric suffix inside rateAggrState. This eliminates possible bugs related to incorrect suffix passing to newRateAggrState(). - Rename rateAggrState.total field to less misleading rateAggrState.increase name, since it calculates counter increase in the current aggregation window. - Set rateLastValueState.prevTimestamp on the first sample in time series instead of the second sample. This makes more clear the code logic. - Move the code for removing outdated entries at rateAggrState into removeOldEntries() function. This make the code logic inside rateAggrState.flushState() more clear. - Do not write output sample with zero value if there are no input series, which could be used for calculating the rate, e.g. if only a single sample is registered for every input series. - Do not take into account input series with a single registered sample when calculating rate_avg(), since this leads to incorrect results. - Move {rate,total}AggrState.flushState() function to the end of rate.go and total.go files, so they look more similar. This shuld simplify future mantenance. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6243	2024-07-15 08:44:48 +02:00
Aliaksandr Valialkin	bc1f92d7f5	app/vmagent/remotewrite: follow-up for `87fd400dfc` - Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage systems are configured with the disabled on-disk queue, every in-memory queue is full and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common, so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx relabeling and other processing in this case. - Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped(). Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set. In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full and on-disk queue is disabled. The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage processes pending data, so it drops aggregated samples in this case. - Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output, so it is clear that this flag can be set individually per each -remoteWrite.url. - Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems are configured with the disabled on-disk queue, then there is no sense in keeping samples on some of these systems, while dropping samples on the remaining systems, since this will result in global stall on the remote storage system with the disabled on-disk queue and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false from remotewrite.TryPush() in this case. This will result in infinite duplicate samples written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set. This allows proceeding with newly scraped / pushed samples by sending them to the remaining remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set. - Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test. - Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url. See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065	2024-07-13 02:30:10 +02:00
Aliaksandr Valialkin	43fc1183b9	app/vmalert: switch from table-driven tests to f-tests This makes test code more clear and reduces the number of code lines by 500. This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error* requires more boilerplate code, which can result in additional bugs inside tests. While t.Error* allows writing logging errors for the same, this doesn't simplify fixing broken tests most of the time. This is a follow-up for `a9525da8a4`	2024-07-12 22:45:50 +02:00
hagen1778	c835a6351e	lib/streamaggr: add missing test cases Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `2f65956259`)	2024-07-12 14:19:17 +02:00
Hui Wang	f3cbd62823	vmagent: fix `vm_streamaggr_flushed_samples_total` counter (#6604 ) We use `vm_streamaggr_flushed_samples_total` to show the number of produced samples by aggregation rule, previously it was overcounted, and doesn't account for `output_relabel_configs`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `2eb1bc4f81`)	2024-07-12 14:19:17 +02:00
hagen1778	7370f84b97	lib/bakcup/azremote: follow-up after `5fd3aef549` Simplify tests by converting them to f-tests. `5fd3aef549` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `03e4c5c19c`)	2024-07-10 15:17:06 +02:00
justinrush	e65e55e2dd	lib/backup: add support for Azure Managed Identity (#6518 ) ### Describe Your Changes These changes support using Azure Managed Identity for the `vmbackup` utility. It adds two new environment variables: * `AZURE_USE_DEFAULT_CREDENTIAL`: Instructs the `vmbackup` utility to build a connection using the [Azure Default Credential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.5.2#NewDefaultAzureCredential) mode. This causes the Azure SDK to check for a variety of environment variables to try and make a connection. By default, it tries to use managed identity if that is set up. This will close https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). ### Testing However you normally test the `vmbackup` utility using Azure Blob should continue to work without any changes. The set up for that is environment specific and not listed out here. Once regression testing has been done you can set up [Azure Managed Identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview) so your resource (AKS, VM, etc), can use that credential method. Once it is set up, update your environment variables according to the updated documentation. I added unit tests to the `FS.Init` function, then made my changes, then updated the unit tests to capture the new branches. I tested this in our environment, but with SAS token auth and managed identity and it works as expected. --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Justin Rush <jarush@epic.com> Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `5fd3aef549`)	2024-07-10 12:26:21 +02:00
Aliaksandr Valialkin	a1decb5ca1	app/vlinsert/loki: use easyproto instead for parsing Loki protobuf messages	2024-07-10 03:05:55 +02:00
Aliaksandr Valialkin	b8a8d3d6f1	lib/logstorage: drop all the pipes from the query when calculating the number of matching logs at /select/logsql/hits API	2024-07-10 00:39:16 +02:00
Aliaksandr Valialkin	d6415b2572	all: consistently use 'any' instead of 'interface{}' 'any' type is supported starting from Go1.18. Let's consistently use it instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.	2024-07-10 00:23:26 +02:00
Aliaksandr Valialkin	9edeecabc8	lib: consistently use f-tests instead of table-driven tests This makes easier to read and debug these tests. This also reduces test lines count by 15% from 3K to 2.5K . See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e . While at it, consistently use t.Fatal* instead of t.Error, since t.Error usually leads to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.	2024-07-09 22:39:13 +02:00
Aliaksandr Valialkin	f3be3573e7	lib/promscrape/discovery/vultr: follow-up after `17e3d019d2` - Sort the discovered labels in alphabetical order at https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs - Rename VultrConfigs to VultrSDConfigs to be consistent with the naming for other SD configs. - Prepare query arg filters for `list instances API` at newAPIConfig() instead of passing them in a separate listParams struct. This simplifies the code a bit. - Return error when bearer token isn't set at vultr_sd_configs, since this token is mandatory according to https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs - Remove unused fields from the parsed response from Vultr list instances API in order to simplify the code a bit. - Remove double logging of errors inside getInstances() function, since these errors must be already logged by the caller. - Simplify tests, so they are easier to maintain. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6068	2024-07-05 17:40:39 +02:00
Aliaksandr Valialkin	6397c38a0a	lib/logstorage: use quicktemplate.AppendJSONString instead of strconv.AppendQuote for encoding JSON strings The strconv.AppendQuote improperly encodes special chars such as \x1b . They must be encoded as \u001b . See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24	2024-07-05 01:22:49 +02:00
Aliaksandr Valialkin	172ae1adf7	Revert `c6c5a5a186` and `b2765c45d0` Reason for revert: There are many statsd servers exist: - https://github.com/statsd/statsd - classical statsd server - https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DatDog Agent ( https://docs.datadoghq.com/agent/ ) - https://github.com/avito-tech/bioyino - high-performance statsd server - https://github.com/atlassian/gostatsd - statsd server in Go - https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd ( the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ). Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides significant advantages over the existing statsd servers, while has no significant drawbacks comparing to existing statsd servers. The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server. The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics ( see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers. So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent). Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation. In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation. This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed during querying. P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic specialized configuration for aggregating of statsd metrics. The main requirements for this configuration: - easy to write, read and update (ideally it should work out of the box for most cases without additional configuration) - hard to misconfigure (e.g. hard to shoot yourself in the foot) It would be great if this configuration will be compatible with the configuration of the most widely used statsd server. In the mean time it is recommended continue using external statsd server. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600	2024-07-03 23:57:49 +02:00
Aliaksandr Valialkin	7a60e8abf7	lib/promscrape: use prompbmarshal.MustParsePromMetrics function at parseData() test function The prompbmarshal.MustParsePromMetrics function has been added in the commit `cc4d57d650`	2024-07-03 16:10:37 +02:00
Aliaksandr Valialkin	cd152693c6	Revert "Exemplar support (#5982 )" This reverts commit `5a3abfa041`. Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below). Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ . Issues with Prometheus exemplars: - Prometheus still has only experimental support for exemplars after more than three years since they were introduced. It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature. See `0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)` and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage - It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus . - It looks like exemplars are supported for Histogram metric types only - see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar . Exemplars aren't supported for Counter, Gauge and Summary metric types. - Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query contains histogram_quantile() function. It queries exemplars via special Prometheus API - https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.) and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph. This makes the graph unusable, since it is litterally filled with thousands of exemplar dots. Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars. - Exemplars are usually connected to traces. While traces are good for some I doubt exemplars will become production-ready in the near future because of the issues outlined above. Alternative to exemplars: Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs - just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry! This doesn't work as expected in production as shown above. Are there better solutions, which work in production? Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job` and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry by these labes on the time range with the anomaly on metrics' graph. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982	2024-07-03 16:09:18 +02:00
Aliaksandr Valialkin	a5d60ad78e	app/vmagent/remotewrite,lib/streamaggr: re-use common code in tests after `879771808b` - Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package. This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries. - Move common code for mustParsePromMetrics() function into lib/prompbmarshal package, so it could be used in tests for building []prompbmarshal.TimeSeries from string. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206	2024-07-03 15:22:51 +02:00
Aliaksandr Valialkin	f8779d1ed2	lib/streamaggr: follow-up for the commit `c0e4ccb7b5` - Clarify docs for `Ignore aggregation intervals on start` feature. - Make more clear the code dealing with ignoreFirstIntervals at aggregator.runFlusher() functions. It is better from readability and maintainability PoV using distinct a.flush() calls for distinct cases instead of merging them into a single a.flush() call. - Take into account the first incomplete interval when tracking the number of skipped aggregation intervals, since this behaviour is easier to understand by the end users. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6137	2024-07-02 21:34:48 +02:00
Andrii Chubatiuk	252aa5a3ab	lib/protoparser/graphite: added -graphite.sanitizeMetricName flag (#6489 ) ### Describe Your Changes Added flag to sanitize graphite metrics fixes #6077 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `476faf5578`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-07-02 17:16:00 +02:00
Aliaksandr Valialkin	7426b40250	lib/logstorage: allow writing `after N` in front of `before N` at `stream_context` pipe	2024-07-02 01:39:45 +02:00
Andrii Chubatiuk	937ae2ca90	lib/streamaggr: added stale samples metric, added metrics labels (#6462 ) ### Describe Your Changes - added stale metrics counters for input and output samples - added labels for aggregator metrics => `name="{rwctx}:{aggrId}:{aggrSuffix}"` - rwctx - global or number starting from 1 - aggrid - aggregator id starting from 1 - aggrSuffix - <interval>_(by\|without)_label1_label2_labeln e.g: `name="global:1:1m_without_instance_pod"` ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `861852f262`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-07-01 15:01:49 +02:00
Aliaksandr Valialkin	208a624d4d	lib/logstorage: properly search for the surrounding logs in `stream_context` pipe The set of log fields in the found logs may differ from the set of log fields present in the log stream. So compare only the log fields in the found logs when searching for the matching log entry in the log stream. While at it, return _stream field in the delimiter log entry, since this field is used by VictoriaLogs Web UI for grouping logs by log streams.	2024-07-01 02:33:00 +02:00
Aliaksandr Valialkin	76a58ae08d	lib/logstorage: add ability to store sorted log position into a separate field with `sort ... rank <fieldName>` syntax	2024-07-01 01:46:03 +02:00
Aliaksandr Valialkin	d0dca7b8c5	lib/logstorage: add delimiter between log chunks returned from `\| stream_context` pipe	2024-07-01 01:46:02 +02:00
Aliaksandr Valialkin	4b3477e62b	lib/logstorage: add `stream_context` pipe, which allows selecting surrounding logs for the matching logs	2024-06-28 19:15:19 +02:00
Aliaksandr Valialkin	2f28819bb1	lib/logstorage: it is safe using `\| unroll` pipe in live tailing `\| unroll` pipe can make multiple copies of rows from the input row. This doesn't break live tailing, so allow `\| unroll` pipe in live tailing.	2024-06-27 19:45:12 +02:00
Aliaksandr Valialkin	b26acec9a8	app/vlselect: properly return live tailing results	2024-06-27 15:06:15 +02:00
Aliaksandr Valialkin	dd62a2b9d6	lib/logstorage: work-in-progress	2024-06-27 14:21:03 +02:00
Andrii Chubatiuk	580d02c3f8	added IMDSv2 for YC SD (#6524 ) ### Describe Your Changes Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-06-26 19:12:35 +02:00
rtm0	48a5c4cb01	Fix Date metricid cache consistency under concurrent use (#6534 ) ### Describe Your Changes Fix Date metricid cache consistency under concurrent use. When one goroutine calls Has() and does not find the cache entry in the immutable map it will acquire a lock and check the mutable map. And it is possible that before that lock is acquired, the entry is moved from the mutable map to the immutable map by another goroutine causing a cache miss. The fix is to check the immutable map again once the lock is acquired. ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-06-26 19:12:34 +02:00
Aliaksandr Valialkin	d5cbda3424	app/vlstorage: add -retention.maxDiskSpaceUsageBytes command-line flag for limiting the retention at VictoriaLogs by disk space usage	2024-06-25 17:30:46 +02:00
Aliaksandr Valialkin	f24123a776	lib/logstorage: parse syslog structured data into separate fields in order to simplify further querying of this data	2024-06-25 14:54:25 +02:00
Aliaksandr Valialkin	1716c4e609	lib/logstorage: properly parse timezone offset at TryParseTimestampRFC3339Nano() The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets. While at it, make tryParseTimestampISO8601 function private in order to prevent from improper usage of this function from outside the lib/logstorage package. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508	2024-06-25 14:54:24 +02:00
Aliaksandr Valialkin	2a7fcba330	lib/logstorage: make golangci-lint happy	2024-06-25 03:06:28 +02:00
Aliaksandr Valialkin	7026498359	lib/httpserver: revert `9b7e532172` Reason for revert: this commit doesn't resolve real security issues, while it complicates the resulting code in subtle ways (aka security circus). Comparison of two strings (passwords, auth keys) takes a few nanoseconds. This comparison is performed in non-trivial http handler, which takes thousands of nanoseconds, and the request handler timing is non-deterministic because of Go runtime, Go GC and other concurrently executed goroutines. The request handler timing is even more non-deterministic when the application is executed in shared environments such as Kubernetes, where many other applications may run on the same host and use shared resources of this host (CPU, RAM bandwidth, network bandwidth). Additionally, it is expected that the passwords and auth keys are passed via TLS-encrypted connections. Establishing TLS connections takes additional non-trivial time (millions of nanoseconds), which depends on many factors such as network latency, network congestion, etc. This makes impossible to conduct timing attack on passwords and auth keys in VictoriaMetrics components. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6423/files Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392	2024-06-25 01:51:06 +02:00

1 2 3 4 5 ...

2619 commits