github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2025-03-21 15:45:01 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	202eb429a7	lib/logstorage: refactor storage format to be more efficient for querying wide events It turned out that VictoriaLogs is frequently used for collecting logs with tens of fields. For example, a standard Kubernetes setup on top of Filebeat generates more than 20 fields per log entry. Such logs are also known as "wide events". The previous storage format was optimized for logs with a few fields. When at least a single field was referenced in the query, all the meta-information about all the log fields was unpacked and parsed for each scanned block during the query. This could require a lot of additional disk IO and CPU time when logs contain many fields. Resolve this issue by providing a (field -> metainfo_offset) index per field in every data block. This index allows reading and extracting only the needed metainfo for fields used in the query. This index is stored in columnsHeaderIndexFilename (columns_header_index.bin). This allows increasing performance for queries over wide events by 10x and more. Another issue was that the data for bloom filters and field values across all the log fields except _msg was intermixed in two files - fieldBloomFilename (field_bloom.bin) and fieldValuesFilename (field_values.bin). This could result in huge disk read IO overhead when some small field was referenced in the query, since the Operating System usually reads more data than requested. It reads the data from disk in at least 4KiB blocks (usually the block size is much bigger, in the range 64KiB - 512KiB). So, if a 512-byte bloom filter or values' block is read from the file, then the Operating System reads up to 512KiB of data from disk, which results in 1000x disk read IO overhead. This overhead isn't visible for recently accessed data, since this data is usually stored in RAM (aka Operating System page cache), but this overhead may become very annoying when performing the query over large volumes of data which isn't present in OS page cache. The solution for this issue is to split bloom filters and field values across multiple shards. This reduces the worst-case disk read IO overhead by at least Nx where N is the number of shards, while the disk read IO overhead is completely removed in the best case when the number of columns doesn't exceed N. Currently the number of shards is 8 - see bloomValuesShardsCount. This solution increases performance for queries over large volumes of newly ingested data by up to 1000x. The new storage format is versioned as v1, while the old storage format is versioned as v0. It is stored in the partHeader.FormatVersion. Parts with the old storage format are converted into parts with the new storage format during background merge. It is possible to force merge by querying /internal/force_merge HTTP endpoint - see https://docs.victoriametrics.com/victorialogs/#forced-merge .	2024-10-16 17:35:07 +02:00
Andrii Chubatiuk	daa7183749	lib/protoparser/influx: enable batch processing by default (#7165) ### Describe Your Changes Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7090 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-10-15 11:48:40 +02:00
Aliaksandr Valialkin	bac193e50b	app/vlselect: do not show empty fields in query results Empty fields are treated as non-existing fields by VictoriaLogs data model. So there is no sense in returning empty fields in query results, since they may mislead and confuse users.	2024-10-14 23:43:58 +02:00
Aliaksandr Valialkin	3c73dbbacc	app/vlstorage: add support for forced merge via /internal/force_merge HTTP endpoint	2024-10-13 22:20:31 +02:00
Aliaksandr Valialkin	b4b79a4961	lib/logstorage: make a copy of s.partitions slice when performing queries over the selected partitions s.partitions can be changed when new partition is registered or when old partition is dropped. This could lead to data races and panics when s.partitions slice is accessed by concurrently executed queries. The fix is to make a copy of the selected partitions under s.partitionsLock before performing the query.	2024-10-13 22:14:34 +02:00
Aliaksandr Valialkin	507b206a7d	lib/logstorage: move getConstColumnValue() and getColumnHeader() methods from columnsHeader to blockSearch This localizes blockSearch.getColumnsHeader() call at block_search.go . This call is going to be optimized in the next commits in order to avoid unmarshaling of header data for unneeded columns, which weren't requested by getConstColumnValue() / getColumnHeader().	2024-10-13 14:29:02 +02:00
Aliaksandr Valialkin	279e25e7c8	lib/logstorage: avoid redundant copying of column names and column values for dictionary-encoded columns during querying Refer to the original byte slice with the marshaled columnsHeader for column names and dictionary-encoded column values. This improves query performance a bit when a big number of blocks with a big number of columns are scanned during the query.	2024-10-13 13:25:38 +02:00
Aliaksandr Valialkin	9e48074b59	lib/logstorage: avoid calling columnsHeader.initFromBlockHeader() multiple times for the same blockSearch This should improve performance when blockSearch.getColumnsHeader() is called multiple times from different places of the code.	2024-10-13 12:56:12 +02:00
Aliaksandr Valialkin	867f671cc4	lib/logstorage: make sure that bs.br is non-nil before checking br.bs.bsw.bh.rowsCount there br.bs may be nil when br contains the block with additional filters applied during pipe calculations. For example, `* | count() if (error) errors`.	2024-10-12 20:51:29 +02:00
Andrii Chubatiuk	9eb0c1fd86	lib/protoparser/opentelemetry: added exponential histograms support (#6354) ### Describe Your Changes Added OpenTelemetry exponential histograms support. Such histograms are automatically converted into VictoriaMetrics histogram with `vmrange` buckets. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-10-11 13:44:52 +02:00
Aliaksandr Valialkin	7b475ed95d	lib/logstorage: disallow using pipe names as the first unquoted words in `filter` pipe Improperly written pipes could be silently parsed as filter pipe. For example, the following query: * | by (x) was silently parsed to: * | filter "by" x It is better to return an error, so the user can identify and fix the invalid pipe instead of silently executing an invalid query with `filter` pipe.	2024-10-09 16:10:13 +02:00
Aliaksandr Valialkin	6acf543b90	lib/logstorage: disallow using by as the first word in log filters, since it frequently clashes with `stats by(...)` pipe where `stats` word is omitted	2024-10-09 15:53:15 +02:00
Zakhar Bessarab	eefae85450	vmagent: add support of HTTP2 client for Kubernetes SD (#7114) ### Describe Your Changes Currently, vmagent always uses a separate `http.Client` for every group watcher in Kubernetes SD. With a high number of group watchers this leads to a large number of open connections. This PR adds 2 changes to address this: - re-use of existing `http.Client` - in case `http.Client` is connecting to the same API server and uses the same parameters it will be re-used between group watchers - HTTP2 support - this allows reusing connections more efficiently due to the ability to use streaming via existing connections. See this issue for the details and test results - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5971 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-10-08 10:36:31 +02:00
Aliaksandr Valialkin	89686094a0	lib/logstorage: allow special chars in unquoted _stream tag names and values This simplifies writing _stream filters. For example, {foo-bar=abc:de} can be written instead of {"foo-bar"="abc:de"}	2024-10-07 15:10:03 +02:00
Aliaksandr Valialkin	462b7cd597	lib/logstorage: quote logfmt strings only if they contain special chars, which could break logfmt parsing and/or reading	2024-10-07 14:31:30 +02:00
Artem Fetishev	c1cd3e85a7	lib/promscrape: Fix TestClientProxyReadOk flaky test (#7173) This PR fixes #7062 For hijacked connections, one has to read from the connection buffer, but still write directly to the connection. Otherwise, when reading directly from such connections, the first byte may be lost. This, in turn, corrupts the ClientHello TLS handshake message and when the backend server receives it, it closes the connection and reports the following error in the log: ``` http: TLS handshake error from 127.0.0.1:33150: tls: first record does not look like a TLS handshake ``` The first byte may be lost because the underlying HTTP request handler may read it from the connection and put it into the buffer. As a result, subsequent connection reads won't see that byte. - See: https://github.com/golang/go/issues/27408 - The fix is taken from: https://github.com/k3s-io/k3s/pull/6216 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>	2024-10-03 18:27:15 +02:00
Aliaksandr Valialkin	364f084b43	lib/logstorage: add `len` pipe for calculating byte length of log field values	2024-10-03 18:21:10 +02:00
hagen1778	2404b4bc00	Merge tag 'v1.104.0' into pmm-6401-read-prometheus-data-files v1.104.0 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-10-02 16:53:41 +02:00
Roman Khavronenko	0d4f4b8f7d	(app|lib)/vmstorage: do not increment `vm_rows_ignored_total` on NaNs (#7166) `vm_rows_ignored_total` metric is a metric that signals ingestion issues to users, such as a bad timestamp or a parsing error. In commit `a5424e95b3` this metric started to increment each time vmstorage gets NaN. But NaN is a valid value for Prometheus data model and for Prometheus metrics exposition format. Exporters from Prometheus ecosystem could expose NaNs as values for metrics and these values will be delivered to vmstorage and increment the metric. Since there is nothing the user can do about this, as opposed to parsing errors or bad timestamps, there is not much sense in incrementing this metric. So this commit rolls back `reason="nan_value"` increments. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-10-02 12:37:27 +02:00
Aliaksandr Valialkin	a350be48b6	lib/logstorage: do not count dictionary values which have no matching logs in `count_uniq` stats function Create blockResultColumn.forEachDictValue* helper functions for visiting matching dictionary values. These helper functions should prevent from counting dictionary values without matching logs in the future. This is a follow-up for `0c0f013a60` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152	2024-10-01 13:34:45 +02:00
Aliaksandr Valialkin	630211cfed	app/vlogscli: add interactive command-line tool for querying VictoriaLogs	2024-10-01 12:23:07 +02:00
Zhu Jiekun	7bb8853a5c	feature: [vmagent] Add service discovery support for OVH Cloud VPS and dedicated server (#6160) ### Describe Your Changes related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6071 #### Added - Added service discovery support for OVH Cloud: - VPS. - Dedicated server. #### Docs - `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated. #### Note - Useful links: - OVH Cloud VPS API: https://eu.api.ovh.com/console/#/vps~GET - OVH Cloud Dedicated server API: https://eu.api.ovh.com/console/#/dedicated/server~GET - OVH Cloud SDK: https://github.com/ovh/go-ovh - Prometheus SD: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ovhcloud_sd_config Tested on OVH Cloud VPS and dedicated server. --- Signed-off-by: Jiekun <jiekun@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-09-30 14:42:46 +02:00
Hui Wang	664f337c70	stream aggregation: fix possible duplicated aggregation results (#7118) When ingesting samples with the same labels (duplicated samples or samples with the same labels after `by` or `without` options), they could register different entries for the same labelset in LabelsCompressor. For example, both index 99 and 100 can be assigned to label `foo=1` in two concurrent pushes. Then due to differing label indexes in encoded keys, the samples will appear as distinct in aggrState, resulting in duplicated results after decompressing the label indexes. `fbde238cdc/lib/streamaggr/streamaggr.go (L933)` In this pull request, since we need to store `idxToLabel` first to ensure the idx can be searched after `lc.labelToIdxStore`, the `lc.idxToLabel` could still contain duplicated entries such as [100]="foo=1". But given the low likelihood of this issue and the size of idxToLabel, it should be fine.	2024-09-30 14:24:59 +02:00
Aliaksandr Valialkin	0c0f013a60	lib/logstorage: skip values with zero hits for 'uniq', 'top' and 'field_values' pipes See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/72#issuecomment-2352078483	2024-09-30 14:15:07 +02:00
Artem Fetishev	ed5da38ede	Introduce a flag for limiting the number of time series to delete (#7091) ### Describe Your Changes Introduce the `-search.maxDeleteSeries` flag that limits the number of time series that can be deleted with a single `/api/v1/admin/tsdb/delete_series` call. Currently, any number can be deleted and if the number is big (millions) then the operation may result in unaccounted CPU and memory usage spikes which in some cases may result in OOM kill (see #7027). The flag limits the number to 30k by default and the users may override it if needed at the vmstorage start time. --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-09-30 10:02:21 +02:00
Aliaksandr Valialkin	1da4650143	lib/logstorage: allow using `!` in unescaped phrase Previously the phrase filter with `!` was treated unexpectedly. For example, `foo!bar` filter was treated as `foo AND NOT bar`, while most users expect that it matches "foo!bar" phrase. This commit aligns with users' expectations.	2024-09-29 11:14:15 +02:00
Aliaksandr Valialkin	60183c7c79	lib/logstorage: allow using `-` instead of `!` in front of `(...)`	2024-09-29 11:12:22 +02:00
Nikolay	3bbb2aed72	fscore: rollback trailing space trim (#7106) Previous commit `201fd6de1e` removed trailing space trim from data read from file. But the common practice is to remove such trailing space, and the change led to authorization errors for a large group of users. First of all, this change should help mitigate an issue with Kubernetes, where authorization information is read from Secret content. Changes to the operator were made to mitigate this problem in commit `1cf64358c8`. We could later introduce an optional flag for VictoriaMetrics to disable the trim space behavior. Related issues: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6986 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7089 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6947 --------- Signed-off-by: f41gh7 <nik@victoriametrics.com> Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>	2024-09-29 10:59:25 +02:00
Aliaksandr Valialkin	b52862badf	lib/logstorage: return the expected `hits` results from `uniq` pipe when the number of unique values reaches the specified limit Previously `uniq` pipe could return zero `hits` if the number of found unique values equals the specified limit. This wasn't expected in most cases.	2024-09-29 10:51:09 +02:00
Aliaksandr Valialkin	55eb321f77	lib/logstorage: clear hits slice obtained from encoding.GetUint64s() before updating it with hits for valueTypeDict column encoding.GetUint64s() returns uninitialized slice, which may contain arbitrary values. So values in this slice must be reset to zero before using it for counting hits in `uniq` and `top` pipes.	2024-09-29 10:29:13 +02:00
Aliaksandr Valialkin	94afcbd9a9	lib/logstorage: postpone initialization of per-shard stateSizeBudget until the first call to pipeProcessor.writeBlock() This simplifies pipeProcessor initialization logic a bit. This also doesn't mangle the original maxStateSize value, which is used in error messages when the state size exceeds maxStateSize.	2024-09-29 10:29:13 +02:00
Aliaksandr Valialkin	0b91452ca4	lib/logstorage: add non-empty `if (...)` condition to automatically generated result names in `stats` pipe This allows executing queries with `stats` pipe, which calculate multiple results with the same functions, but with different `if (...)` conditions. For example: _time:5m | count(), count() if (error) Previously such queries couldn't be executed because the automatically generated name for the second result didn't include `if (error)`, so names for both results were identical - `count(*)`.	2024-09-29 09:51:28 +02:00
Aliaksandr Valialkin	8772aea24b	lib/logstorage: support `order` alias for `sort` pipe Now the following queries are equivalent: _time:5s | sort by (_time) _time:5s | order by (_time) This is needed for convenience, since `order by` is commonly used in other query languages such as SQL.	2024-09-29 09:51:27 +02:00
Aliaksandr Valialkin	09b309a82e	lib/logstorage: allow using `-` instead of `!` as a shorthand for `NOT` operator in LogsQL	2024-09-27 13:14:47 +02:00
Aliaksandr Valialkin	76c1b0b8ea	lib/logstorage: support skipping _stream: prefix for stream filters '_stream:{...}' can be written as '{...}' This simplifies writing queries with stream filters, and makes them more familiar to Loki users.	2024-09-27 13:14:46 +02:00
Aliaksandr Valialkin	9367a9a6a2	lib/logstorage: consistently sort stream contexts belonging to different streams by the minimum time seen in the matching logs This should simplify debugging of stream_context output, since it remains stable over repeated requests.	2024-09-27 11:19:26 +02:00
Aliaksandr Valialkin	b49d1ea809	lib/logstorage: add _msg="---" delimiter between different log streams in stream_context output This should help investigating contexts, which belong to different log streams.	2024-09-27 11:01:13 +02:00
Aliaksandr Valialkin	b82bd0c2ec	lib/logstorage: improve performance for stream_context pipe over streams with big number of log entries Do not read timestamps for blocks, which cannot contain surrounding logs. This should improve performance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 . Also optimize min(_time) and max(_time) calculations a bit by avoiding conversion of timestamp to string when it isn't needed. This should improve performance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .	2024-09-26 22:22:23 +02:00
Aliaksandr Valialkin	3646724c6f	lib/contextutil: make golangci-lint happy by substituting unused function arg name with _ This is a follow-up for `4b1611267f`	2024-09-26 17:06:48 +02:00
Aliaksandr Valialkin	4b1611267f	lib/logstorage: properly return surrounding logs outside the selected time range by stream_context pipe Previously only logs inside the selected time range could be returned by stream_context pipe. For example, the following query could return up to 10 surrounding logs only for the last 5 minutes, while most users expect this query should return up to 10 surrounding logs without restrictions on the time range. _time:5m panic | stream_context before 10 This enables the ability to implement stream context feature at VictoriaLogs web UI: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7063 . Reduce memory usage when returning stream context over big log streams with millions of entries. The new logic scans over all the log messages for the selected log stream, while keeping in memory only the given number of surrounding logs. Previously all the logs for the given log stream on the selected time range were loaded in memory before selecting the needed surrounding logs. This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 . Improve the scan performance for big log streams by fetching only the requested fields. For example, the following query should be executed much faster than before if logs contain many fields other than _stream, _msg and _time: panic | stream_context after 30 | fields _stream, _msg, _time	2024-09-26 17:03:45 +02:00
Aliaksandr Valialkin	037652d5ae	app/vlinsert: support `_time` field without timezone information during data ingestion Use local timezone of the host server in this case. The timezone can be overridden with TZ environment variable if needed. While at it, allow using whitespace instead of T as a delimiter between date and time in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted during data ingestion. This is valid ISO8601 format, which is used by some log shippers, so it should be supported. This format is also known as SQL datetime format. Also assume local time zone when time without timezone information is passed to querying APIs. Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string if the old behaviour is preferred. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721	2024-09-26 12:49:35 +02:00
Aliaksandr Valialkin	255d1d4e13	app/vlselect/logsql: clone the query with the current timestamp when performing live tailing requests in the loop Previously the original timestamp was used in the copied query, so _time:duration filters were applied to the original time range: (timestamp-duration ... timestamp]. This resulted in stopped live tailing, since new logs have timestamps bigger than the original time range. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7028	2024-09-26 08:57:23 +02:00
Aliaksandr Valialkin	e9950f6307	lib/logstorage: add `blocks_count` pipe This pipe is useful for debugging purposes when the number of processed blocks must be calculated for the given query: <query> | blocks_count This helps detecting the root cause of query performance slowdown in cases like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070	2024-09-25 19:17:48 +02:00
Aliaksandr Valialkin	65b93b17b1	lib/logstorage: lazily read column headers metadata during queries This improves performance for analytical queries, which do not need column headers metadata. For example, the following query doesn't need column headers metadata, since _stream and min(_time) are stored in block header, which is read separately from column headers metadata: _time:1w | stats by (_stream) min(_time) min_time This commit significantly improves the performance for this query. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070	2024-09-25 19:17:48 +02:00
Aliaksandr Valialkin	4599429f51	lib/logstorage: read timestamps column when it is really needed during query execution Previously timestamps column was read unconditionally on every query. This could significantly slow down queries, which do not need reading this column like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .	2024-09-25 19:17:47 +02:00
Aliaksandr Valialkin	7f1ba18719	lib/logstorage: improve the performance of obtaining _stream column value Substitute global streamTagsCache with per-blockSearch cache for ((stream.id) -> (_stream value)) entries. This improves scalability of obtaining _stream values on a machine with many CPU cores, since every CPU has its own blockSearch instance. This also should reduce memory usage when querying logs over big number of streams, since per-blockSearch cache of ((stream.id) -> (_stream value)) entries is limited in size, and its lifetime is bounded by a single query.	2024-09-24 20:57:00 +02:00
Aliaksandr Valialkin	cf2e7d0d92	lib/logstorage/consts.go: document that it isn't recommended setting maxColumnsPerBlock constant to too big values This should help avoiding cases like this one - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6425#issuecomment-2337446083	2024-09-24 18:51:46 +02:00
Aliaksandr Valialkin	f86e093b20	lib/logstorage: improve performance for streamID.marshalString() by more than 2x The streamID.marshalString() is executed in hot path if the query selects _stream_id field. Command to run the benchmark: go test ./lib/logstorage/ -run=NONE -bench=BenchmarkStreamIDMarshalString -benchtime=5s Results before the commit: BenchmarkStreamIDMarshalString-16 438480714 14.04 ns/op 71.23 MB/s 0 B/op 0 allocs/op Results after the commit: BenchmarkStreamIDMarshalString-16 982459660 6.049 ns/op 165.30 MB/s 0 B/op 0 allocs/op	2024-09-24 18:35:04 +02:00
Aliaksandr Valialkin	919d2dc90e	lib/logstorage: add benchmark for streamID.marshalString	2024-09-24 18:31:38 +02:00
hagen1778	8bb3f2fd43	lib/promscrape: make linter happy Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-09-24 15:12:55 +02:00
hagen1778	c7569dac50	lib/promscrape: temporarily disable TestClientProxyReadOk This test is very flaky and prevents other tests from running in CI. Disabling this test should improve tests quality, since it isn't reliable anyway. There is a ticket to fix this test - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7062 Once fixed, this test should be uncommented. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-09-24 14:59:25 +02:00
Dmytro Kozlov	cbeb7d50e8	lib/promscrape: show only unhealthy targets if `show_only_unhealthy` filter is enabled (#6960) ### Describe Your Changes It is better to show only unhealthy targets instead of all of them when `show_only_unhealthy` filter is enabled. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3536 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-09-24 12:18:24 +02:00
Aliaksandr Valialkin	109772bdc4	lib/cgroup: round GOMAXPROCS to the lower integer value of cpuQuota Rounding GOMAXPROCS to the upper integer value of cpuQuota increases chances of CPU starvation, non-optimal goroutine scheduling and additional CPU overhead related to context switching. So it is better to round GOMAXPROCS to the lower integer value of cpuQuota.	2024-09-23 16:09:12 +02:00
Artem Fetishev	55febc0920	lib/storage: restore ability to put empty metric ID list into tagFiltersToMetricIDsCache (#7064) ### Describe Your Changes Currently, if the metricID list is empty, it won't be marshalled and as a result won't be put into the tagFiltersToMetricIDsCache, which causes cache misses for the corresponding tagFilters. In some setups this causes severe search speed degradation (see #7009). The empty metricID list case was covered before but then was accidentally removed in `6c21439`. This PR restores the coverage of this case. A new unit test can be used as a proof that empty metricID lists are not added to the cache (just remove the fix in index_db.go and run the test to see the result). Also a benchmark has been added to see the implications of the compression. ``` user@laptop:~/p/github.com/rtm0/VictoriaMetrics/01/src$ go test ./lib/storage/ -run=NONE -bench BenchmarkMarshalUnmarshalMetricIDs --loggerLevel=ERROR goos: linux goarch: amd64 pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage cpu: 13th Gen Intel(R) Core(TM) i7-1355U BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-0-12 3237240 363.5 ns/op 0 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1-12 2831049 451.8 ns/op 0.4706 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10-12 1152764 1009 ns/op 1.667 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100-12 297055 3998 ns/op 5.755 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000-12 31172 34566 ns/op 8.484 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000-12 4900 289659 ns/op 9.416 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100000-12 447 2341173 ns/op 9.456 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000000-12 42 24926928 ns/op 9.468 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000000-12 5 204098872 ns/op 9.467 compression-rate PASS ok github.com/VictoriaMetrics/VictoriaMetrics/lib/storage 15.018s ``` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-09-20 17:21:53 +02:00
Aliaksandr Valialkin	787b9cd9a0	lib/storage: improve performance for indexSearch.containsTimeRange() The indexSearch.containsTimeRange() function is called for the current indexDB and the previous indexDB every time when searching for metricIDs by label filters. This function consumes a lot of additional CPU time for cases when queries with lightweight label filters are sent to VictoriaMetrics at high rate (e.g. thousands of RPS), like in the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7009 . Optimize indexSearch.containsTimeRange() function in the following ways: - Unconditionally return true if this function is called for the current indexDB, since there are very high chances that the current indexDB contains the data with timestamps in the requested time range. - Cache the minimum timestamp, which is missing in the indexed data for the previous indexDB. This is safe to do, since the previous indexDB is readonly. This optimization eliminates potentially slow lookup in the previous indexDB for typical use cases when the requested time range is close to the current time.	2024-09-20 13:07:20 +02:00
Aliaksandr Valialkin	6f61e9d49d	lib/storage: simplify indexDB.doExtDB() usage by removing the returned value Previously indexDB.doExtDB() returned a boolean value indicating whether the f callback was called. There is no need to return this boolean value, since the f callback can determine by itself whether it was called. This simplifies the code a bit. While at it, document indexDB.doExtDB().	2024-09-20 11:59:57 +02:00
Roman Khavronenko	218c533874	lib/storage: follow-up after `d8f8822fa5` (#7036 ) Make function name and comments more clear. `d8f8822fa5` Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-09-20 11:50:47 +02:00
Aliaksandr Valialkin	a3d8077959	lib/logstorage: make sure that getCommonTokens returns common tokens in the original order of tokens inside tokenSets arg This fixes flaky test TestGetCommonTokensForOrFilters: filter_or_test.go:143: unexpected tokens for field "_msg"; got ["foo" "bar"]; want ["bar" "foo"]	2024-09-19 15:59:48 +02:00
Roman Khavronenko	e115b85770	lib/logger: increase default value of `-loggerMaxArgLen` cmd-line flag from 1e3 to 5e3 (#7008 ) This should improve visibility on errors produced by very long queries. The change is classified as BUG in order to port it to LTS releases. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Mathias Palmersheim <mathias@victoriametrics.com>	2024-09-19 14:29:18 +02:00
Nikolay	d8f8822fa5	lib/storage: consistently check for missing metricID index records (#6967 ) * Previously, only missing metricID->metricName index records were tracked with a deadline. But missing metricID->TSID index records were also possible. The IndexDB metrics fix exposed a misleading metric for such missing records. * This commit adds a check for missing metricID->TSID index records and deletes the missing metricID entry if it hits the 60-second deadline. Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6931 Signed-off-by: f41gh7 <nik@victoriametrics.com>	2024-09-16 10:05:08 +02:00
Nikolay	264c2ec6bd	lib/fs: properly call windows APIs (#6998 ) Previously we manually imported system Windows DLLs and made direct syscalls. But golang exposes syscall wrappers with sys/windows package. It seems that the direct syscall was broken at 1.23 golang release. It was `GetDiskFreeSpace` syscall in our case. This commit replaces all manual syscalls with wrappers. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6973 Related golang issue: https://github.com/golang/go/issues/69029 Signed-off-by: f41gh7 <nik@victoriametrics.com>	2024-09-13 12:22:25 +02:00
Aliaksandr Valialkin	657988ac3a	app/vlselect: consistently reuse the original query timestamp when executing /select/logsql/query with positive limit=N query arg Previously the query could return incorrect results, since the query timestamp was updated with every Query.Clone() call during iterative search for the time range with up to limit=N rows. While at it, optimize queries which find a low number of matching logs while spending a lot of CPU time searching across a big number of logs. The optimization reduces the upper bound of the time range to search if the current time range contains zero matching rows. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785	2024-09-08 14:32:23 +02:00
Aliaksandr Valialkin	45a3713bdb	lib/logstorage: preserve the order of tokens to check against bloom filters in AND filters Previously tokens from AND filters were extracted in random order. This could slow down checking them against bloom filters if the most specific tokens go at the beginning of the AND filters. Preserve the original order of tokens when matching them against bloom filters, so the user could control the performance of the query by putting the most specific AND filters at the beginning of the query. While at it, add tests for getCommonTokensForAndFilters() and getCommonTokensForOrFilters(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556	2024-09-08 12:27:30 +02:00
Aliaksandr Valialkin	eaee2d7db4	lib/logstorage: improve error logging for incorrect queries passed to /select/logsql/stats_query and /select/logsql/stats_query_range functions	2024-09-08 11:24:44 +02:00
Aliaksandr Valialkin	1cd06ace5a	lib/logstorage: properly extract common tokens from unsupported OR filters Previously the following query could miss rows matching !bar if these rows do not contain foo: foo OR !bar This is because of incorrect detection of common tokens for OR filters - all the unsupported filters were skipped (including the NOT filter (aka `!`)), while in this case zero common tokens must be returned. While at it, move repetitive code in TestFilterAnd and TestFilterOr into f function. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556	2024-09-08 11:14:55 +02:00
Aliaksandr Valialkin	0a40064a6f	app/vlselect: add /select/logsql/stats_query_range endpoint for building time series panels in VictoriaLogs plugin for Grafana Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6943 Updates https://github.com/VictoriaMetrics/victorialogs-datasource/issues/61	2024-09-07 00:41:47 +02:00
Aliaksandr Valialkin	c9bb4ddeed	app/vlselect: add /select/logsql/stats_query endpoint, which is going to be used by vmalert Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6942 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706	2024-09-06 23:06:43 +02:00
Aliaksandr Valialkin	00e7d5add3	lib/logstorage: substitute `|` operator with `or` operator at `math` pipe This is needed for avoiding confusion between the `|` operator at `math` pipe and `|` pipe delimiter. For example, the following query was parsed unexpectedly: `* | math foo / bar | fields x` as `* | math foo / (bar | fields) as x` Substituting `|` with `or` inside `math` pipe fixes this ambiguity.	2024-09-06 22:44:14 +02:00
Artem Fetishev	a5424e95b3	lib/storage: adds metrics that count records that failed to insert ### Describe Your Changes Add storage metrics that count records that failed to insert: - `RowsReceivedTotal`: the number of records that have been received by the storage from the clients - `RowsAddedTotal`: the number of records that have actually been persisted. This value must be equal to `RowsReceivedTotal` if all the records have been valid ones. But it will be smaller otherwise. The values of the metrics below should provide insight into why some records haven't been added - `NaNValueRows`: the number of records whose value was `NaN` - `StaleNaNValueRows`: the number of records whose value was `Stale NaN` - `InvalidRawMetricNames`: the number of records whose raw metric name has failed to unmarshal. The following metrics existed before this PR and are listed here for completeness: - `TooSmallTimestampRows`: the number of records whose timestamp is negative or is older than retention period - `TooBigTimestampRows`: the number of records whose timestamp is too far in the future. - `HourlySeriesLimitRowsDropped`: the number of records that have not been added because the hourly series limit has been exceeded. - `DailySeriesLimitRowsDropped`: the number of records that have not been added because the daily series limit has been exceeded. --- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-06 17:57:21 +02:00
Aliaksandr Valialkin	0205170409	lib/logstorage: consistently use nsecsPerDay constant and remove nsecPerDay constant	2024-09-06 16:17:04 +02:00
Aliaksandr Valialkin	258ccfb953	lib/logstorage: pre-calculate hashes from tokens used in bloom filter search Previously per-token hashes for per-block bloom filters were re-calculated on every scanned block. This could be slow when the number of tokens is big or when the number of blocks to scan is big. Pre-calculate hashes for bloom filters and then use them for searching in bloom filters. This improves performance by 2.5x for in(...) filters with many values to search inside `in()`.	2024-09-05 19:44:17 +02:00
Zhu Jiekun	c193e6d43e	lib/discovery/azure: fix host check in next link in Azure SD (#6915 ) Previous bugfix at `49f63b2` only partially fixed pagination host validation error. Before this fix it was: ``` unexpected nextLink host \"management.azure.com\", expecting \"https://management.azure.com\" ``` Now we only check the `Host` without schema. However, when Azure responds with `nextLink` in `Host:Port` format, the `nextLink` check will fail: ``` unexpected nextLink host \"management.azure.com:443\", expecting \"management.azure.com\" ``` This pull request further relaxes the checks by only checking the `Hostname`. --- related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6912	2024-09-05 16:48:09 +02:00
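The difference between checking `Host` and checking `Hostname` described in `c193e6d43e` can be illustrated with the standard `net/url` package. This is a minimal sketch only: `sameHostname` is a hypothetical helper written for this note, not the function used in lib/discovery/azure, and the example URLs are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// sameHostname reports whether nextLink points to the same hostname as apiURL,
// ignoring the scheme and any explicit port.
// Hypothetical helper for illustration; not the actual Azure SD code.
func sameHostname(apiURL, nextLink string) (bool, error) {
	a, err := url.Parse(apiURL)
	if err != nil {
		return false, fmt.Errorf("cannot parse API URL %q: %w", apiURL, err)
	}
	n, err := url.Parse(nextLink)
	if err != nil {
		return false, fmt.Errorf("cannot parse nextLink %q: %w", nextLink, err)
	}
	// URL.Host would be "management.azure.com:443" for the second URL,
	// while URL.Hostname() strips the port.
	return a.Hostname() == n.Hostname(), nil
}

func main() {
	ok, err := sameHostname(
		"https://management.azure.com",
		"https://management.azure.com:443/subscriptions?page=2",
	)
	fmt.Println(ok, err)
}
```

Comparing `URL.Host` for the same pair would fail, because the port is part of `Host` but not of `Hostname()`.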
Artem Fetishev	39294b4919	lib/storage: do not drop stale NaN samples (#6936 ) This patch reverts `1fd3385` After discussing it we've come to the conclusion that this is a valid behavior which can be avoided by deleting the time series only once the corresponding stale NaNs have been received. On the other hand, the fix leads to lost stale NaNs in some rare but valid use cases. For example: - In a cluster configuration the samples for a given time series are normally sent to the same vmstorage replica. However, vminsert may reroute the samples to another replica because the original one is down or is overloaded. In this case the stale NaN may end up on a replica that has no data for that time series, but we still want to record that sample. Thus, reverting that fix. --- related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069 Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-05 16:45:09 +02:00
Hui Wang	b48f5f3e59	lib/storage: fix metric `vm_object_references{type="indexdb"}` (#6937 ) follow up `4ecc370acb`	2024-09-05 16:42:49 +02:00
Aliaksandr Valialkin	49e57ea80e	lib/logstorage: delete unused function - bloomfilter.containsAny	2024-09-05 16:21:06 +02:00
Aliaksandr Valialkin	2dd845fa53	lib/logstorage: properly fix incorrect extraction of common tokens for `OR` filters at distinct log fields Previously (f1:foo OR f2:bar) was incorrectly returning `foo` token for `f1` and `bar` token for `f2`. These tokens were used for checking against bloom filter for every data block, so the data block, which didn't contain simultaneously `foo` token for `f1` field and `bar` token for `f2` field, was skipped. This was incorrect, since such a block may contain logs matching the original OR filter. The fix is to return common tokens from `OR`-delimited filters only if these tokens exist at EVERY such filter for the given field name. If some `OR`-delimited filter misses the given field name, then `OR`-delimited filters do not contain common tokens, which could be used for checking against bloom filter. While at it, add more tests covering various edge cases for filters delimited by AND and OR. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556	2024-09-05 14:29:50 +02:00
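The rule from `2dd845fa53` (a token is usable for bloom filter checks only if it occurs in EVERY `OR`-delimited filter for the given field) amounts to an order-preserving intersection of token sets. The sketch below is an assumption-laden illustration: `commonTokens` is a hypothetical function written for this note, and the real code in lib/logstorage may be structured differently.

```go
package main

import "fmt"

// commonTokens returns the tokens that occur in every set from tokenSets,
// in the order they appear in the first set.
// An empty set (an OR branch without tokens for the field) yields no common tokens.
// Hypothetical sketch; not the lib/logstorage implementation.
func commonTokens(tokenSets [][]string) []string {
	if len(tokenSets) == 0 {
		return nil
	}
	// Count in how many sets each token occurs, at most once per set.
	counts := make(map[string]int)
	for _, tokens := range tokenSets {
		seen := make(map[string]bool, len(tokens))
		for _, t := range tokens {
			if !seen[t] {
				seen[t] = true
				counts[t]++
			}
		}
	}
	// Keep only tokens present in all sets, preserving the original order.
	var result []string
	emitted := make(map[string]bool)
	for _, t := range tokenSets[0] {
		if counts[t] == len(tokenSets) && !emitted[t] {
			emitted[t] = true
			result = append(result, t)
		}
	}
	return result
}

func main() {
	fmt.Println(commonTokens([][]string{{"bar", "foo", "baz"}, {"foo", "bar"}}))
	fmt.Println(commonTokens([][]string{{"foo"}, {}}))
}
```

For `f1:foo OR f2:bar`, the token sets for field `f1` would be `{"foo"}` and `{}`, so no common tokens are returned and no block is skipped by mistake.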
f41gh7	7b0aaf1ea2	follow-up after `01430a155c` * properly check SeverityNumber at FormatSeverity function: it could be negative, which could cause a panic in VictoriaLogs	2024-09-04 15:36:34 +02:00
Andrii Chubatiuk	01430a155c	vlinsert: added opentelemetry logs support Commit adds the following changes: * Adds support of OpenTelemetry logs for Victoria Logs with protobuf encoded messages * json encoding is not supported for the following reasons: - It brings a lot of fragile code, which works inefficiently. - json encoding is impossible to use with language SDK. * splits metrics and logs structures at lib/protoparser/opentelemetry/pb package. * adds docs with examples for opentelemetry logs. --- Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4839 Co-authored-by: AndrewChubatiuk <andrew.chubatiuk@gmail.com> Co-authored-by: f41gh7 <nik@victoriametrics.com>	2024-09-03 20:12:05 +02:00
rtm0	4df243d530	lib/storage: improve the message of the tooManyTimeseries error (#6893 ) ### Describe Your Changes This is a follow-up for #6836. Per @valyala's [comment](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6836#discussion_r1730291704), the error message does not reflect which flag needs to be adjusted. ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-03 10:28:03 +02:00
jackyin	975ed27a76	lib/logstorage: `and` filter results in unexpected response (#6556 ) fix #6554 andfilter shouldn't return the orfilter field, which results in the bloomfilter returning false. --------- Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-09-03 10:17:44 +02:00
rtm0	2c856c6951	tests: check Metrics.RowsAddedTotal in unit tests (#6895 ) ### Describe Your Changes This is a follow-up PR: Unit tests introduced in #6872 can now use RowsAddedTotal counter whose scope was fixed in #6841. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-30 14:31:15 +02:00
Roman Khavronenko	f586082520	attempt to fix flaky TestClientProxyReadOk (#6899 ) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-30 13:23:32 +02:00
dufucun	95bafc8caf	tests: fix slice init length (#6897 ) ### Describe Your Changes fix slice init length ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: dufucun <dufuchun@sohu.com>	2024-08-30 10:55:25 +02:00
rtm0	334cd92a6c	testing: allow disabling fsync to make tests run faster (#6871 ) ### Describe Your Changes fsync() ensures that the data is written to disk. In production this is needed for data durability. However, during the development, when the unit tests are run, this level of durability is not needed. Therefore fsync() can be disabled, which makes test runs two times faster. The disabling is done by setting the `DISABLE_FSYNC_FOR_TESTING` environment variable. The valid values for this variable are the same as the values of the arg of `go doc strconv.ParseBool`: ``` 1, t, T, TRUE, true, True, 0, f, F, FALSE, false, False. ``` Any other value means `false`. The variable is set for all test build targets. Compare running times: Build Target | DISABLE_FSYNC_FOR_TESTING=0 | DISABLE_FSYNC_FOR_TESTING=1 ----------------- | ------------------------------------------------ | ------------------------------------------------- make test | 1m5s | 0m22s make test-race | 3m1s | 1m42s make test-pure | 1m7s | 0m20s make test-full | 1m21s | 0m32s make test-full-386 | 1m42s | 0m36s When running tests for a given package, fsync can be disabled as follows: ```shell DISABLE_FSYNC_FOR_TESTING=1 go test ./lib/storage ``` Disabling fsync() is intended for testing purposes only and the name of the variable reflects that. What could also have been done but hasn't been: - lib/filestream/filestream.go: `Writer.MustFlush()` also uses f.Sync() but nothing has been done to it, because the Writer.MustFlush() is not used anywhere in the VM codebase. A side question: what is the general policy for the unused code? - lib/filestream/filestream.go: Writer.Write() calls `adviceDontNeed()` which calls unix.Fdatasync(). Disabling it could potentially improve running time, but running tests with this code disabled has shown otherwise. 
### Checklist The following checks are mandatory: - [ x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-08-30 10:54:46 +02:00
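The parsing rule described in `334cd92a6c` (values accepted by `strconv.ParseBool`, anything else means `false`) can be sketched as follows. Only `strconv.ParseBool` and the `DISABLE_FSYNC_FOR_TESTING` variable name come from the commit message; `fsyncDisabled` is a hypothetical helper and is not necessarily how lib/fs reads the variable.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// fsyncDisabled interprets the raw value of DISABLE_FSYNC_FOR_TESTING.
// Values accepted by strconv.ParseBool are honored; any other value,
// including the empty string, is treated as false.
// Hypothetical helper for illustration only.
func fsyncDisabled(raw string) bool {
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return false
	}
	return v
}

func main() {
	fmt.Println(fsyncDisabled(os.Getenv("DISABLE_FSYNC_FOR_TESTING")))
}
```

Treating unparsable values as `false` keeps fsync enabled by default, so a typo in the variable value cannot silently reduce durability.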
f41gh7	e3e06b1f47	Merge remote-tracking branch 'origin/master' into pmm-6401-read-prometheus-data-files-cpc	2024-08-29 15:47:43 +02:00
Nikolay	4ecc370acb	lib/storage: properly add previous indexDB metrics (#6890 ) Previously, some extIndexDB metrics were not registered. This resulted in missing metrics if a metric value was added to the extIndexDB. It's a usual case for search requests at both indexes. The current commit updates all metrics from extIndexDB according to the current IndexDB. It must fix such cases. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6868	2024-08-28 11:14:28 +02:00
rtm0	9fcfba3927	lib/storage: properly handle maxMetrics limit at metricID search `TL;DR` This PR improves the metric IDs search in IndexDB: - Avoid searching for metric IDs twice when `maxMetrics` limit is exceeded - Use correct error type for indicating that the `maxMetrics` limit is exceeded - Simplify the logic of deciding between per-day and global index search A unit test has been added to ensure that this refactoring does not break anything. --- Function calls before the fix: ``` idb.searchMetricIDs |__ is.searchMetricIDs |__ is.searchMetricIDsInternal |__ is.updateMetricIDsForTagFilters |__ is.tryUpdatingMetricIDsForDateRange | | |__ is.getMetricIDsForDateAndFilters ``` - `searchMetricIDsInternal` searches metric IDs for each filter set. It maintains a metric ID set variable which is updated every time the `updateMetricIDsForTagFilters` function is called. After each successful call, the function checks the length of the updated metric ID set and if it is greater than `maxMetrics`, the function returns `too many timeseries` error. - `updateMetricIDsForTagFilters` uses either per-day or global index to search metric IDs for the given filter set. The decision of which index to use is made within the `tryUpdatingMetricIDsForDateRange` function and if it returns `fallback to global search` error then the function uses global index by calling `getMetricIDsForDateAndFilters` with zero date. - `tryUpdatingMetricIDsForDateRange` first checks if the given time range is larger than 40 days and if so returns `fallback to global search` error. Otherwise it proceeds to searching for metric IDs within that time range by calling `getMetricIDsForDateAndFilters` for each date. - `getMetricIDsForDateAndFilters` searches for metric IDs for the given date and returns `fallback to global search` error if the number of found metric IDs is greater than `maxMetrics`. Problems with this solution: 1. 
The `fallback to global search` error returned by `getMetricIDsForDateAndFilters` in case when maxMetrics is exceeded is misleading. 2. If `tryUpdatingMetricIDsForDateRange` proceeds to date range search and returns `fallback to global search` error (because `getMetricIDsForDateAndFilters` returns it) then this will trigger global search in `updateMetricIDsForTagFilters`. However the global search uses the same maxMetrics value which means this search is destined to fail too. I.e. the same search is performed twice and fails twice. 3. `too many timeseries` error is already handled in `searchMetricIDsInternal` and therefore handling this error in `updateMetricIDsForTagFilters` is redundant 4. updateMetricIDsForTagFilters is a better place to make a decision on whether to use per-day or global index. Solution: 1. Use a dedicated error for `too many timeseries` case 2. Handle `too many timeseries` error in `searchMetricIDsInternal` only 3. Move the per-day or global search decision from `tryUpdatingMetricIDsForDateRange` to `updateMetricIDsForTagFilters` and remove `fallback to global search` error. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-27 21:39:03 +02:00
rtm0	eef6943084	lib/storage: properly register index records with RegisterMetricNames Once the timeseries is in tsidCache, new entries won't be created in per-day index because the RegisterMetricNames() code does not consider different dates for the same timeseries. So this case has been added. The same bug exists for AddRows() but it is not manifested because the index entries are finally created in updatePerDateData(). RegisterMetricNames has also been updated to increase the newTimeseriesCreated counter because it actually creates new time series in index. Unit tests have been added that check all possible data patterns (different metric names and dates) and code branches in both RegisterMetricNames and AddRows. The total number of new unit tests is around 100, which increased the running time of storage tests by 50%. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>	2024-08-27 21:33:53 +02:00
rtm0	30f98916f9	Move rowsAddedTotal counter to Storage (#6841 ) ### Describe Your Changes Reduced the scope of rowsAddedTotal variable from global to Storage. This metric clearly belongs to a given Storage object as it counts the number of records added by a given Storage instance. Reducing the scope improves the encapsulation and allows resetting this variable during the unit tests (i.e. every time a new Storage object is created by a test, that object gets a new variable). Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-08-27 21:30:37 +02:00
Zhu Jiekun	e97e966f82	lib/promrelabel: follow-up for `8958cecad6` In the previous commit `8958cecad6` the default ports (80/443) were removed for both the `scrapeURL` and `instance` label values for those targets without a port in `__address__`. Different values in the `instance` label generate new time series. This commit reverts the changes made to the `instance` label. Now, for those targets: - `scrapeURL` will remain unchanged. - The `instance` label value will include the default port. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792	2024-08-27 13:04:26 +02:00
Nikolay	9feee15493	lib/promscrape: fixes proxy authorization (#6783 ) * Adds custom dial func for HTTP-Connect and socks5 proxy tunnels. Standard golang http.transport exposes GetProxyConnectHeader function, but it doesn't allow using a separate tls config for proxy. It is also not possible to enforce HTTP-Connect with standard http lib. * For http scrape targets, by default http.Transport.Proxy function must be used, since it has a special case with full uri forward. * Adds proxy.URL json methods that allow properly copying internal fields, like User/Password. It should fix the bug with proxy_url, where credentials specified in the URL were ignored. * Adds tests for scrape client proxy requests related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6771	2024-08-19 22:31:18 +02:00
Zhu Jiekun	723d834c1a	lib/promrelabel: stop adding default port 80/443 to address label * It was necessary to add default ports for fasthttp client. After migration to the std.httpclient it's no longer needed. * An additional configuration is required at proxy servers with implicitly set 80/443 ports to the host header (such as HAProxy). It's expected that after upgrade the `__address__` label may change. But it should be a rare case. 80/443 ports are not widely used at monitoring ecosystem. And it shouldn't have much impact. Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792 Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-19 22:28:49 +02:00
hagen1778	febba3971b	make go vet happy Address `non-constant format string in call` check: https://github.com/golang/go/issues/60529 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-19 21:15:33 +02:00
Roman Khavronenko	e58dde6925	lib/httputils: parse URL before creating HTTP transport (#6820 ) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6740 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-16 11:32:04 +02:00
Hui Wang	62d19369a3	stream aggregation: do not allow to enable `-stream.keepInput` and `keep_metric_names` options in stream aggregation config together (#6723 ) With aggregated data and raw data under the same metric, results would be confusing. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-08-13 08:54:35 -04:00
Zhu Jiekun	9e2bd82376	app/vmagent: fixes azure service discovery pagination Azure API response with link to the next page was incorrectly validated. Validation used url.Host header to match the configured API URL. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6784	2024-08-09 15:22:47 +02:00
Zakhar Bessarab	cb00b4b00f	lib/backup/s3remote: add retryer configuration (#6747 ) ### Describe Your Changes This helps to improve reliability of performing backups in environments with unreliable connection and tolerate temporary errors at S3 provider side. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6732 Default retry timeout is up to 3 minutes to make this consistent with the same configuration for GCS: `a05317f61f/lib/backup/gcsremote/gcs.go (L70-L76)` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-08-07 16:55:29 +02:00
Roman Khavronenko	f28f496a9d	lib/bytesutil: smooth buffer growth rate (#6761 ) Before, buffer growth was always x2 of its size, which could lead to excessive memory usage when processing big amounts of data. For example, scraping a target with hundreds of MBs in response could result in high memory spikes in vmagent because the buffer has to double its size to fit the response. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6759 The change smooths out the growth rate, trading higher allocation rate for lower mem usage at certain conditions. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-07 16:49:43 +02:00
hagen1778	1154f90d2d	lib/mergeset: fix typos in comments Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-07 15:54:15 +02:00
Aliaksandr Valialkin	04981c7a7f	lib/streamaggr: remove resetState arg from aggrState.flushState() The resetState arg was used only for the BenchmarkAggregatorsFlushInternalSerial benchmark. This benchmark was testing aggregate state flush performance by keeping the same state across flushes. The benchmark didn't reflect the performance and scalability of stream aggregation in production, while it led to non-trivial code changes related to resetState arg handling. So let's drop the benchmark together with all the code related to resetState handling, in order to simplify the code at lib/streamaggr a bit. Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314	2024-08-07 11:39:14 +02:00
Aliaksandr Valialkin	86c7afd126	lib/streamaggr: consistently use the same timestamp across all the output aggregated samples in a single aggregation interval Previously every aggregation output was using its own timestamp for the output aggregated samples in a single aggregation interval. This could result in unexpected inconsistent timestamps for the output aggregated samples. This commit consistently uses the same timestamp across all the output aggregated samples. This commit makes sure that the duration between subsequent timestamps strictly equals the configured aggregation interval. Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314 This commit should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580	2024-08-07 11:39:13 +02:00
Anzor	994796367b	app/vmagent: read __sample_limit__ from labels (#6665 ) (#6666 ) By introducing this feature, users will have the ability to customize the sampleLimit parameter on a per-target basis, providing more flexibility and control over the job execution behavior.	2024-08-07 09:36:14 +02:00
hagen1778	f283126084	fix typos in comments Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-06 14:54:49 +02:00
Zakhar Bessarab	9877a5e7d5	app/{vminsert,vmagent}: add healthcheck for influx ingestion endpoints (#6749 ) ### Describe Your Changes This is useful for clients which validate InfluxDB is available before data ingestion can be started. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6653 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-08-05 09:34:54 +02:00
f41gh7	2557e66ee0	Merge tag 'tags/v1.102.1' into pmm-6401-read-prometheus-data-files-cpc	2024-08-02 11:20:14 +02:00
Juraj Bubniak	11c0b05e8a	lib/backup/s3remote: fix typos (#6694 ) Fixes a few typos in errors in lib/backup/s3remote package.	2024-07-29 14:18:31 +02:00
jackyin	e5d279bb71	lib/netutil: validate TLS cert and key files immediately (#6621 ) Validate files specified via `-tlsKeyFile` and `-tlsCertFile` cmd-line flags on the process start-up. Previously, validation happened on the first connection accepted by HTTP server. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6608 --------- Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-07-29 13:58:53 +02:00
Aliaksandr Valialkin	8551fbe9f3	Revert "refactor(vmstorage): Refactor the code to reduce the time complexity of `MustAddRows` and improve readability (#6629 )" This reverts commit `e280d90e9a`. Reason for revert: the updated code doesn't improve the performance of table.MustAddRows for the typical case when rows contain timestamps belonging to ptws[0]. The performance may be improved in theory for the case when all the rows belong to a partition other than ptws[0], but this partition is automatically moved to ptws[0] by the code at lines `6aad1d43e9/lib/storage/table.go (L287-L298)` , so the next time the typical case will work. Also the updated code makes the code harder to follow, since it introduces an additional level of indirection with non-trivial semantics inside table.MustAddRows - the partition.TimeRangeInPartition() function. This function needs to be inspected and understood when reading the code at table.MustAddRows(). This function depends on minTsInRows and maxTsInRows vars, which are defined and initialized many lines above the partition.TimeRangeInPartition() call. This complicates reading and understanding the code even more. The previous code was using a clearer loop over rows with the clear call to partition.HasTimestamp() for every timestamp in the row. The partition.HasTimestamp() call is used in the table.MustAddRows() function multiple times. This makes the use of partition.HasTimestamp() call more consistent, easier to understand and easier to maintain compared to the mix of partition.HasTimestamp() and partition.TimeRangeInPartition() calls. Also, there is no need in documenting some hardcore software engineering refactoring at docs/CHANGELOG.md, since the docs/CHANGELOG.md is intended for VictoriaMetrics users, who may not know software engineering. The docs/CHANGELOG.md must document user-visible changes, and the docs must be concise and clear for VictoriaMetrics users. 
See https://docs.victoriametrics.com/contributing/#pull-request-checklist for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6629	2024-07-25 14:32:09 +02:00
Ruixiang Tan	e280d90e9a	refactor(vmstorage): Refactor the code to reduce the time complexity of `MustAddRows` and improve readability (#6629 ) ### Describe Your Changes The original logic is not only highly complex but also poorly readable, so it can be modified to increase readability and reduce time complexity. --------- Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>	2024-07-25 08:55:12 +02:00
Aliaksandr Valialkin	65ce4e30ab	lib/backup/azremote: follow-up for `5fd3aef549` - Mention that credentials can be configured via env variables at both vmbackup and vmrestore docs. - Make clear that the AZURE_STORAGE_DOMAIN env var is optional at https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables - Use string literals as is for env variable names instead of indirecting them via string constants. This makes the code easier to read and understand. These environment variable names aren't going to change in the future, so there is no sense in hiding them under string constants with some other names. - Refer to https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables in error messages when auth creds are improperly configured. This should simplify figuring out how to fix the error. - Simplify the code a bit at FS.newClient(), so it is easier to follow it now. While at it, remove the check when superfluous environment variables are set, since it is too fragile and it looks like it doesn't help properly configuring vmbackup / vmrestore. - Remove envLookuper indirection - just use 'func(name string) (string, bool)' type inline. This simplifies code reading and understanding. - Split TestFSInit() into TestFSInit_Failure() and TestFSInit_Success(). This simplifies the test code, so it should be easier to maintain in the future. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6518 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984	2024-07-17 17:55:06 +02:00
Aliaksandr Valialkin	eaed0465d2	all: substitute double "the the" with "the" This is a follow-up for `8786a08d27` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6600	2024-07-17 14:28:12 +02:00
Aliaksandr Valialkin	9c4b0334f2	all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter The %q formatter may result in incorrectly formatted JSON string if the original string contains special chars such as \x1b . They must be encoded as \u001b , otherwise the resulting JSON string cannot be parsed by JSON parsers. This is a follow-up for `c0caa69939` See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24	2024-07-17 13:52:13 +02:00
Aliaksandr Valialkin	8ff051b287	lib/protoparser/graphite: use Regex.ReplaceAllLiteralString instead of Regex.ReplaceAllString for the case when the replacement cannot contain placeholders for capturing groups This is a follow-up for `74affa3aec`	2024-07-17 13:01:05 +02:00
Aliaksandr Valialkin	74affa3aec	lib/protoparser/graphite: follow-up for `476faf5578` - Clarify the description of -graphite.sanitizeMetricName command-line flag at README.md - Do not sanitize tag values - only metric names and tag names must be sanitized, since they are treated specially by Grafana. Grafana doesn't apply any restrictions on tag values. - Properly replace more than two consecutive dots with a single dot. - Disallow unicode letters in metric names and tag names, since neither Prometheus nor Grafana supports them. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6489 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6077	2024-07-17 12:41:55 +02:00
Aliaksandr Valialkin	58a757cd01	lib: consistently use regexp.Regexp.ReplaceAllLiteralString instead of regexp.Regexp.ReplaceAllString in places where the replacement cannot contain matching group placeholders	2024-07-17 12:41:54 +02:00
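The distinction behind the two commits above is that `ReplaceAllString` expands `$`-placeholders in the replacement, while `ReplaceAllLiteralString` inserts the replacement verbatim. A minimal sketch follows; the pattern, inputs and helper names are hypothetical and are not taken from the VictoriaMetrics code.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholderRe is a hypothetical pattern without capturing groups.
var placeholderRe = regexp.MustCompile(`N`)

// expand replaces matches and interprets $-placeholders in repl.
func expand(s, repl string) string {
	return placeholderRe.ReplaceAllString(s, repl)
}

// literal replaces matches and inserts repl exactly as written.
func literal(s, repl string) string {
	return placeholderRe.ReplaceAllLiteralString(s, repl)
}

func main() {
	// With a plain replacement both calls agree.
	fmt.Printf("%q %q\n", expand("cost N", "_"), literal("cost N", "_"))

	// With a "$" in the replacement they differ: "$5" refers to a
	// non-existent group and expands to an empty string.
	fmt.Printf("%q %q\n", expand("cost N", "$5"), literal("cost N", "$5"))
}
```

When the replacement is known to contain no placeholders, the literal variant states that intent directly and avoids surprises if a `$` is ever introduced.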
rtm0	bdc0e688e8	Fix inconsistent error handling in Storage.AddRows() (#6583 ) ### Describe Your Changes `Storage.AddRows()` returns an error only in one case: when `Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But the same error is treated as a warning when it happens inside `Storage.add()` or returned by `Storage.prefillNextIndexDB()`. This commit fixes this inconsistency by treating the error returned by `Storage.updatePerDateData()` as a warning as well. As a result `Storage.add()` does not need a return value anymore and so doesn't `Storage.AddRows()`. Additionally, this commit adds a unit test that checks all cases that result in a row not being added to the storage. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-07-17 12:07:14 +02:00
Aliaksandr Valialkin	c1e32f4517	lib/promrelabel: add test for IfExpression.String() function While at it, simplify this function a bit after the commit `861852f262` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462	2024-07-16 18:31:05 +02:00
Aliaksandr Valialkin	4304950391	lib/promscrape/discovery/yandexcloud: follow-up for `070abe5c71` - Obtain IAM token via GCE-like API instead of Amazon EC2 IMDSv2 API, since it looks like IMDSv2 API isn't supported by Yandex Cloud according to https://yandex.cloud/en/docs/security/standard/authentication#aws-token : > So far, Yandex Cloud does not support version 2, so it is strongly recommended > to technically disable getting a service account token via the Amazon EC2 metadata service. - Try obtaining IAM token via GCE-like API at first and then fall back to the deprecated Amazon EC2 IMDSv1. This should prevent auth errors for instances with disabled GCE-like auth API. This addresses @ITD27M01 concern at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513#issuecomment-1867794884 - Make the description of the change at docs/CHANGELOG.md clearer, add reference to the related issue. P.S. This change wasn't tested in prod because I have no access to Yandex Cloud. It is recommended to test this change by @ITD27M01 and @vmazgo , who filed the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6524	2024-07-16 17:58:40 +02:00
Aliaksandr Valialkin	57000f5105	lib/promscrape: follow-up for `1e83598be3` - Clarify that the -promscrape.maxScrapeSize value is used for limiting the maximum scrape size if max_scrape_size option isn't set at https://docs.victoriametrics.com/sd_configs/#scrape_configs - Fix query example for scrape_response_size_bytes metric at https://docs.victoriametrics.com/vmagent/#automatically-generated-metrics - Mention about max_scrape_size option at the -help description for -promscrape.maxScrapeSize command-line flag - Treat zero value for max_scrape_size option as 'no scrape size limit' - Change float64 to int type for scrapeResponseSize struct fields and function args, since response size cannot be fractional - Optimize isAutoMetric() function a bit - Sort auto metrics in alphabetical order in isAutoMetric() and in scrapeWork.addAutoMetrics() functions for better maintainability in the future Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6434 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429	2024-07-16 12:38:21 +02:00
Aliaksandr Valialkin	7a3394bbe1	Revert "lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451 )" This reverts commit `cd1aca217c`. Reason for revert: this commit has no sense, since the firehose response has application/json content-type, so it must contain JSON-encoded timestamp and requestId fields according to https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#responseformat . HTML-escaping the requestId field may break the response, so the client couldn't correctly recognize the html-escaped requestId. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6451	2024-07-16 09:49:19 +02:00
Aliaksandr Valialkin	233e5f0a9e	lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag() This is a follow-up for `61dce6f2a1` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329	2024-07-16 01:00:45 +02:00
Aliaksandr Valialkin	784327ea30	lib/uint64set: optimize Set.Has() for nil Set - it should be inlined now This makes unnecessary the checkDeleted variable at lib/storage/index_db.go This is a follow-up for `b984f4672e` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6342	2024-07-15 23:59:20 +02:00
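The idea of a nil-safe `Has()` in the commit above can be illustrated with a pointer-receiver method that checks for a nil receiver first, so callers need no separate "is the set present" flag. This is a sketch under assumptions: the `set` type below is a hypothetical map-backed stand-in, and the real `uint64set.Set` is implemented differently.

```go
package main

import "fmt"

// set is a hypothetical stand-in for a set of uint64 values.
type set struct {
	items map[uint64]struct{}
}

// Has reports whether x is in the set. It is safe to call on a nil *set,
// in which case it returns false. The short nil check keeps the fast path
// small.
func (s *set) Has(x uint64) bool {
	if s == nil {
		return false
	}
	_, ok := s.items[x]
	return ok
}

func main() {
	// No guard variable is needed before calling Has on a nil set.
	var deleted *set
	fmt.Println(deleted.Has(42))

	deleted = &set{items: map[uint64]struct{}{42: {}}}
	fmt.Println(deleted.Has(42))
}
```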
Aliaksandr Valialkin	832e088659	lib/mergeset: properly update TableMetrics.TooLongItemsDroppedTotal inside Table.UpdateMetrics Substitute '+=' with '=', since tooLongItemsTotal is global counter, which doesn't belong to the Table struct. This is a follow-up for `69d244e6fb` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6297	2024-07-15 23:39:10 +02:00
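Why `+=` is wrong for a process-wide counter can be sketched as follows. The assumption here, not stated in the commit, is that one metrics struct is filled by several tables in turn; all names below are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// droppedTotal is a hypothetical process-wide counter shared by all tables.
var droppedTotal atomic.Uint64

// metrics is a hypothetical aggregate that several tables update in turn.
type metrics struct {
	DroppedTotal uint64
}

type table struct{}

// updateMetricsAdd adds the global counter once per table, so the
// result is multiplied by the number of tables.
func (t *table) updateMetricsAdd(m *metrics) {
	m.DroppedTotal += droppedTotal.Load()
}

// updateMetricsSet assigns the global counter, so repeated calls
// always leave the true value in place.
func (t *table) updateMetricsSet(m *metrics) {
	m.DroppedTotal = droppedTotal.Load()
}

// collect runs update for every table against a single metrics value.
func collect(tables []*table, update func(*table, *metrics)) uint64 {
	var m metrics
	for _, t := range tables {
		update(t, &m)
	}
	return m.DroppedTotal
}

func main() {
	droppedTotal.Store(3)
	tables := []*table{{}, {}}
	fmt.Println(collect(tables, (*table).updateMetricsAdd)) // counted once per table
	fmt.Println(collect(tables, (*table).updateMetricsSet)) // the global value
}
```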
Aliaksandr Valialkin	a468a6e985	lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc - Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call - NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil - Simplify the implementation of NewStatDialFunc by removing sync.Map from there. - Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils - Use gauge instead of counter type for *_conns metric This is a follow-up for `d7b5062917` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299	2024-07-15 23:02:34 +02:00
Aliaksandr Valialkin	ad367c17bf	lib/streamaggr/streamaggr.go: typo fix after `5e29ef5ed5`: IgnoredNaNSamples -> ignoredNaNSamples	2024-07-15 21:58:56 +02:00
Aliaksandr Valialkin	db557b86ee	app/vmagent/remotewrite: follow-up for `f153f54d11` - Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go . This improves code maintainability a bit. - Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs(). This prevents from potential resource leaks. - Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation. This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions. - Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics. - Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation. This label should simplify investigation of the exposed metrics. - Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability: it is hard to use these labels in query filters and aggregation functions. - Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` . This metric shows the number of samples generated by the corresponding streaming aggregation rule. This metric has been added in the commit `861852f262` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice. This metric has been added in the commit `861852f262` . 
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params, which could modify the behaviour of the constructed streaming aggregator. Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory. - Pass Options arg to LoadFromFile() function by reference, since this structure is quite big. This also allows passing nil instead of Options when default options are enough. - Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics, so they have consistent set of labels comparing to the rest of streaming aggregation metrics. - Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output metrics. This is a follow-up for the commit `2eb1bc4f81` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604 - Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason in the commit `2eb1bc4f81` . - Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function. While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268	2024-07-15 20:24:01 +02:00
Aliaksandr Valialkin	202e5704e6	vendor: update github.com/VictoriaMetrics/metrics from v1.34.1 to v1.35.0 Fix potential memory leaks across VictoriaMetrics codebase after metrics.UnregisterSet(s) call because of missing s.UnregisterAllMetrics() call. This is a follow-up for `6a6e34ab8e` . It is OK if some vmauth metrics aren't visible for a few microseconds when the previous metrics are unregistered and new metrics weren't registered yet. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6252 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5805	2024-07-15 10:43:37 +02:00
Aliaksandr Valialkin	c995ccad93	lib/{storage,mergeset}: do not allow setting dataFlushInterval to values smaller than pending{Items,Rows}FlushInterval Pending rows and items unconditionally remain in memory for up to pending{Items,Rows}FlushInterval, so there is no any sense in setting dataFlushInterval (the interval for guaranteed flush of in-memory data to disk) to values smaller than pending{Items,Rows}FlushInterval, since this doesn't affect the interval for flushing pending rows and items from memory to disk. This is a follow-up for `4c80b17027` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6221	2024-07-15 10:08:15 +02:00
Aliaksandr Valialkin	48ec66883a	lib/streamaggr: consistently use alphabetical order of benchmarked stream aggregation outputs	2024-07-15 09:53:19 +02:00
Aliaksandr Valialkin	5354374b62	lib/streamaggr: follow-up for `9c3d44c8c9` - Consistently enumerate stream aggregation outputs in alphabetical order across the source code and docs. This should simplify future maintenance of the corresponding code and docs. - Fix the link to `rate_sum()` at `see also` section of `rate_avg()` docs. - Make the docs for `rate_sum()` and `rate_avg()` outputs clearer. - Encapsulate output metric suffix inside rateAggrState. This eliminates possible bugs related to incorrect suffix passing to newRateAggrState(). - Rename rateAggrState.total field to less misleading rateAggrState.increase name, since it calculates counter increase in the current aggregation window. - Set rateLastValueState.prevTimestamp on the first sample in time series instead of the second sample. This makes the code logic clearer. - Move the code for removing outdated entries at rateAggrState into removeOldEntries() function. This makes the code logic inside rateAggrState.flushState() clearer. - Do not write output sample with zero value if there are no input series, which could be used for calculating the rate, e.g. if only a single sample is registered for every input series. - Do not take into account input series with a single registered sample when calculating rate_avg(), since this leads to incorrect results. - Move {rate,total}AggrState.flushState() function to the end of rate.go and total.go files, so they look more similar. This should simplify future maintenance. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6243	2024-07-15 08:40:09 +02:00
Aliaksandr Valialkin	0145b65f25	app/vmagent/remotewrite: follow-up for `87fd400dfc` - Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage systems are configured with the disabled on-disk queue, every in-memory queue is full and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common, so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx relabeling and other processing in this case. - Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped(). Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set. In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full and on-disk queue is disabled. The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage processes pending data, so it drops aggregated samples in this case. - Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output, so it is clear that this flag can be set individually per each -remoteWrite.url. - Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems are configured with the disabled on-disk queue, then there is no sense in keeping samples on some of these systems, while dropping samples on the remaining systems, since this will result in global stall on the remote storage system with the disabled on-disk queue and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false from remotewrite.TryPush() in this case. This will result in infinite duplicate samples written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set. 
This allows proceeding with newly scraped / pushed samples by sending them to the remaining remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set. - Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test. - Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url. See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065	2024-07-13 02:25:19 +02:00
Aliaksandr Valialkin	0078399788	app/vmalert: switch from table-driven tests to f-tests This makes test code more clear and reduces the number of code lines by 500. This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error* requires more boilerplate code, which can result in additional bugs inside tests. While t.Error* allows writing logging errors for the same, this doesn't simplify fixing broken tests most of the time. This is a follow-up for `a9525da8a4`	2024-07-12 22:41:11 +02:00
hagen1778	2f65956259	lib/streamaggr: add missing test cases Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-07-12 11:06:45 +02:00
Hui Wang	2eb1bc4f81	vmagent: fix `vm_streamaggr_flushed_samples_total` counter (#6604 ) We use `vm_streamaggr_flushed_samples_total` to show the number of produced samples by aggregation rule, previously it was overcounted, and doesn't account for `output_relabel_configs`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-07-12 10:56:07 +02:00
hagen1778	03e4c5c19c	lib/backup/azremote: follow-up after `5fd3aef549` Simplify tests by converting them to f-tests. `5fd3aef549` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-07-10 13:06:27 +02:00
justinrush	5fd3aef549	lib/backup: add support for Azure Managed Identity (#6518 ) ### Describe Your Changes These changes support using Azure Managed Identity for the `vmbackup` utility. It adds two new environment variables: * `AZURE_USE_DEFAULT_CREDENTIAL`: Instructs the `vmbackup` utility to build a connection using the [Azure Default Credential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.5.2#NewDefaultAzureCredential) mode. This causes the Azure SDK to check for a variety of environment variables to try and make a connection. By default, it tries to use managed identity if that is set up. This will close https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). ### Testing However you normally test the `vmbackup` utility using Azure Blob should continue to work without any changes. The set up for that is environment specific and not listed out here. Once regression testing has been done you can set up [Azure Managed Identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview) so your resource (AKS, VM, etc), can use that credential method. Once it is set up, update your environment variables according to the updated documentation. I added unit tests to the `FS.Init` function, then made my changes, then updated the unit tests to capture the new branches. I tested this in our environment, but with SAS token auth and managed identity and it works as expected. --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Justin Rush <jarush@epic.com> Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-07-10 11:52:05 +02:00
Aliaksandr Valialkin	ac06569c49	app/vlinsert/loki: use easyproto instead for parsing Loki protobuf messages	2024-07-10 03:05:17 +02:00
Aliaksandr Valialkin	aa9bb99527	lib/logstorage: drop all the pipes from the query when calculating the number of matching logs at /select/logsql/hits API	2024-07-10 00:39:28 +02:00
Aliaksandr Valialkin	3c02937a34	all: consistently use 'any' instead of 'interface{}' 'any' type is supported starting from Go1.18. Let's consistently use it instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.	2024-07-10 00:20:37 +02:00
Aliaksandr Valialkin	a9525da8a4	lib: consistently use f-tests instead of table-driven tests This makes easier to read and debug these tests. This also reduces test lines count by 15% from 3K to 2.5K See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e While at it, consistently use t.Fatal* instead of t.Error, since t.Error usually leads to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.	2024-07-09 22:40:50 +02:00
Aliaksandr Valialkin	35b3b95cbc	lib/promscrape/discovery/vultr: follow-up after `17e3d019d2` - Sort the discovered labels in alphabetical order at https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs - Rename VultrConfigs to VultrSDConfigs to be consistent with the naming for other SD configs. - Prepare query arg filters for `list instances API` at newAPIConfig() instead of passing them in a separate listParams struct. This simplifies the code a bit. - Return error when bearer token isn't set at vultr_sd_configs, since this token is mandatory according to https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs - Remove unused fields from the parsed response from Vultr list instances API in order to simplify the code a bit. - Remove double logging of errors inside getInstances() function, since these errors must be already logged by the caller. - Simplify tests, so they are easier to maintain. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6068	2024-07-05 17:40:03 +02:00
Aliaksandr Valialkin	c0caa69939	lib/logstorage: use quicktemplate.AppendJSONString instead of strconv.AppendQuote for encoding JSON strings The strconv.AppendQuote improperly encodes special chars such as \x1b . They must be encoded as \u001b . See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24	2024-07-05 01:22:23 +02:00
Aliaksandr Valialkin	2da7dfc754	Revert `c6c5a5a186` and `b2765c45d0` Reason for revert: Many statsd servers exist: - https://github.com/statsd/statsd - classical statsd server - https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DataDog Agent ( https://docs.datadoghq.com/agent/ ) - https://github.com/avito-tech/bioyino - high-performance statsd server - https://github.com/atlassian/gostatsd - statsd server in Go - https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd ( the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ). Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides significant advantages over the existing statsd servers, while having no significant drawbacks compared to existing statsd servers. The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server. The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics ( see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers. So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent). 
Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation. In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation. This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed during querying. P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic specialized configuration for aggregating of statsd metrics. The main requirements for this configuration: - easy to write, read and update (ideally it should work out of the box for most cases without additional configuration) - hard to misconfigure (e.g. hard to shoot yourself in the foot) It would be great if this configuration will be compatible with the configuration of the most widely used statsd server. In the meantime it is recommended to continue using an external statsd server. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600	2024-07-03 23:51:56 +02:00
Aliaksandr Valialkin	d8c7cc266b	lib/promscrape: use prompbmarshal.MustParsePromMetrics function at parseData() test function The prompbmarshal.MustParsePromMetrics function has been added in the commit `cc4d57d650`	2024-07-03 16:08:13 +02:00
Aliaksandr Valialkin	bb00bae353	Revert "Exemplar support (#5982 )" This reverts commit `5a3abfa041`. Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below). Adding support for exemplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ . Issues with Prometheus exemplars: - Prometheus still has only experimental support for exemplars after more than three years since they were introduced. It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature. See `0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)` and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage - It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus . - It looks like exemplars are supported for Histogram metric types only - see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar . Exemplars aren't supported for Counter, Gauge and Summary metric types. - Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query contains histogram_quantile() function. 
It queries exemplars via special Prometheus API - https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.) and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph. This makes the graph unusable, since it is literally filled with thousands of exemplar dots. Neither the Prometheus API nor Grafana provides the ability to filter out unneeded exemplars. - Exemplars are usually connected to traces. While traces are good for some I doubt exemplars will become production-ready in the near future because of the issues outlined above. Alternative to exemplars: Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs - just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry! This doesn't work as expected in production as shown above. Are there better solutions, which work in production? Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job` and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry by these labels on the time range with the anomaly on metrics' graph.	2024-07-03 15:30:21 +02:00
Aliaksandr Valialkin	cc4d57d650	app/vmagent/remotewrite,lib/streamaggr: re-use common code in tests after `879771808b` - Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package. This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries. - Move common code for mustParsePromMetrics() function into lib/prompbmarshal package, so it could be used in tests for building []prompbmarshal.TimeSeries from string. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206	2024-07-03 15:21:36 +02:00
Aliaksandr Valialkin	f17b408643	lib/streamaggr: follow-up for the commit `c0e4ccb7b5` - Clarify docs for `Ignore aggregation intervals on start` feature. - Make more clear the code dealing with ignoreFirstIntervals at aggregator.runFlusher() functions. It is better from readability and maintainability PoV using distinct a.flush() calls for distinct cases instead of merging them into a single a.flush() call. - Take into account the first incomplete interval when tracking the number of skipped aggregation intervals, since this behaviour is easier to understand by the end users. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6137	2024-07-02 21:24:50 +02:00
Andrii Chubatiuk	476faf5578	lib/protoparser/graphite: added -graphite.sanitizeMetricName flag (#6489 ) ### Describe Your Changes Added flag to sanitize graphite metrics fixes #6077 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-07-02 14:56:41 +02:00
Aliaksandr Valialkin	3b6c78c26c	lib/logstorage: allow writing `after N` in front of `before N` at `stream_context` pipe	2024-07-02 01:38:20 +02:00
Andrii Chubatiuk	861852f262	lib/streamaggr: added stale samples metric, added metrics labels (#6462 ) ### Describe Your Changes - added stale metrics counters for input and output samples - added labels for aggregator metrics => `name="{rwctx}:{aggrId}:{aggrSuffix}"` - rwctx - global or number starting from 1 - aggrid - aggregator id starting from 1 - aggrSuffix - <interval>_(by\|without)_label1_label2_labeln e.g: `name="global:1:1m_without_instance_pod"` ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-07-01 14:56:17 +02:00
Aliaksandr Valialkin	6bb66cb3e9	lib/logstorage: properly search for the surrounding logs in `stream_context` pipe The set of log fields in the found logs may differ from the set of log fields present in the log stream. So compare only the log fields in the found logs when searching for the matching log entry in the log stream. While at it, return _stream field in the delimiter log entry, since this field is used by VictoriaLogs Web UI for grouping logs by log streams.	2024-07-01 02:29:50 +02:00
Aliaksandr Valialkin	bb0deb7ac4	lib/logstorage: add ability to store sorted log position into a separate field with `sort ... rank <fieldName>` syntax	2024-07-01 01:44:17 +02:00
Aliaksandr Valialkin	dc291d8980	lib/logstorage: add delimiter between log chunks returned from `\| stream_context` pipe	2024-07-01 01:30:37 +02:00
Aliaksandr Valialkin	d4ca651547	lib/logstorage: add `stream_context` pipe, which allows selecting surrounding logs for the matching logs	2024-06-28 19:14:29 +02:00
Aliaksandr Valialkin	0730f1324d	lib/logstorage: it is safe using `\| unroll` pipe in live tailing `\| unroll` pipe can make multiple copies of rows from the input row. This doesn't break live tailing, so allow `\| unroll` pipe in live tailing.	2024-06-27 19:44:57 +02:00
Aliaksandr Valialkin	7c8c040502	app/vlselect: properly return live tailing results	2024-06-27 15:05:57 +02:00
Aliaksandr Valialkin	87f1c8bd6c	lib/logstorage: work-in-progress	2024-06-27 14:20:43 +02:00
Andrii Chubatiuk	070abe5c71	added IMDSv2 for YC SD (#6524 ) ### Describe Your Changes Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-06-26 18:03:21 +02:00
rtm0	a42bd59ee4	Fix Date metricid cache consistency under concurrent use (#6534 ) ### Describe Your Changes Fix Date metricid cache consistency under concurrent use. When one goroutine calls Has() and does not find the cache entry in the immutable map it will acquire a lock and check the mutable map. And it is possible that before that lock is acquired, the entry is moved from the mutable map to the immutable map by another goroutine causing a cache miss. The fix is to check the immutable map again once the lock is acquired. ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-06-26 17:33:38 +02:00
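The race described in the commit above can be sketched as follows. This is a minimal illustration, not the actual VictoriaMetrics code: the `twoLevelSet` type, its fields and the `promote()` method are hypothetical names. It assumes a lock-free immutable map plus a mutex-guarded mutable map, and shows the re-check of the immutable map after the lock is acquired.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// twoLevelSet is a hypothetical stand-in for a cache with an immutable
// part (read without locks) and a mutable part (guarded by mu).
type twoLevelSet struct {
	immutable atomic.Pointer[map[uint64]struct{}]
	mu        sync.Mutex
	mutable   map[uint64]struct{}
}

func newTwoLevelSet() *twoLevelSet {
	s := &twoLevelSet{mutable: make(map[uint64]struct{})}
	empty := make(map[uint64]struct{})
	s.immutable.Store(&empty)
	return s
}

// Add registers id in the mutable part.
func (s *twoLevelSet) Add(id uint64) {
	s.mu.Lock()
	s.mutable[id] = struct{}{}
	s.mu.Unlock()
}

// promote moves all the mutable entries into a fresh immutable map.
func (s *twoLevelSet) promote() {
	s.mu.Lock()
	defer s.mu.Unlock()
	old := *s.immutable.Load()
	merged := make(map[uint64]struct{}, len(old)+len(s.mutable))
	for id := range old {
		merged[id] = struct{}{}
	}
	for id := range s.mutable {
		merged[id] = struct{}{}
	}
	s.immutable.Store(&merged)
	s.mutable = make(map[uint64]struct{})
}

// Has reports whether id is present in either part.
func (s *twoLevelSet) Has(id uint64) bool {
	// Fast path: lock-free lookup in the immutable map.
	if _, ok := (*s.immutable.Load())[id]; ok {
		return true
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.mutable[id]; ok {
		return true
	}
	// promote() may have run between the fast path and mu.Lock(),
	// so the entry may have moved. Check the immutable map again.
	_, ok := (*s.immutable.Load())[id]
	return ok
}

func main() {
	s := newTwoLevelSet()
	s.Add(42)
	s.promote()
	fmt.Println(s.Has(42), s.Has(7))
}
```

Without the final re-check, a `Has()` call that misses the fast path and then waits for the lock held by `promote()` would find an empty mutable map and report a false miss.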
Aliaksandr Valialkin	dff5008392	app/vlstorage: add -retention.maxDiskSpaceUsageBytes command-line flag for limiting the retention at VictoriaLogs by disk space usage	2024-06-25 17:30:33 +02:00
Aliaksandr Valialkin	3eacd43fff	lib/logstorage: parse syslog structured data into separate fields in order to simplify further querying of this data	2024-06-25 14:53:39 +02:00
Aliaksandr Valialkin	9e1c037249	lib/logstorage: properly parse timezone offset at TryParseTimestampRFC3339Nano() The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets. While at it, make tryParseTimestampISO8601 function private in order to prevent from improper usage of this function from outside the lib/logstorage package. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508	2024-06-25 14:53:38 +02:00
Aliaksandr Valialkin	7252c5d258	lib/logstorage: make golangci-lint happy	2024-06-25 03:04:21 +02:00
Aliaksandr Valialkin	82d639411d	lib/httpserver: revert `9b7e532172` Reason for revert: this commit doesn't resolve real security issues, while it complicates the resulting code in subtle ways (aka security circus). Comparison of two strings (passwords, auth keys) takes a few nanoseconds. This comparison is performed in a non-trivial http handler, which takes thousands of nanoseconds, and the request handler timing is non-deterministic because of Go runtime, Go GC and other concurrently executed goroutines. The request handler timing is even more non-deterministic when the application is executed in shared environments such as Kubernetes, where many other applications may run on the same host and use shared resources of this host (CPU, RAM bandwidth, network bandwidth). Additionally, it is expected that the passwords and auth keys are passed via TLS-encrypted connections. Establishing TLS connections takes additional non-trivial time (millions of nanoseconds), which depends on many factors such as network latency, network congestion, etc. This makes it impossible to conduct a timing attack on passwords and auth keys in VictoriaMetrics components. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6423/files Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392	2024-06-25 01:36:12 +02:00
Aliaksandr Valialkin	de7450b7e0	lib/logstorage: work-in-progress	2024-06-24 23:27:12 +02:00
Andrii Chubatiuk	1e83598be3	app/vmagent: add max_scrape_size to scrape config (#6434 ) Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-20 13:58:42 +02:00
Slava Bobik	d236604d39	Fixed a typo in the FastQueue mutex comment (#6514 ) ### Describe Your Changes Fixed a small typo in a comment about the mutex inside the FastQueue struct ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-06-20 02:30:36 -07:00
Aliaksandr Valialkin	7229dd8c33	lib/logstorage: work-in-progress	2024-06-20 03:10:08 +02:00
Zakhar Bessarab	201fd6de1e	lib/fs/fscore: do not trim content from path (#6503 ) ### Describe Your Changes Trimming content which is loaded from an external path leads to obscure issues in case user-defined input contained trimmed chars. For example, a user-defined password "foo\n" will become "foo" while the user will expect it to contain a new line. --- For example, a user defines a password which ends with `\n`. This often happens when using Kubernetes secrets and manually encoding the value as a base64-encoded string. In this case vmauth configuration might look like: ``` users: - url_prefix: - http://vminsert:8480/insert/0/prometheus/api/v1/write name: foo username: foo password: "foobar\n" ``` vmagent configuration for this setup will use the following flags: ``` -remoteWrite.url=http://vmauth:8427/ -remoteWrite.basicAuth.passwordFile=/tmp/vmagent-password -remoteWrite.basicAuth.username="foo" ``` Where `/tmp/vmagent-password` is a file with `foobar\n` password. Before this change such configuration will result in `401 Unauthorized` response received by vmagent, since after trimming the file content will become `foobar`. --- An example with Kubernetes operator which uses a secret to reference the same password in multiple configurations. 
<details> <summary>See full manifests</summary> `Secret`: ``` apiVersion: v1 data: name: Zm9v # foo password: Zm9vYmFy # foobar\n username: Zm9v= # foo kind: Secret metadata: name: vmuser ``` `VMUser`: ``` apiVersion: operator.victoriametrics.com/v1beta1 kind: VMUser metadata: name: vmagents spec: generatePassword: false name: vmagents targetRefs: - crd: kind: VMAgent name: some-other-agent namespace: example username: foo # note - the secret above is referenced to provide password passwordRef: name: vmagent key: password ``` `VMAgent`: ``` apiVersion: operator.victoriametrics.com/v1beta1 kind: VMAgent metadata: name: example spec: selectAllByDefault: true scrapeInterval: 5s replicaCount: 1 remoteWrite: - url: "http://vmauth-vmauth-example:8427/api/v1/write" # note - the secret above is referenced as well basicAuth: username: name: vmagent key: username password: name: vmagent key: password ``` </details> Since both configs target exactly the same `Secret` object it is expected to work, but apparently the result will be `401 Unauthorized` error. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-19 10:31:48 +02:00
Nihal	9b7e532172	victoria-metrics: constant-time comparison of credentials like authkeys and basic auth credentials (#6423 ) Changes for constant-time comparison of credentials like authkeys and basic auth credentials. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392 --------- Signed-off-by: Syed Nihal <syed.nihal@nokia.com>	2024-06-19 09:36:56 +02:00
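For readers unfamiliar with the technique named in the commit above, here is a minimal sketch of a constant-time credential check using `crypto/subtle` from the Go standard library. The `credentialsEqual` helper is a hypothetical name for illustration. The log does not show how the commit implemented the comparison, so this should not be read as its actual code.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// credentialsEqual is a hypothetical helper. For inputs of equal length,
// subtle.ConstantTimeCompare takes time independent of where they differ.
// For inputs of different length it returns 0 immediately.
func credentialsEqual(got, want string) bool {
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

func main() {
	fmt.Println(credentialsEqual("secret", "secret"))
	fmt.Println(credentialsEqual("secret", "secreT"))
}
```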
Aliaksandr Valialkin	e498fa6960	app/vlinsert/syslog: allow accepting syslog messages with different configs at different ports	2024-06-17 23:16:34 +02:00
hagen1778	34771ab293	lib/streamaggr: remove accidentally committed changes Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-17 14:24:54 +02:00
Roman Khavronenko	6149adbe10	app/vmselect/promql: check for ranged vectors in aggr funcs if implicit conversions are disabled (#6450 ) Check for ranged vector arguments in aggregate expressions when `-search.disableImplicitConversion` or `-search.logImplicitConversion` are enabled. For example, `sum(up[5m])` will fail to execute if these flags are set. ### Describe Your Changes Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications. ### Checklist The following checks are mandatory: - [*] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-17 14:21:16 +02:00
Aliaksandr Valialkin	2b6a634ec0	lib/logstorage: work-in-progress	2024-06-17 12:13:18 +02:00
Andrii Chubatiuk	faf67aa8b5	lib/flagutil: use month limit for duration flag for parsed duration assessment (#6486 ) use maxMonths limit for parsed duration flag value https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6330 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-14 15:20:21 +02:00
Andrii Chubatiuk	e678a9aa51	lib/backup/s3remote: fixed credsFilePath flag (#6488 ) properly use credsFilePath flag value https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6353 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-14 14:13:02 +02:00
Roman Khavronenko	51d19485bb	lib/streamaggr: prevent `rate_sum` and `rate_avg` from producing NaNs (#6482 ) ### Describe Your Changes * check if `lastValue` was seen at least twice with different timestamps. Otherwise, the difference between last timestamp and previous timestamp could be `0` and will result in a `NaN` calculation * check if there are items left in lastValue map after staleness cleanup. Otherwise, `rate_avg` could produce a `NaN` result. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-14 10:06:22 +02:00
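The two guards listed in the commit above can be illustrated with a small sketch. The `rate` and `rateAvg` functions below are hypothetical and are not the lib/streamaggr code. They assume millisecond timestamps and report via a second return value whether a result could be computed, instead of returning `NaN`.

```go
package main

import "fmt"

// rate returns the per-second rate between two samples of one series.
// It reports false when the timestamps do not differ, since dividing
// by a zero time delta would yield NaN (0/0) or Inf.
func rate(prevValue, lastValue float64, prevTsMs, lastTsMs int64) (float64, bool) {
	if lastTsMs <= prevTsMs {
		return 0, false
	}
	seconds := float64(lastTsMs-prevTsMs) / 1000
	return (lastValue - prevValue) / seconds, true
}

// rateAvg averages per-series rates. It reports false for an empty
// input, since sum/0 would yield NaN.
func rateAvg(rates []float64) (float64, bool) {
	if len(rates) == 0 {
		return 0, false
	}
	sum := 0.0
	for _, r := range rates {
		sum += r
	}
	return sum / float64(len(rates)), true
}

func main() {
	r, ok := rate(10, 20, 1000, 3000)
	fmt.Println(r, ok)
	_, ok = rate(10, 10, 1000, 1000)
	fmt.Println(ok)
	_, ok = rateAvg(nil)
	fmt.Println(ok)
}
```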
Aliaksandr Valialkin	1c094d928c	lib/leveledbytebufferpool: do not pool byte slices bigger than 2^18 bytes Previously byte slices up to 2^20 bytes (i.e. 1MiB) were cached because of a typo in the commit `c14dafce43` . This could result in increased memory usage when vmagent scrapes many regular targets, which expose a relatively small number of metrics (e.g. up to a few thousand per target) and a few large targets such as kube-state-metrics, which expose more than 10 thousand metrics. This is a common case for Kubernetes monitoring. While at it, remove pools for very small byte slices, since they are rarely used during scraping.	2024-06-13 16:56:25 +02:00
Aliaksandr Valialkin	d54840f2f2	lib/bytesutil: optimize internStringMap cleanup - Make it in a separate goroutine, so it doesn't slow down regular intern() calls. - Do not lock internStringMap.mutableLock during the cleanup routine, since now it is called from a single goroutine and reads only the readonly part of the internStringMap. This should prevent locking regular intern() calls for new strings during cleanups. - Add jitter to the cleanup interval in order to prevent a synchronous increase in resource usage during cleanups. - Run the cleanup twice per -internStringCacheExpireDuration . This should save 30% CPU time spent on cleanup compared to the previous code, which was running the cleanup 3 times per -internStringCacheExpireDuration .	2024-06-13 15:06:51 +02:00
Zakhar Bessarab	34071ac660	lib/promscrape: increase default value for promscrape.maxDroppedTargets to 10_000 (#6459 ) ### Describe Your Changes This limit can be increased since after `4513893ead` tracking of dropped targets uses much less memory per entry. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6381#issuecomment-2156708228 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-06-12 16:34:18 +02:00
LHHDZ	3a45bbb4e0	app/vmauth: fix discovering backend IPs when `url_prefix` contains hostname with `srv+` prefix (#6401 ) This change fixes the following panic: ``` 2024-06-04T11:16:52.899Z warn app/vmauth/auth_config.go:353 cannot discover backend SRV records for http://srv+localhost:8080: lookup localhost on 10.100.10.4:53: server misbehaving; use it literally panic: runtime error: integer divide by zero goroutine 9 [running]: github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper.func1() /Users/lhhdz/wd/projects/go/VictoriaMetrics/lib/httpserver/httpserver.go:291 +0x58 panic({0x103115100?, 0x10338d700?}) /Users/lhhdz/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-arm64/src/runtime/panic.go:770 +0x124 main.getLeastLoadedBackendURL({0x0?, 0x22?, 0x1400014757b?}, 0x1400013c120?) /Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:473 +0x210 main.(*URLPrefix).getBackendURL(0x140000aa080) /Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:312 +0xb8 ``` --------- Co-authored-by: Haley Wang <haley@victoriametrics.com>	2024-06-12 12:30:44 +02:00
Aliaksandr Valialkin	8f5dc966f6	lib/logstorage: work-in-progress	2024-06-11 17:50:32 +02:00
Aliaksandr Valialkin	65a97317e4	lib/streamaggr: prevent from data race inside dedupAggrShard when samplesBuf can be updated in pushSamples() while their values are read in the flush() loop without das.mu lock This issue has been introduced in the commit `253c0cffbe`	2024-06-11 17:31:16 +02:00
Aliaksandr Valialkin	0521e58a09	lib/logstorage: work-in-progress	2024-06-10 18:42:19 +02:00
Aliaksandr Valialkin	bf2d299420	lib/streamaggr: return back string interning to dedupAggr after 78953723200f15ffc417064d1912bdbb7551505c It should reduce memory allocation rate during stream deduplication	2024-06-10 18:05:42 +02:00
Aliaksandr Valialkin	6a0a36aa93	lib/bytesutil: reduce the number of memory allocations per each interned string in bytesutil.InternString() from 5 to 1 This should reduce GC overhead when tens of millions of strings are interned (for example, during stream deduplication of millions of active time series).	2024-06-10 18:05:41 +02:00
Roman Khavronenko	cd1aca217c	lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451 ) All user input should be sanitized before rendering. This should prevent possible attacks. See https://github.com/VictoriaMetrics/VictoriaMetrics/security/code-scanning/203 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-10 16:55:59 +02:00
Aliaksandr Valialkin	253c0cffbe	lib/streamaggr: reduce memory allocations by using dedupAggrSample buffer per each dedupAggrShard	2024-06-10 16:38:42 +02:00
Aliaksandr Valialkin	a1e8003754	lib/streamaggr: reduce the number of duplicates per each sample in BenchmarkDedupAggr from 100 to 2 This is closer to typical production setups when deduplication is used for de-duplicating of 2 samples per series.	2024-06-10 16:38:41 +02:00
Aliaksandr Valialkin	0b7c47a40c	lib/streamaggr: use strings.Clone() instead of bytesutil.InternString() for creating series key in dedupAggr Our internal testing shows that this reduces GC overhead when deduplicating tens of millions of active series.	2024-06-10 16:08:34 +02:00
Aliaksandr Valialkin	e8bb4359bb	lib/streamaggr: improve performance for dedupAggr.sizeBytes() and dedupAggr.itemsCount() These functions are called every time `/metrics` page is scraped, so it would be great if they could be sped up for the cases when dedupAggr tracks tens of millions of active time series.	2024-06-10 15:59:37 +02:00
Aliaksandr Valialkin	f45d02a243	lib/streamaggr: remove flushState arg at dedupAggr.flush(), since it is always set to true in production	2024-06-10 15:59:33 +02:00
Hui Wang	61dce6f2a1	lib/httpserver: allow reloadAuthKey and configAuthKey to override htt… (#6338 ) …pAuth.* address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329, makes `reloadAuthKey`, `configAuthKey`, `flagsAuthKey`, `pprofAuthKey` behavior the same way, but keys like `-snapshotAuthKey`, `-forceMergeAuthKey` are still protected by httpAuth.*. All the available key are listed in https://docs.victoriametrics.com/single-server-victoriametrics/#security. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-10 12:09:47 +02:00
Aliaksandr Valialkin	36be090cd5	lib/streamaggr: follow-up for `7cb894a777` - Use bytesutil.InternString() instead of strings.Clone() for inputKey and outputKey in aggregatorpushSamples(). This should reduce string allocation rate, since strings can be re-used between aggrState flushes. - Reduce memory allocations at dedupAggrShard by storing dedupAggrSample by value in the active series map. - Remove duplicate call to bytesutil.InternBytes() at Deduplicator, since it is already called inside dedupAggr.pushSamples(). - Add missing string interning at rateAggrState.pushSamples(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6402	2024-06-07 16:27:26 +02:00
Roman Khavronenko	7cb894a777	lib/streamaggr: reduce number of inuse objects (#6402 ) The main change is getting rid of interning of sample key. It was discovered that for cases with many unique time series aggregated by vmagent interned keys could grow up to hundreds of millions of objects. This has negative impact on the following aspects: 1. It slows down garbage collection cycles, as GC has to scan all inuse objects periodically. The higher is the number of inuse objects, the longer it takes/the more CPU it takes. 2. It slows down the hot path of samples aggregation where each key needs to be looked up in the map first. The change makes code more fragile, but is supposed to provide performance optimization for heavy-loaded vmagents with stream aggregation enabled. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-06-07 15:45:52 +02:00
Roman Khavronenko	5f46f8a11d	lib/promrelabel: speedup label match by `__name__` (#6432 ) The change adds a fastpath for `equalValue` comparisons against `__name__` label by avoiding calls to `toCanonicalLabelName` func. This speedups matches by metric name like `'foo'`. See bench stats below: ``` benchcmp old.txt new.txt benchmark old ns/op new ns/op delta BenchmarkIfExpression/equal_label:_last-10 35.6 35.1 -1.18% BenchmarkIfExpression/equal_label:_middle-10 18.3 17.3 -5.41% BenchmarkIfExpression/equal_label:_first-10 1.20 1.24 +2.74% BenchmarkIfExpression/equal___name__:_last-10 10.1 4.96 -50.75% BenchmarkIfExpression/equal___name__:_middle-10 5.79 3.16 -45.41% BenchmarkIfExpression/equal___name__:_first-10 1.17 1.05 -9.76% ``` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-07 15:44:48 +02:00
Andrii Chubatiuk	185fac03b3	lib/streamaggr: metrics to track dropped, nan samples and samples lag (#6358 ) ### Describe Your Changes Added streamaggr metrics to: - `vm_streamaggr_samples_lag_seconds` - samples lag - `vm_streamaggr_ignored_samples_total{reason="nan"}` - ignored NaN samples - `vm_streamaggr_ignored_samples_total{reason="too_old"}` - ignored old samples	2024-06-06 14:06:11 +02:00
Aliaksandr Valialkin	55d8379ae6	lib/logstorage: work-in-progress	2024-06-06 12:27:05 +02:00
Aliaksandr Valialkin	80a7c65ab7	lib/logstorage: allow using `eval` keyword instead of `math` keyword in `math` pipe	2024-06-05 10:07:49 +02:00
Aliaksandr Valialkin	43cf221681	lib/logstorage: work-in-progress	2024-06-05 03:18:12 +02:00
pludov	3ddae77c63	lib/fs: support NFS implementations that return EEXIST instead of ENOTEMPTY (#6398 ) ### Describe Your Changes Fix for issue #6396: according to rmdir manpage, ENOTEMPTY and EEXIST should be treated equally https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6396 ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Co-authored-by: Ludovic Pollet <ludovic.pollet@exfo.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-06-04 15:17:38 +02:00
Aliaksandr Valialkin	96c29ab403	lib/logstorage: allow typing `asc` in `sort` pipe for the sake of consistency with `desc`	2024-06-04 02:29:10 +02:00
Aliaksandr Valialkin	539fce9227	lib/logstorage: work-in-progress	2024-06-04 01:49:02 +02:00
Aliaksandr Valialkin	b30e80b071	lib/logstorage: work-in-progress	2024-05-30 16:19:23 +02:00
Roman Khavronenko	b984f4672e	lib/storage: filter deleted label names and values from `/api/v1/labe… (#6342 ) …ls` and `/api/v1/label/.../values` Check for deleted metrics when `match[]` filter matches small number of time series (optimized path). The issue was introduced [v1.81.0](https://docs.victoriametrics.com/changelog_2022/#v1810). Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6300 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-29 14:07:44 +02:00
Aliaksandr Valialkin	1de187bcb7	lib/logstorage: work-in-progress	2024-05-29 01:52:13 +02:00
Aliaksandr Valialkin	0aafca29be	lib/logstorage: work-in-progress	2024-05-28 19:29:41 +02:00
Aliaksandr Valialkin	99138e15c0	lib/logstorage: fix golangci-lint warnings	2024-05-26 02:01:32 +02:00
Aliaksandr Valialkin	1e203f35f7	lib/logstorage: work-in-progress	2024-05-26 01:55:21 +02:00
Aliaksandr Valialkin	7ac529c235	lib/logstorage: work-in-progress	2024-05-25 22:59:13 +02:00
Aliaksandr Valialkin	0b629ce5a5	lib/logstorage: re-use per-shard fields across processed blocks in pipePackJSON and pipeUnroll	2024-05-25 22:13:32 +02:00
Aliaksandr Valialkin	dc55146752	lib/logstorage: work-in-progress	2024-05-25 21:36:16 +02:00
Aliaksandr Valialkin	e2590f0485	lib/logstorage: work-in-progress	2024-05-25 00:30:58 +02:00
Nikolay	69d244e6fb	lib/mergeset: adds tracking for indexdb records drop (#6297 ) It allows to create alert for possible item drops at indexdb. It may happen, if ingested metric size exceeds max indexdb item size. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-05-24 14:55:20 +02:00
Aliaksandr Valialkin	4b458370c1	lib/logstorage: work-in-progress	2024-05-24 03:06:55 +02:00
Nikolay	a5d1013042	lib/storage: change default value for maxLabelValueLen to 1024 (#6313 ) * It must reduce memory usage for misbehaving clients. Since VictoriaMetrics stores sparse index inmemory. * Reduce disk space usage for indexdb. * Prevent possible indexDB items drops. * It may trigger slow insert and new timeseries registration due to default value for flag change https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6176 --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-05-22 21:53:53 +02:00
Alexander Marshalov	7da541360e	[vmlogs] fixed time parsing with millisecond precision time (#6293 ) (#6295 ) fix for #6293 Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-05-22 21:46:50 +02:00
Aliaksandr Valialkin	22107421eb	lib/logstorage: work-in-progress	2024-05-22 21:01:20 +02:00
Roman Khavronenko	ac836bcf6c	lib/backup: add `-s3TLSInsecureSkipVerify` command-line flag (#6318 ) * The new flag can be used for skipping TLS certificates verification when connecting to S3 endpoint. Affects vmbackup, vmrestore, vmbackupmanager. * replace deprecated `EndpointResolver` with `BaseEndpoint` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1056 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-22 13:58:39 +02:00
Hui Wang	d7b5062917	app/vmalert: support DNS SRV record in `-remoteWrite.url` (#6299 ) part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053, supports [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) address in `-remoteWrite.url` command-line option.	2024-05-22 10:52:51 +02:00
Roman Khavronenko	7ce052b32d	lib/streamaggr: skip empty aggregators (#6307 ) Prevent excessive resource usage when stream aggregation config file contains no matchers by preventing data from being pushed into the Aggregators object. Before this change a lot of extra work was invoked without reason. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-20 14:03:28 +02:00
Aliaksandr Valialkin	bc4a0b8f37	lib/logstorage: fix golangci-lint warnings	2024-05-20 11:04:12 +02:00
Aliaksandr Valialkin	ad505a7a9a	lib/logstorage: work-in-progress	2024-05-20 04:08:30 +02:00
Andrii Chubatiuk	f153f54d11	app/vmagent: add global aggregator (#6268 ) Add global stream aggregation for VMAgent https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467	2024-05-17 14:00:47 +02:00
Nikolay	b2765c45d0	follow-up for `c6c5a5a186` (#6265 ) * adds datadog extensions for statsd: - multiple packed values (v1.1) - additional types distribution, histogram * adds type check and append metric type to the labels with special tag name `__statsd_metric_type__`. It simplifies streaming aggregation config. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-05-16 09:25:42 +02:00
Aliaksandr Valialkin	0aa19a2837	lib/logstorage: work-in-progress	2024-05-15 04:55:44 +02:00
Aliaksandr Valialkin	b617dc9c0b	lib/streamaggr: properly return output key from getOutputKey The bug has been introduced in `cc2647d212`	2024-05-14 17:47:21 +02:00
Aliaksandr Valialkin	da3af090c6	lib/logstorage: work-in-progress	2024-05-14 03:05:03 +02:00
Aliaksandr Valialkin	cb35e62e04	lib/logstorage: work-in-progress Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6258	2024-05-14 01:49:23 +02:00
Aliaksandr Valialkin	cc2647d212	lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit Change the return values for these functions - now they return the unmarshaled result plus the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling. This improves performance of these functions in hot loops of VictoriaLogs a bit.	2024-05-14 01:23:54 +02:00
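The calling convention described in the commit above (return the decoded value plus the number of bytes consumed, so the caller can re-slice `src`) can be sketched with `binary.Uvarint` from the Go standard library, which has the same return shape. This is only an illustration of the pattern. The `encodeAll` and `decodeAll` helpers are hypothetical, and the sketch makes no claim about the wire format or the exact signatures used by lib/encoding.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeAll appends every value to a single buffer as a varint.
func encodeAll(values []uint64) []byte {
	var dst []byte
	for _, v := range values {
		dst = binary.AppendUvarint(dst, v)
	}
	return dst
}

// decodeAll reads varints from src until it is exhausted.
// binary.Uvarint returns the value and the number of bytes read,
// so the loop advances by re-slicing src without extra bookkeeping.
func decodeAll(src []byte) ([]uint64, error) {
	var out []uint64
	for len(src) > 0 {
		v, n := binary.Uvarint(src)
		if n <= 0 {
			return nil, fmt.Errorf("cannot unmarshal varint from %d remaining bytes", len(src))
		}
		out = append(out, v)
		src = src[n:]
	}
	return out, nil
}

func main() {
	buf := encodeAll([]uint64{0, 1, 300, 1 << 40})
	values, err := decodeAll(buf)
	fmt.Println(values, err)
}
```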
Aliaksandr Valialkin	707f3a69db	lib/stringsutil: add LessNatural() function for natural sorting Natural sorting is needed for sort_by_label_natural() and sort_by_label_natural_desc() functions in MetricsQL - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6192 and https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6256 Natural sorting will be also used by `\| sort ...` pipe in VictoriaLogs - see https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe	2024-05-13 16:56:47 +02:00
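Natural sorting, as referenced in the commit above, orders digit runs by numeric value, so `file2` sorts before `file10`. The `lessNatural` function below is a simplified, hypothetical sketch of the idea for ASCII input. It is not the lib/stringsutil implementation, and the real function may treat edge cases (leading zeros, signs, non-ASCII digits) differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// splitDigits returns the leading run of ASCII digits in s and the rest.
func splitDigits(s string) (digits, rest string) {
	i := 0
	for i < len(s) && isDigit(s[i]) {
		i++
	}
	return s[:i], s[i:]
}

// lessNatural reports whether a sorts before b when digit runs are
// compared by numeric value and everything else byte by byte.
func lessNatural(a, b string) bool {
	for len(a) > 0 && len(b) > 0 {
		if isDigit(a[0]) && isDigit(b[0]) {
			da, restA := splitDigits(a)
			db, restB := splitDigits(b)
			// Without leading zeros, a longer digit run is a bigger
			// number, and equal-length runs compare lexicographically.
			da, db = strings.TrimLeft(da, "0"), strings.TrimLeft(db, "0")
			if len(da) != len(db) {
				return len(da) < len(db)
			}
			if da != db {
				return da < db
			}
			a, b = restA, restB
			continue
		}
		if a[0] != b[0] {
			return a[0] < b[0]
		}
		a, b = a[1:], b[1:]
	}
	return len(a) < len(b)
}

func main() {
	names := []string{"file10", "file2", "file1"}
	sort.Slice(names, func(i, j int) bool { return lessNatural(names[i], names[j]) })
	fmt.Println(names)
}
```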
Hui Wang	4c80b17027	storage: correctly apply `-inmemoryDataFlushInterval` when it's set t… (#6221 ) …o minimum supported value 1s pendingRowsFlushInterval was bumped to 2s in `73f0a805e2`	2024-05-13 16:44:30 +02:00
Andrii Chubatiuk	ce25d68b45	lib/streamaggr: added rate_sum and rate_avg to benchmarks, lint fix (#6264 ) fixed lint for rate outputs	2024-05-13 16:40:37 +02:00
Andrii Chubatiuk	9c3d44c8c9	lib/streamaggr: added rate and rate_avg output (#6243 ) Added `rate` and `rate_avg` output Resource usage is the same as for increase output, tested on a benchmark --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-05-13 15:39:49 +02:00
hagen1778	17283fab6c	lib/logstorage: make linter happy Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-13 15:35:11 +02:00
Aliaksandr Valialkin	9dbd0f9085	lib/logstorage: initial implementation of pipes in LogsQL See https://docs.victoriametrics.com/victorialogs/logsql/#pipes	2024-05-12 16:33:31 +02:00
Aliaksandr Valialkin	e66465cb03	lib/encoding: optimizing UnmarshalVarUint64 and UnmarshalVarInt64 a bit	2024-05-12 16:32:11 +02:00
Aliaksandr Valialkin	590160ddbb	lib/slicesutil: add helper functions for setting slice length and extending its capacity The added helper functions - SetLength() and ExtendCapacity() - replace error-prone code with simple function calls.	2024-05-12 11:32:17 +02:00
Aliaksandr Valialkin	f20d452196	lib/storage: remove outdated misleading comments	2024-05-12 10:24:04 +02:00
Roman Khavronenko	87fd400dfc	Feature allow configuring disableOnDiskQueue and dropSamplesOnOverload per url (#6248 ) * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): allow configuring `-remoteWrite.disableOnDiskQueue` and `-remoteWrite.dropSamplesOnOverload` cmd-line flags per each `-remoteWrite.url`. See this [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065). Thanks to @rbizos for implementation! * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add labels `path` and `url` to metrics `vmagent_remotewrite_push_failures_total` and `vmagent_remotewrite_samples_dropped_total`. Now the number of failed pushes and dropped samples can be tracked per `-remoteWrite.url`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Raphael Bizos <r.bizos@criteo.com>	2024-05-10 12:09:21 +02:00
Roman Khavronenko	8a03e987cb	lib/streamaggr: set correct suffix `<output>_prometheus` (#6228 ) Set correct suffix `<output>_prometheus` for aggregation outputs `increase_prometheus` and `total_prometheus` Before, outputs `total` and `total_prometheus` or `increase` and `increase_prometheus` had the same suffix. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-08 13:11:30 +02:00
Andrii Chubatiuk	a9283e06a3	streamaggr: made labels compressor shared (#6173 ) Though labels compressor is quite resource intensive, each aggregator and deduplicator instance has its own compressor. Made it shared across all aggregators to consume less resources while using multiple aggregators. Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>	2024-05-08 13:10:53 +02:00
Zhu Jiekun	17e3d019d2	feature: [vmagent] Add service discovery support for Vultr (#6068 ) Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041 Added: service discovery support for Vultr. Docs: `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated. Useful links: - Vultr API: https://www.vultr.com/api/#tag/instances/operation/list-instances - Vultr client SDK: https://github.com/vultr/govultr - Prometheus SD: https://github.com/prometheus/prometheus/tree/main/discovery/vultr Signed-off-by: Jiekun <jiekun.dev@gmail.com>	2024-05-08 10:01:48 +02:00
Oleg	c6c5a5a186	Statsd protocol compatibility (#5053 ) In this PR I added compatibility with [statsd protocol](https://github.com/b/statsd_spec) with tags to be able to send metrics directly from statsd clients to vmagent or directly to VM. For example, it's compatible with the [statsd-instrument](https://github.com/Shopify/statsd-instrument) and [dogstatsd-ruby](https://github.com/DataDog/dogstatsd-ruby) gems. Related issues: #5052, #206, #4600	2024-05-07 21:46:08 +02:00
Ted Possible	5a3abfa041	Exemplar support (#5982 ) This code adds Exemplars to VMagent and the promscrape parser adhering to OpenMetrics Specifications. This will allow forwarding of exemplars to Prometheus and other third party apps that support OpenMetrics specs. --------- Signed-off-by: Ted Possible <ted_possible@cable.comcast.com>	2024-05-07 12:09:44 +02:00
Zakhar Bessarab	329c3cbdf0	lib/mergeset: improve test coverage (#6118 ) Add test to cover the code path with overflowing shards buffers and triggering merge to partition. This test covers the code path which led to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-04-30 10:21:37 +02:00
hagen1778	381d4494e9	Merge tag 'v1.101.0' into pmm-6401-read-prometheus-data-files v1.101.0 Signed-off-by: hagen1778 <roman@victoriametrics.com> # gpg: Signature made Thu 25 Apr 2024 16:52:11 CEST # gpg: using RSA key 9212FA37DBE64938E0D154953BF75F3741CA9640 # gpg: Good signature from "hagen1778 (VM GPG key) <roman@victoriametrics.com>" [ultimate] # Conflicts: # go.mod	2024-04-26 13:30:14 +02:00
hagen1778	679844feaf	Revert "app/vmbackup: introduce new flag type URL (#6152 )" This reverts commit `029060af60`.	2024-04-24 13:47:57 +02:00
Roman Khavronenko	029060af60	app/vmbackup: introduce new flag type URL (#6152 ) The new flag type is supposed to be used for specifying URL values which could contain sensitive information such as auth tokens in GET params or HTTP basic authentication. The URL flag also allows loading its value from files if `file://` prefix is specified. As example, the new flag type was used in app/vmbackup as it requires specifying `authKey` param for making the snapshot. See related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973 Thanks to @wasim-nihal for initial implementation https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6060 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-04-24 10:57:54 +02:00
hagen1778	bae3874e6a	app/streamaggr: follow-up after `c0e4ccb7b5` * rm vmagent mentions from vminsert flags * improve documentation wording, add links to related sections * mention `ignore_first_intervals` in the stream aggr options * update flags description * add basic test for config parsing validation Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-04-22 14:22:59 +02:00
Andrii Chubatiuk	c0e4ccb7b5	lib/streamaggr: add option to ignore first N aggregation intervals (#6137 ) Stream aggregation may yield inaccurate results if it processes incomplete data. This issue can arise when data is sourced from clients that maintain a queue of unsent data, such as Prometheus or vmagent. If the queue isn't fully cleared within the aggregation interval, only a portion of the time series may be included in that period, leading to distorted calculations. To mitigate this we add an option to ignore first N aggregation intervals. It is expected, that client queues will be cleared during the time while aggregation ignores first N intervals and all subsequent aggregations will be correct.	2024-04-22 13:52:04 +02:00
Aliaksandr Valialkin	4770294732	lib/protoparser: substitute hybrid channel-based pools with plain sync.Pool Using plain sync.Pool simplifies the code without increasing memory usage and CPU usage. So it is better to use plain sync.Pool from readability and maintainability PoV. This is a follow-up for `8942f290eb`	2024-04-20 21:59:51 +02:00
Aliaksandr Valialkin	7531e9084a	all: use clear() built-in Go function for clearing []prompbmarshal.TimeSeries and []prompbmarshal.Label slices This makes the code a bit clearer.	2024-04-20 21:00:03 +02:00
Aliaksandr Valialkin	6b1cc9b946	lib/storage: search for all the values for the given label before applying filters and limits It is incorrect applying the limit on the number of values to search without applying filters, since the returned subset of label values may miss the label values matching the given filters. This is a follow-up for `66630c7960`	2024-04-18 20:29:36 +02:00
Aliaksandr Valialkin	2e3580905f	all: replace old https://docs.victoriametrics.com/relabeling.html url with the new one - https://docs.victoriametrics.com/relabeling/	2024-04-18 03:22:22 +02:00
Aliaksandr Valialkin	e9642e99f2	all: replace old https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html url with the new one - https://docs.victoriametrics.com/single-server-victoriametrics/	2024-04-18 03:11:03 +02:00
Aliaksandr Valialkin	828e78ceb4	all: replace old https://docs.victoriametrics.com/sd_configs.html url with the new one - https://docs.victoriametrics.com/sd_configs/	2024-04-18 02:27:47 +02:00
Aliaksandr Valialkin	4d2b9fe6b2	all: replace old https://docs.victoriametrics.com/stream-aggregation.html url with the new one - https://docs.victoriametrics.com/stream-aggregation/	2024-04-18 02:19:11 +02:00
Aliaksandr Valialkin	6e6bae3e8d	all: replace old https://docs.victoriametrics.com/vmbackup.html url with the new one - https://docs.victoriametrics.com/vmbackup/	2024-04-18 01:57:04 +02:00
Aliaksandr Valialkin	c81a633b02	all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/	2024-04-18 01:31:37 +02:00
Aliaksandr Valialkin	66630c7960	lib/storage: improve performance for /api/v1/label/labelName/values when match[] contains only a single filter on labelName This speeds up auto-suggestion for metric names in VMUI and Grafana, which use the following query in this case: /api/v1/label/__name__/values?match[]={__name__=~".some_value."} When the user types `some_value` in the query input field.	2024-04-18 01:15:20 +02:00
Aliaksandr Valialkin	50ac22df78	lib/httpserver: add support for automatic issuing of TLS certificates via Lets Encrypt service Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5949	2024-04-17 23:50:57 +02:00
Aliaksandr Valialkin	bd454f5063	lib/netutil: move creation of GetCertificate callback into a separate function This improves code readability a bit	2024-04-17 22:10:43 +02:00
Aliaksandr Valialkin	dc326f70b4	app/vmagent: support for DNS SRV urls at -remoteWrite.url, scrape target urls and service discovery urls Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053	2024-04-17 20:54:39 +02:00
Aliaksandr Valialkin	b426d10847	app/vmauth: add support for configuring backends via DNS SRV urls	2024-04-17 20:46:22 +02:00
Aliaksandr Valialkin	e3a26c0db6	lib/promscrape/discovery/consul: typo fix in the comment: enteprise -> enterprise	2024-04-16 19:34:18 +02:00
Aliaksandr Valialkin	85d09e5a2d	lib/{mergeset,storage}: log deleting directories inside partitions if they are missing in parts.json This should improve debuggability of unexpected deletion of directories inside partitions. While at it, log the proper path to parts.json when the directory for big part is missing in the partition. parts.json is located inside directory with small parts, and there is no parts.json file inside directory with big parts.	2024-04-16 19:11:32 +02:00
Aliaksandr Valialkin	6bcc6c938b	lib/storage: improve comments inside functions responsible for creating indexes for newly registered time series	2024-04-16 19:11:32 +02:00
Zakhar Bessarab	2205de2391	lib/mergeset: fix flushing incorrect set of inmemoryBlocks (#6089 ) Follow-up for `bace9a2501` Related: - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6069 - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-04-11 09:26:06 +02:00
wanshuangcheng	83216e956c	chore: fix function names in comment (#6076 ) Signed-off-by: wanshuangcheng <wanshuangcheng@outlook.com>	2024-04-08 01:11:12 -07:00
Aliaksandr Valialkin	b7b731d340	Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files	2024-04-04 03:49:49 +03:00
Aliaksandr Valialkin	f8d10a7106	lib/streamaggr: update the minimum allowed timestamp for incoming samples before flushing the samples to the storage This should prevent from dropping samples with old timestamps during long flushes. This is a follow-up for `1cedaf61cb`	2024-04-04 02:25:51 +03:00
Aliaksandr Valialkin	967d5496cf	app/vmagent: follow-up for `b3b29ba6ac` - Automatically reload changed TLS root CA pointed by -remoteWrite.tlsCAFile command-line flag - Automatically reload changed TLS root CA configured via oauth2.tls_config.ca_file option at -promscrape.config - Document the change as a feature instead of a bug at docs/CHANGELOG.md - Simplify the code at lib/promauth, which is responsible for reloading changed TLS root CA files. - Simplify the usage of lib/promauth.Config.NewRoundTripper() - now it accepts the base http.Transport instead of a callback, which can change the internal http.Transport. - Reuse the default tls config if lib/promauth.Config doesn't contain tls-specific configs. This should reduce memory usage a bit when tls isn't used for scraping big number of targets. - Do not re-read TLS root CA files on every processed request. Re-read them once per second. This should reduce CPU usage when scraping big number of targets over https. - Do not store cert.pem and key.pem files in TestTLSConfigWithCertificatesFilesUpdate, since they can be loaded from byte slices via crypto/tls.X509KeyPair(). - Remove obsolete comparisons of string representations for authConfig and proxyAuthConfig at areEqualScrapeConfigs(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5725 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171	2024-04-04 01:27:35 +03:00
Zakhar Bessarab	f80ac120f3	lib/promscrape/config: fix missing timeout for http client (#6063 ) Follow-up for `b3b29ba6` Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-04-03 18:18:48 +02:00
Zakhar Bessarab	b3b29ba6ac	lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk (#5725 ) * lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk Added a custom `http.RoundTripper` implementation which checks for root CA content changes and updates `tls.Config` used by `http.RoundTripper` after detecting CA change. Client certificate changes are not tracked by this implementation since `tls.Config` already supports passing certificate dynamically by overriding `tls.Config.GetClientCertificate`. This change implements dynamic reload of root CA only for streaming client used for scraping. Blocking client (`fasthttp.HostClient`) does not support using custom transport so can't use this implementation. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promauth/config: update NewRoundTripper API Update API to allow user to update only parameters required for transport. Add warning log when reloading Root CA failed. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promauth/config: fix mutex acquire logic Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promauth/config: replace RWMutex with regular mutex to simplify the code - remove additional mutex used for getRootCABytes - require callee to use mutex - replace RWMutex with regular mutex Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/promauth/config: refactor - hold the mutex lock to avoid round tripper being re-created twice - move recreation logic into separate func to simplify the code Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-04-03 10:01:43 +02:00
Aliaksandr Valialkin	fb42380ef3	lib/protoparser/opentelemetry: follow-up after `47892b4a4c` - Rename -opentelemetry.sanitizeMetrics command-line flag to more clear -opentelemetry.usePrometheusNaming - Clarify the description of the change at docs/CHANGELOG.md - Rename promrelabel.SanitizeLabelNameParts to more clear promrelabel.SplitMetricNameToTokens - Properly split metric names at '_' char in promrelabel.SplitMetricNameToTokens. - Add tests for various edge cases for Prometheus metric names' normalization according to the code at `b865505850/pkg/translator/prometheus/normalize_name.go` - Extract the code responsible for Prometheus metric names' normalization into a separate file (santize.go) Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6035	2024-04-03 02:25:29 +03:00
Aliaksandr Valialkin	918cccaddf	all: fix golangci-lint(revive) warnings after `0c0ed61ce7` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001	2024-04-02 23:16:29 +03:00
Aliaksandr Valialkin	c3a72b6cdb	lib/storage: consistently use stopCh instead of stop	2024-04-02 21:24:57 +03:00
Aliaksandr Valialkin	904e95fc69	app/vmagent: simplify code after `509df44d03` - Simplify the code in order to improve its maintenance - Properly pass tenant ID when processing multi-tenant opentelemetry request at vmagent Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6016	2024-04-02 17:58:13 +03:00
Aliaksandr Valialkin	c79bf3925c	Revert "app/vmselect: make vmselect resilient to absence of cache folder (#5987 )" This reverts commit `cb23685681`. Reason for revert: the "fix" may hide programming bugs related to incorrect creation of folders before their use. This may complicate detecting and fixing such bugs in the future. There are the following fixes for the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 : - Configure the OS not to drop data from the system-wide temporary directory (aka /tmp). - Run VictoriaMetrics with the -cacheDataPath command-line flag pointing to a directory that cannot be removed automatically by the OS. The case when the user accidentally deletes the directory with some files created by VictoriaMetrics shouldn't be considered as expected, so VictoriaMetrics shouldn't try resolving this case automatically. From the operation and debuggability PoV it is much better to crash with the clear `directory doesn't exist` error in this case.	2024-03-30 07:29:24 +02:00
Aliaksandr Valialkin	830b871baf	app/vmagent: properly shutdown when -maxIngestionRate limit is reached The remotewrite.Stop() expects that there are no pending calls to TryPush(). This means that the ingestionRateLimiter.Register() must be unblocked inside TryPush() when calling remotewrite.Stop(). Provide remotewrite.StopIngestionRateLimiter() function for unblocking the rate limiter before calling the remotewrite.Stop(). While at it, move the rate limiter into lib/ratelimiter package, since it has two users. Also move the description of the feature to the correct place at docs/CHANGELOG.md. Also cross-reference -remoteWrite.rateLimit and -maxIngestionRate command-line flags. This is a follow-up for `02bccd1eb9` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5900	2024-03-30 06:43:48 +02:00
Zakhar Bessarab	af3922b1df	lib/storage: add ability to use downsampling for the given series filter (#733 ) * lib/storage: add ability to use downsampling for the given series filter Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs: add information about downsampling filters Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs: fix MetricsQL filter Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/downsampling: treat missing downsampling filter as a bug Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/part_header: verify correctness of downsampling filters when opening partition Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/downsampling: save only appliable rules in part metadata Filter and save only rules which are appliable to partition based on MinTimestamp of stored data. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/downsampling: update log messages for final dedup Properly specify a reason of re-running deduplication for partition. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage: consistently use MaxTimestamp to determine deduplication/downsampling rules Using MinTimestamp leads to applying downsampling to parts which are only partially covered by downsampling rule. For example, partition covers range [1000-2000]. At t=2100 and rule offset 500 data with t=2100-500 => 1600 must be downsampled. The range check against MinTimestamp evaluates to true even though partition contains range which must not be downsampled - [1600:2000]. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * Follow-up - Apply the first matching downsampling period if multiple filters match the given time series. This allows fine-tuning the downsampling config for the specific needs. - Take into account downsampling filters during search queries. 
- Reduce the difference between community and enterprise branches. This should simplify further maintenance of these branches. - Properly parse series filters with colons inside them. - Document the feature at docs/CHANGELOG.md. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4960 --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-03-30 04:12:23 +02:00
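The MinTimestamp versus MaxTimestamp example from `af3922b1df` can be worked through in a few lines of Go. This is only a sketch of the arithmetic in the commit message; the `part` type and both helper functions are hypothetical names and do not reflect the real `lib/storage` code.

```go
package main

import "fmt"

// part is a hypothetical stand-in for a partition covering
// the closed timestamp range [minTimestamp, maxTimestamp].
type part struct {
	minTimestamp int64
	maxTimestamp int64
}

// eligibleByMin mimics the old check: the part is selected as soon as
// its oldest sample is older than the downsampling cutoff (now - offset).
func eligibleByMin(p part, now, offset int64) bool {
	return p.minTimestamp <= now-offset
}

// eligibleByMax mimics the corrected check: the part is selected only when
// all of its samples are older than the downsampling cutoff.
func eligibleByMax(p part, now, offset int64) bool {
	return p.maxTimestamp <= now-offset
}

func main() {
	p := part{minTimestamp: 1000, maxTimestamp: 2000}
	now, offset := int64(2100), int64(500) // cutoff is 2100-500 = 1600

	// The old check selects the part although (1600, 2000] must stay untouched.
	fmt.Println("by MinTimestamp:", eligibleByMin(p, now, offset))
	// The corrected check waits until the whole part is past the cutoff.
	fmt.Println("by MaxTimestamp:", eligibleByMax(p, now, offset))
}
```

With the values from the commit message the first check returns true and the second returns false, which matches the described bug and its fix.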
Aliaksandr Valialkin	131f357098	lib/storage/table.go: reduce the difference with enterprise branch	2024-03-30 03:22:51 +02:00
Aliaksandr Valialkin	4001ca36b8	lib/storage/partition.go: reduce code difference a bit with enterprise branch	2024-03-30 01:39:27 +02:00
Nikolay	a05303eaa0	lib/storage: adds metrics for downsampling (#382 ) * lib/storage: adds metrics for downsampling vm_downsampling_partitions_scheduled - shows the number of parts that must be downsampled vm_downsampling_partitions_scheduled_size_bytes - shows the total size in bytes for parts that must be downsampled These two metrics answer the questions - is downsampling running? How many parts are scheduled for downsampling, how many of them are currently downsampled, and how much storage space do they occupy? https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612 * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-03-30 01:11:49 +02:00
Andrii Chubatiuk	47892b4a4c	opentelemetry: added cmd flag to sanitize metric names (#6035 )	2024-03-29 13:51:24 +01:00
Aliaksandr Valialkin	4a359d5f67	lib/storage: follow-up for `76f00cea6b` Store the deadline when the metricID entries must be deleted from indexdb if metricID->metricName entry isn't found after the deadline. This should make the code clearer compared to the previous version, where the timestamp of the first metricID->metricName lookup miss was stored in missingMetricIDs. Remove the misleading comment about the importance of the order for creating entries in the inverted index when registering new time series. The order doesn't matter, since any subset of the created entries can become visible for search before any other subset after registering in indexdb. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959	2024-03-27 11:41:28 +02:00
Zakhar Bessarab	51f5ac1929	lib/storage/table: wait for merges to be completed when closing a table (#5965 ) * lib/storage/table: properly wait for force merges to be completed during shutdown Properly keep track of running background merges and wait for merges completion when closing the table. Previously, force merge was not in sync with overall storage shutdown which could lead to holding ptw ref. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs: add changelog entry Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-03-26 13:49:09 +01:00
Andrii Chubatiuk	509df44d03	app/{vmagent,vminsert}: fixed firehose response (#6016 )	2024-03-26 13:20:41 +01:00
Roman Khavronenko	cb23685681	app/vmselect: make vmselect resilient to absence of cache folder (#5987 ) vmselect uses a cache folder in file system for two purposes: 1. Storing rollup cache results on shutdown; 2. Storing temporary search results from vmstorage during query executions. It could happen that the cache folder is deleted accidentally by the user, or by the OS during cleanup routines. This would cause vmselect to: 1. panic on /metrics call, because `MustGetFreeSpace` will fail; 2. return a query error to the user, as it won't be able to store temporary search results. The changes in this commit are the following: 1. Make `MustGetFreeSpace` try re-creating the cache folder if it is missing; 2. Make vmselect try re-creating the cache folder if it can't persist tmp search results. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-03-26 12:59:50 +01:00
hagen1778	e6dd52b04c	lib/promauth: follow-up `b577413d3b` Convert test result expectations to canonical form. Starting from `b577413d3b` specified header keys are forced into canonical form https://pkg.go.dev/net/http#CanonicalHeaderKey Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-03-18 11:12:45 +01:00
Aliaksandr Valialkin	4553521f9a	lib/streamaggr: ignore out of order samples for `last` output This is a follow-up for `6a465f6e29` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931	2024-03-18 01:03:36 +02:00
Aliaksandr Valialkin	76f00cea6b	lib/storage: wait for up to 60 seconds before deciding to delete metricID entries from indexdb if metricID->metricName entry is missing during search The metricID->metricName entry can remain invisible for search for some time after registering new metricName. This is expected condition. So wait for up to 60 seconds in the hope that the metricID->metricName entry will become visible before deleting all the entries from indexdb, which are associated with the given metricID. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948 See also `20812008a7`	2024-03-18 00:34:32 +02:00
Aliaksandr Valialkin	729b263670	lib/httputils: rename CAFile -> caFile in order to be consistent with local var naming in Go This is a follow-up for `83e55456e2`	2024-03-17 23:19:52 +02:00
Aliaksandr Valialkin	1cedaf61cb	app/{vmagent,vminsert}: add an ability to ignore input samples outside the current aggregation interval for stream aggregation See https://docs.victoriametrics.com/stream-aggregation.html#ignoring-old-samples	2024-03-17 23:03:47 +02:00
Aliaksandr Valialkin	6a465f6e29	lib/streamaggr: ignore out of order samples when calculating increase, increase_prometheus, total and total_prometheus outputs Out of order samples may result in unexpected spikes for these outputs. So it is better to ignore such samples. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931	2024-03-17 22:03:03 +02:00
Aliaksandr Valialkin	cbd80efcc1	lib/streamaggr: follow-up for `15e33d56f1` - Properly set pushSample.timestamp when flushing de-duplicated samples to stream aggregation This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931 - Re-classify this change as feature instead of bugfix at docs/CHANGELOG.md - Verify de-duplication logic for samples with different timestamps Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5939	2024-03-17 21:37:16 +02:00
Aliaksandr Valialkin	b577413d3b	lib/promauth: properly set `Host` header in requests to scrape targets. The `Host` header must be set via net/http.Request.Host field, since net/http.Client ignores this header if it is set via Request.Header.Set(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5969 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5970	2024-03-17 20:22:54 +02:00
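The `Host` header behaviour mentioned in `b577413d3b` can be observed with the standard library alone. The sketch below serializes a request with `net/http.Request.Write` and inspects the `Host` line; the target address `10.0.0.1:8080`, the host name `example.internal` and both helper functions are made up for illustration and are not taken from `lib/promauth`.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

// wireHost serializes req in HTTP/1.1 wire format and returns the value
// of the Host line that would be sent to the server.
func wireHost(req *http.Request) (string, error) {
	var buf bytes.Buffer
	if err := req.Write(&buf); err != nil {
		return "", err
	}
	for _, line := range strings.Split(buf.String(), "\r\n") {
		if strings.HasPrefix(line, "Host: ") {
			return strings.TrimPrefix(line, "Host: "), nil
		}
	}
	return "", fmt.Errorf("no Host line found")
}

// newScrapeRequest builds a GET request to a hypothetical scrape target and
// overrides the host either via the Request.Host field or via the header map.
func newScrapeRequest(viaField bool) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, "http://10.0.0.1:8080/metrics", nil)
	if err != nil {
		return nil, err
	}
	if viaField {
		req.Host = "example.internal" // honored when the request is written
	} else {
		req.Header.Set("Host", "example.internal") // skipped when the request is written
	}
	return req, nil
}

func main() {
	for _, viaField := range []bool{false, true} {
		req, err := newScrapeRequest(viaField)
		if err != nil {
			panic(err)
		}
		host, err := wireHost(req)
		if err != nil {
			panic(err)
		}
		fmt.Printf("viaField=%v -> Host: %s\n", viaField, host)
	}
}
```

When the override is placed only in the header map, the serialized request still carries the host from the URL, which is why the value has to go into `Request.Host`.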
Andrii Chubatiuk	15e33d56f1	lib/streamaggr: pick sample with bigger timestamp or value on deduplicator (#5939 ) Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication This would require more memory for deduplication, since we need to track timestamp for each record. However, deduplication should become more consistent. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643 --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-03-12 22:47:29 +01:00
Aliaksandr Valialkin	d1d2771bee	lib/storage: optimize /api/v1/labels and /api/v1/label/.../values when match[] contains metric name Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055	2024-03-12 02:43:16 +02:00


3161 commits