Commit graph

3161 commits

Author SHA1 Message Date
Aliaksandr Valialkin
202eb429a7
lib/logstorage: refactor storage format to be more efficient for querying wide events
It has been appeared that VictoriaLogs is frequently used for collecting logs with tens of fields.
For example, standard Kuberntes setup on top of Filebeat generates more than 20 fields per each log.
Such logs are also known as "wide events".

The previous storage format was optimized for logs with a few fields. When at least a single field
was referenced in the query, then the all the meta-information about all the log fields was unpacked
and parsed per each scanned block during the query. This could require a lot of additional disk IO
and CPU time when logs contain many fields. Resolve this issue by providing an (field -> metainfo_offset)
index per each field in every data block. This index allows reading and extracting only the needed
metainfo for fields used in the query. This index is stored in columnsHeaderIndexFilename ( columns_header_index.bin ).
This allows increasing performance for queries over wide events by 10x and more.

Another issue was that the data for bloom filters and field values across all the log fields except of _msg
was intermixed in two files - fieldBloomFilename ( field_bloom.bin ) and fieldValuesFilename ( field_values.bin ).
This could result in huge disk read IO overhead when some small field was referred in the query,
since the Operating System usually reads more data than requested. It reads the data from disk
in at least 4KiB blocks (usually the block size is much bigger in the range 64KiB - 512KiB).
So, if 512-byte bloom filter or values' block is read from the file, then the Operating System
reads up to 512KiB of data from disk, which results in 1000x disk read IO overhead. This overhead isn't visible
for recently accessed data, since this data is usually stored in RAM (aka Operating System page cache),
but this overhead may become very annoying when performing the query over large volumes of data
which isn't present in OS page cache.

The solution for this issue is to split bloom filters and field values across multiple shards.
This reduces the worst-case disk read IO overhead by at least Nx where N is the number of shards,
while the disk read IO overhead is completely removed in best case when the number of columns doesn't exceed N.
Currently the number of shards is 8 - see bloomValuesShardsCount . This solution increases
performance for queries over large volumes of newly ingested data by up to 1000x.

The new storage format is versioned as v1, while the old storage format is version as v0.
It is stored in the partHeader.FormatVersion.

Parts with the old storage format are converted into parts with the new storage format during background merge.
It is possible to force merge by querying /internal/force_merge HTTP endpoint - see https://docs.victoriametrics.com/victorialogs/#forced-merge .
2024-10-16 17:35:07 +02:00
Andrii Chubatiuk
daa7183749
lib/protoparser/influx: enable batch processing by default (#7165)
### Describe Your Changes

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7090

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-10-15 11:48:40 +02:00
Aliaksandr Valialkin
bac193e50b
app/vlselect: do not show empty fields in query results
Empty fields are treated as non-existing fields by VictoriaLogs data model.
So there is no sense in returning empty fields in query results, since they may mislead and confuse users.
2024-10-14 23:43:58 +02:00
Aliaksandr Valialkin
3c73dbbacc
app/vlstorage: add support for forced merge via /internal/force_merge HTTP endpoint 2024-10-13 22:20:31 +02:00
Aliaksandr Valialkin
b4b79a4961
lib/logstorage: make a copy of s.partitions slice when performing queries over the selected partitions
s.partitions can be changed when new partition is registered or when old partition is dropped.
This could lead to data races and panics when s.partitions slice is accessed by concurrently executed queries.

The fix is to make a copy of the selected partitions under s.partitionsLock before performing the query.
2024-10-13 22:14:34 +02:00
Aliaksandr Valialkin
507b206a7d
lib/logstorage: move getConstColumnValue() and getColumnHeader() methods from columnsHeader to blockSearch
This localizes blockSearch.getColumnsHeader() call at block_search.go .
This call is going to be optimized in the next commits in order to avoid
unmarshaling of header data for unneeded columns, which weren't requested
by getConstColumnValue() / getColumnHeader().
2024-10-13 14:29:02 +02:00
Aliaksandr Valialkin
279e25e7c8
lib/logstorage: avoid redundant copying of column names and column values for dictionary-encoded columns during querying
Refer the original byte slice with the marshaled columnsHeader for columns names and dictionary-encoded column values.
This improves query performance a bit when big number of blocks with big number of columns are scanned during the query.
2024-10-13 13:25:38 +02:00
Aliaksandr Valialkin
9e48074b59
lib/logstorage: avoid calling columnsHeader.initFromBlockHeader() multiple times for the same blockSearch
This should improve performance when blockSearch.getColumnsHeader() is called multiple times
from different places of the code.
2024-10-13 12:56:12 +02:00
Aliaksandr Valialkin
867f671cc4
lib/logstorage: make sure that bs.br is non-nil before checking br.bs.bsw.bh.rowsCount there
br.bs may be nil when br contains the block with additional filters applied during pipe calculations.
For example, `* | count() if (error) errors`.
2024-10-12 20:51:29 +02:00
Andrii Chubatiuk
9eb0c1fd86
lib/protoparser/opentelemetry: added exponential histograms support (#6354)
### Describe Your Changes

added opentelemetry exponential histograms support. Such histograms are automatically converted into
VictoriaMetrics histogram with `vmrange` buckets.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-10-11 13:44:52 +02:00
Aliaksandr Valialkin
7b475ed95d
lib/logstorage: disallow using pipe names as the first unquoted words in filter pipe
Improperly written pipes could be silently parsed as filter pipe.
For example, the following query:

   * | by (x)

was silently parsed to:

   * | filter "by" x

It is better to return error, so the user could identify and fix invalid pipe
instead of silently executing invalid query with `filter` pipe.
2024-10-09 16:10:13 +02:00
Aliaksandr Valialkin
6acf543b90
lib/logstorage: disallow using by as the first word in log filters, since it frequently clashes with stats by(...) pipe where stats word is omitted 2024-10-09 15:53:15 +02:00
Zakhar Bessarab
eefae85450
vmagent: add support of HTTP2 client for Kubernetes SD (#7114)
### Describe Your Changes

Currently, vmagent always uses a separate `http.Client` for every group
watcher in Kubernetes SD. With a high number of group watchers this
leads to large amount of opened connections.

This PR adds 2 changes to address this:
- re-use of existing `http.Client` - in case `http.Client` is connecting
to the same API server and uses the same parameters it will be re-used
between group watchers
- HTTP2 support - this allows to reuse connections more efficiently due
to ability of using streaming via existing connections.

See this issue for the details and test results -
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5971

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-10-08 10:36:31 +02:00
Aliaksandr Valialkin
89686094a0
lib/logstorage: allow special chars in unquoted _stream tag names and values
This simplifies writing _stream filters. For example,

{foo-bar=abc:de}

can be written instead of

{"foo-bar"="abc:de"}
2024-10-07 15:10:03 +02:00
Aliaksandr Valialkin
462b7cd597
lib/logstorage: quote logfmt strings only if they contain special chars, which could break logfmt parsing and/or reading 2024-10-07 14:31:30 +02:00
Artem Fetishev
c1cd3e85a7
lib/promscrape: Fix TestClientProxyReadOk flaky test (#7173)
This PR fixes #7062 

For hijacked connections, one has to read from the connection buffer,
but still write directly to the connection. Otherwise, when reading
directly from such connections, the first byte may be lost. This, in
turn corrupts the ClientHello TLS handshake message and when the backend
server receives it, it closes the connection and reports the following
error in the log:

```
http: TLS handshake error from 127.0.0.1:33150: tls: first record does not look
like a TLS handshake
```

The first byte may be lost because underlying HTTP request handler may
read it from the connection and put it into the buffer. As the result,
subsequent connection reads won't see that byte.

-   See: https://github.com/golang/go/issues/27408
-   The fix is taken from : https://github.com/k3s-io/k3s/pull/6216

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2024-10-03 18:27:15 +02:00
Aliaksandr Valialkin
364f084b43
lib/logstorage: add len pipe for calculating byte length of log field values 2024-10-03 18:21:10 +02:00
hagen1778
2404b4bc00 v1.104.0
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkhL6N9vmSTjg0VSVO/dfN0HKlkAFAmb9KqIACgkQO/dfN0HK
 lkAXcBAAluUXBa4oYV4g/xvsd30oXtC79DoFY527K1cTgesDohf0FdLGU++7Aphm
 efOR8BaytBPGHGn9PmuZIbebiFv6TVBih7b8gl+frm/yGLh/1WyAYp2sClB1KcJa
 r7rHBMF7sikDkLPFlJv9qYhERj05aUTc/uwWn7KzUMPbmUZcXOJhxttm1Hf7Rc6P
 zcO1cymSEouzSOw0qoHFHRZYgkt9j1GW36vUgEX6+b3VJvOAhoaolw6OX65wt8Cm
 +YdXW51gEalZRIRNtgY3lDJnCAHn72RsRbLpylyGW1TcuBnwfSIWlPpLU04IGVlx
 06Vl47o/6vEBoVKk+2Y6La4iwD8+x/Td1RlrELOo4Qzrv1ppqOCveUa0wh6JQfjB
 aQawE7Yzh35qKvRVZtgY8NaUzkTL2QISlnpkokHfZZLIn6WAhok4c+vxnCl5CaBE
 3yRenqZ/OdMs+Wa8WMb6thcxA0eQ40t3B35iYyvMJdhSKDtdNT2F5kFh7ve6Woiu
 2TmN+GWPM0zBMVEVGy1i1L+42dlG6ANY3p5a8vz0qfqBBJF+V+P/BetfejTPjJ7r
 PN6HpdcfN+a+FGsUWckhFSU7z0LFJIytQyxb6vGn5N1UW0pupQMs5E/jFcFJl2/Q
 yO8WZmGm4QfhupcfgAkTIBgsUliqmIBXsNk6sjhzwbxYBtXSqwQ=
 =F+22
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkhL6N9vmSTjg0VSVO/dfN0HKlkAFAmb9XnUACgkQO/dfN0HK
 lkAYJw/8D+fAkqo48iNynRZf8N7kR22XBVc+zvJIFIL8wyScMec3o6+rRQ5WmmEk
 MxT73EWW2Nv81L4JC585u3zutM7Ow7nG1paxrF2hWLNAniKJd+Z+okRWThf89c8Q
 IC2egeVtgQ9ADNBTNGF72FsBBj+P6rv3Xe/M0XSLCS4mLY1eVnhdx7yuQsSNkzpr
 hxndq5odwEprFNXe9WEgH04ekS3u0ZMzWidhSHJpZVXDt6iFTxfoD+NkYpPRIZuc
 KwE0Zm1eTn98MJNZvoVyJ2hbD3f513I5yvdaNMFZ0I08Dh281uugYZu8r7mwqS49
 0uCC9PoEuErYbCGCGjmXOGVnyB6vvRjIfIOif/M1KqpH5g7xTKWc9S23P2ib3HgI
 brFl5EDl1Qa+qnkwWC98G58b85hjTJjLYhbst+O/MW+j6W2zihrt0N9UsKKTPgzj
 xvLhYz97wF0GCOfD5sZyyMdTCI6QWqtbE79ysHw+WCSrbZIKh6MFp6eO6qQF3JWT
 9IPT6O9G57Q9iwtS+MSVgriobE7qV/fHB/ICiciTGtsYfsovwxnq8BJuBiehwqau
 deqf4gbsZQiME1i+o9nnOcekDXkziKnkJIv8E5NBq77NQEzliSwfHwoaTtEusj7n
 4XbgRX37B8XtANVg1twWZb8gYFtxqYoojymAKx/Ag2e4I3qnzbM=
 =e9ts
 -----END PGP SIGNATURE-----

Merge tag 'v1.104.0' into pmm-6401-read-prometheus-data-files

v1.104.0

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-02 16:53:41 +02:00
Roman Khavronenko
0d4f4b8f7d
(app|lib)/vmstorage: do not increment vm_rows_ignored_total on NaNs (#7166)
`vm_rows_ignored_total` metric is a metric for users to signalize about
ingestion issues, such as bad timestamp or parsing error.
In commit
a5424e95b3
this metric started to increment each time vmstorage gets NaN. But NaN
is a valid value for Prometheus data model and for Prometheus metrics
exposition format. Exporters from Prometheus ecosystem could expose NaNs
as values for metrics and these values will be delivered to vmstorage
and increment the metric.
Since there is nothing user can do with this, in opposite to parsing
errors or bad timestamps, there is not much sense in incrementing this
metric. So this commit rolls-back `reason="nan_value"` increments.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-02 12:37:27 +02:00
Aliaksandr Valialkin
a350be48b6
lib/logstorage: do not count dictionary values which have no matching logs in count_uniq stats function
Create blockResultColumn.forEachDictValue* helper functions for visiting matching
dictionary values. These helper functions should prevent from counting dictionary values
without matching logs in the future.

This is a follow-up for 0c0f013a60
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152
2024-10-01 13:34:45 +02:00
Aliaksandr Valialkin
630211cfed
app/vlogscli: add interactive command-line tool for querying VictoriaLogs 2024-10-01 12:23:07 +02:00
Zhu Jiekun
7bb8853a5c
feature: [vmagent] Add service discovery support for OVH Cloud VPS and dedicated server (#6160)
### Describe Your Changes
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6071

#### Added
- Added service discovery support for OVH Cloud:
    - VPS.
    - Dedicated server.

#### Docs
- `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated.

#### Note
- Useful links: 
    - OVH Cloud VPS API: https://eu.api.ovh.com/console/#/vps~GET
- OVH Cloud Dedicated server API:
https://eu.api.ovh.com/console/#/dedicated/server~GET
    - OVH Cloud SDK: https://github.com/ovh/go-ovh
- Prometheus SD:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ovhcloud_sd_config

Tested on OVH Cloud VPS and dedicated server.
<img width="1722" alt="image"
src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/30280396/d3f0adc8-b0ef-423e-9379-8a9b9b0792ee">

<img width="1724" alt="image"
src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/30280396/18b5b730-3512-4fc0-8b2c-f2450ac550fd">

---
Signed-off-by: Jiekun <jiekun@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-09-30 14:42:46 +02:00
Hui Wang
664f337c70
stream aggregation: fix possible duplicated aggregation results (#7118)
When ingesting samples with the same labels(duplicated samples or
samples with the same labels after `by` or `without` options). They
could register different entries for the same labelset in
LabelsCompressor.
For example, both index 99 and 100 can be assigned to label `foo=1` in
two concurrent pushes. Then due to differing label indexes in encoded
keys, the samples will appear as distinct in aggrState, resulting in
duplicated results after decompressing the label indexes.

fbde238cdc/lib/streamaggr/streamaggr.go (L933)

In this pull request, since we need to store `idxToLabel` first to
ensure the idx can be searched after `lc.labelToIdxStore`,
the `lc.idxToLabel` still could contain a duplicated entries
[100]="foo=1". But given the low likelihood of this issue and the size
of idxToLabel, it should be fine.
2024-09-30 14:24:59 +02:00
Aliaksandr Valialkin
0c0f013a60
lib/logstorage: skip values with zero hits for 'uniq', 'top' and 'field_values' pipes
See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/72#issuecomment-2352078483
2024-09-30 14:15:07 +02:00
Artem Fetishev
ed5da38ede
Introduce a flag for limiting the number of time series to delete (#7091)
### Describe Your Changes

Introduce the `-search.maxDeleteSeries` flag that limits the number of
time series that can be deleted with a single
`/api/v1/admin/tsdb/delete_series` call.

Currently, any number can be deleted and if the number is big (millions)
then the operation may result in unaccounted CPU and memory usage spikes
which in some cases may result in OOM kill (see #7027). The flag limits
the number to 30k by default and the users may override it if needed at
the vmstorage start time.


---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-09-30 10:02:21 +02:00
Aliaksandr Valialkin
1da4650143
lib/logstorage: allow using ! in unescaped phrase
Previously the phrase filter with `!` was treated unexpectedly.
For example, `foo!bar` filter was treated at `foo AND NOT bar`,
while most users expect that it matches "foo!bar" phrase.

This commit aligns with users' expectations.
2024-09-29 11:14:15 +02:00
Aliaksandr Valialkin
60183c7c79
lib/logstorage: allow using - instead of ! in front of (...) 2024-09-29 11:12:22 +02:00
Nikolay
3bbb2aed72
fscore: rollback trailing space trim (#7106)
Previous commit 201fd6de1e removed
trailing space trim from data read from file. But common practice is to
remove such trailing space. And it leaded to the authorization errors
for the major group of users.

In first place, this change must help to mitigate an issue with
kubernetes. When authorization information was read from Secret content.
Changes to the operator was made to mitigate such problem at commit
1cf64358c8

We could introduce later optional flag for VictoriaMetrics to disable
trim space behavior.

Related issues:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6986
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7089 
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6947

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
2024-09-29 10:59:25 +02:00
Aliaksandr Valialkin
b52862badf
lib/logstorage: return the expected hits results from uniq pipe when the number of unique values reaches the specified limit
Previously `uniq` pipe could return zero `hits` if the number of found unique values equals the specified limit.
This wasn't expected in most cases.
2024-09-29 10:51:09 +02:00
Aliaksandr Valialkin
55eb321f77
lib/logstorage: clear hits slice obtained from encoding.GetUint64s() before updating it with hits for valueTypeDict column
encoding.GetUint64s() returns uninitialized slice, which may contain arbitrary values.
So values in this slice must be reset to zero before using it for counting hits in `uniq` and `top` pipes.
2024-09-29 10:29:13 +02:00
Aliaksandr Valialkin
94afcbd9a9
lib/logstorage: postpone initialization of per-shard stateSizeBudget until the first call to pipeProcessor.writeBlock()
This simplifies pipeProcessor initialization logic a bit.
This also doesn't mangle the original maxStateSize value, which is used in error messages when the state size exceeds maxStateSize.
2024-09-29 10:29:13 +02:00
Aliaksandr Valialkin
0b91452ca4
lib/logstorage: add non-empty if (...) condition to automatically generated result names in stats pipe
This allows executing queries with `stats` pipe, which calculate multiple results with the same functions,
but with different `if (...)` conditions. For example:

  _time:5m | count(), count() if (error)

Previously such queries couldn't be executed becasue automatically generated name for the second result
didn't include `if (error)`, so names for both results were identical - `count(*)`.
2024-09-29 09:51:28 +02:00
Aliaksandr Valialkin
8772aea24b
lib/logstorage: support order alias for sort pipe
Now the following queries are equivalents:

    _time:5s | sort by (_time)

    _time:5s | order by (_time)

This is needed for convenience, since `order by` is commonly used in other query languages such as SQL.
2024-09-29 09:51:27 +02:00
Aliaksandr Valialkin
09b309a82e
lib/logstorage: allow using - instead of ! as a shorthand for NOT operator in LogsQL 2024-09-27 13:14:47 +02:00
Aliaksandr Valialkin
76c1b0b8ea
lib/logstorage: support skipping _stream: prefix for stream filters
'_stream:{...}' can be written as '{...}'

This simplifies writing queries with stream filters, and makes them more familier to Loki users.
2024-09-27 13:14:46 +02:00
Aliaksandr Valialkin
9367a9a6a2
lib/logstorage: consistently sort stream contexts belonging to different streams by the minimum time seen in the matching logs
This should simplify debugging of stream_context output, since it remains stable over repeated requests.
2024-09-27 11:19:26 +02:00
Aliaksandr Valialkin
b49d1ea809
lib/logstorage: add _msg="---" delimiter between different log streams in stream_context output
This should help investigating contexts, which belong to different log streams.
2024-09-27 11:01:13 +02:00
Aliaksandr Valialkin
b82bd0c2ec
lib/logstorage: improve performance for stream_context pipe over streams with big number of log entries
Do not read timestamps for blocks, which cannot contain surrounding logs.
This should improve peformance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 .

Also optimize min(_time) and max(_time) calculations a bit by avoiding conversion
of timestamp to string when it isn't needed.
This should improve performance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .
2024-09-26 22:22:23 +02:00
Aliaksandr Valialkin
3646724c6f
lib/contextutil: make golanci-lint happy by substituing unused function arg name with _
This is a follow-up for 4b1611267f
2024-09-26 17:06:48 +02:00
Aliaksandr Valialkin
4b1611267f
lib/logstorage: properly return surrounding logs outside the selected time range by stream_context pipe
Previously only logs inside the selected time range could be returned by stream_context pipe.
For example, the following query could return up to 10 surrounding logs only for the last 5 minutes,
while most users expect this query should return up to 10 surrounding logs without restrictions on the time range.

    _time:5m panic | stream_context before 10

This enables the ability to implement stream context feature at VictoriaLogs web UI: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7063 .

Reduce memory usage when returning stream context over big log streams with millions of entries.
The new logic scans over all the log messages for the selected log stream, while keeping in memory only
the given number of surrounding logs. Previously all the logs for the given log stream on the selected time range
were loaded in memory before selecting the needed surrounding logs.
This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 .

Reduce the scan performance for big log streams by fetching only the requested fields. For example, the following
query should be executed much faster than before if logs contain many fields other than _stream, _msg and _time:

    panic | stream_context after 30 | fields _stream, _msg, _time
2024-09-26 17:03:45 +02:00
Aliaksandr Valialkin
037652d5ae
app/vlinsert: support _time field without timezone information during data ingestion
Use local timezone of the host server in this case. The timezone can be overridden
with TZ environment variable if needed.

While at it, allow using whitespace instead of T as a delimiter between data and time
in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted
during data ingestion. This is valid ISO8601 format, which is used by some log shippers,
so it should be supported. This format is also known as SQL datetime format.

Also assume local time zone when time without timezone information is passed to querying APIs.
Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string
if the old behaviour is preferred.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721
2024-09-26 12:49:35 +02:00
Aliaksandr Valialkin
255d1d4e13
app/vlselect/logsql: clone the query with the current timestamp when performing live tailing requests in the loop
Previously the original timestamp was used in the copied query, so _time:duration filters
were applied to the original time range: (timestamp-duration ... timestamp]. This resulted
in stopped live tailing, since new logs have timestamps bigger than the original time range.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7028
2024-09-26 08:57:23 +02:00
Aliaksandr Valialkin
e9950f6307
lib/logstorage: add blocks_count pipe
This pipe is useful for debugging purposes when the number of processed blocks must be calculated for the given query:

    <query> | blocks_count

This helps detecting the root cause of query performance slowdown in cases like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070
2024-09-25 19:17:48 +02:00
Aliaksandr Valialkin
65b93b17b1
lib/logstorage: lazily read column headers metadata during queries
This improves performance for analytical queries, which do not need column headers metadata.
For example, the following query doesn't need column headers metadata, since _stream and min(_time)
are stored in block header, which is read separately from colum headers metadata:

  _time:1w | stats by (_stream) min(_time) min_time

This commit significantly improves the performance for this query.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070
2024-09-25 19:17:48 +02:00
Aliaksandr Valialkin
4599429f51
lib/logstorage: read timestamps column when it is really needed during query execution
Previously timestamps column was read unconditionally on every query.
This could significantly slow down queries, which do not need reading this column
like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .
2024-09-25 19:17:47 +02:00
Aliaksandr Valialkin
7f1ba18719
lib/logstorage: improve the performance of obtaining _stream column value
Substitute global streamTagsCache with per-blockSearch cache for ((stream.id) -> (_stream value)) entries.
This improves scalability of obtaining _stream values on a machine with many CPU cores, since every CPU
has its own blockSearch instance.

This also should reduce memory usage when querying logs over big number of streams, since per-blockSearch
cache of ((stream.id) -> (_stream value)) entries is limited in size, and its lifetime is bounded by a single query.
2024-09-24 20:57:00 +02:00
Aliaksandr Valialkin
cf2e7d0d92
lib/logstorage/consts.go: document that it isn't recommended setting maxColumnsPerBlock constant to too big values
This should help avoiding cases like this one - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6425#issuecomment-2337446083
2024-09-24 18:51:46 +02:00
Aliaksandr Valialkin
f86e093b20
lib/logstorage: improve performance for streamID.marshalString() by more than 2x
The streamID.marshalString() is executed in hot path if the query selects _stream_id field.

Command to run the benchmark:

go test ./lib/logstorage/ -run=NONE -bench=BenchmarkStreamIDMarshalString -benchtime=5s

Results before the commit:

BenchmarkStreamIDMarshalString-16    	438480714	        14.04 ns/op	  71.23 MB/s	       0 B/op	       0 allocs/op

Results after the commit:

BenchmarkStreamIDMarshalString-16    	982459660	         6.049 ns/op	 165.30 MB/s	       0 B/op	       0 allocs/op
2024-09-24 18:35:04 +02:00
Aliaksandr Valialkin
919d2dc90e
lib/logstorage: add benchmark for streamID.marshalString 2024-09-24 18:31:38 +02:00
hagen1778
8bb3f2fd43
lib/promscrape: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-09-24 15:12:55 +02:00
hagen1778
c7569dac50
lib/promscrape: temporary disable TestClientProxyReadOk
This test is very flaky and prevents other tests from running in CI.
Disabling this test should improve tests quality, since it isn't reliable anyway.

There is a ticket to fix this test - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7062

Once fixed, this test should be uncommented.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-09-24 14:59:25 +02:00
Dmytro Kozlov
cbeb7d50e8
lib/promscrape: show only unhealthy targets if show_only_unhealthy filter is enabled (#6960)
### Describe Your Changes

It is better to show only unhealthy targets instead of all of them when
`show_only_unhealthy` filter is enabled.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3536

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-09-24 12:18:24 +02:00
Aliaksandr Valialkin
109772bdc4
lib/cgroup: round GOMAXPROCS to the lower integer value of cpuQuota
Rounding GOMAXPROCS to the upper interger value of cpuQuota increases chances of CPU starvation,
non-optimimal goroutine scheduling and additional CPU overhead related to context switching.

So it is better to round GOMAXPROCS to the lower integer value of cpuQuota.
2024-09-23 16:09:12 +02:00
Artem Fetishev
55febc0920
lib/storage: restore ability to put empty metric ID list into tagFiltersToMetricIDsCache (#7064)
### Describe Your Changes

Currently it the metricID list is empty it won't be mashalled and as the
result won't be put into the tagFiltersToMetricIDsCache which causes the
cache misses for the corresponding tagFilters. In some setups this
causes severe search speed detradation (see #7009).

The empty metric IDs was covered before but then was accidentally
removed in 6c21439.

This PR restores the coverage of this case.

A new unit test can be used as a proof that empty metricID lists are not
added to the cache (just remove the fix in index_db.go and run the test
to see the result)

Also a benchmark has been added to see the implications of the
compression.

```
user@laptop:~/p/github.com/rtm0/VictoriaMetrics/01/src$ go test ./lib/storage/ -run=NONE -bench BenchmarkMarshalUnmarshalMetricIDs --loggerLevel=ERROR
goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage
cpu: 13th Gen Intel(R) Core(TM) i7-1355U
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-0-12             3237240               363.5 ns/op               0 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1-12             2831049               451.8 ns/op               0.4706 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10-12            1152764              1009 ns/op                 1.667 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100-12            297055              3998 ns/op                 5.755 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000-12            31172             34566 ns/op                 8.484 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000-12            4900            289659 ns/op                 9.416 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100000-12            447           2341173 ns/op                 9.456 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000000-12            42          24926928 ns/op                 9.468 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000000-12            5         204098872 ns/op                 9.467 compression-rate
PASS
ok      github.com/VictoriaMetrics/VictoriaMetrics/lib/storage  15.018s
```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-09-20 17:21:53 +02:00
Aliaksandr Valialkin
787b9cd9a0
lib/storage: improve performance for indexSearch.containsTimeRange()
The indexSearch.containsTimeRange() function is called for the current indexDB and the previous indexDB
every time when searching for metricIDs by label filters. This function consumes a lot of additional CPU time
for cases when queries with lightweight label filters are sent to VictoriaMetrics at high rate (e.g. thousands of RPS),
like in the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7009 .

Optimize indexSearch.containsTimeRange() function in the following ways:

- Unconditionally return true if this function is called for the current indexDB, since there are very high
  chances that the current indexDB contains the data with timestamps in the requested time range.

- Cache the minimum timestamp, which is missing in the indexed data for the previous indexDB.
  This is safe to do, since the previous indexDB is readonly.
  This optimization eliminates potentially slow lookup in the previous indexDB for typical
  use cases when the requested time range is close to the current time.
2024-09-20 13:07:20 +02:00
Aliaksandr Valialkin
6f61e9d49d
lib/storage: simplify indexDB.doExtDB() usage by removing the returned value
Previously indexDB.doExtDB() was returning boolean value, which was indicating whether f callback was called.
There is no need in returning this boolean value, since the f callback can determine on itself whether it was called.

This simplifies the code a bit.

While at it, document indexDB.doExtDB().
2024-09-20 11:59:57 +02:00
Roman Khavronenko
218c533874
lib/storage: follow-up after d8f8822fa5 (#7036)
Make function name and comments more clear.

d8f8822fa5

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-09-20 11:50:47 +02:00
Aliaksandr Valialkin
a3d8077959
lib/logstorage: make sure that getCommonTokens returns common tokens in the original order of tokens inside tokenSets arg
This fixes flaky test TestGetCommonTokensForOrFilters:

    filter_or_test.go:143: unexpected tokens for field "_msg"; got ["foo" "bar"]; want ["bar" "foo"]
2024-09-19 15:59:48 +02:00
Roman Khavronenko
e115b85770
lib/logger: increase default value of -loggerMaxArgLen cmd-line fla… (#7008)
…g from 1e3 to 5e3

This should improve visibility on errors produced by very long queries.

The change is classified as BUG in order to port it to LTS releases.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Mathias Palmersheim <mathias@victoriametrics.com>
2024-09-19 14:29:18 +02:00
Nikolay
d8f8822fa5
lib/storage: consistently check for missing metricID index records (#6967)
* Previously, only metricID->metricName missing index records were
tracked with deadline But it was possible a case for missing
metricID->TSID index records. IndexDB metrics fix exposed misleading
metric for such missing records.

* This commit adds check for metricID->TSID missing index records. And
delete missing metricID entry if it hit 60 second deadline.

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6931

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2024-09-16 10:05:08 +02:00
Nikolay
264c2ec6bd
lib/fs: properly call windows APIs (#6998)
Previously we manually imported system windows DDLs
and made direct syscall.

 But golang exposes syscall wrappers with sys/windows package.
It seems, that direct syscall was broken at 1.23 golang release. It was
`GetDiskFreeSpace` syscall in our case.

This commit replaces all manual syscalls with wrappers

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6973

Related golang issue:
https://github.com/golang/go/issues/69029

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2024-09-13 12:22:25 +02:00
Aliaksandr Valialkin
657988ac3a
app/vlselect: consistently reuse the original query timestamp when executing /select/logsql/query with positive limit=N query arg
Previously the query could return incorrect results, since the query timestamp was updated with every Query.Clone() call
during iterative search for the time range with up to limit=N rows.

While at it, optimize queries, which find low number of matching logs, while spend a lot of CPU time for searching
across big number of logs. The optimization reduces the upper bound of the time range to search if the current time range
contains zero matching rows.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785
2024-09-08 14:32:23 +02:00
Aliaksandr Valialkin
45a3713bdb
lib/logstorage: preserve the order of tokens to check against bloom filters in AND filters
Previously tokens from AND filters were extracted in random order. This could slow down
checking them agains bloom filters if the most specific tokens go at the beginning of the AND filters.
Preserve the original order of tokens when matching them against bloom filters,
so the user could control the performance of the query by putting the most specific AND filters
at the beginning of the query.

While at it, add tests for getCommonTokensForAndFilters() and getCommonTokensForOrFilters().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
2024-09-08 12:27:30 +02:00
Aliaksandr Valialkin
eaee2d7db4
lib/logstorage: improve error logging for incorrect queries passed to /select/logsql/stats_query and /select/logsql/stats_query_range functions 2024-09-08 11:24:44 +02:00
Aliaksandr Valialkin
1cd06ace5a
lib/logstorage: properly extract common tokens from unsupported OR filters
Previously the following query could miss rows matching !bar if these rows do not contain foo:

   foo OR !bar

This is because of incorrect detection of common tokens for OR filters - all the unsupported filters
were skipped (including the NOT filter (aka `!`)), while in this case zero common tokens must be returned.

While at it, move repetiteve code in TestFilterAnd and TestFilterOr into f function.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
2024-09-08 11:14:55 +02:00
Aliaksandr Valialkin
0a40064a6f
app/vlselect: add /select/logsql/stats_query_range endpoint for building time series panels in VictoriaLogs plugin for Grafana
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6943
Updates https://github.com/VictoriaMetrics/victorialogs-datasource/issues/61
2024-09-07 00:41:47 +02:00
Aliaksandr Valialkin
c9bb4ddeed
app/vlselect: add /select/logsql/stats_query endpoint, which is going to be used by vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6942
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706
2024-09-06 23:06:43 +02:00
Aliaksandr Valialkin
00e7d5add3
lib/logstorage: substitute | operator with or operator at math pipe
This is needed for avoiding confusion between the `|` operator at `math` pipe and `|` pipe delimiter.
For example, the following query was parsed unexpectedly:

   * | math foo / bar | fields x

as

   * | math foo / (bar | fields) as x

Substituting `|` with `or` inside `math` pipe fixes this ambiguity.
2024-09-06 22:44:14 +02:00
Artem Fetishev
a5424e95b3
lib/storage: adds metrics that count records that failed to insert
### Describe Your Changes

Add storage metrics that count records that failed to insert:

- `RowsReceivedTotal`: the number of records that have been received by
the storage from the clients
- `RowsAddedTotal`: the number of records that have actually been
persisted. This value must be equal to `RowsReceivedTotal` if all the
records have been valid ones. But it will be smaller otherwise. The
values of the metrics below should provide the insight of why some
records hasn't been added
-   `NaNValueRows`: the number of records whose value was `NaN`
- `StaleNaNValueRows`: the number of records whose value was `Stale NaN`
- `InvalidRawMetricNames`: the number of records whose raw metric name
has failed to unmarshal.

The following metrics existed before this PR and are listed here for
completeness:

- `TooSmallTimestampRows`: the number of records whose timestamp is
negative or is older than retention period
- `TooBigTimestampRows`: the number of records whose timestamp is too
far in the future.
- `HourlySeriesLimitRowsDropped`: the number of records that have not
been added because the hourly series limit has been exceeded.
- `DailySeriesLimitRowsDropped`: the number of records that have not
been added because the daily series limit has been exceeded.

---
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
2024-09-06 17:57:21 +02:00
Aliaksandr Valialkin
0205170409
lib/logstorage: consistently use nsecsPerDay constant and remove nsecPerDay constant 2024-09-06 16:17:04 +02:00
Aliaksandr Valialkin
258ccfb953
lib/logstorage: pre-calculate hashes from tokens used in bloom filter search
Previously per-token hashes for per-block bloom filters were re-calculated on every scanned block.
This could be slow when the number of tokens is big or when the number of blocks to scan is big.
Pre-calculate hashes for bloom filters and then use them for searching in bloom filters.
This improves performance by 2.5x for in(...) filters with many values to search inside `in()`.
2024-09-05 19:44:17 +02:00
Zhu Jiekun
c193e6d43e
lib/discovery/azure: fix host check in next link in Azure SD (#6915)
Previous bugfix at 49f63b2 only partially fixed pagination host validation error.

 Before this fix it was:
```
unexpected nextLink host \"management.azure.com\", expecting \"https://management.azure.com\"
```

Now we only check the `Host` without schema. 

However, when Azure respond `nextLink` in `Host:Port` format, the
`nextLink` check will fail:
```
unexpected nextLink host \"management.azure.com:443\", expecting \"management.azure.com\"
```

This pull request further relaxes the checks by only checking the
`Hostname`.

---

 related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6912
2024-09-05 16:48:09 +02:00
Artem Fetishev
39294b4919
lib/storage: do not drop stale NaN samples (#6936)
This patch reverts 1fd3385

After discussing it we've come to conclusion that this is a valid
behavior which can be avoided by deleting the time series only once the
corresponding stale NaNs have been received.

On the other hand, the fix leads to lost stale NaNs in some rare but
valid use cases. For example:

- In a cluster configuration the samples for a given time series are
normally sent to the same vmstorage replica. However, wminsert may
reroute the samples to another replica because the original one is down
or is overloaded. In this case the stale NaN may end up on a replica
that has no data for that time series, but we still want to record that
sample.

Thus, reverting that fix.

---

related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
2024-09-05 16:45:09 +02:00
Hui Wang
b48f5f3e59
lib/storage: fix metric vm_object_references{type="indexdb"} (#6937)
follow up
4ecc370acb

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-09-05 16:42:49 +02:00
Aliaksandr Valialkin
49e57ea80e
lib/logstorage: delete unused function - bloomfilter.containsAny 2024-09-05 16:21:06 +02:00
Aliaksandr Valialkin
2dd845fa53
lib/logstorage: properly fix incorrect extraction of common tokens for OR filters at distinct log fields
Previously (f1:foo OR f2:bar) was incorrectly returning `foo` token for `f1` and `bar` token for `f2`.
These tokens were used for checking against bloom filter for every data block, so the data block,
which didn't contain simultaneously `foo` token for `f1` field and `bar` token for `f2` field, was skipped.
This was incorrect, since such a block may contain logs matching the original OR filter.

The fix is to return common tokens from `OR`-delimted filters only if these tokens exist at EVERY such filter
for the given field name. If some `OR`-delimited filter misses the given field name, then `OR`-delimited filters
do not contain common tokens, which could be used for checking against bloom filter.

While at it, add more tests covering various edge cases for filters delimited by AND and OR.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
2024-09-05 14:29:50 +02:00
f41gh7
7b0aaf1ea2
follow-up after 01430a155c
* properly check SeverityNumber at FormatSeverity function
 it could be negative, which could cause panic for victorialogs
2024-09-04 15:36:34 +02:00
Andrii Chubatiuk
01430a155c
vlinsert: added opentelemetry logs support
Commit adds the following changes:

* Adds support of OpenTelemetry logs for Victoria Logs with protobuf encoded messages

*  json encoding is not supported for the following reasons:
   - It brings a lot of fragile code, which works inefficiently.
   - json encoding is impossible to use with language SDK.

* splits metrics and logs structures at lib/protoparser/opentelemetry/pb package.

* adds docs with examples for opentelemetry logs.

---
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4839

Co-authored-by: AndrewChubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-09-03 20:12:05 +02:00
rtm0
4df243d530
lib/storage: improve the message of the tooManyTimeseries error (#6893)
### Describe Your Changes

This is a follow-up for #6836. Per @valyala's
[comment](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6836#discussion_r1730291704),
the error message does not reflect which flag needs to be adjusted.

### Checklist

The following checks are **mandatory**:

- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
2024-09-03 10:28:03 +02:00
jackyin
975ed27a76
lib/logstorage: and filter results in unexpected response (#6556)
fix #6554
andfilter shouldn't return orfilter field which result in bloomfilter
return false.

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-09-03 10:17:44 +02:00
rtm0
2c856c6951
tests: check Metrics.RowsAddedTotal in unit tests (#6895)
### Describe Your Changes

This is a follow-up PR: Unit tests introduced in #6872 can now use
RowsAddedTotal counter whose scope was fixed in #6841.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-08-30 14:31:15 +02:00
Roman Khavronenko
f586082520
attempt to fix flaky TestClientProxyReadOk (#6899)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-30 13:23:32 +02:00
dufucun
95bafc8caf
tests: fix slice init length (#6897)
### Describe Your Changes

fix slice init length

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: dufucun <dufuchun@sohu.com>
2024-08-30 10:55:25 +02:00
rtm0
334cd92a6c
testing: allow disabling fsync to make tests run faster (#6871)
### Describe Your Changes

fsync() ensures that the data is written to disk. In production this is
needed for data durability. However, during the development, when the
unit tests are run, this level of durability is not needed. Therefore
fsync() can be disabled which will makes test runs two times faster.

The disabling is done by setting the `DISABLE_FSYNC_FOR_TESTING`
environment variable. The valid values for this variable are the same as
the values of the arg of `go doc strconv.ParseBool`:

```
1, t, T, TRUE, true, True, 0, f, F, FALSE, false, False.
```

Any other value means `false`.

The variable is set for all test build targets. Compare running times:

Build Target | DISABLE_FSYNC_FOR_TESTING=0 | DISABLE_FSYNC_FOR_TESTING=1
----------------- | ------------------------------------------------ |
-------------------------------------------------
make test | 1m5s  | 0m22s
make test-race | 3m1s | 1m42s
make test-pure | 1m7s | 0m20s
make test-full | 1m21s | 0m32s
make test-full-386 | 1m42s | 0m36s

When running tests for a given package, fsync can be disabled as
follows:

```shell
DISABLE_FSYNC_FOR_TESTING=1 go test ./lib/storage
```

Disabling fsync() is intended for testing purposes only and the name of
the variables reflects that.

What could also have been done but haven't:

- lib/filestream/filestream.go: `Writer.MustFlush()` also uses f.Sync()
but nothing has been done to it, because the Writer.MustFlush() is not
used anywhere in the VM codebase. A side question: what is the general
policy for the unused code?
- lib/filestream/filestream.go: Writer.Write() calls `adviceDontNeed()`
which calls unix.Fdatasync(). Disabling it could potentially improve
running time, but running tests with this code disabled has shown
otherwise.

### Checklist

The following checks are **mandatory**:

- [ x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
2024-08-30 10:54:46 +02:00
f41gh7
e3e06b1f47
Merge remote-tracking branch 'origin/master' into pmm-6401-read-prometheus-data-files-cpc 2024-08-29 15:47:43 +02:00
Nikolay
4ecc370acb
lib/storage: properly add previous indexDB metrics (#6890)
Previously, some extIndexDB metrics were not registered. It resulted
into missing metrics, if metric value was added to the extIndexDB. It's
a usual case for search requests at both indexes.

 Current commit updates all metrics from extIndexDB according to the
current IndexDB. It must fix such cases

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6868

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-08-28 11:14:28 +02:00
rtm0
9fcfba3927
lib/storage: properly handle maxMetrics limit at metricID search
`TL;DR` This PR improves the metric IDs search in IndexDB:

- Avoid seaching for metric IDs twice when `maxMetrics` limit is
exceeded
- Use correct error type for indicating that the `maxMetrics` limit is
exceded
- Simplify the logic of deciding between per-day and global index search

A unit test has been added to ensure that this refactoring does not
break anything.

---

Function calls before the fix:

```
idb.searchMetricIDs
    |__ is.searchMetricIDs
        |__ is.searchMetricIDsInternal
            |__ is.updateMetricIDsForTagFilters
                |__ is.tryUpdatingMetricIDsForDateRange
                |                       |
                |__ is.getMetricIDsForDateAndFilters
```

- `searchMetricIDsInternal` searches metric IDs for each filter set. It
maintains a metric ID set variable which is updated every time the
`updateMetricIDsForTagFilters` function is called. After each successful
call, the function checks the length of the updated metric ID set and if
it is greater than `maxMetrics`, the function returns `too many
timeseries` error.
- `updateMetricIDsForTagFilters` uses either per-day or global index to
search metric IDs for the given filter set. The decision of which index
to use is made is made within the `tryUpdatingMetricIDsForDateRange`
function and if it returns `fallback to global search` error then the
function uses global index by calling `getMetricIDsForDateAndFilters`
with zero date.
- `tryUpdatingMetricIDsForDateRange` first checks if the given time
range is larger than 40 days and if so returns `fallback to global
search` error. Otherwise it proceeds to searching for metric IDs within
that time range by calling `getMetricIDsForDateAndFilters` for each
date.
- `getMetricIDsForDateAndFilters` searches for metric IDs for the given
date and returns `fallback to global search` error if the number of
found metric IDs is greater than `maxMetrics`.

Problems with this solution:

1. The `fallback to global search` error returned by
`getMetricIDsForDateAndFilters` in case when maxMetrics is exceeded is
misleading.
2. If `tryUpdatingMetricIDsForDateRange` proceeds to date range search
and returns `fallback to global search` error (because
`getMetricIDsForDateAndFilters` returns it) then this will trigger
global search in `updateMetricIDsForTagFilters`. However the global
search uses the same maxMetrics value which means this search is
destined to fail too. I.e. the same search is performed twice and fails
twice.
3. `too many timeseries` error is already handled in
`searchMetricIDsInternal` and therefore handing this error in
`updateMetricIDsForTagFilters` is redundant
4. updateMetricIDsForTagFilters is a better place to make a decision on
whether to use per-day or global index.

Solution:

1.  Use a dedicated error for `too many timeseries` case
2. Handle `too many timeseries` error in  `searchMetricIDsInternal` only
3. Move the per-day or global search decision from
`tryUpdatingMetricIDsForDateRange` to `updateMetricIDsForTagFilters` and
remove `fallback to global search` error.


---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-08-27 21:39:03 +02:00
rtm0
eef6943084
lib/storage: properly register index records with RegisterMetricNames
Once the timeseries is in tsidCache, new entries won't be created in
per-day index because the RegisterMetricNames() code does consider
different dates for the same timeseries. So this case has been added.

The same bug exists for AddRows() but it is not manifested because the
index entries are finally created in updatePerDateData().

RegisterMetricNames also updated to increase the newTimeseriesCreated
counter because it actually creates new time series in index.

A unit tests has been added that check all possible data patterns
(different metric names and dates) and code branches in both
RegisterMetricNames and AddRows. The total number of new unit tests is
around 100 which increaded the running time of storage tests by 50%. 

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2024-08-27 21:33:53 +02:00
rtm0
30f98916f9
Move rowsAddedTotal counter to Storage (#6841)
### Describe Your Changes

Reduced the scope of rowsAddedTotal variable from global to Storage.

This metric clearly belongs to a given Storage object as it counts the
number of records added by a given Storage instance.
Reducing the scope improves the incapsulation and allows to reset this
variable during the unit tests (i.e. every time a new Storage object is
created by a test, that object gets a new variable).



Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
2024-08-27 21:30:37 +02:00
Zhu Jiekun
e97e966f82
lib/promrelabel: follow-up for 8958cecad6
In the previous commit 8958cecad6
the default ports (80/443) were removed for both the `scrapeURL` and
`instance` label values for those targets without a port in
`__address__`. Different values in the `instance` label generate new
time series.

This commit reverts the changes made to the `instance` label. Now,
for those targets:
- `scrapeURL` will remain unchanged.
- The `instance` label value will include the default port.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792
2024-08-27 13:04:26 +02:00
Nikolay
9feee15493
lib/promscrape: fixes proxy autorization (#6783)
* Adds custom dial func for HTTP-Connect and socks5 proxy tunnels.
  Standard golang http.transport exposes GetProxyConnectHeader function,
  but it doesn't allow to use separate tls config for proxy.
  It also not possible to enforce HTTP-Connect with standard http lib.
* For http scrape targets, by default http.Transport.Proxy function must
  be used. Since it has special case with full uri forward.
* Adds proxy.URL json methods that allow to properly copy internal
fields, like User/Password.
It should fix bug with proxy_url. When credentials specified at URL was
ignored.
* Adds tests for scrape client proxy requests

related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6771
2024-08-19 22:31:18 +02:00
Zhu Jiekun
723d834c1a
lib/promrelabel: stop adding default port 80/433 to address label
*  It was necessary to add default ports for fasthttp client. After migration to the std.httpclient it's no longer needed.
* An additional configuration is required at proxy servers with implicitly set 80/443 ports to the host header (such as HA proxy.

It's expected that after upgrade __address_ label may change. But it should be rare case. 80/443 ports are not widely used at monitoring ecosystem. And it shouldn't have much impact. 

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792

Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-08-19 22:28:49 +02:00
hagen1778
febba3971b
make go vet happy
Address `non-constant format string in call` check:
https://github.com/golang/go/issues/60529

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-19 21:15:33 +02:00
Roman Khavronenko
e58dde6925
lib/httputils: parse URL before creating HTTP transport (#6820)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6740

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-16 11:32:04 +02:00
Hui Wang
62d19369a3
stream aggregation: do not allow to enable -stream.keepInput and `k… (#6723)
…eep_metric_names` options in stream aggregation config together

With aggregated data and raw data under the same metric, results would
be confusing.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-08-13 08:54:35 -04:00
Zhu Jiekun
9e2bd82376
app/vmagent: fixes azure service discovery pagination
Azure API response with link to the next page was incorrectly validate. Validation used url.Host header to match configure API URL.


https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6784
2024-08-09 15:22:47 +02:00
Zakhar Bessarab
cb00b4b00f
lib/backup/s3remote: add retryer configuration (#6747)
### Describe Your Changes

This helps to improve reliability of performing backups in environments
with unreliable connection and tolerate temporary errors at S3 provider
side.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6732

Default retry timeout is up to 3 minutes to make this consistent with
the same configuration for GCS:
a05317f61f/lib/backup/gcsremote/gcs.go (L70-L76)

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-08-07 16:55:29 +02:00
Roman Khavronenko
f28f496a9d
lib/bytesutil: smooth buffer growth rate (#6761)
Before, buffer growth was always x2 of its size, which could lead to
excessive memory usage when processing big amount of data.
For example, scraping a target with hundreds of MBs in response could
result into hih memory spikes in vmagent because buffer has to double
its size to fit the response. See
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6759

The change smoothes out the growth rate, trading higher allocation rate
for lower mem usage at certain conditions.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-07 16:49:43 +02:00
hagen1778
1154f90d2d
lib/mergeset: fix typos in comments
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-07 15:54:15 +02:00
Aliaksandr Valialkin
04981c7a7f
lib/streamaggr: remove resetState arg from aggrState.flushState()
The resetState arg was used only for the BenchmarkAggregatorsFlushInternalSerial benchmark.
This benchmark was testing aggregate state flush performance by keeping the same state across flushes.
The benhmark didn't reflect the performance and scalability of stream aggregation in production,
while it led to non-trivial code changes related to resetState arg handling.

So let's drop the benchmark together with all the code related to resetState handling,
in order to simplify the code at lib/streamaggr a bit.

Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314
2024-08-07 11:39:14 +02:00
Aliaksandr Valialkin
86c7afd126
lib/streamaggr: consistently use the same timestamp across all the output aggregated samples in a single aggregation interval
Prevsiously every aggregation output was using its own timestamp for the output aggregated samples
in a single aggregation interval. This could result in unexpected inconsitent timesetamps for the output
aggregated samples.

This commit consistently uses the same timestamp across all the output aggregated samples.
This commit makes sure that the duration between subsequent timestamps strictly equals
the configured aggregation interval.

Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314
This commit should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580
2024-08-07 11:39:13 +02:00
Anzor
994796367b
app/vmagent: read __sample_limit__ from labels (#6665) (#6666)
By introducing this feature, users will have the ability to customize
the sampleLimit parameter on a per-target basis, providing more
flexibility and control over the job execution behavior.
2024-08-07 09:36:14 +02:00
hagen1778
f283126084
fix typos in comments
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-06 14:54:49 +02:00
Zakhar Bessarab
9877a5e7d5
app/{vminsert,vmagent}: add healthcheck for influx ingestion endpoints (#6749)
### Describe Your Changes

This is useful for clients which validate InfluxDB is available before
data ingestion can be started.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6653

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-08-05 09:34:54 +02:00
f41gh7
2557e66ee0
Merge tag 'tags/v1.102.1' into pmm-6401-read-prometheus-data-files-cpc 2024-08-02 11:20:14 +02:00
Juraj Bubniak
11c0b05e8a
lib/backup/s3remote: fix typos (#6694)
Fixes a few typos in errors in lib/backup/s3remote package.
2024-07-29 14:18:31 +02:00
jackyin
e5d279bb71
lib/netutil: validate TLS cert and key files immediately (#6621)
Validate files specified via `-tlsKeyFile` and `-tlsCertFile` cmd-line flags on the process start-up. Previously, validation happened on the first connection accepted by HTTP server.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6608

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-07-29 13:58:53 +02:00
Aliaksandr Valialkin
8551fbe9f3
Revert "refactor(vmstorage): Refactor the code to reduce the time complexity of MustAddRows and improve readability (#6629)"
This reverts commit e280d90e9a.

Reason for revert: the updated code doesn't improve the performance of table.MustAddRows for the typical case
when rows contain timestamps belonging to ptws[0].

The performance may be improved in theory for the case when all the rows belong to partiton other than ptws[0],
but this partition is automatically moved to ptws[0] by the code at lines
6aad1d43e9/lib/storage/table.go (L287-L298) ,
so the next time the typical case will work.

Also the updated code makes the code harder to follow, since it introduces an additional level of indirection
with non-trivial semantics inside table.MustAddRows - the partition.TimeRangeInPartition() function.
This function needs to be inspected and understood when reading the code at table.MustAddRows().
This function depends on minTsInRows and maxTsInRows vars, which are defined and initialized
many lines above the partition.TimeRangeInPartition() call. This complicates reading and understanding
the code even more.

The previous code was using clearer loop over rows with the clear call to partition.HasTimestamp()
for every timestamp in the row. The partition.HasTimestamp() call is used in the table.MustAddRows()
function multiple times. This makes the use of partition.HasTimestamp() call more consistent,
easier to understand and easier to maintain comparing to the mix of partition.HasTimestamp() and partition.TimeRangeInPartition()
calls.

Aslo, there is no need in documenting some hardcore software engineering refactoring at docs/CHANGLELOG.md,
since the docs/CHANGELOG.md is intended for VictoriaMetrics users, who may not know software engineering.
The docs/CHANGELOG.md must document user-visible changes, and the docs must be concise and clear for VictoriaMetrics users.
See https://docs.victoriametrics.com/contributing/#pull-request-checklist for more details.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6629
2024-07-25 14:32:09 +02:00
Ruixiang Tan
e280d90e9a
refactor(vmstorage): Refactor the code to reduce the time complexity of MustAddRows and improve readability (#6629)
### Describe Your Changes
The original logic is not only highly complex but also poorly readable,
so it can be modified to increase readability and reduce time
complexity.


---------

Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
2024-07-25 08:55:12 +02:00
Aliaksandr Valialkin
65ce4e30ab
lib/backup/azremote: follow-up for 5fd3aef549
- Mention that credentials can be configured via env variables at both vmbackup and vmrestore docs.

- Make clear that the AZURE_STORAGE_DOMAIN env var is optional at https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables

- Use string literals as is for env variable names instead of indirecting them via string constants.
  This makes easier to read and understand the code. These environment variable names aren't going to change
  in the future, so there is no sense in hiding them under string constants with some other names.

- Refer to https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables in error messages
  when auth creds are improperly configured. This should simplify figuring out how to fix the error.

- Simplify the code a bit at FS.newClient(), so it is easier to follow it now.
  While at it, remove the check when superflouos environment variables are set, since it is too fragile
  and it looks like it doesn't help properly configuring vmbackup / vmrestore.

- Remove envLookuper indirection - just use 'func(name string) (string, bool)' type inline.
  This simplifies code reading and understanding.

- Split TestFSInit() into TestFSInit_Failure() and TestFSInit_Success(). This simplifies the test code,
  so it should be easier to maintain in the future.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6518
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984
2024-07-17 17:55:06 +02:00
Aliaksandr Valialkin
eaed0465d2
all: substitute double "the the" with "the"
This is a follow-up for 8786a08d27

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6600
2024-07-17 14:28:12 +02:00
Aliaksandr Valialkin
9c4b0334f2
all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter
The %q formatter may result in incorrectly formatted JSON string if the original string
contains special chars such as \x1b . They must be encoded as \u001b , otherwise the resulting JSON string
cannot be parsed by JSON parsers.

This is a follow-up for c0caa69939

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
2024-07-17 13:52:13 +02:00
Aliaksandr Valialkin
8ff051b287
lib/protoparser/graphite: use Regex.ReplaceAllLiteralString instead of Regex.ReplaceAllString for the case when the replacement cannot contain placeholders for capturing groups
This is a follow-up for 74affa3aec
2024-07-17 13:01:05 +02:00
Aliaksandr Valialkin
74affa3aec
lib/protoparser/graphite: follow-up for 476faf5578
- Clarify the description of -graphite.sanitizeMetricName command-line flag at README.md
- Do not sanitize tag values - only metric names and tag names must be sanitized,
  since they are treated specially by Grafana. Grafana doesn't apply any restrictions on tag values.
- Properly replace more than two consecutive dots with a single dot.
- Disallow unicode letters in metric names and tag names, since neither Prometheus nor Grafana
  do not support them.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6489
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6077
2024-07-17 12:41:55 +02:00
Aliaksandr Valialkin
58a757cd01
lib: consistently use regexp.Regexp.ReplaceAllLiteralString instead of regexp.Regexp.ReplaceAllString in places where the replacement cannot contain matching group placeholders 2024-07-17 12:41:54 +02:00
rtm0
bdc0e688e8
Fix inconsistent error handling in Storage.AddRows() (#6583)
### Describe Your Changes

`Storage.AddRows()` returns an error only in one case: when
`Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But
the same error is treated as a warning when it happens inside
`Storage.add()` or returned by `Storage.prefillNextIndexDB()`.

This commit fixes this inconsistency by treating the error returned by
`Storage.updatePerDateData()` as a warning as well. As a result
`Storage.add()` does not need a return value anymore and so doesn't
`Storage.AddRows()`.

Additionally, this commit adds a unit test that checks all cases that
result in a row not being added to the storage.



---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-07-17 12:07:14 +02:00
Aliaksandr Valialkin
c1e32f4517
lib/promrelabel: add test for IfExpression.String() function
While at it, simplify this function a bit after the commit 861852f262

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
2024-07-16 18:31:05 +02:00
Aliaksandr Valialkin
4304950391
lib/promscrape/discovery/yandexcloud: follow-up for 070abe5c71
- Obtain IAM token via GCE-like API instead of Amazon EC2 IMDSv2 API,
  since it looks like IMDBSv2 API isn't supported by Yandex Cloud
  according to https://yandex.cloud/en/docs/security/standard/authentication#aws-token :

  > So far, Yandex Cloud does not support version 2, so it is strongly recommended
  > to technically disable getting a service account token via the Amazon EC2 metadata service.

- Try obtaining IAM token via GCE-like API at first and then fall back to the deprecated Amazon EC2 IMDBSv1.
  This should prevent from auth errors for instances with disabled GCE-like auth API.
  This addresses @ITD27M01 concern at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513#issuecomment-1867794884

- Make more clear the description of the change at docs/CHANGELOG.md , add reference to the related issue.

P.S. This change wasn't tested in prod because I have no access to Yandex Cloud.
It is recommended to test this change by @ITD27M01 and @vmazgo , who filed
the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6524
2024-07-16 17:58:40 +02:00
Aliaksandr Valialkin
57000f5105
lib/promscrape: follow-up for 1e83598be3
- Clarify that the -promscrape.maxScrapeSize value is used for limiting the maximum
  scrape size if max_scrape_size option isn't set at https://docs.victoriametrics.com/sd_configs/#scrape_configs

- Fix query example for scrape_response_size_bytes metric at https://docs.victoriametrics.com/vmagent/#automatically-generated-metrics

- Mention about max_scrape_size option at the -help description for -promscrape.maxScrapeSize command-line flag

- Treat zero value for max_scrape_size option as 'no scrape size limit'

- Change float64 to int type for scrapeResponseSize struct fields and function args, since response size cannot be fractional

- Optimize isAutoMetric() function a bit

- Sort auto metrics in alphabetical order in isAutoMetric() and in scrapeWork.addAutoMetrics() functions
  for better maintainability in the future

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6434
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429
2024-07-16 12:38:21 +02:00
Aliaksandr Valialkin
7a3394bbe1
Revert "lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451)"
This reverts commit cd1aca217c.

Reason for revert: this commit has no sense, since the firehose response has application/json content-type,
so it must contain JSON-encoded timestamp and requestId fields according to https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#responseformat .
HTML-escaping the requestId field may break the response, so the client couldn't correctly recognize the html-escaped requestId.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6451
2024-07-16 09:49:19 +02:00
Aliaksandr Valialkin
233e5f0a9e
lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag()
This is a follow-up for 61dce6f2a1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329
2024-07-16 01:00:45 +02:00
Aliaksandr Valialkin
784327ea30
lib/uint64set: optimize Set.Has() for nil Set - it should be inlined now
This makes unnecessary the checkDeleted variable at lib/storage/index_db.go

This is a follow-up for b984f4672e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6342
2024-07-15 23:59:20 +02:00
Aliaksandr Valialkin
832e088659
lib/mergeset: properly update TableMetrics.TooLongItemsDroppedTotal inside Table.UpdateMetrics
Substitute '+=' with '=', since tooLongItemsTotal is global counter, which doesn't belong to the Table struct.

This is a follow-up for 69d244e6fb
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6297
2024-07-15 23:39:10 +02:00
Aliaksandr Valialkin
a468a6e985
lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc
- Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call
- NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil
- Simplify the implementation of NewStatDialFunc by removing sync.Map from there.
- Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils
- Use gauge instead of counter type for *_conns metric

This is a follow-up for d7b5062917
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299
2024-07-15 23:02:34 +02:00
Aliaksandr Valialkin
ad367c17bf
lib/streamaggr/streamaggr.go: typo fix after 5e29ef5ed5: IgnoredNaNSamples -> ignoredNaNSamples 2024-07-15 21:58:56 +02:00
Aliaksandr Valialkin
db557b86ee
app/vmagent/remotewrite: follow-up for f153f54d11
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
  This improves code maintainability a bit.

- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
  This prevents from potential resource leaks.

- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
  This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.

- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
  in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.

- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
  This label should simplify investigation of the exposed metrics.

- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
  it is hard to use these labels in query filters and aggregation functions.

- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
  This metric shows the number of samples generated by the corresponding streaming aggregation rule.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
  which could modify the behaviour of the constructed streaming aggregator.
  Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.

- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
  This also allows passing nil instead of Options when default options are enough.

- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
  so they have consistent set of labels comparing to the rest of streaming aggregation metrics.

- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
  `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
  `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
  metrics. This is a follow-up for the commit 2eb1bc4f81 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604

- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
  figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
  in the commit 2eb1bc4f81 .

- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
  While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
2024-07-15 20:24:01 +02:00
Aliaksandr Valialkin
202e5704e6
vendor: update github.com/VictoriaMetrics/metrics from v1.34.1 to v1.35.0
Fix potential memory leaks across VictoriaMetrics codebase after metrics.UnregisterSet(s) call
because of missing s.UnregisterAllMetrics() call.

This is a follow-up for 6a6e34ab8e . It is OK if some vmauth metrics
aren't visible for a few microseconds when the previous metrics are unregistered and new metrics
weren't registered yet.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6252
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5805
2024-07-15 10:43:37 +02:00
Aliaksandr Valialkin
c995ccad93
lib/{storage,mergeset}: do not allow setting dataFlushInterval to values smaller than pending{Items,Rows}FlushInterval
Pending rows and items unconditionally remain in memory for up to pending{Items,Rows}FlushInterval,
so there is no any sense in setting dataFlushInterval (the interval for guaranteed flush of in-memory data to disk)
to values smaller than pending{Items,Rows}FlushInterval, since this doesn't affect the interval
for flushing pending rows and items from memory to disk.

This is a follow-up for 4c80b17027

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6221
2024-07-15 10:08:15 +02:00
Aliaksandr Valialkin
48ec66883a
lib/streamaggr: consistently use alphabetical order of benchmarked stream aggregation outputs 2024-07-15 09:53:19 +02:00
Aliaksandr Valialkin
5354374b62
lib/streamaggr: follow-up for 9c3d44c8c9
- Consistently enumerate stream aggregation outputs in alphabetical order across the source code and docs.
  This should simplify future maintenance of the corresponding code and docs.

- Fix the link to `rate_sum()` at `see also` section of `rate_avg()` docs.

- Make more clear the docs for `rate_sum()` and `rate_avg()` outputs.

- Encapsulate output metric suffix inside rateAggrState. This eliminates possible bugs related
  to incorrect suffix passing to newRateAggrState().

- Rename rateAggrState.total field to less misleading rateAggrState.increase name, since it calculates
  counter increase in the current aggregation window.

- Set rateLastValueState.prevTimestamp on the first sample in time series instead of the second sample.
  This makes more clear the code logic.

- Move the code for removing outdated entries at rateAggrState into removeOldEntries() function.
  This make the code logic inside rateAggrState.flushState() more clear.

- Do not write output sample with zero value if there are no input series, which could be used
  for calculating the rate, e.g. if only a single sample is registered for every input series.

- Do not take into account input series with a single registered sample when calculating rate_avg(),
  since this leads to incorrect results.

- Move {rate,total}AggrState.flushState() function to the end of rate.go and total.go files, so they look more similar.
  This shuld simplify future mantenance.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6243
2024-07-15 08:40:09 +02:00
Aliaksandr Valialkin
0145b65f25
app/vmagent/remotewrite: follow-up for 87fd400dfc
- Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage
  systems are configured with the disabled on-disk queue, every in-memory queue is full
  and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common,
  so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx
  relabeling and other processing in this case.

- Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped().
  Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set.
  In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full
  and on-disk queue is disabled.
  The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing
  the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage
  processes pending data, so it drops aggregated samples in this case.

- Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output,
  so it is clear that this flag can be set individually per each -remoteWrite.url.

- Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems
  are configured with the disabled on-disk queue, then there is no sense in keeping samples
  on some of these systems, while dropping samples on the remaining systems, since this
  will result in global stall on the remote storage system with the disabled on-disk queue
  and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false
  from remotewrite.TryPush() in this case. This will result in infinite duplicate samples
  written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload
  is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
  This allows proceeding with newly scraped / pushed samples by sending them to the remaining
  remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set.

- Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test.

- Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url.
  See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
2024-07-13 02:25:19 +02:00
Aliaksandr Valialkin
0078399788
app/vmalert: switch from table-driven tests to f-tests
This makes test code more clear and reduces the number of code lines by 500.
This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e

While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error*
requires more boilerplate code, which can result in additional bugs inside tests.
While t.Error* allows writing logging errors for the same, this doesn't simplify fixing
broken tests most of the time.

This is a follow-up for a9525da8a4
2024-07-12 22:41:11 +02:00
hagen1778
2f65956259
lib/streamaggr: add missing test cases
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-12 11:06:45 +02:00
Hui Wang
2eb1bc4f81
vmagent: fix vm_streamaggr_flushed_samples_total counter (#6604)
We use `vm_streamaggr_flushed_samples_total` to show the number of
produced samples by aggregation rule, previously it was overcounted, and
doesn't account for `output_relabel_configs`.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-07-12 10:56:07 +02:00
hagen1778
03e4c5c19c
lib/bakcup/azremote: follow-up after 5fd3aef549
Simplify tests by converting them to f-tests.

5fd3aef549
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-10 13:06:27 +02:00
justinrush
5fd3aef549
lib/backup: add support for Azure Managed Identity (#6518)
### Describe Your Changes

These changes support using Azure Managed Identity for the `vmbackup`
utility. It adds two new environment variables:

* `AZURE_USE_DEFAULT_CREDENTIAL`: Instructs the `vmbackup` utility to
build a connection using the [Azure Default
Credential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.5.2#NewDefaultAzureCredential)
mode. This causes the Azure SDK to check for a variety of environment
variables to try and make a connection. By default, it tries to use
managed identity if that is set up.

This will close
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

### Testing

However you normally test the `vmbackup` utility using Azure Blob should
continue to work without any changes. The set up for that is environment
specific and not listed out here.

Once regression testing has been done you can set up [Azure Managed
Identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview)
so your resource (AKS, VM, etc), can use that credential method. Once it
is set up, update your environment variables according to the updated
documentation.

I added unit tests to the `FS.Init` function, then made my changes, then
updated the unit tests to capture the new branches.

I tested this in our environment, but with SAS token auth and managed
identity and it works as expected.

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Justin Rush <jarush@epic.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-07-10 11:52:05 +02:00
Aliaksandr Valialkin
ac06569c49
app/vlinsert/loki: use easyproto instead for parsing Loki protobuf messages 2024-07-10 03:05:17 +02:00
Aliaksandr Valialkin
aa9bb99527
lib/logstorage: drop all the pipes from the query when calculating the number of matching logs at /select/logsql/hits API 2024-07-10 00:39:28 +02:00
Aliaksandr Valialkin
3c02937a34
all: consistently use 'any' instead of 'interface{}'
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
2024-07-10 00:20:37 +02:00
Aliaksandr Valialkin
a9525da8a4
lib: consistently use f-tests instead of table-driven tests
This makes easier to read and debug these tests. This also reduces test lines count by 15% from 3K to 2.5K
See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e

While at it, consistently use t.Fatal* instead of t.Error*, since t.Error* usually leads
to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.
2024-07-09 22:40:50 +02:00
Aliaksandr Valialkin
35b3b95cbc
lib/promscrape/discovery/vultr: follow-up after 17e3d019d2
- Sort the discovered labels in alphabetical order at https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs
- Rename VultrConfigs to VultrSDConfigs to be consistent with the naming for other SD configs.
- Prepare query arg filters for `list instances API` at newAPIConfig() instead of passing them in a separate listParams struct.
  This simplifies the code a bit.
- Return error when bearer token isn't set at vultr_sd_configs, since this token is mandatory
  according to https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs
- Remove unused fields from the parsed response from Vultr list instances API in order to simplify the code a bit.
- Remove double logging of errors inside getInstances() function, since these errors must be already logged by the caller.
- Simplify tests, so they are easier to maintain.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6068
2024-07-05 17:40:03 +02:00
Aliaksandr Valialkin
c0caa69939
lib/logstorage: use quicktemplate.AppendJSONString instead of strconv.AppendQuote for encoding JSON strings
The strconv.AppendQuote improperly encodes special chars such as \x1b . They must be encoded as \u001b .

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
2024-07-05 01:22:23 +02:00
Aliaksandr Valialkin
2da7dfc754
Revert c6c5a5a186 and b2765c45d0
Reason for revert:

There are many statsd servers exist:

- https://github.com/statsd/statsd - classical statsd server
- https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DatDog Agent ( https://docs.datadoghq.com/agent/ )
- https://github.com/avito-tech/bioyino - high-performance statsd server
- https://github.com/atlassian/gostatsd - statsd server in Go
- https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics

These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics
according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd (
the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target
according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ).

Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides
significant advantages over the existing statsd servers, while has no significant drawbacks comparing
to existing statsd servers.

The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server.
The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics (
see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers.
So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating
from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent).

Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server
with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation.
In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation.
This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated
statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed
during querying.

P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic
specialized configuration for aggregating of statsd metrics. The main requirements for this configuration:

- easy to write, read and update (ideally it should work out of the box for most cases without additional configuration)
- hard to misconfigure (e.g. hard to shoot yourself in the foot)

It would be great if this configuration will be compatible with the configuration of the most widely used statsd server.

In the mean time it is recommended continue using external statsd server.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600
2024-07-03 23:51:56 +02:00
Aliaksandr Valialkin
d8c7cc266b
lib/promscrape: use prompbmarshal.MustParsePromMetrics function at parseData() test function
The prompbmarshal.MustParsePromMetrics function has been added in the commit cc4d57d650
2024-07-03 16:08:13 +02:00
Aliaksandr Valialkin
bb00bae353
Revert "Exemplar support (#5982)"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite
that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains histogram_quantile() function. It queries exemplars via special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
  exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is litterally filled with thousands of exemplar dots.
  Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labes on the time range with the anomaly on metrics' graph.
2024-07-03 15:30:21 +02:00
Aliaksandr Valialkin
cc4d57d650
app/vmagent/remotewrite,lib/streamaggr: re-use common code in tests after 879771808b
- Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package.
  This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries.

- Move common code for mustParsePromMetrics() function into lib/prompbmarshal package,
  so it could be used in tests for building []prompbmarshal.TimeSeries from string.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206
2024-07-03 15:21:36 +02:00
Aliaksandr Valialkin
f17b408643
lib/streamaggr: follow-up for the commit c0e4ccb7b5
- Clarify docs for `Ignore aggregation intervals on start` feature.

- Make more clear the code dealing with ignoreFirstIntervals at aggregator.runFlusher() functions.
  It is better from readability and maintainability PoV using distinct a.flush() calls
  for distinct cases instead of merging them into a single a.flush() call.

- Take into account the first incomplete interval when tracking the number of skipped aggregation intervals,
  since this behaviour is easier to understand by the end users.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6137
2024-07-02 21:24:50 +02:00
Andrii Chubatiuk
476faf5578
lib/protoparser/graphite: added -graphite.sanitizeMetricName flag (#6489)
### Describe Your Changes

Added flag to sanitize graphite metrics
fixes #6077

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-07-02 14:56:41 +02:00
Aliaksandr Valialkin
3b6c78c26c
lib/logstorage: allow writing after N in front of before N at stream_context pipe 2024-07-02 01:38:20 +02:00
Andrii Chubatiuk
861852f262
lib/streamaggr: added stale samples metric, added metrics labels (#6462)
### Describe Your Changes

- added stale metrics counters for input and output samples
- added labels for aggregator metrics =>
`name="{rwctx}:{aggrId}:{aggrSuffix}"`
   - rwctx - global or number starting from 1
   - aggrid - aggregator id starting from 1
   - aggrSuffix - <interval>_(by|without)_label1_label2_labeln
   e.g: `name="global:1:1m_without_instance_pod"`

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-07-01 14:56:17 +02:00
Aliaksandr Valialkin
6bb66cb3e9
lib/logstorage: properly search for the surrounding logs in stream_context pipe
The set of log fields in the found logs may differ from the set of log fields present in the log stream.
So compare only the log fields in the found logs when searching for the matching log entry in the log stream.

While at it, return _stream field in the delimiter log entry, since this field is used by VictoriaLogs Web UI
for grouping logs by log streams.
2024-07-01 02:29:50 +02:00
Aliaksandr Valialkin
bb0deb7ac4
lib/logstorage: add ability to store sorted log position into a separate field with sort ... rank <fieldName> syntax 2024-07-01 01:44:17 +02:00
Aliaksandr Valialkin
dc291d8980
lib/logstorage: add delimiter between log chunks returned from | stream_context pipe 2024-07-01 01:30:37 +02:00
Aliaksandr Valialkin
d4ca651547
lib/logstorage: add stream_context pipe, which allows selecting surrounding logs for the matching logs 2024-06-28 19:14:29 +02:00
Aliaksandr Valialkin
0730f1324d
lib/logstorage: it is safe using | unroll pipe in live tailing
`| unroll` pipe can make multiple copies of rows from the input row.
This doesn't break live tailing, so allow `| unroll` pipe in live tailing.
2024-06-27 19:44:57 +02:00
Aliaksandr Valialkin
7c8c040502
app/vlselect: properly return live tailing results 2024-06-27 15:05:57 +02:00
Aliaksandr Valialkin
87f1c8bd6c
lib/logstorage: work-in-progress 2024-06-27 14:20:43 +02:00
Andrii Chubatiuk
070abe5c71
added IMDSv2 for YC SD (#6524)
### Describe Your Changes

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-06-26 18:03:21 +02:00
rtm0
a42bd59ee4
Fix Date metricid cache consistency under concurrent use (#6534)
### Describe Your Changes

Fix Date metricid cache consistency under concurrent use.
When one goroutine calls Has() and does not find the cache entry in the
immutable map it will acquire a lock and check the mutable map. And it
is possible that before that lock is acquired, the entry is moved from
the mutable map to the immutable map by another goroutine causing a
cache miss.

The fix is to check the immutable map again once the lock is acquired. 

### Checklist

The following checks are **mandatory**:

- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-06-26 17:33:38 +02:00
Aliaksandr Valialkin
dff5008392
app/vlstorage: add -retention.maxDiskSpaceUsageBytes command-line flag for limiting the retention at VictoriaLogs by disk space usage 2024-06-25 17:30:33 +02:00
Aliaksandr Valialkin
3eacd43fff
lib/logstorage: parse syslog structured data into separate fields in order to simplify further querying of this data 2024-06-25 14:53:39 +02:00
Aliaksandr Valialkin
9e1c037249
lib/logstorage: properly parse timezone offset at TryParseTimestampRFC3339Nano()
The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets.

While at it, make tryParseTimestampISO8601 function private in order to prevent
from improper usage of this function from outside the lib/logstorage package.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508
2024-06-25 14:53:38 +02:00
Aliaksandr Valialkin
7252c5d258
lib/logstorage: make golangci-lint happy 2024-06-25 03:04:21 +02:00
Aliaksandr Valialkin
82d639411d
lib/httpserver: revert 9b7e532172
Reason for revert: this commit doesn't resolve real security issues,
while it complicates the resulting code in subtle ways (aka security circus).

Comparison of two strings (passwords, auth keys) takes a few nanoseconds.
This comparison is performed in non-trivial http handler, which takes thousands
of nanoseconds, and the request handler timing is non-deterministic because of Go runtime,
Go GC and other concurrently executed goroutines. The request handler timing is even
more non-deterministic when the application is executed in shared environments
such as Kubernetes, where many other applications may run on the same host and use
shared resources of this host (CPU, RAM bandwidth, network bandwidth).

Additionally, it is expected that the passwords and auth keys are passed via TLS-encrypted connections.
Establishing TLS connections takes additional non-trivial time (millions of nanoseconds),
which depends on many factors such as network latency, network congestion, etc.

This makes impossible to conduct timing attack on passwords and auth keys in VictoriaMetrics components.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6423/files
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392
2024-06-25 01:36:12 +02:00
Aliaksandr Valialkin
de7450b7e0
lib/logstorage: work-in-progress 2024-06-24 23:27:12 +02:00
Andrii Chubatiuk
1e83598be3
app/vmagent: add max_scrape_size to scrape config (#6434)
Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-20 13:58:42 +02:00
Slava Bobik
d236604d39
Fixed a typo in the FastQueue mutex comment (#6514)
### Describe Your Changes

Fixed a small typo in a comment about the mutex inside the FastQueue
struct

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-06-20 02:30:36 -07:00
Aliaksandr Valialkin
7229dd8c33
lib/logstorage: work-in-progress 2024-06-20 03:10:08 +02:00
Zakhar Bessarab
201fd6de1e
lib/fs/fscore: do not trim content from path (#6503)
### Describe Your Changes

Trimming content which is loaded from an external pass leads to obscure
issues in case user-defined input contained trimmed chars. For example.
user-defined password "foo\n" will become "foo" while user will expect
it to contain a new line.

---
For example, a user defines a password which ends with `\n`. This often
happens when user Kubernetes secrets and manually encodes value as
base64-encoded string.

In this case vmauth configuration might look like:
```
users:
  - url_prefix:
      - http://vminsert:8480/insert/0/prometheus/api/v1/write
    name: foo
    username: foo
    password: "foobar\n"
```

vmagent configuration for this setup will use the following flags:
```
-remoteWrite.url=http://vmauth:8427/
-remoteWrite.basicAuth.passwordFile=/tmp/vmagent-password
-remoteWrite.basicAuth.username="foo"
```
Where `/tmp/vmagent-password` is a file with `foobar\n` password.

Before this change such configuration will result in `401 Unauthorized`
response received by vmagent since after file content will become
`foobar`.

---
An example with Kubernetes operator which uses a secret to reference the
same password in multiple configurations.

<details>
  <summary>See full manifests</summary>

`Secret`:
```
apiVersion: v1
data:
  name: Zm9v # foo
  password: Zm9vYmFy # foobar\n
  username: Zm9v= # foo
kind: Secret
metadata:
  name: vmuser
```


`VMUser`: 
```
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMUser
metadata:
  name: vmagents
spec:
  generatePassword: false
  name: vmagents
  targetRefs:
  - crd:
      kind: VMAgent
      name: some-other-agent
      namespace: example
  username: foo
  # note - the secret above is referenced to provide password
  passwordRef:
    name: vmagent
    key: password
```

`VMAgent`:
```
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: example
spec:
  selectAllByDefault: true
  scrapeInterval: 5s
  replicaCount: 1
  remoteWrite:
    - url: "http://vmauth-vmauth-example:8427/api/v1/write"
      # note - the secret above is referenced as well
      basicAuth:
        username:
          name: vmagent
          key: username
        password:
          name: vmagent
          key: password
```

</details>

Since both config target exactly the same `Secret` object it is expected
to work, but apparently the result will be `401 Unauthrized` error.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-19 10:31:48 +02:00
Nihal
9b7e532172
victoria-metrics: constant-time comparison of credentials like authkeys and basic auth credentials (#6423)
Changes for constant-time comparison of credentials like authkeys and
basic auth credentials.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
2024-06-19 09:36:56 +02:00
Aliaksandr Valialkin
e498fa6960
app/vlinsert/syslog: allow accepting syslog messages with different configs at different ports 2024-06-17 23:16:34 +02:00
hagen1778
34771ab293
lib/streamaggr: remove accidentally committed changes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-17 14:24:54 +02:00
Roman Khavronenko
6149adbe10
app/vmselect/promql: check for ranged vectors in aggr funcs if implicit conversions are disabled (#6450)
Check for ranged vector arguments in aggregate expressions when
`-search.disableImplicitConversion` or `-search.logImplicitConversion`
are enabled.
 For example, `sum(up[5m])` will fail to execute if these flags are set.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [*] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-17 14:21:16 +02:00
Aliaksandr Valialkin
2b6a634ec0
lib/logstorage: work-in-progress 2024-06-17 12:13:18 +02:00
Andrii Chubatiuk
faf67aa8b5
lib/flagutil: use month limit for duration flag for parsed duration assessment (#6486)
use maxMonths limit for parsed duration flag value

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6330

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-14 15:20:21 +02:00
Andrii Chubatiuk
e678a9aa51
lib/backup/s3remote: fixed credsFilePath flag (#6488)
properly use credsFilePath flag value

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6353

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-14 14:13:02 +02:00
Roman Khavronenko
51d19485bb
lib/streamaggr: prevent rate_sum and rate_avg from producing NaNs (#6482)
### Describe Your Changes

* check if `lastValue` was seen at least twice with different
timestamps. Otherwise, the difference between last timestamp and
previous timestamp could be `0` and will result into `NaN` calculation
* check if there items left in lastValue map after staleness cleanup.
Otherwise, `rate_avg` could have produce `NaN` result.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-14 10:06:22 +02:00
Aliaksandr Valialkin
1c094d928c
lib/leveledbytebufferpool: do not pool byte slices bigger than 2^18 bytes
Previously byte slices up to 2^20 bytes (e.g. 1Mb) were cached because of a typo in the commit c14dafce43 .

This could result in increased memory usage when vmagent scrapes many regular targets, which expose
relatively small number of metrics (e.g. up to a few thousand per target) and a few large targets such as kube-state-metrics,
which expose more than 10 thousand metrics. This is common case for Kubernetes monitoring.

While at it, remove pools for very small byte slices, since they are rarely used during scraping.
2024-06-13 16:56:25 +02:00
Aliaksandr Valialkin
d54840f2f2
lib/bytesutil: optimize internStringMap cleanup
- Make it in a separate goroutine, so it doesn't slow down regular intern() calls.

- Do not lock internStringMap.mutableLock during the cleanup routine, since now
  it is called from a single goroutine and reads only the readonly part of the internStringMap.
  This should prevent from locking regular intern() calls for new strings during cleanups.

- Add jitter to the cleanup interval in order to prevent from synchornous increase in resource usage
  during cleanups.

- Run the cleanup twice per -internStringCacheExpireDuration . This should save 30% CPU time spent
  on cleanup comparing to the previous code, which was running the cleanup 3 times per -internStringCacheExpireDuration .
2024-06-13 15:06:51 +02:00
Zakhar Bessarab
34071ac660
lib/promscrape: increase default value for promscrape.maxDroppedTargets to 10_000 (#6459)
### Describe Your Changes
This limit can be increased since after
4513893ead
tracking of dropped targets uses much less memory per entry.

See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6381#issuecomment-2156708228


### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-06-12 16:34:18 +02:00
LHHDZ
3a45bbb4e0
app/vmauth: fix discovering backend IPs when url_prefix contains hostname with srv+ prefix (#6401)
This change fixes the following panic:
```
2024-06-04T11:16:52.899Z        warn    app/vmauth/auth_config.go:353   cannot discover backend SRV records for http://srv+localhost:8080: lookup localhost on 10.100.10.4:53: server misbehaving; use it literally
panic: runtime error: integer divide by zero

goroutine 9 [running]:
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper.func1()
        /Users/lhhdz/wd/projects/go/VictoriaMetrics/lib/httpserver/httpserver.go:291 +0x58
panic({0x103115100?, 0x10338d700?})
        /Users/lhhdz/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-arm64/src/runtime/panic.go:770 +0x124
main.getLeastLoadedBackendURL({0x0?, 0x22?, 0x1400014757b?}, 0x1400013c120?)
        /Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:473 +0x210
main.(*URLPrefix).getBackendURL(0x140000aa080)
        /Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:312 +0xb8
```

---------

Co-authored-by: Haley Wang <haley@victoriametrics.com>
2024-06-12 12:30:44 +02:00
Aliaksandr Valialkin
8f5dc966f6
lib/logstorage: work-in-progress 2024-06-11 17:50:32 +02:00
Aliaksandr Valialkin
65a97317e4
lib/streamaggr: prevent from data race inside dedupAggrShard when samplesBuf can be updated in pushSamples() while their values are read in the flush() loop without das.mu lock
This issue has been introduced in the commit 253c0cffbe
2024-06-11 17:31:16 +02:00
Aliaksandr Valialkin
0521e58a09
lib/logstorage: work-in-progress 2024-06-10 18:42:19 +02:00
Aliaksandr Valialkin
bf2d299420
lib/streamaggr: return back string interning to dedupAggr after 78953723200f15ffc417064d1912bdbb7551505c
It should reduce memory allocation rate during stream deduplication
2024-06-10 18:05:42 +02:00
Aliaksandr Valialkin
6a0a36aa93
lib/bytesutil: reduce the number of memory allocations per each interned string in bytesutil.InternString() from 5 to 1
This should reduce GC overhead when tens of millions of strings are interned (for example, during stream deduplication
of millions of active time series).
2024-06-10 18:05:41 +02:00
Roman Khavronenko
cd1aca217c
lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451)
All user input should be sanitized before rendering. This should prevent
possible attacks. See
https://github.com/VictoriaMetrics/VictoriaMetrics/security/code-scanning/203

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-10 16:55:59 +02:00
Aliaksandr Valialkin
253c0cffbe
lib/streamaggr: reduce memory allocations by using dedupAggrSample buffer per each dedupAggrShard 2024-06-10 16:38:42 +02:00
Aliaksandr Valialkin
a1e8003754
lib/streamaggr: reduce the number of duplicates per each sample in BenchmarkDedupAggr from 100 to 2
This is closer to typical production setups when deduplication is used for de-duplicating of 2 samples per series.
2024-06-10 16:38:41 +02:00
Aliaksandr Valialkin
0b7c47a40c
lib/streamaggr: use strings.Clone() instead of bytesutil.InternString() for creating series key in dedupAggr
Our internal testing shows that this reduces GC overhead when deduplicating tens of millions of active series.
2024-06-10 16:08:34 +02:00
Aliaksandr Valialkin
e8bb4359bb
lib/streamaggr: improve performance for dedupAggr.sizeBytes() and dedupAggr.itemsCount()
These functions are called every time `/metrics` page is scraped, so it would be great
if they could be sped up for the cases when dedupAggr tracks tens of millions of active time series.
2024-06-10 15:59:37 +02:00
Aliaksandr Valialkin
f45d02a243
lib/streamaggr: remove flushState arg at dedupAggr.flush(), since it is always set to true in production 2024-06-10 15:59:33 +02:00
Hui Wang
61dce6f2a1
lib/httpserver: allow reloadAuthKey and configAuthKey to override htt… (#6338)
…pAuth.*

address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329, 
makes `reloadAuthKey`, `configAuthKey`, `flagsAuthKey`, `pprofAuthKey`
behavior the same way,
but keys like `-snapshotAuthKey`, `-forceMergeAuthKey` are still
protected by httpAuth.*. All the available key are listed in
https://docs.victoriametrics.com/single-server-victoriametrics/#security.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-10 12:09:47 +02:00
Aliaksandr Valialkin
36be090cd5
lib/streamaggr: follow-up for 7cb894a777
- Use bytesutil.InternString() instead of strings.Clone() for inputKey and outputKey in aggregatorpushSamples().
  This should reduce string allocation rate, since strings can be re-used between aggrState flushes.
- Reduce memory allocations at dedupAggrShard by storing dedupAggrSample by value in the active series map.
- Remove duplicate call to bytesutil.InternBytes() at Deduplicator, since it is already called inside dedupAggr.pushSamples().
- Add missing string interning at rateAggrState.pushSamples().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6402
2024-06-07 16:27:26 +02:00
Roman Khavronenko
7cb894a777
lib/streamaggr: reduce number of inuse objects (#6402)
The main change is getting rid of interning of sample key. It was
discovered that for cases with many unique time series aggregated by
vmagent interned keys could grow up to hundreds of millions of objects.
This has negative impact on the following aspects:
1. It slows down garbage collection cycles, as GC has to scan all inuse
objects periodically. The higher is the number of inuse objects, the
longer it takes/the more CPU it takes.
2. It slows down the hot path of samples aggregation where each key
needs to be looked up in the map first.

The change makes code more fragile, but suppose to provide performance
optimization for heavy-loaded vmagents with stream aggregation enabled.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-06-07 15:45:52 +02:00
Roman Khavronenko
5f46f8a11d
lib/promrelabel: speedup label match by __name__ (#6432)
The change adds a fastpath for `equalValue` comparisons against
`__name__` label by avoiding calls to `toCanonicalLabelName` func. This
speedups matches by metric name like `'foo'`. See bench stats below:
```
benchcmp old.txt new.txt

benchmark                                           old ns/op     new ns/op     delta
BenchmarkIfExpression/equal_label:_last-10          35.6          35.1          -1.18%
BenchmarkIfExpression/equal_label:_middle-10        18.3          17.3          -5.41%
BenchmarkIfExpression/equal_label:_first-10         1.20          1.24          +2.74%
BenchmarkIfExpression/equal___name__:_last-10       10.1          4.96          -50.75%
BenchmarkIfExpression/equal___name__:_middle-10     5.79          3.16          -45.41%
BenchmarkIfExpression/equal___name__:_first-10      1.17          1.05          -9.76%
```

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-07 15:44:48 +02:00
Andrii Chubatiuk
185fac03b3
lib/streamaggr: metrics to track dropped, nan samples and samples lag (#6358)
### Describe Your Changes

Added streamaggr metrics to:
 - `vm_streamaggr_samples_lag_seconds` - samples lag
- `vm_streamaggr_ignored_samples_total{reason="nan"}` - ignored NaN
samples
- `vm_streamaggr_ignored_samples_total{reason="too_old"}` - ignored old
samples
2024-06-06 14:06:11 +02:00
Aliaksandr Valialkin
55d8379ae6
lib/logstorage: work-in-progress 2024-06-06 12:27:05 +02:00
Aliaksandr Valialkin
80a7c65ab7
lib/logstorage: allow using eval keyword instead of math keyword in math pipe 2024-06-05 10:07:49 +02:00
Aliaksandr Valialkin
43cf221681
lib/logstorage: work-in-progress 2024-06-05 03:18:12 +02:00
pludov
3ddae77c63
lib/fs: support NFS implementations that return EEXIST instead of ENOTEMPTY (#6398)
### Describe Your Changes

Fix for issue #6396: according to rmdir manpage, ENOTEMPTY and EEXIST
should be treated equally

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6396

### Checklist

The following checks are **mandatory**:

- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Ludovic Pollet <ludovic.pollet@exfo.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-04 15:17:38 +02:00
Aliaksandr Valialkin
96c29ab403
lib/logstorage: allow typing asc in sort pipe for the sake of consistency with desc 2024-06-04 02:29:10 +02:00
Aliaksandr Valialkin
539fce9227
lib/logstorage: work-in-progress 2024-06-04 01:49:02 +02:00
Aliaksandr Valialkin
b30e80b071
lib/logstorage: work-in-progress 2024-05-30 16:19:23 +02:00
Roman Khavronenko
b984f4672e
lib/storage: filter deleted label names and values from `/api/v1/labe… (#6342)
…ls` and `/api/v1/label/.../values`

Check for deleted metrics when `match[]` filter matches small number of
time series (optimized path).

The issue was introduced
[v1.81.0](https://docs.victoriametrics.com/changelog_2022/#v1810).

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6300 Updates
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-29 14:07:44 +02:00
Aliaksandr Valialkin
1de187bcb7
lib/logstorage: work-in-progress 2024-05-29 01:52:13 +02:00
Aliaksandr Valialkin
0aafca29be
lib/logstorage: work-in-progress 2024-05-28 19:29:41 +02:00
Aliaksandr Valialkin
99138e15c0
lib/logstorage: fix golangci-lint warnings 2024-05-26 02:01:32 +02:00
Aliaksandr Valialkin
1e203f35f7
lib/logstorage: work-in-progress 2024-05-26 01:55:21 +02:00
Aliaksandr Valialkin
7ac529c235
lib/logstorage: work-in-progress 2024-05-25 22:59:13 +02:00
Aliaksandr Valialkin
0b629ce5a5
lib/logstorage: re-use per-shard fields across processed blocks in pipePackJSON and pipeUnroll 2024-05-25 22:13:32 +02:00
Aliaksandr Valialkin
dc55146752
lib/logstorage: work-in-progress 2024-05-25 21:36:16 +02:00
Aliaksandr Valialkin
e2590f0485
lib/logstorage: work-in-progress 2024-05-25 00:30:58 +02:00
Nikolay
69d244e6fb
lib/mergeset: adds tracking for indexdb records drop (#6297)
It allows to create alert for possible item drops at indexdb. It may
happen, if ingested metric size exceeds max indexdb item size.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-05-24 14:55:20 +02:00
Aliaksandr Valialkin
4b458370c1
lib/logstorage: work-in-progress 2024-05-24 03:06:55 +02:00
Nikolay
a5d1013042
lib/storage: change default value for maxLabelValueLen to 1024 (#6313)
* It must reduce memory usage for misbehaving clients. Since
VictoriaMetrics stores sparse index inmemory.
* Reduce disk space usage for indexdb.
* Prevent possible indexDB items drops.
* It may trigger slow insert and new timeseries registration due to
default value for flag change

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6176

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-05-22 21:53:53 +02:00
Alexander Marshalov
7da541360e
[vmlogs] fixed time parsing with millisecond precision time (#6293) (#6295)
fix for #6293

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-05-22 21:46:50 +02:00
Aliaksandr Valialkin
22107421eb
lib/logstorage: work-in-progress 2024-05-22 21:01:20 +02:00
Roman Khavronenko
ac836bcf6c
lib/backup: add -s3TLSInsecureSkipVerify command-line flag (#6318)
* The new flag can be used for for skipping TLS certificates
verification when connecting to S3 endpoint. Affects vmbackup,
vmrestore, vmbackupmanager.

* replace deprecated `EndpointResolver` with `BaseEndpoint`

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1056

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-22 13:58:39 +02:00
Hui Wang
d7b5062917
app/vmalert: support DNS SRV record in -remoteWrite.url (#6299)
part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053,
supports [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) address in
`-remoteWrite.url` command-line option.
2024-05-22 10:52:51 +02:00
Roman Khavronenko
7ce052b32d
lib/streamaggr: skip empty aggregators (#6307)
Prevent excessive resource usage when stream aggregation config file
contains no matchers by prevent pushing data into Aggregators object.
Before this change a lot of extra work was invoked without reason.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-20 14:03:28 +02:00
Aliaksandr Valialkin
bc4a0b8f37
lib/logstorage: fix golangci-lint warnings 2024-05-20 11:04:12 +02:00
Aliaksandr Valialkin
ad505a7a9a
lib/logstorage: work-in-progress 2024-05-20 04:08:30 +02:00
Andrii Chubatiuk
f153f54d11
app/vmagent: add global aggregator (#6268)
Add global stream aggregation for VMAgent

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
2024-05-17 14:00:47 +02:00
Nikolay
b2765c45d0
follow-up for c6c5a5a186 (#6265)
* adds datadog extensions for statsd:
  - multiple packed values (v1.1)
  - additional types distribution, histogram

* adds type check and append metric type to the labels with special tag
name `__statsd_metric_type__`. It simplifies streaming aggregation
config.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-05-16 09:25:42 +02:00
Aliaksandr Valialkin
0aa19a2837
lib/logstorage: work-in-progress 2024-05-15 04:55:44 +02:00
Aliaksandr Valialkin
b617dc9c0b
lib/streamaggr: properly return output key from getOutputKey
The bug has been introduced in cc2647d212
2024-05-14 17:47:21 +02:00
Aliaksandr Valialkin
da3af090c6
lib/logstorage: work-in-progress 2024-05-14 03:05:03 +02:00
Aliaksandr Valialkin
cb35e62e04
lib/logstorage: work-in-progress
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6258
2024-05-14 01:49:23 +02:00
Aliaksandr Valialkin
cc2647d212
lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit
Change the return values for these functions - now they return the unmarshaled result plus
the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling.

This improves performance of these functions in hot loops of VictoriaLogs a bit.
2024-05-14 01:23:54 +02:00
Aliaksandr Valialkin
707f3a69db
lib/stringsutil: add LessNatural() function for natural sorting
Natural sorting is needed for sort_by_label_natural() and sort_by_label_natural_desc()
functions in MetricsQL - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6192
and https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6256

Natural sorting will be also used by `| sort ...` pipe in VictoriaLogs -
see https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe
2024-05-13 16:56:47 +02:00
Hui Wang
4c80b17027
storage: correctly apply -inmemoryDataFlushInterval when it's set t… (#6221)
…o minimum supported value 1s
pendingRowsFlushInterval was bumped to 2s in
73f0a805e2
2024-05-13 16:44:30 +02:00
Andrii Chubatiuk
ce25d68b45
lib/streamaggr: added rate_sum and rate_avg to benchmarks, lint fix (#6264)
fixed lint for rate outputs
2024-05-13 16:40:37 +02:00
Andrii Chubatiuk
9c3d44c8c9
lib/streamaggr: added rate and rate_avg output (#6243)
Added `rate` and `rate_avg` output
Resource usage is the same as for increase output, tested on a benchmark

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-05-13 15:39:49 +02:00
hagen1778
17283fab6c
lib/logstorage: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-13 15:35:11 +02:00
Aliaksandr Valialkin
9dbd0f9085
lib/logstorage: initial implementation of pipes in LogsQL
See https://docs.victoriametrics.com/victorialogs/logsql/#pipes
2024-05-12 16:33:31 +02:00
Aliaksandr Valialkin
e66465cb03
lib/encoding: optimizing UnmarshalVarUint64 and UnmarshalVarInt64 a bit 2024-05-12 16:32:11 +02:00
Aliaksandr Valialkin
590160ddbb
lib/slicesutil: add helper functions for setting slice length and extending its capacity
The added helper functions - SetLength() and ExtendCapacity() - replace error-prone code with simple function calls.
2024-05-12 11:32:17 +02:00
Aliaksandr Valialkin
f20d452196
lib/storage: remove outdated misleading comments 2024-05-12 10:24:04 +02:00
Roman Khavronenko
87fd400dfc
Feature allow configuring disableOnDiskQueue and dropSamplesOnOverload per url (#6248)
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html):
allow configuring `-remoteWrite.disableOnDiskQueue` and
`-remoteWrite.dropSamplesOnOverload` cmd-line flags per each
`-remoteWrite.url`. See this [pull
request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065).
Thanks to @rbizos for implementaion!
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add
labels `path` and `url` to metrics
`vmagent_remotewrite_push_failures_total` and
`vmagent_remotewrite_samples_dropped_total`. Now number of failed pushes
and dropped samples can be tracked per `-remoteWrite.url`.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Raphael Bizos <r.bizos@criteo.com>
2024-05-10 12:09:21 +02:00
Roman Khavronenko
8a03e987cb
lib/streamaggr: set correct suffix <output>_prometheus (#6228)
Set correct suffix `<output>_prometheus` for aggregation outputs
`increase_prometheus` and `total_prometheus`
Before, outputs `total` and `total_prometheus` or `increase` and
`increase_prometheus` had the same suffix.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-08 13:11:30 +02:00
Andrii Chubatiuk
a9283e06a3
streamaggr: made labels compressor shared (#6173)
Though labels compressor is quite resource intensive, each aggregator
and deduplicator instance has it's own compressor. Made it shared across
all aggregators to consume less resources while using multiple
aggregators.

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2024-05-08 13:10:53 +02:00
Zhu Jiekun
17e3d019d2
feature: [vmagent] Add service discovery support for Vultr (#6068)
### Describe Your Changes
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041

#### Added
- Added service discovery support for Vultr.

#### Docs
- `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated.

#### Note
- Useful links: 
- Vultr API:
https://www.vultr.com/api/#tag/instances/operation/list-instances
    - Vultr client SDK: https://github.com/vultr/govultr
- Prometheus SD:
https://github.com/prometheus/prometheus/tree/main/discovery/vultr

---
### Checklist

The following checks are mandatory:

- [X] I have read the [Contributing
Guidelines](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/CONTRIBUTING.md)
- [x] All commits are signed and include `Signed-off-by` line. Use `git
commit -s` to include `Signed-off-by` your commits. See this
[doc](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work) about
how to sign your commits.
- [x] Tests are passing locally. Use `make test` to run all tests
locally.
- [x] Linting is passing locally. Use `make check-all` to run all
linters locally.

Further checks are optional for External Contributions:

- [X] Include a link to the GitHub issue in the commit message, if issue
exists.
- [x] Mention the change in the
[Changelog](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/CHANGELOG.md).
Explain what has changed and why. If there is a related issue or
documentation change - link them as well.

  Tips for writing a good changelog message::

* Write a human-readable changelog message that describes the problem
and solution.
* Include a link to the issue or pull request in your changelog message.
* Use specific language identifying the fix, such as an error message,
metric name, or flag name.
* Provide a link to the relevant documentation for any new features you
add or modify.

- [ ] After your pull request is merged, please add a message to the
issue with instructions for how to test the fix or try the feature you
added. Here is an
[example](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4048#issuecomment-1546453726)
- [x] Do not close the original issue before the change is released.
Please note, in some cases Github can automatically close the issue once
PR is merged. Re-open the issue in such case.
- [x] If the change somehow affects public interfaces (a new flag was
added or updated, or some behavior has changed) - add the corresponding
change to documentation.

Signed-off-by: Jiekun <jiekun.dev@gmail.com>
2024-05-08 10:01:48 +02:00
Oleg
c6c5a5a186
Statsd protocol compatibility (#5053)
In this PR I added compatibility with [statsd
protocol](https://github.com/b/statsd_spec) with tags to be able to send
metrics directly from statsd clients to vmagent or directly to VM.
For example its compatible with
[statsd-instrument](https://github.com/Shopify/statsd-instrument) and
[dogstatsd-ruby](https://github.com/DataDog/dogstatsd-ruby) gems

Related issues: #5052, #206, #4600
2024-05-07 21:46:08 +02:00
Ted Possible
5a3abfa041
Exemplar support (#5982)
This code adds Exemplars to VMagent and the promscrape parser adhering
to OpenMetrics Specifications. This will allow forwarding of exemplars
to Prometheus and other third party apps that support OpenMetrics specs.

---------

Signed-off-by: Ted Possible <ted_possible@cable.comcast.com>
2024-05-07 12:09:44 +02:00
Zakhar Bessarab
329c3cbdf0
lib/mergeset: improve test coverage (#6118)
Add test to cover the code path with overflowing shards buffers and
triggering merge to partition.

This test covers the code path which leaded to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-04-30 10:21:37 +02:00
hagen1778
381d4494e9 v1.101.0
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkhL6N9vmSTjg0VSVO/dfN0HKlkAFAmYqbhsACgkQO/dfN0HK
 lkBDhRAAhUO7SsbCCHZo4Azdw+G32J05LsYvKMGl0r+j2fYRPovl7Sgf/HdZUKxk
 4aOXPPl8YogHWHVv8qrmFXl7gPWRNaFtCxmVlVIv+eEzwzN18tH2Umn+PwfQTtmN
 VM7ujy54rH8z28AGII8P8h4s/0kNVPGwPP9gEifm8ICIXtKpdnvbtkpAoCFEZvYf
 b5chm/3NQRA1R7c+yRxVs9YH15+XgYG0z/onVaVnjUxPXvme64v3RL+nt/ezimMo
 PXDBt5HRXa6lWIxM+g3oaGJ9/qFKwTrHykXgx3oPPWsphJMVW8ltt8sqg6sGuRJz
 fD/iRjpHIGAfD/2BX90TOMyYbC+s921rPU0+aQ70U5mPU+f8E1fI1HNVlsJiZ9NL
 Xhj9GOJzNQP2moql1dsDibZXhO0aIMfweHduXN7KRK88IPtnQdy1Sj6lhAJdJ2iH
 q1s5ShDx9gLLA2ecuL4COA9tQxTncnTZdsU4Y1bnSif0Iuct03L84ovaCSAuJ5BP
 XrwVo0Vk2albDpw8n2Dzq7Xquiewyb9IlaQ8U5B/tdKSpH4aAydy56PgdC+gHaZk
 6c2aBf0HKmg3qxsp/xb593cWloToPgsgB0KB2m7b+nEPBLP62obzBEeS8P5ahrJB
 UmPA7tw6BlYT93JttotFn+gykZjAELcbHkO8Yoe7JnVQMA8irFs=
 =ecyf
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkhL6N9vmSTjg0VSVO/dfN0HKlkAFAmYrkEYACgkQO/dfN0HK
 lkAj0w//WUXbB/gR3P47t6dNSX0P0qb3D/+AQ2+he/wo3mJ1msd3XtkpiUHcpP0k
 qtrYFrY5wQQ4lC82VMOWdlw7YY5E4ah2tbYCAUBpgpp3Lu7iD/muiLlfwQslwYUy
 PzwKH5HBQQmgtKgGX4miSeVk95TUyzE5m70dF4atIY8ydK73ZqiSV+IC9/cYah2y
 Q4Y9xJZnSMR1cKdMfTpYR0s9gPg5bB9yAKq9qB8TQfxMnW2A8wkhvwf675mJJCZ+
 spRTXzrcKp3thKWmDowTtzu/ONYTRcpQfgiE5MxzySnHQcv4nQnf3jxeT1+1K+pk
 5jI5bFAjkRVy1u1oDsbjHySdFyt1jZA6Klw9dlGf9EVjXfr0jbAUO++8T538CCfW
 UUSx2h83GTvSMvVaCDCtbYdlHZwxgLTwJvFDdcpm4nci9u3wKsfnoKw6doskt1fs
 Sp241F46Ck5embAdtv1FJaGYvH8PVe4j24slBWvn5vhN8TcP7mcsw8DCGjKWfSVS
 JQQxxjCmAyxQOZj+9k2v3wlRjmHoRQ0ELu7EbYQW3eyiYi3bWVQkK5Mg4Z0knwGl
 HWWum7LMwkM59hIAc771VGOz3jELGWXBTZWP8FcoRmtzcvBnzVjndDAiwH89xhR6
 Kym5JKrCkVvytee6xNxkBOIyuiavcB2qoZ7IqhHAYnqF9uhMx/E=
 =Ppg1
 -----END PGP SIGNATURE-----

Merge tag 'v1.101.0' into pmm-6401-read-prometheus-data-files

v1.101.0

Signed-off-by: hagen1778 <roman@victoriametrics.com>

# gpg: Signature made чт 25 кві 16:52:11 2024 CEST
# gpg:                using RSA key 9212FA37DBE64938E0D154953BF75F3741CA9640
# gpg: Good signature from "hagen1778 (VM GPG key) <roman@victoriametrics.com>" [ultimate]

# Conflicts:
#	go.mod
2024-04-26 13:30:14 +02:00
hagen1778
679844feaf
Revert "app/vmbackup: introduce new flag type URL (#6152)"
This reverts commit 029060af60.
2024-04-24 13:47:57 +02:00
Roman Khavronenko
029060af60
app/vmbackup: introduce new flag type URL (#6152)
The new flag type is supposed to be used for specifying URL values which
could contain sensitive information such as auth tokens in GET params or
HTTP basic authentication.

The URL flag also allows loading its value from files if `file://`
prefix is specified. As example, the new flag type was used in
app/vmbackup as it requires specifying `authKey` param for making the
snapshot.

See related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973

Thanks to @wasim-nihal for initial implementation
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6060

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-24 10:57:54 +02:00
hagen1778
bae3874e6a
app/streamaggr: follow-up after c0e4ccb7b5
* rm vmagent mentions from vminsert flags
* improve documentation wording, add links to related sections
* mention `ignore_first_intervals` in the stream aggr options
* update flags description
* add basic test for config parsing validation

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-22 14:22:59 +02:00
Andrii Chubatiuk
c0e4ccb7b5
lib/streamaggr: add option to ignore first N aggregation intervals (#6137)
Stream aggregation may yield inaccurate results if it processes incomplete data. 
This issue can arise when data is sourced from clients that maintain a queue of unsent data, such as Prometheus or vmagent.
 If the queue isn't fully cleared within the aggregation interval, only a portion of the time series may be included in that period, leading to distorted calculations. 
To mitigate this we add an option to ignore first N aggregation intervals. It is expected, that client queues
will be cleared during the time while aggregation ignores first N intervals and all subsequent aggregations
will be correct.
2024-04-22 13:52:04 +02:00
Aliaksandr Valialkin
4770294732
lib/protoparser: substitute hybrid channel-based pools with plain sync.Pool
Using plain sync.Pool simplifies the code without increasing memory usage and CPU usage.
So it is better to use plain sync.Pool from readability and maintainability PoV.

This is a follow-up for 8942f290eb
2024-04-20 21:59:51 +02:00
Aliaksandr Valialkin
7531e9084a
all: use clear() built-in Go function for clearing []prompbmarshal.TimeSeries and []prompbmarshal.Label slices
This makes the code a bit clear.
2024-04-20 21:00:03 +02:00
Aliaksandr Valialkin
6b1cc9b946
lib/storage: search for all the values for the given label before applying filters and limits
It is incorrect applying the limit on the number of values to search without applying filters,
since the returned subset of label values may miss the label values matching the given filters.

This is a follow-up for 66630c7960
2024-04-18 20:29:36 +02:00
Aliaksandr Valialkin
2e3580905f
all: replace old https://docs.victoriametrics.com/relabeling.html url with the new one - https://docs.victoriametrics.com/relabeling/ 2024-04-18 03:22:22 +02:00
Aliaksandr Valialkin
e9642e99f2
all: replace old https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html url with the new one - https://docs.victoriametrics.com/single-server-victoriametrics/ 2024-04-18 03:11:03 +02:00
Aliaksandr Valialkin
828e78ceb4
all: replace old https://docs.victoriametrics.com/sd_configs.html url with the new one - https://docs.victoriametrics.com/sd_configs/ 2024-04-18 02:27:47 +02:00
Aliaksandr Valialkin
4d2b9fe6b2
all: replace old https://docs.victoriametrics.com/stream-aggregation.html url with the new one - https://docs.victoriametrics.com/stream-aggregation/ 2024-04-18 02:19:11 +02:00
Aliaksandr Valialkin
6e6bae3e8d
all: replace old https://docs.victoriametrics.com/vmbackup.html url with the new one - https://docs.victoriametrics.com/vmbackup/ 2024-04-18 01:57:04 +02:00
Aliaksandr Valialkin
c81a633b02
all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/ 2024-04-18 01:31:37 +02:00
Aliaksandr Valialkin
66630c7960
lib/storage: improve performance for /api/v1/label/labelName/values when match[] contains only a single filter on labelName
This speeds up auto-suggestion for metric names in VMUI and Grafana, which use the following query in this case:

  /api/v1/label/__name__/values?match[]={__name__=~"*.some_value.*"}

When the user types `some_value` in the query input field.
2024-04-18 01:15:20 +02:00
Aliaksandr Valialkin
50ac22df78
lib/httpserver: add support for automatic issuing of TLS certificates via Lets Encrypt service
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5949
2024-04-17 23:50:57 +02:00
Aliaksandr Valialkin
bd454f5063
lib/netutil: move creation of GetCertificate callback into a separate function
This improves code readability a bit
2024-04-17 22:10:43 +02:00
Aliaksandr Valialkin
dc326f70b4
app/vmagent: support for DNS SRV urls at -remoteWrite.url, scrape target urls and service discovery urls
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053
2024-04-17 20:54:39 +02:00
Aliaksandr Valialkin
b426d10847
app/vmauth: add support for configuring backends via DNS SRV urls 2024-04-17 20:46:22 +02:00
Aliaksandr Valialkin
e3a26c0db6
lib/promscrape/discovery/consul: typo fix in the comment: enteprise -> enterprise 2024-04-16 19:34:18 +02:00
Aliaksandr Valialkin
85d09e5a2d
lib/{mergeset,storage}: log deleting directories inside partitions if they are missing in parts.json
This should improve debuggability of unexpected deletion of directories inside partitions.

While at it, log the proper path to parts.json when the directory for big part is missing in the partition.
parts.json is located inside directory with small parts, and there is no parts.json file inside directory with big parts.
2024-04-16 19:11:32 +02:00
Aliaksandr Valialkin
6bcc6c938b
lib/storage: improve comments inside functions responsible for creating indexes for newly registered time series 2024-04-16 19:11:32 +02:00
Zakhar Bessarab
2205de2391
lib/mergeset: fix flushing incorrect set of inmemoryBlocks (#6089)
Follow-up for bace9a2501

Related:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6069
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-11 09:26:06 +02:00
wanshuangcheng
83216e956c
chore: fix function names in comment (#6076)
Signed-off-by: wanshuangcheng <wanshuangcheng@outlook.com>
2024-04-08 01:11:12 -07:00
Aliaksandr Valialkin
b7b731d340
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files 2024-04-04 03:49:49 +03:00
Aliaksandr Valialkin
f8d10a7106
lib/streamaggr: update the minimum allowed timestamp for incoming samples before flushing the samples to the storage
This should prevent from dropping samples with old timestamps during long flushes.

This is a follow-up for 1cedaf61cb
2024-04-04 02:25:51 +03:00
Aliaksandr Valialkin
967d5496cf
app/vmagent: follow-up for b3b29ba6ac
- Automatically reload changed TLS root CA pointed by -remoteWrite.tlsCAFile command-line flag
- Automatically reload changed TLS root CA configured via oauth2.tsl_config.ca_file option at -promscrape.config
- Document the change as a feature instead of a bug at docs/CHANGELOG.md
- Simplify the code at lib/promauth, which is responsible for reloading changed TLS root CA files.
- Simplify the usage of lib/promauth.Config.NewRoundTripper() - now it accepts the base http.Transport
  instead of a callback, which can change the internal http.Transport.
- Reuse the default tls config if lib/promauth.Config doesn't contain tls-specific configs.
  This should reduce memory usage a bit when tls isn't used for scraping big number of targets.
- Do not re-read TLS root CA files on every processed request. Re-read them once per second.
  This should reduce CPU usage when scraping big number of targets over https.
- Do not store cert.pem and key.pem files in TestTLSConfigWithCertificatesFilesUpdate, since they can be loaded
  from byte slices via crypto/tls.X509KeyPair().
- Remove obsolete comparisons of string representations for authConfig and proxyAuthConfig at areEqualScrapeConfigs().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5725
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171
2024-04-04 01:27:35 +03:00
Zakhar Bessarab
f80ac120f3
lib/promscrape/config: fix missing timeout for http client (#6063)
Follow-up for b3b29ba6

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-04-03 18:18:48 +02:00
Zakhar Bessarab
b3b29ba6ac
lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk (#5725)
* lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk

Added a custom `http.RoundTripper` implementation which checks for root CA content changes and updates `tls.Config` used by `http.RoundTripper` after detecting CA change.

Client certificate changes are not tracked by this implementation since `tls.Config` already supports passing certificate dynamically by overriding `tls.Config.GetClientCertificate`.

This change implements dynamic reload of root CA only for streaming client used for scraping. Blocking client (`fasthttp.HostClient`) does not support using custom transport so can't use this implementation.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: update NewRoundTripper API

Update API to allow user to update only parameters required for transport.

Add warning log when reloading Root CA failed.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: fix mutex acquire logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: replace RWMutex with regular mutex to simplify the code

- remove additional mutex used for getRootCABytes - require callee to use mutex
- replace RWMutex with regular mutex

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: refactor

- hold the mutex lock to avoid round tripper being re-created twice
- move recreation logic into separate func to simplify the code

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-04-03 10:01:43 +02:00
Aliaksandr Valialkin
fb42380ef3
lib/protoparser/opentelemetry: follow-up after 47892b4a4c
- Rename -opentelemetry.sanitizeMetrics command-line flag to more clear -opentelemetry.usePrometheusNaming
- Clarify the description of the change at docs/CHANGELOG.md
- Rename promrelabel.SanitizeLabelNameParts to more clear promrelabel.SplitMetricNameToTokens
- Properly split metric names at '_' char in promerlabel.SplitMetricNameToTokens.
- Add tests for various edge cases for Prometheus metric names' normalization
  according to the code at b865505850/pkg/translator/prometheus/normalize_name.go
- Extract the code responsible for Prometheus metric names' normalization into a separate file (santize.go)

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6035
2024-04-03 02:25:29 +03:00
Aliaksandr Valialkin
918cccaddf
all: fix golangci-lint(revive) warnings after 0c0ed61ce7
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001
2024-04-02 23:16:29 +03:00
Aliaksandr Valialkin
c3a72b6cdb
lib/storage: consistently use stopCh instead of stop 2024-04-02 21:24:57 +03:00
Aliaksandr Valialkin
904e95fc69
app/vmagent: simplify code after 509df44d03
- Simplify the code in order to improve its maintenance
- Properly pass tenant ID when processing multi-tenant opentelemetry request at vmagent

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6016
2024-04-02 17:58:13 +03:00
Aliaksandr Valialkin
c79bf3925c
Revert "app/vmselect: make vmselect resilient to absence of cache folder (#5987)"
This reverts commit cb23685681.

Reason for revert: the "fix" may hide programming bugs related to incorrect creation of folders
before their use. This may complicate detecting and fixing such bugs in the future.

There are the following fixes for the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 :
- To configure the OS to do not drop data from the system-wide temporary directory (aka /tmp).
- To run VictoriaMetrics with -cacheDataPath command-line flag, which points to the directory,
  which cannot be removed automatically by the OS.

The case when the user accidentally deletes the directory with some files created by VictoriaMetrics
shouldn't be considered as expected, so VictoriaMetrics shouldn't try resolving this case automatically.
It is much better from operation and debuggability PoV is to crash with the clear `directory doesn't exist` error
in this case.
2024-03-30 07:29:24 +02:00
Aliaksandr Valialkin
830b871baf
app/vmagent: properly shutdown when -maxIngestionRate limit is reached
The remotewrite.Stop() expects that there are no pending calls to TryPush().
This means that the ingestionRateLimiter.Register() must be unblocked inside TryPush() when calling remotewrite.Stop().
Provide remotewrite.StopIngestionRateLimiter() function for unblocking the rate limiter before calling the remotewrite.Stop().

While at it, move the rate limiter into lib/ratelimiter package, since it has two users.
Also move the description of the feature to the correct place at docs/CHANGELOG.md.
Also cross-reference -remoteWrite.rateLimit and -maxIngestionRate command-line flags.

This is a follow-up for 02bccd1eb9
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5900
2024-03-30 06:43:48 +02:00
Zakhar Bessarab
af3922b1df
lib/storage: add ability to use downsampling for the given series filter (#733)
* lib/storage: add ability to use downsampling for the given series filter

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add information about downsampling filters

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: fix MetricsQL filter

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: treat missing downsampling filter as a bug

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/part_header: verify correctness of downsampling filters when opening partition

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: save only appliable rules in part metadata

Filter and save only rules which are appliable to partition based on MinTimestamp of stored data.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: update log messages for final dedup

Properly specify a reason of re-running deduplication for partition.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage: consistently use MaxTimestamp to determine deduplication/downsampling rules

Using MinTimestamp leads to applying downsampling to parts which are only partially covered by downsampling rule.
For example, partition covers range [1000-2000]. At t=2100 and rule offset 500 data with t=2100-500 => 1600 must be downsampled. The range check against MinTimestamp evaluates to true even though partition contains range which must not be downsampled - [1600:2000].

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Follow-up

- Apply the first matching downsampling period if multiple filters match the given time series.
  This allows fine-tuning the downsampling config for the specific needs.
- Take into account downsampling filters during search queries.
- Reduce the difference between community and enterprise branches. This should simplify further maintenance of these branches.
- Properly parse series filters with colons inside them.
- Document the feature at docs/CHANGELOG.md.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4960

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-03-30 04:12:23 +02:00
Aliaksandr Valialkin
131f357098
lib/storage/table.go: reduce the difference with enterprise branch 2024-03-30 03:22:51 +02:00
Aliaksandr Valialkin
4001ca36b8
lib/storage/partition.go: reduce code difference a bit with enterprise branch 2024-03-30 01:39:27 +02:00
Nikolay
a05303eaa0
lib/storage: adds metrics for downsampling (#382)
* lib/storage: adds metrics for downsampling
vm_downsampling_partitions_scheduled - shows the number of parts, that must be downsampled
vm_downsampling_partitions_scheduled_size_bytes - shows total size in bytes for parts, the must be donwsampled

These two metrics answer the questions - is downsampling running? how many parts scheduled for downsampling and how many of them currently downsampled? Storage space that it occupies.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-03-30 01:11:49 +02:00
Andrii Chubatiuk
47892b4a4c
opentelemetry: added cmd flag to sanitize metric names (#6035) 2024-03-29 13:51:24 +01:00
Aliaksandr Valialkin
4a359d5f67
lib/storage: follow-up for 76f00cea6b
Store the deadline when the metricID entries must be deleted from indexdb
if metricID->metricName entry isn't found after the deadline. This should
make the code more clear comparing the the previous version, where the timestamp
of the first metricID->metricName lookup miss was stored in missingMetricIDs.

Remove the misleading comment about the importance of the order for creating entries
in the inverted index when registering new time series. The order doesn't matter,
since any subset of the created entries can become visible for search
before any other subset after registering in indexdb.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
2024-03-27 11:41:28 +02:00
Zakhar Bessarab
51f5ac1929
lib/storage/table: wait for merges to be completed when closing a table (#5965)
* lib/storage/table: properly wait for force merges to be completed during shutdown

Properly keep track of running background merges and wait for merges completion when closing the table.
Previously, force merge was not in sync with overall storage shutdown which could lead to holding ptw ref.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add changelog entry

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-03-26 13:49:09 +01:00
Andrii Chubatiuk
509df44d03
app/{vmagent,vminsert}: fixed firehose response (#6016) 2024-03-26 13:20:41 +01:00
Roman Khavronenko
cb23685681
app/vmselect: make vmselect resilient to absence of cache folder (#5987)
vmselect uses a cache folder in file system for two purposes:
1. Storing rollup cache results on shutdown;
2. Storing temporary search results from vmstorage during query executions.

It could happen that cache folder is deleted accidentally by user, or by OS
during cleanup routines. This would cause vmselect to:
1. panic on /metrics call, because `MustGetFreeSpace` will fail;
2. return query error user, as it won't be able to store temporary search results.

The changes in this commit are the following:
1. Make `MustGetFreeSpace` to try re-creating the cache folder if it is missing;
2. Make vmselect to try re-creating the cache folder if it can't persist tmp search
results.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-03-26 12:59:50 +01:00
hagen1778
e6dd52b04c
lib/promauth: follow-up b577413d3b
Convert test result expectations to canonical form.
Starting from b577413d3b specified header keys are forced
into canonical form https://pkg.go.dev/net/http#CanonicalHeaderKey

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-18 11:12:45 +01:00
Aliaksandr Valialkin
4553521f9a
lib/streamaggr: ignore out of order samples for last output
This is a follow-up for 6a465f6e29

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931
2024-03-18 01:03:36 +02:00
Aliaksandr Valialkin
76f00cea6b
lib/storage: wait for up to 60 seconds before deciding to delete metricID entries from indexdb if metricID->metricName entry is missing during search
The metricID->metricName entry can remain invisible for search for some time after registering new metricName.
This is expected condition. So wait for up to 60 seconds in the hope that the metricID->metricName
entry will become visible before deleting all the entries from indexdb, which are associated with the given metricID.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948

See also 20812008a7
2024-03-18 00:34:32 +02:00
Aliaksandr Valialkin
729b263670
lib/httputils: rename CAFile -> caFile in order to be consistent with local var naming in Go
This is a follow-up for 83e55456e2
2024-03-17 23:19:52 +02:00
Aliaksandr Valialkin
1cedaf61cb
app/{vmagent,vminsert}: add an ability to ignore input samples outside the current aggregation interval for stream aggregation
See https://docs.victoriametrics.com/stream-aggregation.html#ignoring-old-samples
2024-03-17 23:03:47 +02:00
Aliaksandr Valialkin
6a465f6e29
lib/streamaggr: ignore out of order samples when calculating increase, increase_prometheus, total and total_prometheus outputs
Out of order samples may result in unexpected spikes for these outputs.
So it is better to ignore such samples.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931
2024-03-17 22:03:03 +02:00
Aliaksandr Valialkin
cbd80efcc1
lib/streamaggr: follow-up for 15e33d56f1
- Properly set pushSample.timestamp when flushing de-duplicated samples to stream aggregation
  This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931

- Re-classify this change as feature instead of bugfix at docs/CHANGELOG.md

- Verify de-duplication logic for samples with different timestamps

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5939
2024-03-17 21:37:16 +02:00
Aliaksandr Valialkin
b577413d3b
lib/promauth: properly set Host header in requests to scrape targets.
The `Host` header must be set via net/http.Request.Host field, since net/http.Client
ignores this header if it is set via Request.Header.Set().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5969
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5970
2024-03-17 20:22:54 +02:00
Andrii Chubatiuk
15e33d56f1
lib/streamaggr: pick sample with bigger timestamp or value on deduplicator (#5939)
Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication
This would require more memory for deduplication, since we need to track timestamp
for each record. However, deduplication should become more consistent.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-03-12 22:47:29 +01:00
Aliaksandr Valialkin
d1d2771bee
lib/storage: optimize /api/v1/labels and /api/v1/label/.../values when match[] contains metric name
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-03-12 02:43:16 +02:00