Commit graph

2934 commits

Author SHA1 Message Date
Andrei Baidarov
728ceefca1
vmagent: set up a timeout for tcp connection establishment during k8s discovery
Previously, default dial timeout was used for kubernetes API server connection.

 This commit changes it for custom dialer used by the all VictoriaMetrics components. It has lower connection timeout (30s by default). 


 Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7127

---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-11-25 18:08:32 +01:00
Ivan Yurochko
60ac4e0c00
lib/streamaggr: add ignore_first_sample_interval param for streamaggr cfg (#7313)
### Describe Your Changes

As of right now by default aggregated output in streaming aggregation
takes a staleness interval and only starts sending first samples after
the staleness interval passes. We have a use case where we prefer to
start sending data as soon as we have any. This adds the option to
configure when we start sending first samples

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7116

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-11-22 17:38:13 +01:00
Artem Fetishev
3ddaafa729
lib/storage: confirm that changing retention period can cause previous indexDB deletion (#7569)
### Describe Your Changes

Add test cases proving that it is possible to lose indexDB after
changing the retention period. See #7609

### Checklist

The following checks are **mandatory**:

- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit 3383589fd1)
2024-11-21 15:23:48 +01:00
Will Jordan
209a5024ce
lib/tenantmetrics: improves CounterMap performance with large numbers of tenants
Previously, map for storing tenant metrics was re-created to each newly ingested tenant. It has significant performance impact for systems with large number of tenants. 

 This commit addresses this issue by changing algorithm of creating tenant metric records at map. Instead of map re-creation, it uses `sync.Map` primitive. 


Benchmark results:

```
goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics
cpu: AMD Ryzen 9 5900X 12-Core Processor            
                                            │ lib/tenantmetrics/orig.bench │     lib/tenantmetrics/new.bench     │
                                            │            sec/op            │    sec/op     vs base               │
CounterMapGrowth/n=100,nProcs=GOMAXPROCS-24                  1943.2µ ±  5%   248.0µ ± 11%  -87.24% (p=0.001 n=7)
CounterMapGrowth/n=100-24                                    434.63µ ±  5%   98.82µ ± 16%  -77.26% (p=0.001 n=7)
CounterMapGrowth/n=1000-24                                   32.719m ± 20%   1.425m ±  5%  -95.65% (p=0.001 n=7)
CounterMapGrowth/n=10000-24                                 3653.60m ±  5%   18.00m ±  2%  -99.51% (p=0.001 n=7)
geomean                                                       17.83m         890.4µ        -95.00%
```

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7482

---
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
2024-11-20 18:36:06 +01:00
Nikolay
59d2b4c7fc
lib/storage: properly check for minMissingTimestamps
After changes at commit 787b9cd. Minimal timestamps for extDB check was performed without context of the index search prefix.
It worked fine for Single node version, but for cluster version a different prefix was used for
metricID search requests. It may lead to incomplete results, if minimal missing timestamp was cached
for the tenant with different ingestion patterns.

 Minimal reproducible case is:
- metrics were ingested for tenants 0 and 1
- at some point in time metrics ingestion for tenant 1 stopped
- index records have the following timestamps layout:
 tenant 0:  1,2,3,4,5,6
 tenant 1:  1,2,3,4
- after indexDB rotation, containsTimeRange lookups may produce
  incorrect results:
 time range request for tenant 1 - 5:6 caches 5 as min timestamp
 request for the same or smaller time range for tenant 0 now returns
empty results.

Second case:
- requests for the tenant without metrics always updates atomic value with incorrect minimal time range for other tenants.

 This commit replaces single atomic with map of search prefix keys. It should have slight performance overhead,
but work consistently for cluster version. minMissingTimestamp is cached by prefix search key, which included tenantID.

 Since it will be only populated at runtime, it doesn't hold unused tenants for queries.

Related issue: 
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7417
2024-11-15 16:18:32 +01:00
Aliaksandr Valialkin
75e4a8e64b
lib/logstorage: properly skip filtered out dict values when calculating uniq_values, min, max, row_min and row_max stats functions
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7458
2024-11-14 17:21:28 +01:00
Aliaksandr Valialkin
8b287e8da4
lib/logstorage: properly clone field values at values stats function
Previously field values weren't properly cloned, which could lead to garbage output for `values` stats function

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7458
2024-11-14 17:21:28 +01:00
Aliaksandr Valialkin
a7e1575ebe
lib/logstorage: simplify the code for uniq_values stats function a bit
Move the repeated check for an empty value into statsUniqValuesProcessor.updateState() function.
This allow removing duplicate code for this check from statsUniqValuesProcessor.updateState() call sites.
2024-11-14 17:21:27 +01:00
Aliaksandr Valialkin
5b0b7d509f
lib/logstorage: support for [label1=value1 ... labelN=valueN] syntax inside syslog messages for adding arbitrary labels (fields) to log entries 2024-11-14 17:21:26 +01:00
Aliaksandr Valialkin
a02d26e853
lib/logstorage: properly take into account the end query arg when calculating time range for _time:duration filters
(cherry picked from commit e5537bc64d)
2024-11-08 17:07:57 +01:00
Aliaksandr Valialkin
f82cfa16bf
lib/logstorage: allow specifying _time filter offset without time range
This is useful when builiding graphs on time ranges in the past.

(cherry picked from commit a98fb495c6)
2024-11-08 17:07:57 +01:00
Aliaksandr Valialkin
a4ea3b87d7
lib/logstorage: optimize query imeediately after its parsing
This eliminates possible bugs related to forgotten Query.Optimize() calls.

This also allows removing optimize() function from pipe interface.

While at it, drop filterNoop inside filterAnd.

(cherry picked from commit 66b2987f49)
2024-11-08 17:07:56 +01:00
Aliaksandr Valialkin
52929c060a
app/vlselect/logsql: call Query.Optimize() inside parseCommonArgs(), which is called et every /select/logsql/* endpoint.
This reduces the probability of forgotten call to Query.Optimize().

(cherry picked from commit 0550093802)
2024-11-08 17:07:56 +01:00
Aliaksandr Valialkin
7d078dd591
lib/logstorage: add an ability to add prefix to resulting query field names in join pipe
See https://docs.victoriametrics.com/victorialogs/logsql/#join-pipe

(cherry picked from commit 5a6531b329)
2024-11-08 17:07:56 +01:00
Aliaksandr Valialkin
364a2e3e1f
docs/VictoriaLogs: properly sort log fields with floating-point numbers
(cherry picked from commit 42c9183281)
2024-11-07 13:00:20 +01:00
Aliaksandr Valialkin
7a39f526ec
lib/logstorage: add block_stats pipe for analyzing per-block storage stats
(cherry picked from commit 5ed54ebadf)
2024-11-07 13:00:19 +01:00
Aliaksandr Valialkin
83c9d42263
lib/logstorage: add join pipe for joining multiple query results
(cherry picked from commit f9e23bf8e3)
2024-11-07 13:00:19 +01:00
Zakhar Bessarab
718f8077a8
Revert "lib/mergeset: add sparse indexdb cache (#7269)"
This reverts commit 837d0d136d.
2024-11-04 10:33:22 -03:00
Aliaksandr Valialkin
fced48d540
app/vlinsert: implement the ability to add extra fields to the ingested logs
This can be done via extra_fields query arg or via VL-Extra-Fields HTTP header.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7354#issuecomment-2448671445

(cherry picked from commit 4478e48eb6)
2024-11-04 10:23:16 -03:00
Aliaksandr Valialkin
bcbaecd73f
lib/logstorage: increase the the maximum number of columns per block from 1000 to 2000
This will allow storing wide events with up to 2K fields per event into VictoriaLogs.
While at it, remove the misleading comment that columnsHeader is read in full per each matching block.
This is no longer the case after the improvements made at 202eb429a7 .
Now only the needed columnHeader is read for the column mentioned in the query.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6425#issuecomment-2418337124
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762

(cherry picked from commit 9ba6be4179)
2024-11-04 10:23:15 -03:00
Nikolay
b6e7852eee
lib/promscrape: add relabel configs to global section
This commit adds `metric_relabel_configs` and `relabel_configs` fields
into the `global` section of scrape configuration file.

 New fields are used as global relabeling rules for the scrape targets.

 These relabel configs are prepended to the target relabel configs.
This feature is useful to:
* apply global rules to __meta labels from service discovery targets.
* drop noisy labels during scrapping.
* mutate labels without affecting metrics ingested via any of push
protocols. 

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6966

---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-10-31 20:08:31 +01:00
Aliaksandr Valialkin
1ea65d662f
lib/logstorage: properly reset cached output fields for extract and extract_regexp pipes after the log entry matches if(...) condition
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7162

(cherry picked from commit c5d08d317c)
2024-10-31 14:11:08 +01:00
Aliaksandr Valialkin
b74bcb7886
lib/logstorage: properly cache replace() and replace_regexp() results for identical adjacent field values
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7162

(cherry picked from commit 2e635a42d8)
2024-10-31 14:11:08 +01:00
Aliaksandr Valialkin
0c657a95dc
app/vlselect: add support for extra_filters and extra_stream_filters query args across all the HTTP querying APIs
These query args are going to be used for quick filtering on field values at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7365

(cherry picked from commit 7603446850)
2024-10-31 14:11:07 +01:00
Artem Fetishev
d7b3589dbd
tests: Initial version of integration tests (#7253)
### Describe Your Changes

Related issue: #7199 

This is the initial version of the integration tests for cluster. See
`README.md` for details.

Currently cluster only, but it can also be used for vm-single if needed.

The code has been added to the apptest package that resides in the root
directory of the VM codebase. This is done to exclude the integration
tests from regular testing build targets because:

- Most of the test variants do not apply to integration testing (such as
pure or race).
- The integtation tests may also be slow because each test must wait for
2 seconds so vmstorage flushes pending content). It may be okay when
there are a few tests but when there is a 100 of them running tests will
require much more time which will affect the developer wait time and CI
workflows.
- Finally, the integration tests may be flaky especially short term.

An alternative approach would be placing apptest under app package and
exclude apptest from packages under test, but that is not trivial.

The integration tests rely on retrieving some application runtime info
from the application logs, namely the application's host:port. Therefore
some changes to lib/httpserver/httpserver.go were necessary, such as
reporting the effective host:port instead the one from the flag.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2024-10-30 15:22:06 +01:00
Aliaksandr Valialkin
8baa5177aa
app/vlinsert: allow specifying comma-separated list of fields containing log message via _msg_field query arg and VL-Msg-Field HTTP request header
This msy be useful when ingesting logs from different sources, which store the log message in different fields.
For example, `_msg_field=message,event.data,some_field` will get log message from the first non-empty field:
`message`, `event.data` and `some_field`.

(cherry picked from commit ed73f8350b)
2024-10-30 15:19:52 +01:00
Aliaksandr Valialkin
bf243df9ce
lib/logstorage: make sure that the number of output (bloom, values) shards is bigger than zero.
If the number of output (bloom, values) shards is zero, then this may lead to panic
as shown at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7391 .

This panic may happen when parts with only constant fields with distinct values are merged into
output part with non-constant fields, which should be written to (bloom, values) shards.

(cherry picked from commit 102e9d4f4e)
2024-10-30 15:19:51 +01:00
cangqiaoyuzhuo
07cf3189f8
chore: fix function name (#7381)
### Describe Your Changes

 fix function name

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit 45896fb477)
2024-10-30 13:13:05 +01:00
Aliaksandr Valialkin
1dd01b8a8f
lib/logstorage: follow-up for af831a6c906158f371f1b6810706fa0a54b78386
Sync the code between top and sort pipes regarding the code related to rank.

(cherry picked from commit 7a623c225f)
2024-10-30 09:52:52 +01:00
Aliaksandr Valialkin
329d9a46ee
lib/logstorage: add an ability to return rank from top pipe results
(cherry picked from commit 3c06d083ea)
2024-10-30 09:52:51 +01:00
Aliaksandr Valialkin
fe5f16b662
lib/logstorage: dynamically adjust the number of (bloom, values) shards in a part depending on the number of non-const columns
This allows reducing the amounts of data, which must be read during queries over logs with big number of fields (aka "wide events").
This, in turn, improves query performance when the data, which needs to be scanned during the query, doesn't fit OS page cache.

(cherry picked from commit 7a62eefa34)
2024-10-30 09:52:51 +01:00
Aliaksandr Valialkin
76b21c8560
lib/logstorage: avoid reading columnsHeader data when field_values pipe is applied directly to log filters
This improves performance of `field_values` pipe when it is applied to large number of data blocks.
This also improves performance of /select/logsql/field_values HTTP API.

(cherry picked from commit 8d968acd0a)
2024-10-30 09:52:50 +01:00
Hui Wang
9616814728
vmalert: integrate with victorialogs (#7255)
address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706.
See
https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md.

Related fix
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7254.

Note: in this pull request, vmalert doesn't support
[backfilling](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md#rules-backfilling)
for rules with a customized time filter. It might be added in the
future, see [this
issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7289)
for details.

Feature can be tested with image
`victoriametrics/vmalert:heads-vmalert-support-vlog-ds-0-g420629c-scratch`.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 68bad22fd2)
2024-10-29 16:32:00 +01:00
Zakhar Bessarab
517bd9392c
lib/storage/partition: prevent panic in case resulting in-memory part is empty after merge (#7329)
It is possible for in-memory part to be empty if ingested samples are
removed by retention filters. In this case, data will not be discarded
due to retention before creating in memory part. After in-memory parts
merge samples will be removed resulting in creating completely empty
part at destination.

 This commit checks for resulting part and skips it, if it's empty.

---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-10-27 20:42:42 +01:00
Zhu Jiekun
3d55605ae5
lib/promscrape: adds support for PuppetDB service discovery
This commit adds support for
[PuppetDB](https://www.puppet.com/docs/puppetdb/8/overview.html) service
discovery to the `vmagent` and `victoria-metrics-single` components.

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5744
2024-10-27 20:42:42 +01:00
Andrii Chubatiuk
f6f4884ba6
lib/promscrape/discovery/kubernetes: support kubernetes native sidecars (#7324)
This commit adds Kubernetes Native Sidecar support. 

It's the special type of init containers, that have restartPolicy == "Always" and continue to run after container initialization. 


related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7287
2024-10-27 20:25:15 +01:00
Zakhar Bessarab
8198e7241d
lib/mergeset: add sparse indexdb cache (#7269)
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182

- add a separate index cache for searches which might read through large
amounts of random entries. Primary use-case for this is retention and
downsampling filters, when applying filters background merge needs to
fetch large amount of random entries which pollutes an index cache.
Using different caches allows to reduce effect on memory usage and cache
efficiency of the main cache while still having high cache hit rate. A
separate cache size is 5% of allowed memory.

- reduce size of indexdb/dataBlocks cache in order to free memory for
new sparse cache. Reduced size by 5% and moved this to a separate cache.

- add a separate metricName search which does not cache metric names -
this is needed in order to allow disabling metric name caching when
applying downsampling/retention filters. Applying filters during
background merge accesses random entries, this fills up cache and does
not provide an actual improvement due to random access nature.

Merge performance and memory usage stats before and after the change:

- before

![image](https://github.com/user-attachments/assets/485fffbb-c225-47ae-b5c5-bc8a7c57b36e)

- after

![image](https://github.com/user-attachments/assets/f4ba3440-7c1c-4ec1-bc54-4d2ab431eef5)

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit 837d0d136d)
2024-10-24 12:43:06 -03:00
Artem Fetishev
a722ddbdd9
lib/storage: Fix flaky test: TestStorageRotateIndexDB (#7267)
This commit fixes the TestStorageRotateIndexDB flaky test reported at:
#6977. Sample test failure: https://pastebin.com/bTSs8HP1

The test fails because one goroutine adds items to the indexDB table
while another goroutine is closing that table. This may happen if
indexDB rotation happens twice during one Storage.add() operation:
-  Storage.add() takes the current indexDB and adds index recods to it
- First index db rotation makes the current index DB a previous one
(still ok at this point)
- Second index db rotation removes the indexDB that was current two
rotations earlier. It does this by setting the mustDrop flag to true and
decrementing the ref counter. The ref counter reaches zero which cases
the underlying indexdb table to release its resources gracefully.
Graceful release assumes that the table is not written anymore. But
Storage.add() still adds items to it.

The solution is to increment the indexDB ref counters while it is used
inside add().
The unit test has been changed a little so that the test fails reliably.
The idea is to make add() function invocation to last much longer,
therefore the test inserts not just one record at a time but thouthands
of them.

To see the test fail, just replace the idbsLocked() func with:

```go
unc (s *Storage) idbsLocked2() (*indexDB, *indexDB, func()) {
       return s.idbCurr.Load(), s.idbNext.Load(), func() {}
}
```

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>

(cherry picked from commit 6b9f57e5f7)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-24 15:45:54 +02:00
Zhu Jiekun
85f60237e2
vmstorage: auto calculate maxUniqueTimeseries based on resources (#6961)
### Describe Your Changes

Add support for
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6930

Calculate `-search.maxUniqueTimeseries` by
`-search.maxConcurrentRequests` and remaining memory if it's **not set**
or **less equal than 0**.

The remaining memory is affected by `-memory.allowedPercent`,
`-memory.allowedBytes` and cgroup memory limit.
### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-10-18 13:41:43 +02:00
Andrii Chubatiuk
1d352b92c7
lib/promscrape: fixed reload on max_scrape_size change (#7282)
### Describe Your Changes

fixed reload on max_scrape_size change
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7260

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 965a33c893)
2024-10-18 11:42:47 +02:00
Aliaksandr Valialkin
62e4baf556
lib/logstorage: use simpler in-memory cache instead of workingsetcache for caching recently ingested _stream values and recently queried set of streams
These caches aren't expected to grow big, so it is OK to use the most simplest cache based on sync.Map.
The benefit of this cache compared to workingsetcache is better scalability on systems with many CPU cores,
since it doesn't use mutexes at fast path.
An additional benefit is lower memory usage on average, since the size of in-memory cache equals
working set for the last 3 minutes.

The downside is that there is no upper bound for the cache size, so it may grow big during workload spikes.
But this is very unlikely for typical workloads.

(cherry picked from commit 0f24078146)
2024-10-18 11:42:16 +02:00
Aliaksandr Valialkin
f9d86a913c
lib/logstorage: do not persist streamIDCache, since it may go out of sync with partition directories, which can be changed manually between VictoriaLogs restarts
Partition directories can be manually deleted and copied from another sources such as backups or other VitoriaLogs instances.
In this case the persisted cache becomes out of sync with partitions. This can result in missing index entries
during data ingestion or in incorrect results during querying. So it is better to do not persist caches.
This shouldn't hurt VictoriaLogs performance just after the restart too much, since its caches usually contain
small amounts of data, which can be quickly re-populated from the persisted data.

(cherry picked from commit 8aa144fa74)
2024-10-18 11:42:16 +02:00
Aliaksandr Valialkin
b9fae4378a
lib/logstorage: consistently use "pHits := m[..]" pattern
Consistency improves maintainability of the code a bit.

(cherry picked from commit 1892e357c3)
2024-10-18 11:42:16 +02:00
Aliaksandr Valialkin
92b9b13df1
lib/logstorage: optimize performance for queries, which select all the log fields for logs containing hundreds of log fields (aka "wide events")
Unpack the full columnsHeader block instead of unpacking meta-information per each individual column
when the query, which selects all the columns, is executed. This improves performance when scanning
logs with big number of fields.

(cherry picked from commit 2023f017b1)
2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin
5d541322c6
lib/logstorage: improve performance of top and field_values pipes on systems with many CPU cores
- Parallelize mering of per-CPU results.
- Parallelize writing the results to the next pipe.

(cherry picked from commit 78c6fb0883)
2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin
cd7823a310
lib/logstorage: optimize 'stats by(...)' calculations for by(...) fields with millions of unique values on multi-CPU systems
- Parallelize merging of per-CPU `stats by(...)` result shards.
- Parallelize writing `stats by(...)` results to the next pipe.

(cherry picked from commit c4b2fdff70)
2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin
1000ae437c
lib/logstorage: optimize performance for top pipe when it is applied to a field with millions of unique values
- Use parallel merge of per-CPU shard results. This improves merge performance on multi-CPU systems.
- Use topN heap sort of per-shard results. This improves performance when results contain millions of entries.

(cherry picked from commit 192c07f76a)
2024-10-18 11:42:15 +02:00
hagen1778
d14cf4b679
docs: follow-up after f0d1db81dc
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-17 11:19:03 -03:00
Roman Khavronenko
4114301955
lib/flagutil: rename Duration to RetentionDuration (#7284)
The purpose of this change is to reduce confusion between using
`flag.Duration` and `flagutils.Duration`. The reason is that
`flagutils.Duration` was mistakenly used for cases that required `m`
support. See
ab0d31a7b0

The change in name should clearly indicate the purpose of this data
type.

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-17 11:18:45 -03:00
Alexander Frolov
4ed01e3ab8
lib/flagutil: rm misleading minutes support from flagutil.Duration docs (#7066)
### Describe Your Changes

`flagutil.Duration` docs state that `m` suffix stands for `minute`, but
in fact this suffix is not supported due to ambiguity with `month`

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Alexander Frolov <winningpiece@gmail.com>
2024-10-17 11:11:46 -03:00
Zakhar Bessarab
efb7213f6b
lib/flagutil/dict: properly update default value in case there is no key value set (#7211)
### Describe Your Changes

If a dict flag has only one value without a prefix it is supposed to
replace default value.

Previously, when flag was set to `-flag=2` and the default value in
`NewDictInt` was set to 1 the resulting value for any `flag.Get()` call
would be 1 which is not expected.

This commit updates default value for the flag in case there is only one
entry for flag and the entry is a number without a key.

This affects cluster version and specifically `replicationFactor` flag
usage with vmstorage [node
groups](https://docs.victoriametrics.com/cluster-victoriametrics/#vmstorage-groups-at-vmselect).
Previously, the following configuration would effectively be ignored:
```
/path/to/vmselect \
 -replicationFactor=2 \
 -storageNode=g1/host1,g1/host2,g1/host3 \
 -storageNode=g2/host4,g2/host5,g2/host6 \
 -storageNode=g3/host7,g3/host8,g3/host9
```

Changes from this PR will force default value for `replicationFactor`
flag to be set to `2` which is expected as the result of this
configuration.


---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-10-17 11:11:45 -03:00
Aliaksandr Valialkin
54ccf09fdd
lib/logstorage: follow-up for 72941eac36
- Allow dropping metrics if the query result contains at least a single metric.
- Allow copying by(...) fields.
- Disallow overriding by(...) fields via `math` pipe.
- Allow using `format` pipe in stats query. This is useful for constructing some labels from the existing by(...) fields.
- Add more tests.
- Remove the check for time range in the query filter according to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7254/files#r1803405826

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7254
2024-10-17 11:09:16 -03:00
Hui Wang
21864de527
victorialogs: add more checks for stats query APIs (#7254)
1. Verify if field in [fields
pipe](https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe)
exists. If not, it generates a metric with illegal float value "" for
prometheus metrics protocol.
2. check if multiple time range filters produce conflicted query time
range, for instance:
```
query: _time: 5m | stats count(), 
start:2024-10-08T10:00:00.806Z, 
end: 2024-10-08T12:00:00.806Z, 
time: 2024-10-10T10:02:59.806Z
```
must give no result due to invalid final time range.

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-10-17 11:09:16 -03:00
Aliaksandr Valialkin
3346576a3a
lib/logstorage: refactor storage format to be more efficient for querying wide events
It has been appeared that VictoriaLogs is frequently used for collecting logs with tens of fields.
For example, standard Kuberntes setup on top of Filebeat generates more than 20 fields per each log.
Such logs are also known as "wide events".

The previous storage format was optimized for logs with a few fields. When at least a single field
was referenced in the query, then the all the meta-information about all the log fields was unpacked
and parsed per each scanned block during the query. This could require a lot of additional disk IO
and CPU time when logs contain many fields. Resolve this issue by providing an (field -> metainfo_offset)
index per each field in every data block. This index allows reading and extracting only the needed
metainfo for fields used in the query. This index is stored in columnsHeaderIndexFilename ( columns_header_index.bin ).
This allows increasing performance for queries over wide events by 10x and more.

Another issue was that the data for bloom filters and field values across all the log fields except of _msg
was intermixed in two files - fieldBloomFilename ( field_bloom.bin ) and fieldValuesFilename ( field_values.bin ).
This could result in huge disk read IO overhead when some small field was referred in the query,
since the Operating System usually reads more data than requested. It reads the data from disk
in at least 4KiB blocks (usually the block size is much bigger in the range 64KiB - 512KiB).
So, if 512-byte bloom filter or values' block is read from the file, then the Operating System
reads up to 512KiB of data from disk, which results in 1000x disk read IO overhead. This overhead isn't visible
for recently accessed data, since this data is usually stored in RAM (aka Operating System page cache),
but this overhead may become very annoying when performing the query over large volumes of data
which isn't present in OS page cache.

The solution for this issue is to split bloom filters and field values across multiple shards.
This reduces the worst-case disk read IO overhead by at least Nx where N is the number of shards,
while the disk read IO overhead is completely removed in best case when the number of columns doesn't exceed N.
Currently the number of shards is 8 - see bloomValuesShardsCount . This solution increases
performance for queries over large volumes of newly ingested data by up to 1000x.

The new storage format is versioned as v1, while the old storage format is version as v0.
It is stored in the partHeader.FormatVersion.

Parts with the old storage format are converted into parts with the new storage format during background merge.
It is possible to force merge by querying /internal/force_merge HTTP endpoint - see https://docs.victoriametrics.com/victorialogs/#forced-merge .
2024-10-17 11:09:16 -03:00
Nikolay
135e3ced8c
lib/storage: properly unmarshal SearchQuery (#7277)
After adding multitenant query feature at v1.104.0, searchQuery wasn't
properly unmarshalled at bottom vmselect in multi-level cluster setup.
It resulted into empty query responses.

This commit adds fallback to Unmarshal method of SearchQuery to fill
TenantTokens. It allows to properly execute search requests
at vmselect side.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7270

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-10-17 11:45:27 +02:00
Andrii Chubatiuk
019171fdfc
lib/protoparser/influx: enable batch processing by default (#7165)
### Describe Your Changes

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7090

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit daa7183749)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-15 11:51:48 +02:00
Aliaksandr Valialkin
0881e5fd5c
app/vlselect: do not show empty fields in query results
Empty fields are treated as non-existing fields by VictoriaLogs data model.
So there is no sense in returning empty fields in query results, since they may mislead and confuse users.

(cherry picked from commit bac193e50b)
2024-10-15 11:49:32 +02:00
Aliaksandr Valialkin
f627d7f686
app/vlstorage: add support for forced merge via /internal/force_merge HTTP endpoint
(cherry picked from commit 3c73dbbacc)
2024-10-15 11:49:31 +02:00
Aliaksandr Valialkin
ac2b6e8704
lib/logstorage: make a copy of s.partitions slice when performing queries over the selected partitions
s.partitions can be changed when new partition is registered or when old partition is dropped.
This could lead to data races and panics when s.partitions slice is accessed by concurrently executed queries.

The fix is to make a copy of the selected partitions under s.partitionsLock before performing the query.

(cherry picked from commit b4b79a4961)
2024-10-15 11:49:31 +02:00
Aliaksandr Valialkin
b694ca4952
lib/logstorage: move getConstColumnValue() and getColumnHeader() methods from columnsHeader to blockSearch
This localizes blockSearch.getColumnsHeader() call at block_search.go .
This call is going to be optimized in the next commits in order to avoid
unmarshaling of header data for unneeded columns, which weren't requested
by getConstColumnValue() / getColumnHeader().

(cherry picked from commit 507b206a7d)
2024-10-15 11:49:30 +02:00
Aliaksandr Valialkin
beeb80e4f8
lib/logstorage: avoid redundant copying of column names and column values for dictionary-encoded columns during querying
Refer the original byte slice with the marshaled columnsHeader for columns names and dictionary-encoded column values.
This improves query performance a bit when big number of blocks with big number of columns are scanned during the query.

(cherry picked from commit 279e25e7c8)
2024-10-15 11:49:30 +02:00
Aliaksandr Valialkin
afe5158443
lib/logstorage: avoid calling columnsHeader.initFromBlockHeader() multiple times for the same blockSearch
This should improve performance when blockSearch.getColumnsHeader() is called multiple times
from different places of the code.

(cherry picked from commit 9e48074b59)
2024-10-15 11:49:30 +02:00
Aliaksandr Valialkin
e581338b84
lib/logstorage: make sure that bs.br is non-nil before checking br.bs.bsw.bh.rowsCount there
br.bs may be nil when br contains the block with additional filters applied during pipe calculations.
For example, `* | count() if (error) errors`.

(cherry picked from commit 867f671cc4)
2024-10-15 11:49:29 +02:00
Andrii Chubatiuk
68b6834542
lib/protoparser/opentelemetry: added exponential histograms support (#6354)
### Describe Your Changes

added opentelemetry exponential histograms support. Such histograms are automatically converted into
VictoriaMetrics histogram with `vmrange` buckets.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 9eb0c1fd86)
2024-10-11 14:28:19 +02:00
Aliaksandr Valialkin
b3bbf94310
lib/logstorage: disallow using pipe names as the first unquoted words in filter pipe
Improperly written pipes could be silently parsed as filter pipe.
For example, the following query:

   * | by (x)

was silently parsed to:

   * | filter "by" x

It is better to return error, so the user could identify and fix invalid pipe
instead of silently executing invalid query with `filter` pipe.

(cherry picked from commit 7b475ed95d)
2024-10-11 14:27:46 +02:00
Aliaksandr Valialkin
834e2ad855
lib/logstorage: disallow using by as the first word in log filters, since it frequently clashes with stats by(...) pipe where stats word is omitted
(cherry picked from commit 6acf543b90)
2024-10-11 14:27:46 +02:00
Zakhar Bessarab
06947c2685
vmagent: add support of HTTP2 client for Kubernetes SD (#7114)
### Describe Your Changes

Currently, vmagent always uses a separate `http.Client` for every group
watcher in Kubernetes SD. With a high number of group watchers this
leads to large amount of opened connections.

This PR adds 2 changes to address this:
- re-use of existing `http.Client` - in case `http.Client` is connecting
to the same API server and uses the same parameters it will be re-used
between group watchers
- HTTP2 support - this allows to reuse connections more efficiently due
to ability of using streaming via existing connections.

See this issue for the details and test results -
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5971

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit eefae85450)
2024-10-08 10:37:48 +02:00
Aliaksandr Valialkin
0a61222627
lib/logstorage: quote logfmt strings only if they contain special chars, which could break logfmt parsing and/or reading
(cherry picked from commit 462b7cd597)
2024-10-07 14:47:22 +02:00
Artem Fetishev
ac128d268f
lib/promscrape: Fix TestClientProxyReadOk flaky test (#7173)
This PR fixes #7062

For hijacked connections, one has to read from the connection buffer,
but still write directly to the connection. Otherwise, when reading
directly from such connections, the first byte may be lost. This, in
turn corrupts the ClientHello TLS handshake message and when the backend
server receives it, it closes the connection and reports the following
error in the log:

```
http: TLS handshake error from 127.0.0.1:33150: tls: first record does not look
like a TLS handshake
```

The first byte may be lost because underlying HTTP request handler may
read it from the connection and put it into the buffer. As the result,
subsequent connection reads won't see that byte.

-   See: https://github.com/golang/go/issues/27408
-   The fix is taken from : https://github.com/k3s-io/k3s/pull/6216

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
(cherry picked from commit c1cd3e85a7)
2024-10-04 10:42:52 +02:00
Aliaksandr Valialkin
7a44614e0b
lib/logstorage: add len pipe for calculating byte length of log field values
(cherry picked from commit 364f084b43)
2024-10-04 10:42:51 +02:00
Roman Khavronenko
dfb2ad4ab4
(app|lib)/vmstorage: do not increment vm_rows_ignored_total on NaNs (#7166)
`vm_rows_ignored_total` metric is a metric for users to signalize about
ingestion issues, such as bad timestamp or parsing error.
In commit
a5424e95b3
this metric started to increment each time vmstorage gets NaN. But NaN
is a valid value for Prometheus data model and for Prometheus metrics
exposition format. Exporters from Prometheus ecosystem could expose NaNs
as values for metrics and these values will be delivered to vmstorage
and increment the metric.
Since there is nothing user can do with this, in opposite to parsing
errors or bad timestamps, there is not much sense in incrementing this
metric. So this commit rolls-back `reason="nan_value"` increments.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 0d4f4b8f7d)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-02 12:43:13 +02:00
Zakhar Bessarab
44b071296d
vmselect: add support of multi-tenant queries (#6346)
### Describe Your Changes

Added an ability to query data across multiple tenants. See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1434

Currently, the following endpoints work with multi-tenancy:
- /prometheus/api/v1/query
- /prometheus/api/v1/query_range
- /prometheus/api/v1/series
- /prometheus/api/v1/labels
- /prometheus/api/v1/label/<label_name>/values
- /prometheus/api/v1/status/active_queries
- /prometheus/api/v1/status/top_queries
- /prometheus/api/v1/status/tsdb
- /prometheus/api/v1/export
- /prometheus/api/v1/export/csv
- /vmui


A note regarding VMUI: endpoints such as `active_queries` and
`top_queries` have been updated to indicate whether query was a
single-tenant or multi-tenant, but UI needs to be updated to display
this info.
cc: @Loori-R 

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-10-01 16:37:18 +02:00
Aliaksandr Valialkin
81f3e07e1e
lib/logstorage: do not count dictionary values which have no matching logs in count_uniq stats function
Create blockResultColumn.forEachDictValue* helper functions for visiting matching
dictionary values. These helper functions should prevent from counting dictionary values
without matching logs in the future.

This is a follow-up for 0c0f013a60
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152
2024-10-01 13:36:27 +02:00
Aliaksandr Valialkin
8c55b699f4
app/vlogscli: add interactive command-line tool for querying VictoriaLogs 2024-10-01 12:24:53 +02:00
Zhu Jiekun
d1d59d6348
feature: [vmagent] Add service discovery support for OVH Cloud VPS and dedicated server (#6160)
### Describe Your Changes
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6071

#### Added
- Added service discovery support for OVH Cloud:
    - VPS.
    - Dedicated server.

#### Docs
- `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated.

#### Note
- Useful links: 
    - OVH Cloud VPS API: https://eu.api.ovh.com/console/#/vps~GET
- OVH Cloud Dedicated server API:
https://eu.api.ovh.com/console/#/dedicated/server~GET
    - OVH Cloud SDK: https://github.com/ovh/go-ovh
- Prometheus SD:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ovhcloud_sd_config

Tested on OVH Cloud VPS and dedicated server.
<img width="1722" alt="image"
src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/30280396/d3f0adc8-b0ef-423e-9379-8a9b9b0792ee">

<img width="1724" alt="image"
src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/30280396/18b5b730-3512-4fc0-8b2c-f2450ac550fd">

---
Signed-off-by: Jiekun <jiekun@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-09-30 15:06:14 +02:00
Hui Wang
bf3d9ba57b
stream aggregation: fix possible duplicated aggregation results (#7118)
When ingesting samples with the same labels(duplicated samples or
samples with the same labels after `by` or `without` options). They
could register different entries for the same labelset in
LabelsCompressor.
For example, both index 99 and 100 can be assigned to label `foo=1` in
two concurrent pushes. Then due to differing label indexes in encoded
keys, the samples will appear as distinct in aggrState, resulting in
duplicated results after decompressing the label indexes.

fbde238cdc/lib/streamaggr/streamaggr.go (L933)

In this pull request, since we need to store `idxToLabel` first to
ensure the idx can be searched after `lc.labelToIdxStore`,
the `lc.idxToLabel` still could contain a duplicated entries
[100]="foo=1". But given the low likelihood of this issue and the size
of idxToLabel, it should be fine.
2024-09-30 14:30:34 +02:00
Aliaksandr Valialkin
dbcf06cd85
lib/logstorage: skip values with zero hits for 'uniq', 'top' and 'field_values' pipes
See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/72#issuecomment-2352078483
2024-09-30 14:16:21 +02:00
Artem Fetishev
91c2b5b24d
Introduce a flag for limiting the number of time series to delete (cluster version) (#7112)
### Describe Your Changes

Introduce the `-search.maxDeleteSeries` flag that limits the number of
time series that can be deleted with a single
`/api/v1/admin/tsdb/delete_series` call.

Currently, any number can be deleted and if the number is big (millions)
then the operation may result in unaccounted CPU and memory usage spikes
which in some cases may result in OOM kill (see #7027). The flag limits
the number to 30k by default and the users may override it if needed at
the vmstorage start time.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7027
---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2024-09-30 12:43:11 +02:00
Nikolay
cb50408dc6
fscore: rollback trailing space trim (#7106)
Previous commit 201fd6de1e removed
trailing space trim from data read from file. But common practice is to
remove such trailing space. And it leaded to the authorization errors
for the major group of users.

In first place, this change must help to mitigate an issue with
kubernetes. When authorization information was read from Secret content.
Changes to the operator was made to mitigate such problem at commit
1cf64358c8

We could introduce later optional flag for VictoriaMetrics to disable
trim space behavior.

Related issues:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6986
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7089 
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6947

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
2024-09-29 14:48:36 +02:00
Aliaksandr Valialkin
7456cbc653
lib/logstorage: allow using ! in unescaped phrase
Previously the phrase filter with `!` was treated unexpectedly.
For example, `foo!bar` filter was treated at `foo AND NOT bar`,
while most users expect that it matches "foo!bar" phrase.

This commit aligns with users' expectations.
2024-09-29 11:18:04 +02:00
Aliaksandr Valialkin
b7a3d575da
lib/logstorage: allow using - instead of ! in front of (...) 2024-09-29 11:18:04 +02:00
Aliaksandr Valialkin
fce6fc745a
lib/logstorage: return the expected hits results from uniq pipe when the number of unique values reaches the specified limit
Previously `uniq` pipe could return zero `hits` if the number of found unique values equals the specified limit.
This wasn't expected in most cases.
2024-09-29 10:53:44 +02:00
Aliaksandr Valialkin
58d1e517de
lib/logstorage: clear hits slice obtained from encoding.GetUint64s() before updating it with hits for valueTypeDict column
encoding.GetUint64s() returns uninitialized slice, which may contain arbitrary values.
So values in this slice must be reset to zero before using it for counting hits in `uniq` and `top` pipes.
2024-09-29 10:29:50 +02:00
Aliaksandr Valialkin
b5d94f06f5
lib/logstorage: postpone initialization of per-shard stateSizeBudget until the first call to pipeProcessor.writeBlock()
This simplifies pipeProcessor initialization logic a bit.
This also doesn't mangle the original maxStateSize value, which is used in error messages when the state size exceeds maxStateSize.
2024-09-29 10:29:49 +02:00
Aliaksandr Valialkin
7f8b1300a9
lib/logstorage: add non-empty if (...) condition to automatically generated result names in stats pipe
This allows executing queries with `stats` pipe, which calculate multiple results with the same functions,
but with different `if (...)` conditions. For example:

  _time:5m | count(), count() if (error)

Previously such queries couldn't be executed becasue automatically generated name for the second result
didn't include `if (error)`, so names for both results were identical - `count(*)`.
2024-09-29 09:52:19 +02:00
Aliaksandr Valialkin
04c73d54d4
lib/logstorage: support order alias for sort pipe
Now the following queries are equivalents:

    _time:5s | sort by (_time)

    _time:5s | order by (_time)

This is needed for convenience, since `order by` is commonly used in other query languages such as SQL.
2024-09-29 09:52:18 +02:00
Aliaksandr Valialkin
1a6313ca68
lib/logstorage: allow using - instead of ! as a shorthand for NOT operator in LogsQL 2024-09-27 13:15:55 +02:00
Aliaksandr Valialkin
b60cb98377
lib/logstorage: support skipping _stream: prefix for stream filters
'_stream:{...}' can be written as '{...}'

This simplifies writing queries with stream filters, and makes them more familier to Loki users.
2024-09-27 13:15:55 +02:00
Aliaksandr Valialkin
bc0bb0c36a
lib/logstorage: consistently sort stream contexts belonging to different streams by the minimum time seen in the matching logs
This should simplify debugging of stream_context output, since it remains stable over repeated requests.
2024-09-27 11:21:28 +02:00
Aliaksandr Valialkin
bce56d430d
lib/logstorage: add _msg="---" delimiter between different log streams in stream_context output
This should help investigating contexts, which belong to different log streams.
2024-09-27 11:21:27 +02:00
Aliaksandr Valialkin
e4e14697fa
lib/logstorage: improve performance for stream_context pipe over streams with big number of log entries
Do not read timestamps for blocks, which cannot contain surrounding logs.
This should improve peformance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 .

Also optimize min(_time) and max(_time) calculations a bit by avoiding conversion
of timestamp to string when it isn't needed.
This should improve performance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .
2024-09-26 22:31:05 +02:00
Aliaksandr Valialkin
bec0846e1a
lib/contextutil: make golanci-lint happy by substituing unused function arg name with _
This is a follow-up for 4b1611267f
2024-09-26 17:07:25 +02:00
Aliaksandr Valialkin
f5dfe1cacd
lib/logstorage: properly return surrounding logs outside the selected time range by stream_context pipe
Previously only logs inside the selected time range could be returned by stream_context pipe.
For example, the following query could return up to 10 surrounding logs only for the last 5 minutes,
while most users expect this query should return up to 10 surrounding logs without restrictions on the time range.

    _time:5m panic | stream_context before 10

This enables the ability to implement stream context feature at VictoriaLogs web UI: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7063 .

Reduce memory usage when returning stream context over big log streams with millions of entries.
The new logic scans over all the log messages for the selected log stream, while keeping in memory only
the given number of surrounding logs. Previously all the logs for the given log stream on the selected time range
were loaded in memory before selecting the needed surrounding logs.
This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 .

Reduce the scan performance for big log streams by fetching only the requested fields. For example, the following
query should be executed much faster than before if logs contain many fields other than _stream, _msg and _time:

    panic | stream_context after 30 | fields _stream, _msg, _time
2024-09-26 17:04:39 +02:00
Aliaksandr Valialkin
4d27933041
app/vlinsert: support _time field without timezone information during data ingestion
Use local timezone of the host server in this case. The timezone can be overridden
with TZ environment variable if needed.

While at it, allow using whitespace instead of T as a delimiter between data and time
in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted
during data ingestion. This is valid ISO8601 format, which is used by some log shippers,
so it should be supported. This format is also known as SQL datetime format.

Also assume local time zone when time without timezone information is passed to querying APIs.
Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string
if the old behaviour is preferred.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721
2024-09-26 12:50:14 +02:00
Aliaksandr Valialkin
3a556bd15a
app/vlselect/logsql: clone the query with the current timestamp when performing live tailing requests in the loop
Previously the original timestamp was used in the copied query, so _time:duration filters
were applied to the original time range: (timestamp-duration ... timestamp]. This resulted
in stopped live tailing, since new logs have timestamps bigger than the original time range.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7028
2024-09-26 08:57:48 +02:00
Aliaksandr Valialkin
55ecf4f766
lib/logstorage: add blocks_count pipe
This pipe is useful for debugging purposes when the number of processed blocks must be calculated for the given query:

    <query> | blocks_count

This helps detecting the root cause of query performance slowdown in cases like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070
2024-09-25 19:18:38 +02:00
Aliaksandr Valialkin
66d6514e2e
lib/logstorage: lazily read column headers metadata during queries
This improves performance for analytical queries, which do not need column headers metadata.
For example, the following query doesn't need column headers metadata, since _stream and min(_time)
are stored in block header, which is read separately from colum headers metadata:

  _time:1w | stats by (_stream) min(_time) min_time

This commit significantly improves the performance for this query.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070
2024-09-25 19:18:37 +02:00
Aliaksandr Valialkin
246c339e3d
lib/logstorage: read timestamps column when it is really needed during query execution
Previously timestamps column was read unconditionally on every query.
This could significantly slow down queries, which do not need reading this column
like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .
2024-09-25 19:18:37 +02:00
Aliaksandr Valialkin
180137a377
lib/logstorage: improve the performance of obtaining _stream column value
Substitute global streamTagsCache with per-blockSearch cache for ((stream.id) -> (_stream value)) entries.
This improves scalability of obtaining _stream values on a machine with many CPU cores, since every CPU
has its own blockSearch instance.

This also should reduce memory usage when querying logs over big number of streams, since per-blockSearch
cache of ((stream.id) -> (_stream value)) entries is limited in size, and its lifetime is bounded by a single query.
2024-09-24 20:57:39 +02:00
Aliaksandr Valialkin
9d11a21541
lib/logstorage/consts.go: document that it isn't recommended setting maxColumnsPerBlock constant to too big values
This should help avoiding cases like this one - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6425#issuecomment-2337446083
2024-09-24 18:52:54 +02:00
Aliaksandr Valialkin
1264350566
lib/logstorage: improve performance for streamID.marshalString() by more than 2x
The streamID.marshalString() is executed in hot path if the query selects _stream_id field.

Command to run the benchmark:

go test ./lib/logstorage/ -run=NONE -bench=BenchmarkStreamIDMarshalString -benchtime=5s

Results before the commit:

BenchmarkStreamIDMarshalString-16    	438480714	        14.04 ns/op	  71.23 MB/s	       0 B/op	       0 allocs/op

Results after the commit:

BenchmarkStreamIDMarshalString-16    	982459660	         6.049 ns/op	 165.30 MB/s	       0 B/op	       0 allocs/op
2024-09-24 18:38:21 +02:00
Aliaksandr Valialkin
d944c162da
lib/logstorage: add benchmark for streamID.marshalString 2024-09-24 18:38:21 +02:00
hagen1778
a8a3bc1e31
lib/promscrape: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 8bb3f2fd43)
2024-09-24 16:58:17 +02:00
hagen1778
58ff914d96
lib/promscrape: temporary disable TestClientProxyReadOk
This test is very flaky and prevents other tests from running in CI.
Disabling this test should improve tests quality, since it isn't reliable anyway.

There is a ticket to fix this test - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7062

Once fixed, this test should be uncommented.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit c7569dac50)
2024-09-24 16:58:17 +02:00
Dmytro Kozlov
869b09122a
lib/promscrape: show only unhealthy targets if show_only_unhealthy filter is enabled (#6960)
### Describe Your Changes

It is better to show only unhealthy targets instead of all of them when
`show_only_unhealthy` filter is enabled.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3536

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit cbeb7d50e8)
2024-09-24 16:58:16 +02:00
Aliaksandr Valialkin
dba0553e11
lib/cgroup: round GOMAXPROCS to the lower integer value of cpuQuota
Rounding GOMAXPROCS to the upper interger value of cpuQuota increases chances of CPU starvation,
non-optimimal goroutine scheduling and additional CPU overhead related to context switching.

So it is better to round GOMAXPROCS to the lower integer value of cpuQuota.
2024-09-23 16:11:59 +02:00
Artem Fetishev
5f89374cc8
lib/storage: restore ability to put empty metric ID list into tagFiltersToMetricIDsCache (#7064)
### Describe Your Changes

Currently it the metricID list is empty it won't be mashalled and as the
result won't be put into the tagFiltersToMetricIDsCache which causes the
cache misses for the corresponding tagFilters. In some setups this
causes severe search speed detradation (see #7009).

The empty metric IDs was covered before but then was accidentally
removed in 6c21439.

This PR restores the coverage of this case.

A new unit test can be used as a proof that empty metricID lists are not
added to the cache (just remove the fix in index_db.go and run the test
to see the result)

Also a benchmark has been added to see the implications of the
compression.

```
user@laptop:~/p/github.com/rtm0/VictoriaMetrics/01/src$ go test ./lib/storage/ -run=NONE -bench BenchmarkMarshalUnmarshalMetricIDs --loggerLevel=ERROR
goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage
cpu: 13th Gen Intel(R) Core(TM) i7-1355U
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-0-12             3237240               363.5 ns/op               0 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1-12             2831049               451.8 ns/op               0.4706 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10-12            1152764              1009 ns/op                 1.667 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100-12            297055              3998 ns/op                 5.755 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000-12            31172             34566 ns/op                 8.484 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000-12            4900            289659 ns/op                 9.416 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100000-12            447           2341173 ns/op                 9.456 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000000-12            42          24926928 ns/op                 9.468 compression-rate
BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000000-12            5         204098872 ns/op                 9.467 compression-rate
PASS
ok      github.com/VictoriaMetrics/VictoriaMetrics/lib/storage  15.018s
```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-09-20 17:38:56 +02:00
Aliaksandr Valialkin
654494a0de
lib/storage: improve performance for indexSearch.containsTimeRange()
The indexSearch.containsTimeRange() function is called for the current indexDB and the previous indexDB
every time when searching for metricIDs by label filters. This function consumes a lot of additional CPU time
for cases when queries with lightweight label filters are sent to VictoriaMetrics at high rate (e.g. thousands of RPS),
like in the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7009 .

Optimize indexSearch.containsTimeRange() function in the following ways:

- Unconditionally return true if this function is called for the current indexDB, since there are very high
  chances that the current indexDB contains the data with timestamps in the requested time range.

- Cache the minimum timestamp, which is missing in the indexed data for the previous indexDB.
  This is safe to do, since the previous indexDB is readonly.
  This optimization eliminates potentially slow lookup in the previous indexDB for typical
  use cases when the requested time range is close to the current time.
2024-09-20 17:37:28 +02:00
Aliaksandr Valialkin
781bff24b5
lib/storage: simplify indexDB.doExtDB() usage by removing the returned value
Previously indexDB.doExtDB() was returning boolean value, which was indicating whether f callback was called.
There is no need in returning this boolean value, since the f callback can determine on itself whether it was called.

This simplifies the code a bit.

While at it, document indexDB.doExtDB().
2024-09-20 17:37:03 +02:00
Roman Khavronenko
ec181e69e7
lib/storage: follow-up after d8f8822fa5 (#7036)
Make function name and comments more clear.

d8f8822fa5

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-09-20 17:36:10 +02:00
Aliaksandr Valialkin
472b6b326e
lib/logstorage: make sure that getCommonTokens returns common tokens in the original order of tokens inside tokenSets arg
This fixes flaky test TestGetCommonTokensForOrFilters:

    filter_or_test.go:143: unexpected tokens for field "_msg"; got ["foo" "bar"]; want ["bar" "foo"]
2024-09-19 16:00:21 +02:00
Roman Khavronenko
e6dac18db3
lib/logger: increase default value of -loggerMaxArgLen cmd-line fla… (#7008)
…g from 1e3 to 5e3

This should improve visibility on errors produced by very long queries.

The change is classified as BUG in order to port it to LTS releases.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Mathias Palmersheim <mathias@victoriametrics.com>
(cherry picked from commit e115b85770)
2024-09-19 15:48:09 +02:00
Nikolay
6f99dcc7c1
lib/storage: consistently check for missing metricID index records (#6967)
* Previously, only metricID->metricName missing index records were
tracked with deadline But it was possible a case for missing
metricID->TSID index records. IndexDB metrics fix exposed misleading
metric for such missing records.

* This commit adds check for metricID->TSID missing index records. And
delete missing metricID entry if it hit 60 second deadline.

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6931

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2024-09-16 13:07:37 +02:00
Nikolay
c32032ac1b
lib/fs: properly call windows APIs (#6998)
Previously we manually imported system windows DDLs
and made direct syscall.

 But golang exposes syscall wrappers with sys/windows package.
It seems, that direct syscall was broken at 1.23 golang release. It was
`GetDiskFreeSpace` syscall in our case.

This commit replaces all manual syscalls with wrappers

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6973

Related golang issue:
https://github.com/golang/go/issues/69029

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2024-09-13 13:19:04 +02:00
Aliaksandr Valialkin
cad236003b
app/vlselect: consistently reuse the original query timestamp when executing /select/logsql/query with positive limit=N query arg
Previously the query could return incorrect results, since the query timestamp was updated with every Query.Clone() call
during iterative search for the time range with up to limit=N rows.

While at it, optimize queries, which find low number of matching logs, while spend a lot of CPU time for searching
across big number of logs. The optimization reduces the upper bound of the time range to search if the current time range
contains zero matching rows.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785
2024-09-08 14:34:46 +02:00
Aliaksandr Valialkin
297301e8c0
lib/logstorage: preserve the order of tokens to check against bloom filters in AND filters
Previously tokens from AND filters were extracted in random order. This could slow down
checking them agains bloom filters if the most specific tokens go at the beginning of the AND filters.
Preserve the original order of tokens when matching them against bloom filters,
so the user could control the performance of the query by putting the most specific AND filters
at the beginning of the query.

While at it, add tests for getCommonTokensForAndFilters() and getCommonTokensForOrFilters().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
2024-09-08 12:28:34 +02:00
Aliaksandr Valialkin
4b49b62a58
lib/logstorage: improve error logging for incorrect queries passed to /select/logsql/stats_query and /select/logsql/stats_query_range functions 2024-09-08 12:28:33 +02:00
Aliaksandr Valialkin
edb1afe804
lib/logstorage: properly extract common tokens from unsupported OR filters
Previously the following query could miss rows matching !bar if these rows do not contain foo:

   foo OR !bar

This is because of incorrect detection of common tokens for OR filters - all the unsupported filters
were skipped (including the NOT filter (aka `!`)), while in this case zero common tokens must be returned.

While at it, move repetiteve code in TestFilterAnd and TestFilterOr into f function.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
2024-09-08 12:28:33 +02:00
Aliaksandr Valialkin
c448189f69
app/vlselect: add /select/logsql/stats_query_range endpoint for building time series panels in VictoriaLogs plugin for Grafana
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6943
Updates https://github.com/VictoriaMetrics/victorialogs-datasource/issues/61
2024-09-07 00:44:34 +02:00
Aliaksandr Valialkin
01c8e12370
app/vlselect: add /select/logsql/stats_query endpoint, which is going to be used by vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6942
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706
2024-09-06 23:00:58 +02:00
Aliaksandr Valialkin
c5badeea08
lib/logstorage: substitute | operator with or operator at math pipe
This is needed for avoiding confusion between the `|` operator at `math` pipe and `|` pipe delimiter.
For example, the following query was parsed unexpectedly:

   * | math foo / bar | fields x

as

   * | math foo / (bar | fields) as x

Substituting `|` with `or` inside `math` pipe fixes this ambiguity.
2024-09-06 22:43:29 +02:00
Artem Fetishev
85bf768013
lib/storage: adds metrics that count records that failed to insert
### Describe Your Changes

Add storage metrics that count records that failed to insert:

- `RowsReceivedTotal`: the number of records that have been received by
the storage from the clients
- `RowsAddedTotal`: the number of records that have actually been
persisted. This value must be equal to `RowsReceivedTotal` if all the
records have been valid ones. But it will be smaller otherwise. The
values of the metrics below should provide the insight of why some
records hasn't been added
-   `NaNValueRows`: the number of records whose value was `NaN`
- `StaleNaNValueRows`: the number of records whose value was `Stale NaN`
- `InvalidRawMetricNames`: the number of records whose raw metric name
has failed to unmarshal.

The following metrics existed before this PR and are listed here for
completeness:

- `TooSmallTimestampRows`: the number of records whose timestamp is
negative or is older than retention period
- `TooBigTimestampRows`: the number of records whose timestamp is too
far in the future.
- `HourlySeriesLimitRowsDropped`: the number of records that have not
been added because the hourly series limit has been exceeded.
- `DailySeriesLimitRowsDropped`: the number of records that have not
been added because the daily series limit has been exceeded.

---
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
2024-09-06 18:13:48 +02:00
Aliaksandr Valialkin
08fe7949d1
lib/logstorage: consistently use nsecsPerDay constant and remove nsecPerDay constant 2024-09-06 16:18:15 +02:00
Aliaksandr Valialkin
7dcce1ca02
lib/logstorage: pre-calculate hashes from tokens used in bloom filter search
Previously per-token hashes for per-block bloom filters were re-calculated on every scanned block.
This could be slow when the number of tokens is big or when the number of blocks to scan is big.
Pre-calculate hashes for bloom filters and then use them for searching in bloom filters.
This improves performance by 2.5x for in(...) filters with many values to search inside `in()`.
2024-09-05 19:44:42 +02:00
Zhu Jiekun
8848614315
lib/discovery/azure: fix host check in next link in Azure SD (#6915)
Previous bugfix at 49f63b2 only partially fixed pagination host validation error.

 Before this fix it was:
```
unexpected nextLink host \"management.azure.com\", expecting \"https://management.azure.com\"
```

Now we only check the `Host` without schema. 

However, when Azure respond `nextLink` in `Host:Port` format, the
`nextLink` check will fail:
```
unexpected nextLink host \"management.azure.com:443\", expecting \"management.azure.com\"
```

This pull request further relaxes the checks by only checking the
`Hostname`.

---

 related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6912
2024-09-05 16:58:10 +02:00
Artem Fetishev
8bdf52977f
lib/storage: do not drop stale NaN samples (#6936)
This patch reverts 1fd3385

After discussing it we've come to conclusion that this is a valid
behavior which can be avoided by deleting the time series only once the
corresponding stale NaNs have been received.

On the other hand, the fix leads to lost stale NaNs in some rare but
valid use cases. For example:

- In a cluster configuration the samples for a given time series are
normally sent to the same vmstorage replica. However, wminsert may
reroute the samples to another replica because the original one is down
or is overloaded. In this case the stale NaN may end up on a replica
that has no data for that time series, but we still want to record that
sample.

Thus, reverting that fix.

---

related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
2024-09-05 16:58:10 +02:00
Hui Wang
9cb1704d3c
lib/storage: fix metric vm_object_references{type="indexdb"} (#6937)
follow up
4ecc370acb

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-09-05 16:57:48 +02:00
Aliaksandr Valialkin
2630497e2c
lib/logstorage: delete unused function - bloomfilter.containsAny 2024-09-05 16:57:47 +02:00
Aliaksandr Valialkin
5763a957ef
lib/logstorage: properly fix incorrect extraction of common tokens for OR filters at distinct log fields
Previously (f1:foo OR f2:bar) was incorrectly returning `foo` token for `f1` and `bar` token for `f2`.
These tokens were used for checking against bloom filter for every data block, so the data block,
which didn't contain simultaneously `foo` token for `f1` field and `bar` token for `f2` field, was skipped.
This was incorrect, since such a block may contain logs matching the original OR filter.

The fix is to return common tokens from `OR`-delimted filters only if these tokens exist at EVERY such filter
for the given field name. If some `OR`-delimited filter misses the given field name, then `OR`-delimited filters
do not contain common tokens, which could be used for checking against bloom filter.

While at it, add more tests covering various edge cases for filters delimited by AND and OR.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556
2024-09-05 16:57:47 +02:00
f41gh7
64361c2d7a
follow-up after 01430a155c
* properly check SeverityNumber at FormatSeverity function
 it could be negative, which could cause panic for victorialogs
2024-09-04 15:39:55 +02:00
Andrii Chubatiuk
711f2cc4f2
vlinsert: added opentelemetry logs support
Commit adds the following changes:

* Adds support of OpenTelemetry logs for Victoria Logs with protobuf encoded messages

*  json encoding is not supported for the following reasons:
   - It brings a lot of fragile code, which works inefficiently.
   - json encoding is impossible to use with language SDK.

* splits metrics and logs structures at lib/protoparser/opentelemetry/pb package.

* adds docs with examples for opentelemetry logs.

---
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4839

Co-authored-by: AndrewChubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-09-03 20:24:01 +02:00
rtm0
cd6f2e6efe
lib/storage: improve the message of the tooManyTimeseries error (#6893)
### Describe Your Changes

This is a follow-up for #6836. Per @valyala's
[comment](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6836#discussion_r1730291704),
the error message does not reflect which flag needs to be adjusted.

### Checklist

The following checks are **mandatory**:

- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
(cherry picked from commit 4df243d530)
2024-09-03 10:49:34 +02:00
jackyin
66789a8144
lib/logstorage: and filter results in unexpected response (#6556)
fix #6554
andfilter shouldn't return orfilter field which result in bloomfilter
return false.

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 975ed27a76)
2024-09-03 10:49:25 +02:00
rtm0
5d065d2746
tests: check Metrics.RowsAddedTotal in unit tests (#6895)
### Describe Your Changes

This is a follow-up PR: Unit tests introduced in #6872 can now use
RowsAddedTotal counter whose scope was fixed in #6841.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
(cherry picked from commit 2c856c6951)
2024-09-03 10:49:18 +02:00
Roman Khavronenko
71e592e677
attempt to fix flaky TestClientProxyReadOk (#6899)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit f586082520)
2024-09-03 10:49:16 +02:00
dufucun
1aa9f7be4e
tests: fix slice init length (#6897)
### Describe Your Changes

fix slice init length

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: dufucun <dufuchun@sohu.com>
(cherry picked from commit 95bafc8caf)
2024-08-30 11:18:21 +02:00
rtm0
602bedf362
testing: allow disabling fsync to make tests run faster (#6871)
### Describe Your Changes

fsync() ensures that the data is written to disk. In production this is
needed for data durability. However, during the development, when the
unit tests are run, this level of durability is not needed. Therefore
fsync() can be disabled which will makes test runs two times faster.

The disabling is done by setting the `DISABLE_FSYNC_FOR_TESTING`
environment variable. The valid values for this variable are the same as
the values of the arg of `go doc strconv.ParseBool`:

```
1, t, T, TRUE, true, True, 0, f, F, FALSE, false, False.
```

Any other value means `false`.

The variable is set for all test build targets. Compare running times:

Build Target | DISABLE_FSYNC_FOR_TESTING=0 | DISABLE_FSYNC_FOR_TESTING=1
----------------- | ------------------------------------------------ |
-------------------------------------------------
make test | 1m5s  | 0m22s
make test-race | 3m1s | 1m42s
make test-pure | 1m7s | 0m20s
make test-full | 1m21s | 0m32s
make test-full-386 | 1m42s | 0m36s

When running tests for a given package, fsync can be disabled as
follows:

```shell
DISABLE_FSYNC_FOR_TESTING=1 go test ./lib/storage
```

Disabling fsync() is intended for testing purposes only and the name of
the variables reflects that.

What could also have been done but haven't:

- lib/filestream/filestream.go: `Writer.MustFlush()` also uses f.Sync()
but nothing has been done to it, because the Writer.MustFlush() is not
used anywhere in the VM codebase. A side question: what is the general
policy for the unused code?
- lib/filestream/filestream.go: Writer.Write() calls `adviceDontNeed()`
which calls unix.Fdatasync(). Disabling it could potentially improve
running time, but running tests with this code disabled has shown
otherwise.

### Checklist

The following checks are **mandatory**:

- [ x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>

(cherry picked from commit 334cd92a6c)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-30 11:18:21 +02:00
Nikolay
0f9536eaf5
lib/storage: properly add previous indexDB metrics (#6890)
Previously, some extIndexDB metrics were not registered. It resulted
into missing metrics, if metric value was added to the extIndexDB. It's
a usual case for search requests at both indexes.

 Current commit updates all metrics from extIndexDB according to the
current IndexDB. It must fix such cases

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6868

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit 4ecc370acb)
2024-08-28 11:17:23 +02:00
rtm0
4c31a6a1fc
lib/storage: properly handle maxMetrics limit at metricID search
`TL;DR` This PR improves the metric IDs search in IndexDB:

- Avoid seaching for metric IDs twice when `maxMetrics` limit is
exceeded
- Use correct error type for indicating that the `maxMetrics` limit is
exceded
- Simplify the logic of deciding between per-day and global index search

A unit test has been added to ensure that this refactoring does not
break anything.

---

Function calls before the fix:

```
idb.searchMetricIDs
    |__ is.searchMetricIDs
        |__ is.searchMetricIDsInternal
            |__ is.updateMetricIDsForTagFilters
                |__ is.tryUpdatingMetricIDsForDateRange
                |                       |
                |__ is.getMetricIDsForDateAndFilters
```

- `searchMetricIDsInternal` searches metric IDs for each filter set. It
maintains a metric ID set variable which is updated every time the
`updateMetricIDsForTagFilters` function is called. After each successful
call, the function checks the length of the updated metric ID set and if
it is greater than `maxMetrics`, the function returns `too many
timeseries` error.
- `updateMetricIDsForTagFilters` uses either per-day or global index to
search metric IDs for the given filter set. The decision of which index
to use is made is made within the `tryUpdatingMetricIDsForDateRange`
function and if it returns `fallback to global search` error then the
function uses global index by calling `getMetricIDsForDateAndFilters`
with zero date.
- `tryUpdatingMetricIDsForDateRange` first checks if the given time
range is larger than 40 days and if so returns `fallback to global
search` error. Otherwise it proceeds to searching for metric IDs within
that time range by calling `getMetricIDsForDateAndFilters` for each
date.
- `getMetricIDsForDateAndFilters` searches for metric IDs for the given
date and returns `fallback to global search` error if the number of
found metric IDs is greater than `maxMetrics`.

Problems with this solution:

1. The `fallback to global search` error returned by
`getMetricIDsForDateAndFilters` in case when maxMetrics is exceeded is
misleading.
2. If `tryUpdatingMetricIDsForDateRange` proceeds to date range search
and returns `fallback to global search` error (because
`getMetricIDsForDateAndFilters` returns it) then this will trigger
global search in `updateMetricIDsForTagFilters`. However the global
search uses the same maxMetrics value which means this search is
destined to fail too. I.e. the same search is performed twice and fails
twice.
3. `too many timeseries` error is already handled in
`searchMetricIDsInternal` and therefore handing this error in
`updateMetricIDsForTagFilters` is redundant
4. updateMetricIDsForTagFilters is a better place to make a decision on
whether to use per-day or global index.

Solution:

1.  Use a dedicated error for `too many timeseries` case
2. Handle `too many timeseries` error in  `searchMetricIDsInternal` only
3. Move the per-day or global search decision from
`tryUpdatingMetricIDsForDateRange` to `updateMetricIDsForTagFilters` and
remove `fallback to global search` error.

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-08-27 23:08:17 +02:00
rtm0
b51c6bf75d
lib/storage: properly register index records with RegisterMetricNames
Once the timeseries is in tsidCache, new entries won't be created in
per-day index because the RegisterMetricNames() code does consider
different dates for the same timeseries. So this case has been added.

The same bug exists for AddRows() but it is not manifested because the
index entries are finally created in updatePerDateData().

RegisterMetricNames also updated to increase the newTimeseriesCreated
counter because it actually creates new time series in index.

A unit tests has been added that check all possible data patterns
(different metric names and dates) and code branches in both
RegisterMetricNames and AddRows. The total number of new unit tests is
around 100 which increaded the running time of storage tests by 50%.

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2024-08-27 23:00:27 +02:00
rtm0
332d290b38
Move rowsAddedTotal counter to Storage (#6841)
### Describe Your Changes

Reduced the scope of rowsAddedTotal variable from global to Storage.

This metric clearly belongs to a given Storage object as it counts the
number of records added by a given Storage instance.
Reducing the scope improves the incapsulation and allows to reset this
variable during the unit tests (i.e. every time a new Storage object is
created by a test, that object gets a new variable).



Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
2024-08-27 22:34:54 +02:00
Zhu Jiekun
5b49fd83be
lib/promrelabel: follow-up for 8958cecad6
In the previous commit 8958cecad6
the default ports (80/443) were removed for both the `scrapeURL` and
`instance` label values for those targets without a port in
`__address__`. Different values in the `instance` label generate new
time series.

This commit reverts the changes made to the `instance` label. Now,
for those targets:
- `scrapeURL` will remain unchanged.
- The `instance` label value will include the default port.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792
(cherry picked from commit e97e966f82)
2024-08-27 15:44:07 +02:00
Nikolay
08cbbf8134
lib/promscrape: fixes proxy autorization (#6783)
* Adds custom dial func for HTTP-Connect and socks5 proxy tunnels.
  Standard golang http.transport exposes GetProxyConnectHeader function,
  but it doesn't allow to use separate tls config for proxy.
  It also not possible to enforce HTTP-Connect with standard http lib.
* For http scrape targets, by default http.Transport.Proxy function must
  be used. Since it has special case with full uri forward.
* Adds proxy.URL json methods that allow to properly copy internal
fields, like User/Password.
It should fix bug with proxy_url. When credentials specified at URL was
ignored.
* Adds tests for scrape client proxy requests

related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6771
2024-08-19 22:50:39 +02:00
Zhu Jiekun
8958cecad6
lib/promrelabel: stop adding default port 80/433 to address label
*  It was necessary to add default ports for fasthttp client. After migration to the std.httpclient it's no longer needed.
* An additional configuration is required at proxy servers with implicitly set 80/443 ports to the host header (such as HA proxy.

It's expected that after upgrade __address_ label may change. But it should be rare case. 80/443 ports are not widely used at monitoring ecosystem. And it shouldn't have much impact. 

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792

Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-08-19 22:50:39 +02:00
hagen1778
bd6405df01
make go vet happy
Address `non-constant format string in call` check:
https://github.com/golang/go/issues/60529

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit febba3971b)
2024-08-19 21:41:44 +02:00
Roman Khavronenko
d4240c4a3e
lib/httputils: parse URL before creating HTTP transport (#6820)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6740

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-16 11:34:49 +02:00
Hui Wang
e74d5f266e
stream aggregation: do not allow to enable -stream.keepInput and `k… (#6723)
…eep_metric_names` options in stream aggregation config together

With aggregated data and raw data under the same metric, results would
be confusing.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 62d19369a3)
2024-08-13 09:08:27 -04:00
Zhu Jiekun
49f63b2b9a
app/vmagent: fixes azure service discovery pagination
Azure API response with link to the next page was incorrectly validate. Validation used url.Host header to match configure API URL.


https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6784
2024-08-09 15:26:18 +02:00
Zakhar Bessarab
54315fbad6
lib/backup/s3remote: add retryer configuration (#6747)
### Describe Your Changes

This helps to improve reliability of performing backups in environments
with unreliable connection and tolerate temporary errors at S3 provider
side.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6732

Default retry timeout is up to 3 minutes to make this consistent with
the same configuration for GCS:
a05317f61f/lib/backup/gcsremote/gcs.go (L70-L76)

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit cb00b4b00f)
2024-08-07 16:59:23 +02:00
Roman Khavronenko
c41a9b8d17
lib/bytesutil: smooth buffer growth rate (#6761)
Before, buffer growth was always x2 of its size, which could lead to
excessive memory usage when processing big amount of data.
For example, scraping a target with hundreds of MBs in response could
result into hih memory spikes in vmagent because buffer has to double
its size to fit the response. See
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6759

The change smoothes out the growth rate, trading higher allocation rate
for lower mem usage at certain conditions.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit f28f496a9d)
2024-08-07 16:59:23 +02:00
hagen1778
9e186c0319
lib/mergeset: fix typos in comments
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 1154f90d2d)
2024-08-07 16:59:22 +02:00
Aliaksandr Valialkin
29d526e20a
lib/streamaggr: remove resetState arg from aggrState.flushState()
The resetState arg was used only for the BenchmarkAggregatorsFlushInternalSerial benchmark.
This benchmark was testing aggregate state flush performance by keeping the same state across flushes.
The benhmark didn't reflect the performance and scalability of stream aggregation in production,
while it led to non-trivial code changes related to resetState arg handling.

So let's drop the benchmark together with all the code related to resetState handling,
in order to simplify the code at lib/streamaggr a bit.

Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314
2024-08-07 11:46:49 +02:00
Aliaksandr Valialkin
1332b6f912
lib/streamaggr: consistently use the same timestamp across all the output aggregated samples in a single aggregation interval
Prevsiously every aggregation output was using its own timestamp for the output aggregated samples
in a single aggregation interval. This could result in unexpected inconsitent timesetamps for the output
aggregated samples.

This commit consistently uses the same timestamp across all the output aggregated samples.
This commit makes sure that the duration between subsequent timestamps strictly equals
the configured aggregation interval.

Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314
This commit should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580
2024-08-07 11:46:47 +02:00
Anzor
7e32daa63a
app/vmagent: read __sample_limit__ from labels (#6665) (#6666)
By introducing this feature, users will have the ability to customize
the sampleLimit parameter on a per-target basis, providing more
flexibility and control over the job execution behavior.

(cherry picked from commit 994796367b)
2024-08-07 09:57:48 +02:00
hagen1778
c99700ae15
fix typos in comments
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit f283126084)
2024-08-06 16:30:10 +02:00
Zakhar Bessarab
0b1def6e24
app/{vminsert,vmagent}: add healthcheck for influx ingestion endpoints (#6749)
### Describe Your Changes

This is useful for clients which validate InfluxDB is available before
data ingestion can be started.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6653

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 9877a5e7d5)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-05 09:45:32 +02:00
Juraj Bubniak
daa8c4970d
lib/backup/s3remote: fix typos (#6694)
Fixes a few typos in errors in lib/backup/s3remote package.

(cherry picked from commit 11c0b05e8a)
2024-07-29 14:30:21 +02:00
jackyin
f0a87abedd
lib/netutil: validate TLS cert and key files immediately (#6621)
Validate files specified via `-tlsKeyFile` and `-tlsCertFile` cmd-line flags on the process start-up. Previously, validation happened on the first connection accepted by HTTP server.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6608

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e5d279bb71)
2024-07-29 14:30:20 +02:00
Aliaksandr Valialkin
d2a825279b
Revert "refactor(vmstorage): Refactor the code to reduce the time complexity of MustAddRows and improve readability (#6629)"
This reverts commit e280d90e9a.

Reason for revert: the updated code doesn't improve the performance of table.MustAddRows for the typical case
when rows contain timestamps belonging to ptws[0].

The performance may be improved in theory for the case when all the rows belong to partiton other than ptws[0],
but this partition is automatically moved to ptws[0] by the code at lines
6aad1d43e9/lib/storage/table.go (L287-L298) ,
so the next time the typical case will work.

Also the updated code makes the code harder to follow, since it introduces an additional level of indirection
with non-trivial semantics inside table.MustAddRows - the partition.TimeRangeInPartition() function.
This function needs to be inspected and understood when reading the code at table.MustAddRows().
This function depends on minTsInRows and maxTsInRows vars, which are defined and initialized
many lines above the partition.TimeRangeInPartition() call. This complicates reading and understanding
the code even more.

The previous code was using clearer loop over rows with the clear call to partition.HasTimestamp()
for every timestamp in the row. The partition.HasTimestamp() call is used in the table.MustAddRows()
function multiple times. This makes the use of partition.HasTimestamp() call more consistent,
easier to understand and easier to maintain comparing to the mix of partition.HasTimestamp() and partition.TimeRangeInPartition()
calls.

Aslo, there is no need in documenting some hardcore software engineering refactoring at docs/CHANGLELOG.md,
since the docs/CHANGELOG.md is intended for VictoriaMetrics users, who may not know software engineering.
The docs/CHANGELOG.md must document user-visible changes, and the docs must be concise and clear for VictoriaMetrics users.
See https://docs.victoriametrics.com/contributing/#pull-request-checklist for more details.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6629
2024-07-25 14:43:00 +02:00
Ruixiang Tan
8e2ff15203
refactor(vmstorage): Refactor the code to reduce the time complexity of MustAddRows and improve readability (#6629)
### Describe Your Changes
The original logic is not only highly complex but also poorly readable,
so it can be modified to increase readability and reduce time
complexity.


---------

Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
2024-07-25 13:52:54 +02:00
Aliaksandr Valialkin
9b529c2742
lib/backup/azremote: follow-up for 5fd3aef549
- Mention that credentials can be configured via env variables at both vmbackup and vmrestore docs.

- Make clear that the AZURE_STORAGE_DOMAIN env var is optional at https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables

- Use string literals as is for env variable names instead of indirecting them via string constants.
  This makes easier to read and understand the code. These environment variable names aren't going to change
  in the future, so there is no sense in hiding them under string constants with some other names.

- Refer to https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables in error messages
  when auth creds are improperly configured. This should simplify figuring out how to fix the error.

- Simplify the code a bit at FS.newClient(), so it is easier to follow it now.
  While at it, remove the check when superflouos environment variables are set, since it is too fragile
  and it looks like it doesn't help properly configuring vmbackup / vmrestore.

- Remove envLookuper indirection - just use 'func(name string) (string, bool)' type inline.
  This simplifies code reading and understanding.

- Split TestFSInit() into TestFSInit_Failure() and TestFSInit_Success(). This simplifies the test code,
  so it should be easier to maintain in the future.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6518
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984
2024-07-17 17:55:39 +02:00
Aliaksandr Valialkin
97d696ae8b
all: substitute double "the the" with "the"
This is a follow-up for 8786a08d27

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6600
2024-07-17 14:29:05 +02:00
Aliaksandr Valialkin
f8aa445945
all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter
The %q formatter may result in incorrectly formatted JSON string if the original string
contains special chars such as \x1b . They must be encoded as \u001b , otherwise the resulting JSON string
cannot be parsed by JSON parsers.

This is a follow-up for c0caa69939

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
2024-07-17 14:01:37 +02:00
Aliaksandr Valialkin
f2362812c3
lib/protoparser/graphite: use Regex.ReplaceAllLiteralString instead of Regex.ReplaceAllString for the case when the replacement cannot contain placeholders for capturing groups
This is a follow-up for 74affa3aec
2024-07-17 13:01:35 +02:00
Aliaksandr Valialkin
f7789b61e7
lib/protoparser/graphite: follow-up for 476faf5578
- Clarify the description of -graphite.sanitizeMetricName command-line flag at README.md
- Do not sanitize tag values - only metric names and tag names must be sanitized,
  since they are treated specially by Grafana. Grafana doesn't apply any restrictions on tag values.
- Properly replace more than two consecutive dots with a single dot.
- Disallow unicode letters in metric names and tag names, since neither Prometheus nor Grafana
  do not support them.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6489
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6077
2024-07-17 12:57:56 +02:00
Aliaksandr Valialkin
54d3abc092
lib: consistently use regexp.Regexp.ReplaceAllLiteralString instead of regexp.Regexp.ReplaceAllString in places where the replacement cannot contain matching group placeholders 2024-07-17 12:57:43 +02:00
rtm0
1b03d7e6de
Fix inconsistent error handling in Storage.AddRows() (#6583)
`Storage.AddRows()` returns an error only in one case: when
`Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But
the same error is treated as a warning when it happens inside
`Storage.add()` or returned by `Storage.prefillNextIndexDB()`.

This commit fixes this inconsistency by treating the error returned by
`Storage.updatePerDateData()` as a warning as well. As a result
`Storage.add()` does not need a return value anymore and so doesn't
`Storage.AddRows()`.

Additionally, this commit adds a unit test that checks all cases that
result in a row not being added to the storage.

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-07-17 12:55:07 +02:00
Aliaksandr Valialkin
5168d0f754
lib/promrelabel: add test for IfExpression.String() function
While at it, simplify this function a bit after the commit 861852f262

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
2024-07-16 18:32:34 +02:00
Aliaksandr Valialkin
617a7b4db6
lib/promscrape/discovery/yandexcloud: follow-up for 070abe5c71
- Obtain IAM token via GCE-like API instead of Amazon EC2 IMDSv2 API,
  since it looks like IMDBSv2 API isn't supported by Yandex Cloud
  according to https://yandex.cloud/en/docs/security/standard/authentication#aws-token :

  > So far, Yandex Cloud does not support version 2, so it is strongly recommended
  > to technically disable getting a service account token via the Amazon EC2 metadata service.

- Try obtaining IAM token via GCE-like API at first and then fall back to the deprecated Amazon EC2 IMDBSv1.
  This should prevent from auth errors for instances with disabled GCE-like auth API.
  This addresses @ITD27M01 concern at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513#issuecomment-1867794884

- Make more clear the description of the change at docs/CHANGELOG.md , add reference to the related issue.

P.S. This change wasn't tested in prod because I have no access to Yandex Cloud.
It is recommended to test this change by @ITD27M01 and @vmazgo , who filed
the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6524
2024-07-16 18:06:33 +02:00
Aliaksandr Valialkin
6d237da3f3
lib/promscrape: follow-up for 1e83598be3
- Clarify that the -promscrape.maxScrapeSize value is used for limiting the maximum
  scrape size if max_scrape_size option isn't set at https://docs.victoriametrics.com/sd_configs/#scrape_configs

- Fix query example for scrape_response_size_bytes metric at https://docs.victoriametrics.com/vmagent/#automatically-generated-metrics

- Mention about max_scrape_size option at the -help description for -promscrape.maxScrapeSize command-line flag

- Treat zero value for max_scrape_size option as 'no scrape size limit'

- Change float64 to int type for scrapeResponseSize struct fields and function args, since response size cannot be fractional

- Optimize isAutoMetric() function a bit

- Sort auto metrics in alphabetical order in isAutoMetric() and in scrapeWork.addAutoMetrics() functions
  for better maintainability in the future

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6434
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429
2024-07-16 12:38:41 +02:00
Aliaksandr Valialkin
893a555051
Revert "lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451)"
This reverts commit cd1aca217c.

Reason for revert: this commit has no sense, since the firehose response has application/json content-type,
so it must contain JSON-encoded timestamp and requestId fields according to https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#responseformat .
HTML-escaping the requestId field may break the response, so the client couldn't correctly recognize the html-escaped requestId.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6451
2024-07-16 09:50:16 +02:00
Aliaksandr Valialkin
8b76a40715
lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag()
This is a follow-up for 61dce6f2a1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329
2024-07-16 01:08:41 +02:00
Aliaksandr Valialkin
0cf18a6f63
lib/uint64set: optimize Set.Has() for nil Set - it should be inlined now
This makes unnecessary the checkDeleted variable at lib/storage/index_db.go

This is a follow-up for b984f4672e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6342
2024-07-16 00:00:46 +02:00
Aliaksandr Valialkin
cbbc6e7141
lib/mergeset: properly update TableMetrics.TooLongItemsDroppedTotal inside Table.UpdateMetrics
Substitute '+=' with '=', since tooLongItemsTotal is global counter, which doesn't belong to the Table struct.

This is a follow-up for 69d244e6fb
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6297
2024-07-15 23:41:26 +02:00
Aliaksandr Valialkin
476bf400ac
lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc
- Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call
- NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil
- Simplify the implementation of NewStatDialFunc by removing sync.Map from there.
- Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils
- Use gauge instead of counter type for *_conns metric

This is a follow-up for d7b5062917
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299
2024-07-15 23:05:46 +02:00
Aliaksandr Valialkin
f4dce57ebe
lib/streamaggr/streamaggr.go: typo fix after 5e29ef5ed5: IgnoredNaNSamples -> ignoredNaNSamples 2024-07-15 21:59:03 +02:00
Aliaksandr Valialkin
cbc637d1dd
app/vmagent/remotewrite: follow-up for f153f54d11
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
  This improves code maintainability a bit.

- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
  This prevents from potential resource leaks.

- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
  This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.

- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
  in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.

- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
  This label should simplify investigation of the exposed metrics.

- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
  it is hard to use these labels in query filters and aggregation functions.

- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
  This metric shows the number of samples generated by the corresponding streaming aggregation rule.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
  which could modify the behaviour of the constructed streaming aggregator.
  Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.

- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
  This also allows passing nil instead of Options when default options are enough.

- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
  so they have consistent set of labels comparing to the rest of streaming aggregation metrics.

- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
  `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
  `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
  metrics. This is a follow-up for the commit 2eb1bc4f81 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604

- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
  figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
  in the commit 2eb1bc4f81 .

- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
  While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
2024-07-15 20:25:36 +02:00
Aliaksandr Valialkin
a8356f3a26
vendor: update github.com/VictoriaMetrics/metrics from v1.34.1 to v1.35.0
Fix potential memory leaks across VictoriaMetrics codebase after metrics.UnregisterSet(s) call
because of missing s.UnregisterAllMetrics() call.

This is a follow-up for 6a6e34ab8e . It is OK if some vmauth metrics
aren't visible for a few microseconds when the previous metrics are unregistered and new metrics
weren't registered yet.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6252
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5805
2024-07-15 10:45:39 +02:00
Aliaksandr Valialkin
4878152678
lib/{storage,mergeset}: do not allow setting dataFlushInterval to values smaller than pending{Items,Rows}FlushInterval
Pending rows and items unconditionally remain in memory for up to pending{Items,Rows}FlushInterval,
so there is no any sense in setting dataFlushInterval (the interval for guaranteed flush of in-memory data to disk)
to values smaller than pending{Items,Rows}FlushInterval, since this doesn't affect the interval
for flushing pending rows and items from memory to disk.

This is a follow-up for 4c80b17027

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6221
2024-07-15 10:11:23 +02:00
Aliaksandr Valialkin
c3d2351948
lib/streamaggr: consistently use alphabetical order of benchmarked stream aggregation outputs 2024-07-15 09:53:26 +02:00
Aliaksandr Valialkin
21f049e211
lib/streamaggr: follow-up for 9c3d44c8c9
- Consistently enumerate stream aggregation outputs in alphabetical order across the source code and docs.
  This should simplify future maintenance of the corresponding code and docs.

- Fix the link to `rate_sum()` at `see also` section of `rate_avg()` docs.

- Make more clear the docs for `rate_sum()` and `rate_avg()` outputs.

- Encapsulate output metric suffix inside rateAggrState. This eliminates possible bugs related
  to incorrect suffix passing to newRateAggrState().

- Rename rateAggrState.total field to less misleading rateAggrState.increase name, since it calculates
  counter increase in the current aggregation window.

- Set rateLastValueState.prevTimestamp on the first sample in time series instead of the second sample.
  This makes more clear the code logic.

- Move the code for removing outdated entries at rateAggrState into removeOldEntries() function.
  This make the code logic inside rateAggrState.flushState() more clear.

- Do not write output sample with zero value if there are no input series, which could be used
  for calculating the rate, e.g. if only a single sample is registered for every input series.

- Do not take into account input series with a single registered sample when calculating rate_avg(),
  since this leads to incorrect results.

- Move {rate,total}AggrState.flushState() function to the end of rate.go and total.go files, so they look more similar.
  This shuld simplify future mantenance.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6243
2024-07-15 08:44:48 +02:00
Aliaksandr Valialkin
bc1f92d7f5
app/vmagent/remotewrite: follow-up for 87fd400dfc
- Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage
  systems are configured with the disabled on-disk queue, every in-memory queue is full
  and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common,
  so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx
  relabeling and other processing in this case.

- Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped().
  Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set.
  In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full
  and on-disk queue is disabled.
  The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing
  the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage
  processes pending data, so it drops aggregated samples in this case.

- Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output,
  so it is clear that this flag can be set individually per each -remoteWrite.url.

- Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems
  are configured with the disabled on-disk queue, then there is no sense in keeping samples
  on some of these systems, while dropping samples on the remaining systems, since this
  will result in global stall on the remote storage system with the disabled on-disk queue
  and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false
  from remotewrite.TryPush() in this case. This will result in infinite duplicate samples
  written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload
  is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
  This allows proceeding with newly scraped / pushed samples by sending them to the remaining
  remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set.

- Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test.

- Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url.
  See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
2024-07-13 02:30:10 +02:00
Aliaksandr Valialkin
43fc1183b9
app/vmalert: switch from table-driven tests to f-tests
This makes test code more clear and reduces the number of code lines by 500.
This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e

While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error*
requires more boilerplate code, which can result in additional bugs inside tests.
While t.Error* allows writing logging errors for the same, this doesn't simplify fixing
broken tests most of the time.

This is a follow-up for a9525da8a4
2024-07-12 22:45:50 +02:00
hagen1778
c835a6351e
lib/streamaggr: add missing test cases
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 2f65956259)
2024-07-12 14:19:17 +02:00
Hui Wang
f3cbd62823
vmagent: fix vm_streamaggr_flushed_samples_total counter (#6604)
We use `vm_streamaggr_flushed_samples_total` to show the number of
produced samples by aggregation rule, previously it was overcounted, and
doesn't account for `output_relabel_configs`.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 2eb1bc4f81)
2024-07-12 14:19:17 +02:00
hagen1778
7370f84b97
lib/bakcup/azremote: follow-up after 5fd3aef549
Simplify tests by converting them to f-tests.

5fd3aef549
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 03e4c5c19c)
2024-07-10 15:17:06 +02:00
justinrush
e65e55e2dd
lib/backup: add support for Azure Managed Identity (#6518)
### Describe Your Changes

These changes support using Azure Managed Identity for the `vmbackup`
utility. It adds two new environment variables:

* `AZURE_USE_DEFAULT_CREDENTIAL`: Instructs the `vmbackup` utility to
build a connection using the [Azure Default
Credential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.5.2#NewDefaultAzureCredential)
mode. This causes the Azure SDK to check for a variety of environment
variables to try and make a connection. By default, it tries to use
managed identity if that is set up.

This will close
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

### Testing

However you normally test the `vmbackup` utility using Azure Blob should
continue to work without any changes. The set up for that is environment
specific and not listed out here.

Once regression testing has been done you can set up [Azure Managed
Identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview)
so your resource (AKS, VM, etc), can use that credential method. Once it
is set up, update your environment variables according to the updated
documentation.

I added unit tests to the `FS.Init` function, then made my changes, then
updated the unit tests to capture the new branches.

I tested this in our environment, but with SAS token auth and managed
identity and it works as expected.

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Justin Rush <jarush@epic.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 5fd3aef549)
2024-07-10 12:26:21 +02:00
Aliaksandr Valialkin
a1decb5ca1
app/vlinsert/loki: use easyproto instead for parsing Loki protobuf messages 2024-07-10 03:05:55 +02:00
Aliaksandr Valialkin
b8a8d3d6f1
lib/logstorage: drop all the pipes from the query when calculating the number of matching logs at /select/logsql/hits API 2024-07-10 00:39:16 +02:00
Aliaksandr Valialkin
d6415b2572
all: consistently use 'any' instead of 'interface{}'
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
2024-07-10 00:23:26 +02:00
Aliaksandr Valialkin
9edeecabc8
lib: consistently use f-tests instead of table-driven tests
This makes easier to read and debug these tests. This also reduces test lines count by 15% from 3K to 2.5K .
See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e .

While at it, consistently use t.Fatal* instead of t.Error*, since t.Error* usually leads
to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.
2024-07-09 22:39:13 +02:00
Aliaksandr Valialkin
f3be3573e7
lib/promscrape/discovery/vultr: follow-up after 17e3d019d2
- Sort the discovered labels in alphabetical order at https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs
- Rename VultrConfigs to VultrSDConfigs to be consistent with the naming for other SD configs.
- Prepare query arg filters for `list instances API` at newAPIConfig() instead of passing them in a separate listParams struct.
  This simplifies the code a bit.
- Return error when bearer token isn't set at vultr_sd_configs, since this token is mandatory
  according to https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs
- Remove unused fields from the parsed response from Vultr list instances API in order to simplify the code a bit.
- Remove double logging of errors inside getInstances() function, since these errors must be already logged by the caller.
- Simplify tests, so they are easier to maintain.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6068
2024-07-05 17:40:39 +02:00
Aliaksandr Valialkin
6397c38a0a
lib/logstorage: use quicktemplate.AppendJSONString instead of strconv.AppendQuote for encoding JSON strings
The strconv.AppendQuote improperly encodes special chars such as \x1b . They must be encoded as \u001b .

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
2024-07-05 01:22:49 +02:00
Aliaksandr Valialkin
172ae1adf7
Revert c6c5a5a186 and b2765c45d0
Reason for revert:

There are many statsd servers exist:

- https://github.com/statsd/statsd - classical statsd server
- https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DatDog Agent ( https://docs.datadoghq.com/agent/ )
- https://github.com/avito-tech/bioyino - high-performance statsd server
- https://github.com/atlassian/gostatsd - statsd server in Go
- https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics

These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics
according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd (
the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target
according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ).

Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides
significant advantages over the existing statsd servers, while has no significant drawbacks comparing
to existing statsd servers.

The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server.
The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics (
see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers.
So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating
from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent).

Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server
with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation.
In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation.
This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated
statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed
during querying.

P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic
specialized configuration for aggregating of statsd metrics. The main requirements for this configuration:

- easy to write, read and update (ideally it should work out of the box for most cases without additional configuration)
- hard to misconfigure (e.g. hard to shoot yourself in the foot)

It would be great if this configuration will be compatible with the configuration of the most widely used statsd server.

In the mean time it is recommended continue using external statsd server.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600
2024-07-03 23:57:49 +02:00
Aliaksandr Valialkin
7a60e8abf7
lib/promscrape: use prompbmarshal.MustParsePromMetrics function at parseData() test function
The prompbmarshal.MustParsePromMetrics function has been added in the commit cc4d57d650
2024-07-03 16:10:37 +02:00
Aliaksandr Valialkin
cd152693c6
Revert "Exemplar support (#5982)"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite
that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains histogram_quantile() function. It queries exemplars via special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
  exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is litterally filled with thousands of exemplar dots.
  Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labes on the time range with the anomaly on metrics' graph.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982
2024-07-03 16:09:18 +02:00
Aliaksandr Valialkin
a5d60ad78e
app/vmagent/remotewrite,lib/streamaggr: re-use common code in tests after 879771808b
- Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package.
  This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries.

- Move common code for mustParsePromMetrics() function into lib/prompbmarshal package,
  so it could be used in tests for building []prompbmarshal.TimeSeries from string.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206
2024-07-03 15:22:51 +02:00
Aliaksandr Valialkin
f8779d1ed2
lib/streamaggr: follow-up for the commit c0e4ccb7b5
- Clarify docs for `Ignore aggregation intervals on start` feature.

- Make more clear the code dealing with ignoreFirstIntervals at aggregator.runFlusher() functions.
  It is better from readability and maintainability PoV using distinct a.flush() calls
  for distinct cases instead of merging them into a single a.flush() call.

- Take into account the first incomplete interval when tracking the number of skipped aggregation intervals,
  since this behaviour is easier to understand by the end users.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6137
2024-07-02 21:34:48 +02:00
Andrii Chubatiuk
252aa5a3ab
lib/protoparser/graphite: added -graphite.sanitizeMetricName flag (#6489)
### Describe Your Changes

Added flag to sanitize graphite metrics
fixes #6077

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 476faf5578)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-02 17:16:00 +02:00
Aliaksandr Valialkin
7426b40250
lib/logstorage: allow writing after N in front of before N at stream_context pipe 2024-07-02 01:39:45 +02:00