Commit graph

2787 commits

Author SHA1 Message Date
Andrii Chubatiuk
35be17ab30 cluster: obtain tenant information from headers 2024-11-29 15:07:11 +02:00
Zhu Jiekun
44d856922a
lib/promscrape/discovery: properly apply the resource_group filter for Azure service discovery
Previously, this filter did not apply to virtual
machine scale sets, causing all virtual machines to be discovered.

 This commit conditionally adds `resource_group` filter for Azure service discovery on virtual
machine scale sets. 

 Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7630.
2024-11-26 19:08:31 +01:00
f41gh7
ed9ab2ea73
refactoring: changed prompb to prompbmarshal everythere where internal series transformations are happening (#7409)
doing similar changes for both vmagent and vminsert (like one in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7399) ends up
with almost same implementations for each of packages instead of having
this shared code in one place. one of the reasons is the same Timeseries
and Labels structure from different prompb and prompbmarshal packages.
My proposal is to use structures from prompb package only to
marshal/unmarshal sent/received data, but for internal transformations
use only structures from prompbmarshal package

Another example, where it already can help to simplify code is streaming
aggregation pipeline for vmsingle (now it first marshals
prompb.Timeseries to storage.MetricRow and then if streaming aggregation
or deduplication is enabled it unmarshals all the series back but to
prompbmarshal.Timeseries)

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-11-26 19:02:06 +01:00
Andrei Baidarov
728ceefca1
vmagent: set up a timeout for tcp connection establishment during k8s discovery
Previously, default dial timeout was used for kubernetes API server connection.

 This commit changes it for custom dialer used by the all VictoriaMetrics components. It has lower connection timeout (30s by default). 


 Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7127

---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-11-25 18:08:32 +01:00
Ivan Yurochko
60ac4e0c00
lib/streamaggr: add ignore_first_sample_interval param for streamaggr cfg (#7313)
### Describe Your Changes

As of right now by default aggregated output in streaming aggregation
takes a staleness interval and only starts sending first samples after
the staleness interval passes. We have a use case where we prefer to
start sending data as soon as we have any. This adds the option to
configure when we start sending first samples

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7116

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-11-22 17:38:13 +01:00
Artem Fetishev
3ddaafa729
lib/storage: confirm that changing retention period can cause previous indexDB deletion (#7569)
### Describe Your Changes

Add test cases proving that it is possible to lose indexDB after
changing the retention period. See #7609

### Checklist

The following checks are **mandatory**:

- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit 3383589fd1)
2024-11-21 15:23:48 +01:00
Will Jordan
209a5024ce
lib/tenantmetrics: improves CounterMap performance with large numbers of tenants
Previously, map for storing tenant metrics was re-created to each newly ingested tenant. It has significant performance impact for systems with large number of tenants. 

 This commit addresses this issue by changing algorithm of creating tenant metric records at map. Instead of map re-creation, it uses `sync.Map` primitive. 


Benchmark results:

```
goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics
cpu: AMD Ryzen 9 5900X 12-Core Processor            
                                            │ lib/tenantmetrics/orig.bench │     lib/tenantmetrics/new.bench     │
                                            │            sec/op            │    sec/op     vs base               │
CounterMapGrowth/n=100,nProcs=GOMAXPROCS-24                  1943.2µ ±  5%   248.0µ ± 11%  -87.24% (p=0.001 n=7)
CounterMapGrowth/n=100-24                                    434.63µ ±  5%   98.82µ ± 16%  -77.26% (p=0.001 n=7)
CounterMapGrowth/n=1000-24                                   32.719m ± 20%   1.425m ±  5%  -95.65% (p=0.001 n=7)
CounterMapGrowth/n=10000-24                                 3653.60m ±  5%   18.00m ±  2%  -99.51% (p=0.001 n=7)
geomean                                                       17.83m         890.4µ        -95.00%
```

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7482

---
Co-authored-by: Artem Fetishev <rtm@victoriametrics.com>
2024-11-20 18:36:06 +01:00
Nikolay
59d2b4c7fc
lib/storage: properly check for minMissingTimestamps
After changes at commit 787b9cd. Minimal timestamps for extDB check was performed without context of the index search prefix.
It worked fine for Single node version, but for cluster version a different prefix was used for
metricID search requests. It may lead to incomplete results, if minimal missing timestamp was cached
for the tenant with different ingestion patterns.

 Minimal reproducible case is:
- metrics were ingested for tenants 0 and 1
- at some point in time metrics ingestion for tenant 1 stopped
- index records have the following timestamps layout:
 tenant 0:  1,2,3,4,5,6
 tenant 1:  1,2,3,4
- after indexDB rotation, containsTimeRange lookups may produce
  incorrect results:
 time range request for tenant 1 - 5:6 caches 5 as min timestamp
 request for the same or smaller time range for tenant 0 now returns
empty results.

Second case:
- requests for the tenant without metrics always updates atomic value with incorrect minimal time range for other tenants.

 This commit replaces single atomic with map of search prefix keys. It should have slight performance overhead,
but work consistently for cluster version. minMissingTimestamp is cached by prefix search key, which included tenantID.

 Since it will be only populated at runtime, it doesn't hold unused tenants for queries.

Related issue: 
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7417
2024-11-15 16:18:32 +01:00
Aliaksandr Valialkin
75e4a8e64b
lib/logstorage: properly skip filtered out dict values when calculating uniq_values, min, max, row_min and row_max stats functions
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7458
2024-11-14 17:21:28 +01:00
Aliaksandr Valialkin
8b287e8da4
lib/logstorage: properly clone field values at values stats function
Previously field values weren't properly cloned, which could lead to garbage output for `values` stats function

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7458
2024-11-14 17:21:28 +01:00
Aliaksandr Valialkin
a7e1575ebe
lib/logstorage: simplify the code for uniq_values stats function a bit
Move the repeated check for an empty value into statsUniqValuesProcessor.updateState() function.
This allow removing duplicate code for this check from statsUniqValuesProcessor.updateState() call sites.
2024-11-14 17:21:27 +01:00
Aliaksandr Valialkin
5b0b7d509f
lib/logstorage: support for [label1=value1 ... labelN=valueN] syntax inside syslog messages for adding arbitrary labels (fields) to log entries 2024-11-14 17:21:26 +01:00
Aliaksandr Valialkin
a02d26e853
lib/logstorage: properly take into account the end query arg when calculating time range for _time:duration filters
(cherry picked from commit e5537bc64d)
2024-11-08 17:07:57 +01:00
Aliaksandr Valialkin
f82cfa16bf
lib/logstorage: allow specifying _time filter offset without time range
This is useful when builiding graphs on time ranges in the past.

(cherry picked from commit a98fb495c6)
2024-11-08 17:07:57 +01:00
Aliaksandr Valialkin
a4ea3b87d7
lib/logstorage: optimize query imeediately after its parsing
This eliminates possible bugs related to forgotten Query.Optimize() calls.

This also allows removing optimize() function from pipe interface.

While at it, drop filterNoop inside filterAnd.

(cherry picked from commit 66b2987f49)
2024-11-08 17:07:56 +01:00
Aliaksandr Valialkin
52929c060a
app/vlselect/logsql: call Query.Optimize() inside parseCommonArgs(), which is called et every /select/logsql/* endpoint.
This reduces the probability of forgotten call to Query.Optimize().

(cherry picked from commit 0550093802)
2024-11-08 17:07:56 +01:00
Aliaksandr Valialkin
7d078dd591
lib/logstorage: add an ability to add prefix to resulting query field names in join pipe
See https://docs.victoriametrics.com/victorialogs/logsql/#join-pipe

(cherry picked from commit 5a6531b329)
2024-11-08 17:07:56 +01:00
Aliaksandr Valialkin
364a2e3e1f
docs/VictoriaLogs: properly sort log fields with floating-point numbers
(cherry picked from commit 42c9183281)
2024-11-07 13:00:20 +01:00
Aliaksandr Valialkin
7a39f526ec
lib/logstorage: add block_stats pipe for analyzing per-block storage stats
(cherry picked from commit 5ed54ebadf)
2024-11-07 13:00:19 +01:00
Aliaksandr Valialkin
83c9d42263
lib/logstorage: add join pipe for joining multiple query results
(cherry picked from commit f9e23bf8e3)
2024-11-07 13:00:19 +01:00
Zakhar Bessarab
718f8077a8
Revert "lib/mergeset: add sparse indexdb cache (#7269)"
This reverts commit 837d0d136d.
2024-11-04 10:33:22 -03:00
Aliaksandr Valialkin
fced48d540
app/vlinsert: implement the ability to add extra fields to the ingested logs
This can be done via extra_fields query arg or via VL-Extra-Fields HTTP header.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7354#issuecomment-2448671445

(cherry picked from commit 4478e48eb6)
2024-11-04 10:23:16 -03:00
Aliaksandr Valialkin
bcbaecd73f
lib/logstorage: increase the the maximum number of columns per block from 1000 to 2000
This will allow storing wide events with up to 2K fields per event into VictoriaLogs.
While at it, remove the misleading comment that columnsHeader is read in full per each matching block.
This is no longer the case after the improvements made at 202eb429a7 .
Now only the needed columnHeader is read for the column mentioned in the query.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6425#issuecomment-2418337124
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762

(cherry picked from commit 9ba6be4179)
2024-11-04 10:23:15 -03:00
Nikolay
b6e7852eee
lib/promscrape: add relabel configs to global section
This commit adds `metric_relabel_configs` and `relabel_configs` fields
into the `global` section of scrape configuration file.

 New fields are used as global relabeling rules for the scrape targets.

 These relabel configs are prepended to the target relabel configs.
This feature is useful to:
* apply global rules to __meta labels from service discovery targets.
* drop noisy labels during scrapping.
* mutate labels without affecting metrics ingested via any of push
protocols. 

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6966

---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-10-31 20:08:31 +01:00
Aliaksandr Valialkin
1ea65d662f
lib/logstorage: properly reset cached output fields for extract and extract_regexp pipes after the log entry matches if(...) condition
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7162

(cherry picked from commit c5d08d317c)
2024-10-31 14:11:08 +01:00
Aliaksandr Valialkin
b74bcb7886
lib/logstorage: properly cache replace() and replace_regexp() results for identical adjacent field values
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7162

(cherry picked from commit 2e635a42d8)
2024-10-31 14:11:08 +01:00
Aliaksandr Valialkin
0c657a95dc
app/vlselect: add support for extra_filters and extra_stream_filters query args across all the HTTP querying APIs
These query args are going to be used for quick filtering on field values at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7365

(cherry picked from commit 7603446850)
2024-10-31 14:11:07 +01:00
Artem Fetishev
d7b3589dbd
tests: Initial version of integration tests (#7253)
### Describe Your Changes

Related issue: #7199 

This is the initial version of the integration tests for cluster. See
`README.md` for details.

Currently cluster only, but it can also be used for vm-single if needed.

The code has been added to the apptest package that resides in the root
directory of the VM codebase. This is done to exclude the integration
tests from regular testing build targets because:

- Most of the test variants do not apply to integration testing (such as
pure or race).
- The integtation tests may also be slow because each test must wait for
2 seconds so vmstorage flushes pending content). It may be okay when
there are a few tests but when there is a 100 of them running tests will
require much more time which will affect the developer wait time and CI
workflows.
- Finally, the integration tests may be flaky especially short term.

An alternative approach would be placing apptest under app package and
exclude apptest from packages under test, but that is not trivial.

The integration tests rely on retrieving some application runtime info
from the application logs, namely the application's host:port. Therefore
some changes to lib/httpserver/httpserver.go were necessary, such as
reporting the effective host:port instead the one from the flag.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2024-10-30 15:22:06 +01:00
Aliaksandr Valialkin
8baa5177aa
app/vlinsert: allow specifying comma-separated list of fields containing log message via _msg_field query arg and VL-Msg-Field HTTP request header
This msy be useful when ingesting logs from different sources, which store the log message in different fields.
For example, `_msg_field=message,event.data,some_field` will get log message from the first non-empty field:
`message`, `event.data` and `some_field`.

(cherry picked from commit ed73f8350b)
2024-10-30 15:19:52 +01:00
Aliaksandr Valialkin
bf243df9ce
lib/logstorage: make sure that the number of output (bloom, values) shards is bigger than zero.
If the number of output (bloom, values) shards is zero, then this may lead to panic
as shown at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7391 .

This panic may happen when parts with only constant fields with distinct values are merged into
output part with non-constant fields, which should be written to (bloom, values) shards.

(cherry picked from commit 102e9d4f4e)
2024-10-30 15:19:51 +01:00
cangqiaoyuzhuo
07cf3189f8
chore: fix function name (#7381)
### Describe Your Changes

 fix function name

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit 45896fb477)
2024-10-30 13:13:05 +01:00
Aliaksandr Valialkin
1dd01b8a8f
lib/logstorage: follow-up for af831a6c906158f371f1b6810706fa0a54b78386
Sync the code between top and sort pipes regarding the code related to rank.

(cherry picked from commit 7a623c225f)
2024-10-30 09:52:52 +01:00
Aliaksandr Valialkin
329d9a46ee
lib/logstorage: add an ability to return rank from top pipe results
(cherry picked from commit 3c06d083ea)
2024-10-30 09:52:51 +01:00
Aliaksandr Valialkin
fe5f16b662
lib/logstorage: dynamically adjust the number of (bloom, values) shards in a part depending on the number of non-const columns
This allows reducing the amounts of data, which must be read during queries over logs with big number of fields (aka "wide events").
This, in turn, improves query performance when the data, which needs to be scanned during the query, doesn't fit OS page cache.

(cherry picked from commit 7a62eefa34)
2024-10-30 09:52:51 +01:00
Aliaksandr Valialkin
76b21c8560
lib/logstorage: avoid reading columnsHeader data when field_values pipe is applied directly to log filters
This improves performance of `field_values` pipe when it is applied to large number of data blocks.
This also improves performance of /select/logsql/field_values HTTP API.

(cherry picked from commit 8d968acd0a)
2024-10-30 09:52:50 +01:00
Hui Wang
9616814728
vmalert: integrate with victorialogs (#7255)
address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706.
See
https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md.

Related fix
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7254.

Note: in this pull request, vmalert doesn't support
[backfilling](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md#rules-backfilling)
for rules with a customized time filter. It might be added in the
future, see [this
issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7289)
for details.

Feature can be tested with image
`victoriametrics/vmalert:heads-vmalert-support-vlog-ds-0-g420629c-scratch`.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 68bad22fd2)
2024-10-29 16:32:00 +01:00
Zakhar Bessarab
517bd9392c
lib/storage/partition: prevent panic in case resulting in-memory part is empty after merge (#7329)
It is possible for in-memory part to be empty if ingested samples are
removed by retention filters. In this case, data will not be discarded
due to retention before creating in memory part. After in-memory parts
merge samples will be removed resulting in creating completely empty
part at destination.

 This commit checks for resulting part and skips it, if it's empty.

---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-10-27 20:42:42 +01:00
Zhu Jiekun
3d55605ae5
lib/promscrape: adds support for PuppetDB service discovery
This commit adds support for
[PuppetDB](https://www.puppet.com/docs/puppetdb/8/overview.html) service
discovery to the `vmagent` and `victoria-metrics-single` components.

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5744
2024-10-27 20:42:42 +01:00
Andrii Chubatiuk
f6f4884ba6
lib/promscrape/discovery/kubernetes: support kubernetes native sidecars (#7324)
This commit adds Kubernetes Native Sidecar support. 

It's the special type of init containers, that have restartPolicy == "Always" and continue to run after container initialization. 


related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7287
2024-10-27 20:25:15 +01:00
Zakhar Bessarab
8198e7241d
lib/mergeset: add sparse indexdb cache (#7269)
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182

- add a separate index cache for searches which might read through large
amounts of random entries. Primary use-case for this is retention and
downsampling filters, when applying filters background merge needs to
fetch large amount of random entries which pollutes an index cache.
Using different caches allows to reduce effect on memory usage and cache
efficiency of the main cache while still having high cache hit rate. A
separate cache size is 5% of allowed memory.

- reduce size of indexdb/dataBlocks cache in order to free memory for
new sparse cache. Reduced size by 5% and moved this to a separate cache.

- add a separate metricName search which does not cache metric names -
this is needed in order to allow disabling metric name caching when
applying downsampling/retention filters. Applying filters during
background merge accesses random entries, this fills up cache and does
not provide an actual improvement due to random access nature.

Merge performance and memory usage stats before and after the change:

- before

![image](https://github.com/user-attachments/assets/485fffbb-c225-47ae-b5c5-bc8a7c57b36e)

- after

![image](https://github.com/user-attachments/assets/f4ba3440-7c1c-4ec1-bc54-4d2ab431eef5)

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit 837d0d136d)
2024-10-24 12:43:06 -03:00
Artem Fetishev
a722ddbdd9
lib/storage: Fix flaky test: TestStorageRotateIndexDB (#7267)
This commit fixes the TestStorageRotateIndexDB flaky test reported at:
#6977. Sample test failure: https://pastebin.com/bTSs8HP1

The test fails because one goroutine adds items to the indexDB table
while another goroutine is closing that table. This may happen if
indexDB rotation happens twice during one Storage.add() operation:
-  Storage.add() takes the current indexDB and adds index recods to it
- First index db rotation makes the current index DB a previous one
(still ok at this point)
- Second index db rotation removes the indexDB that was current two
rotations earlier. It does this by setting the mustDrop flag to true and
decrementing the ref counter. The ref counter reaches zero which cases
the underlying indexdb table to release its resources gracefully.
Graceful release assumes that the table is not written anymore. But
Storage.add() still adds items to it.

The solution is to increment the indexDB ref counters while it is used
inside add().
The unit test has been changed a little so that the test fails reliably.
The idea is to make add() function invocation to last much longer,
therefore the test inserts not just one record at a time but thouthands
of them.

To see the test fail, just replace the idbsLocked() func with:

```go
unc (s *Storage) idbsLocked2() (*indexDB, *indexDB, func()) {
       return s.idbCurr.Load(), s.idbNext.Load(), func() {}
}
```

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>

(cherry picked from commit 6b9f57e5f7)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-24 15:45:54 +02:00
Zhu Jiekun
85f60237e2
vmstorage: auto calculate maxUniqueTimeseries based on resources (#6961)
### Describe Your Changes

Add support for
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6930

Calculate `-search.maxUniqueTimeseries` by
`-search.maxConcurrentRequests` and remaining memory if it's **not set**
or **less equal than 0**.

The remaining memory is affected by `-memory.allowedPercent`,
`-memory.allowedBytes` and cgroup memory limit.
### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-10-18 13:41:43 +02:00
Andrii Chubatiuk
1d352b92c7
lib/promscrape: fixed reload on max_scrape_size change (#7282)
### Describe Your Changes

fixed reload on max_scrape_size change
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7260

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 965a33c893)
2024-10-18 11:42:47 +02:00
Aliaksandr Valialkin
62e4baf556
lib/logstorage: use simpler in-memory cache instead of workingsetcache for caching recently ingested _stream values and recently queried set of streams
These caches aren't expected to grow big, so it is OK to use the most simplest cache based on sync.Map.
The benefit of this cache compared to workingsetcache is better scalability on systems with many CPU cores,
since it doesn't use mutexes at fast path.
An additional benefit is lower memory usage on average, since the size of in-memory cache equals
working set for the last 3 minutes.

The downside is that there is no upper bound for the cache size, so it may grow big during workload spikes.
But this is very unlikely for typical workloads.

(cherry picked from commit 0f24078146)
2024-10-18 11:42:16 +02:00
Aliaksandr Valialkin
f9d86a913c
lib/logstorage: do not persist streamIDCache, since it may go out of sync with partition directories, which can be changed manually between VictoriaLogs restarts
Partition directories can be manually deleted and copied from another sources such as backups or other VitoriaLogs instances.
In this case the persisted cache becomes out of sync with partitions. This can result in missing index entries
during data ingestion or in incorrect results during querying. So it is better to do not persist caches.
This shouldn't hurt VictoriaLogs performance just after the restart too much, since its caches usually contain
small amounts of data, which can be quickly re-populated from the persisted data.

(cherry picked from commit 8aa144fa74)
2024-10-18 11:42:16 +02:00
Aliaksandr Valialkin
b9fae4378a
lib/logstorage: consistently use "pHits := m[..]" pattern
Consistency improves maintainability of the code a bit.

(cherry picked from commit 1892e357c3)
2024-10-18 11:42:16 +02:00
Aliaksandr Valialkin
92b9b13df1
lib/logstorage: optimize performance for queries, which select all the log fields for logs containing hundreds of log fields (aka "wide events")
Unpack the full columnsHeader block instead of unpacking meta-information per each individual column
when the query, which selects all the columns, is executed. This improves performance when scanning
logs with big number of fields.

(cherry picked from commit 2023f017b1)
2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin
5d541322c6
lib/logstorage: improve performance of top and field_values pipes on systems with many CPU cores
- Parallelize mering of per-CPU results.
- Parallelize writing the results to the next pipe.

(cherry picked from commit 78c6fb0883)
2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin
cd7823a310
lib/logstorage: optimize 'stats by(...)' calculations for by(...) fields with millions of unique values on multi-CPU systems
- Parallelize merging of per-CPU `stats by(...)` result shards.
- Parallelize writing `stats by(...)` results to the next pipe.

(cherry picked from commit c4b2fdff70)
2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin
1000ae437c
lib/logstorage: optimize performance for top pipe when it is applied to a field with millions of unique values
- Use parallel merge of per-CPU shard results. This improves merge performance on multi-CPU systems.
- Use topN heap sort of per-shard results. This improves performance when results contain millions of entries.

(cherry picked from commit 192c07f76a)
2024-10-18 11:42:15 +02:00