Commit graph

816 commits

Author SHA1 Message Date
Aliaksandr Valialkin
3c02937a34
all: consistently use 'any' instead of 'interface{}'
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
2024-07-10 00:20:37 +02:00
rtm0
a42bd59ee4
Fix Date metricid cache consistency under concurrent use ()
### Describe Your Changes

Fix Date metricid cache consistency under concurrent use.
When one goroutine calls Has() and does not find the cache entry in the
immutable map it will acquire a lock and check the mutable map. And it
is possible that before that lock is acquired, the entry is moved from
the mutable map to the immutable map by another goroutine causing a
cache miss.

The fix is to check the immutable map again once the lock is acquired. 

### Checklist

The following checks are **mandatory**:

- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-06-26 17:33:38 +02:00
Roman Khavronenko
b984f4672e
lib/storage: filter deleted label names and values from `/api/v1/labe… ()
…ls` and `/api/v1/label/.../values`

Check for deleted metrics when `match[]` filter matches small number of
time series (optimized path).

The issue was introduced
[v1.81.0](https://docs.victoriametrics.com/changelog_2022/#v1810).

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6300 Updates
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-29 14:07:44 +02:00
Aliaksandr Valialkin
4b458370c1
lib/logstorage: work-in-progress 2024-05-24 03:06:55 +02:00
Nikolay
a5d1013042
lib/storage: change default value for maxLabelValueLen to 1024 ()
* It must reduce memory usage for misbehaving clients. Since
VictoriaMetrics stores sparse index inmemory.
* Reduce disk space usage for indexdb.
* Prevent possible indexDB items drops.
* It may trigger slow insert and new timeseries registration due to
default value for flag change

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6176

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-05-22 21:53:53 +02:00
Aliaksandr Valialkin
ad505a7a9a
lib/logstorage: work-in-progress 2024-05-20 04:08:30 +02:00
Aliaksandr Valialkin
cc2647d212
lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit
Change the return values for these functions - now they return the unmarshaled result plus
the size of the unmarshaled result in bytes, so the caller could re-slice the src for further unmarshaling.

This improves performance of these functions in hot loops of VictoriaLogs a bit.
2024-05-14 01:23:54 +02:00
Hui Wang
4c80b17027
storage: correctly apply -inmemoryDataFlushInterval when it's set t… ()
…o minimum supported value 1s
pendingRowsFlushInterval was bumped to 2s in
73f0a805e2
2024-05-13 16:44:30 +02:00
Aliaksandr Valialkin
590160ddbb
lib/slicesutil: add helper functions for setting slice length and extending its capacity
The added helper functions - SetLength() and ExtendCapacity() - replace error-prone code with simple function calls.
2024-05-12 11:32:17 +02:00
Aliaksandr Valialkin
f20d452196
lib/storage: remove outdated misleading comments 2024-05-12 10:24:04 +02:00
Aliaksandr Valialkin
6b1cc9b946
lib/storage: search for all the values for the given label before applying filters and limits
It is incorrect applying the limit on the number of values to search without applying filters,
since the returned subset of label values may miss the label values matching the given filters.

This is a follow-up for 66630c7960
2024-04-18 20:29:36 +02:00
Aliaksandr Valialkin
66630c7960
lib/storage: improve performance for /api/v1/label/labelName/values when match[] contains only a single filter on labelName
This speeds up auto-suggestion for metric names in VMUI and Grafana, which use the following query in this case:

  /api/v1/label/__name__/values?match[]={__name__=~"*.some_value.*"}

When the user types `some_value` in the query input field.
2024-04-18 01:15:20 +02:00
Aliaksandr Valialkin
85d09e5a2d
lib/{mergeset,storage}: log deleting directories inside partitions if they are missing in parts.json
This should improve debuggability of unexpected deletion of directories inside partitions.

While at it, log the proper path to parts.json when the directory for big part is missing in the partition.
parts.json is located inside directory with small parts, and there is no parts.json file inside directory with big parts.
2024-04-16 19:11:32 +02:00
Aliaksandr Valialkin
6bcc6c938b
lib/storage: improve comments inside functions responsible for creating indexes for newly registered time series 2024-04-16 19:11:32 +02:00
Aliaksandr Valialkin
918cccaddf
all: fix golangci-lint(revive) warnings after 0c0ed61ce7
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001
2024-04-02 23:16:29 +03:00
Aliaksandr Valialkin
c3a72b6cdb
lib/storage: consistently use stopCh instead of stop 2024-04-02 21:24:57 +03:00
Zakhar Bessarab
af3922b1df
lib/storage: add ability to use downsampling for the given series filter ()
* lib/storage: add ability to use downsampling for the given series filter

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add information about downsampling filters

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: fix MetricsQL filter

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: treat missing downsampling filter as a bug

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/part_header: verify correctness of downsampling filters when opening partition

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: save only appliable rules in part metadata

Filter and save only rules which are appliable to partition based on MinTimestamp of stored data.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: update log messages for final dedup

Properly specify a reason of re-running deduplication for partition.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage: consistently use MaxTimestamp to determine deduplication/downsampling rules

Using MinTimestamp leads to applying downsampling to parts which are only partially covered by downsampling rule.
For example, partition covers range [1000-2000]. At t=2100 and rule offset 500 data with t=2100-500 => 1600 must be downsampled. The range check against MinTimestamp evaluates to true even though partition contains range which must not be downsampled - [1600:2000].

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Follow-up

- Apply the first matching downsampling period if multiple filters match the given time series.
  This allows fine-tuning the downsampling config for the specific needs.
- Take into account downsampling filters during search queries.
- Reduce the difference between community and enterprise branches. This should simplify further maintenance of these branches.
- Properly parse series filters with colons inside them.
- Document the feature at docs/CHANGELOG.md.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4960

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-03-30 04:12:23 +02:00
Aliaksandr Valialkin
131f357098
lib/storage/table.go: reduce the difference with enterprise branch 2024-03-30 03:22:51 +02:00
Aliaksandr Valialkin
4001ca36b8
lib/storage/partition.go: reduce code difference a bit with enterprise branch 2024-03-30 01:39:27 +02:00
Nikolay
a05303eaa0
lib/storage: adds metrics for downsampling ()
* lib/storage: adds metrics for downsampling
vm_downsampling_partitions_scheduled - shows the number of parts, that must be downsampled
vm_downsampling_partitions_scheduled_size_bytes - shows total size in bytes for parts, the must be donwsampled

These two metrics answer the questions - is downsampling running? how many parts scheduled for downsampling and how many of them currently downsampled? Storage space that it occupies.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-03-30 01:11:49 +02:00
Aliaksandr Valialkin
4a359d5f67
lib/storage: follow-up for 76f00cea6b
Store the deadline when the metricID entries must be deleted from indexdb
if metricID->metricName entry isn't found after the deadline. This should
make the code more clear comparing the the previous version, where the timestamp
of the first metricID->metricName lookup miss was stored in missingMetricIDs.

Remove the misleading comment about the importance of the order for creating entries
in the inverted index when registering new time series. The order doesn't matter,
since any subset of the created entries can become visible for search
before any other subset after registering in indexdb.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
2024-03-27 11:41:28 +02:00
Zakhar Bessarab
51f5ac1929
lib/storage/table: wait for merges to be completed when closing a table ()
* lib/storage/table: properly wait for force merges to be completed during shutdown

Properly keep track of running background merges and wait for merges completion when closing the table.
Previously, force merge was not in sync with overall storage shutdown which could lead to holding ptw ref.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add changelog entry

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-03-26 13:49:09 +01:00
Aliaksandr Valialkin
76f00cea6b
lib/storage: wait for up to 60 seconds before deciding to delete metricID entries from indexdb if metricID->metricName entry is missing during search
The metricID->metricName entry can remain invisible for search for some time after registering new metricName.
This is expected condition. So wait for up to 60 seconds in the hope that the metricID->metricName
entry will become visible before deleting all the entries from indexdb, which are associated with the given metricID.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948

See also 20812008a7
2024-03-18 00:34:32 +02:00
Aliaksandr Valialkin
d1d2771bee
lib/storage: optimize /api/v1/labels and /api/v1/label/.../values when match[] contains metric name
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-03-12 02:43:16 +02:00
Aliaksandr Valialkin
d46d87a9e0
lib/storage: move the conversion of tag filters to composite tag filters into indexSearch.searchMetricIDsInternal
This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate search of time series in the old indexdb for new time ranges.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055

This is a follow-up for 2d31fd7855
2024-03-11 20:40:28 +02:00
Aliaksandr Valialkin
2d31fd7855
lib/storage: use composite indexes (metricName, label=value) when searching for matching time series at /api/v1/labels, /api/v1/label/.../values and /api/v1/status/tsdb
This should improve query performance when match[], extra_filters[] or extra_label args are passed to these APIs

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-03-10 12:57:34 +02:00
Aliaksandr Valialkin
c75bfd5b07
lib/storage: use unsafe.Slice instead of deprecated reflect.SliceHeader 2024-02-29 17:24:34 +02:00
Aliaksandr Valialkin
55f1f24e62
lib/storage: replace the remaining atomic.* functions with atomic.* types for the sake of consistency
See ea9e2b19a5
2024-02-24 00:53:30 +02:00
Aliaksandr Valialkin
b3d9d36fb3
lib/storage: consistently use atomic.* types instead of atomic.* function calls on ordinary types
See ea9e2b19a5
2024-02-24 00:15:26 +02:00
Aliaksandr Valialkin
f81b480905
lib/mergeset: consistently use atomic.* types instead of atomic.* function calls on ordinary types
See ea9e2b19a5
2024-02-23 23:29:35 +02:00
Aliaksandr Valialkin
a204fd69f1
lib/storage: consistently use atomic.* type for refCount and mustDrop fields in indexDB, table and partition structs
See ea9e2b19a5
2024-02-23 22:54:59 +02:00
Aliaksandr Valialkin
0f1ea36dc8
lib/storage: convert dedupsDuringMerge from uint64 to atomic.Uint64
This should simplify code maintenance by gradually converting to atomic.* types instead of calling atomic.* functions
on int and bool types.

See ea9e2b19a5
2024-02-23 22:52:00 +02:00
Aliaksandr Valialkin
ea9e2b19a5
lib/{storage,mergeset}: properly fix 'unaligned 64-bit atomic operation' panic on 32-bit architectures
The issue has been introduced in bace9a2501
The improper fix was in the d4c0615dcd ,
since it fixed the issue just by an accident, because Go comiler aligned the rawRowsShards field
by 4-byte boundary inside partition struct.

The proper fix is to use atomic.Int64 field - this guarantees that the access to this field
won't result in unaligned 64-bit atomic operation. See https://github.com/golang/go/issues/50860
and https://github.com/golang/go/issues/19057
2024-02-23 22:27:06 +02:00
hagen1778
c8d1d2ab72
lib/storage: cleanup after d4c0615dcd
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-23 18:53:55 +01:00
Dmytro Kozlov
d4c0615dcd
lib/storage: fix aligning () 2024-02-23 16:37:21 +01:00
Aliaksandr Valialkin
9bad52b687
app/vmstorage: deprecate -snapshotCreateTimeout command-line flag
Creating snapshot shouldn't time out under normal conditions.
The timeout was related to the bug, which has been fixed in 6460475e3b .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
2024-02-23 04:49:23 +02:00
Aliaksandr Valialkin
f79944532b
lib/storage: do not drop (date, metricID) entries for the date older than 2 days if samples are ingested at this date
Previously the (date, metricID) entries for dates older than the last 2 days were removed.
This could lead to slow check for the (date, metricID) entry in the indexdb during ingesting historical data (aka backfilling).

The issue has been introduced in 431aa16c8d
2024-02-23 04:06:19 +02:00
Aliaksandr Valialkin
f46eaf92eb
app/vmselect: add -search.maxLabelsAPIDuration and -search.maxLabelsAPISeries options for fine-tuning CPU and RAM usage for /api/v1/series , /api/v1/labels and /api/v1/label/.../values
This commit returns back limits for these endpoints, which have been removed at 5d66ee88bd ,
since it has been appeared that missing limits result in high CPU usage, while the introduced concurrency limiter
results in failed lightweight requests to these endpoints because of timeout when heavyweight requests are executed.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-02-23 02:57:16 +02:00
Aliaksandr Valialkin
bace9a2501
lib/{mergeset,storage}: convert bufferred items to searchable parts more optimally
Do not convert shard items to part when a shard becomes full. Instead, collect multiple
full shards and then convert them to a searchable part at once. This reduces
the number of searchable parts, which, in turn, should increase query performance,
since queries need to scan smaller number of parts.
2024-02-23 00:16:34 +02:00
Aliaksandr Valialkin
e8b3045062
lib/storage: handle common case when the number of rows passed to flushRowsToInmemoryParts() doesnt exceed maxRawRowsPerShard 2024-02-22 20:44:11 +02:00
Aliaksandr Valialkin
73f0a805e2
lib/{storage,mergeset}: convert beffered items into searchable in-memory parts exactly once per the given flush interval
Previously the interval between item addition and its conversion to searchable in-memory part
could vary significantly because of too coarse per-second precision. Switch from fasttime.UnixTimestamp()
to time.Now().UnixMilli() for millisecond precision. It is OK to use time.Now() for tracking
the time when buffered items must be converted to searchable in-memory parts, since time.Now()
calls aren't located in hot paths.

Increase the flush interval for converting buffered samples to searchable in-memory parts
from one second to two seconds. This should reduce the number of blocks, which are needed
to be processed during high-frequency alerting queries. This, in turn, should reduce CPU usage.

While at it, hardcode the maximum size of rawRows shard to 8Mb, since this size gives the optimal
data ingestion pefromance according to load tests. This reduces memory usage and CPU usage on systems
with big amounts of RAM under high data ingestion rate.
2024-02-22 20:21:14 +02:00
Aliaksandr Valialkin
463bc27312
lib/storage: avoid superflouos copy of block header data 2024-02-22 20:21:14 +02:00
Aliaksandr Valialkin
8d9d7a8a12
app/vmstorage: expose vm_snapshots metric, which shows the current number of snapshots
While at it, refresh docs about snapshots - https://docs.victoriametrics.com/#how-to-work-with-snapshots
2024-02-22 18:32:57 +02:00
Aliaksandr Valialkin
aec9cd4316
lib/storage: do not pool rawRowsBlock when flushing rawRows to in-memory blocks
The pooled rawRowsBlock objects occupies big amounts of memory between flushes,
and the flushes are relatively rare. So it is better to don't use the pool
and to allocate rawRow blocks on demand. This should reduce the average
memory usage between flushes.
2024-02-22 17:37:48 +02:00
Aliaksandr Valialkin
b7dfe9894c
lib/storage: do not keep rawRows buffer across flush() calls
The buffer can be quite big under high ingestion rate (e.g. more than 100MB).
This leads to increased memory usage between buffer flushes.
So it is better to re-create the buffer on every flush in order to reduce memory usage
between buffer flushes.
2024-02-22 17:22:26 +02:00
Aliaksandr Valialkin
6b9bedd0f9
app/vmstorage: expose vm_last_partition_parts metrics, which may help identifying performance issues related to the increased number of parts in the last partition 2024-02-15 14:51:19 +02:00
Aliaksandr Valialkin
95222b2079
all: upgrade Go builder from Go1.21.7 to Go1.22.0
See https://go.dev/doc/go1.22
2024-02-12 21:59:51 +02:00
Aliaksandr Valialkin
a49a50701a
lib/mergeset: do not panic on too long items passed to Table.AddItems()
Instead, log a sample of these long items once per 5 seconds into error log,
so users could notice and fix the issue with too long labels or too many labels.

Previously this panic could occur in production when ingesting samples with too long labels.
2024-02-12 19:32:18 +02:00
Aliaksandr Valialkin
2eb967231b
lib/storage: do not append headerData to bsw.indexData if its size exceeds maxBlockSize
This is a follow-up optimization after 3c246cdf00
2024-02-12 18:04:37 +02:00
Aliaksandr Valialkin
39e0007e14
lib/snapshot: move Time, Validate and NewName into lib/snapshot/snapshotutil package
This allows removing importing unneeded command-line flags into binaries, which import lib/storage,
which, in turn, was importing lib/snapshot in order to use Time, Validate and NewName functions.

This is a follow-up for 83e55456e2

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5738
2024-02-09 04:18:45 +02:00
Aliaksandr Valialkin
3c246cdf00
lib/{storage,mergeset}: do not create index blocks with sizes exceeding 64Kb in common case
This should reduce memory fragmentation and memory usage for indexdb/indexBlocks and storage/indexBlocks caches
2024-02-08 13:52:24 +02:00
Aliaksandr Valialkin
9e8e4cca6a
lib/storage: move fixupTimestamps() call to Block.Init()
This is a follow-up for 0bf7921721
2024-02-06 22:43:48 +02:00
Zakhar Bessarab
0bf7921721
lib/storage/raw_row: properly initialize TS for tmp blocks ()
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-02-06 20:42:32 +00:00
Aliaksandr Valialkin
431aa16c8d
lib/storage: keep (date, metricID) entries only for the last two dates
Entries for the previous dates is usually not used, so there is little sense in keeping them in memory.

This should reduce the size of storage/date_metricID cache, which can be monitored
via vm_cache_entries{type="storage/date_metricID"} metric.
2024-01-29 18:43:59 +01:00
Aliaksandr Valialkin
5d66ee88bd
lib/storage: do not check the limit for -search.maxUniqueTimeseries when performing /api/v1/labels and /api/v1/label/.../values requests
This limit has little sense for these APIs, since:

- Thses APIs frequently result in scanning of all the time series on the given time range.
  For example, if extra_filters={datacenter="some_dc"} .

- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
  which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.

Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:

- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-01-29 16:45:12 +01:00
Aliaksandr Valialkin
bb7a419cc3
lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts with mixed types at once. For example,
  it could merge simultaneously an in-memory part plus a big file part.
  Such a merge could take hours for big file part. During the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  under the configured -inmemoryDataFlushInterval .

  Another common issue, which could happen when parts with mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradataion for queries, since every query needs to check ever growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers per every part type elegantly resolves both issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdown
  in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
  unavailable.

- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.

This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:27:47 +01:00
Aliaksandr Valialkin
c3a585cfe5
lib/storage: rename *AssistedMerges to *AssistedMergesCount in order to make these field names less misleading
These fields are counters, not gauges, so adding Count suffix to them makes easier to understand this while reading the code
2024-01-25 10:19:32 +02:00
Aliaksandr Valialkin
ae643ef1f1
lib/{storage,mergeset}: reduce the maxium compression level for the stored data
This reduces CPU usage a bit, while doesn't increase resulting file sizes according to synthetic tests.
2024-01-23 17:46:50 +02:00
Aliaksandr Valialkin
6c214397ed
lib/storage: compress metricIDs, which match the given filters, before storing them in tagFiltersToMetricIDsCache
This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average.
The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.
2024-01-23 16:09:55 +02:00
Aliaksandr Valialkin
4d78954158
lib/storage: do not sort metricIDs passed to Storage.prefetchMetricNames, since the caller is responsible for the sorting 2024-01-23 16:08:38 +02:00
Aliaksandr Valialkin
3449d563bd
all: add up to 10% random jitter to the interval between periodic tasks performed by various components
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin
9b4294e53e
lib/storage: reduce the contention on dateMetricIDCache mutex when new time series are registered at high rate
The dateMetricIDCache puts recently registered (date, metricID) entries into mutable cache protected by the mutex.
The dateMetricIDCache.Has() checks for the entry in the mutable cache when it isn't found in the immutable cache.
Access to the mutable cache is protected by the mutex. This means this access is slow on systems with many CPU cores.
The mutabe cache was merged into immutable cache every 10 seconds in order to avoid slow access to mutable cache.
This means that ingestion of new time series to VictoriaMetrics could result in significant slowdown for up to 10 seconds
because of bottleneck at the mutex.

Fix this by merging the mutable cache into immutable cache after len(cacheItems) / 2
cache hits under the mutex, e.g. when the entry is found in the mutable cache.
This should automatically adjust intervals between merges depending on the addition rate
for new time series (aka churn rate):

- The interval will be much smaller than 10 seconds under high churn rate.
  This should reduce the mutex contention for mutable cache.
- The interval will be bigger than 10 seconds under low churn rate.
  This should reduce the uneeded work on merging of mutable cache into immutable cache.
2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin
1c7f990fad
app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery()
This is a follow-up for cf03e11d89

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630
2024-01-21 23:45:31 +02:00
Aliaksandr Valialkin
0b2ea1a7c7
all: call atomic.Load* in front of atomic.CompareAndSwap* at places where the atomic.CompareAndSwap* returns false most of the time
This allows avoiding slow inter-CPU synchornization induced by atomic.CompareAndSwap*
2024-01-21 14:04:54 +02:00
Aliaksandr Valialkin
90768aa418
lib/storage/partition.go: remove misleading comment, which falsely states that inmemoryParts isn't visible to search
Thanks to @satjd for raising attention to this comment at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5410
2024-01-21 04:49:35 +02:00
Aliaksandr Valialkin
5bdf62de5b
lib/storage: do not prefetch metric names for small number of metricIDs
This eliminates prefetchedMetricIDsLock lock contention for queries, which return less than 500 time series.

This is a follow-up for 9d886a2eb0
2024-01-17 13:48:15 +02:00
Aliaksandr Valialkin
9d886a2eb0
lib/storage: follow-up for 4b8088e377
- Clarify the bugfix description at docs/CHANGELOG.md
- Simplify the code by accessing prefetchedMetricIDs struct under the lock
  instead of using lockless access to immutable struct.
  This shouldn't worsen code scalability too much on busy systems with many CPU cores,
  since the code executed under the lock is quite small and fast.
  This allows removing cloning of prefetchedMetricIDs struct every time
  new metric names are pre-fetched. This should reduce load on Go GC,
  since the cloning of uin64set.Set struct allocates many new objects.
2024-01-16 15:29:57 +02:00
Roman Khavronenko
4b8088e377
lib/storage: properly check for storage/prefetchedMetricIDs cache expiration deadline ()
Before, this cache was limited only by size.
Cache invalidation by time happens with jitter to prevent thundering herd problem.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-15 10:03:06 +01:00
Aliaksandr Valialkin
f2229c2e42
lib/prompb: change type of Label.Name and Label.Value from []byte to string
This makes it more consistent with lib/prompbmarshal.Label
2024-01-14 22:33:21 +02:00
Artem Navoiev
d374595e31
docs: mention staleNaN handling during deduplication
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5587

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-11 11:53:58 +01:00
hagen1778
d493da562e
lib/storage: fix typo
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-21 11:20:43 +01:00
hagen1778
e96b4410a1
lib/storage: fix typo
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-21 10:52:53 +01:00
Aliaksandr Valialkin
230230cf0b
lib/logger: add -loggerMaxArgLen command-line flag for fine-tuning the maximum length of logged args 2023-11-11 12:30:08 +01:00
Aliaksandr Valialkin
7ac49162c6
lib/storage: follow-up for 29cebd82fb
Use atomic.CompareAndSwapUint32() instead of atomic.LoadUint32() followed by atomic.StoreUint32().
This makes the code more clear.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159
2023-10-31 16:08:54 +01:00
Roman Khavronenko
29cebd82fb
lib/storage: log warning about RO mode only on state change ()
Before, vmstorage would log the same message each second producing excessive
amount of logs.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 10:52:57 +01:00
Aliaksandr Valialkin
42dd71bb63
all: consistently use %w instead of %s in when error is passed to fmt.Errorf()
This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.
2023-10-25 21:24:03 +02:00
hagen1778
fd2d07ba33
lib/storage: follow-up after 188cfe3a85
188cfe3a85

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-17 15:45:14 +02:00
Ilya Trefilov
188cfe3a85
lib/storage: do not create tsid if metric contains stale marker() ()
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069
2023-10-17 15:30:58 +02:00
Haleygo
8b6ccad41d
fix ingesting stale point, follow up fe8cc573d1 () 2023-10-16 09:05:37 +02:00
hagen1778
fe8cc573d1
docs: remove extra / in the end of the link
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-14 07:43:40 +02:00
Aliaksandr Valialkin
d41841c0c9
lib/{mergeset,storage}: consistently reset isInMerge field in parts passed to mergeParts() before returning from the function
While at it consistently check that the isInMerge field is set in all the parts passed to mergeParts()
2023-10-02 08:05:29 +02:00
Aliaksandr Valialkin
3ca6fea858
lib/{mergeset,storage}: perform at most one assisted merge per each call to addRows/addItems
This should reduce tail latency during data ingestion.

This shouldn't slow down data ingestion in the worst case, since assisted merges are spread among
distinct addRows/addItems calls after this change.
2023-10-01 22:19:46 +02:00
Aliaksandr Valialkin
223ef96198
lib/storage: remove unused atomicSetBool function after 717c53af27 2023-09-25 17:37:24 +02:00
Aliaksandr Valialkin
15dfd94f3b
lib/storage: make it clear that the number of big merge workers always equals to 4
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4915#issuecomment-1733922830
2023-09-25 17:15:45 +02:00
Aliaksandr Valialkin
717c53af27
lib/storage: stop exposing vm_merge_need_free_disk_space metric
This metric confuses users and has no any useful information.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686#issuecomment-1733844128
2023-09-25 16:52:39 +02:00
Aliaksandr Valialkin
3140ef7261
lib/storage: log fatal error inside searchMetricName() instead of propagating it to the caller
This simplifies the code a bit at searchMetricName() and searchMetricNameWithCache() call sites

This is a result of investigating https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4972
2023-09-22 11:41:06 +02:00
Zakhar Bessarab
bea3431ed1
lib/storage/partition: add check to ensure parts exist on disk ()
* lib/storage/partition: add check to ensure parts exist on disk

If part exists in parts.json but is missing on disk there will be a misleading error similar to "unexpected number of substrings in the part name".

This change forces verification of part existence and throws a correct error in case it is missing on disk.

Such issue can be result of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005 or disk corruption.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/partition: use filepath.Join instead of string concatenation

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/partition: add action points for error message

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* all: add a check for missing part in lib/mergeset and lib/logstorage

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-19 11:17:41 +02:00
faceair
b6ad581b45
lib/storage: remove ForceMergeAllParts internal loop ()
Signed-off-by: faceair <git@faceair.me>
2023-09-15 19:04:54 +02:00
Aliaksandr Valialkin
a09c680170
lib/storage: handle fatal errors inside indexSearch.getTSIDByMetricID() instead of returning them to the caller
This simplifies the code a bit at caller side
2023-09-15 11:55:42 +02:00
Dima Lazerka
0c7d46d637
flagutil: Make .Msecs private ()
* Introduce flagutil.Duration

To avoid conversion bugs

* Fix tests

* Clarify documentation re. month=31 days

* Add fasttime.UnixTime() to obtain time.Time

The goal is to refactor out the last usage of `.Msecs`.

* Use fasttime for time.Now()

* wip

- Remove fasttime.UnixTime(), since it doesn't improve code readability and maintainability
- Run `make docs-sync` for syncing changes from README.md to docs/ folder
- Make lib/flagutil.Duration.Msec private
- Rename msecsPerMonth const to msecsPer31Days in order to be consistent with retention31Days

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-03 10:33:37 +02:00
Aliaksandr Valialkin
edee262ecc
Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 10:16:42 +02:00
Dima Lazerka
e0e856d2e7
Add flagutil.Duration to avoid conversion bugs ()
* Introduce flagutil.Duration

To avoid conversion bugs

* Fix tests

* Comment why not .Seconds()
2023-09-01 09:27:51 +02:00
Nikolay
c5aac34b68
lib/storage: properly caclucate nextRotationTimestamp ()
cause of typo unix millis was used instead of unix for current timestamp
calculation
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4873
2023-08-23 13:22:53 +02:00
Aliaksandr Valialkin
ac6c40e896
all: refer to https://docs.victoriametrics.com/#resource-usage-limits in the error message about -search.max* limit
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4827
2023-08-14 01:57:34 -07:00
Aliaksandr Valialkin
9082a84566
lib/storage: update nextRotationTimestamp relative to the timestamp of the indexdb rotation
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563
2023-07-28 19:48:10 -07:00
Nikolay
544fba6826
lib/storage: pre-create timeseries before indexDB rotation ()
* lib/storage: pre-create timeseries before indexDB rotation
during an hour before indexDB rotation start creating records at the next indexDB
it must improve performance during switch for the next indexDB and remove ingestion issues.
Since there is no need for creation new index records for timeseries already ingested into current indexDB
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563

* lib/storage: further work on indexdb rotation optimization

- Document the change at docs/CHAGNELOG.md
- Move back various caches from indexDB to Storage. This makes the change less intrusive.
  The dateMetricIDCache now takes into account indexDB generation, so it stores (date, metricID)
  entries for both the current and the next indexDB.
- Consolidate the code responsible for idbNext pre-filling into prefillNextIndexDB() function.
  This improves code readability and maintainability a bit.
- Rewrite and simplify the code responsible for calculating the next retention timestamp.
  Add various tests for corner cases of this code.
- Remove indexdb pre-filling from RegisterMetricNames() function, since this function is rarely called.
  It is OK to add indexdb entries on demand in this function. This simplifies the code.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

* docs/CHANGELOG.md: refer to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-07-22 15:20:21 -07:00
Aliaksandr Valialkin
140e7b6b74
all: replace atomic.Value with atomic.Pointer[T]
This eliminates the need in .(*T) casting for results obtained from Load()

Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map,
because map is already a pointer type.
2023-07-19 17:42:06 -07:00
Aliaksandr Valialkin
8815080030
app/vmselect/promql: add the ability to copy all the labels from one side of group_left()/group_right() operation
This is performed by specifying `*` inside group_left()/group_right().
Also allow specifying prefix for the copied labels via `group_left(...) prefix "..."` and `group_right(...) prefix "..."` syntax.
For example, the following query adds all the namespace-related labels to pod info, and prefixes all the copied label names with "ns_" prefix:

  kube_pod_info * on(namespace) group_left(*) prefix "ns_" kube_namespace_labels

This resolves the following StackOverflow questions:

- https://stackoverflow.com/questions/76661818/how-to-add-namespace-labels-to-pod-labels-in-prometheus
- https://stackoverflow.com/questions/76653997/how-can-i-make-a-new-copy-of-kube-namespace-labels-metric-with-a-different-name
2023-07-17 19:07:39 -07:00
Aliaksandr Valialkin
6685f6ce7c
lib/storage: move series registration in caches from createAllIndexesForMetricName into a separate function - putSeriesToCache
This makes the code more clear and easier to read

This is a follow-up for 7094fa38bc
2023-07-13 23:13:23 -07:00
Aliaksandr Valialkin
3dacdcb707
lib/storage: optimize BenchmarkIndexDBGetTSIDs()
- Sort MetricName tags only once before the benchmark loop.
- Obtain indexSearch per each benchmark loop in order to give a chance for background merge
  for the recently created parts
2023-07-13 21:56:53 -07:00