Commit graph

211 commits

Author SHA1 Message Date
Zakhar Bessarab
1c599d9661
Merge tag 'v1.109.0' into oss/pmm-6401-read-prometheus-data-files 2025-01-14 18:51:09 +04:00
Nikolay
e9f86af7f5
lib/storage: add a hint for merge about type of parts in merge (#7998)
Hint allows to choose type of cache to be used for index search:
- in-memory parts are storing recently ingested samples and should use
main cache. This improves ingestion speed and cache hit ration for
queries accessing recently ingested samples.
- merges of file parts is performed in background, using a separate
cache allows avoiding pollution of the main cache with irrelevant
entries.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-01-10 16:01:39 +04:00
Zakhar Bessarab
9f9cc24e4c
Revert "lib/mergeset: add sparse indexdb cache (#7269)"
This reverts commit 837d0d136d.
2024-11-04 10:29:14 -03:00
Zakhar Bessarab
837d0d136d
lib/mergeset: add sparse indexdb cache (#7269)
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7182

- add a separate index cache for searches which might read through large
amounts of random entries. Primary use-case for this is retention and
downsampling filters, when applying filters background merge needs to
fetch large amount of random entries which pollutes an index cache.
Using different caches allows to reduce effect on memory usage and cache
efficiency of the main cache while still having high cache hit rate. A
separate cache size is 5% of allowed memory.

- reduce size of indexdb/dataBlocks cache in order to free memory for
new sparse cache. Reduced size by 5% and moved this to a separate cache.

- add a separate metricName search which does not cache metric names -
this is needed in order to allow disabling metric name caching when
applying downsampling/retention filters. Applying filters during
background merge accesses random entries, this fills up cache and does
not provide an actual improvement due to random access nature.


Merge performance and memory usage stats before and after the change:

- before

![image](https://github.com/user-attachments/assets/485fffbb-c225-47ae-b5c5-bc8a7c57b36e)


- after

![image](https://github.com/user-attachments/assets/f4ba3440-7c1c-4ec1-bc54-4d2ab431eef5)

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-10-24 15:21:17 +02:00
f41gh7
cf7eb6bc7c v1.105.0
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEcUy2K0fVfVQqlyTqRVgxHPd17HIFAmcWVZgACgkQRVgxHPd1
 7HJ7yg/+LToSxly5iKgyZlyBToTjWIs+NhPyrDJaDXHzXkxMdcc/p43WFazjKD0A
 Sp47oKqSDUU9Bde32mk97jKq6INHQGY3SWKg6EY8pKtTtiJFol9O1Tn7wOFVr9hK
 bcfs8Q+Ibbue/YaDAKM7oaZdSfSGPA8O6vqJPtAaMgRDb7J1mBTA5a2Cs3utE30C
 FRz0wkqwf/zEyle8Tg7e2GXmn3RleiWpinhPyQg4oVoxvid4DCNSAMzmc/gZogN3
 twcf/ynH7RfysoP4iQc6Bsc417lkJvA6TcKLjm+VP6yzXcSXyqwoQSbT7zNdOgwz
 9d7M6LpJZ75voVO18f77pZj/BEYjjAlFrxGxAtT/WAEml/fDYT6NHpLpSmwWXweX
 uJjI5SLr92/0rNWnMicSC/pzd4MQOxjSfF9ij7GqPXxeFt8hGrE5fqbYHz/DXQvM
 kEMtsjDVn40FsXwz0Jxti/zPBI0J/AJlkFJF9xp0jLXYbgDb+3KaJJ3MfmHciw3V
 NrSus28nlfyMba5ktES0ZszWeJk0MTLKmiw9Q6otLDo1gtHW66ijQeIHkEIJ5KhR
 EEAYTEZyXujX96cAfrINHzFJNLFA6Zbx6oKnAZx/mMHbfrt15vd9mVb5NMWcKl65
 DDPF7dFZDB/t3HYPPmwxzdN+LptZAmtQMHh91kJYTuhKhs71lH8=
 =spFl
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEcUy2K0fVfVQqlyTqRVgxHPd17HIFAmcWiOIACgkQRVgxHPd1
 7HL4Ow//fv7iw0673L/64oGsH1XgHJKfualZj98ql3bWu4iD/LZ4XT/zqUqQD4cA
 80gtMkudTu+qAqDCmj/tJhqaDO5bIChzyLE7Esk5r+7sM1vP7Na1lxZS96r2F2Gg
 1E4gWKbp1e+Ms+ud7d8+B5cbFndZRw3rBxpsONyqQaE/GzpZv1mUomeDUEkSy0Oi
 qUo4V65Ei1ZXDN8saBb0zKDOhTPcICfhmMMyRMcF7wkAR4JRt84nmZrHZ6AORW8K
 xEuz1bXihT0HnLaxQsuPG9WCL0xOqTOnzL2Amtw5sPi6dLOcd6Lp8Z79B/uP3+Iy
 NMOLaaMJldM3pc+ZNDxYKAx4cSzOmECAs9ldiGa2QoxzEAQ4qrNfG/mdrauExVW3
 vVJ0uK5S+GL+rEKGQcD1d7fkTDizPXjuWCgCwmM0j84jriF/0slNcFe+5bPBrw3z
 vvUTyuYsv16abHraEbUq5G5ekRKhAd6QkAzyzTKcrNiKqYL5zlyJAjkfinMdhf9e
 hBvtyZqvkhxgRRL1WlAFsQY+QwkWHLbrTQU8AKB2G4qLQfiqn9Lald6LD/A/HG4E
 uXba+ndGuM0anB43L+W9UjQmF31urxLBuLag59J4JhEmKMYGrxeK4xnhpH8k1hA/
 7T4eREKl03R3IaTg9taOJI6vvnuWxGJ0Q5B/AZ3z32fLXdIMaGc=
 =glUW
 -----END PGP SIGNATURE-----

Merge tag 'v1.105.0' into pmm-6401-read-prometheus-data-files-cpc

v1.105.0
2024-10-21 19:01:12 +02:00
Roman Khavronenko
05ac508fbf
lib/flagutil: rename Duration to RetentionDuration (#7284)
The purpose of this change is to reduce confusion between using
`flag.Duration` and `flagutils.Duration`. The reason is that
`flagutils.Duration` was mistakenly used for cases that required `m`
support. See
ab0d31a7b0

The change in name should clearly indicate the purpose of this data
type.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-17 13:47:48 +02:00
hagen1778
2404b4bc00 v1.104.0
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkhL6N9vmSTjg0VSVO/dfN0HKlkAFAmb9KqIACgkQO/dfN0HK
 lkAXcBAAluUXBa4oYV4g/xvsd30oXtC79DoFY527K1cTgesDohf0FdLGU++7Aphm
 efOR8BaytBPGHGn9PmuZIbebiFv6TVBih7b8gl+frm/yGLh/1WyAYp2sClB1KcJa
 r7rHBMF7sikDkLPFlJv9qYhERj05aUTc/uwWn7KzUMPbmUZcXOJhxttm1Hf7Rc6P
 zcO1cymSEouzSOw0qoHFHRZYgkt9j1GW36vUgEX6+b3VJvOAhoaolw6OX65wt8Cm
 +YdXW51gEalZRIRNtgY3lDJnCAHn72RsRbLpylyGW1TcuBnwfSIWlPpLU04IGVlx
 06Vl47o/6vEBoVKk+2Y6La4iwD8+x/Td1RlrELOo4Qzrv1ppqOCveUa0wh6JQfjB
 aQawE7Yzh35qKvRVZtgY8NaUzkTL2QISlnpkokHfZZLIn6WAhok4c+vxnCl5CaBE
 3yRenqZ/OdMs+Wa8WMb6thcxA0eQ40t3B35iYyvMJdhSKDtdNT2F5kFh7ve6Woiu
 2TmN+GWPM0zBMVEVGy1i1L+42dlG6ANY3p5a8vz0qfqBBJF+V+P/BetfejTPjJ7r
 PN6HpdcfN+a+FGsUWckhFSU7z0LFJIytQyxb6vGn5N1UW0pupQMs5E/jFcFJl2/Q
 yO8WZmGm4QfhupcfgAkTIBgsUliqmIBXsNk6sjhzwbxYBtXSqwQ=
 =F+22
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkhL6N9vmSTjg0VSVO/dfN0HKlkAFAmb9XnUACgkQO/dfN0HK
 lkAYJw/8D+fAkqo48iNynRZf8N7kR22XBVc+zvJIFIL8wyScMec3o6+rRQ5WmmEk
 MxT73EWW2Nv81L4JC585u3zutM7Ow7nG1paxrF2hWLNAniKJd+Z+okRWThf89c8Q
 IC2egeVtgQ9ADNBTNGF72FsBBj+P6rv3Xe/M0XSLCS4mLY1eVnhdx7yuQsSNkzpr
 hxndq5odwEprFNXe9WEgH04ekS3u0ZMzWidhSHJpZVXDt6iFTxfoD+NkYpPRIZuc
 KwE0Zm1eTn98MJNZvoVyJ2hbD3f513I5yvdaNMFZ0I08Dh281uugYZu8r7mwqS49
 0uCC9PoEuErYbCGCGjmXOGVnyB6vvRjIfIOif/M1KqpH5g7xTKWc9S23P2ib3HgI
 brFl5EDl1Qa+qnkwWC98G58b85hjTJjLYhbst+O/MW+j6W2zihrt0N9UsKKTPgzj
 xvLhYz97wF0GCOfD5sZyyMdTCI6QWqtbE79ysHw+WCSrbZIKh6MFp6eO6qQF3JWT
 9IPT6O9G57Q9iwtS+MSVgriobE7qV/fHB/ICiciTGtsYfsovwxnq8BJuBiehwqau
 deqf4gbsZQiME1i+o9nnOcekDXkziKnkJIv8E5NBq77NQEzliSwfHwoaTtEusj7n
 4XbgRX37B8XtANVg1twWZb8gYFtxqYoojymAKx/Ag2e4I3qnzbM=
 =e9ts
 -----END PGP SIGNATURE-----

Merge tag 'v1.104.0' into pmm-6401-read-prometheus-data-files

v1.104.0

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-02 16:53:41 +02:00
Roman Khavronenko
0d4f4b8f7d
(app|lib)/vmstorage: do not increment vm_rows_ignored_total on NaNs (#7166)
`vm_rows_ignored_total` metric is a metric for users to signalize about
ingestion issues, such as bad timestamp or parsing error.
In commit
a5424e95b3
this metric started to increment each time vmstorage gets NaN. But NaN
is a valid value for Prometheus data model and for Prometheus metrics
exposition format. Exporters from Prometheus ecosystem could expose NaNs
as values for metrics and these values will be delivered to vmstorage
and increment the metric.
Since there is nothing user can do with this, in opposite to parsing
errors or bad timestamps, there is not much sense in incrementing this
metric. So this commit rolls-back `reason="nan_value"` increments.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-10-02 12:37:27 +02:00
Artem Fetishev
ed5da38ede
Introduce a flag for limiting the number of time series to delete (#7091)
### Describe Your Changes

Introduce the `-search.maxDeleteSeries` flag that limits the number of
time series that can be deleted with a single
`/api/v1/admin/tsdb/delete_series` call.

Currently, any number can be deleted and if the number is big (millions)
then the operation may result in unaccounted CPU and memory usage spikes
which in some cases may result in OOM kill (see #7027). The flag limits
the number to 30k by default and the users may override it if needed at
the vmstorage start time.


---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-09-30 10:02:21 +02:00
Artem Fetishev
a5424e95b3
lib/storage: adds metrics that count records that failed to insert
### Describe Your Changes

Add storage metrics that count records that failed to insert:

- `RowsReceivedTotal`: the number of records that have been received by
the storage from the clients
- `RowsAddedTotal`: the number of records that have actually been
persisted. This value must be equal to `RowsReceivedTotal` if all the
records have been valid ones. But it will be smaller otherwise. The
values of the metrics below should provide the insight of why some
records hasn't been added
-   `NaNValueRows`: the number of records whose value was `NaN`
- `StaleNaNValueRows`: the number of records whose value was `Stale NaN`
- `InvalidRawMetricNames`: the number of records whose raw metric name
has failed to unmarshal.

The following metrics existed before this PR and are listed here for
completeness:

- `TooSmallTimestampRows`: the number of records whose timestamp is
negative or is older than retention period
- `TooBigTimestampRows`: the number of records whose timestamp is too
far in the future.
- `HourlySeriesLimitRowsDropped`: the number of records that have not
been added because the hourly series limit has been exceeded.
- `DailySeriesLimitRowsDropped`: the number of records that have not
been added because the daily series limit has been exceeded.

---
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
2024-09-06 17:57:21 +02:00
f41gh7
2557e66ee0
Merge tag 'tags/v1.102.1' into pmm-6401-read-prometheus-data-files-cpc 2024-08-02 11:20:14 +02:00
Aliaksandr Valialkin
9c4b0334f2
all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter
The %q formatter may result in incorrectly formatted JSON string if the original string
contains special chars such as \x1b . They must be encoded as \u001b , otherwise the resulting JSON string
cannot be parsed by JSON parsers.

This is a follow-up for c0caa69939

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
2024-07-17 13:52:13 +02:00
rtm0
bdc0e688e8
Fix inconsistent error handling in Storage.AddRows() (#6583)
### Describe Your Changes

`Storage.AddRows()` returns an error only in one case: when
`Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But
the same error is treated as a warning when it happens inside
`Storage.add()` or returned by `Storage.prefillNextIndexDB()`.

This commit fixes this inconsistency by treating the error returned by
`Storage.updatePerDateData()` as a warning as well. As a result
`Storage.add()` does not need a return value anymore and so doesn't
`Storage.AddRows()`.

Additionally, this commit adds a unit test that checks all cases that
result in a row not being added to the storage.



---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-07-17 12:07:14 +02:00
Aliaksandr Valialkin
233e5f0a9e
lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag()
This is a follow-up for 61dce6f2a1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329
2024-07-16 01:00:45 +02:00
Aliaksandr Valialkin
202e5704e6
vendor: update github.com/VictoriaMetrics/metrics from v1.34.1 to v1.35.0
Fix potential memory leaks across VictoriaMetrics codebase after metrics.UnregisterSet(s) call
because of missing s.UnregisterAllMetrics() call.

This is a follow-up for 6a6e34ab8e . It is OK if some vmauth metrics
aren't visible for a few microseconds when the previous metrics are unregistered and new metrics
weren't registered yet.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6252
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5805
2024-07-15 10:43:37 +02:00
Nikolay
69d244e6fb
lib/mergeset: adds tracking for indexdb records drop (#6297)
It allows to create alert for possible item drops at indexdb. It may
happen, if ingested metric size exceeds max indexdb item size.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-05-24 14:55:20 +02:00
hagen1778
381d4494e9 v1.101.0
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkhL6N9vmSTjg0VSVO/dfN0HKlkAFAmYqbhsACgkQO/dfN0HK
 lkBDhRAAhUO7SsbCCHZo4Azdw+G32J05LsYvKMGl0r+j2fYRPovl7Sgf/HdZUKxk
 4aOXPPl8YogHWHVv8qrmFXl7gPWRNaFtCxmVlVIv+eEzwzN18tH2Umn+PwfQTtmN
 VM7ujy54rH8z28AGII8P8h4s/0kNVPGwPP9gEifm8ICIXtKpdnvbtkpAoCFEZvYf
 b5chm/3NQRA1R7c+yRxVs9YH15+XgYG0z/onVaVnjUxPXvme64v3RL+nt/ezimMo
 PXDBt5HRXa6lWIxM+g3oaGJ9/qFKwTrHykXgx3oPPWsphJMVW8ltt8sqg6sGuRJz
 fD/iRjpHIGAfD/2BX90TOMyYbC+s921rPU0+aQ70U5mPU+f8E1fI1HNVlsJiZ9NL
 Xhj9GOJzNQP2moql1dsDibZXhO0aIMfweHduXN7KRK88IPtnQdy1Sj6lhAJdJ2iH
 q1s5ShDx9gLLA2ecuL4COA9tQxTncnTZdsU4Y1bnSif0Iuct03L84ovaCSAuJ5BP
 XrwVo0Vk2albDpw8n2Dzq7Xquiewyb9IlaQ8U5B/tdKSpH4aAydy56PgdC+gHaZk
 6c2aBf0HKmg3qxsp/xb593cWloToPgsgB0KB2m7b+nEPBLP62obzBEeS8P5ahrJB
 UmPA7tw6BlYT93JttotFn+gykZjAELcbHkO8Yoe7JnVQMA8irFs=
 =ecyf
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkhL6N9vmSTjg0VSVO/dfN0HKlkAFAmYrkEYACgkQO/dfN0HK
 lkAj0w//WUXbB/gR3P47t6dNSX0P0qb3D/+AQ2+he/wo3mJ1msd3XtkpiUHcpP0k
 qtrYFrY5wQQ4lC82VMOWdlw7YY5E4ah2tbYCAUBpgpp3Lu7iD/muiLlfwQslwYUy
 PzwKH5HBQQmgtKgGX4miSeVk95TUyzE5m70dF4atIY8ydK73ZqiSV+IC9/cYah2y
 Q4Y9xJZnSMR1cKdMfTpYR0s9gPg5bB9yAKq9qB8TQfxMnW2A8wkhvwf675mJJCZ+
 spRTXzrcKp3thKWmDowTtzu/ONYTRcpQfgiE5MxzySnHQcv4nQnf3jxeT1+1K+pk
 5jI5bFAjkRVy1u1oDsbjHySdFyt1jZA6Klw9dlGf9EVjXfr0jbAUO++8T538CCfW
 UUSx2h83GTvSMvVaCDCtbYdlHZwxgLTwJvFDdcpm4nci9u3wKsfnoKw6doskt1fs
 Sp241F46Ck5embAdtv1FJaGYvH8PVe4j24slBWvn5vhN8TcP7mcsw8DCGjKWfSVS
 JQQxxjCmAyxQOZj+9k2v3wlRjmHoRQ0ELu7EbYQW3eyiYi3bWVQkK5Mg4Z0knwGl
 HWWum7LMwkM59hIAc771VGOz3jELGWXBTZWP8FcoRmtzcvBnzVjndDAiwH89xhR6
 Kym5JKrCkVvytee6xNxkBOIyuiavcB2qoZ7IqhHAYnqF9uhMx/E=
 =Ppg1
 -----END PGP SIGNATURE-----

Merge tag 'v1.101.0' into pmm-6401-read-prometheus-data-files

v1.101.0

Signed-off-by: hagen1778 <roman@victoriametrics.com>

# gpg: Signature made чт 25 кві 16:52:11 2024 CEST
# gpg:                using RSA key 9212FA37DBE64938E0D154953BF75F3741CA9640
# gpg: Good signature from "hagen1778 (VM GPG key) <roman@victoriametrics.com>" [ultimate]

# Conflicts:
#	go.mod
2024-04-26 13:30:14 +02:00
Aliaksandr Valialkin
e9642e99f2
all: replace old https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html url with the new one - https://docs.victoriametrics.com/single-server-victoriametrics/ 2024-04-18 03:11:03 +02:00
Aliaksandr Valialkin
b7b731d340
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files 2024-04-04 03:49:49 +03:00
Nikolay
a05303eaa0
lib/storage: adds metrics for downsampling (#382)
* lib/storage: adds metrics for downsampling
vm_downsampling_partitions_scheduled - shows the number of parts, that must be downsampled
vm_downsampling_partitions_scheduled_size_bytes - shows total size in bytes for parts, the must be donwsampled

These two metrics answer the questions - is downsampling running? how many parts scheduled for downsampling and how many of them currently downsampled? Storage space that it occupies.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-03-30 01:11:49 +02:00
Aliaksandr Valialkin
5c2f85f38d
vendor: run make vendor-update 2024-03-01 02:38:41 +02:00
Aliaksandr Valialkin
340638d4b0
app/vmstorage: cleanup after 9bad52b687 2024-02-23 04:55:17 +02:00
Aliaksandr Valialkin
9bad52b687
app/vmstorage: deprecate -snapshotCreateTimeout command-line flag
Creating snapshot shouldn't time out under normal conditions.
The timeout was related to the bug, which has been fixed in 6460475e3b .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
2024-02-23 04:49:23 +02:00
Aliaksandr Valialkin
8d9d7a8a12
app/vmstorage: expose vm_snapshots metric, which shows the current number of snapshots
While at it, refresh docs about snapshots - https://docs.victoriametrics.com/#how-to-work-with-snapshots
2024-02-22 18:32:57 +02:00
Aliaksandr Valialkin
6b9bedd0f9
app/vmstorage: expose vm_last_partition_parts metrics, which may help identifying performance issues related to the increased number of parts in the last partition 2024-02-15 14:51:19 +02:00
Aliaksandr Valialkin
0b503fba0b
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files 2024-01-30 22:55:54 +02:00
Aliaksandr Valialkin
bb7a419cc3
lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts with mixed types at once. For example,
  it could merge simultaneously an in-memory part plus a big file part.
  Such a merge could take hours for big file part. During the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  under the configured -inmemoryDataFlushInterval .

  Another common issue, which could happen when parts with mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradataion for queries, since every query needs to check ever growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers per every part type elegantly resolves both issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdown
  in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
  unavailable.

- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.

This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:27:47 +01:00
Aliaksandr Valialkin
c3a585cfe5
lib/storage: rename *AssistedMerges to *AssistedMergesCount in order to make these field names less misleading
These fields are counters, not gauges, so adding Count suffix to them makes easier to understand this while reading the code
2024-01-25 10:19:32 +02:00
Aliaksandr Valialkin
18df07e824
lib/mergeset: start assisted merge for file parts only if the number of file parts is bigger than maxFileParts
The maxFileParts usage has been accidentally removed in fa566c68a6

While at it, add Count suffix to *AssistedMerges counter names in order to make them less misleading.
Previously their names were falsely suggesting that these are gauges, which show the number of concurrently
executed assisted merges.
2024-01-24 15:08:42 +02:00
Aliaksandr Valialkin
3449d563bd
all: add up to 10% random jitter to the interval between periodic tasks performed by various components
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin
1f105dde98
all: allow dynamically reading *AuthKey flag values from files and urls
Examples:

1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url

The flag value is automatically updated when the file contents changes.
2024-01-21 22:03:38 +02:00
Aliaksandr Valialkin
7fc2bd0412
app/vmstorage: expose proper types for storage metrics when -metrics.exposeMetadata command-line flag is set
This is a follow-up for 326a77c697
2024-01-16 00:20:37 +02:00
Aliaksandr Valialkin
5106045048
app/vmstorage: deregister storage metrics before stopping the storage
This prevents from possible nil pointer dereference issues when the storage metrics
are read after the storage is stopped.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548
2024-01-15 16:11:45 +02:00
Aliaksandr Valialkin
31a3672982
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files 2023-10-02 22:36:23 +02:00
Aliaksandr Valialkin
717c53af27
lib/storage: stop exposing vm_merge_need_free_disk_space metric
This metric confuses users and has no any useful information.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686#issuecomment-1733844128
2023-09-25 16:52:39 +02:00
Aliaksandr Valialkin
f7d0d3a229
app/vmstorage: fix after 0c7d46d637: retentionPeriod.Msecs -> retentionPeriod.Milliseconds() 2023-09-09 06:20:42 +02:00
Aliaksandr Valialkin
af85055f3a
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files 2023-09-09 06:18:18 +02:00
Aliaksandr Valialkin
24d61bf193
lib/flagutil: add Duration.Milliseconds() convenience function after 0c7d46d637
This function is a faster replacement for Duration.Duration().Milliseconds() call
2023-09-03 10:56:44 +02:00
Dima Lazerka
0c7d46d637
flagutil: Make .Msecs private (#4906)
* Introduce flagutil.Duration

To avoid conversion bugs

* Fix tests

* Clarify documentation re. month=31 days

* Add fasttime.UnixTime() to obtain time.Time

The goal is to refactor out the last usage of `.Msecs`.

* Use fasttime for time.Now()

* wip

- Remove fasttime.UnixTime(), since it doesn't improve code readability and maintainability
- Run `make docs-sync` for syncing changes from README.md to docs/ folder
- Make lib/flagutil.Duration.Msec private
- Rename msecsPerMonth const to msecsPer31Days in order to be consistent with retention31Days

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-03 10:33:37 +02:00
Dima Lazerka
e0e856d2e7
Add flagutil.Duration to avoid conversion bugs (#4835)
* Introduce flagutil.Duration

To avoid conversion bugs

* Fix tests

* Comment why not .Seconds()
2023-09-01 09:27:51 +02:00
Aliaksandr Valialkin
9f1e9c54c8
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files 2023-07-26 14:59:49 -07:00
Nikolay
544fba6826
lib/storage: pre-create timeseries before indexDB rotation (#4652)
* lib/storage: pre-create timeseries before indexDB rotation
during an hour before indexDB rotation start creating records at the next indexDB
it must improve performance during switch for the next indexDB and remove ingestion issues.
Since there is no need for creation new index records for timeseries already ingested into current indexDB
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563

* lib/storage: further work on indexdb rotation optimization

- Document the change at docs/CHAGNELOG.md
- Move back various caches from indexDB to Storage. This makes the change less intrusive.
  The dateMetricIDCache now takes into account indexDB generation, so it stores (date, metricID)
  entries for both the current and the next indexDB.
- Consolidate the code responsible for idbNext pre-filling into prefillNextIndexDB() function.
  This improves code readability and maintainability a bit.
- Rewrite and simplify the code responsible for calculating the next retention timestamp.
  Add various tests for corner cases of this code.
- Remove indexdb pre-filling from RegisterMetricNames() function, since this function is rarely called.
  It is OK to add indexdb entries on demand in this function. This simplifies the code.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

* docs/CHANGELOG.md: refer to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-07-22 15:20:21 -07:00
Aliaksandr Valialkin
7094fa38bc
lib/storage: switch from global to per-day index for MetricName -> TSID mapping
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).

The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.

The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.

VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:

- If `storage/tsid` cache capacity isn't enough for active time series.
  Then just increase available memory for VictoriaMetrics or reduce the number of active time series
  ingested into VictoriaMetrics.

- If new time series is ingested into VictoriaMetrics. In this case it cannot find
  the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
  since it doesn't know that the index has no the corresponding entry too.
  This is a typical event under high churn rate, when old time series are constantly substituted
  with new time series.

Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.

Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.

This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.

The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do no change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.

This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .

At the same time the change fixes the issue, which could result in lost access to time series,
which stop receving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698

The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685

This is a follow-up for 1f28b46ae9
2023-07-13 16:07:30 -07:00
Aliaksandr Valialkin
a2e224593e
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files 2023-07-06 23:51:50 -07:00
Dmytro Kozlov
24f34347f1
docs: clarify -retentionPeriod flag usage (#4417)
app/vmstorage: clarify the min value for `-retentionPeriod` flag

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-06-09 09:46:25 +02:00
Aliaksandr Valialkin
43f0baabcd
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files 2023-05-18 12:34:50 -07:00
Alexander Marshalov
2e494e2375
fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 09:50:41 +02:00
Aliaksandr Valialkin
eba0e6dbc0
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files 2023-05-09 23:33:49 -07:00
Aliaksandr Valialkin
52006149b2
lib/storage: replace OpenStorage() with MustOpenStorage()
Callers of OpenStorage() log the returned error and exit.
The error logging and exit can be performed inside MustOpenStorage()
alongside with printing the stack trace for better debuggability.
This simplifies the code at caller side.
2023-04-14 23:02:40 -07:00
Aliaksandr Valialkin
e1211a1187
app/vmstorage: deprecate -bigMergeConcurrency command-line flag
Improperly configured -bigMergeConcurrency command-line flag usually leads to uncontrolled
growth of unmerged parts, which, in turn, increases CPU usage and query durations.

So it is better deprecating this flag. In rare cases -smallMergeConcurrency command-line flag
can be used instead for controlling the concurrency of background merges.
2023-04-13 20:40:24 -07:00