github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Artem Fetishev	3383589fd1	lib/storage: confirm that changing retention period can cause previous indexDB deletion (#7569 ) ### Describe Your Changes Add test cases proving that it is possible to lose indexDB after changing the retention period. See #7609 ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-11-21 10:44:21 +01:00
Artem Fetishev	6b9f57e5f7	lib/storage: Fix flaky test: TestStorageRotateIndexDB (#7267 ) This commit fixes the TestStorageRotateIndexDB flaky test reported at: #6977. Sample test failure: https://pastebin.com/bTSs8HP1 The test fails because one goroutine adds items to the indexDB table while another goroutine is closing that table. This may happen if indexDB rotation happens twice during one Storage.add() operation: - Storage.add() takes the current indexDB and adds index recods to it - First index db rotation makes the current index DB a previous one (still ok at this point) - Second index db rotation removes the indexDB that was current two rotations earlier. It does this by setting the mustDrop flag to true and decrementing the ref counter. The ref counter reaches zero which cases the underlying indexdb table to release its resources gracefully. Graceful release assumes that the table is not written anymore. But Storage.add() still adds items to it. The solution is to increment the indexDB ref counters while it is used inside add(). The unit test has been changed a little so that the test fails reliably. The idea is to make add() function invocation to last much longer, therefore the test inserts not just one record at a time but thouthands of them. To see the test fail, just replace the idbsLocked() func with: ```go unc (s Storage) idbsLocked2() (indexDB, *indexDB, func()) { return s.idbCurr.Load(), s.idbNext.Load(), func() {} } ``` --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>	2024-10-23 11:48:21 +02:00
Roman Khavronenko	0d4f4b8f7d	(app\|lib)/vmstorage: do not increment `vm_rows_ignored_total` on NaNs (#7166 ) `vm_rows_ignored_total` metric is a metric for users to signalize about ingestion issues, such as bad timestamp or parsing error. In commit `a5424e95b3` this metric started to increment each time vmstorage gets NaN. But NaN is a valid value for Prometheus data model and for Prometheus metrics exposition format. Exporters from Prometheus ecosystem could expose NaNs as values for metrics and these values will be delivered to vmstorage and increment the metric. Since there is nothing user can do with this, in opposite to parsing errors or bad timestamps, there is not much sense in incrementing this metric. So this commit rolls-back `reason="nan_value"` increments. ### Describe Your Changes Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-10-02 12:37:27 +02:00
Artem Fetishev	ed5da38ede	Introduce a flag for limiting the number of time series to delete (#7091 ) ### Describe Your Changes Introduce the `-search.maxDeleteSeries` flag that limits the number of time series that can be deleted with a single `/api/v1/admin/tsdb/delete_series` call. Currently, any number can be deleted and if the number is big (millions) then the operation may result in unaccounted CPU and memory usage spikes which in some cases may result in OOM kill (see #7027). The flag limits the number to 30k by default and the users may override it if needed at the vmstorage start time. --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-09-30 10:02:21 +02:00
Artem Fetishev	a5424e95b3	lib/storage: adds metrics that count records that failed to insert ### Describe Your Changes Add storage metrics that count records that failed to insert: - `RowsReceivedTotal`: the number of records that have been received by the storage from the clients - `RowsAddedTotal`: the number of records that have actually been persisted. This value must be equal to `RowsReceivedTotal` if all the records have been valid ones. But it will be smaller otherwise. The values of the metrics below should provide the insight of why some records hasn't been added - `NaNValueRows`: the number of records whose value was `NaN` - `StaleNaNValueRows`: the number of records whose value was `Stale NaN` - `InvalidRawMetricNames`: the number of records whose raw metric name has failed to unmarshal. The following metrics existed before this PR and are listed here for completeness: - `TooSmallTimestampRows`: the number of records whose timestamp is negative or is older than retention period - `TooBigTimestampRows`: the number of records whose timestamp is too far in the future. - `HourlySeriesLimitRowsDropped`: the number of records that have not been added because the hourly series limit has been exceeded. - `DailySeriesLimitRowsDropped`: the number of records that have not been added because the daily series limit has been exceeded. --- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-06 17:57:21 +02:00
Artem Fetishev	39294b4919	lib/storage: do not drop stale NaN samples (#6936 ) This patch reverts `1fd3385` After discussing it we've come to conclusion that this is a valid behavior which can be avoided by deleting the time series only once the corresponding stale NaNs have been received. On the other hand, the fix leads to lost stale NaNs in some rare but valid use cases. For example: - In a cluster configuration the samples for a given time series are normally sent to the same vmstorage replica. However, wminsert may reroute the samples to another replica because the original one is down or is overloaded. In this case the stale NaN may end up on a replica that has no data for that time series, but we still want to record that sample. Thus, reverting that fix. --- related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069 Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-05 16:45:09 +02:00
rtm0	2c856c6951	tests: check Metrics.RowsAddedTotal in unit tests (#6895 ) ### Describe Your Changes This is a follow-up PR: Unit tests introduced in #6872 can now use RowsAddedTotal counter whose scope was fixed in #6841. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-30 14:31:15 +02:00
rtm0	9fcfba3927	lib/storage: properly handle maxMetrics limit at metricID search `TL;DR` This PR improves the metric IDs search in IndexDB: - Avoid seaching for metric IDs twice when `maxMetrics` limit is exceeded - Use correct error type for indicating that the `maxMetrics` limit is exceded - Simplify the logic of deciding between per-day and global index search A unit test has been added to ensure that this refactoring does not break anything. --- Function calls before the fix: ``` idb.searchMetricIDs \|__ is.searchMetricIDs \|__ is.searchMetricIDsInternal \|__ is.updateMetricIDsForTagFilters \|__ is.tryUpdatingMetricIDsForDateRange \| \| \|__ is.getMetricIDsForDateAndFilters ``` - `searchMetricIDsInternal` searches metric IDs for each filter set. It maintains a metric ID set variable which is updated every time the `updateMetricIDsForTagFilters` function is called. After each successful call, the function checks the length of the updated metric ID set and if it is greater than `maxMetrics`, the function returns `too many timeseries` error. - `updateMetricIDsForTagFilters` uses either per-day or global index to search metric IDs for the given filter set. The decision of which index to use is made is made within the `tryUpdatingMetricIDsForDateRange` function and if it returns `fallback to global search` error then the function uses global index by calling `getMetricIDsForDateAndFilters` with zero date. - `tryUpdatingMetricIDsForDateRange` first checks if the given time range is larger than 40 days and if so returns `fallback to global search` error. Otherwise it proceeds to searching for metric IDs within that time range by calling `getMetricIDsForDateAndFilters` for each date. - `getMetricIDsForDateAndFilters` searches for metric IDs for the given date and returns `fallback to global search` error if the number of found metric IDs is greater than `maxMetrics`. Problems with this solution: 1. The `fallback to global search` error returned by `getMetricIDsForDateAndFilters` in case when maxMetrics is exceeded is misleading. 2. If `tryUpdatingMetricIDsForDateRange` proceeds to date range search and returns `fallback to global search` error (because `getMetricIDsForDateAndFilters` returns it) then this will trigger global search in `updateMetricIDsForTagFilters`. However the global search uses the same maxMetrics value which means this search is destined to fail too. I.e. the same search is performed twice and fails twice. 3. `too many timeseries` error is already handled in `searchMetricIDsInternal` and therefore handing this error in `updateMetricIDsForTagFilters` is redundant 4. updateMetricIDsForTagFilters is a better place to make a decision on whether to use per-day or global index. Solution: 1. Use a dedicated error for `too many timeseries` case 2. Handle `too many timeseries` error in `searchMetricIDsInternal` only 3. Move the per-day or global search decision from `tryUpdatingMetricIDsForDateRange` to `updateMetricIDsForTagFilters` and remove `fallback to global search` error. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-27 21:39:03 +02:00
rtm0	eef6943084	lib/storage: properly register index records with RegisterMetricNames Once the timeseries is in tsidCache, new entries won't be created in per-day index because the RegisterMetricNames() code does consider different dates for the same timeseries. So this case has been added. The same bug exists for AddRows() but it is not manifested because the index entries are finally created in updatePerDateData(). RegisterMetricNames also updated to increase the newTimeseriesCreated counter because it actually creates new time series in index. A unit tests has been added that check all possible data patterns (different metric names and dates) and code branches in both RegisterMetricNames and AddRows. The total number of new unit tests is around 100 which increaded the running time of storage tests by 50%. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>	2024-08-27 21:33:53 +02:00
rtm0	bdc0e688e8	Fix inconsistent error handling in Storage.AddRows() (#6583 ) ### Describe Your Changes `Storage.AddRows()` returns an error only in one case: when `Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But the same error is treated as a warning when it happens inside `Storage.add()` or returned by `Storage.prefillNextIndexDB()`. This commit fixes this inconsistency by treating the error returned by `Storage.updatePerDateData()` as a warning as well. As a result `Storage.add()` does not need a return value anymore and so doesn't `Storage.AddRows()`. Additionally, this commit adds a unit test that checks all cases that result in a row not being added to the storage. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-07-17 12:07:14 +02:00
rtm0	a42bd59ee4	Fix Date metricid cache consistency under concurrent use (#6534 ) ### Describe Your Changes Fix Date metricid cache consistency under concurrent use. When one goroutine calls Has() and does not find the cache entry in the immutable map it will acquire a lock and check the mutable map. And it is possible that before that lock is acquired, the entry is moved from the mutable map to the immutable map by another goroutine causing a cache miss. The fix is to check the immutable map again once the lock is acquired. ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-06-26 17:33:38 +02:00
Aliaksandr Valialkin	9bad52b687	app/vmstorage: deprecate -snapshotCreateTimeout command-line flag Creating snapshot shouldn't time out under normal conditions. The timeout was related to the bug, which has been fixed in `6460475e3b` . Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551	2024-02-23 04:49:23 +02:00
Aliaksandr Valialkin	431aa16c8d	lib/storage: keep (date, metricID) entries only for the last two dates Entries for the previous dates is usually not used, so there is little sense in keeping them in memory. This should reduce the size of storage/date_metricID cache, which can be monitored via vm_cache_entries{type="storage/date_metricID"} metric.	2024-01-29 18:43:59 +01:00
hagen1778	fd2d07ba33	lib/storage: follow-up after `188cfe3a85` `188cfe3a85` See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-17 15:45:14 +02:00
Ilya Trefilov	188cfe3a85	lib/storage: do not create tsid if metric contains stale marker(#5069 ) (#5174 ) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069	2023-10-17 15:30:58 +02:00
Dima Lazerka	0c7d46d637	flagutil: Make .Msecs private (#4906 ) * Introduce flagutil.Duration To avoid conversion bugs * Fix tests * Clarify documentation re. month=31 days * Add fasttime.UnixTime() to obtain time.Time The goal is to refactor out the last usage of `.Msecs`. * Use fasttime for time.Now() * wip - Remove fasttime.UnixTime(), since it doesn't improve code readability and maintainability - Run `make docs-sync` for syncing changes from README.md to docs/ folder - Make lib/flagutil.Duration.Msec private - Rename msecsPerMonth const to msecsPer31Days in order to be consistent with retention31Days --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-09-03 10:33:37 +02:00
Dima Lazerka	e0e856d2e7	Add flagutil.Duration to avoid conversion bugs (#4835 ) * Introduce flagutil.Duration To avoid conversion bugs * Fix tests * Comment why not .Seconds()	2023-09-01 09:27:51 +02:00
Aliaksandr Valialkin	9082a84566	lib/storage: update nextRotationTimestamp relative to the timestamp of the indexdb rotation Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563	2023-07-28 19:48:10 -07:00
Nikolay	544fba6826	lib/storage: pre-create timeseries before indexDB rotation (#4652 ) * lib/storage: pre-create timeseries before indexDB rotation during an hour before indexDB rotation start creating records at the next indexDB it must improve performance during switch for the next indexDB and remove ingestion issues. Since there is no need for creation new index records for timeseries already ingested into current indexDB https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563 * lib/storage: further work on indexdb rotation optimization - Document the change at docs/CHAGNELOG.md - Move back various caches from indexDB to Storage. This makes the change less intrusive. The dateMetricIDCache now takes into account indexDB generation, so it stores (date, metricID) entries for both the current and the next indexDB. - Consolidate the code responsible for idbNext pre-filling into prefillNextIndexDB() function. This improves code readability and maintainability a bit. - Rewrite and simplify the code responsible for calculating the next retention timestamp. Add various tests for corner cases of this code. - Remove indexdb pre-filling from RegisterMetricNames() function, since this function is rarely called. It is OK to add indexdb entries on demand in this function. This simplifies the code. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 * docs/CHANGELOG.md: refer to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563 --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-07-22 15:20:21 -07:00
Aliaksandr Valialkin	140e7b6b74	all: replace atomic.Value with atomic.Pointer[T] This eliminates the need in .(*T) casting for results obtained from Load() Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map, because map is already a pointer type.	2023-07-19 17:42:06 -07:00
Aliaksandr Valialkin	8815080030	app/vmselect/promql: add the ability to copy all the labels from `one` side of group_left()/group_right() operation This is performed by specifying `` inside group_left()/group_right(). Also allow specifying prefix for the copied labels via `group_left(...) prefix "..."` and `group_right(...) prefix "..."` syntax. For example, the following query adds all the namespace-related labels to pod info, and prefixes all the copied label names with "ns_" prefix: kube_pod_info on(namespace) group_left(*) prefix "ns_" kube_namespace_labels This resolves the following StackOverflow questions: - https://stackoverflow.com/questions/76661818/how-to-add-namespace-labels-to-pod-labels-in-prometheus - https://stackoverflow.com/questions/76653997/how-can-i-make-a-new-copy-of-kube-namespace-labels-metric-with-a-different-name	2023-07-17 19:07:39 -07:00
Aliaksandr Valialkin	7094fa38bc	lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping Previously all the newly ingested time series were registered in global `MetricName -> TSID` index. This index was used during data ingestion for locating the TSID (internal series id) for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names). The `MetricName -> TSID` index is stored on disk in order to make sure that the data isn't lost on VictoriaMetrics restart or unclean shutdown. The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache, and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics uses in-memory cache for speeding up the lookup for active time series. This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk. VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases: - If `storage/tsid` cache capacity isn't enough for active time series. Then just increase available memory for VictoriaMetrics or reduce the number of active time series ingested into VictoriaMetrics. - If new time series is ingested into VictoriaMetrics. In this case it cannot find the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index, since it doesn't know that the index has no the corresponding entry too. This is a typical event under high churn rate, when old time series are constantly substituted with new time series. Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index, are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics. Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName` for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod. This index can become very large under high churn rate and long retention. VictoriaMetrics caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups. The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series. This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics consults only the index for the current day when new time series is ingested into it. The downside of this change is increased indexdb size on disk for workloads without high churn rate, e.g. with static time series, which do no change over time, since now VictoriaMetrics needs to store identical `MetricName -> TSID` entries for static time series for every day. This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation, since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 . At the same time the change fixes the issue, which could result in lost access to time series, which stop receving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685 This is a follow-up for `1f28b46ae9`	2023-07-13 16:07:30 -07:00
Aliaksandr Valialkin	3b50b94f7a	lib/storage: fix possible test failure in TestStorageAddRowsConcurrent The number of parts in the snapshot partition may be zero if concurrent goroutine just started creating new partition, but didn't put data into it yet when the current goroutine made a snapshot.	2023-07-13 15:03:45 -07:00
Zakhar Bessarab	242050ba94	lib/storage: follow-up after `a50d63c376` (#4289 ) * lib/storage: follow-up after `a50d63c376` - ensure retentionMsecs is rounded to day - remove localTimeOffset in test as localOffset is ignored when using `UnixMilli` Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage: restore retention timezone offset effect on retention deadline Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-05-16 17:14:08 +02:00
Zakhar Bessarab	aca256735c	lib/storage: fix indexdb rotation infinite loop (#4249 ) When using `retentionTimezoneOffset` and having local timezone being more than 4 hours different from UTC indexdb retention calculation could return negative value. This caused indexdb rotation to get in loop. Fix calculation of offset to use `retentionTimezoneOffset` value properly and add test to cover all legit timezone configs. See: - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4207 - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4206 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-05-04 17:16:48 +02:00
Aliaksandr Valialkin	52006149b2	lib/storage: replace OpenStorage() with MustOpenStorage() Callers of OpenStorage() log the returned error and exit. The error logging and exit can be performed inside MustOpenStorage() alongside with printing the stack trace for better debuggability. This simplifies the code at caller side.	2023-04-14 23:02:40 -07:00
Aliaksandr Valialkin	df619bdff0	all: consistently use fs.MustClose() for closing lock files	2023-04-14 20:14:21 -07:00
Aliaksandr Valialkin	c8f2febaa1	lib/storage: consistently use OS-independent separator in file paths This is needed for Windows support, which uses `\` instead of `/` as file separator Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-25 14:33:58 -07:00
Zakhar Bessarab	39cdc546dd	lib/storage: enhancements for snapshots process (#3873 ) * lib/{fs,mergeset,storage}: skip `.must-remove.` dirs when creating snapshot (#3858) * lib/{mergeset,storage}: add timeout configuration for snapshots creation, remove incomplete snapshots from storage * docs: fix formatting * app/vmstorage: add metrics to track status of snapshots * app/vmstorage: use `vm_http_requests_total` metric for snapshot endpoints metrics, rename new flag to make name more clear Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmstorage: update flag name in docs Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmstorage: reflect new metrics names change in docs Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-02-27 12:12:03 -08:00
Nikolay	9254e494f9	lib/storage: fixes finalDedup for backfilled data (#3737 ) previously historical data backfilling may trigger force merge for previous month every hour it consumes cpu, disk io and decrease cluster performance. Following commit fixes it by applying deduplication for InMemoryParts	2023-02-01 09:54:21 -08:00
Aliaksandr Valialkin	ba5a6c851c	lib/storage: use deterministic random generator in tests Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683	2023-01-23 20:10:32 -08:00
Aliaksandr Valialkin	8189770c50	all: add `-inmemoryDataFlushInterval` command-line flag for controlling the frequency of saving in-memory data to disk The main purpose of this command-line flag is to increase the lifetime of low-end flash storage with the limited number of write operations it can perform. Such flash storage is usually installed on Raspberry PI or similar appliances. For example, `-inmemoryDataFlushInterval=1h` reduces the frequency of disk write operations to up to once per hour if the ingested one-hour worth of data fits the limit for in-memory data. The in-memory data is searchable in the same way as the data stored on disk. VictoriaMetrics automatically flushes the in-memory data to disk on graceful shutdown via SIGINT signal. The in-memory data is lost on unclean shutdown (hardware power loss, OOM crash, SIGKILL). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337	2022-12-05 15:16:14 -08:00
Aliaksandr Valialkin	299285b147	lib/storage: fix TestUpdateCurrHourMetricIDs test when it runs on the first hour of the day by UTC	2022-12-02 18:52:37 -08:00
匠心零度	fa0ce10275	lib/storage: remove extra error check (#3396 )	2022-11-28 16:43:31 -08:00
Aliaksandr Valialkin	daa70e6560	lib/storage: follow-up for `790768f20b` - Document the bugfix at docs/CHANGELOG.md - Simplify the bugfix a bit	2022-11-07 14:04:08 +02:00
Aliaksandr Valialkin	dd88c628aa	lib/storage: remove unused isFull field from hourMetricIDs struct	2022-11-07 13:58:26 +02:00
Łukasz Marszał	790768f20b	Fix issue-3309 - currHourMetricIDs shouldn't contain metrics from prev hour (#3320 ) * fix issue-3309 currHourMetricIDs shouldn't contain metrics from prev hour * Update storage.go	2022-11-07 13:55:37 +02:00
Aliaksandr Valialkin	e1b8059086	lib/vmselectapi: rename deleteMetrics to more correct deleteSeries	2022-07-06 12:37:54 +03:00
Aliaksandr Valialkin	a350d1e81c	lib/storage: return marshaled metric names from SearchMetricNames Previously SearchMetricNames was returning unmarshaled metric names. This wasn't great for vmstorage, which should spend additional CPU time for marshaling the metric names before sending them to vmselect. While at it, remove possible duplicate metric names, which could occur when multiple samples for new time series are ingested via concurrent requests. Also sort the metric names before returning them to the client. This simplifies debugging of the returned metric names across repeated requests to /api/v1/series	2022-06-28 18:17:15 +03:00
Aliaksandr Valialkin	ba514284f1	lib/storage: add querytracer to more contexts querytracer has been added to the following storage.Storage methods: - RegisterMetricNames - DeleteMetrics - SearchTagValueSuffixes - SearchGraphitePaths	2022-06-27 13:45:51 +03:00
Aliaksandr Valialkin	52cf05c6d2	lib/storage: test GetTSDBStatusWithFiltersForDate on a global time range	2022-06-12 14:27:40 +03:00
Aliaksandr Valialkin	374beb350e	app/vmselect: optimize `/api/v1/labels` and `/api/v1/label/.../values` handlers when `match[]` query arg is passed to them	2022-06-12 04:32:13 +03:00
Aliaksandr Valialkin	41958ed5dd	all: add initial support for query tracing See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#query-tracing Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403	2022-06-01 02:29:23 +03:00
Artem Navoiev	37cf509c3a	lib/{storage,flagutil} - Add option for snapshot autoremoval (#2487 ) * lib/{storage,flagutil} - Add option for snapshot autoremoval - add prometheus-like duration as command flag - add option to delete stale snapshots - update duration.go flag to re-use own code * wip * lib/flagutil: re-use Duration.Set() call in NewDuration * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-05-02 11:00:15 +03:00
Roman Khavronenko	cf1a8bce6b	lib/index: reduce read/write load after indexDB rotation (#2177 ) * lib/index: reduce read/write load after indexDB rotation IndexDB in VM is responsible for storing TSID - ID's used for identifying time series. The index is stored on disk and used by both ingestion and read path. IndexDB is stored separately to data parts and is global for all stored data. It can't be deleted partially as VM deletes data parts. Instead, indexDB is rotated once in `retention` interval. The rotation procedure means that `current` indexDB becomes `previous`, and new freshly created indexDB struct becomes `current`. So in any time, VM holds indexDB for current and previous retention periods. When time series is ingested or queried, VM checks if its TSID is present in `current` indexDB. If it is missing, it checks the `previous` indexDB. If TSID was found, it gets copied to the `current` indexDB. In this way `current` indexDB stores only series which were active during the retention period. To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both write and read path consult `tsidCache` and on miss the relad lookup happens. When rotation happens, VM resets the `tsidCache`. This is needed for ingestion path to trigger `current` indexDB re-population. Since index re-population requires additional resources, every index rotation event may cause some extra load on CPU and disk. While it may be unnoticeable for most of the cases, for systems with very high number of unique series each rotation may lead to performance degradation for some period of time. This PR makes an attempt to smooth out resource usage after the rotation. The changes are following: 1. `tsidCache` is no longer reset after the rotation; 2. Instead, each entry in `tsidCache` gains a notion of indexDB to which they belong; 3. On ingestion path after the rotation we check if requested TSID was found in `tsidCache`. Then we have 3 branches: 3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID. 3.2 Slow path. It wasn't found, so we generate it from scratch, add to `current` indexDB, add it to `tsidCache`. 3.3 Smooth path. It was found but does not belong to the `current` indexDB. In this case, we add it to the `current` indexDB with some probability. The probability is based on time passed since the last rotation with some threshold. The more time has passed since rotation the higher is chance to re-populate `current` indexDB. The default re-population interval in this PR is set to `1h`, during which entries from `previous` index supposed to slowly re-populate `current` index. The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs were moved from `previous` indexDB to the `current` indexDB. This metric supposed to grow only during the first `1h` after the last rotation. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin	afafeb379a	all: typo fix: unexected -> unexpected	2021-12-20 17:39:52 +02:00
Aliaksandr Valialkin	e1a715b0f5	lib/storage: convert alternate regexps into Graphite wildcards inside `__graphite__` pseudo-label For example, `{__graphite__=~"foo.(bar\|baz)"}` is automatically converted to `{__graphite__=~"foo.{bar,baz}"}` before execution. This allows using multi-value Grafana template variables such as `{__graphite__=~"foo.($app)"}`.	2021-12-14 19:51:49 +02:00
Aliaksandr Valialkin	2d8bd41f8a	lib/storage: reduce memory allocations when syncing dateMetricIDCache	2021-06-03 16:20:42 +03:00
Aliaksandr Valialkin	ad73f226ff	app/vmstorage: add ability to limit series cardinality via `-storage.maxHourlySeries` and `-storage.maxDailySeries` command-line flags	2021-05-20 14:15:19 +03:00
Aliaksandr Valialkin	12d733dd5d	app/vminsert: add support for data ingestion via other vminsert nodes	2021-05-08 19:52:57 +03:00

1 2

79 commits