### Describe Your Changes
Add test cases proving that it is possible to lose indexDB after
changing the retention period. See #7609
### Checklist
The following checks are **mandatory**:
- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
This commit fixes the TestStorageRotateIndexDB flaky test reported at:
#6977. Sample test failure: https://pastebin.com/bTSs8HP1
The test fails because one goroutine adds items to the indexDB table
while another goroutine is closing that table. This may happen if
indexDB rotation happens twice during one Storage.add() operation:
- Storage.add() takes the current indexDB and adds index recods to it
- First index db rotation makes the current index DB a previous one
(still ok at this point)
- Second index db rotation removes the indexDB that was current two
rotations earlier. It does this by setting the mustDrop flag to true and
decrementing the ref counter. The ref counter reaches zero which cases
the underlying indexdb table to release its resources gracefully.
Graceful release assumes that the table is not written anymore. But
Storage.add() still adds items to it.
The solution is to increment the indexDB ref counters while it is used
inside add().
The unit test has been changed a little so that the test fails reliably.
The idea is to make add() function invocation to last much longer,
therefore the test inserts not just one record at a time but thouthands
of them.
To see the test fail, just replace the idbsLocked() func with:
```go
unc (s *Storage) idbsLocked2() (*indexDB, *indexDB, func()) {
return s.idbCurr.Load(), s.idbNext.Load(), func() {}
}
```
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
`vm_rows_ignored_total` metric is a metric for users to signalize about
ingestion issues, such as bad timestamp or parsing error.
In commit
a5424e95b3
this metric started to increment each time vmstorage gets NaN. But NaN
is a valid value for Prometheus data model and for Prometheus metrics
exposition format. Exporters from Prometheus ecosystem could expose NaNs
as values for metrics and these values will be delivered to vmstorage
and increment the metric.
Since there is nothing user can do with this, in opposite to parsing
errors or bad timestamps, there is not much sense in incrementing this
metric. So this commit rolls-back `reason="nan_value"` increments.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Introduce the `-search.maxDeleteSeries` flag that limits the number of
time series that can be deleted with a single
`/api/v1/admin/tsdb/delete_series` call.
Currently, any number can be deleted and if the number is big (millions)
then the operation may result in unaccounted CPU and memory usage spikes
which in some cases may result in OOM kill (see #7027). The flag limits
the number to 30k by default and the users may override it if needed at
the vmstorage start time.
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
### Describe Your Changes
Add storage metrics that count records that failed to insert:
- `RowsReceivedTotal`: the number of records that have been received by
the storage from the clients
- `RowsAddedTotal`: the number of records that have actually been
persisted. This value must be equal to `RowsReceivedTotal` if all the
records have been valid ones. But it will be smaller otherwise. The
values of the metrics below should provide the insight of why some
records hasn't been added
- `NaNValueRows`: the number of records whose value was `NaN`
- `StaleNaNValueRows`: the number of records whose value was `Stale NaN`
- `InvalidRawMetricNames`: the number of records whose raw metric name
has failed to unmarshal.
The following metrics existed before this PR and are listed here for
completeness:
- `TooSmallTimestampRows`: the number of records whose timestamp is
negative or is older than retention period
- `TooBigTimestampRows`: the number of records whose timestamp is too
far in the future.
- `HourlySeriesLimitRowsDropped`: the number of records that have not
been added because the hourly series limit has been exceeded.
- `DailySeriesLimitRowsDropped`: the number of records that have not
been added because the daily series limit has been exceeded.
---
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
This patch reverts 1fd3385
After discussing it we've come to conclusion that this is a valid
behavior which can be avoided by deleting the time series only once the
corresponding stale NaNs have been received.
On the other hand, the fix leads to lost stale NaNs in some rare but
valid use cases. For example:
- In a cluster configuration the samples for a given time series are
normally sent to the same vmstorage replica. However, wminsert may
reroute the samples to another replica because the original one is down
or is overloaded. In this case the stale NaN may end up on a replica
that has no data for that time series, but we still want to record that
sample.
Thus, reverting that fix.
---
related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
### Describe Your Changes
This is a follow-up PR: Unit tests introduced in #6872 can now use
RowsAddedTotal counter whose scope was fixed in #6841.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
`TL;DR` This PR improves the metric IDs search in IndexDB:
- Avoid seaching for metric IDs twice when `maxMetrics` limit is
exceeded
- Use correct error type for indicating that the `maxMetrics` limit is
exceded
- Simplify the logic of deciding between per-day and global index search
A unit test has been added to ensure that this refactoring does not
break anything.
---
Function calls before the fix:
```
idb.searchMetricIDs
|__ is.searchMetricIDs
|__ is.searchMetricIDsInternal
|__ is.updateMetricIDsForTagFilters
|__ is.tryUpdatingMetricIDsForDateRange
| |
|__ is.getMetricIDsForDateAndFilters
```
- `searchMetricIDsInternal` searches metric IDs for each filter set. It
maintains a metric ID set variable which is updated every time the
`updateMetricIDsForTagFilters` function is called. After each successful
call, the function checks the length of the updated metric ID set and if
it is greater than `maxMetrics`, the function returns `too many
timeseries` error.
- `updateMetricIDsForTagFilters` uses either per-day or global index to
search metric IDs for the given filter set. The decision of which index
to use is made is made within the `tryUpdatingMetricIDsForDateRange`
function and if it returns `fallback to global search` error then the
function uses global index by calling `getMetricIDsForDateAndFilters`
with zero date.
- `tryUpdatingMetricIDsForDateRange` first checks if the given time
range is larger than 40 days and if so returns `fallback to global
search` error. Otherwise it proceeds to searching for metric IDs within
that time range by calling `getMetricIDsForDateAndFilters` for each
date.
- `getMetricIDsForDateAndFilters` searches for metric IDs for the given
date and returns `fallback to global search` error if the number of
found metric IDs is greater than `maxMetrics`.
Problems with this solution:
1. The `fallback to global search` error returned by
`getMetricIDsForDateAndFilters` in case when maxMetrics is exceeded is
misleading.
2. If `tryUpdatingMetricIDsForDateRange` proceeds to date range search
and returns `fallback to global search` error (because
`getMetricIDsForDateAndFilters` returns it) then this will trigger
global search in `updateMetricIDsForTagFilters`. However the global
search uses the same maxMetrics value which means this search is
destined to fail too. I.e. the same search is performed twice and fails
twice.
3. `too many timeseries` error is already handled in
`searchMetricIDsInternal` and therefore handing this error in
`updateMetricIDsForTagFilters` is redundant
4. updateMetricIDsForTagFilters is a better place to make a decision on
whether to use per-day or global index.
Solution:
1. Use a dedicated error for `too many timeseries` case
2. Handle `too many timeseries` error in `searchMetricIDsInternal` only
3. Move the per-day or global search decision from
`tryUpdatingMetricIDsForDateRange` to `updateMetricIDsForTagFilters` and
remove `fallback to global search` error.
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
Once the timeseries is in tsidCache, new entries won't be created in
per-day index because the RegisterMetricNames() code does consider
different dates for the same timeseries. So this case has been added.
The same bug exists for AddRows() but it is not manifested because the
index entries are finally created in updatePerDateData().
RegisterMetricNames also updated to increase the newTimeseriesCreated
counter because it actually creates new time series in index.
A unit tests has been added that check all possible data patterns
(different metric names and dates) and code branches in both
RegisterMetricNames and AddRows. The total number of new unit tests is
around 100 which increaded the running time of storage tests by 50%.
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
### Describe Your Changes
`Storage.AddRows()` returns an error only in one case: when
`Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But
the same error is treated as a warning when it happens inside
`Storage.add()` or returned by `Storage.prefillNextIndexDB()`.
This commit fixes this inconsistency by treating the error returned by
`Storage.updatePerDateData()` as a warning as well. As a result
`Storage.add()` does not need a return value anymore and so doesn't
`Storage.AddRows()`.
Additionally, this commit adds a unit test that checks all cases that
result in a row not being added to the storage.
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
### Describe Your Changes
Fix Date metricid cache consistency under concurrent use.
When one goroutine calls Has() and does not find the cache entry in the
immutable map it will acquire a lock and check the mutable map. And it
is possible that before that lock is acquired, the entry is moved from
the mutable map to the immutable map by another goroutine causing a
cache miss.
The fix is to check the immutable map again once the lock is acquired.
### Checklist
The following checks are **mandatory**:
- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
Entries for the previous dates is usually not used, so there is little sense in keeping them in memory.
This should reduce the size of storage/date_metricID cache, which can be monitored
via vm_cache_entries{type="storage/date_metricID"} metric.
* Introduce flagutil.Duration
To avoid conversion bugs
* Fix tests
* Clarify documentation re. month=31 days
* Add fasttime.UnixTime() to obtain time.Time
The goal is to refactor out the last usage of `.Msecs`.
* Use fasttime for time.Now()
* wip
- Remove fasttime.UnixTime(), since it doesn't improve code readability and maintainability
- Run `make docs-sync` for syncing changes from README.md to docs/ folder
- Make lib/flagutil.Duration.Msec private
- Rename msecsPerMonth const to msecsPer31Days in order to be consistent with retention31Days
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/storage: pre-create timeseries before indexDB rotation
during an hour before indexDB rotation start creating records at the next indexDB
it must improve performance during switch for the next indexDB and remove ingestion issues.
Since there is no need for creation new index records for timeseries already ingested into current indexDB
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563
* lib/storage: further work on indexdb rotation optimization
- Document the change at docs/CHAGNELOG.md
- Move back various caches from indexDB to Storage. This makes the change less intrusive.
The dateMetricIDCache now takes into account indexDB generation, so it stores (date, metricID)
entries for both the current and the next indexDB.
- Consolidate the code responsible for idbNext pre-filling into prefillNextIndexDB() function.
This improves code readability and maintainability a bit.
- Rewrite and simplify the code responsible for calculating the next retention timestamp.
Add various tests for corner cases of this code.
- Remove indexdb pre-filling from RegisterMetricNames() function, since this function is rarely called.
It is OK to add indexdb entries on demand in this function. This simplifies the code.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
* docs/CHANGELOG.md: refer to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This eliminates the need in .(*T) casting for results obtained from Load()
Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map,
because map is already a pointer type.
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).
The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.
The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.
VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:
- If `storage/tsid` cache capacity isn't enough for active time series.
Then just increase available memory for VictoriaMetrics or reduce the number of active time series
ingested into VictoriaMetrics.
- If new time series is ingested into VictoriaMetrics. In this case it cannot find
the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
since it doesn't know that the index has no the corresponding entry too.
This is a typical event under high churn rate, when old time series are constantly substituted
with new time series.
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.
Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.
This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.
The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do no change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.
This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .
At the same time the change fixes the issue, which could result in lost access to time series,
which stop receving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698
The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685
This is a follow-up for 1f28b46ae9
The number of parts in the snapshot partition may be zero if concurrent goroutine just
started creating new partition, but didn't put data into it yet when the current
goroutine made a snapshot.
* lib/storage: follow-up after a50d63c376
- ensure retentionMsecs is rounded to day
- remove localTimeOffset in test as localOffset is ignored when using `UnixMilli`
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage: restore retention timezone offset effect on retention deadline
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
When using `retentionTimezoneOffset` and having local timezone being more than 4 hours different from UTC indexdb retention calculation could return negative value. This caused indexdb rotation to get in loop.
Fix calculation of offset to use `retentionTimezoneOffset` value properly and add test to cover all legit timezone configs.
See:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4207
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4206
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
Callers of OpenStorage() log the returned error and exit.
The error logging and exit can be performed inside MustOpenStorage()
alongside with printing the stack trace for better debuggability.
This simplifies the code at caller side.
* lib/{fs,mergeset,storage}: skip `.must-remove.` dirs when creating snapshot (#3858)
* lib/{mergeset,storage}: add timeout configuration for snapshots creation, remove incomplete snapshots from storage
* docs: fix formatting
* app/vmstorage: add metrics to track status of snapshots
* app/vmstorage: use `vm_http_requests_total` metric for snapshot endpoints metrics, rename new flag to make name more clear
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmstorage: update flag name in docs
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmstorage: reflect new metrics names change in docs
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
previously historical data backfilling may trigger force merge for previous month every hour
it consumes cpu, disk io and decrease cluster performance.
Following commit fixes it by applying deduplication for InMemoryParts
The main purpose of this command-line flag is to increase the lifetime of low-end flash storage
with the limited number of write operations it can perform. Such flash storage is usually
installed on Raspberry PI or similar appliances.
For example, `-inmemoryDataFlushInterval=1h` reduces the frequency of disk write operations
to up to once per hour if the ingested one-hour worth of data fits the limit for in-memory data.
The in-memory data is searchable in the same way as the data stored on disk.
VictoriaMetrics automatically flushes the in-memory data to disk on graceful shutdown via SIGINT signal.
The in-memory data is lost on unclean shutdown (hardware power loss, OOM crash, SIGKILL).
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Previously SearchMetricNames was returning unmarshaled metric names.
This wasn't great for vmstorage, which should spend additional CPU time
for marshaling the metric names before sending them to vmselect.
While at it, remove possible duplicate metric names, which could occur when
multiple samples for new time series are ingested via concurrent requests.
Also sort the metric names before returning them to the client.
This simplifies debugging of the returned metric names across repeated requests to /api/v1/series
querytracer has been added to the following storage.Storage methods:
- RegisterMetricNames
- DeleteMetrics
- SearchTagValueSuffixes
- SearchGraphitePaths
* lib/{storage,flagutil} - Add option for snapshot autoremoval
- add prometheus-like duration as command flag
- add option to delete stale snapshots
- update duration.go flag to re-use own code
* wip
* lib/flagutil: re-use Duration.Set() call in NewDuration
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/index: reduce read/write load after indexDB rotation
IndexDB in VM is responsible for storing TSID - ID's used for identifying
time series. The index is stored on disk and used by both ingestion and read path.
IndexDB is stored separately to data parts and is global for all stored data.
It can't be deleted partially as VM deletes data parts. Instead, indexDB is
rotated once in `retention` interval.
The rotation procedure means that `current` indexDB becomes `previous`,
and new freshly created indexDB struct becomes `current`. So in any time,
VM holds indexDB for current and previous retention periods.
When time series is ingested or queried, VM checks if its TSID is present
in `current` indexDB. If it is missing, it checks the `previous` indexDB.
If TSID was found, it gets copied to the `current` indexDB. In this way
`current` indexDB stores only series which were active during the retention
period.
To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read path consult `tsidCache` and on miss the relad lookup happens.
When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable for most of the cases,
for systems with very high number of unique series each rotation may lead
to performance degradation for some period of time.
This PR makes an attempt to smooth out resource usage after the rotation.
The changes are following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of indexDB to which
they belong;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since rotation the higher is chance to re-populate `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
`previous` index supposed to slowly re-populate `current` index.
The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from `previous` indexDB to the `current` indexDB. This metric supposed to
grow only during the first `1h` after the last rotation.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
For example, `{__graphite__=~"foo.(bar|baz)"}` is automatically converted to `{__graphite__=~"foo.{bar,baz}"}` before execution.
This allows using multi-value Grafana template variables such as `{__graphite__=~"foo.($app)"}`.