github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	140e7b6b74	all: replace atomic.Value with atomic.Pointer[T] This eliminates the need in .(*T) casting for results obtained from Load() Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map, because map is already a pointer type.	2023-07-19 17:42:06 -07:00
Aliaksandr Valialkin	6685f6ce7c	lib/storage: move series registration in caches from createAllIndexesForMetricName into a separate function - putSeriesToCache This makes the code more clear and easier to read This is a follow-up for `7094fa38bc`	2023-07-13 23:13:23 -07:00
Aliaksandr Valialkin	443661a5da	lib/storage: properly free up resources from newTestStorage() by calling stopTestStorage()	2023-07-13 17:13:24 -07:00
Aliaksandr Valialkin	7094fa38bc	lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping Previously all the newly ingested time series were registered in global `MetricName -> TSID` index. This index was used during data ingestion for locating the TSID (internal series id) for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names). The `MetricName -> TSID` index is stored on disk in order to make sure that the data isn't lost on VictoriaMetrics restart or unclean shutdown. The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache, and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics uses in-memory cache for speeding up the lookup for active time series. This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk. VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases: - If `storage/tsid` cache capacity isn't enough for active time series. Then just increase available memory for VictoriaMetrics or reduce the number of active time series ingested into VictoriaMetrics. - If new time series is ingested into VictoriaMetrics. In this case it cannot find the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index, since it doesn't know that the index has no the corresponding entry too. This is a typical event under high churn rate, when old time series are constantly substituted with new time series. Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index, are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics. Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName` for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod. This index can become very large under high churn rate and long retention. VictoriaMetrics caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups. The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series. This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics consults only the index for the current day when new time series is ingested into it. The downside of this change is increased indexdb size on disk for workloads without high churn rate, e.g. with static time series, which do no change over time, since now VictoriaMetrics needs to store identical `MetricName -> TSID` entries for static time series for every day. This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation, since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 . At the same time the change fixes the issue, which could result in lost access to time series, which stop receving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685 This is a follow-up for `1f28b46ae9`	2023-07-13 16:07:30 -07:00
Aliaksandr Valialkin	1f28b46ae9	lib/storage: revert the migration from global to per-day index for (MetricName -> TSID) This reverts the following commits: - `e0e16a2d36` - `2ce02a7fe6` The reason for revert: the updated logic breaks assumptions made when fixing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 . For example, if a time series stop receiving new samples during the first day after the indexdb rotation, there are chances that the time series won't be registered in the new indexdb. This is OK until the next indexdb rotation, since the time series is registered in the previous indexdb, so it can be found during queries. But the time series will become invisible for search after the next indexdb rotation, while its data is still there. There is also incompletely solved issue with the increased CPU and disk IO resource usage just after the indexdb rotation. There was an attempt to fix it, but it didn't fix it in full, while introducing the issue mentioned above. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 TODO: to find out the solution, which simultaneously solves the following issues: - increased memory usage for setups high churn rate and long retention (e.g. what the reverted commit does) - increased CPU and disk IO usage during indexdb rotation ( https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 ) - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 Possible solution - to create the new indexdb in one hour before the indexdb rotation and to gradually pre-populate it with the needed index data during the last hour before indexdb rotation. Then the new indexdb will contain all the needed data just after the rotation, so it won't trigger increased CPU and disk IO.	2023-05-18 11:30:49 -07:00
Aliaksandr Valialkin	e0e16a2d36	lib/storage: follow-up after `2ce02a7fe6` - Document the change at docs/CHANGELOG.md - Clarify comments for non-trivial code touched by the commit - Improve the logic behind maybeCreateIndexes(): - Correctly create per-day indexes if the indexdb rotation is performed during the first hour or the last hour of the day by UTC. Previously there was a possibility of missing index entries on that day. - Increase the duration for creating new indexes in the current indexdb for up to 22 hours after indexdb rotation. This should reduce the increased resource usage after indexdb rotation. It is safe to postpone index creation for the current day until the last hour of the current day after indexdb rotation by UTC, since the corresponding (date, ...) entries exist in the previous indexdb. - Search for TSID by (date, MetricName) in both the current and the previous indexdb. Previously the search was performed only in the current indexdb. This could lead to excess creation of per-day indexes for the current day just after indexdb rotation. - Search for (date, metricID) entries in both the current and the previous indexdb. Previously the search was performed only in the current indexdb. This could lead to excess creation of per-day indexes for the current day just after indexdb rotation.	2023-05-16 23:19:27 -07:00
Roman Khavronenko	2ce02a7fe6	lib/storage: introduce per-day MetricName=>TSID index (#4252 ) The new index substitutes global MetricName=>TSID index used for locating TSIDs on ingestion path. For installations with high ingestion and churn rate, global MetricName=>TSID index can grow enormously making index lookups too expensive. This also results into bigger than expected cache growth for indexdb blocks. New per-day index supposed to be much smaller and more efficient. This should improve ingestion speed and reliability during re-routings in cluster. The negative outcome could be occupied disk size, since per-day index is more expensive comparing to global index. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-05-16 15:46:42 -07:00
Aliaksandr Valialkin	278278af95	lib/storage: reduce the unimportant logging during Storage start / stop This should improve the visibility of potentially important logs	2023-05-16 15:14:21 -07:00
Aliaksandr Valialkin	09b403d38a	lib/{mergeset,storage}: make it clear that DebugFlush() doesn't store all the recently ingested data to disk DebugFlush() makes sure that the recently ingested data becomes visible to search. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4272	2023-05-16 11:50:17 -07:00
Zakhar Bessarab	242050ba94	lib/storage: follow-up after `a50d63c376` (#4289 ) * lib/storage: follow-up after `a50d63c376` - ensure retentionMsecs is rounded to day - remove localTimeOffset in test as localOffset is ignored when using `UnixMilli` Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage: restore retention timezone offset effect on retention deadline Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-05-16 17:14:08 +02:00
Nikolay	8f4de6fa47	lib/storage: properly update link for entry at dateMetricID cache (#4258 ) previously during sync for mutable and immutable cache parts, link for hotEntry with current date may be not properly updated it corrupts cache for backfilling metrics and increased cpu load	2023-05-05 21:45:47 -07:00
Zakhar Bessarab	aca256735c	lib/storage: fix indexdb rotation infinite loop (#4249 ) When using `retentionTimezoneOffset` and having local timezone being more than 4 hours different from UTC indexdb retention calculation could return negative value. This caused indexdb rotation to get in loop. Fix calculation of offset to use `retentionTimezoneOffset` value properly and add test to cover all legit timezone configs. See: - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4207 - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4206 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-05-04 17:16:48 +02:00
Aliaksandr Valialkin	52006149b2	lib/storage: replace OpenStorage() with MustOpenStorage() Callers of OpenStorage() log the returned error and exit. The error logging and exit can be performed inside MustOpenStorage() alongside with printing the stack trace for better debuggability. This simplifies the code at caller side.	2023-04-14 23:02:40 -07:00
Aliaksandr Valialkin	3727251910	lib/fs: add MustReadDir() function Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity. The fs.MustReadDir() logs the error with the directory name and the call stack on error before exit. This information should be enough for debugging the cause of the error.	2023-04-14 22:10:46 -07:00
Aliaksandr Valialkin	df619bdff0	all: consistently use fs.MustClose() for closing lock files	2023-04-14 20:14:21 -07:00
Aliaksandr Valialkin	2a3b19e1d2	lib/fs: convert CreateFlockFile to MustCreateFlockFile Callers of CreateFlockFile log the returned err and exit. It is better to log the error inside the MustCreateFlockFile together with the path to the specified directory and the call stack. This simplifies the code at the callers' side while leaving the debuggability at the same level.	2023-04-14 19:50:01 -07:00
Aliaksandr Valialkin	809fbaeaac	lib/fs: add Must prefix to CopyDirectory and CopyFile functions Callers of these functions log the returned error and then exit. Let's log the error with the call stack inside the function itself. This simplifies the code at callers' side, while leaving the same level of debuggability in case of errors.	2023-04-13 23:02:59 -07:00
Aliaksandr Valialkin	780abc3b3b	lib/fs: rename SymlinkRelative to MustSymlinkRelative Callers of this function log the returned error and then exit. Let's log the error with the call stack inside the function itself. This simplifies the code at callers' side, while leaving the same level of debuggability in case of errors.	2023-04-13 22:52:55 -07:00
Aliaksandr Valialkin	30425ca81a	lib/fs: rename WriteFileAtomically to MustWriteAtomic Callers of this function log the returned error and exit. So let's just log the error with the given filepath and the call stack inside the function itself and then exit. This simplifies the code at callers' place while leaves the same level of debuggability in case of errors.	2023-04-13 22:41:15 -07:00
Aliaksandr Valialkin	036a7b7365	lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist Callers of these functions log the returned error and then exit. The returned error already contains the path to directory, which was failed to be created. So let's just log the error together with the call stack inside these functions. This leaves the debuggability of the returned error at the same level while allows simplifying the code at callers' side. While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk(). It is expected that the inmemoryPart.MustStoreToDick() must fail if there is already a directory under the given path.	2023-04-13 22:11:59 -07:00
Haleygo	0ad6010c91	fix sort pendingDateMetricsIDs (#4102 )	2023-04-10 10:23:12 -07:00
Aliaksandr Valialkin	19b189e9b7	lib/storage: use shorter code after `03bde173b7`	2023-04-02 21:35:52 -07:00
faceair	38fc55976e	lib/storage: fix reuse pendingMetricRow (#4049 )	2023-04-02 21:35:50 -07:00
Roman Khavronenko	27b958ba8b	lib/storage: check for free disk space before opening tables (#4035 ) * lib/storage: check for free disk space before opening tables We check for free disk space before call to `openTable`, so `Storage` can be set to ReadOnly before mergeWorkers start. Before the change, there was a chance that merges will start even if Storage has to start in ReadOnly mode because of `-storage.minFreeDiskSpaceBytes` limit. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4023 Signed-off-by: hagen1778 <roman@victoriametrics.com> * lib/storage: chore Signed-off-by: hagen1778 <roman@victoriametrics.com> * Update lib/storage/storage.go --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-03-31 23:50:27 -07:00
Aliaksandr Valialkin	c8f2febaa1	lib/storage: consistently use OS-independent separator in file paths This is needed for Windows support, which uses `\` instead of `/` as file separator Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-25 14:33:58 -07:00
Aliaksandr Valialkin	43b24164ef	all: add Windows build for VictoriaMetrics This commit changes background merge algorithm, so it becomes compatible with Windows file semantics. The previous algorithm for background merge: 1. Merge source parts into a destination part inside tmp directory. 2. Create a file in txn directory with instructions on how to atomically swap source parts with the destination part. 3. Perform instructions from the file. 4. Delete the file with instructions. This algorithm guarantees that either source parts or destination part is visible in the partition after unclean shutdown at any step above, since the remaining files with instructions is replayed on the next restart, after that the remaining contents of the tmp directory is deleted. Unfortunately this algorithm doesn't work under Windows because it disallows removing and moving files, which are in use. So the new algorithm for background merge has been implemented: 1. Merge source parts into a destination part inside the partition directory itself. E.g. now the partition directory may contain both complete and incomplete parts. 2. Atomically update the parts.json file with the new list of parts after the merge, e.g. remove the source parts from the list and add the destination part to the list before storing it to parts.json file. 3. Remove the source parts from disk when they are no longer used. This algorithm guarantees that either source parts or destination part is visible in the partition after unclean shutdown at any step above, since incomplete partitions from step 1 or old source parts from step 3 are removed on the next startup by inspecting parts.json file. This algorithm should work under Windows, since it doesn't remove or move files in use. This algorithm has also the following benefits: - It should work better for NFS. - It fits object storage semantics. The new algorithm changes data storage format, so it is impossible to downgrade to the previous versions of VictoriaMetrics after upgrading to this algorithm. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-19 01:36:51 -07:00
Aliaksandr Valialkin	a26c6628fd	lib/{fs,mergeset,storage}: substitute os.Open()+os.File.Readdir() with os.ReadDir() This simplifies code a bit	2023-03-17 21:03:37 -07:00
Nikolay	927d9da270	lib/storage: correctly handle io.EOF error for pre-fetched metrics (#3946 ) io.EOF shouldn't be returned from this function. It breaks all search API logic and may result in empty query results.	2023-03-11 23:29:43 -08:00
Aliaksandr Valialkin	0d3f31f60e	lib/storage: follow-up for `39cdc546dd` - Use flag.Duration instead of flagutil.Duration for -snapshotCreateTimeout, since the flagutil.Duration is intended mostly for big durations, e.g. days, months and years, while the -snapshotCreateTimeout is usually smaller than one hour. - Add links to https://docs.victoriametrics.com/#how-to-work-with-snapshots in docs/CHANGELOG.md, so readers could easily find the corresponding docs when reading the changelog. - Properly remove all the created directories on unsuccessful attempt to create snapshot in Storage.CreateSnapshot(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551	2023-02-27 13:07:38 -08:00
Zakhar Bessarab	39cdc546dd	lib/storage: enhancements for snapshots process (#3873 ) * lib/{fs,mergeset,storage}: skip `.must-remove.` dirs when creating snapshot (#3858) * lib/{mergeset,storage}: add timeout configuration for snapshots creation, remove incomplete snapshots from storage * docs: fix formatting * app/vmstorage: add metrics to track status of snapshots * app/vmstorage: use `vm_http_requests_total` metric for snapshot endpoints metrics, rename new flag to make name more clear Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmstorage: update flag name in docs Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmstorage: reflect new metrics names change in docs Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-02-27 12:12:03 -08:00
Oleksandr Redko	9fff48c3e3	app,lib: fix typos in comments (#3804 )	2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin	09d7fa2737	lib/{mergeset,storage}: do not slow down concurrently executed queries during assisted merges Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2023-01-16 14:31:52 -08:00
Aliaksandr Valialkin	7afcca0c51	all: use metricsql.CompileRegexp instead of regexp.Compile for compiling regexps used in graphite queries This should speed up repeated queries, since metricsql.CompileRegexp returns regexps from the cache on subsequent calls for the same input regexp.	2023-01-09 21:43:08 -08:00
Aliaksandr Valialkin	c63755c316	lib/writeconcurrencylimiter: improve the logic behind -maxConcurrentInserts limit Previously the -maxConcurrentInserts was limiting the number of established client connections, which write data to VictoriaMetrics. Some of these connections could be idle. Such connections do not consume big amounts of CPU and RAM, so there is a little sense in limiting the number of such connections. So now the -maxConcurrentInserts command-line option limits the number of concurrently executed insert requests, not including idle connections. It is recommended removing -maxConcurrentInserts command-line option, since the default value for this option should work good for most cases.	2023-01-06 22:20:19 -08:00
Roman Khavronenko	5bdd880142	vmstorage: add more context to the flock acquiring msg (#3584 ) See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3578 Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-01-05 18:30:42 -08:00
Aliaksandr Valialkin	4de9d35458	lib/flagutil/bytes.go: properly handle values bigger than 2GiB on 32-bit architectures This fixes handling of values bigger than 2GiB for the following command-line flags: - -storage.minFreeDiskSpaceBytes - -remoteWrite.maxDiskUsagePerURL	2022-12-14 19:26:31 -08:00
Aliaksandr Valialkin	33dda2809b	lib/mergeset: panic when too long item is passed to Table.AddItems()	2022-12-03 23:32:16 -08:00
Aliaksandr Valialkin	932c1f90ae	lib/storage: remove duplicate logging for filepath on errors	2022-12-03 23:15:22 -08:00
Aliaksandr Valialkin	45299efe22	lib/{storage,mergeset}: consistency rename: `flushRaw{Rows,Items} -> flushPending{Rows,Items}	2022-12-03 22:17:46 -08:00
Aliaksandr Valialkin	299285b147	lib/storage: fix TestUpdateCurrHourMetricIDs test when it runs on the first hour of the day by UTC	2022-12-02 18:52:37 -08:00
Aliaksandr Valialkin	daa70e6560	lib/storage: follow-up for `790768f20b` - Document the bugfix at docs/CHANGELOG.md - Simplify the bugfix a bit	2022-11-07 14:04:08 +02:00
Aliaksandr Valialkin	f9dc3da9e2	lib/storage: typo fix after 32d48f8dfbb03174858c00bdfe6d9d22431dc8d8	2022-11-07 13:58:27 +02:00
Aliaksandr Valialkin	dd88c628aa	lib/storage: remove unused isFull field from hourMetricIDs struct	2022-11-07 13:58:26 +02:00
Łukasz Marszał	790768f20b	Fix issue-3309 - currHourMetricIDs shouldn't contain metrics from prev hour (#3320 ) * fix issue-3309 currHourMetricIDs shouldn't contain metrics from prev hour * Update storage.go	2022-11-07 13:55:37 +02:00
Aliaksandr Valialkin	c4265322f4	lib/fs: add canOverwrite arg to WriteFileAtomically when it is allowed to overwrite the file atomically if it already exists	2022-10-26 01:07:34 +03:00
Aliaksandr Valialkin	e2f0b76ebf	lib/storage: do not pass retentionMsecs and isReadOnly args explicitly - access them via Storage arg This makes code easier to read. This is a follow-up after `d2d30581a0`	2022-10-24 01:31:04 +03:00
Aliaksandr Valialkin	d2d30581a0	lib/storage: pass Storage to table and partition instead of getDeletedMetricIDs callback This improves code readability a bit.	2022-10-23 16:10:04 +03:00
Aliaksandr Valialkin	5e4dfe50c6	lib/storage: subsitute searchTSIDs functions with more lightweight searchMetricIDs function The searchTSIDs function was searching for metricIDs matching the the given tag filters and then was locating the corresponding TSID entries for the found metricIDs. The TSID entries aren't needed when searching for time series names (aka MetricName), so this commit removes the uneeded TSID search from the implementation of /api/v1/series API. This improves perfromance of /api/v1/series calls. This commit also improves performance a bit for /api/v1/query and /api/v1/query_range calls, since now these calls cache small metricIDs instead of big TSID entries in the indexdb/tagFilters cache (now this cache is named indexdb/tagFiltersToMetricIDs) without the need to compress the saved entries in order to save cache space. This commit also removes concurrency limiter during searching for matching time series, which was introduced in `8f16388428`, since the concurrency for all the read queries is already limited with -search.maxConcurrentRequests command-line flag. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648	2022-10-23 12:23:47 +03:00
Aliaksandr Valialkin	978dcb4574	lib/storage: properly remove cache directory contents if `reset_cache_on_startup` file is located there Previously the cache directory was removed. This could result in error when the cache directory is mounted to a separate filesystem.	2022-09-13 16:17:36 +03:00
Aliaksandr Valialkin	5f28ca1f42	lib/storage: atomically remove snapshot directories Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3038	2022-09-13 16:17:36 +03:00
Aliaksandr Valialkin	796aa310c2	app/vmstorage: expose `vm_{hourly,daily}_series_limit_{max,current}_series` metrics if `-storage.max{Hourly,Daily}Series` limits are set These metrics allow alerting when the number of unique series approach the limit. For example, the following query alerts when the number of series reaches 90% of the configured limit: vm_hourly_series_limit_current_series / vm_hourly_series_limit_max_series > 0.9	2022-08-24 13:44:04 +03:00
Aliaksandr Valialkin	9f94c295ab	all: use os.{Read\|Write}File instead of ioutil.{Read\|Write}File The ioutil.{Read\|Write}File is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil VictoriaMetrics needs at least Go1.18, so it is safe to remove ioutil usage from source code. This is a follow-up for `02ca2342ab`	2022-08-21 23:52:35 +03:00
guidao	91faa152a5	add next retention metric (#2863 ) Co-authored-by: wangfeng <wangfeng@zhihu.com>	2022-07-13 12:37:04 +03:00
Aliaksandr Valialkin	e1b8059086	lib/vmselectapi: rename deleteMetrics to more correct deleteSeries	2022-07-06 12:37:54 +03:00
Aliaksandr Valialkin	a60e03b3a7	lib/vmselectapi: use string type for tagKey and tagValuePrefix args at TagValueSuffixes() This improves the API consistency	2022-07-06 12:37:53 +03:00
Aliaksandr Valialkin	edc76286ac	lib/storage: put the (date, metricID) entry in dateMetricIDCache just after the corresponding series is registered in the per-day inverted index Previously the time series could be put into dateMetricIDCache without registering in the per-day inverted index if GetOrCreateTSIDByName finds TSID entry in the global index. This could lead to missing series in query results. The issue has been introduced in the commit `55e7afae3a`, which has been included in VictoriaMetrics v1.78.0	2022-07-05 14:54:03 +03:00
Aliaksandr Valialkin	3e2dd85f7d	all: readability improvements for query traces - show dates in human-readable format, e.g. 2022-05-07, instead of a numeric value - limit the maximum length of queries and filters shown in trace messages	2022-06-30 18:20:33 +03:00
Aliaksandr Valialkin	a350d1e81c	lib/storage: return marshaled metric names from SearchMetricNames Previously SearchMetricNames was returning unmarshaled metric names. This wasn't great for vmstorage, which should spend additional CPU time for marshaling the metric names before sending them to vmselect. While at it, remove possible duplicate metric names, which could occur when multiple samples for new time series are ingested via concurrent requests. Also sort the metric names before returning them to the client. This simplifies debugging of the returned metric names across repeated requests to /api/v1/series	2022-06-28 18:17:15 +03:00
Aliaksandr Valialkin	2c836bd398	lib/storage: put into query trace the number of found entries in SearchMetricNames	2022-06-28 14:50:53 +03:00
Aliaksandr Valialkin	ba514284f1	lib/storage: add querytracer to more contexts querytracer has been added to the following storage.Storage methods: - RegisterMetricNames - DeleteMetrics - SearchTagValueSuffixes - SearchGraphitePaths	2022-06-27 13:45:51 +03:00
Aliaksandr Valialkin	134751e43e	all: locate throttled loggers via logger.WithThrottler() only once and then use them This reduces the contention on logThrottlerRegistryMu mutex when logger.WithThrottler() is called frequently from concurrent goroutines.	2022-06-27 13:45:50 +03:00
Aliaksandr Valialkin	b958fc7846	lib/storage: properly take into account already registered series when `-storage.maxHourlySeries` or `-storage.maxDailySeries` limits are enabled The commit `5fb45173ae` takes into account only newly registered series when applying cardinality limits. This means that the cardinality limit could be exceeded with already registered series. This commit returns back accounting for already registered series when applying cardinality limits.	2022-06-20 13:47:47 +03:00
Aliaksandr Valialkin	55e7afae3a	lib/storage: create per-day indexes together with global indexes when registering new time series Previously the creation of per-day indexes and global indexes for the newly registered time series was decoupled. Now global indexes and per-day indexes for the current day are created toghether for new time series. This should speed up registering new time series a bit.	2022-06-19 22:42:10 +03:00
Aliaksandr Valialkin	5fb45173ae	lib/storage: do not register new series if `-storage.maxHourlySeries` or `-storage.maxDailySeries` limits are exceeded Previously samples for new series weren't added as expected when series limits were reached, but new series were still registered in indexdb.	2022-06-19 22:42:09 +03:00
Aliaksandr Valialkin	62e2371a67	lib/storage: reset metric id caches for the previous and the current hour Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698	2022-06-19 22:42:09 +03:00
Aliaksandr Valialkin	ec7963208d	app/vmselect: accept `focusLabel` query arg at /api/v1/status/tsdb This allows filling the seriesCountByFocusLabelValue list in the /api/v1/status/tsdb response with label values for the specified focusLabel, which contain the highest number of time series. TODO: add this to Cardinality explorer at VMUI - https://docs.victoriametrics.com/#cardinality-explorer	2022-06-14 18:36:54 +03:00
Aliaksandr Valialkin	374beb350e	app/vmselect: optimize `/api/v1/labels` and `/api/v1/label/.../values` handlers when `match[]` query arg is passed to them	2022-06-12 04:32:13 +03:00
Aliaksandr Valialkin	2bcb960f17	all: improve query tracing coverage for indexdb search Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403	2022-06-09 20:07:07 +03:00
Aliaksandr Valialkin	12ac255dae	lib/querytracer: make it easier to use by passing trace context message to New and NewChild The context message can be extended by calling Donef. If there is no need to extend the message, then just call Done.	2022-06-08 21:06:52 +03:00
Aliaksandr Valialkin	ea06d2fd3c	lib/storage: stop background merge when storage enters read-only mode This should prevent from `no space left on device` errors when VictoriaMetrics under-estimates the additional disk space needed for background merge. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2603	2022-06-01 14:36:45 +03:00
Roman Khavronenko	642eb1c534	lib/storage: make `indexdb/tagFilters` cache size configurable (#2667 ) The default size of `indexdb/tagFilters` now can be overridden via `storage.cacheSizeIndexDBTagFilters` flag. Please, be careful with changing default size since it may lead to inefficient work of the vmstorage or OOM exceptions. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2663 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2022-06-01 10:07:53 +02:00
Aliaksandr Valialkin	41958ed5dd	all: add initial support for query tracing See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#query-tracing Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403	2022-06-01 02:29:23 +03:00
Aliaksandr Valialkin	f6d11a49aa	lib/storage: add ability to change the indexdb rotation time offset with -retentionTimezoneOffset command-line flag This is a follow-up for `0fbf59199a` See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2574	2022-05-25 16:05:29 +03:00
阳明	0fbf59199a	lib/storage: Remove the effect of time zone on next retention period (#2568 ) (#2574 )	2022-05-25 15:08:24 +03:00
Dmytro Kozlov	7dd9f3b98e	{vmbackup, vmbackup/snapshot}: fixed problem with snapshot backup in another snapshot folder (#2535 ) * {vmbackup, vmbackup/snapshot}: validate snapshot name * vmbackup/snapshot: added another checks * backup/actions: added check that we ignore backup_complete.ignore file * vmbackup: moved snapshot to lib directory * lib/snapshot: added functions description * lib/snapshot: fixed typo * vmbackup: code cleanup * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-05-04 22:12:03 +03:00
Artem Navoiev	37cf509c3a	lib/{storage,flagutil} - Add option for snapshot autoremoval (#2487 ) * lib/{storage,flagutil} - Add option for snapshot autoremoval - add prometheus-like duration as command flag - add option to delete stale snapshots - update duration.go flag to re-use own code * wip * lib/flagutil: re-use Duration.Set() call in NewDuration * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-05-02 11:00:15 +03:00
Aliaksandr Valialkin	50cf74ce4b	lib/storage: reuse sync.WaitGroup objects This reduces GC load by up to 10% according to memory profiling	2022-04-06 13:34:04 +03:00
Aliaksandr Valialkin	6e364e19ef	app/vmselect: add fine-grained limits for the number of returned/scanned time series for various APIs	2022-03-26 11:29:49 +02:00
Aliaksandr Valialkin	2ae3a9a8a3	lib/storage: reduce the interval for checking for free disk space from 30 seconds to 1 second This should reduce the probability of out of disk space panics when -storage.minFreeDiskSpaceBytes is set to low values. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2305	2022-03-18 16:52:27 +02:00
Aliaksandr Valialkin	3eef1ddc7d	lib/storage: trashing -> thrashing typo in docs This is a follow-up for `918ed5cb32`	2022-03-16 13:05:26 +02:00
Aliaksandr Valialkin	62b46007c5	lib/workingsetcache: reduce the default cache rotation period from hour to 20 minutes This should reduce memory usage under high time series churn rate	2022-02-23 13:41:45 +02:00
Roman Khavronenko	b6ed9afd6d	lib: allow to configure cache size by type (#2206 ) * lib: allow to configure cache size by type https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1940 Signed-off-by: hagen1778 <roman@victoriametrics.com> * Apply suggestions from code review * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-21 13:50:34 +02:00
Aliaksandr Valialkin	6ff71474a6	lib/storage: document why tsid cache is reset before saving it to disk Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2205	2022-02-16 18:37:56 +02:00
Aliaksandr Valialkin	96dce63dbd	lib/storage: tune the logic for pre-populating of the per-day inverted index for the next day - Postpone the pre-poulation to the last hour of the current day. This should reduce the number of useless entries in the next per-day index, which shouldn't be created there, when the corresponding time series are stopped to be pushed during the current day. - Make the pre-population more smooth in time by using the hash of MetricID instead of MetricID itself when calculating the need for for the given MetricID pre-population. - Sync the logic for pre-population of the next day inverted index with the logic of pre-populating tsid cache after indexdb rotation. This should improve code maintainability. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401	2022-02-12 16:33:16 +02:00
Roman Khavronenko	cf1a8bce6b	lib/index: reduce read/write load after indexDB rotation (#2177 ) * lib/index: reduce read/write load after indexDB rotation IndexDB in VM is responsible for storing TSID - ID's used for identifying time series. The index is stored on disk and used by both ingestion and read path. IndexDB is stored separately to data parts and is global for all stored data. It can't be deleted partially as VM deletes data parts. Instead, indexDB is rotated once in `retention` interval. The rotation procedure means that `current` indexDB becomes `previous`, and new freshly created indexDB struct becomes `current`. So in any time, VM holds indexDB for current and previous retention periods. When time series is ingested or queried, VM checks if its TSID is present in `current` indexDB. If it is missing, it checks the `previous` indexDB. If TSID was found, it gets copied to the `current` indexDB. In this way `current` indexDB stores only series which were active during the retention period. To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both write and read path consult `tsidCache` and on miss the relad lookup happens. When rotation happens, VM resets the `tsidCache`. This is needed for ingestion path to trigger `current` indexDB re-population. Since index re-population requires additional resources, every index rotation event may cause some extra load on CPU and disk. While it may be unnoticeable for most of the cases, for systems with very high number of unique series each rotation may lead to performance degradation for some period of time. This PR makes an attempt to smooth out resource usage after the rotation. The changes are following: 1. `tsidCache` is no longer reset after the rotation; 2. Instead, each entry in `tsidCache` gains a notion of indexDB to which they belong; 3. On ingestion path after the rotation we check if requested TSID was found in `tsidCache`. Then we have 3 branches: 3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID. 3.2 Slow path. It wasn't found, so we generate it from scratch, add to `current` indexDB, add it to `tsidCache`. 3.3 Smooth path. It was found but does not belong to the `current` indexDB. In this case, we add it to the `current` indexDB with some probability. The probability is based on time passed since the last rotation with some threshold. The more time has passed since rotation the higher is chance to re-populate `current` indexDB. The default re-population interval in this PR is set to `1h`, during which entries from `previous` index supposed to slowly re-populate `current` index. The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs were moved from `previous` indexDB to the `current` indexDB. This metric supposed to grow only during the first `1h` after the last rotation. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin	5f84b17ed6	lib/storage: properly limit cardinality when ingesting multiple samples for the same time series in a single request	2022-01-21 12:38:09 +02:00
Nikolay	8ff7da7202	adds restore.lock (#1988 ) * adds restore.lock it must prevent from running storage after incomplete restore process https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958 * return back flock file deletion * Apply suggestions from code review * wip * docs/CHANGELOG.md: document https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958 Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2021-12-22 13:10:15 +02:00
Aliaksandr Valialkin	ce333f28d8	all: use logger.WithThrottler() where appropriate	2021-12-21 17:03:25 +02:00
Aliaksandr Valialkin	e1a715b0f5	lib/storage: convert alternate regexps into Graphite wildcards inside `__graphite__` pseudo-label For example, `{__graphite__=~"foo.(bar\|baz)"}` is automatically converted to `{__graphite__=~"foo.{bar,baz}"}` before execution. This allows using multi-value Grafana template variables such as `{__graphite__=~"foo.($app)"}`.	2021-12-14 19:51:49 +02:00
Aliaksandr Valialkin	7275ebf91a	app/vmstorage: export vm_cache_size_max_bytes metrics for determining capacity of various caches The vm_cache_size_max_bytes metric can be used for determining caches which reach their capacity via the following query: vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9	2021-12-02 10:30:43 +02:00
Aliaksandr Valialkin	53bb58ed2a	lib/storage: log a warning when the -storageDataPath has less than -storage.minFreeDiskSpaceBytes This should improve the debuggability of the readonly feature. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1727	2021-10-19 23:59:13 +03:00
Aliaksandr Valialkin	001750c239	lib/storage: fix unaligned access on 32-bit architectures. The bug has been introduced at `a171916ef5`	2021-10-08 19:43:03 +03:00
Aliaksandr Valialkin	cf5cbd1c70	app/{vminsert,vmstorage}: follow-up after `a171916ef5` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269	2021-10-08 14:35:49 +03:00
Nikolay	4290b46e8c	Adds read-only mode for vmstorage node (#1680 ) * adds read-only mode for vmstorage https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269 * changes order a bit * moves isFreeDiskLimitReached var to storage struct renames functions to be consistent change protoparser api - with optional storage limit check for given openned storage * renames freeSpaceLimit to ReadOnly	2021-10-08 14:35:48 +03:00
Aliaksandr Valialkin	f77dde837a	lib/promscrape: add the ability to limit the number of unique series per each scrape target The number of series per target can be limited with the following options: * Global limit with `-promscrape.maxSeriesPerTarget` command-line option. * Per-target limit with `max_series: N` option in `scrape_config` section. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1561	2021-09-01 16:03:59 +03:00
Aliaksandr Valialkin	4401464c22	all: add support for Prometheus staleness markers Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845	2021-08-13 12:10:17 +03:00
Aliaksandr Valialkin	682662b2ae	lib/storage: remove cache directory if it contains reset_cache_on_startup file See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447	2021-07-13 17:58:51 +03:00
Aliaksandr Valialkin	4f80b2f230	lib/storage: properly limit the size of `storage/date_metricID` cache	2021-07-12 14:25:44 +03:00
Aliaksandr Valialkin	8b262d4ba7	lib/storage: periodically reset prefetchedMetricIDs cache in order to limit its size under high churn rate	2021-07-07 10:58:51 +03:00
Aliaksandr Valialkin	d0c830039d	lib/storage: tune cache sizes according to production workload	2021-07-05 15:16:11 +03:00

1 2 3 4 5 ...

275 commits