github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	1f33dd717f	lib/storage: add `/internal/force_merge` handler for running forced compactions on historical per-month partitions This may be useful for freeing up storage space after time series deletion. See https://victoriametrics.github.io/#force-merge for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686	2020-09-17 12:20:40 +03:00
Aliaksandr Valialkin	f6bc608e86	app/vmselect: initial implementation of Graphite Metrics API See https://graphite-api.readthedocs.io/en/latest/api.html#the-metrics-api	2020-09-11 00:30:01 +03:00
Aliaksandr Valialkin	9d8fdff6c5	lib/storage: reuse timestamp blocks for adjacent metric blocks with identical timestamps This should reduce disk space usage when scraping targets containing metrics with identical names such as `node_cpu_seconds_total`, histograms, quantiles, etc. Expose `vm_timestamps_blocks_merged_total` and `vm_timestamps_bytes_saved_total` metrics for monitoring the effectiveness of timestamp blocks merging.	2020-09-09 23:59:32 +03:00
Aliaksandr Valialkin	dbbdfbe7ee	app/vmstorage: rename `vm_cache_size_entries{type="storage/prefetchedMetricIDs"}` to `vm_cache_entries{type="storage/prefetchedMetricIDs"}` to be consistent with other `vm_cache_entries` metrics	2020-08-06 16:34:24 +03:00
Aliaksandr Valialkin	8f16388428	lib/storage: limit the number of concurrent calls to storage.searchTSIDs to GOMAXPROCS*2 This should limit the maximum memory usage and reduce CPU thrashing on vmstorage when multiple heavy queries are executed. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648	2020-08-05 18:30:07 +03:00
Aliaksandr Valialkin	e7959094f6	lib/storage: remove prioritizing of merging small parts over merging big parts, since it doesn't work as expected The prioritizing could lead to big merge starvation, which could end up in too big number of parts that must be merged into big parts. Multiple big merges may be initiated after the migration from v1.39.0 or v1.39.1. It is OK - these merges should be finished soon, which should return CPU and disk IO usage to normal levels. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/618	2020-07-30 19:57:27 +03:00
Aliaksandr Valialkin	039c9d2441	lib/storage: respect `-search.maxQueryDuration` when searching for time series in inverted index Previously the time spent on inverted index search could exceed the configured `-search.maxQueryDuration`. This commit stops searching in inverted index on query timeout.	2020-07-23 21:21:42 +03:00
Aliaksandr Valialkin	6f05c4d351	lib/storage: improve prioritizing of data ingestion over querying Prioritize also small merges over big merges. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648	2020-07-23 13:23:36 +03:00
Aliaksandr Valialkin	b35cb293f5	lib/httpserver: log remote address in error message from `httpserver.Errorf` This should improve detection of the root cause of errors. Thanks to Anant for the idea.	2020-07-20 14:11:22 +03:00
Aliaksandr Valialkin	6daa5f7500	lib/storage: prioritize data ingestion over heavy queries Heavy queries could result in the lack of CPU resources for processing the current data ingestion stream. Prevent this by delaying queries' execution until free resources are available for data ingestion. Expose `vm_search_delays_total` metric, which may be used for alerting when there are not enough CPU resources for data ingestion and/or for executing heavy queries. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2020-07-05 19:42:05 +03:00
Aliaksandr Valialkin	84a37098ed	app/vmstorage: add `-denyQueriesOutsideRetention` command-line flag for denying queries outside the configured retention VictoriaMetrics returns `503 Service Unavailable` http error for requests with time ranges outside the configured retention if `-denyQueriesOutsideRetention` command-line flag is set.	2020-07-01 00:21:44 +03:00
Aliaksandr Valialkin	d5dddb0953	all: use %w instead of %s for wrapping errors in `fmt.Errorf` This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode . See https://blog.golang.org/go1.13-errors for details.	2020-06-30 23:05:11 +03:00
Aliaksandr Valialkin	82ffbcb9a6	app/vmstorage: add `vm_slow_metric_name_loads_total` metric, which could be used as an indicator when more RAM is needed for improving query performance	2020-05-15 14:11:45 +03:00
Aliaksandr Valialkin	82ccdfaa91	app/vmstorage: add `vm_slow_row_inserts_total` and `vm_slow_per_day_index_inserts_total` metrics for determining whether VictoriaMetrics requires more RAM for the current number of active time series	2020-05-15 13:44:32 +03:00
Aliaksandr Valialkin	dbd0c552d5	lib/storage: gradually pre-populate per-day inverted index for the next day This should prevent CPU usage spikes at 00:00 UTC every day when inverted index for new day must be quickly created for all the active time series. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430	2020-05-12 12:13:05 +03:00
Aliaksandr Valialkin	364db13c9c	app/vmselect: add `/api/v1/status/tsdb` page with useful stats for locating root cause for high cardinality issues See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268	2020-04-22 22:03:43 +03:00
Aliaksandr Valialkin	b38d048dd9	app/vmstorage: add `vm_free_disk_space_bytes` metric for monitoring the remaining disk space at `-storageDataPath`	2020-04-01 23:08:58 +03:00
Aliaksandr Valialkin	301c2acd61	app/vmstorage: return 500 status code instead of 200 status code on internal errors inside `/snapshot/*` handlers	2020-03-10 23:51:55 +02:00
Aliaksandr Valialkin	18af31a4c2	all: properly split `vm_deduplicated_samples_total` among cluster components Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/345	2020-02-27 23:48:07 +02:00
Aliaksandr Valialkin	d68546aa4a	lib/storage: pre-fetch metricNames for the found metricIDs in Search.Init This should speed up Search.NextMetricBlock loop for big number of found time series.	2020-01-30 15:08:51 +02:00
Aliaksandr Valialkin	680080887d	all: consistently log durations in seconds with millisecond precision This should improve logs readability	2020-01-22 18:28:27 +02:00
Aliaksandr Valialkin	20812008a7	lib/storage: remove metricID with missing metricID->metricName entry The metricID->metricName entry can be missing in the indexdb after unclean shutdown when only a part of entries for new time series is written into indexdb. Recover from such a situation by removing the broken metricID. A new metricID will be automatically created for the time series with the given metricName when a new data point arrives for it.	2019-12-02 20:46:44 +02:00
Aliaksandr Valialkin	119dfd01bb	lib/storage: add `vm_cache_size_bytes{type="storage/hour_metric_ids"}` metric	2019-11-13 20:24:21 +02:00
Aliaksandr Valialkin	86a1cd700b	lib/storage: remove inmemory index for recent hour, since it uses too much memory Production workload shows that the index requires ~4Kb of RAM per active time series. This is too much for high number of active time series, so let's delete this index. Now the queries should fall back to the index for the current day instead of the index for the recent hour. The query performance for the current day index should be good enough given the 100M rows/sec scan speed per CPU core.	2019-11-13 17:58:07 +02:00
Aliaksandr Valialkin	c57eb0ff83	lib/storage: add `-disableRecentHourIndex` flag for disabling inmemory index for recent hour This may be useful for saving RAM on high number of time series aka high cardinality	2019-11-13 15:02:51 +02:00
Aliaksandr Valialkin	ca259864e2	lib/storage: return back inmemory inverted index for recent hour Issues fixed: - Slow startup times. Now the index is loaded from cache during start. - High memory usage related to superfluous index copies every 10 seconds.	2019-11-13 13:11:04 +02:00
Aliaksandr Valialkin	01bb3c06c7	lib/storage: remove inmemory inverted index for recent hours Production load with >10M active time series showed it could slow down VictoriaMetrics startup times and could eat all the memory leading to OOM. Remove inmemory inverted index for recent hours until thorough testing on production data shows it works OK.	2019-11-13 10:45:53 +02:00
Aliaksandr Valialkin	8e8f98f712	lib/storage: add tests for dateMetricIDCache	2019-11-11 13:21:57 +02:00
Aliaksandr Valialkin	ee7765b10d	lib/storage: implement per-day inverted index	2019-11-10 00:02:46 +02:00
Aliaksandr Valialkin	5810ba57c2	lib/storage: use specialized cache for (date, metricID) entries This improves ingestion performance.	2019-11-09 23:06:11 +02:00
Aliaksandr Valialkin	6ad7fe8eeb	lib/storage: export `vm_new_timeseries_created_total` metric for determining time series churn rate	2019-11-08 21:21:07 +02:00
Aliaksandr Valialkin	d888b21657	lib/storage: add inmemory inverted index for the last hour It should improve performance for `last N hours` dashboards with update intervals smaller than 1 hour.	2019-11-08 21:21:07 +02:00
Aliaksandr Valialkin	e472f0b23b	lib/storage: substitute error message about unsorted items in the index block after metricIDs merge with counter The origin of the error has been detected and documented in the code, so it is enough to export a counter for such errors at `vm_index_blocks_with_metric_ids_incorrect_order_total`, so it could be monitored and alerted on high error rates. Export also the counter for processed index blocks with metricIDs - `vm_index_blocks_with_metric_ids_processed_total`, so its rate could be compared to `rate(vm_index_blocks_with_metric_ids_incorrect_order_total)`.	2019-11-06 14:28:11 +02:00
Aliaksandr Valialkin	d18ea0c95b	app/vmstorage: add `-bigMergeConcurrency` and `-smallMergeConcurrency` flags for tuning the maximum number of CPU cores used during merges	2019-10-31 16:19:13 +02:00
Aliaksandr Valialkin	b8bb74ffc6	app/vmstorage: add `vm_concurrent_addrows_*` metrics for tracking concurrency for Storage.AddRows calls Track also the number of dropped rows due to the exceeded timeout on concurrency limit for Storage.AddRows. This number is tracked in `vm_concurrent_addrows_dropped_rows_total`	2019-08-06 15:08:33 +03:00
Aliaksandr Valialkin	c98725db55	app/vmstorage: consistency renaming for `ignored rows` metrics vm_too_big_timestamp_rows_total -> vm_rows_ignored_total{reason="big_timestamp"} vm_too_small_timestamp_rows_total -> vm_rows_ignored_total{reason="small_timestamp"}	2019-07-26 20:02:06 +03:00
Aliaksandr Valialkin	f586e1f83c	lib/storage: add metrics for calculating skipped rows outside the retention The metrics are: - vm_too_big_timestamp_rows_total - vm_too_small_timestamp_rows_total	2019-07-26 14:11:01 +03:00
Aliaksandr Valialkin	101fa258e5	app/vmstorage: prepare for integration tests with multiple Init / Stop cycles	2019-07-11 15:34:50 +03:00
Aliaksandr Valialkin	1fe6d784d8	all: consistency renaming: bytesSize -> sizeBytes	2019-07-10 00:47:36 +03:00
Aliaksandr Valialkin	56c154f45b	all: add `vm_data_size_bytes` metrics for easy monitoring of on-disk data size and on-disk inverted index size	2019-07-04 19:42:30 +03:00
Aliaksandr Valialkin	a78b3dba7f	app/vmstorage: add `vm_cache_entries{type="storage/hour_metric_ids"}` metric for tracking active time series count	2019-06-19 18:36:47 +03:00
Aliaksandr Valialkin	cbe692f0e2	app/vmselect: add `/api/v1/labels/count` handler for quick detection of labels with the maximum number of distinct values	2019-06-10 19:55:38 +03:00
Aliaksandr Valialkin	7b6623558f	lib/storage: skip adaptive searching for tag filter matching the minimum number of metrics if the identical previous search didn't find such filter This should improve speed for searching metrics among high number of time series with high churn rate like in big Kubernetes clusters with frequent deployments.	2019-06-10 14:07:39 +03:00
Aliaksandr Valialkin	7354090aad	app/vmstorage: add missing `_total` suffixes to newly added metrics	2019-06-09 22:11:36 +03:00
Aliaksandr Valialkin	d37924900b	lib/storage: optimize time series lookup for recent hours when the db contains many millions of time series with high churn rate (aka frequent deployments in Kubernetes)	2019-06-09 19:13:56 +03:00
Aliaksandr Valialkin	1836c415e6	all: open-sourcing single-node version	2019-05-23 00:18:06 +03:00

96 commits