github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	0bff96fe4b	lib/storage: prioritize data ingestion over heavy queries Heavy queries could result in the lack of CPU resources for processing the current data ingestion stream. Prevent this by delaying queries' execution until free resources are available for data ingestion. Expose `vm_search_delays_total` metric, which may be used for alerting when there are not enough CPU resources for data ingestion and/or for executing heavy queries. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2020-07-05 19:44:04 +03:00
Aliaksandr Valialkin	4cb3e7595c	app/vmstorage: add `-denyQueriesOutsideRetention` command-line flag for denying queries outside the configured retention	2020-07-01 00:58:42 +03:00
Aliaksandr Valialkin	d962568e93	all: use %w instead of %s for wrapping errors in `fmt.Errorf` This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode . See https://blog.golang.org/go1.13-errors for details.	2020-06-30 23:33:46 +03:00
Aliaksandr Valialkin	01719f4949	app/vmstorage/transport: simplify setupTfss in order to prevent the possibility of nil tfs Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/534	2020-06-05 13:17:26 +03:00
Aliaksandr Valialkin	e4cef1b678	app/vmstorage: prevent from serving conns from vminsert and vmselect after the server is closed Previously it was possible that the connection is served after the server is closed if the following steps are performed: 1) Server accepts new connection. 2) Server.MustClose() is called and successfully finished. 3) Server starts processing the connection accepted at step 1. There could be various crashes like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/534 since the storage may be already closed. Now the server closes the connection at step 3 without processing it.	2020-06-05 11:55:48 +03:00
Aliaksandr Valialkin	901093279e	app/vmstorage/transport: update stale comment - vmstorage now sends small `ack` packets to `vminsert`	2020-05-21 14:04:52 +03:00
Aliaksandr Valialkin	2784015a4d	all: print `--help` output to stdout instead of stderr This is easier to grep and pipe	2020-05-16 12:03:06 +03:00
Aliaksandr Valialkin	a853869e75	app/vmstorage/transport: prevent from uncontrolled memory usage growth when `vminsert` sends big packets with too long labels Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/490	2020-05-15 15:42:54 +03:00
Aliaksandr Valialkin	1e5c1d7eaa	app/vmstorage: add `vm_slow_metric_name_loads_total` metric, which could be used as an indicator when more RAM is needed for improving query performance	2020-05-15 14:12:24 +03:00
Aliaksandr Valialkin	d6b9a49481	app/vmstorage: add `vm_slow_row_inserts_total` and `vm_slow_per_day_index_inserts_total` metrics for determining whether VictoriaMetrics requires more RAM for the current number of active time series	2020-05-15 13:46:57 +03:00
Aliaksandr Valialkin	3d3f41b961	app/vmstorage/transport: fix panic during server stop on 32-bit arches See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/212	2020-05-12 20:21:40 +03:00
Aliaksandr Valialkin	f7753b1469	lib/storage: gradually pre-populate per-day inverted index for the next day This should prevent from CPU usage spikes at 00:00 UTC every day when inverted index for new day must be quickly created for all the active time series. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430	2020-05-12 12:13:32 +03:00
Aliaksandr Valialkin	3052b479b7	lib/httpserver: reduce typical duration for http server graceful shutdown Previously the duration for graceful shutdown for http server could take more than a minute because of improperly set timeouts in setNetworkTimeout. Now typical duration for graceful shutdown should be reduced to less than 5 seconds.	2020-05-07 14:16:38 +03:00
Aliaksandr Valialkin	989d84cf3f	app/{vminsert,vmstorage}: wait for `ack` from `vmstorage` after each packet sent to it from `vminsert` This should protect from possible data loss when `vmstorage` is stopped while the packet is sent from `vminsert`. This commit switches to new protocol between vminsert and vmstorage, which is incompatible with the previous protocol. So it is required that both vminsert and vmstorage nodes are updated.	2020-04-27 09:53:26 +03:00
Aliaksandr Valialkin	e933cbac16	lib/storage: postpone reading data from blocks during search This eliminates the need for storing block data into temporary files on a single-node VictoriaMetrics during heavy queries, which touch big number of time series over long time ranges. This improves single-node VM performance on heavy queries by up to 2x.	2020-04-27 08:44:01 +03:00
Aliaksandr Valialkin	f9526809e5	app/vmselect: add `/api/v1/status/tsdb` page with useful stats for locating root cause for high cardinality issues See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268	2020-04-22 22:03:23 +03:00
Aliaksandr Valialkin	a53e332a93	app/vmstorage: add missing shutdown for http server on graceful shutdown This could result in the following panic during graceful shutdown when `/metrics` page is requested: http: panic serving 10.101.66.5:57366: runtime error: invalid memory address or nil pointer dereference goroutine 2050 [running]: net/http.(conn).serve.func1(0xc00ef22000) net/http/server.go:1772 +0x139 panic(0xa0fc00, 0xe91d80) runtime/panic.go:973 +0x3e3 github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache.(Cache).UpdateStats(0x0, 0xc0000516c8) github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache/cache.go:224 +0x37 github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(indexDB).UpdateMetrics(0xc00b931d00, 0xc02c41acf8) github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/index_db.go:258 +0x9f github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(Storage).UpdateMetrics(0xc0000bc7e0, 0xc02c41ac00) github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/storage.go:413 +0x4c5 main.registerStorageMetrics.func1(0x0) github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:186 +0xd9 main.registerStorageMetrics.func3(0xc00008c380) github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:196 +0x26 main.registerStorageMetrics.func7(0xc) github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:211 +0x26 github.com/VictoriaMetrics/metrics.(Gauge).marshalTo(0xc000010148, 0xaa407d, 0x20, 0xb50d60, 0xc005319890) github.com/VictoriaMetrics/metrics@v1.11.2/gauge.go:38 +0x3f github.com/VictoriaMetrics/metrics.(Set).WritePrometheus(0xc000084300, 0x7fd56809c940, 0xc005319860) github.com/VictoriaMetrics/metrics@v1.11.2/set.go:51 +0x1e1 github.com/VictoriaMetrics/metrics.WritePrometheus(0x7fd56809c940, 0xc005319860, 0xa16f01) github.com/VictoriaMetrics/metrics@v1.11.2/metrics.go:42 +0x41 github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.writePrometheusMetrics(0x7fd56809c940, 0xc005319860) 
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/metrics.go:16 +0x44 github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper(0xb5a120, 0xc005319860, 0xc005018f00, 0xc00002cc90) github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:154 +0x58d github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.gzipHandler.func1(0xb5a120, 0xc005319860, 0xc005018f00) github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:119 +0x8e net/http.HandlerFunc.ServeHTTP(0xc00002d110, 0xb5a660, 0xc0044141c0, 0xc005018f00) net/http/server.go:2012 +0x44 net/http.serverHandler.ServeHTTP(0xc004414000, 0xb5a660, 0xc0044141c0, 0xc005018f00) net/http/server.go:2807 +0xa3 net/http.(conn).serve(0xc00ef22000, 0xb5bf60, 0xc010532080) net/http/server.go:1895 +0x86c created by net/http.(Server).Serve net/http/server.go:2933 +0x35c	2020-04-02 21:09:55 +03:00
Aliaksandr Valialkin	3b744f3c32	app/vmstorage: typo fix	2020-04-01 23:43:09 +03:00
Aliaksandr Valialkin	f838cdc86e	app/vmstorage: add `vm_free_disk_space_bytes` metric for monitoring the remaining disk space at `-storageDataPath`	2020-04-01 23:10:44 +03:00
Aliaksandr Valialkin	d450249955	lib/storage: properly handle `{label=~"foo\|"}` filters as Prometheus does Such filters must match all the time series with `label="foo"` plus all the time series without `label` Previously only time series with `label="foo"` were matched.	2020-03-30 20:21:47 +03:00
Dmitry Naumov	b84071fc25	Rootless docker images by default (#358) * Rootless docker images by default * Migrate to rootless base image Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>	2020-03-27 21:18:32 +02:00
Aliaksandr Valialkin	8939c19281	app/vmstorage: return 500 status code instead of 200 status code on internal errors inside `/snapshot/*` handlers	2020-03-10 23:54:27 +02:00
Aliaksandr Valialkin	cf9aee4ec3	all: properly split `vm_deduplicated_samples_total` among cluster components Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/345	2020-02-27 23:47:51 +02:00
Aliaksandr Valialkin	afecb34491	app/vmstorage: limit the maximum error message size before sending it to client Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/315	2020-02-13 17:33:12 +02:00
Aliaksandr Valialkin	1010a57882	all: allow setting flags via environment vars Now flags can be set via environment vars with the same names as flags. Command-line flags override flags set via env vars. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/311	2020-02-10 13:31:21 +02:00
Aliaksandr Valialkin	ea66212c93	lib/storage: move `-dedup.minScrapeInterval` flag outside lib/storage, so it doesn't show up in `vminsert` in cluster version	2020-02-10 13:07:25 +02:00
Aliaksandr Valialkin	4ed5e9a7ce	lib/storage: pre-fetch metricNames for the found metricIDs in Search.Init This should speed up Search.NextMetricBlock loop for big number of found time series.	2020-01-30 15:16:16 +02:00
Aliaksandr Valialkin	ea53a21b02	all: consistently log durations in seconds with millisecond precision This should improve logs readability	2020-01-22 18:35:24 +02:00
Aliaksandr Valialkin	cffaeda0f1	all: publish Docker images for the following GOARCH: amd64, arm, arm64, ppc64le and 386 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/258	2019-12-11 23:33:11 +02:00
Aliaksandr Valialkin	4e22b521c2	lib/storage: remove metricID with missing metricID->metricName entry The metricID->metricName entry can be missing in the indexdb after unclean shutdown when only a part of entries for new time series is written into indexdb. Recover from such a situation by removing the broken metricID. A new metricID will be automatically created for the time series with the given metricName when a new data point arrives for it.	2019-12-02 20:52:13 +02:00
Aliaksandr Valialkin	4c63caa37c	deployment/docker/certs: update TLS certs source from alpine:3.9 to alpine:3.10	2019-11-29 19:55:36 +02:00
Aliaksandr Valialkin	d297b65089	lib/storage: add `vm_cache_size_bytes{type="storage/hour_metric_ids"}` metric	2019-11-13 20:26:05 +02:00
Aliaksandr Valialkin	494ad0fdb3	lib/storage: remove inmemory index for recent hour, since it uses too much memory Production workload shows that the index requires ~4Kb of RAM per active time series. This is too much for high number of active time series, so let's delete this index. Now the queries should fall back to the index for the current day instead of the index for the recent hour. The query performance for the current day index should be good enough given the 100M rows/sec scan speed per CPU core.	2019-11-13 18:08:58 +02:00
Aliaksandr Valialkin	633dd81bb5	lib/storage: add `-disableRecentHourIndex` flag for disabling inmemory index for recent hour This may be useful for saving RAM on high number of time series aka high cardinality	2019-11-13 15:10:12 +02:00
Aliaksandr Valialkin	f1620ba7c0	lib/storage: fix inmemory inverted index issues found in v1.29 Issues fixed: - Slow startup times. Now the index is loaded from cache during start. - High memory usage related to superfluous index copies every 10 seconds.	2019-11-13 13:35:38 +02:00
Aliaksandr Valialkin	87b39222be	Revert "lib/fs: do not postpone directory removal on NFS error" This reverts commit 21aeb02b46649ac9906cb37733f7b155a77a0db9.	2019-11-12 16:29:50 +02:00
Aliaksandr Valialkin	c48e39eea9	lib/storage: add tests for dateMetricIDCache	2019-11-11 13:21:05 +02:00
Aliaksandr Valialkin	5f52eb7653	lib/fs: do not postpone directory removal on NFS error Continue trying to remove NFS directory on temporary errors for up to a minute. The previous async removal process breaks in the following case during VictoriaMetrics start - VictoriaMetrics opens index, finds incomplete merge transactions and starts replaying them. - The transaction instructs removing old directories for parts, which were already merged into bigger part. - VictoriaMetrics removes these directories, but their removal is delayed due to NFS errors. - VictoriaMetrics scans partition directory after all the incomplete merge transactions are finished and finds directories, which should be removed, but weren't still removed due to NFS errors. - VictoriaMetrics panics when it finds unexpected empty directory. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162	2019-11-10 13:27:16 +02:00
Aliaksandr Valialkin	9ea2bd822e	lib/storage: implement per-day inverted index	2019-11-10 00:20:32 +02:00
Aliaksandr Valialkin	dea2f3efed	lib/storage: use specialized cache for (date, metricID) entries This improves ingestion performance.	2019-11-09 23:09:18 +02:00
Aliaksandr Valialkin	46e67bb78c	lib/storage: export `vm_new_timeseries_created_total` metric for determining time series churn rate	2019-11-08 19:58:21 +02:00
Aliaksandr Valialkin	0063c857f5	lib/storage: add inmemory inverted index for the last hour It should improve performance for `last N hours` dashboards with update intervals smaller than 1 hour.	2019-11-08 19:37:46 +02:00
Aliaksandr Valialkin	1c777e0245	lib/storage: substitute error message about unsorted items in the index block after metricIDs merge with counter The origin of the error has been detected and documented in the code, so it is enough to export a counter for such errors at `vm_index_blocks_with_metric_ids_incorrect_order_total`, so it could be monitored and alerted on high error rates. Export also the counter for processed index blocks with metricIDs - `vm_index_blocks_with_metric_ids_processed_total`, so its rate could be compared to `rate(vm_index_blocks_with_metric_ids_incorrect_order_total)`.	2019-11-06 14:32:41 +02:00
Aliaksandr Valialkin	6ab9c98a1e	app/vmstorage: add `-bigMergeConcurrency` and `-smallMergeConcurrency` flags for tuning the maximum number of CPU cores used during merges	2019-10-31 16:17:29 +02:00
Aliaksandr Valialkin	b101064f8b	all: report the number of bytes read on io.ReadFull error This should simplify error investigation similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/175	2019-09-11 14:50:24 +03:00
Aliaksandr Valialkin	2c654258ef	lib/fs: add MustStopDirRemover for waiting until pending directories are removed on graceful shutdown This patch is mainly required for laggy NFS. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162	2019-09-05 11:17:17 +03:00
Aliaksandr Valialkin	5893a9f9a3	app/vmstorage: increase default values for search.maxTagKeys, search.maxTagValues and search.maxUniqueTimeseries	2019-08-27 14:28:26 +03:00
Aliaksandr Valialkin	f56c1298ad	app/vmstorage: add `vm_concurrent_addrows_*` metrics for tracking concurrency for Storage.AddRows calls Track also the number of dropped rows due to the exceeded timeout on concurrency limit for Storage.AddRows. This number is tracked in `vm_concurrent_addrows_dropped_rows_total`	2019-08-06 15:08:43 +03:00
Aliaksandr Valialkin	880b1d80b1	app/vmselect: optimize `/api/v1/series` by skipping storage data Fetch and process only time series metainfo.	2019-08-04 23:00:46 +03:00
Aliaksandr Valialkin	8253790157	app/vmstorage: consistency renaming for `ignored rows` metrics vm_too_big_timestamp_rows_total -> vm_rows_ignored_total{reason="big_timestamp"} vm_too_small_timestamp_rows_total -> vm_rows_ignored_total{reason="small_timestamp"}	2019-07-26 20:02:24 +03:00

64 commits
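
Commit d450249955 above describes how a `{label=~"foo|"}` filter must match series with `label="foo"` as well as series that have no `label` at all. The sketch below illustrates that rule under two assumptions that are not taken from the VictoriaMetrics source: the regular expression is fully anchored, and a missing label is treated as an empty value. `matchesFilter` is a hypothetical helper, not a function from the repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesFilter reports whether a series with the given labels
// satisfies the filter `name=~"expr"`.
// Assumptions: the regexp is anchored on both ends, and a missing
// label is looked up as the empty string.
func matchesFilter(labels map[string]string, name, expr string) (bool, error) {
	re, err := regexp.Compile("^(?:" + expr + ")$")
	if err != nil {
		return false, fmt.Errorf("cannot compile regexp %q: %w", expr, err)
	}
	// A map lookup on a missing key yields "", which models the
	// "label is absent" case.
	return re.MatchString(labels[name]), nil
}

func main() {
	series := []map[string]string{
		{"label": "foo"},  // has label="foo"
		{"other": "x"},    // has no `label` at all
		{"label": "bar"},  // has a different value
	}
	for _, s := range series {
		ok, err := matchesFilter(s, "label", "foo|")
		if err != nil {
			panic(err)
		}
		fmt.Println(s, ok)
	}
}
```

Because the alternation `foo|` includes the empty string, the series without `label` matches along with the `label="foo"` series, while `label="bar"` does not.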