github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Artem Fetishev	683f8c2780	dashboards: add Restarts panel (#7394 ) Reopening PR #7373 from a branch in VictoriaMetrics repo in order to enable edits and rebase. - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-10-30 16:44:08 +01:00
Zhu Jiekun	cd2222aa95	dashboards: fix query for full ETA vm_free_disk_space_bytes - vm_free_disk_space_limit_bytes (#7355 ) ### Describe Your Changes Fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7334 available disk space should be ``` (vm_free_disk_space_bytes{job=~...} - vm_free_disk_space_limit_bytes{job=~...}) ``` instead of ``` vm_free_disk_space_bytes{job=~...} ``` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-10-25 15:09:14 +02:00
Hui Wang	d3f110373c	dashboards: fix description about pending datapoints (#7235 ) See [our playground](https://play-grafana.victoriametrics.com/d/oS7Bi_0Wz_vm/victoriametrics-cluster-vm?orgId=1&var-ds=P996FABE17B5F6D1E&var-job=All&var-job_insert=All&var-job_select=All&var-job_storage=All&var-instance=All) for reference.	2024-10-11 13:47:14 +02:00
Roman Khavronenko	4d0b41e63b	deployment: add panel and alerts for displaying go scheduler latency (#7078) The panel and alerting rule should help to understand whether VM component doesn't have enough CPU resources or gets throttled. The alert is applicable for all VM components. The panel was added to vmalert, vmagent, vmsingle, vm cluster and victorialogs dashes. ------------------- This alerting rule should have helped us identify resource shortage for sandbox vmagent - see [this link](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/?g0.range_input=23d13h25m25s424ms&g0.end_input=2024-09-23T14%3A11%3A00&g0.relative_time=none&g0.tab=0&g0.expr=histogram_quantile%280.99%2C+sum%28rate%28go_sched_latencies_seconds_bucket%7Bjob%3D%22vmagent-monitoring-vmagent%22%7D%5B5m%5D%29%29+by+%28le%2C+job%2C+instance%29%29+%3E+0.1) for example. We weren't aware of resource shortage, because VM metrics assumed this vmagent had 1vCPU while in fact its limit was 0.2vCPU. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-09-23 16:54:42 +02:00
zjbztianya	1b1e61030b	dashboards: typo fix (#6920 ) ### Describe Your Changes Correct the spelling error of 'vminsert' in the dashboards. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-09-03 10:27:01 +02:00
hagen1778	9dd9b4442f	dashboards: use `$__interval` variable for offsets and look-behind windows in annotations This should improve precision of `restarts` and `version change` annotations when zooming-in/zooming-out on the dashboards. The change also makes `restarts` dashboard visible on the panels, so user can disable it from displaying if needed. This could be useful when restarts overlap with version change events. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-22 16:32:51 +02:00
hagen1778	c746ba154d	deployment/dashboards: fix `AnnotationQueryRunner` error in Grafana The error appears when executing annotations query against Prometheus backend because the query itself hasn't specified look-behind window (which is allowed in VictoriaMetrics query engine). https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6309 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-21 11:39:02 +02:00
hagen1778	d386a68b59	dashboards: add new panel `Concurrent selects` to `vmstorage` row The panel will show how many ongoing select queries are processed by vmstorage and should help to identify resource bottlenecks. See panel description for more details. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-04-29 13:55:19 +02:00
hagen1778	9256df17fa	deployment: bump Grafana version to 10.4.2 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-04-29 12:10:24 +02:00
hagen1778	035de57e5e	dashboards: show max number of active merges instead of cumulative The cumulative number of active merges could be a red herring as its value depends on the number of vmstorages. For example, vmstorage could be added or removed and this will affect the panel. Or, each vmstorage could start a merging process (i.e. for downsampling) and visually it could look like a massive change. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-04-23 16:41:48 +02:00
Aliaksandr Valialkin	f4b1cbfef0	all: replace old https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html url with the new one - https://docs.victoriametrics.com/cluster-victoriametrics/	2024-04-18 02:54:20 +02:00
Aliaksandr Valialkin	8eeb045d3f	all: replace old https://docs.victoriametrics.com/MetricsQL.html url with the new one - https://docs.victoriametrics.com/metricsql/	2024-04-18 02:14:53 +02:00
hagen1778	f781c42ea4	dashboards: add more context to cluster dashboard panels Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-03-05 15:00:49 +01:00
hagen1778	0ab1069363	dashboards: update links in various panels * use docs.victoriametrics.com instead of github docs * add links to common terms used in VictoriaMetrics Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-03-04 15:43:31 +01:00
hagen1778	ecccd2a1cc	dashboards: add legend details to network panels in cluster dash Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-02-16 10:20:38 +01:00
hagen1778	3380043424	dashboards: follow-up `4369bc1df2` * add more details to changelog * simplify panels description * remove capacity planning recommendation, as it proves it incompetent Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-02-08 09:51:43 +01:00
Hui Wang	4369bc1df2	deployment/dashboards: fix `Storage full ETA` panels (#5747 ) During background downsampling, rate(vm_deduplicated_samples_total{type="merge"}) could be much bigger than rate(vm_rows_added_to_storage_total) and it could last quite some time, which causes negative values of Storage full ETA and confuses users, see playground. Instead of trying to get more accurate results during downsampling, I think it's ok to ignore vm_deduplicated_samples_total at all, it's more reasonable to see Storage full ETA increase after downsampling. --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-02-08 09:43:39 +01:00
hagen1778	487a94565b	dashboards/all: add new panel `CPU spent on GC` It should help identifying cases when too much CPU is spent on garbage collection, and advice users on how this can be addressed. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-02-02 16:21:21 +01:00
hagen1778	db11b94e30	dashboards: update to grafana/grafana:10.3.1 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-02-02 15:41:08 +01:00
hagen1778	c23e8bee89	dashboards: specify where to see details about dropped labels Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-29 07:37:51 +01:00
hagen1778	463455665b	dashboards: update cluster dashboard * add panels for detailed visualization of traffic usage between vmstorage, vminsert, vmselect components and their clients. New panels are available in the rows dedicated to specific components. * update "Slow Queries" panel to show percentage of the slow queries to the total number of read queries served by vmselect. The percentage value should make it more clear for users whether there is a service degradation. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-08 11:58:31 +01:00
Aliaksandr Valialkin	aefd744abb	dashboards: remove `path!="/favicon.ico"` filter from `requests rate` graphs The `path!="/favicon.ico"` filter has little sense, since there are many other special paths, which may be filtered out - /metrics, /flags, /health, /ping, /robots.txt, /-/healthy, /-/ready, /reload, etc. See /lib/httpserver/httpserver.go for more details. It will be hard or impossible to maintain filters for all these paths, so it is better to drop this filter in order to simplify queries and improve the consistency of these queries.	2023-11-16 19:28:49 +01:00
hagen1778	d389a4fcf3	dashboards: use `version` instead of `short_version` in annotations `short_version` label won't show the difference if various flavors of the same version were deployed. But `version` will. For example, on the sandbox env we test VM builds before new version release. Without this change, the version update won't be visible on dashboard. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-16 09:26:47 +01:00
hagen1778	d3ae2b2f62	dashboards: update description for RSS and anonymous memory panels to be consistent for single-node, cluster and vmagent dashboards. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-14 09:50:06 +01:00
hagen1778	d6ae082598	deployment/dashboards: respect `job` and `instance` filters for `alerts` annotation in cluster and single-node dashboards Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-14 09:38:15 +01:00
hagen1778	f6208965ce	dashboards/cluster: fix description about `max` threshold for `Concurrent selects` panel. Before, it was mistakenly implying that `max` is equal to the double of available CPUs. Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5214 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-31 16:05:33 +01:00
hagen1778	d43566605b	dashboards: fix vminsert/vmstorage/vmselect metrics filtering Fix vminsert/vmstorage/vmselect metrics filtering when dashboard is used to display data from many sub-clusters with unique job names. Before, only one specific job could have been accounted for component-specific panels, instead of all available jobs for the component. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-11 12:09:04 +02:00
hagen1778	d890038a94	dashboards: correctly calculate `Bytes per point` value Correctly calculate `Bytes per point` value for single-server and cluster VM dashboards. Before, the calculation mistakenly accounted for the number of entries in indexdb in denominator, which could have shown lower values than expected. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-08-03 16:22:50 +02:00
Zakhar Bessarab	6f3fee197e	dashboards/cluster: fix using storage filter for cache usage panel (#4657) Using `job=~$job_storage` forces "Cache usage" panel to display only vmstorage caches, but there is a cache present at vmselect (`promql/rollupResult`). Updated selector to match generic `$job` so that all caches will be displayed with an option to display per-job caches. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-07-18 11:40:40 +02:00
Roman Khavronenko	ccaa9571ef	Dashboard upd (#4438 ) dashboards: update dashboard for single-node version * add anonymous mem usage panel; * add syscall rate panel; * add location to logs panel; * update legend for panels to reflect instance name; * update queries to aggregate per instance. dashboards: update dashboard for cluster version * add syscall rate panel; * add drilldown to logs panel. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-12 15:58:47 +02:00
Aliaksandr Valialkin	91533531f5	docs/Troubleshooting.md: document an additional case, which could result in slow inserts If `-cacheExpireDuration` is lower than the interval between ingested samples for the same time series, then `vm_slow_row_inserts_total` metric is increased. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3976#issuecomment-1476883183	2023-03-20 13:28:36 -07:00
Roman Khavronenko	3eebe52a06	Dashboards upd (#3942) * dashboards/cluster: use `quantile` since `median` isn't supported by PromQL Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards/: add `restarts` annotation to show when there were restarts The cluster's annotation query is aggregated `by job`, while vmagent/vmalert are aggregated `by job, instance`. This is because cluster dashboard can contain too many instances and annotation could become too noisy. Signed-off-by: hagen1778 <roman@victoriametrics.com> dashboards/*: support instance filter in Version annotation Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-03-10 17:13:19 +01:00
Roman Khavronenko	2e153b68cd	dashboards: account for indexdb size in Bytes-per-Point panel (#3884 ) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-02-28 17:47:52 +01:00
Roman Khavronenko	b209d4ace0	dashboards: use `median` instead of `avg` (#3800 ) `avg` can be affected by just one outlier, which may lead to false conclusions. `median` is supposed to reflect reality better by leveling outliers out. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-02-11 10:01:30 -08:00
Aliaksandr Valialkin	88fed0232c	dashboards: typo fix `Datapoints scanned per series` -> `Datapoints scanned per query`	2023-02-03 19:12:33 -08:00
Roman Khavronenko	b3a70b8284	dashboards: fix the tooltip info for 1.86 (#3628) See `c63755c316 (diff-bba263a473e7fbc9d0fde075ebef6b3d4e32c322ee1210a3e07182292c7723aaR18)` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-01-11 11:30:12 +01:00
Aliaksandr Valialkin	c63755c316	lib/writeconcurrencylimiter: improve the logic behind -maxConcurrentInserts limit Previously the -maxConcurrentInserts was limiting the number of established client connections, which write data to VictoriaMetrics. Some of these connections could be idle. Such connections do not consume big amounts of CPU and RAM, so there is little sense in limiting the number of such connections. So now the -maxConcurrentInserts command-line option limits the number of concurrently executed insert requests, not including idle connections. It is recommended to remove the -maxConcurrentInserts command-line option, since the default value for this option should work well for most cases.	2023-01-06 22:20:19 -08:00
Thomas Danielsson	9d1104d812	dashboards: fix operator datasource variable (#3604) Got "Failed to upgrade legacy queries Datasource $ds was not found" in Grafana on operator dashboard. Its datasource variable was incorrectly named `datasource`. Also made the rest of the dashboards have homogeneous datasource-variable names and selections, matching vmagent dashboard.	2023-01-05 14:59:56 +01:00
Roman Khavronenko	e40c7d6efa	dashboards: respect $job var in sub-vars for cluster dash (#3487 ) Previously, $job_select, $job_storage and $job_insert didn't respect the $job filter. This change updates the variable queries to account for set $job variable. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-12-16 09:53:32 +01:00
Roman Khavronenko	eb275be99d	dashboards: add VersionChange annotation (#3473) The new annotation is hidden by default and is supposed to show component `short_version` label change on the panels. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-12-12 16:32:26 +01:00
Aliaksandr Valialkin	f3e84b4dea	{dashboards,alerts}: substitute `{type="indexdb"}` with `{type=~"indexdb.*"}` inside queries after `8189770c50` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337	2022-12-05 16:00:22 -08:00
Roman Khavronenko	6801b37e53	dashboards: add `Disk space usage %` and `Disk space usage % by type` panels (#3436) The new panels have been added to the vmstorage and drilldown rows. `Disk space usage %` is supposed to show disk space usage percentage. This panel is now also referred to by `DiskRunsOutOfSpace` alerting rule. This panel has Drilldown option to show absolute values. `Disk space usage % by type` shows the relation between datapoints and indexdb size. It is supposed to help identify cases when indexdb starts to take too much disk space. This panel has Drilldown option to show absolute values. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-12-05 08:35:33 +01:00
Roman Khavronenko	f989c20dd7	dashboards: fix typo in data link (#3426 ) Fixes a missing `&` char in data link for ETA panel on cluster dashboards. Without `&` char it generates wrong link when click on Drilldown menu. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-12-01 13:21:14 +01:00
Roman Khavronenko	31ff26065b	dashboards: update VM cluster dash (#3401 ) The change list is the following: * bump Grafana version to 9.2.6; * remove artifacts in data links. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-11-28 14:13:00 +01:00
Timur Bakeyev	9ad578214e	Update `datasource` entries consistently contain type `prometheus` and uid `$ds`. (#3393 ) Co-authored-by: Timour I. Bakeev <tbakeev@ripe.net>	2022-11-28 08:37:39 +01:00
Roman Khavronenko	42e63fe0fd	dashboards: cleanup & remove artifacts (#3387 ) * some unexpected DS UIDs were removed; * replace `$instance.` filter with `$instance` since we respect the instance port anyway; remove predefined datasource for `clusterbytenant` in favour of datasource variable `ds`. Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-11-25 09:28:14 +01:00
Roman Khavronenko	3407006cdb	dashboards: cluster dashboard update (#3380) The purpose of the update is to make the dash more usable for large installations with many instances. Panels which showed metrics per-instance (Mem, CPU) now are showing metrics per-job or min/max/avg aggregations in % instead. This is supposed to help immediately to identify resource shortage and remain usable for small and big installations. For cases when detailed info is needed, to the bottom of the dashboard a new row `Drilldown` was added. Panels like Mem or CPU now contain a `data-link` named `Drilldown` (shown on line click) which takes user to more detailed panel. The change list is the following: * bump Grafana version to 9.1.0; * replace old "Graph" panel with "TimeSeries" panel; * improve Uptime panel to show number of instances per job; * show % usage of Mem and CPU instead of absolute values; * `Caches` row was removed. All needed info for caches is now part of `Troubleshooting`; * add `Drilldown` section for detailed resource usage; * add Annotations for Alert triggers. Not all alerts are supposed to be displayed on the dashboard, but only those with label `show_at: dashboard`. See `alerts-cluster.yml` change. Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-11-23 18:03:25 -08:00
Roman Khavronenko	908fe6a623	dashboards: replace `Index size` panel with `Active series` (#3157 ) Panel `Index size` showed itself impractical for users. So replacing it with `Active series` panel. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/776#issuecomment-1255823734 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-09-25 21:49:18 +02:00
Roman Khavronenko	b4410b1c63	Dashboards (#3120 ) * dashboards/cluster: few updates * apply consistent formatting across panels; * make resource usage panels per component more detailed; * add extra panels to vmselect for displaying `vm_rows_read_per_query`, `vm_rows_scanned_per_query`, `vm_rows_read_per_series` and `vm_series_read_per_query` metrics. Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards/single: few updates * apply consistent formatting across panels; * add extra panels to Performance for displaying `vm_rows_read_per_query`, `vm_rows_scanned_per_query`, `vm_rows_read_per_series` and `vm_series_read_per_query` metrics. Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards/vmagent: few updates * apply consistent formatting across panels; * add panels for showing number of samples ingested or scraped; * adapt resource usage panels for multiple selected jobs/instances; * add adhoc variable; * display vmagent's version in Stats. Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards/vmalert: few updates * apply consistent formatting across panels; * adapt resource usage panels for multiple selected jobs/instances; * show vmalert version in Stats section. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-09-16 21:24:32 +02:00
Max Golionko	7da9443686	moved cluster dashboard to master (#3074 ) dashboards: move cluster dashboard to master branch This change should simplify dashboards management.	2022-09-06 16:19:43 +02:00
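
The `Storage full ETA` changes in `cd2222aa95` and `4369bc1df2` come down to simple arithmetic. The following is a minimal Python sketch, assuming the ETA is the usable disk space divided by the byte growth rate. The function name, its parameters and the sample numbers are hypothetical illustrations and are not taken from the dashboard JSON.

```python
def storage_full_eta_seconds(free_bytes, free_limit_bytes,
                             rows_added_per_sec, bytes_per_row):
    """Rough estimate of seconds until the usable disk space runs out.

    All arguments are hypothetical stand-ins for values that the dashboard
    derives from metrics such as vm_free_disk_space_bytes and
    vm_free_disk_space_limit_bytes.
    """
    # Usable space is free space minus the reserved limit (the cd2222aa95 fix).
    available = max(free_bytes - free_limit_bytes, 0)
    # Growth is based on ingested rows only; deduplicated samples are
    # ignored, as described in 4369bc1df2.
    growth = rows_added_per_sec * bytes_per_row
    if growth <= 0:
        return float("inf")  # no growth means the disk never fills up
    return available / growth


# Hypothetical numbers: 1000 bytes free, 100 reserved, 10 rows/s at 3 bytes each.
eta = storage_full_eta_seconds(1000, 100, 10, 3)
```

Subtracting the limit first means the estimate reaches zero when the reserved threshold is hit, rather than when the disk is physically full.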

50 commits
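
Commit `b209d4ace0` switches the panels from `avg` to `median` because a single outlier can skew the average. The short Python sketch below shows that effect. The per-instance values are made up for illustration and do not come from any dashboard.

```python
from statistics import mean, median

# Hypothetical CPU usage share per instance; the last instance is an outlier.
cpu_usage = [0.20, 0.22, 0.21, 0.19, 0.95]

avg = mean(cpu_usage)    # pulled upwards by the single outlier
med = median(cpu_usage)  # middle value, unaffected by the outlier

print(f"avg={avg:.3f} median={med:.2f}")
```

In PromQL the same idea is expressed with the `quantile(0.5, ...)` aggregation, which is what `3eebe52a06` uses because `median` is not available there.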