Commit graph

106 commits

Author SHA1 Message Date
Aliaksandr Valialkin
aefd744abb
dashboards: remove path!="/favicon.ico" filter from requests rate graphs
The `path!="/favicon.ico"` filter makes little sense, since there are many other special paths,
which may be filtered out - /metrics, /flags, /health, /ping, /robots.txt, /-/healthy, /-/ready, /reload, etc.
See /lib/httpserver/httpserver.go for more details.
It will be hard or impossible to maintain filters for all these paths, so it is better to drop this filter
in order to simplify queries and improve the consistency of these queries.
2023-11-16 19:28:49 +01:00
hagen1778
d389a4fcf3
dashboards: use version instead of short_version in annotations
`short_version` label won't show the difference if various flavors of the same
version were deployed. But `version` will.

For example, on the sandbox env we test VM builds before new version release.
Without this change, the version update won't be visible on dashboard.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-16 09:26:47 +01:00
hagen1778
d3ae2b2f62
dashboards: update description for RSS and anonymous memory panels to be consistent for single-node, cluster and vmagent dashboards.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-14 09:50:06 +01:00
hagen1778
d6ae082598
deployment/dashboards: respect job and instance filters for alerts annotation in cluster and single-node dashboards
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-14 09:38:15 +01:00
hagen1778
f6208965ce
dashboards/cluster: fix description about max threshold for Concurrent selects panel.
Before, it was mistakenly implying that `max` is equal to the double of available CPUs.

Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5214

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 16:05:33 +01:00
hagen1778
aaf9e3d526
dashboards/vmalert: add new panel Missed evaluations
The new panel is supposed to indicate alerting groups that miss their evaluations.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 10:35:19 +01:00
hagen1778
8874b525b7
dashboards: fix Errors rate to Alertmanager filter
The panel `Errors rate to Alertmanager` had `group` label filter
applied to the expression, while the metric `vmalert_alerts_send_errors_total`
doesn't have that label. This always resulted in empty results.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 10:16:45 +01:00
hagen1778
c2d252c045
dashboards/vmalert: respect job and instance filters in No data errors
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-17 09:40:39 +02:00
hagen1778
edba9f6266
dashboards/vmalert: use desc sorting for tooltips on panels
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-17 09:31:09 +02:00
hagen1778
d43566605b
dashboards: fix vminsert/vmstorage/vmselect metrics filtering
Fix vminsert/vmstorage/vmselect metrics filtering when dashboard is used
to display data from many sub-clusters with unique job names.
Before, only one specific job could be accounted for in component-specific panels,
instead of all available jobs for the component.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-11 12:09:04 +02:00
Roman Khavronenko
a4bd73ec7e
lib/promscrape: make concurrency control optional (#5073)
* lib/promscrape: make concurrency control optional

Before, `-maxConcurrentInserts` was limiting all calls to `promscrape.Parse`
function: during ingestion and scraping. This behavior is incorrect.
Cmd-line flag `-maxConcurrentInserts` should have an effect only on ingestion.

Since both pipelines use the same `promscrape.Parse` function, we extend it
to make the concurrency limiter optional. So the caller can decide whether concurrency
should be limited or not.

This commit makes c53b5788b4 obsolete.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "dashboards: move `Concurrent inserts` panel to Troubleshooting section"

This reverts commit c53b5788b4.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-02 21:32:11 +02:00
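The optional concurrency limiter described in commit a4bd73ec7e above can be illustrated with a minimal Go sketch. The names `limiter`, `parseLines` and `maxObservedConcurrency` are hypothetical and are not the actual `promscrape.Parse` API; the sketch only assumes that a shared parsing function accepts a limiter that may be nil, so the ingestion path can pass one while the scrape path passes none.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// limiter is a hypothetical counting semaphore backed by a buffered channel.
type limiter chan struct{}

// parseLines is a hypothetical stand-in for a parser shared by two pipelines.
// A nil limit disables concurrency control, so the caller decides.
func parseLines(lines []string, limit limiter, callback func(line string)) {
	if limit != nil {
		limit <- struct{}{}        // acquire a slot; blocks while all slots are busy
		defer func() { <-limit }() // release the slot on return
	}
	for _, line := range lines {
		callback(line)
	}
}

// maxObservedConcurrency runs the given number of parseLines calls in parallel
// and returns the highest number of callbacks seen running at the same time.
func maxObservedConcurrency(workers int, limit limiter) int64 {
	var running, peak int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			parseLines([]string{"a 1", "b 2"}, limit, func(_ string) {
				n := atomic.AddInt64(&running, 1)
				// Raise peak to n unless another goroutine already recorded more.
				for {
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				time.Sleep(time.Millisecond) // simulate parsing work
				atomic.AddInt64(&running, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// Ingestion-like caller: at most 2 parse calls run at once.
	fmt.Println("with limit 2:", maxObservedConcurrency(8, make(limiter, 2)))
	// Scrape-like caller: no limiter is passed, so nothing is throttled.
	fmt.Println("without limit:", maxObservedConcurrency(8, nil))
}
```

With a limiter of capacity 2 the observed peak never exceeds 2, while the nil case is bounded only by the number of workers.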
Aliaksandr Valialkin
859977d591
Revert "lib/promscrape: add metric vm_promscrape_scrapes_skipped_total (#5074)"
This reverts commit 74301cdbf5.

Reason for revert:

vmagent already provides a better approach for detecting slow scrape targets via the following query:

    scrape_duration_seconds / scrape_timeout_seconds > 1

This query depends on automatically generated per-target metrics.
See https://docs.victoriametrics.com/vmagent.html#automatically-generated-metrics for more details.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5074
2023-10-02 20:59:56 +02:00
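To make the query from the revert message above concrete, here is a small Go sketch of the same comparison applied to in-memory values. The `target` type, the `slowTargets` function and the sample numbers are hypothetical; only the ratio `scrape_duration_seconds / scrape_timeout_seconds > 1` comes from the commit message.

```go
package main

import "fmt"

// target holds the two per-target values used by the query.
// The type and its field names are hypothetical.
type target struct {
	name           string
	scrapeDuration float64 // value of scrape_duration_seconds
	scrapeTimeout  float64 // value of scrape_timeout_seconds
}

// slowTargets returns the names of targets whose scrape duration
// exceeds the configured scrape timeout, i.e. duration / timeout > 1.
func slowTargets(targets []target) []string {
	var slow []string
	for _, t := range targets {
		if t.scrapeDuration/t.scrapeTimeout > 1 {
			slow = append(slow, t.name)
		}
	}
	return slow
}

func main() {
	// Made-up sample values for illustration only.
	targets := []target{
		{name: "fast-exporter", scrapeDuration: 0.4, scrapeTimeout: 10},
		{name: "slow-exporter", scrapeDuration: 12.5, scrapeTimeout: 10},
	}
	fmt.Println(slowTargets(targets))
}
```

In this sample only `slow-exporter` is reported, since its scrape took longer than its timeout.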
Roman Khavronenko
74301cdbf5
lib/promscrape: add metric vm_promscrape_scrapes_skipped_total (#5074)
* lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total`

add metric `vm_promscrape_scrapes_skipped_total` to show whether vmagent skips the scrapes.
This could happen if vmagent is overloaded or the target is responding too slowly for the configured `scrape_interval`.

The follow-up commit should add a corresponding alerting rule and panel to vmagent dashboard.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* deployment/docker: add `TooManyScrapeSkips` alerting rule for vmagent

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards: add panels `Scrape duration 0.99 quantile` and `Skipped scrapes` to vmagent dashboard

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-02 17:12:12 +02:00
hagen1778
c53b5788b4
dashboards: move Concurrent inserts panel to Troubleshooting section
Moved because this panel is related to both: scraped and ingested data.
Before, it could have given a misleading impression that it is related to ingested metrics only.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-26 14:26:40 +02:00
hagen1778
0c60228fea
dashboards/victoriametrics: account for instance filter in annotations
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-20 14:50:03 +02:00
Artem Navoiev
f04eb762c1
add annotation to VictoriaLogs dashboards - restarts and version change (#5008)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-09-15 15:12:23 +02:00
Artem Navoiev
fef0c232e8
Update VL dashboard. Add Resource Section, add ds and job filters, a… (#4981)
* Update VL dashboard. Add Resource Section, add ds and job filters, add metric collection in docker compose from victorialogs, fix networking usage in docker compose

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* add vl dashboard to docker compose

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* add vl dashboard to docker compose

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-09-10 15:04:07 +02:00
Roman Khavronenko
e8db78eaa4
dashboards: provide copies of Grafana dashboards alternated with Vict… (#4905)
dashboards: provide copies of Grafana dashboards alternated with VictoriaMetrics datasource

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-29 11:06:55 +02:00
hagen1778
481a2c70fd
dashboard: fix display of ingested rows rate
Fix display of ingested rows rate for `Samples ingested/s`
and `Samples rate` panels for vmagent's dashboard.
Previously, not all ingestion protocols were accounted for in these panels.
An extra panel `Rows rate` was added to the `Ingestion` section to display
the ingested rows rate split by protocol.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-15 08:45:10 +02:00
hagen1778
d890038a94
dashboards: correctly calculate Bytes per point value
Correctly calculate `Bytes per point` value for single-server and cluster VM dashboards.
Before, the calculation mistakenly accounted for the number of entries in indexdb in
the denominator, which could have shown lower values than expected.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-03 16:22:50 +02:00
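The effect of the denominator fix in commit d890038a94 above can be shown with a short Go sketch. It assumes that the numerator is the total on-disk size including indexdb (as suggested by commit 2e153b68cd in this log) and that the denominator should count only stored data points. The function names and all numbers are hypothetical, and the actual dashboard expression is not shown in this log.

```go
package main

import "fmt"

// bytesPerPoint divides the total on-disk size (data plus indexdb) by the
// number of stored data points. Names and formula are assumptions made
// for illustration.
func bytesPerPoint(dataBytes, indexdbBytes, dataRows float64) float64 {
	return (dataBytes + indexdbBytes) / dataRows
}

// bytesPerPointWithIndexRows mimics the earlier mistake: indexdb entries
// are added to the denominator, which lowers the result.
func bytesPerPointWithIndexRows(dataBytes, indexdbBytes, dataRows, indexdbRows float64) float64 {
	return (dataBytes + indexdbBytes) / (dataRows + indexdbRows)
}

func main() {
	// Made-up sample values for illustration only.
	dataBytes, indexdbBytes := 900.0, 100.0
	dataRows, indexdbRows := 1000.0, 1000.0

	fmt.Println("corrected:", bytesPerPoint(dataBytes, indexdbBytes, dataRows))
	fmt.Println("mistaken: ", bytesPerPointWithIndexRows(dataBytes, indexdbBytes, dataRows, indexdbRows))
}
```

With these sample values the corrected calculation gives 1 byte per point, while the mistaken one gives 0.5, which matches the "lower values than expected" symptom.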
hagen1778
c47138e1b0
dashboards: add panels for absolute value of mem and cpu usage by vmalert
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4627

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-03 11:14:14 +02:00
hagen1778
e311a7bf80
dashboards: add Concurrent inserts panel to vmagent's dashboard
The new panel is supposed to show whether the number of concurrent
inserts processed by vmagent is reaching the limit.
The panel contains a recommendation on what to do if the limit is reached.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-03 10:46:25 +02:00
Aliaksandr Valialkin
f35d27aa2b
app/vlstorage: expose vl_data_size_bytes metric at /metrics page for tracking the on-disk data size (both indexdb and the data itself)
2023-07-31 07:56:53 -07:00
Zakhar Bessarab
6f3fee197e
dashboards/cluster: fix using storage filter for cache usage panel (#4657)
Using `job=~$job_storage` forces "Cache usage" panel to display only vmstorage caches, but there is a cache present at vmselect (`promql/rollupResult`).
Updated selector to match generic `$job` so that all caches will be displayed with an option to display per-job caches.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-07-18 11:40:40 +02:00
Artem Navoiev
b024e46284
Add docker compose examples: filebeat(docker, syslog), fluentbit(docker), logstash, vector(docker)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-06-21 03:59:31 -07:00
Roman Khavronenko
ccaa9571ef
Dashboard upd (#4438)
dashboards: update dashboard for single-node version
* add anonymous mem usage panel;
* add syscall rate panel;
* add location to logs panel;
* update legend for panels to reflect instance name;
* update queries to aggregate per instance.

dashboards: update dashboard for cluster version
* add syscall rate panel;
* add drilldown to logs panel.



Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-12 15:58:47 +02:00
Aliaksandr Valialkin
91533531f5
docs/Troubleshooting.md: document an additional case, which could result in slow inserts
If `-cacheExpireDuration` is lower than the interval between ingested samples for the same time series,
then the `vm_slow_row_inserts_total` metric is increased.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3976#issuecomment-1476883183
2023-03-20 13:28:36 -07:00
Roman Khavronenko
3eebe52a06
Dashboards upd (#3942)
* dashboards/cluster: use `quantile` since `median` isn't supported by PromQL

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/*: add `restarts` annotation to show when there were restarts

The cluster's annotation query is aggregated `by job`,
while vmagent/vmalert are aggregated `by job, instance`.
This is because the cluster dashboard can contain too many instances
and the annotation could become too noisy.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/*: support instance filter in Version annotation

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-10 17:13:19 +01:00
Roman Khavronenko
2e153b68cd
dashboards: account for indexdb size in Bytes-per-Point panel (#3884)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-28 17:47:52 +01:00
Roman Khavronenko
b209d4ace0
dashboards: use median instead of avg (#3800)
`avg` can be affected by just one outlier, which may lead
to false conclusions. `median` is supposed to reflect
reality better by leveling outliers out.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-11 10:01:30 -08:00
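The reasoning in commit b209d4ace0 above, that a single outlier can skew `avg` while `median` stays stable, can be checked with a short Go sketch. The helper functions and the sample values are hypothetical and only illustrate the arithmetic; they are not how the dashboards compute these aggregates. A later commit in this log (3eebe52a06) switched the cluster dashboard to `quantile`, since `median` isn't supported by PromQL.

```go
package main

import (
	"fmt"
	"sort"
)

// mean returns the arithmetic average of values, or 0 for an empty slice.
func mean(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

// median returns the middle value of a sorted copy of values,
// averaging the two middle values for even lengths, or 0 for an empty slice.
func median(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]float64(nil), values...) // keep the input unmodified
	sort.Float64s(sorted)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}

func main() {
	// Four instances behave alike and one is an outlier (made-up numbers).
	usage := []float64{1, 1, 1, 1, 100}
	fmt.Println("avg:   ", mean(usage))
	fmt.Println("median:", median(usage))
}
```

For this sample the average is about 20.8, although four of the five values are 1, while the median stays at 1.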
Aliaksandr Valialkin
88fed0232c
dashboards: typo fix Datapoints scanned per series -> Datapoints scanned per query
2023-02-03 19:12:33 -08:00
Roman Khavronenko
ec7c3f45ba
dashboards: bump operator dash to v9 of Grafana (#3642)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-12 16:31:26 +01:00
Roman Khavronenko
b3a70b8284
dashboards: fix the tooltip info for 1.86 (#3628)
See c63755c316 (diff-bba263a473e7fbc9d0fde075ebef6b3d4e32c322ee1210a3e07182292c7723aaR18)

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-11 11:30:12 +01:00
Aliaksandr Valialkin
c63755c316
lib/writeconcurrencylimiter: improve the logic behind -maxConcurrentInserts limit
Previously the -maxConcurrentInserts was limiting the number of established client connections,
which write data to VictoriaMetrics. Some of these connections could be idle.
Such connections do not consume big amounts of CPU and RAM, so there is little sense in limiting
the number of such connections. So now the -maxConcurrentInserts command-line option
limits the number of concurrently executed insert requests, not including idle connections.

It is recommended to remove the -maxConcurrentInserts command-line option, since the default value
for this option should work well for most cases.
2023-01-06 22:20:19 -08:00
Thomas Danielsson
9d1104d812
dashboards: fix operator datasource variable (#3604)
Got "Failed to upgrade legacy queries Datasource $ds was not found" in
Grafana on operator dashboard.
Its datasource variable was incorrectly named `datasource`.

Also made the rest of the dashboards have homogeneous datasource-variable
names and selections, matching vmagent dashboard.
2023-01-05 14:59:56 +01:00
Roman Khavronenko
9d0e1f8e68
dashboards: add backupmanager dashboard (#3599)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-04 17:26:15 +01:00
Roman Khavronenko
e40c7d6efa
dashboards: respect $job var in sub-vars for cluster dash (#3487)
Previously, $job_select, $job_storage and $job_insert
didn't respect the $job filter. This change updates
the variable queries to account for the selected $job variable.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-16 09:53:32 +01:00
Roman Khavronenko
eb275be99d
dashboards: add VersionChange annotation (#3473)
The new annotation is hidden by default and is supposed to show
component `short_version` label change on the panels.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-12 16:32:26 +01:00
Roman Khavronenko
0b6b6d52bf
dashboards: remove DataLinks from single version (#3456)
Those data links were a copy&paste artifact from cluster version
and aren't needed on the dash.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-07 14:35:52 +01:00
Roman Khavronenko
9f1403db38
dashboards: add non-default flags panel for vmagent (#3453)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-07 12:22:20 +01:00
Aliaksandr Valialkin
f3e84b4dea
{dashboards,alerts}: substitute {type="indexdb"} with {type=~"indexdb.*"} inside queries after 8189770c50
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
2022-12-05 16:00:22 -08:00
Roman Khavronenko
6801b37e53
dashboards: add Disk space usage % and Disk space usage % by type panels (#3436)
The new panels have been added to the vmstorage and drilldown rows.

`Disk space usage %` is supposed to show disk space usage percentage.
This panel is now also referred to by the `DiskRunsOutOfSpace` alerting rule.
This panel has a Drilldown option to show absolute values.

`Disk space usage % by type` shows the relation between datapoints
and indexdb size. It is supposed to help identify cases when indexdb
starts to take too much disk space.
This panel has a Drilldown option to show absolute values.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-05 08:35:33 +01:00
Roman Khavronenko
f989c20dd7
dashboards: fix typo in data link (#3426)
Fixes a missing `&` char in the data link for the ETA panel
on cluster dashboards. Without the `&` char, a wrong link is
generated when clicking on the Drilldown menu.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-01 13:21:14 +01:00
Roman Khavronenko
bdd0683c4a
dashboards: update VM single dash (#3400)
The change list is the following:
* bump Grafana version to 9.2.6;
* replace old "Graph" panel with "TimeSeries" panel;
* show % usage of Mem and CPU in addition to absolute values;
* `Caches` row was removed. All needed info for caches is now part of `Troubleshooting`;
* add Annotations for Alert triggers. Not all alerts are supposed to be displayed
on the dashboard, but only those with label `show_at: dashboard`.
See `alerts.yml` change.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-29 19:28:22 +01:00
Roman Khavronenko
5d835a6d64
dashboards: update vmalert dash (#3404)
The change list is the following:
* bump Grafana version to 9.2.6;
* replace old Graph panel with TimeSeries panel;
* add RemoteWrite section;
* allow configuring topK elements for some of the panels;
* prefer grouping by job instead of grouping by instance.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-29 19:26:31 +01:00
Roman Khavronenko
7dfb01bd7b
dashboards: update vmagent dash (#3411)
The change list is the following:
* bump Grafana version to 9.2.6;
* add version change annotations;
* switch to per-job panels instead of per-instance;
* add drilldown option for resource usage panels.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-29 19:22:13 +01:00
Roman Khavronenko
31ff26065b
dashboards: update VM cluster dash (#3401)
The change list is the following:
* bump Grafana version to 9.2.6;
* remove artifacts in data links.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-28 14:13:00 +01:00
Timur Bakeyev
9ad578214e
Update datasource entries to consistently contain type prometheus and uid $ds. (#3393)
Co-authored-by: Timour I. Bakeev <tbakeev@ripe.net>
2022-11-28 08:37:39 +01:00
Roman Khavronenko
42e63fe0fd
dashboards: cleanup & remove artifacts (#3387)
* some unexpected DS UIDs were removed;
* replace `$instance.*` filter with `$instance` since we respect
the instance port anyway;
* remove predefined datasource for `clusterbytenant`
in favour of datasource variable `ds`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-25 09:28:14 +01:00
Roman Khavronenko
3407006cdb
dashboards: cluster dashboard update (#3380)
The purpose of the update is to make the dash more usable
for large installations with many instances. Panels which showed
metrics per-instance (Mem, CPU) now are showing metrics per-job or min/max/avg
aggregations in % instead. This is supposed to help immediately identify
resource shortage and remain usable for small and big installations.

For cases when detailed info is needed, a new row `Drilldown` was added
to the bottom of the dashboard. Panels like Mem or CPU now contain
a `data-link` named `Drilldown` (shown on line click) which takes
the user to a more detailed panel.

The change list is the following:
* bump Grafana version to 9.1.0;
* replace old "Graph" panel with "TimeSeries" panel;
* improve Uptime panel to show number of instances per job;
* show % usage of Mem and CPU instead of absolute values;
* `Caches` row was removed. All needed info for caches is now part of `Troubleshooting`;
* add `Drilldown` section for detailed resource usage;
* add Annotations for Alert triggers. Not all alerts are supposed to be displayed
on the dashboard, but only those with label `show_at: dashboard`.
See `alerts-cluster.yml` change.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-23 18:03:25 -08:00