mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git, synced 2025-01-10 15:14:09 +00:00
docs/Troubleshooting.md: various typo fixes and clarifications
parent 7c54cc2123
commit 47f3c4b12b
1 changed files with 15 additions and 13 deletions
@@ -104,7 +104,7 @@ There are the following most commons reasons for slow data ingestion in Victoria
 
 - To increase the available memory on the host where VictoriaMetrics runs until `slow inserts` percentage
 will become lower than 5%. If you run VictoriaMetrics cluster, then you need increasing total available
-memory at `vmstorage` nodes. This can be done in two ways: either increasing the available memory
+memory at `vmstorage` nodes. This can be done in two ways: either to increase the available memory
 per each existing `vmstorage` node or to add more `vmstorage` nodes to the cluster.
 
 - To reduce the number of active time series. The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
@@ -114,7 +114,7 @@ There are the following most commons reasons for slow data ingestion in Victoria
 
 2. [High churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate),
 e.g. when old time series are substituted with new time series at a high rate.
-When VitoriaMetrics encounters a sample for new time series, it needs to register the time series
+When VictoriaMetrics encounters a sample for new time series, it needs to register the time series
 in the internal index (aka `indexdb`), so it can be quickly located on subsequent select queries.
 The process of registering new time series in the internal index is an order of magnitude slower
 than the process of adding new sample to already registered time series.
@@ -124,7 +124,7 @@ There are the following most commons reasons for slow data ingestion in Victoria
 provides `Churn rate` graph, which shows the average number of new time series registered
 during the last 24 hours. If this number exceeds the number of [active time series](https://docs.victoriametrics.com/FAQ.html#what-is-an-active-time-series),
 then you need to identify and fix the source of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate).
-The most commons source of high churn rate is a label, which frequently change its value. Try avoiding such labels.
+The most commons source of high churn rate is a label, which frequently changes its value. Try avoiding such labels.
 The [cardinality explorer](https://docs.victoriametrics.com/#cardinality-explorer) can help identifying
 such labels.
 
@@ -138,18 +138,19 @@ There are the following most commons reasons for slow data ingestion in Victoria
 - 20% of free disk space
 
 If VictoriaMetrics components have lower amounts of free resources, then this may lead
-to **significant** performance degradation during data ingestion.
+to **significant** performance degradation after workload increases slightly.
 For example:
 
 - If the percentage of free CPU is close to 0, then VictoriaMetrics
 may experience arbitrary long delays during data ingestion when it cannot keep up
 with the data ingestion rate.
 
-- If the percentage of free memory reaches 0, then the Operating System where VictoriaMetrics components run
+- If the percentage of free memory reaches 0, then the Operating System where VictoriaMetrics components run,
 may have no enough memory for [page cache](https://en.wikipedia.org/wiki/Page_cache).
 VictoriaMetrics relies on page cache for quick queries over recently ingested data.
 If the operating system has no enough free memory for page cache, then it needs
-to re-read the requested data from disk. This may **significantly** increase disk read IO.
+to re-read the requested data from disk. This may **significantly** increase disk read IO
+and slow down both queries and data ingestion.
 
 - If free disk space is lower than 20%, then VictoriaMetrics is unable to perform optimal
 background merge of the incoming data. This leads to increased number of data files on disk,
@@ -163,7 +164,7 @@ There are the following most commons reasons for slow data ingestion in Victoria
 then this may become limiting factor for data ingestion speed.
 
 The [official Grafana dashboard for cluster version of VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
-contain `connection saturation` graph for `vminsert` components. If these graphs reach 100%,
+contain `connection saturation` graph for `vminsert` components. If this graph reaches 100%,
 then it is likely you have issues with network latency between `vminsert` and `vmstorage`.
 Another possible issue for 100% connection saturation between `vminsert` and `vmstorage`
 is resource shortage at `vmstorage` nodes. In this case you need to increase amounts
@@ -189,8 +190,8 @@ There are the following solutions exist for slow queries:
 Sometimes adding more `vmstorage` nodes also can help improving the speed for slow queries.
 
 - Rewriting slow queries, so they become faster. Unfortunately it is hard determining
-whether the given query will be slow by just looking at it.
-VictoriaMetrics provides [query tracing](https://docs.victoriametrics.com/#query-tracing) functionality,
+whether the given query is slow by just looking at it.
+VictoriaMetrics provides [query tracing](https://docs.victoriametrics.com/#query-tracing) feature,
 which can help determine the source of slow query.
 See also [this article](https://valyala.medium.com/how-to-optimize-promql-and-metricsql-queries-85a1b75bf986),
 which explains how to determine and optimize slow queries.
@@ -233,16 +234,18 @@ for processing the current workload.
 
 The most common sources of cluster instability are:
 
-- Workload spike. For example, if the number of active time series increases by 2x while
+- Workload spikes. For example, if the number of active time series increases by 2x while
 the cluster has no enough free resources for processing the increased workload,
 then it may become unstable.
+VictoriaMetrics provides various configuration settings, which can be used for limiting unexpected workload spikes.
+See [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#resource-usage-limits) for details.
 
 - Various maintenance tasks such as rolling upgrades or rolling restarts during configuration changes.
 For example, if a cluster contains `N=3` `vmstorage` nodes and they are restarted one-by-one (aka rolling restart),
 then the cluster will have only `N-1=2` healthy `vmstorage` nodes during the rolling restart.
 This means that the load on healthy `vmstorage` nodes increases by at least `100%/(N-1)=50%`
 comparing to the load before rolling restart. E.g. they need to process 50% more incoming
-data and return 50% more data during queries. In reality the load on the remaining `vmstorage`
+data and to return 50% more data during queries. In reality the load on the remaining `vmstorage`
 nodes increases even more because they need to register new time series, which were re-routed
 from temporarily unavailable `vmstorage` node. If `vmstorage` nodes had less than 50%
 of free resources (CPU, RAM, disk IO) before the rolling restart, then the rolling restart
@@ -262,5 +265,4 @@ See [capacity planning docs](https://docs.victoriametrics.com/Cluster-VictoriaMe
 and [cluster resizing and scalability docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#cluster-resizing-and-scalability)
 for details.
 
-VictoriaMetrics provides various configuration settings, which can be used for limiting unexpected workload spikes.
-See [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#resource-usage-limits) for details.
+
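The rolling-restart arithmetic quoted in the diff (`100%/(N-1)=50%` for `N=3`) can be checked with a short calculation. The sketch below is illustrative only: `extra_load_percent` is a hypothetical helper, not part of VictoriaMetrics, and it assumes the workload is spread evenly across healthy `vmstorage` nodes. As the documentation notes, the real increase is likely higher because re-routed time series must also be registered.

```python
def extra_load_percent(total_nodes: int, unavailable_nodes: int = 1) -> float:
    """Lower bound on the extra load (in percent) per remaining node.

    Hypothetical helper for illustration. Assumes an even spread of load:
    each healthy node goes from 1/N of the workload to 1/(N-k) of it,
    which is a relative increase of k/(N-k).
    """
    healthy = total_nodes - unavailable_nodes
    if unavailable_nodes < 0 or healthy <= 0:
        raise ValueError("need at least one healthy node")
    return 100.0 * unavailable_nodes / healthy


if __name__ == "__main__":
    # N=3 reproduces the 50% figure from the documentation text.
    for n in (3, 5, 10):
        print(f"N={n}: +{extra_load_percent(n):.1f}% load per remaining node")
```

Under the same even-spread assumption, the result shrinks as `N` grows, which suggests that a cluster with more `vmstorage` nodes needs less spare capacity per node to get through a rolling restart.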