* lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total`
Add metric `vm_promscrape_scrapes_skipped_total` to show whether vmagent skips scrapes.
This can happen if vmagent is overloaded or the target responds too slowly for the configured `scrape_interval`.
The follow-up commit should add a corresponding alerting rule and a panel to the vmagent dashboard.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* deployment/docker: add `TooManyScrapeSkips` alerting rule for vmagent
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* dashboards: add panels `Scrape duration 0.99 quantile` and `Skipped scrapes` to vmagent dashboard
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
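The idea behind `vm_promscrape_scrapes_skipped_total` can be sketched as follows. This is a minimal illustration, not the vmagent implementation: the names `skippedSlots`, `recordScrape` and `skippedTotal` are hypothetical, and the counting rule (one skip per full interval that elapses while a scrape is still running) is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// scrapesSkipped stands in for the vm_promscrape_scrapes_skipped_total counter.
var scrapesSkipped uint64

// skippedSlots returns how many scheduled scrapes were missed because the
// previous scrape ran for `took` with the given scrape interval.
// The counting rule is an assumption made for this sketch.
func skippedSlots(took, interval time.Duration) uint64 {
	if interval <= 0 || took <= interval {
		return 0
	}
	// Every full interval that elapsed while the scrape was still running
	// corresponds to one scheduled scrape that could not start.
	return uint64(took / interval)
}

// recordScrape adds the missed slots to the counter.
func recordScrape(took, interval time.Duration) {
	atomic.AddUint64(&scrapesSkipped, skippedSlots(took, interval))
}

// skippedTotal returns the current counter value.
func skippedTotal() uint64 {
	return atomic.LoadUint64(&scrapesSkipped)
}

func main() {
	interval := 10 * time.Second
	recordScrape(3*time.Second, interval)  // finished within the interval
	recordScrape(25*time.Second, interval) // missed the ticks at 10s and 20s
	fmt.Println("skipped scrapes:", skippedTotal())
}
```

A steadily growing counter of this kind is the sort of signal an alerting rule such as `TooManyScrapeSkips` could watch.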
- Compare the actual free disk space to the value provided via -storage.minFreeDiskSpaceBytes
directly inside Storage.IsReadOnly(). This check should be fast in most cases.
This simplifies the logic in lib/storage.
- Do not take into account -storage.minFreeDiskSpaceBytes during background merges, since
it results in uncontrolled growth of small parts when the free disk space approaches -storage.minFreeDiskSpaceBytes.
The background merge logic uses another mechanism for determining whether there is enough
disk space for the merge - it reserves the needed disk space before the merge
and releases it after the merge. This prevents out-of-disk-space errors during background merges.
- Properly handle corner cases for flushing in-memory data to disk when the storage
enters read-only mode. This is better than losing the in-memory data.
- Bring back Storage.MustAddRows() instead of Storage.AddRows(),
since the only case when AddRows() can return an error is when the storage is in read-only mode.
This case must be handled by the caller by calling Storage.IsReadOnly()
before adding rows to the storage.
This simplifies the code a bit, since the caller of Storage.MustAddRows() no longer needs to handle
errors returned by Storage.AddRows().
- Properly store parsed logs to Storage if parts of the request contain invalid log lines.
Previously the parsed logs could be lost in this case.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4945
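The calling convention described above (check Storage.IsReadOnly() first, then call Storage.MustAddRows()) can be sketched like this. The `Storage` type here is a toy stand-in, not the lib/logstorage type: the fields `freeDiskSpace` and `rows`, the `ingest` helper and `errReadOnly` are hypothetical, and the `<` comparison against the threshold is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Storage is a toy stand-in for the real storage; all fields are hypothetical.
type Storage struct {
	minFreeDiskSpaceBytes uint64
	freeDiskSpace         func() uint64 // hypothetical probe of the data directory
	rows                  []string
}

// IsReadOnly compares the actual free disk space to the configured minimum.
func (s *Storage) IsReadOnly() bool {
	return s.freeDiskSpace() < s.minFreeDiskSpaceBytes
}

// MustAddRows never returns an error; the caller must check IsReadOnly first.
func (s *Storage) MustAddRows(rows []string) {
	s.rows = append(s.rows, rows...)
}

var errReadOnly = errors.New("storage is in read-only mode")

// ingest shows the caller-side check that replaces error handling on AddRows.
func ingest(s *Storage, rows []string) error {
	if s.IsReadOnly() {
		return errReadOnly
	}
	s.MustAddRows(rows)
	return nil
}

func main() {
	free := uint64(500)
	s := &Storage{
		minFreeDiskSpaceBytes: 100,
		freeDiskSpace:         func() uint64 { return free },
	}
	fmt.Println(ingest(s, []string{"a"})) // enough free space: rows are stored
	free = 50
	fmt.Println(ingest(s, []string{"b"})) // below the threshold: rejected
}
```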
This should reduce tail latency during data ingestion.
This shouldn't slow down data ingestion in the worst case, since assisted merges are spread among
distinct addRows/addItems calls after this change.
* lib/logstorage: prevent panic during background merge
Fixes a panic during background merge when the resulting block would contain more columns than maxColumnsPerBlock.
Buffered data will be flushed and replaced by the next block.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/logstorage: clarify field description and comment
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/logstorage: switch to read-only mode when running out of disk space
Added support for the `--storage.minFreeDiskSpaceBytes` command-line flag, which allows graceful handling of running out of disk space at `--storageDataPath`.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/logstorage: fix error handling logic during merge
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/logstorage: fix log level
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
The parts-merge sync.Cond was added in order to limit the number of goroutines performing assisted merges during ingestion.
It turned out that blocking ingestion goroutines lowers ingestion performance and limits overall ingestion to around 40k items per second because of lock contention.
Removing the parts-merge sync.Cond eliminates lock contention on the write path and significantly improves write performance.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* vmui: update information about tsdb usage in cluster version
* vmui: cleanup
* vmui: add CHANGELOG.md
* vmui: cleanup
* vmui: update logic, move information to the visible place
* app/vmui: remove values fetch, update documentation for cardinality explorer
* app/vmui: update CHANGELOG.md
Moved because this panel is related to both: scraped and ingested data.
Before, it could have given the misleading impression that it is related to ingested metrics only.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* docker-compose: add vmauth to cluster env
vmauth acts as a balancer and is used as an example of how to interconnect
VM components via vmauth.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
`median_over_time` is handled by a predefined WITH template in the MetricsQL library, which translates it to `quantile_over_time(0.5)`.
This makes it impossible to use `median_over_time` as a regular rollup function in `aggr_over_time`.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
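Why a template-only function cannot be used by `aggr_over_time` can be illustrated with a small sketch. It assumes that `aggr_over_time` resolves function names through a registry of rollup functions. The registry `rollupFuncs`, the helper `aggrOverTime` and the linear-interpolation `quantile` are hypothetical and are not the MetricsQL implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// rollupFunc calculates a single value from the samples in a window.
type rollupFunc func(values []float64) float64

// quantile returns the phi-quantile of values using linear interpolation.
// The interpolation method is an assumption made for this sketch.
func quantile(phi float64, values []float64) float64 {
	if len(values) == 0 {
		return math.NaN()
	}
	phi = math.Max(0, math.Min(1, phi))
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := phi * float64(len(sorted)-1)
	lo := int(math.Floor(rank))
	hi := int(math.Ceil(rank))
	return sorted[lo] + (sorted[hi]-sorted[lo])*(rank-float64(lo))
}

// rollupFuncs is a hypothetical registry consulted by aggrOverTime.
// A name that exists only as a query-level template would be missing here.
var rollupFuncs = map[string]rollupFunc{
	"min_over_time":    func(v []float64) float64 { return quantile(0, v) },
	"max_over_time":    func(v []float64) float64 { return quantile(1, v) },
	"median_over_time": func(v []float64) float64 { return quantile(0.5, v) },
}

// aggrOverTime applies every named rollup function to the same window.
func aggrOverTime(names []string, values []float64) (map[string]float64, error) {
	out := make(map[string]float64, len(names))
	for _, name := range names {
		f, ok := rollupFuncs[name]
		if !ok {
			return nil, fmt.Errorf("unknown rollup function %q", name)
		}
		out[name] = f(values)
	}
	return out, nil
}

func main() {
	res, err := aggrOverTime([]string{"min_over_time", "median_over_time"}, []float64{4, 1, 3, 2})
	fmt.Println(res, err)
}
```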
lib/promscrape/discovery/kubernetes: suppress context.Canceled error in logs
It is possible that context.Canceled appears after the k8s watcher was closed due to a reload (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850).
Logging this error misleads the user: it looks as if vmagent discovery will stop working, even though discovery is not affected.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit 8d99c12a7d)
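A common way to suppress such errors in Go is to test them with `errors.Is` against `context.Canceled` before logging. The sketch below shows only that pattern. The helpers `shouldLogWatchError` and `canceledWatchError` and the variable `errConnReset` are hypothetical and are not taken from the vmagent code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errConnReset is a sample error that is still worth logging.
var errConnReset = errors.New("connection reset by peer")

// shouldLogWatchError reports whether a watcher error should be logged.
// Errors caused by context cancellation are expected on reload and skipped.
func shouldLogWatchError(err error) bool {
	return err != nil && !errors.Is(err, context.Canceled)
}

// canceledWatchError simulates the error a watcher returns after its
// context has been canceled, wrapped the way callers usually wrap errors.
func canceledWatchError() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return fmt.Errorf("cannot read watch response: %w", ctx.Err())
}

func main() {
	for _, err := range []error{canceledWatchError(), errConnReset} {
		if shouldLogWatchError(err) {
			fmt.Println("error:", err)
		}
	}
}
```

`errors.Is` is used instead of `==` so that the check still works when the cancellation error has been wrapped with `%w`.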
* lib/backup: fix issue with inconsistent copying of appliedRetention.txt
appliedRetention.txt can be modified in place, so it should always be copied, the same as parts.json.
Updates: https://github.com/victoriaMetrics/victoriaMetrics/issues/5005
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: add changelog entry for appliedRetention.txt copying fix
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* expose metrics `vmauth_config_last_reload_*` for tracking the state of config reloads, similarly to vmagent/vmalert components.
* do not print logs like `SIGHUP received...` on every check configured via the `-configCheckInterval` cmd-line flag. This log is printed only if the config reload was invoked manually.
* skip config reloading if the config has not changed. This reduces memory usage when the `-configCheckInterval` cmd-line flag is set and the config contains an extensive list of regexps, which require additional memory for parsing.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
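Skipping a reload when the config is unchanged can be sketched by comparing a checksum of the raw config bytes before parsing. This is an illustration under assumptions: the `configReloader` type and its `maybeReload` method are hypothetical, and the source does not say how vmauth detects that the config is unchanged.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// configReloader remembers the checksum of the last applied config.
// The type and its fields are hypothetical.
type configReloader struct {
	lastSum [sha256.Size]byte
	loaded  bool
	reloads int
}

// maybeReload applies data only if it differs from the previously applied
// config. It returns true when a reload actually happened.
func (r *configReloader) maybeReload(data []byte) bool {
	sum := sha256.Sum256(data)
	if r.loaded && sum == r.lastSum {
		// Nothing changed: skip the parsing step and its memory cost.
		return false
	}
	// A real implementation would parse and validate data here
	// and keep the old config if parsing fails.
	r.lastSum = sum
	r.loaded = true
	r.reloads++
	return true
}

func main() {
	r := &configReloader{}
	fmt.Println(r.maybeReload([]byte("users: [a]")))    // first load
	fmt.Println(r.maybeReload([]byte("users: [a]")))    // unchanged, skipped
	fmt.Println(r.maybeReload([]byte("users: [a, b]"))) // changed, reloaded
}
```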