VictoriaMetrics/docs/CHANGELOG.md
Aliaksandr Valialkin 7a8b92b590
lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool for each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts of mixed types at once. For example,
  it could simultaneously merge an in-memory part and a big file part.
  Such a merge could take hours for a big file part. For the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  within the configured -inmemoryDataFlushInterval.

  Another common issue, which could happen when parts with mixed types were merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradation for queries, since every query needs to check an ever-growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers for every part type elegantly resolves all three issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.
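
  The per-type pools described above can be illustrated with a minimal sketch. It is
  not the actual lib/mergeset or lib/storage code: the names partType, mergePools and
  run, as well as the one-worker-per-type limit, are hypothetical and chosen only for
  the demonstration. It shows that a long-running big merge does not delay an in-memory merge.

  ```go
  package main

  import (
  	"fmt"
  	"sync"
  )

  // partType is a hypothetical label for the kind of part being merged.
  type partType int

  const (
  	inmemoryPart partType = iota
  	smallPart
  	bigPart
  )

  // mergePools holds one concurrency limiter per part type, so a long merge
  // of one type cannot occupy the workers needed by another type.
  type mergePools struct {
  	sems map[partType]chan struct{}
  	wg   sync.WaitGroup
  }

  func newMergePools(workersPerType int) *mergePools {
  	p := &mergePools{sems: make(map[partType]chan struct{})}
  	for _, t := range []partType{inmemoryPart, smallPart, bigPart} {
  		p.sems[t] = make(chan struct{}, workersPerType)
  	}
  	return p
  }

  // run starts merge in the pool dedicated to t.
  // It blocks only when the pool for this particular type is full.
  func (p *mergePools) run(t partType, merge func()) {
  	p.sems[t] <- struct{}{}
  	p.wg.Add(1)
  	go func() {
  		defer p.wg.Done()
  		defer func() { <-p.sems[t] }()
  		merge()
  	}()
  }

  // wait blocks until all the started merges are finished.
  func (p *mergePools) wait() { p.wg.Wait() }

  func main() {
  	pools := newMergePools(1)

  	// Simulate a long-running merge of big file parts.
  	release := make(chan struct{})
  	pools.run(bigPart, func() { <-release })

  	// The in-memory merge uses its own pool, so it is not blocked.
  	done := make(chan struct{})
  	pools.run(inmemoryPart, func() { close(done) })
  	<-done
  	fmt.Println("in-memory merge finished while the big merge is still running")

  	close(release)
  	pools.wait()
  }
  ```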

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdowns
  in VictoriaMetrics cluster during events such as re-routing, when some of the vmstorage nodes
  become temporarily unavailable.
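
  The general idea of sharding pending rows can be sketched as follows. This is an
  illustration under assumptions, not the code from this commit: the shardedBuffer type,
  the round-robin shard selection and the shard count are hypothetical. Each shard has
  its own lock, so concurrent writers rarely contend with each other.

  ```go
  package main

  import (
  	"fmt"
  	"sync"
  	"sync/atomic"
  )

  // shard is one independently locked buffer of pending rows.
  type shard struct {
  	mu   sync.Mutex
  	rows []string
  }

  // shardedBuffer spreads writers across shards to reduce lock contention.
  type shardedBuffer struct {
  	shards []shard
  	next   atomic.Uint32
  }

  func newShardedBuffer(n int) *shardedBuffer {
  	return &shardedBuffer{shards: make([]shard, n)}
  }

  // add appends row to the next shard in round-robin order.
  func (b *shardedBuffer) add(row string) {
  	idx := b.next.Add(1) % uint32(len(b.shards))
  	s := &b.shards[idx]
  	s.mu.Lock()
  	s.rows = append(s.rows, row)
  	s.mu.Unlock()
  }

  // flush collects and resets the rows from all the shards. In this sketch
  // it stands for the moment when pending rows become visible for search.
  func (b *shardedBuffer) flush() []string {
  	var all []string
  	for i := range b.shards {
  		s := &b.shards[i]
  		s.mu.Lock()
  		all = append(all, s.rows...)
  		s.rows = nil
  		s.mu.Unlock()
  	}
  	return all
  }

  func main() {
  	b := newShardedBuffer(4)
  	var wg sync.WaitGroup
  	for w := 0; w < 8; w++ {
  		wg.Add(1)
  		go func(w int) {
  			defer wg.Done()
  			for i := 0; i < 100; i++ {
  				b.add(fmt.Sprintf("writer-%d-row-%d", w, i))
  			}
  		}(w)
  	}
  	wg.Wait()
  	fmt.Println("flushed rows:", len(b.flush()))
  }
  ```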

- Prevent possible "sync: WaitGroup misuse" panic on graceful shutdown.
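
The commit message does not say what triggered the WaitGroup panic. One common cause of
"sync: WaitGroup misuse" is calling Add concurrently with Wait during shutdown. The
sketch below shows one way to avoid that pattern; the workers type and its methods are
hypothetical and are not taken from the VictoriaMetrics code.

```go
package main

import (
	"fmt"
	"sync"
)

// workers tracks background goroutines and refuses to start new ones
// after shutdown has begun.
type workers struct {
	mu      sync.Mutex
	stopped bool
	wg      sync.WaitGroup
}

// start launches f unless shutdown has begun. It returns false if f was rejected.
func (w *workers) start(f func()) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.stopped {
		return false
	}
	// Add is called under the lock and only while stopped is false,
	// so it cannot run concurrently with Wait in stop.
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		f()
	}()
	return true
}

// stop marks the pool as stopped and waits for the running goroutines.
func (w *workers) stop() {
	w.mu.Lock()
	w.stopped = true
	w.mu.Unlock()
	w.wg.Wait()
}

func main() {
	w := &workers{}
	fmt.Println("started before stop:", w.start(func() {}))
	w.stop()
	fmt.Println("started after stop:", w.start(func() {}))
}
```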

This is a follow-up for fa566c68a6.
Thanks to @misutoth for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:19:52 +01:00

---
sort: 100
weight: 100
title: CHANGELOG
menu:
  docs:
    parent: victoriametrics
    weight: 100
aliases:
- /CHANGELOG.html
---

CHANGELOG

The following tip changes can be tested by building VictoriaMetrics components from the latest commits according to the following docs:

Metrics of the latest version of VictoriaMetrics cluster are available for viewing at our sandbox. The sandbox cluster installation runs under the constant load generated by prometheus-benchmark and is used for testing the latest releases.

tip

  • SECURITY: upgrade Go builder from Go1.21.5 to Go1.21.6. See the list of issues addressed in Go1.21.6.

  • FEATURE: improve new time series registration speed on systems with a high number of CPU cores. Thanks to @misutoth for the initial idea and implementation.

  • FEATURE: make background merge more responsive and scalable. This should help with the following issues: 5190, 3425, 648.

  • FEATURE: vmagent: add support for discovering Hetzner Cloud and Hetzner Robot scrape targets. See this feature request and these docs.

  • FEATURE: graphite: add support for negative index in groupByNode and aliasByNode functions. Thanks to @rbizos for the pull request.

  • FEATURE: vmagent: add support for DataDog v2 data ingestion protocol. See these docs and this feature request.

  • FEATURE: vmagent: expose ability to set OAuth2 endpoint parameters per each -remoteWrite.url via the command-line flag -remoteWrite.oauth2.endpointParams. See these docs. Thanks to @mhill-holoplot for the pull request.

  • FEATURE: vmagent: add ability to set attach_metadata.node=true option for all the kubernetes_sd_configs defined at -promscrape.config via -promscrape.kubernetes.attachNodeMetadataAll command-line flag. See this feature request. Thanks to @wasim-nihal for the initial implementation.

  • FEATURE: streaming aggregation: expand %{ENV_VAR} placeholders in config files with the corresponding environment variable values.
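
    The placeholder expansion can be pictured with the following sketch. It is only an illustration: the expandEnv function, the regular expression and the decision to leave unknown variables untouched are assumptions, and the real config loader may behave differently.

    ```go
    package main

    import (
    	"fmt"
    	"os"
    	"regexp"
    )

    // envPlaceholder matches %{NAME}, where NAME looks like an environment variable name.
    var envPlaceholder = regexp.MustCompile(`%\{([A-Za-z_][A-Za-z0-9_]*)\}`)

    // expandEnv replaces every %{NAME} in s with the value returned by lookup.
    // Placeholders for unknown variables are left as-is (an assumption of this sketch).
    func expandEnv(s string, lookup func(string) (string, bool)) string {
    	return envPlaceholder.ReplaceAllStringFunc(s, func(m string) string {
    		name := m[2 : len(m)-1] // strip the leading "%{" and the trailing "}"
    		if v, ok := lookup(name); ok {
    			return v
    		}
    		return m
    	})
    }

    func main() {
    	// AGG_INTERVAL is a hypothetical variable name used for the demonstration.
    	os.Setenv("AGG_INTERVAL", "1m")
    	fmt.Println(expandEnv("interval: %{AGG_INTERVAL}", os.LookupEnv))
    }
    ```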

  • FEATURE: vmalert: expose ability to set OAuth2 endpoint parameters via the following command-line flags:

    • -datasource.oauth2.endpointParams for -datasource.url
    • -notifier.oauth2.endpointParams for -notifier.url
    • -remoteRead.oauth2.endpointParams for -remoteRead.url
    • -remoteWrite.oauth2.endpointParams for -remoteWrite.url
  • FEATURE: vmauth: add ability to proxy incoming requests to different backends based on the requested host via src_hosts option at url_map. See these docs.

  • FEATURE: vmauth: expose vmauth_user_request_backend_errors_total and vmauth_unauthorized_user_request_backend_errors_total metrics, which track the number of failed requests because of backend errors. See this feature request.

  • FEATURE: vmauth: add an ability to specify additional labels for per-user metrics via metric_labels section. See this feature request.

  • FEATURE: all VictoriaMetrics components: break the HTTP client connection if an error occurs after the server at -httpListenAddr has already sent the response status code. Previously such an error couldn't be detected on the client side. Now the client gets an error about an invalid chunked response. The error message is written both to the server log and to the last line of the response. This should help with detecting errors when migrating data between VictoriaMetrics instances by vmctl. See this issue.
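
    The effect on the client can be reproduced with Go's standard library. The sketch below is not the VictoriaMetrics implementation: it uses panic(http.ErrAbortHandler) as one standard way to abort a response after the status code has been sent, and the handler and the fetch helper are hypothetical. The client receives the flushed data and then a read error, because the chunked response is never terminated properly.

    ```go
    package main

    import (
    	"fmt"
    	"io"
    	"net/http"
    	"net/http/httptest"
    )

    // failingHandler sends the status code and a part of the body,
    // then aborts the connection to signal an error to the client.
    func failingHandler(w http.ResponseWriter, r *http.Request) {
    	w.WriteHeader(http.StatusOK)
    	fmt.Fprintln(w, "partial data")
    	fmt.Fprintln(w, "error: cannot read the remaining data")
    	if f, ok := w.(http.Flusher); ok {
    		f.Flush()
    	}
    	// ErrAbortHandler makes net/http close the connection
    	// without writing the terminating chunk.
    	panic(http.ErrAbortHandler)
    }

    // fetch starts a test server with h and returns the body read by the client
    // together with the error returned while reading it.
    func fetch(h http.HandlerFunc) (string, error) {
    	srv := httptest.NewServer(h)
    	defer srv.Close()
    	resp, err := http.Get(srv.URL)
    	if err != nil {
    		return "", err
    	}
    	defer resp.Body.Close()
    	body, err := io.ReadAll(resp.Body)
    	return string(body), err
    }

    func main() {
    	body, err := fetch(failingHandler)
    	fmt.Printf("body read so far:\n%s", body)
    	fmt.Println("client error:", err)
    }
    ```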

  • FEATURE: all VictoriaMetrics components: add ability to specify arbitrary HTTP headers to send with every request to -pushmetrics.url. See push metrics docs.

  • FEATURE: all VictoriaMetrics components: add -metrics.exposeMetadata command-line flag, which allows displaying TYPE and HELP metadata at the /metrics page exposed at -httpListenAddr. This may be needed when the /metrics page is scraped by a collector that requires the TYPE and HELP metadata, such as Google Cloud Managed Prometheus.

  • FEATURE: all VictoriaMetrics components: add ability to dynamically re-read auth keys and passwords from files and urls when using file:///path/to/file or http://host/path syntax for the following command-line flags: -configAuthKey, -deleteAuthKey, -flagsAuthKey, -forceMergeAuthKey, -forceFlushAuthKey, -httpAuth.password, -metricsAuthKey, -pprofAuthKey, -reloadAuthKey, -search.resetCacheAuthKey, -snapshotAuthKey. For example, -httpAuth.password=file:///path/to/password. See these docs for details.
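
    A simplified picture of the file:// case is shown below. It is a sketch under assumptions, not the actual flag handling code: the readSecret helper is hypothetical, it covers only the file:// form, and trimming trailing whitespace is an assumption. The value is resolved on every call, so an updated file is picked up without a restart.

    ```go
    package main

    import (
    	"fmt"
    	"os"
    	"strings"
    )

    // readSecret resolves a flag value on every call.
    // Values without the file:// prefix are returned as-is.
    func readSecret(flagValue string) (string, error) {
    	if !strings.HasPrefix(flagValue, "file://") {
    		return flagValue, nil
    	}
    	path := strings.TrimPrefix(flagValue, "file://")
    	data, err := os.ReadFile(path)
    	if err != nil {
    		return "", fmt.Errorf("cannot read secret from %q: %w", path, err)
    	}
    	// Trimming trailing whitespace is an assumption of this sketch.
    	return strings.TrimRight(string(data), "\r\n\t "), nil
    }

    func main() {
    	f, err := os.CreateTemp("", "password")
    	if err != nil {
    		panic(err)
    	}
    	defer os.Remove(f.Name())
    	f.Close()

    	flagValue := "file://" + f.Name()
    	for _, pw := range []string{"first\n", "second\n"} {
    		if err := os.WriteFile(f.Name(), []byte(pw), 0o600); err != nil {
    			panic(err)
    		}
    		got, err := readSecret(flagValue)
    		fmt.Println(got, err)
    	}
    }
    ```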

  • FEATURE: dashboards/cluster: add panels for detailed visualization of traffic usage between vmstorage, vminsert, vmselect components and their clients. New panels are available in the rows dedicated to specific components.

  • FEATURE: dashboards/cluster: update "Slow Queries" panel to show percentage of the slow queries to the total number of read queries served by vmselect. The percentage value should make it more clear for users whether there is a service degradation.

  • FEATURE: dashboards/single: change dashboard title from VictoriaMetrics to VictoriaMetrics - single-node. The new title should provide better understanding of this dashboard purpose.

  • FEATURE: vmctl: rename cmd-line flag vm-native-disable-retries to vm-native-disable-per-metric-migration to better reflect its meaning.

  • FEATURE: vmctl: add -vm-native-src-insecure-skip-verify and -vm-native-dst-insecure-skip-verify command-line flags for native protocol. They can be used for skipping TLS certificate verification when connecting to the source or destination addresses.

  • FEATURE: Alerting rules for VictoriaMetrics: add job label to DiskRunsOutOfSpace alerting rule, so it is easier to understand to which installation the triggered instance belongs.

  • FEATURE: vmstorage: add tenant identifier for log messages regarding dropping excessive labels due to limits defined by -maxLabelsPerTimeseries or -maxLabelValueLen command-line flags. Previously, it was hard to understand to which tenant the dropped labels belong.

  • FEATURE: vmui: add the ability to export and import query reports:

    • add a Query Analyzer page that allows you to build graphs from JSON data containing the results of executing a query request.
    • add an Export query button to the graph that saves the result of executing the query in JSON. See this pull request.
  • FEATURE: vmui: add -vmui.defaultTimezone flag to set a default timezone. See this issue and these docs.

  • FEATURE: vmui: include UTC in the timezone selection dropdown for standardized time referencing. See this issue.

  • FEATURE: add VictoriaMetrics datasource to docker compose environment. See this pull request.

  • BUGFIX: properly return errors from export APIs. Previously these errors were silently suppressed. See this pull request.

  • BUGFIX: VictoriaMetrics cluster: properly return full results when -search.skipSlowReplicas command-line flag is passed to vmselect and when vmstorage groups are in use. Previously partial results could be returned in this case.

  • BUGFIX: vminsert: properly accept samples via OpenTelemetry data ingestion protocol when these samples have no resource attributes. Previously such samples were silently skipped.

  • BUGFIX: vmstorage: add the -inmemoryDataFlushInterval command-line flag, which was missing in VictoriaMetrics cluster after implementing this feature in v1.85.0.

  • BUGFIX: vmstorage: properly expire storage/prefetchedMetricIDs cache. Previously this cache was never expired, so it could grow big under high churn rate. This could result in increasing CPU load over time.

  • BUGFIX: vmalert: check the -external.url scheme when starting vmalert; it must be http or https. Previously, alertmanager could reject alert notifications if -external.url contained no scheme or a wrong one.
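
    Such a startup check could look like the sketch below. The validateExternalURL function and the example URLs are hypothetical; only the rule that the scheme must be http or https comes from the entry above.

    ```go
    package main

    import (
    	"fmt"
    	"net/url"
    )

    // validateExternalURL returns an error unless s is a URL with the http or https scheme.
    func validateExternalURL(s string) error {
    	u, err := url.Parse(s)
    	if err != nil {
    		return fmt.Errorf("cannot parse %q: %w", s, err)
    	}
    	if u.Scheme != "http" && u.Scheme != "https" {
    		return fmt.Errorf("invalid scheme %q in %q; it must be http or https", u.Scheme, s)
    	}
    	return nil
    }

    func main() {
    	for _, s := range []string{
    		"https://vmalert.example.com",
    		"vmalert.example.com", // no scheme
    		"localhost:8880",      // "localhost" is parsed as the scheme
    	} {
    		fmt.Printf("%s -> %v\n", s, validateExternalURL(s))
    	}
    }
    ```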

  • BUGFIX: vmalert: automatically add the exported_ prefix to an original evaluation result label if it conflicts with an external or reserved one. Previously such a label was overridden. See this issue.

  • BUGFIX: vmalert: autogenerate ALERTS_FOR_STATE time series for alerting rules with for: 0. Previously, ALERTS_FOR_STATE was generated only for alerts with for > 0. The change aligns with Prometheus behavior. See more details in this issue.

  • BUGFIX: MetricsQL: consistently sort results for q1 or q2 query, so they do not change colors with each refresh in Grafana. See this issue.

  • BUGFIX: MetricsQL: properly return results from bottomk and bottomk_*() functions when some of these results contain NaN values. See this issue. Thanks to @xiaozongyang for the fix.

  • BUGFIX: MetricsQL: properly handle queries that wrap rollup functions with multiple arguments and no explicitly specified lookbehind window in square brackets into aggregate functions. For example, sum(quantile_over_time(0.5, process_resident_memory_bytes)) was resulting in the error expecting at least 2 args to ...; got 1 args. Thanks to @atykhyy for the pull request.

  • BUGFIX: vmctl: retry on import errors in vm-native mode. Before, retries happened only on writes into a network connection between source and destination. Errors returned by the server after all the data was transmitted were logged, but the import wasn't retried.

  • BUGFIX: vmagent: properly assume role with AWS IRSA authorization. Previously role chaining was not supported. See this issue for details.

  • BUGFIX: vmagent: exit if there is config syntax error in scrape_config_files when -promscrape.config.strictParse=true. See this issue.

  • BUGFIX: vmagent: properly discover targets for role: endpoints and role: endpointslice in kubernetes_sd_configs. Previously some endpoints and endpointslice targets could be left undiscovered or some targets could have missing __meta_* labels when performing service discovery in busy Kubernetes clusters with large number of pods. See this pull request.

  • BUGFIX: vmagent: respect explicitly set series_limit: 0 in scrape_config. This allows removing series_limit restriction on a per-scrape_config basis when global limit is set via -promscrape.seriesLimitPerTarget. Previously, 0 value was ignored in favor of -promscrape.seriesLimitPerTarget.
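
    Distinguishing an explicit 0 from an unset value is commonly done with a pointer field. The sketch below illustrates that technique only; the scrapeConfig struct and the effectiveSeriesLimit helper are hypothetical and are not the vmagent code.

    ```go
    package main

    import "fmt"

    // scrapeConfig is a hypothetical, reduced version of a scrape config.
    // A nil SeriesLimit means "not set"; a pointer to 0 means "explicitly unlimited".
    type scrapeConfig struct {
    	SeriesLimit *int
    }

    // effectiveSeriesLimit prefers the per-config value, including an explicit 0,
    // and falls back to the global limit only when the value is not set.
    func effectiveSeriesLimit(sc scrapeConfig, globalLimit int) int {
    	if sc.SeriesLimit != nil {
    		return *sc.SeriesLimit
    	}
    	return globalLimit
    }

    func main() {
    	zero := 0
    	fmt.Println(effectiveSeriesLimit(scrapeConfig{}, 1000))                   // limit is not set
    	fmt.Println(effectiveSeriesLimit(scrapeConfig{SeriesLimit: &zero}, 1000)) // explicit 0
    }
    ```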

  • BUGFIX: vmagent: do not discover scrape targets for already terminated pods and containers in kubernetes_sd_configs. Such pods and containers cannot be scraped and cannot resurrect, so there is no sense in generating scrape targets for them.

  • BUGFIX: vmui: fix a link for the statistic inaccuracy explanation in the cardinality explorer tool. See this issue.

  • BUGFIX: vmui: fix the display of autocomplete results and cache the results. See this issue and this issue.

  • BUGFIX: vmui: send step param for instant queries. The change reverts this issue due to reasons explained in this comment.

  • BUGFIX: all: fix potential panic during components shutdown when metrics push is configured. See this issue. Thanks to @zhdd99 for the pull request.

  • BUGFIX: MetricsQL: properly process queries with too big lookbehind window such as foo[100y]. Previously, such queries could return empty responses even if foo is present in database. See this issue.

  • BUGFIX: MetricsQL: properly handle possible negative results caused by float operations precision error in rollup functions like rate() or increase(). See this issue.

  • BUGFIX: vmselect: return HTTP status 429 (TooManyRequests) instead of 503 (ServiceUnavailable) from vmsingle/vmselect when the max concurrent requests limit is reached.
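
The status code change in the last entry can be illustrated with a small concurrency limiter. This is a sketch, not the vmselect code: the limitConcurrency middleware and the statusWhenBusy helper are hypothetical, and the request is rejected immediately here, while a real server may queue it for some time first.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limitConcurrency allows at most max requests to run inside next at once.
// Extra requests get 429 (TooManyRequests).
func limitConcurrency(max int, next http.Handler) http.Handler {
	sem := make(chan struct{}, max)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			next.ServeHTTP(w, r)
		default:
			http.Error(w, "too many concurrent requests", http.StatusTooManyRequests)
		}
	})
}

// statusWhenBusy occupies the only slot with a blocked request
// and returns the status code received by a second request.
func statusWhenBusy() int {
	entered := make(chan struct{})
	release := make(chan struct{})
	h := limitConcurrency(1, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(entered)
		<-release
	}))
	go h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("GET", "/", nil))
	<-entered // the only slot is taken now

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	close(release)
	return rec.Code
}

func main() {
	fmt.Println("status for the rejected request:", statusWhenBusy())
}
```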

v1.96.0

See changes here

v1.95.1

See changes here

v1.95.0

See changes here

v1.94.0

See changes here

v1.93.10

Released at 2024-01-17

v1.93.x is a line of LTS releases (i.e. long-term support). It contains important up-to-date bugfixes. The v1.93.x line will be supported for at least 12 months since the v1.93.0 release.

  • SECURITY: upgrade Go builder from Go1.21.5 to Go1.21.6. See the list of issues addressed in Go1.21.6.

  • BUGFIX: vminsert: properly accept samples via OpenTelemetry data ingestion protocol when these samples have no resource attributes. Previously such samples were silently skipped.

  • BUGFIX: vmstorage: add the -inmemoryDataFlushInterval command-line flag, which was missing in VictoriaMetrics cluster after implementing this feature in v1.85.0.

  • BUGFIX: vmstorage: properly expire storage/prefetchedMetricIDs cache. Previously this cache was never expired, so it could grow big under high churn rate. This could result in increasing CPU load over time.

  • BUGFIX: vmalert: check the -external.url scheme when starting vmalert; it must be http or https. Previously, alertmanager could reject alert notifications if -external.url contained no scheme or a wrong one.

  • BUGFIX: MetricsQL: properly return results from bottomk and bottomk_*() functions when some of these results contain NaN values. See this issue. Thanks to @xiaozongyang for the fix.

  • BUGFIX: MetricsQL: properly handle queries that wrap rollup functions with multiple arguments and no explicitly specified lookbehind window in square brackets into aggregate functions. For example, sum(quantile_over_time(0.5, process_resident_memory_bytes)) was resulting in the error expecting at least 2 args to ...; got 1 args. Thanks to @atykhyy for the pull request.

  • BUGFIX: vmagent: properly assume role with AWS IRSA authorization. Previously role chaining was not supported. See this issue for details.

  • BUGFIX: all: fix potential panic during components shutdown when metrics push is configured. See this issue. Thanks to @zhdd99 for the pull request.

  • BUGFIX: vmctl: check for Error field in response from influx client during migration. Before, only network errors were checked. Thanks to @wozz for the pull request.

  • BUGFIX: vmctl: retry on import errors in vm-native mode. Before, retries happened only on writes into a network connection between source and destination. Errors returned by the server after all the data was transmitted were logged, but the import wasn't retried.

v1.93.9

See changes here

v1.93.8

See changes here

v1.93.7

See changes here

v1.93.6

See changes here

v1.93.5

See changes here

v1.93.4

See changes here

v1.93.3

See changes here

v1.93.2

See changes here

v1.93.1

See changes here

v1.93.0

See changes here

v1.92.1

See changes here

v1.92.0

See changes here

v1.91.3

See changes here

v1.91.2

See changes here

v1.91.1

See changes here

v1.91.0

See changes here

v1.90.0

See changes here

v1.89.1

See changes here

v1.89.0

See changes here

v1.88.1

See changes here

v1.88.0

See changes here

v1.87.13

Released at 2024-01-17

v1.87.x is a line of LTS releases (i.e. long-term support). It contains important up-to-date bugfixes. The v1.87.x line will be supported for at least 12 months since the v1.87.0 release.

  • SECURITY: upgrade Go builder from Go1.21.5 to Go1.21.6. See the list of issues addressed in Go1.21.6.

  • BUGFIX: vmstorage: add the -inmemoryDataFlushInterval command-line flag, which was missing in VictoriaMetrics cluster after implementing this feature in v1.85.0.

  • BUGFIX: MetricsQL: properly handle queries that wrap rollup functions with multiple arguments and no explicitly specified lookbehind window in square brackets into aggregate functions. For example, sum(quantile_over_time(0.5, process_resident_memory_bytes)) was resulting in the error expecting at least 2 args to ...; got 1 args. Thanks to @atykhyy for the pull request.

  • BUGFIX: vmstorage: properly expire storage/prefetchedMetricIDs cache. Previously this cache was never expired, so it could grow big under high churn rate. This could result in increasing CPU load over time.

  • BUGFIX: MetricsQL: properly return results from bottomk and bottomk_... functions when some of these results contain NaN values. See this issue. Thanks to @xiaozongyang for the fix.

  • BUGFIX: all: fix potential panic during components shutdown when metrics push is configured. See this issue. Thanks to @zhdd99 for the pull request.

v1.87.12

See changes here

v1.87.11

See changes here

v1.87.10

See changes here

v1.87.9

See changes here

v1.87.8

See changes here

v1.87.7

See changes here

v1.87.6

See changes here

v1.87.5

See changes here

v1.87.4

See changes here

v1.87.3

See changes here

v1.87.2

See changes here

v1.87.1

See changes here

v1.87.0

See changes here

v1.86.2

See changes here

v1.86.1

See changes here

v1.86.0

See changes here

v1.85.3

See changes here

v1.85.2

See changes here

v1.85.1

See changes here

v1.85.0

See changes here

v1.84.0

See changes here

v1.83.1

See changes here

v1.83.0

See changes here

v1.82.1

See changes here

v1.82.0

See changes here

v1.81.2

See changes here

v1.81.1

See changes here

v1.81.0

See changes here

v1.80.0

See changes here

v1.79.14

See changes here

v1.79.13

See changes here

v1.79.12

See changes here

v1.79.11

See changes here

v1.79.10

See changes here

v1.79.9

See changes here

v1.79.8

See changes here

v1.79.7

See changes here

v1.79.6

See changes here

v1.79.5

See changes here

v1.79.4

See changes here

v1.79.3

See changes here

v1.79.2

See changes here

v1.79.1

See changes here

v1.79.0

See changes here

v1.78.1

See changes here

v1.78.0

See changes here

v1.77.2

See changes here

v1.77.1

See changes here

v1.77.0

See changes here

v1.76.1

See changes here

v1.76.0

See changes here

v1.75.1

See changes here

v1.75.0

See changes here

v1.74.0

See changes here

v1.73.1

See changes here

v1.73.0

See changes here

v1.72.0

See changes here

v1.71.0

See changes here

v1.70.0

See changes here

v1.69.0

See changes here

v1.68.0

See changes here

v1.67.0

See changes here

v1.66.2

See changes here

v1.66.1

See changes here

v1.66.0

See changes here

v1.65.0

See changes here

v1.64.1

See changes here

v1.64.0

See changes here

v1.63.0

See changes here

v1.62.0

See changes here

v1.61.1

See changes here

v1.61.0

See changes here

v1.60.0

See changes here

v1.59.0

See changes here

v1.58.0

See changes here

v1.57.1

See changes here

v1.57.0

See changes here

v1.56.0

See changes here

v1.55.1

See changes here

v1.55.0

See changes here

v1.54.1

See changes here

v1.54.0

See changes here

v1.53.1

See changes here

v1.53.0

See changes here

v1.52.0

See changes here

v1.51.0

See changes here

v1.50.2

See changes here

v1.50.1

See changes here

v1.50.0

See changes here

v1.49.0

See changes here

v1.48.0

See changes here

v1.47.0

See changes here

v1.46.0

See changes here

v1.45.0

See changes here

v1.44.0

See changes here

v1.43.0

See changes here

v1.42.0

See changes here

Previous releases

See releases page.