docs/vmagent.md: add stream parsing mode chapter

2025-01-10 15:14:09 +00:00 · 2021-05-08 23:14:07 +03:00 · 2dddd68feb
commit 2dddd68feb
parent 4128c4db16
2 changed files with 46 additions and 40 deletions
--- a/app/vmagent/README.md
+++ b/app/vmagent/README.md
@@ -184,7 +184,7 @@ Please file feature requests to [our issue tracker](https://github.com/VictoriaM
  to save network bandwidth.
 * `disable_keepalive: true` - to disable [HTTP keep-alive connections](https://en.wikipedia.org/wiki/HTTP_persistent_connection) on a per-job basis.
  By default, `vmagent` uses keep-alive connections to scrape targets to reduce overhead on connection re-establishing.
-* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting big number of metrics.
+* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting a big number of metrics. See [these docs](#stream-parsing-mode).

 Note that `vmagent` doesn't support `refresh_interval` option for these scrape configs. Use the corresponding `-promscrape.*CheckInterval`
 command-line flag instead. For example, `-promscrape.consulSDCheckInterval=60s` sets `refresh_interval` for all the `consul_sd_configs`
@@ -232,6 +232,27 @@ You can read more about relabeling in the following articles:
 * [relabel_configs vs metric_relabel_configs](https://www.robustperception.io/relabel_configs-vs-metric_relabel_configs)


+## Stream parsing mode
+
+By default, `vmagent` reads the full response from the scrape target into memory, then parses it, applies [relabeling](#relabeling) and pushes the resulting metrics to the configured `-remoteWrite.url`. This mode works well in the majority of cases, when the scrape target exposes a small number of metrics (e.g. fewer than 10 thousand). But it may require a lot of memory when the scrape target exposes a big number of metrics. In this case it is recommended to enable stream parsing mode. When this mode is enabled, `vmagent` reads the response from the scrape target in chunks, immediately processes every chunk and pushes the processed metrics to remote storage. This saves memory when scraping targets that expose millions of metrics. Stream parsing mode may be enabled either globally for all of the scrape targets by passing the `-promscrape.streamParse` command-line flag, or on a per-scrape target basis with the `stream_parse: true` option. For example:
+
+  ```yml
+  scrape_configs:
+  - job_name: 'big-federate'
+    stream_parse: true
+    static_configs:
+    - targets:
+      - big-prometeus1
+      - big-prometeus2
+    honor_labels: true
+    metrics_path: /federate
+    params:
+      'match[]': ['{__name__!=""}']
+  ```
+
+Note that the `sample_limit` option doesn't work if stream parsing is enabled, because the parsed data is pushed to remote storage as soon as it is parsed.
+
+
 ## Scraping big number of targets

 A single `vmagent` instance can scrape tens of thousands of scrape targets. Sometimes this isn't enough due to limitations on CPU, network, RAM, etc.
@@ -333,25 +354,7 @@ It may be useful to perform `vmagent` rolling update without any scrape loss.
  This option drops `"discoveredLabels"` and `"droppedTargets"` lists at `/api/v1/targets` page, which may result in reduced debuggability for improperly configured per-target relabeling.

 * If `vmagent` scrapes targets with millions of metrics per target (for example, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
-  we recommend enabling `stream parsing mode` in order to reduce memory usage during scraping. This mode may be enabled either globally for all of the scrape targets
-  by passing `-promscrape.streamParse` command-line flag or on a per-scrape target basis with `stream_parse: true` option. For example:
-
-  ```yml
-  scrape_configs:
-  - job_name: 'big-federate'
-    stream_parse: true
-    static_configs:
-    - targets:
-      - big-prometeus1
-      - big-prometeus2
-    honor_labels: true
-    metrics_path: /federate
-    params:
-      'match[]': ['{__name__!=""}']
-  ```
-
-  Note that `sample_limit` option doesn't work if stream parsing is enabled because the parsed data is pushed to remote storage as soon as it is parsed. Therefore the `sample_limit` option
- doesn't make sense during stream parsing.
+  we recommend enabling [stream parsing mode](#stream-parsing-mode) in order to reduce memory usage during scraping.

 * We recommend you increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported at `http://vmagent-host:8429/metrics` page grows constantly.
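
The difference between the default mode and the stream parsing mode described in the new chapter can be illustrated with a minimal Python sketch. It is not `vmagent` code: the names `parse_line`, `scrape_buffered`, `scrape_streaming` and the `chunk_lines` parameter are hypothetical, and the parser handles only a simplified `series value` line format. The point is only that the buffered variant holds every sample in memory before pushing, while the streaming variant never holds more than one chunk.

```python
import io
from typing import Callable, List, TextIO, Tuple

Sample = Tuple[str, float]  # (series name with labels, value)


def parse_line(line: str) -> Sample:
    """Parse a simplified 'series value' line (no timestamps; hypothetical helper)."""
    series, value = line.rsplit(" ", 1)
    return series, float(value)


def scrape_buffered(body: TextIO, push: Callable[[List[Sample]], None]) -> None:
    """Default-mode sketch: read the full response, parse it, push once."""
    text = body.read()  # the whole response is held in memory
    samples = [
        parse_line(line)
        for line in text.splitlines()
        if line and not line.startswith("#")  # skip blanks and comments
    ]
    push(samples)


def scrape_streaming(
    body: TextIO, push: Callable[[List[Sample]], None], chunk_lines: int = 2
) -> None:
    """Stream-mode sketch: parse and push one bounded chunk at a time."""
    chunk: List[Sample] = []
    for line in body:  # reads the response incrementally
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chunk.append(parse_line(line))
        if len(chunk) >= chunk_lines:
            push(chunk)  # pushed before the rest of the response is read
            chunk = []
    if chunk:
        push(chunk)  # flush the final partial chunk
```

Both functions deliver the same samples to `push`; the streaming variant simply splits them across several calls, so peak memory depends on `chunk_lines` rather than on the size of the response.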

--- a/docs/vmagent.md
+++ b/docs/vmagent.md
@@ -188,7 +188,7 @@ Please file feature requests to [our issue tracker](https://github.com/VictoriaM
  to save network bandwidth.
 * `disable_keepalive: true` - to disable [HTTP keep-alive connections](https://en.wikipedia.org/wiki/HTTP_persistent_connection) on a per-job basis.
  By default, `vmagent` uses keep-alive connections to scrape targets to reduce overhead on connection re-establishing.
-* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting big number of metrics.
+* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting a big number of metrics. See [these docs](#stream-parsing-mode).

 Note that `vmagent` doesn't support `refresh_interval` option for these scrape configs. Use the corresponding `-promscrape.*CheckInterval`
 command-line flag instead. For example, `-promscrape.consulSDCheckInterval=60s` sets `refresh_interval` for all the `consul_sd_configs`
@@ -236,6 +236,27 @@ You can read more about relabeling in the following articles:
 * [relabel_configs vs metric_relabel_configs](https://www.robustperception.io/relabel_configs-vs-metric_relabel_configs)


+## Stream parsing mode
+
+By default, `vmagent` reads the full response from the scrape target into memory, then parses it, applies [relabeling](#relabeling) and pushes the resulting metrics to the configured `-remoteWrite.url`. This mode works well in the majority of cases, when the scrape target exposes a small number of metrics (e.g. fewer than 10 thousand). But it may require a lot of memory when the scrape target exposes a big number of metrics. In this case it is recommended to enable stream parsing mode. When this mode is enabled, `vmagent` reads the response from the scrape target in chunks, immediately processes every chunk and pushes the processed metrics to remote storage. This saves memory when scraping targets that expose millions of metrics. Stream parsing mode may be enabled either globally for all of the scrape targets by passing the `-promscrape.streamParse` command-line flag, or on a per-scrape target basis with the `stream_parse: true` option. For example:
+
+  ```yml
+  scrape_configs:
+  - job_name: 'big-federate'
+    stream_parse: true
+    static_configs:
+    - targets:
+      - big-prometeus1
+      - big-prometeus2
+    honor_labels: true
+    metrics_path: /federate
+    params:
+      'match[]': ['{__name__!=""}']
+  ```
+
+Note that the `sample_limit` option doesn't work if stream parsing is enabled, because the parsed data is pushed to remote storage as soon as it is parsed.
+
+
 ## Scraping big number of targets

 A single `vmagent` instance can scrape tens of thousands of scrape targets. Sometimes this isn't enough due to limitations on CPU, network, RAM, etc.
@@ -337,25 +358,7 @@ It may be useful to perform `vmagent` rolling update without any scrape loss.
  This option drops `"discoveredLabels"` and `"droppedTargets"` lists at `/api/v1/targets` page, which may result in reduced debuggability for improperly configured per-target relabeling.

 * If `vmagent` scrapes targets with millions of metrics per target (for example, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
-  we recommend enabling `stream parsing mode` in order to reduce memory usage during scraping. This mode may be enabled either globally for all of the scrape targets
-  by passing `-promscrape.streamParse` command-line flag or on a per-scrape target basis with `stream_parse: true` option. For example:
-
-  ```yml
-  scrape_configs:
-  - job_name: 'big-federate'
-    stream_parse: true
-    static_configs:
-    - targets:
-      - big-prometeus1
-      - big-prometeus2
-    honor_labels: true
-    metrics_path: /federate
-    params:
-      'match[]': ['{__name__!=""}']
-  ```
-
-  Note that `sample_limit` option doesn't work if stream parsing is enabled because the parsed data is pushed to remote storage as soon as it is parsed. Therefore the `sample_limit` option
- doesn't make sense during stream parsing.
+  we recommend enabling [stream parsing mode](#stream-parsing-mode) in order to reduce memory usage during scraping.

 * We recommend you increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported at `http://vmagent-host:8429/metrics` page grows constantly.
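
The note about `sample_limit` in the new chapter can be made concrete with a small Python sketch. It is a hypothetical illustration, not `vmagent` behavior: the function names are invented, and the sketch assumes that a sample limit is meant to reject a scrape as a whole when it contains too many samples. Under that assumption, a buffered scrape can check the count before pushing anything, while a streaming scrape only discovers the overflow after earlier chunks have already been pushed.

```python
from typing import Callable, Iterable, List


def push_buffered_with_limit(
    samples: List[str], sample_limit: int, push: Callable[[List[str]], None]
) -> bool:
    """Buffered sketch: the total is known up front, so the scrape is rejected whole."""
    if sample_limit > 0 and len(samples) > sample_limit:
        return False  # nothing has been pushed
    push(samples)
    return True


def push_streaming_with_limit(
    chunks: Iterable[List[str]], sample_limit: int, push: Callable[[List[str]], None]
) -> bool:
    """Streaming sketch: the overflow is detected only after earlier chunks left."""
    seen = 0
    for chunk in chunks:
        seen += len(chunk)
        if sample_limit > 0 and seen > sample_limit:
            return False  # too late: earlier chunks were already pushed
        push(chunk)
    return True
```

With a limit of 3 and five samples split into chunks of two, the buffered variant pushes nothing, whereas the streaming variant has already pushed the first chunk by the time it notices the limit was exceeded.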