Commit graph

41 commits

Author SHA1 Message Date
Aliaksandr Valialkin
d8c7cc266b
lib/promscrape: use prompbmarshal.MustParsePromMetrics function at parseData() test function
The prompbmarshal.MustParsePromMetrics function has been added in the commit cc4d57d650
2024-07-03 16:08:13 +02:00
Aliaksandr Valialkin
bb00bae353
Revert "Exemplar support (#5982)"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite
that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains histogram_quantile() function. It queries exemplars via special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
  exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is litterally filled with thousands of exemplar dots.
  Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labes on the time range with the anomaly on metrics' graph.
2024-07-03 15:30:21 +02:00
Andrii Chubatiuk
1e83598be3
app/vmagent: add max_scrape_size to scrape config (#6434)
Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-20 13:58:42 +02:00
Ted Possible
5a3abfa041
Exemplar support (#5982)
This code adds Exemplars to VMagent and the promscrape parser adhering
to OpenMetrics Specifications. This will allow forwarding of exemplars
to Prometheus and other third party apps that support OpenMetrics specs.

---------

Signed-off-by: Ted Possible <ted_possible@cable.comcast.com>
2024-05-07 12:09:44 +02:00
Aliaksandr Valialkin
918cccaddf
all: fix golangci-lint(revive) warnings after 0c0ed61ce7
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001
2024-04-02 23:16:29 +03:00
Aliaksandr Valialkin
bc7cf4950b
lib/promscrape: use the standard net/http.Client instead of fasthttp.Client for scraping targets in non-streaming mode
While fasthttp.Client uses less CPU and RAM when scraping targets with small responses (up to 10K metrics),
it doesn't work well when scraping targets with big responses such as kube-state-metrics.
In this case it could use big amounts of additional memory comparing to net/http.Client,
since fasthttp.Client reads the full response in memory and then tries re-using the large buffer
for further scrapes.

Additionally, fasthttp.Client-based scraping had various issues with proxying, redirects
and scrape timeouts like the following ones:

- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5425
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2794
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017

This should help reducing memory usage for the case when target returns big response
and this response is scraped by fasthttp.Client at first before switching to stream parsing mode
for subsequent scrapes. Now the switch to stream parsing mode is performed on the first scrape
after reading the response body in memory and noticing that its size exceeds the value passed
to -promscrape.minResponseSizeForStreamParse command-line flag.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5567

Overrides https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4931
2024-01-30 18:39:10 +02:00
Aliaksandr Valialkin
65bc460323
lib/promscrape: follow-up for 97373b7786
Substitute O(N^2) algorithm for exposing the `vm_promscrape_scrape_pool_targets` metric
with O(N) algorithm, where N is the number of scrape jobs. The previous algorithm could slow down
/metrics exposition significantly when -promscrape.config contains thousands of scrape jobs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5311
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5335
2023-12-06 17:35:50 +02:00
Dmytro Kozlov
7b87fac8e7
lib/promscrape: fix honor_labels behavior (#3739)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-01 11:21:44 -08:00
Aliaksandr Valialkin
babecd8363
lib/promscrape: follow-up for 393876e52a
- Document the change in docs/CHANGELOG.md
- Reduce memory usage when sending stale markers even more by parsing the response in stream parsing mode
- Update the TestSendStaleSeries

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3675
2023-01-23 21:52:59 -08:00
Roman Khavronenko
393876e52a
lib/promscrape: limit number of sent stale series at once (#3686)
Stale series are sent when there is a difference between current
and previous scrapes. Those series which disappeared in the current scrape
are marked as stale and sent to the remote storage.

Sending stale series requires memory allocation and in case when too many
series disappear in the same it could result in noticeable memory spike.
For example, re-deploy of a big fleet of service can result into
excessive memory usage for vmagent, because all the series with old
pod name will be marked as stale and sent to the remote write storage.

This change limits the number of stale series which can be sent at once,
so memory usage remains steady.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3675
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-23 21:15:59 -08:00
Aliaksandr Valialkin
a8b8e23d68
lib/promscrape: implement target-level and metric-level relabel debugging
Target-level debugging is performed by clicking the 'debug' link at the corresponding target
on either http://vmagent:8429/targets page or on http://vmagent:8428/service-discovery page.

Metric-level debugging is perfromed at http://vmagent:8429/metric-relabel-debug page.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3407

See https://docs.victoriametrics.com/vmagent.html#relabel-debug
2022-12-10 02:09:44 -08:00
Aliaksandr Valialkin
f325410c26
lib/promscrape: optimize service discovery speed
- Return meta-labels for the discovered targets via promutils.Labels
  instead of map[string]string. This improves the speed of generating
  meta-labels for discovered targets by up to 5x.

- Remove memory allocations in hot paths during ScrapeWork generation.
  The ScrapeWork contains scrape settings for a single discovered target.
  This improves the service discovery speed by up to 2x.
2022-11-29 21:26:00 -08:00
Aliaksandr Valialkin
654e94f420
lib/promscrape: add exported_ prefix to metric names exported by scrape targets if they clash with automatically generated metrics
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3406
2022-11-28 18:37:09 -08:00
Aliaksandr Valialkin
86bce7f5f9
lib/promscrape: add more cases to TestAddRowToTimeseries
This is a follow-up for 16fdd2af8a
2022-11-09 16:13:56 +02:00
Jeremy PLANCKEEL
16fdd2af8a
test(golang): add test to function addRowToTimeseries (#3282)
Co-authored-by: jplanckeel-externe <jplanckeel.externe@bedrockstreaming.com>
2022-11-09 15:41:26 +02:00
Aliaksandr Valialkin
76e8888272
lib/promscrape: properly add exported_ prefix to labels, which clash with target labels if honor_labels: true option isn't set.
The issue was in the `labels := dst[offset:]` line in the beginning of appendExtraLabels() function.
The `dst` may be re-allocated when adding extra labels to it. In this case the addition of `exported_`
prefix to labels inside `labels` slice become invisible in the returned `dst` labels.

While at it, properly handle some corner cases:

- Add additional `exported_` prefix to clashing metric labels with already existing `exported_` prefix.
- Store scraped metric names in `exported___name__` label if scrape target contains `__name__` label.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3278

Thanks to @jplanckeel for the initial attempt to fix this issue
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3281
2022-10-28 22:14:26 +03:00
Aliaksandr Valialkin
4998402004
lib/promscrape: add external_labels from global section of -promscrape.config after the relabeling is applied to the scraped metrics
This aligns with Prometheus behaviour.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3137
2022-10-01 16:13:19 +03:00
Aliaksandr Valialkin
7d26414b2e
lib/promscrape: automatically generate additional per-target labels for targets with non-zero series limit
The following metrics are generated:

- scrape_series_limit
- scrape_series_current
- scrape_series_limit_samples_dropped

These metrics simplify alerting on targets, which expose too many time series

See https://docs.victoriametrics.com/vmagent.html#automatically-generated-metrics
and https://docs.victoriametrics.com/vmagent.html#cardinality-limiter for more details
2022-08-17 13:19:33 +03:00
Aliaksandr Valialkin
46d7792b72
lib/promscrape: follow-up after 2c553d5a2f
- fix broken tests
- cosmetic code cleanup
- document the change at https://docs.victoriametrics.com/vmagent.html#multitenancy
- document the change at https://docs.victoriametrics.com/CHANGELOG.html
2022-08-08 14:46:26 +03:00
Aliaksandr Valialkin
e6ba2af7a1
lib/promscrape: fix a test after c66f676f3b 2022-07-06 13:26:35 +03:00
Aliaksandr Valialkin
728c4c3841 lib/promscrape: generate scrape_timeout_seconds metric per each scrape target in the same way as Prometheus 2.30 does
See https://github.com/prometheus/prometheus/pull/9247
2021-09-12 15:20:44 +03:00
Aliaksandr Valialkin
f3e89754a9 lib/promscrape: reduce CPU usage for common case when calculating scrape_series_added metric
Also reduce CPU usage when applying `series_limit` to scrape targets with constant set of metrics.

The main idea is to perform the calculations on scrape_series_added and series_limit
only if the set of metrics exposed by the target has been changed.
Scrape targets rarely change the set of exposed metrics,
so this optimization should reduce CPU usage in general case.
2021-09-12 12:53:14 +03:00
Aliaksandr Valialkin
f77dde837a lib/promscrape: add the ability to limit the number of unique series per each scrape target
The number of series per target can be limited with the following options:

* Global limit with `-promscrape.maxSeriesPerTarget` command-line option.
* Per-target limit with `max_series: N` option in `scrape_config` section.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1561
2021-09-01 16:03:59 +03:00
Aliaksandr Valialkin
c09446a9aa lib/promscrape: send stale markers for the previously scraped metrics on failed scrapes like Prometheus does 2021-08-18 21:59:03 +03:00
Aliaksandr Valialkin
d826352688 app/vmagent: follow-up after fe445f753b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1491
2021-08-05 09:52:32 +03:00
Omar Ghader
46e27d60a6 feature: Add multitenant for vmagent (#1505)
* feature: Add multitenant for vmagent

* Minor fix

* Fix rcs index out of range

* Minor fix

* Fix multi Init

* Fix multi Init

* Fix multi Init

* Add default multi

* Adjust naming

* Add TenantInserted metrics

* Add TenantInserted metrics

* fix: remove unused metrics for vmagent

* fix: remove unused metrics for vmagent

Co-authored-by: mghader <marc.ghader@ubisoft.com>
Co-authored-by: Sebastian YEPES <syepes@gmail.com>
2021-08-05 09:52:31 +03:00
Aliaksandr Valialkin
78f83dc5ad app/{vmagent,vminsert}: follow-up after 2fe045e2a4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1343
2021-06-04 20:27:58 +03:00
Aliaksandr Valialkin
a52a20659a lib/promscrape: fix tests after f0c21b6300 2021-05-28 01:32:50 +03:00
Aliaksandr Valialkin
3fd8653b40 lib/promscrape: apply sample_limit after metric relabeling is applied as Prometheus does
See the description for `sample_limit` option from Prometheus docs:

Per-scrape limit on number of scraped samples that will be accepted.
If more than this number of samples are present after metric relabeling
the entire scrape will be treated as failed. 0 means no limit.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
2021-03-09 15:47:18 +02:00
Aliaksandr Valialkin
d136081040 lib/promrelabel: add more optimizations for relabeling for common cases 2021-02-22 16:33:55 +02:00
Aliaksandr Valialkin
2dfa746c91 lib/promscrape: remove ID field from ScrapeWork struct. Use a pointer to ScrapeWork as a key in targetStatusMap
This simplifies the code a bit.
2020-12-17 14:32:56 +02:00
Aliaksandr Valialkin
32869e4c0f lib/promscrape: fix failing tests after a906b3862f 2020-11-29 01:26:03 +02:00
Aliaksandr Valialkin
ca8b5745b5 lib/promscrape: reduce memory allocations in promLabelsString() function
This should help with reducing memory usage in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/878
2020-11-04 10:38:44 +02:00
Aliaksandr Valialkin
7d893a234c lib/promscrape: do not reset the remaining rows when pushing a part of data to remote storage during big scrapes
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/753

Thanks to @PerGon and @clmssz for help with debugging.
2020-09-11 23:39:13 +03:00
Aliaksandr Valialkin
455bf50a91 lib/promscrape: show real timestamp and real duration for the scape on /targets page
Previously the scrape duration may be negative when calculated scrape timestamp drifts away from the real scrape timestamp
2020-08-10 12:40:25 +03:00
Aliaksandr Valialkin
23c9e6b727 lib/promscrape: export scrape_samples_added per-target metric like Prometheus does
This metric may be useful for detecting targets with high churn rate for the exported metrics.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/683
2020-08-09 12:45:39 +03:00
Aliaksandr Valialkin
d5dddb0953 all: use %w instead of %s for wrapping errors in fmt.Errorf
This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode .
See https://blog.golang.org/go1.13-errors for details.
2020-06-30 23:05:11 +03:00
Ween
1cd01b5359
Fix Auto metrics relabeled errors (#593)
* Fix Auto metrics relabeled errors

* Finalize auto-genenated  Labels

* Fix Test Errors

Co-authored-by: xinyulong <xinyulong@kuaishou.com>
2020-06-29 22:29:29 +03:00
Aliaksandr Valialkin
69004a5f67 lib/promscrape: fix tests after the commit 658a8742ac
The original commit copies `__address__` label to `instance` label when generating per-target labels as Prometheus does.

See https://www.robustperception.io/life-of-a-label for details.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/453
2020-05-03 16:56:15 +03:00
Aliaksandr Valialkin
01d7d799dc lib/promscrape: rename 'scrape_config->scrape_limit' to 'scrape_config->sample_limit'
`scrape_config` block from Prometheus config contains `sample_limit` field,
while in `vmagent` this field was mistakenly named as `scrape_limit`.
2020-04-14 11:59:57 +03:00
Aliaksandr Valialkin
04762344c6 app/vmagent: initial implementation for vmagent 2020-02-23 13:36:03 +02:00