Commit graph

32 commits

Author SHA1 Message Date
Aliaksandr Valialkin
cd152693c6
Revert "Exemplar support (#5982)"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite
that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains histogram_quantile() function. It queries exemplars via special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
  exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is litterally filled with thousands of exemplar dots.
  Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labes on the time range with the anomaly on metrics' graph.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982
2024-07-03 16:09:18 +02:00
Ted Possible
0206a01d03
Exemplar support (#5982)
This code adds Exemplars to VMagent and the promscrape parser adhering
to OpenMetrics Specifications. This will allow forwarding of exemplars
to Prometheus and other third party apps that support OpenMetrics specs.

---------

Signed-off-by: Ted Possible <ted_possible@cable.comcast.com>
(cherry picked from commit 5a3abfa041)
2024-05-10 13:14:17 +02:00
Aliaksandr Valialkin
9004bc098e
all: use clear() built-in Go function for clearing []prompbmarshal.TimeSeries and []prompbmarshal.Label slices
This makes the code a bit clear.
2024-04-20 21:00:24 +02:00
Aliaksandr Valialkin
0211a04a52
all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/ 2024-04-18 01:32:57 +02:00
Aliaksandr Valialkin
63d635a5e4
app: consistently use atomic.* types instead of atomic.* functions
See ea9e2b19a5
2024-02-24 03:06:14 +02:00
Aliaksandr Valialkin
3b18659487
app/vmagent/remotewrite: limit the concurrency for marshaling time series before sending them to remote storage
There is no sense in running more than GOMAXPROCS concurrent marshalers,
since they are CPU-bound. More concurrent marshalers do not increase the marshaling bandwidth,
but they may result in more RAM usage.
2024-01-30 12:20:27 +02:00
Aliaksandr Valialkin
d52fd73f18
all: add up to 10% random jitter to the interval between periodic tasks performed by various components
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
2024-01-22 18:39:16 +02:00
Aliaksandr Valialkin
d566aa7d78
lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto 2024-01-16 20:48:30 +02:00
Aliaksandr Valialkin
2f14394335
app/vmagent: follow-up for 090cb2c9de
- Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check
  for the result returned from these functions.

- Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to peristent queue.

- Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function
  after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case.

- Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead
  of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage
  cannot keep up with the data ingestion rate.

- Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples.

- Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push
  data to persistent queue when -remoteWrite.disableOnDiskQueue is set.

- Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics,
  because they are replaced with vmagent_remotewrite_samples_dropped_total metric.

- Update 'Disabling on-disk persistence' docs at docs/vmagent.md

- Update stale comments in the code

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
2023-11-25 12:13:39 +02:00
Nikolay
25ac2aac31
app/vmagent: allow to disabled on-disk persistence (#5088)
* app/vmagent: allow to disabled on-disk queue
Previously, it wasn't possible to build data processing pipeline with a
chain of vmagents. In case when remoteWrite for the last vmagent in the
chain wasn't accessible, it persisted data only when it has enough disk
capacity. If disk queue is full, it started to silently drop ingested
metrics.

New flags allows to disable on-disk persistent and immediatly return an
error if remoteWrite is not accessible anymore. It blocks any writes and
notify client, that data ingestion isn't possible.

Main use case for this feature - use external queue such as kafka for
data persistence.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110

* adds test, updates readme

* apply review suggestions

* update docs for vmagent

* makes linter happy

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-25 12:12:29 +02:00
Aliaksandr Valialkin
d9581cf3f3
app/vmagent: add -remoteWrite.vmProtoCompressLevel command-line flag for tuning the compression level for VictoriaMetrics remote write protocol 2023-02-27 11:04:11 -08:00
Aliaksandr Valialkin
80c6d1e24c
app/vmagent: add support for VictoriaMetrics remote write protocol, which allows saving up to 10x on network bandwidth costs under high load
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-20 18:40:40 -08:00
Aliaksandr Valialkin
1a88fe5b1f
lib/flagutil/bytes.go: properly handle values bigger than 2GiB on 32-bit architectures
This fixes handling of values bigger than 2GiB for the following command-line flags:

- -storage.minFreeDiskSpaceBytes
- -remoteWrite.maxDiskUsagePerURL
2022-12-14 19:29:57 -08:00
Aliaksandr Valialkin
9d91ad7124
app/vmagent/remotewrite: prevent from infinite recursion panic when pushing a time series with big number of samples to remote storage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2335
2022-03-18 19:07:27 +02:00
Aliaksandr Valialkin
63c29099ba
app/vmagent: add -remoteWrite.maxRowsPerBlock command-line option, which may be used for improving data ingestion performance under high load 2021-11-04 15:39:55 +02:00
Aliaksandr Valialkin
dcac849c1f app/vmagent/remotewrite: sort labels before sending the series to per-remoteWrite.url queues 2021-05-20 11:54:06 +03:00
Aliaksandr Valialkin
f92db26a93 app/vmagent/remotewrite: count maxLabelsPerBlock as 10x of maxRowsPerBlock
This should increase block sizes and subsequently increase the maximum possible bandwidth per each connection to remote storage.
This, in turn, should reduce the probability of storing the data in local buffers.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1235
2021-04-23 22:05:00 +03:00
Aliaksandr Valialkin
512addc608 app/{vminsert,vmagent}: add -sortLabels command-line option for sorting time series labels before ingesting them in the storage
This option can be useful when samples for the same time series are ingested with distinct order of labels.
For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.
2021-03-31 23:27:21 +03:00
Aliaksandr Valialkin
b873b965af app/vmagent/remotewrite: reduce memory usage when samples with big number of labels are sent to remote storage 2021-03-31 00:45:42 +03:00
Aliaksandr Valialkin
323af49234 app/vmagent/remotewrite: clarify -remoteWrite.flushInterval flag description 2021-03-01 11:51:08 +02:00
Aliaksandr Valialkin
fdf9de98f8 app/vmagent: add -remoteWrite.roundDigits command-line option for limiting the number of digits after the point for stored values
This commit also adds --vm-round-digits command-line option to vmctl tool.
2021-02-01 14:42:15 +02:00
Aliaksandr Valialkin
b8083b7659 lib/promscrape: clean references to label name and label value strings after applying per-target relabeling
This should reduce memory usage when per-target relabeling creates big number of temporary labels
with long names and/or values.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825
2020-11-07 16:19:52 +02:00
Aliaksandr Valialkin
8fc9b77496 app/vmagent: reduce memory usage when importing data via /api/v1/import
Previously vmagent could use big amounts of RAM when each ingested JSON line
contained many samples.
2020-09-26 04:10:13 +03:00
Aliaksandr Valialkin
a3cdef6b06 app/vmagent: properly flush big blocks of data
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/741

Thanks to @IceRain00 for the investigation and initial attempt to fix the issue
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/742
2020-09-03 12:12:12 +03:00
Aliaksandr Valialkin
de216bab41 app/vmagent: fix data race when accessing writeRequest.lastFlushTime 2020-09-03 12:12:09 +03:00
Aliaksandr Valialkin
6aab2f4989 all: allow using KB, MB, GB, KiB, MiB and GiB suffixes in command-line flag values related to byte sizes or byte rates 2020-08-16 17:08:28 +03:00
Aliaksandr Valialkin
6c43ba1cb1 app/vmagent/remotewrite: remove unused import after the commit 93267f143f 2020-05-15 17:42:31 +03:00
Aliaksandr Valialkin
1d71253653 app/vmagent/remotewrite: allow ingesting time series with multiple samples at once
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/481
2020-05-15 17:37:27 +03:00
Aliaksandr Valialkin
3845420a8f lib: extract common code for returning fast unix timestamp into lib/fasttime 2020-05-14 23:06:50 +03:00
Aliaksandr Valialkin
c6c7843e93 app/vmagent: add -remoteWrite.maxBlockSize command-line flag for limiting the maximum size of unpacked block to send to remote storage 2020-02-25 19:58:11 +02:00
Aliaksandr Valialkin
c4194020ef app/vmagent: do not allow sending unpacked requests with sizes exceeding -maxInsertRequestSize 2020-02-25 19:35:43 +02:00
Aliaksandr Valialkin
7ee7614e90 app/vmagent: initial implementation for vmagent 2020-02-23 17:31:54 +02:00