VictoriaMetrics/lib
Aliaksandr Valialkin cd152693c6
Revert "Exemplar support (#5982)"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite
that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains histogram_quantile() function. It queries exemplars via special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
  exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is litterally filled with thousands of exemplar dots.
  Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labes on the time range with the anomaly on metrics' graph.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982
2024-07-03 16:09:18 +02:00
..
appmetrics all: add -metrics.exposeMetadata command-line flag, which can be used for adding TYPE and HELP metadata for metrics exposed at /metrics page 2023-12-19 03:26:02 +02:00
auth lib/auth: add NewTokenPossibleMultitenant() for parsing auth token, which can be multitenant 2023-08-30 14:13:51 +02:00
awsapi lib/awsapi: properly assume role with webIdentity token (#5495) 2023-12-20 19:07:04 +02:00
backup lib/backup/s3remote: fixed credsFilePath flag (#6488) 2024-06-14 14:14:58 +02:00
blockcache lib: consistently use atomic.* types instead of atomic.* functions 2024-02-24 02:10:04 +02:00
bloomfilter lib: consistently use atomic.* types instead of atomic.* functions 2024-02-24 02:10:04 +02:00
bufferedwriter app/vmselect: move common http functionality from app/vmselect/searchutils to lib/httputils 2023-07-06 17:22:23 -07:00
buildinfo all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
bytesutil lib/bytesutil: optimize internStringMap cleanup 2024-06-13 15:09:42 +02:00
cgroup lib/cgroup: remove SetGOGC() function 2024-02-05 12:13:08 +02:00
consts app/vminsert: reduce the max packet size, which vminsert can send to vmstorage 2022-04-05 15:39:58 +03:00
decimal lib/slicesutil: add helper functions for setting slice length and extending its capacity 2024-05-12 11:33:49 +02:00
encoding lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit 2024-05-14 01:30:25 +02:00
envflag lib/envflag: do not allow unsupported form for boolean command-line flags in the form -boolFlag value 2023-08-17 13:37:05 +02:00
envtemplate allowed using dashes and dots in environment variables names (#4009) 2023-03-24 17:57:19 -07:00
fastnum lib/fastnum: use unsafe.Slice() instead of deprecated reflect.SliceHeader 2024-02-29 17:17:24 +02:00
fasttime lib: consistently use atomic.* types instead of atomic.* functions 2024-02-24 02:10:04 +02:00
filestream lib/filestream: do not measure read / write duration from / to in-memory buffers 2024-01-23 14:53:35 +02:00
flagutil app/vmagent: add max_scrape_size to scrape config (#6434) 2024-06-20 14:00:22 +02:00
formatutil app/vmbackupmanager: add metrics for better observability (#488) 2022-12-20 14:18:43 -08:00
fs lib/fs/fscore: do not trim content from path (#6503) 2024-06-19 10:37:12 +02:00
handshake lib/handshake: substitute time.Now() with fastttime.UnixTimestamp(), since profiling shows time.Now() is slow 2024-01-23 18:39:28 +02:00
htmlcomponents lib/htmlcomponents: use relative links for the top page and for favicon.ico 2023-11-13 20:28:17 +01:00
httpserver app/vlselect: properly return live tailing results 2024-06-27 15:06:15 +02:00
httputils app/vmalert: support DNS SRV record in -remoteWrite.url (#6299) 2024-05-22 10:53:22 +02:00
influxutils lib/flagutil: rename Array to ArrayString 2022-10-01 18:28:19 +03:00
ingestserver lib/logstorage: work-in-progress 2024-06-17 12:13:25 +02:00
leveledbytebufferpool lib/leveledbytebufferpool: do not pool byte slices bigger than 2^18 bytes 2024-06-13 17:02:05 +02:00
logger lib/logger: increase default -loggerMaxArgLen command-line flag value from 500 to 1000 2023-11-14 19:55:55 +01:00
logstorage lib/logstorage: allow writing after N in front of before N at stream_context pipe 2024-07-02 01:39:45 +02:00
lrucache lib: consistently use atomic.* types instead of atomic.* functions 2024-02-24 02:10:04 +02:00
memory all: cleanup: remove // +build ... lines, since they are no longer needed after Go1.17, and the minimum supported Go version for VictoriaMetrics source code is Go1.20 2023-11-13 19:15:42 +01:00
mergeset lib/mergeset: adds tracking for indexdb records drop (#6297) 2024-05-24 16:08:34 +02:00
metricsql all: make fmt via the upcoming Go1.19 2022-07-11 19:23:25 +03:00
netutil app/vmauth: fix discovering backend IPs when url_prefix contains hostname with srv+ prefix (#6401) 2024-06-12 11:47:44 +02:00
persistentqueue Fixed a typo in the FastQueue mutex comment (#6514) 2024-06-20 14:00:08 +02:00
procutil all: cleanup: remove // +build ... lines, since they are no longer needed after Go1.17, and the minimum supported Go version for VictoriaMetrics source code is Go1.20 2023-11-13 19:15:42 +01:00
promauth all: replace old https://docs.victoriametrics.com/sd_configs.html url with the new one - https://docs.victoriametrics.com/sd_configs/ 2024-04-18 02:28:26 +02:00
prompb Revert "Exemplar support (#5982)" 2024-07-03 16:09:18 +02:00
prompbmarshal Revert "Exemplar support (#5982)" 2024-07-03 16:09:18 +02:00
promrelabel lib/streamaggr: added stale samples metric, added metrics labels (#6462) 2024-07-01 15:01:49 +02:00
promscrape Revert "Exemplar support (#5982)" 2024-07-03 16:09:18 +02:00
promutils lib/logstorage: work-in-progress 2024-06-04 01:50:55 +02:00
protoparser Revert "Exemplar support (#5982)" 2024-07-03 16:09:18 +02:00
proxy lib/promscrape: use the standard net/http.Client instead of fasthttp.Client for scraping targets in non-streaming mode 2024-01-30 18:39:55 +02:00
pushmetrics lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop() 2024-01-16 21:18:22 +02:00
querytracer lib/querytracer: add missing blank comment line after 3121d76bee 2023-11-15 16:11:50 +01:00
ratelimiter app/vmagent: properly shutdown when -maxIngestionRate limit is reached 2024-04-03 02:41:11 +03:00
regexutil lib/logstorage: work-in-progress 2024-05-25 22:59:21 +02:00
slicesutil lib/slicesutil: add helper functions for setting slice length and extending its capacity 2024-05-12 11:33:49 +02:00
snapshot Revert "app/vmbackup: introduce new flag type URL (#6152)" 2024-04-24 17:08:26 +02:00
storage Fix Date metricid cache consistency under concurrent use (#6534) 2024-06-26 19:12:34 +02:00
streamaggr app/vmagent/remotewrite,lib/streamaggr: re-use common code in tests after 879771808b 2024-07-03 15:22:51 +02:00
stringsutil lib/logstorage: work-in-progress 2024-05-22 21:01:28 +02:00
syncwg all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
tenantmetrics lib/encoding/zstd: switch back from atomic.Pointer to atomic.Value for map[...]... 2023-07-20 21:54:51 -07:00
timerpool lib/timerpool: use timer pool in concurrency limiters 2019-05-28 17:30:10 +03:00
timeutil all: add up to 10% random jitter to the interval between periodic tasks performed by various components 2024-01-22 18:39:16 +02:00
uint64set lib/slicesutil: add helper functions for setting slice length and extending its capacity 2024-05-12 11:33:49 +02:00
vmselectapi lib: consistently use atomic.* types instead of atomic.* functions 2024-02-24 02:10:04 +02:00
workingsetcache lib: consistently use atomic.* types instead of atomic.* functions 2024-02-24 02:10:04 +02:00
writeconcurrencylimiter app/vmagent/remotewrite: clarify the reason behind the default value for -remoteWrite.queues in the same way as the reason for -maxConcurrentInserts is defined at 73f5fb0f0c 2024-03-06 13:57:53 +02:00