Commit graph

8637 commits

Author SHA1 Message Date
Aliaksandr Valialkin
cd152693c6
Revert "Exemplar support (#5982)"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite
that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains histogram_quantile() function. It queries exemplars via special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
  exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is litterally filled with thousands of exemplar dots.
  Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labes on the time range with the anomaly on metrics' graph.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982
2024-07-03 16:09:18 +02:00
Aliaksandr Valialkin
a5d60ad78e
app/vmagent/remotewrite,lib/streamaggr: re-use common code in tests after 879771808b
- Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package.
  This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries.

- Move common code for mustParsePromMetrics() function into lib/prompbmarshal package,
  so it could be used in tests for building []prompbmarshal.TimeSeries from string.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206
2024-07-03 15:22:51 +02:00
Artem Navoiev
db5b9ca8cd
github acitons: ghaction-import-gpg v5 -> v6
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-07-03 15:22:50 +02:00
Github Actions
bc0303bca3
Automatic update Grafana datasource docs from VictoriaMetrics/victoriametrics-datasource@6182902 (#6572) 2024-07-03 15:22:50 +02:00
Github Actions
730088caaf
Automatic update Grafana datasource docs from VictoriaMetrics/victorialogs-datasource@a04844a (#6571) 2024-07-03 15:22:50 +02:00
Github Actions
a72d0310c0
Automatic update Grafana datasource docs from VictoriaMetrics/victorialogs-datasource@a53ccd7 (#6570) 2024-07-03 14:20:56 +02:00
Aliaksandr Valialkin
4268a310c1
app/vmagent/remotewrite/remotewrite.go: make remoteWriteCtx.TryPush code easier to follow
Move the code responsible for relabelCtx clearing into deferred function.
This allows making more clear the remoteWriteCtx.TryPush code.

This is a follow-up for 879771808b

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205

While at it, clarify the description of the bugfix at docs/CHANGELOG.md
2024-07-03 14:18:51 +02:00
Aliaksandr Valialkin
f406764ccc
app/vmagent/remotewrite/streamaggr.go: clarify the description for -remoteWrite.streamAggr.* command-line flags, so they are applied to the corresponding -remoteWrite.url 2024-07-03 14:18:51 +02:00
Artem Navoiev
f327eac81b
add file placeholder grafana logs datasource docs
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-07-03 14:18:50 +02:00
Artem Navoiev
7ddaf51414
remove unsued image
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-07-03 14:18:50 +02:00
Github Actions
3a11c84ee8
Automatic update Grafana datasource docs from VictoriaMetrics/victoriametrics-datasource@dae8560 (#6568)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Artem Navoiev <tenmozes@gmail.com>
2024-07-03 14:18:49 +02:00
Aliaksandr Valialkin
f9b64cdfe0
docs/MetricsQL.md: document which metric types are usually passed to which rollup functions
This should reduce incorrect usage of rollup functions like this one - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3974#issuecomment-2205574667
2024-07-03 11:36:58 +02:00
Aliaksandr Valialkin
547aa6108e
docs: update VictoriaMetrics release from v1.100.1 to v1.101.0 across all the docs
This is a follow-up for 5e8c087d42

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6194
2024-07-03 01:07:07 +02:00
Aliaksandr Valialkin
bb7406e9c0
app/vmselect/promql: follow-up for dd0d2c77c8 and 6149adbe10
Use metricsql.IsLikelyInvalid() function for determining whether the given query is likely invalid,
e.g. there is high change the query is incorrectly written, so it will return unexpected results.

The query is invalid most of the time if it passes something other than series selector into rollup function.
For example:

- rate(sum(foo))
- rate(foo + bar)
- rate(foo > bar)

Improtant note: the query is considered valid if it misses the lookbehind window in square brackes inside rollup function,
e.g. rate(foo), since this is very convenient MetricsQL extention to PromQL, and this query returns the expected results
most of the time.

Other unsafe query types can be added in the future into metricsql.IsLikelyInvalid().

TODO: probably, the -search.disableImplicitConversion command-line flag must be set by default in the future releases of VictoriaMetrics.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6180
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6450
2024-07-03 00:46:56 +02:00
Aliaksandr Valialkin
82748b2b9d
deployment/docker: update Go builder from Go1.22.4 to Go1.22.5
See https://github.com/golang/go/issues?q=milestone%3AGo1.22.5+label%3ACherryPickApproved
2024-07-03 00:07:55 +02:00
Aliaksandr Valialkin
7c9212afeb
vendor: run make vendor-update 2024-07-03 00:00:23 +02:00
Aliaksandr Valialkin
f8779d1ed2
lib/streamaggr: follow-up for the commit c0e4ccb7b5
- Clarify docs for `Ignore aggregation intervals on start` feature.

- Make more clear the code dealing with ignoreFirstIntervals at aggregator.runFlusher() functions.
  It is better from readability and maintainability PoV using distinct a.flush() calls
  for distinct cases instead of merging them into a single a.flush() call.

- Take into account the first incomplete interval when tracking the number of skipped aggregation intervals,
  since this behaviour is easier to understand by the end users.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6137
2024-07-02 21:34:48 +02:00
Aliaksandr Valialkin
41f95d90f9
README.md: sync with docs/Cluster-VictoriaMetrics.md after the commit ae76794a19
This fixes `make docs-sync` results in cluster branch

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3923
2024-07-02 18:56:21 +02:00
hagen1778
e423c4b72c
docs: mention graphite.sanitizeMetricName in cluster docs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit ffb49c677b)
2024-07-02 17:18:07 +02:00
Andrii Chubatiuk
252aa5a3ab
lib/protoparser/graphite: added -graphite.sanitizeMetricName flag (#6489)
### Describe Your Changes

Added flag to sanitize graphite metrics
fixes #6077

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 476faf5578)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-02 17:16:00 +02:00
Zakhar Bessarab
4ece1747d3
vmbackupmanager: fix state restore (#773)
* app/vmbackupmanager: fix state restore on startup

Fix improperly treating completed backups as "failed" in metrics.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/changelog: document the fix

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit f3831bdd13)
2024-07-02 14:37:15 +02:00
LHHDZ
c8431c8e4d
app/vmauth: reader pool to reduce gc & mem alloc (#6533)
follow up https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6446

issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6445

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
(cherry picked from commit 4d66e042e3)
2024-07-02 14:37:15 +02:00
Github Actions
3fda4cc4de
Automatic update Grafana datasource docs from VictoriaMetrics/victoriametrics-datasource@e9bd079 (#6564)
(cherry picked from commit dd97dd6373)
2024-07-02 14:37:15 +02:00
hagen1778
309a767fc5
dashboards: fix wrong templating for vmauth
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e45d80cd79)
2024-07-02 14:37:15 +02:00
Aliaksandr Valialkin
fed596b202
docs/VictoriaLogs/CHANGELOG.md: cut v0.27.0-victorialogs 2024-07-02 01:43:54 +02:00
Aliaksandr Valialkin
7426b40250
lib/logstorage: allow writing after N in front of before N at stream_context pipe 2024-07-02 01:39:45 +02:00
Aliaksandr Valialkin
0912a652d5
app/vlinsert/insertutils: flush the ingested logs from in-memory buffer to storage every second
Previously the in-memory buffer could remain unflushed for long periods of time under low ingestion rate.
The ingested logs weren't visible for search during this time.
2024-07-02 01:39:45 +02:00
Aliaksandr Valialkin
ab28a1f93e
app/vlinsert/syslog: add an ability to use log ingestion time as the _time field 2024-07-02 01:39:45 +02:00
Aliaksandr Valialkin
f4dd9bd988
docs/VictoriaLogs/CHANGELOG.md: use new url https://docs.victoriametrics.com/victorialogs/querying/ instead of old one https://docs.victoriametrics.com/VictoriaLogs/querying/
(cherry picked from commit 387b3b7fb7)
2024-07-01 16:40:44 +02:00
Hui Wang
085bc1f15c
vmui: increase max query tab from 4 to 10 (#6546)
(cherry picked from commit 9da78f1e0e)
2024-07-01 16:40:42 +02:00
Andrii Chubatiuk
65c742d976
deployment: remove snap packages support (#6543)
### Describe Your Changes

Removed snap packages support as it requires time for maintenance and
it's not popular at all

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 0a42c8fd8b)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-01 16:40:39 +02:00
Hui Wang
87cb132f53
app/vmselect/netstorage: do not retry request when complexity limit i… (#6469)
…s already exceeded

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-07-01 16:38:15 +02:00
Andrii Chubatiuk
937ae2ca90
lib/streamaggr: added stale samples metric, added metrics labels (#6462)
### Describe Your Changes

- added stale metrics counters for input and output samples
- added labels for aggregator metrics =>
`name="{rwctx}:{aggrId}:{aggrSuffix}"`
   - rwctx - global or number starting from 1
   - aggrid - aggregator id starting from 1
   - aggrSuffix - <interval>_(by|without)_label1_label2_labeln
   e.g: `name="global:1:1m_without_instance_pod"`

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 861852f262)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-01 15:01:49 +02:00
hagen1778
69625aa8a1
docs: mark without optional in stream aggr docs
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit f8eea0f2c9)
2024-07-01 14:56:46 +02:00
Aliaksandr Valialkin
5143e93534
docs/VictoriaLogs/CHANGELOG.md: cut v0.26.1-victorialogs 2024-07-01 02:33:02 +02:00
Aliaksandr Valialkin
208a624d4d
lib/logstorage: properly search for the surrounding logs in stream_context pipe
The set of log fields in the found logs may differ from the set of log fields present in the log stream.
So compare only the log fields in the found logs when searching for the matching log entry in the log stream.

While at it, return _stream field in the delimiter log entry, since this field is used by VictoriaLogs Web UI
for grouping logs by log streams.
2024-07-01 02:33:00 +02:00
Aliaksandr Valialkin
516c7b2ca0
docs/VictoriaLogs: remove "preview" warning - VictoriaLogs is ready for prod 2024-07-01 01:52:40 +02:00
Aliaksandr Valialkin
0e6ee55a95
docs/VictoriaLogs/CHANGELOG.md: cut v0.26.0-victorialogs 2024-07-01 01:49:55 +02:00
Aliaksandr Valialkin
76a58ae08d
lib/logstorage: add ability to store sorted log position into a separate field with sort ... rank <fieldName> syntax 2024-07-01 01:46:03 +02:00
Aliaksandr Valialkin
d0dca7b8c5
lib/logstorage: add delimiter between log chunks returned from | stream_context pipe 2024-07-01 01:46:02 +02:00
Artem Navoiev
2ac2d0919d
operator: sync docs with operator repo, add alias
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-06-29 17:19:49 +02:00
Aliaksandr Valialkin
76053a0ef0
docs/VictoriaLogs: typo fixes 2024-06-28 19:26:32 +02:00
Aliaksandr Valialkin
75c7b2c07a
docs/VictoriaLogs/CHANGELOG.md: cut v0.25.0-victorialogs 2024-06-28 19:18:21 +02:00
Aliaksandr Valialkin
4b3477e62b
lib/logstorage: add stream_context pipe, which allows selecting surrounding logs for the matching logs 2024-06-28 19:15:19 +02:00
Aliaksandr Valialkin
c9fc8079c4
app/vlinsert/syslog: properly skip empty lines in Syslog protocol
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6548
2024-06-28 14:09:45 +02:00
Aliaksandr Valialkin
bb6424aeca
app/vlselect/logsql: add optional fields_limit query arg to /select/logsql/hits HTTP endpoint
This query arg is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6545
in order to return top N groups with the biggest number of hits.
2024-06-28 03:10:05 +02:00
Aliaksandr Valialkin
3eecc3de8c
docs/VictoriaLogs: typo fixes 2024-06-28 03:10:05 +02:00
Aliaksandr Valialkin
5a24ea6cc3
docs/VictoriaLogs: document how to export logs from VictoriaLogs
While at it, mention vmctl-like tool for migrating logs from other systems to VictoriaLogs at docs/VictoriaLogs/Roadmap.md

Thanks to the question from Xavier Pestel at https://www.linkedin.com/feed/update/urn:li:activity:7212093021301927937?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7212093021301927937%2C7212171550651731969%29&dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287212171550651731969%2Curn%3Ali%3Aactivity%3A7212093021301927937%29
2024-06-28 02:06:24 +02:00
Aliaksandr Valialkin
2f28819bb1
lib/logstorage: it is safe using | unroll pipe in live tailing
`| unroll` pipe can make multiple copies of rows from the input row.
This doesn't break live tailing, so allow `| unroll` pipe in live tailing.
2024-06-27 19:45:12 +02:00
Aliaksandr Valialkin
05365999ea
docs/VictoriaLogs/querying/README.md: clarify live tailing docs 2024-06-27 19:12:54 +02:00