Commit graph

726 commits

Author SHA1 Message Date
Max Kotliar
c05ffa906d
lib/promscrape: improve streamParse performance
Previously, performance of stream.Parse could be limited by mutex.Lock on callback function. It used shared writeContext. With complicated relabeling rules and any slowness at pushData function, it could significantly decrease parsed rows processing performance.

 This commit removes locks and makes parsed rows processing lock-free in the same manner as `stream.Parse` processing implemented at push ingestion processing.

 Implementation details:
- Removing global lock around stream.Parse callback.
- Using atomic operations for counters
- Creating write contexts per callback instead of sharing
- Improving series limit checking with sync.Once
- Optimizing labels hash calculation with buffer pooling
- Adding comprehensive tests for concurrency correctness

 Benchmark performance:
```
# before
BenchmarkScrapeWorkScrapeInternalStreamBigData-10             13          81973945 ns/op          37.68 MB/s    18947868 B/op        197 allocs/op

# after
goos: darwin
goarch: arm64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape
cpu: Apple M1 Pro
BenchmarkScrapeWorkScrapeInternalStreamBigData-10             74          15761331 ns/op         195.98 MB/s    15487399 B/op        148 allocs/op
PASS
ok      github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape       1.806s
```

Related issue:
 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8159
---------
Signed-off-by: Maksim Kotlyar <kotlyar.maksim@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2025-03-20 16:49:24 +01:00
Nikolay
27d7d0b25c
lib/promscrape: properly send staleness markers
Previously, vmagent may incorrectly store partial scrape response
in case of scrapping error. It may happen if `sw.ReadData` call fetched
some chunked response and store it at buffer. And later context deadline
exceed error happened.
 As a result, at the next scrape iteration this partial response could
 be forwarded to the `sw.sendStaleSeries(lastScrape...)` function call
 and lead to `Prometheus line` parsing error.

 This commit properly set response body to the empty value in case of
scrapping error. It prevents storing partial scrape response body. And
it no longer sends partial staleness markers to the remote storage.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8528
2025-03-19 10:37:54 +01:00
Aliaksandr Valialkin
f645479b5e
lib/protoparser: rename lib/protoparser/common to lib/protoparser/protoparserutil
This improves readability of the code, which uses this package.
2025-03-18 16:24:51 +01:00
Guillem Jover
76d205feae
spelling and grammar fixes via codespell ()
### Describe Your Changes

Fix many spelling errors and some grammar, including misspellings in
filenames. 

The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`.
While this is a breaking change, this metric isn't used in alerts or dashboards. 
So it seems to have low impact on users.

The change also deprecates `cspell` as it is much heavier and less usable. 
---------

Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
2025-03-17 16:32:10 +01:00
Aliaksandr Valialkin
0451a1c9e0
app/vlinsert: follow-up for 37ed1842ab
- Properly decode protobuf-encoded Loki request if it has no Content-Encoding header.
  Protobuf Loki message is snappy-encoded by default, so snappy decoding must be used
  when Content-Encoding header is missing.

- Return back the previous signatures of parseJSONRequest and parseProtobufRequest functions.
  This eliminates the churn in tests for these functions. This also fixes broken
  benchmarks BenchmarkParseJSONRequest and BenchmarkParseProtobufRequest, which consume
  the whole request body on the first iteration and do nothing on subsequent iterations.

- Put the CHANGELOG entries into correct places, since they were incorrectly put into already released
  versions of VictoriaMetrics and VictoriaLogs.

- Add support for reading zstd-compressed data ingestion requests into the remaining protocols
  at VictoriaLogs and VictoriaMetrics.

- Remove the `encoding` arg from PutUncompressedReader() - it has enough information about
  the passed reader arg in order to properly deal with it.

- Add ReadUncompressedData to lib/protoparser/common for reading uncompressed data from the reader until EOF.
  This allows removing repeated code across request-based protocol parsers without streaming mode.

- Consistently limit data ingestion request sizes, which can be read by ReadUncompressedData function.
  Previously this wasn't the case for all the supported protocols.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8416
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8380
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8300
2025-03-15 00:03:03 +01:00
Evgeny
486b9e1c64
lib/promscrape: use original job name as scrapePool value in targets api ()
### Fix scrapePool name

If in the scrape file, I do some magic and manipulate the job name then
Prometheus will show scrapePool as the original job name in the targets
API, but vmagent will set it to the final value which is wrong.
example
```
job: consul-targets
...

- source_labels: [ __meta_consul_service ]
      regex: (\w+)[_-]exporter
      target_label: job
      replacement: $1
```

curl to prom API will show
`"scrapePool": "consul-targets",`
vmagent:
`""scrapePool": "node",`

before changes:
```
curl -s 'http://localhost:8429/api/v1/targets' | jq -r '.data.activeTargets[].scrapePool'| sort|uniq
blackbox
pgbackrest
postgres
```
after changes
```
curl -s 'http://localhost:8429/api/v1/targets' | jq -r '.data.activeTargets[].scrapePool'| sort|uniq
blackbox
consul-targets
```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-03-11 13:11:35 +01:00
Roman Khavronenko
63f6ac3ff8
lib/promutils: move time-related funcs from promutils to timeutil ()
Since funcs `ParseDuration` and `ParseTimeMsec` are used in vlogs,
vmalert, victoriametrics and other components, importing promutils only
for this reason makes them to export irrelevant
`vm_rows_invalid_total{type="prometheus"}` metric.

This change removes `vm_rows_invalid_total{type="prometheus"}` metric
from /metrics page for these components.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-03 10:25:42 +01:00
Zakhar Bessarab
99de272b72
lib/promrelabel/scrape_url: properly parse IPv6 address from __address__ label
Fix parsing of IPv6 addresses after discovery. Previously, it could lead
to target being discovered and discarded afterwards.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8374

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-02-28 14:19:10 +04:00
Roman Khavronenko
38d46d149f
lib/prompbmarshal: move MustParsePromMetrics to protoparser/prometheus ()
`MustParsePromMetrics` imports `lib/protoparser/prometheus`, and this
package exposes the following metrics:
```
vm_protoparser_rows_read_total{type="promscrape"}
vm_rows_invalid_total{type="prometheus"}
```

It means every package that uses `lib/prompbmarshal` will start exposing
these metrics. For example, vlogs imports `lib/protoparser/common` which
uses `lib/prompbmarshal.Label`. And only because of this vlogs starts
exposing unrelated prometheus metrics on /metrics page.

Moving `MustParsePromMetrics` to `lib/protoparser/prometheus` seems like
the leas intrusive change.


-----------

Depends on another change
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8403

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-02-27 22:50:27 +01:00
Aliaksandr Valialkin
467cdd8a3d
lib: consistently use logger.Panicf("BUG: ...") for logging programming bugs
logger.Fatalf("BUG: ...") complicates investigating the bug, since it doesn't show the call stack,
which led to the bug. So it is better to consistently use logger.Panicf("BUG: ...") for logging programming bugs.
2025-01-24 16:39:21 +01:00
Zhu Jiekun
276989716f
lib/promscrape: add Marathon service discovery
This commit adds support for [Marathon](https://mesosphere.github.io/marathon/)
service discovery to the scrape configuration. 

The following flag is introduced:
```
  -promscrape.marathonSDCheckInterval duration
          Interval for checking for changes in Marathon service discovery. This works only if marathon_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#marathon_sd_configs for details  (default 30s)
```

The service discovery could be config like:
```yaml
scrape_configs:          
- job_name: marathon_job 
  marathon_sd_configs:   
      servers:
        - "..."
        - "..."
```
See:
[b555d94d1a/docs/sd_configs.md (marathon_sd_configs))

related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6642


---------

Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-01-08 18:57:22 +01:00
cuiweiyuan
d064e14933
chore: fix function name in comment ()
### Describe Your Changes

 fix function name in comment

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: cuiweiyuan <cuiweiyuan@aliyun.com>
2025-01-08 13:58:22 +01:00
Zhu Jiekun
43181b67b1
discovery/dockerswarm: add missing service labels to tasks discovery role
Previously service labels won't be attached when `role: tasks` is set.
Because the `addServicesLabels` function is shared by `role: tasks` and
`role: services`, and it will return nothing when `vip.Addr` is invalid
or empty.

In Prometheus, even if `vip.Addr` is empty, it attach common service
labels with [a standalone
function](f10c3454e9/discovery/moby/services.go (L129)),
which offers:
- `__meta_dockerswarm_service_id`: the id of the service.
- `__meta_dockerswarm_service_name`: the name of the service.
- `__meta_dockerswarm_service_mode`: the mode of the service.
- `__meta_dockerswarm_service_label_<labelname>`: each label of the
service, with any unsupported characters converted to an underscore.

This PR add a `addServicesLabelsForTask`, to replace the usage of
`addServicesLabels` when `role: tasks` is set. This function offers
common service labels listed above.

related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7800
2024-12-13 11:28:04 +01:00
Zhu Jiekun
6b48126603
discovery/docker: add match_first_network support for docker_sd_configs
This commit aligns behaviour of docker service discovery with Prometheus implementation.

It adds the following changes:
* introduce new config param `match_first_network` with default value of `true`. It uses the first network if the container has multiple networks
defined.  It should help to avoid collecting duplicate targets error with multi network setups.

* add `networks` for the containers with linked network to the other containers with `network_mode: container:id` setting. It resolve an issue with attached containers aka `pods` in Kubernetes.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7398
2024-12-10 20:15:33 +01:00
Zhu Jiekun
7374a8813d
lib/promscrape/discovery: properly apply the resource_group filter for Azure service discovery
Previously, this filter did not apply to virtual
machine scale sets, causing all virtual machines to be discovered.

 This commit conditionally adds `resource_group` filter for Azure service discovery on virtual
machine scale sets. 

 Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7630.
2024-11-26 19:06:43 +01:00
Andrei Baidarov
727bc02a5c
vmagent: set up a timeout for tcp connection establishment during k8s discovery
Previously, default dial timeout was used for kubernetes API server connection.

 This commit changes it for custom dialer used by the all VictoriaMetrics components. It has lower connection timeout (30s by default). 


 Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7127

---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-11-25 18:02:09 +01:00
Nikolay
bba08f7846
lib/promscrape: add relabel configs to global section
This commit adds `metric_relabel_configs` and `relabel_configs` fields
into the `global` section of scrape configuration file.

 New fields are used as global relabeling rules for the scrape targets.

 These relabel configs are prepended to the target relabel configs.
This feature is useful to:
* apply global rules to __meta labels from service discovery targets.
* drop noisy labels during scrapping.
* mutate labels without affecting metrics ingested via any of push
protocols. 

Related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6966

---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-10-31 19:58:22 +01:00
Zhu Jiekun
f06c7e99fe
lib/promscrape: adds support for PuppetDB service discovery
This commit adds support for
[PuppetDB](https://www.puppet.com/docs/puppetdb/8/overview.html) service
discovery to the `vmagent` and `victoria-metrics-single` components.

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5744
2024-10-27 20:38:34 +01:00
Andrii Chubatiuk
fc537bea00
lib/promscrape/discovery/kubernetes: support kubernetes native sidecars ()
This commit adds Kubernetes Native Sidecar support. 

It's the special type of init containers, that have restartPolicy == "Always" and continue to run after container initialization. 


related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7287
2024-10-24 17:04:12 +02:00
Andrii Chubatiuk
965a33c893
lib/promscrape: fixed reload on max_scrape_size change ()
### Describe Your Changes

fixed reload on max_scrape_size change
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7260

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-10-18 11:35:23 +02:00
Zakhar Bessarab
eefae85450
vmagent: add support of HTTP2 client for Kubernetes SD ()
### Describe Your Changes

Currently, vmagent always uses a separate `http.Client` for every group
watcher in Kubernetes SD. With a high number of group watchers this
leads to large amount of opened connections.

This PR adds 2 changes to address this:
- re-use of existing `http.Client` - in case `http.Client` is connecting
to the same API server and uses the same parameters it will be re-used
between group watchers
- HTTP2 support - this allows to reuse connections more efficiently due
to ability of using streaming via existing connections.

See this issue for the details and test results -
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5971

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-10-08 10:36:31 +02:00
Artem Fetishev
c1cd3e85a7
lib/promscrape: Fix TestClientProxyReadOk flaky test ()
This PR fixes  

For hijacked connections, one has to read from the connection buffer,
but still write directly to the connection. Otherwise, when reading
directly from such connections, the first byte may be lost. This, in
turn corrupts the ClientHello TLS handshake message and when the backend
server receives it, it closes the connection and reports the following
error in the log:

```
http: TLS handshake error from 127.0.0.1:33150: tls: first record does not look
like a TLS handshake
```

The first byte may be lost because underlying HTTP request handler may
read it from the connection and put it into the buffer. As the result,
subsequent connection reads won't see that byte.

-   See: https://github.com/golang/go/issues/27408
-   The fix is taken from : https://github.com/k3s-io/k3s/pull/6216

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2024-10-03 18:27:15 +02:00
Zhu Jiekun
7bb8853a5c
feature: [vmagent] Add service discovery support for OVH Cloud VPS and dedicated server ()
### Describe Your Changes
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6071

#### Added
- Added service discovery support for OVH Cloud:
    - VPS.
    - Dedicated server.

#### Docs
- `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated.

#### Note
- Useful links: 
    - OVH Cloud VPS API: https://eu.api.ovh.com/console/#/vps~GET
- OVH Cloud Dedicated server API:
https://eu.api.ovh.com/console/#/dedicated/server~GET
    - OVH Cloud SDK: https://github.com/ovh/go-ovh
- Prometheus SD:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ovhcloud_sd_config

Tested on OVH Cloud VPS and dedicated server.
<img width="1722" alt="image"
src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/30280396/d3f0adc8-b0ef-423e-9379-8a9b9b0792ee">

<img width="1724" alt="image"
src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/30280396/18b5b730-3512-4fc0-8b2c-f2450ac550fd">

---
Signed-off-by: Jiekun <jiekun@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-09-30 14:42:46 +02:00
hagen1778
8bb3f2fd43
lib/promscrape: make linter happy
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-09-24 15:12:55 +02:00
hagen1778
c7569dac50
lib/promscrape: temporary disable TestClientProxyReadOk
This test is very flaky and prevents other tests from running in CI.
Disabling this test should improve tests quality, since it isn't reliable anyway.

There is a ticket to fix this test - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7062

Once fixed, this test should be uncommented.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-09-24 14:59:25 +02:00
Dmytro Kozlov
cbeb7d50e8
lib/promscrape: show only unhealthy targets if show_only_unhealthy filter is enabled ()
### Describe Your Changes

It is better to show only unhealthy targets instead of all of them when
`show_only_unhealthy` filter is enabled.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3536

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-09-24 12:18:24 +02:00
Zhu Jiekun
c193e6d43e
lib/discovery/azure: fix host check in next link in Azure SD ()
Previous bugfix at 49f63b2 only partially fixed pagination host validation error.

 Before this fix it was:
```
unexpected nextLink host \"management.azure.com\", expecting \"https://management.azure.com\"
```

Now we only check the `Host` without schema. 

However, when Azure respond `nextLink` in `Host:Port` format, the
`nextLink` check will fail:
```
unexpected nextLink host \"management.azure.com:443\", expecting \"management.azure.com\"
```

This pull request further relaxes the checks by only checking the
`Hostname`.

---

 related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6912
2024-09-05 16:48:09 +02:00
Roman Khavronenko
f586082520
attempt to fix flaky TestClientProxyReadOk ()
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-08-30 13:23:32 +02:00
Zhu Jiekun
e97e966f82
lib/promrelabel: follow-up for 8958cecad6
In the previous commit 8958cecad6
the default ports (80/443) were removed for both the `scrapeURL` and
`instance` label values for those targets without a port in
`__address__`. Different values in the `instance` label generate new
time series.

This commit reverts the changes made to the `instance` label. Now,
for those targets:
- `scrapeURL` will remain unchanged.
- The `instance` label value will include the default port.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792
2024-08-27 13:04:26 +02:00
Nikolay
9feee15493
lib/promscrape: fixes proxy autorization ()
* Adds custom dial func for HTTP-Connect and socks5 proxy tunnels.
  Standard golang http.transport exposes GetProxyConnectHeader function,
  but it doesn't allow to use separate tls config for proxy.
  It also not possible to enforce HTTP-Connect with standard http lib.
* For http scrape targets, by default http.Transport.Proxy function must
  be used. Since it has special case with full uri forward.
* Adds proxy.URL json methods that allow to properly copy internal
fields, like User/Password.
It should fix bug with proxy_url. When credentials specified at URL was
ignored.
* Adds tests for scrape client proxy requests

related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6771
2024-08-19 22:31:18 +02:00
Zhu Jiekun
723d834c1a
lib/promrelabel: stop adding default port 80/433 to address label
*  It was necessary to add default ports for fasthttp client. After migration to the std.httpclient it's no longer needed.
* An additional configuration is required at proxy servers with implicitly set 80/443 ports to the host header (such as HA proxy.

It's expected that after upgrade __address_ label may change. But it should be rare case. 80/443 ports are not widely used at monitoring ecosystem. And it shouldn't have much impact. 

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792

Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-08-19 22:28:49 +02:00
Zhu Jiekun
9e2bd82376
app/vmagent: fixes azure service discovery pagination
Azure API response with link to the next page was incorrectly validate. Validation used url.Host header to match configure API URL.


https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6784
2024-08-09 15:22:47 +02:00
Anzor
994796367b
app/vmagent: read __sample_limit__ from labels () ()
By introducing this feature, users will have the ability to customize
the sampleLimit parameter on a per-target basis, providing more
flexibility and control over the job execution behavior.
2024-08-07 09:36:14 +02:00
Aliaksandr Valialkin
9c4b0334f2
all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter
The %q formatter may result in incorrectly formatted JSON string if the original string
contains special chars such as \x1b . They must be encoded as \u001b , otherwise the resulting JSON string
cannot be parsed by JSON parsers.

This is a follow-up for c0caa69939

See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
2024-07-17 13:52:13 +02:00
Aliaksandr Valialkin
58a757cd01
lib: consistently use regexp.Regexp.ReplaceAllLiteralString instead of regexp.Regexp.ReplaceAllString in places where the replacement cannot contain matching group placeholders 2024-07-17 12:41:54 +02:00
Aliaksandr Valialkin
4304950391
lib/promscrape/discovery/yandexcloud: follow-up for 070abe5c71
- Obtain IAM token via GCE-like API instead of Amazon EC2 IMDSv2 API,
  since it looks like IMDBSv2 API isn't supported by Yandex Cloud
  according to https://yandex.cloud/en/docs/security/standard/authentication#aws-token :

  > So far, Yandex Cloud does not support version 2, so it is strongly recommended
  > to technically disable getting a service account token via the Amazon EC2 metadata service.

- Try obtaining IAM token via GCE-like API at first and then fall back to the deprecated Amazon EC2 IMDBSv1.
  This should prevent from auth errors for instances with disabled GCE-like auth API.
  This addresses @ITD27M01 concern at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513#issuecomment-1867794884

- Make more clear the description of the change at docs/CHANGELOG.md , add reference to the related issue.

P.S. This change wasn't tested in prod because I have no access to Yandex Cloud.
It is recommended to test this change by @ITD27M01 and @vmazgo , who filed
the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6524
2024-07-16 17:58:40 +02:00
Aliaksandr Valialkin
57000f5105
lib/promscrape: follow-up for 1e83598be3
- Clarify that the -promscrape.maxScrapeSize value is used for limiting the maximum
  scrape size if max_scrape_size option isn't set at https://docs.victoriametrics.com/sd_configs/#scrape_configs

- Fix query example for scrape_response_size_bytes metric at https://docs.victoriametrics.com/vmagent/#automatically-generated-metrics

- Mention about max_scrape_size option at the -help description for -promscrape.maxScrapeSize command-line flag

- Treat zero value for max_scrape_size option as 'no scrape size limit'

- Change float64 to int type for scrapeResponseSize struct fields and function args, since response size cannot be fractional

- Optimize isAutoMetric() function a bit

- Sort auto metrics in alphabetical order in isAutoMetric() and in scrapeWork.addAutoMetrics() functions
  for better maintainability in the future

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6434
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429
2024-07-16 12:38:21 +02:00
Aliaksandr Valialkin
a468a6e985
lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc
- Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call
- NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil
- Simplify the implementation of NewStatDialFunc by removing sync.Map from there.
- Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils
- Use gauge instead of counter type for *_conns metric

This is a follow-up for d7b5062917
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299
2024-07-15 23:02:34 +02:00
Aliaksandr Valialkin
3c02937a34
all: consistently use 'any' instead of 'interface{}'
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
2024-07-10 00:20:37 +02:00
Aliaksandr Valialkin
a9525da8a4
lib: consistently use f-tests instead of table-driven tests
This makes easier to read and debug these tests. This also reduces test lines count by 15% from 3K to 2.5K
See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e

While at it, consistently use t.Fatal* instead of t.Error*, since t.Error* usually leads
to more complicated and fragile tests, while it doesn't bring any practical benefits over t.Fatal*.
2024-07-09 22:40:50 +02:00
Aliaksandr Valialkin
35b3b95cbc
lib/promscrape/discovery/vultr: follow-up after 17e3d019d2
- Sort the discovered labels in alphabetical order at https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs
- Rename VultrConfigs to VultrSDConfigs to be consistent with the naming for other SD configs.
- Prepare query arg filters for `list instances API` at newAPIConfig() instead of passing them in a separate listParams struct.
  This simplifies the code a bit.
- Return error when bearer token isn't set at vultr_sd_configs, since this token is mandatory
  according to https://docs.victoriametrics.com/sd_configs/#vultr_sd_configs
- Remove unused fields from the parsed response from Vultr list instances API in order to simplify the code a bit.
- Remove double logging of errors inside getInstances() function, since these errors must be already logged by the caller.
- Simplify tests, so they are easier to maintain.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6068
2024-07-05 17:40:03 +02:00
Aliaksandr Valialkin
d8c7cc266b
lib/promscrape: use prompbmarshal.MustParsePromMetrics function at parseData() test function
The prompbmarshal.MustParsePromMetrics function has been added in the commit cc4d57d650
2024-07-03 16:08:13 +02:00
Aliaksandr Valialkin
bb00bae353
Revert "Exemplar support ()"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite
that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains histogram_quantile() function. It queries exemplars via special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
  exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is litterally filled with thousands of exemplar dots.
  Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labes on the time range with the anomaly on metrics' graph.
2024-07-03 15:30:21 +02:00
Andrii Chubatiuk
070abe5c71
added IMDSv2 for YC SD ()
### Describe Your Changes

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-06-26 18:03:21 +02:00
Andrii Chubatiuk
1e83598be3
app/vmagent: add max_scrape_size to scrape config ()
Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-20 13:58:42 +02:00
Zakhar Bessarab
34071ac660
lib/promscrape: increase default value for promscrape.maxDroppedTargets to 10_000 ()
### Describe Your Changes
This limit can be increased since after
4513893ead
tracking of dropped targets uses much less memory per entry.

See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6381#issuecomment-2156708228


### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-06-12 16:34:18 +02:00
Hui Wang
d7b5062917
app/vmalert: support DNS SRV record in -remoteWrite.url ()
part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053,
supports [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) address in
`-remoteWrite.url` command-line option.
2024-05-22 10:52:51 +02:00
Zhu Jiekun
17e3d019d2
feature: [vmagent] Add service discovery support for Vultr ()
### Describe Your Changes
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041

#### Added
- Added service discovery support for Vultr.

#### Docs
- `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated.

#### Note
- Useful links: 
- Vultr API:
https://www.vultr.com/api/#tag/instances/operation/list-instances
    - Vultr client SDK: https://github.com/vultr/govultr
- Prometheus SD:
https://github.com/prometheus/prometheus/tree/main/discovery/vultr

---
### Checklist

The following checks are mandatory:

- [X] I have read the [Contributing
Guidelines](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/CONTRIBUTING.md)
- [x] All commits are signed and include `Signed-off-by` line. Use `git
commit -s` to include `Signed-off-by` your commits. See this
[doc](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work) about
how to sign your commits.
- [x] Tests are passing locally. Use `make test` to run all tests
locally.
- [x] Linting is passing locally. Use `make check-all` to run all
linters locally.

Further checks are optional for External Contributions:

- [X] Include a link to the GitHub issue in the commit message, if issue
exists.
- [x] Mention the change in the
[Changelog](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/CHANGELOG.md).
Explain what has changed and why. If there is a related issue or
documentation change - link them as well.

  Tips for writing a good changelog message::

* Write a human-readable changelog message that describes the problem
and solution.
* Include a link to the issue or pull request in your changelog message.
* Use specific language identifying the fix, such as an error message,
metric name, or flag name.
* Provide a link to the relevant documentation for any new features you
add or modify.

- [ ] After your pull request is merged, please add a message to the
issue with instructions for how to test the fix or try the feature you
added. Here is an
[example](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4048#issuecomment-1546453726)
- [x] Do not close the original issue before the change is released.
Please note, in some cases Github can automatically close the issue once
PR is merged. Re-open the issue in such case.
- [x] If the change somehow affects public interfaces (a new flag was
added or updated, or some behavior has changed) - add the corresponding
change to documentation.

Signed-off-by: Jiekun <jiekun.dev@gmail.com>
2024-05-08 10:01:48 +02:00
Ted Possible
5a3abfa041
Exemplar support ()
This code adds Exemplars to VMagent and the promscrape parser adhering
to OpenMetrics Specifications. This will allow forwarding of exemplars
to Prometheus and other third party apps that support OpenMetrics specs.

---------

Signed-off-by: Ted Possible <ted_possible@cable.comcast.com>
2024-05-07 12:09:44 +02:00
Aliaksandr Valialkin
7531e9084a
all: use clear() built-in Go function for clearing []prompbmarshal.TimeSeries and []prompbmarshal.Label slices
This makes the code a bit clear.
2024-04-20 21:00:03 +02:00