### Describe Your Changes
- replace docs in root README with a link to official documentation
- remove old make commands for documentation
- remove redundant "VictoriaMetrics" from document titles
- merge changelog docs into a section
- rm content of Single-server-VictoriaMetrics.md as it can be included from docs/README
- add basic information to README in the root folder, so it will be useful for github users
- rm `picture` tag from docs/README as it was needed for github only, we don't display VM logo at docs.victoriametrics.com
- update `## documentation` section in docs/README to reflect the changes
- rename DD pictures, as they now belong to docs/README
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
The new panel will show the 99th quantile of scrape duration in seconds.
This should help identifying vmagent instances that experiences too high scraping durations.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Validate files specified via `-tlsKeyFile` and `-tlsCertFile` cmd-line flags on the process start-up. Previously, validation happened on the first connection accepted by HTTP server.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6608
---------
Co-authored-by: hagen1778 <roman@victoriametrics.com>
This reverts commit e280d90e9a.
Reason for revert: the updated code doesn't improve the performance of table.MustAddRows for the typical case
when rows contain timestamps belonging to ptws[0].
The performance may be improved in theory for the case when all the rows belong to partiton other than ptws[0],
but this partition is automatically moved to ptws[0] by the code at lines
6aad1d43e9/lib/storage/table.go (L287-L298) ,
so the next time the typical case will work.
Also the updated code makes the code harder to follow, since it introduces an additional level of indirection
with non-trivial semantics inside table.MustAddRows - the partition.TimeRangeInPartition() function.
This function needs to be inspected and understood when reading the code at table.MustAddRows().
This function depends on minTsInRows and maxTsInRows vars, which are defined and initialized
many lines above the partition.TimeRangeInPartition() call. This complicates reading and understanding
the code even more.
The previous code was using clearer loop over rows with the clear call to partition.HasTimestamp()
for every timestamp in the row. The partition.HasTimestamp() call is used in the table.MustAddRows()
function multiple times. This makes the use of partition.HasTimestamp() call more consistent,
easier to understand and easier to maintain comparing to the mix of partition.HasTimestamp() and partition.TimeRangeInPartition()
calls.
Aslo, there is no need in documenting some hardcore software engineering refactoring at docs/CHANGLELOG.md,
since the docs/CHANGELOG.md is intended for VictoriaMetrics users, who may not know software engineering.
The docs/CHANGELOG.md must document user-visible changes, and the docs must be concise and clear for VictoriaMetrics users.
See https://docs.victoriametrics.com/contributing/#pull-request-checklist for more details.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6629
### Describe Your Changes
The original logic is not only highly complex but also poorly readable,
so it can be modified to increase readability and reduce time
complexity.
---------
Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>
### Describe Your Changes
Use relative markdown references, removed `{{< ref >}}` shortcodes
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- moved files from root to VictoriaMetrics folder to be able to mount
operator docs and VictoriaMetrics docs independently
- added ability to run website locally
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Change response code to 502 to align it with behaviour of other existing
reverse proxies. Currently, the following reverse proxies will return
502 in case an upstream is not available: nginx, traefik, caddy, apache.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
- Mention that credentials can be configured via env variables at both vmbackup and vmrestore docs.
- Make clear that the AZURE_STORAGE_DOMAIN env var is optional at https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables
- Use string literals as is for env variable names instead of indirecting them via string constants.
This makes easier to read and understand the code. These environment variable names aren't going to change
in the future, so there is no sense in hiding them under string constants with some other names.
- Refer to https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables in error messages
when auth creds are improperly configured. This should simplify figuring out how to fix the error.
- Simplify the code a bit at FS.newClient(), so it is easier to follow it now.
While at it, remove the check when superflouos environment variables are set, since it is too fragile
and it looks like it doesn't help properly configuring vmbackup / vmrestore.
- Remove envLookuper indirection - just use 'func(name string) (string, bool)' type inline.
This simplifies code reading and understanding.
- Split TestFSInit() into TestFSInit_Failure() and TestFSInit_Success(). This simplifies the test code,
so it should be easier to maintain in the future.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6518
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984
- Clarify the description of -graphite.sanitizeMetricName command-line flag at README.md
- Do not sanitize tag values - only metric names and tag names must be sanitized,
since they are treated specially by Grafana. Grafana doesn't apply any restrictions on tag values.
- Properly replace more than two consecutive dots with a single dot.
- Disallow unicode letters in metric names and tag names, since neither Prometheus nor Grafana
do not support them.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6489
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6077
### Describe Your Changes
`Storage.AddRows()` returns an error only in one case: when
`Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But
the same error is treated as a warning when it happens inside
`Storage.add()` or returned by `Storage.prefillNextIndexDB()`.
This commit fixes this inconsistency by treating the error returned by
`Storage.updatePerDateData()` as a warning as well. As a result
`Storage.add()` does not need a return value anymore and so doesn't
`Storage.AddRows()`.
Additionally, this commit adds a unit test that checks all cases that
result in a row not being added to the storage.
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
This reverts commit 4d66e042e3.
Reasons for revert:
- The commit makes unrelated invalid changes to docs/CHANGELOG.md
- The changes at app/vmauth/main.go are too complex. It is better splitting them into two parts:
- pooling readTrackingBody struct for reducing pressure on GC
- avoiding to use readTrackingBody when -maxRequestBodySizeToRetry command-line flag is set to 0
Let's make this in the follow-up commits!
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6445
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6533
- Obtain IAM token via GCE-like API instead of Amazon EC2 IMDSv2 API,
since it looks like IMDBSv2 API isn't supported by Yandex Cloud
according to https://yandex.cloud/en/docs/security/standard/authentication#aws-token :
> So far, Yandex Cloud does not support version 2, so it is strongly recommended
> to technically disable getting a service account token via the Amazon EC2 metadata service.
- Try obtaining IAM token via GCE-like API at first and then fall back to the deprecated Amazon EC2 IMDBSv1.
This should prevent from auth errors for instances with disabled GCE-like auth API.
This addresses @ITD27M01 concern at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513#issuecomment-1867794884
- Make more clear the description of the change at docs/CHANGELOG.md , add reference to the related issue.
P.S. This change wasn't tested in prod because I have no access to Yandex Cloud.
It is recommended to test this change by @ITD27M01 and @vmazgo , who filed
the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6524
- Move the test for SRV discovery into a separate function. This allows verifying round-robin discovery across SRV records.
- Restore the original netutil.Resolver after the test finishes, so it doesn't interfere with other tests.
- Move the description of the bugfix into the correct place at docs/CHANGELOG.md - it should be placed under v1.102.0-rc2
instead of v1.102.0-rc1.
- Remove unneeded code in URLPrefix.sanitizeAndInitialize(), since it is expected this function is called only once
for finishing URLPrefix initializiation. In this case URLPrefix.nextDiscoveryDeadline and URLPrefix.n are equal to 0
according to https://pkg.go.dev/sync/atomic#Uint64
- Properly fix the bug at URLPrefix.discoverBackendAddrsIfNeeded() - it is expected that hostToAddrs map uses
the original hostname keys, including 'srv+' prefix, so it shouldn't be removed when looping over up.busOriginal.
Instead, the 'srv+' prefix must be removed from the hostname only locally before passing the hostname to netutil.Resolver.LookupSRV.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6401
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
This improves code maintainability a bit.
- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
This prevents from potential resource leaks.
- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.
- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.
- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
This label should simplify investigation of the exposed metrics.
- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
it is hard to use these labels in query filters and aggregation functions.
- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
This metric shows the number of samples generated by the corresponding streaming aggregation rule.
This metric has been added in the commit 861852f262 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
This metric has been added in the commit 861852f262 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
which could modify the behaviour of the constructed streaming aggregator.
Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.
- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
This also allows passing nil instead of Options when default options are enough.
- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
so they have consistent set of labels comparing to the rest of streaming aggregation metrics.
- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
`aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
`output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
metrics. This is a follow-up for the commit 2eb1bc4f81 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604
- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
in the commit 2eb1bc4f81 .
- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
The old link was changed globally to the new link in the commit f4b1cbfef0 .
Unfortunately, old links are still posted in new commits :(
This is a follow-up for 680b8c25c8 .
While at it, remove duplicate 'len(*remoteWriteURLs) > 0' check in the remotewrite.Init() functions,
since this check is already made at the beginning of the function.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6253
- Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage
systems are configured with the disabled on-disk queue, every in-memory queue is full
and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common,
so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx
relabeling and other processing in this case.
- Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped().
Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set.
In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full
and on-disk queue is disabled.
The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing
the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage
processes pending data, so it drops aggregated samples in this case.
- Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output,
so it is clear that this flag can be set individually per each -remoteWrite.url.
- Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems
are configured with the disabled on-disk queue, then there is no sense in keeping samples
on some of these systems, while dropping samples on the remaining systems, since this
will result in global stall on the remote storage system with the disabled on-disk queue
and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false
from remotewrite.TryPush() in this case. This will result in infinite duplicate samples
written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload
is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
This allows proceeding with newly scraped / pushed samples by sending them to the remaining
remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set.
- Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test.
- Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url.
See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
We use `vm_streamaggr_flushed_samples_total` to show the number of
produced samples by aggregation rule, previously it was overcounted, and
doesn't account for `output_relabel_configs`.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
In most cases histograms are exposed in sorted manner with lower buckets
being first. This means that during scraping buckets with lower bounds
have higher chance of being updated earlier than upper ones.
Previously, values were propagated from upper to lower bounds, which
means that in most cases that would produce results higher than expected
once all buckets will become updated.
Propagating from upper bound effectively limits highest value of
histogram to the value of previous scrape. Once the data will become
consistent in the subsequent evaluation this causes spikes in the
result.
Changing propagation to be from lower to higher buckets reduces value
spikes in most cases due to nature of the original inconsistency.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580
An example histogram with previous(red) and updated(blue) versions:
![1719565540](https://github.com/VictoriaMetrics/VictoriaMetrics/assets/1367798/605c5e60-6abe-45b5-89b2-d470b60127b8)
This also makes logic of filling nan values with lower buckets values: [1 2 3 nan nan nan] => [1 2 3 3 3 3] obsolete.
Since buckets are now fixed from lower ones to upper this happens in the main loop, so there is no need in a second one.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
These changes support using Azure Managed Identity for the `vmbackup`
utility. It adds two new environment variables:
* `AZURE_USE_DEFAULT_CREDENTIAL`: Instructs the `vmbackup` utility to
build a connection using the [Azure Default
Credential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.5.2#NewDefaultAzureCredential)
mode. This causes the Azure SDK to check for a variety of environment
variables to try and make a connection. By default, it tries to use
managed identity if that is set up.
This will close
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Testing
However you normally test the `vmbackup` utility using Azure Blob should
continue to work without any changes. The set up for that is environment
specific and not listed out here.
Once regression testing has been done you can set up [Azure Managed
Identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview)
so your resource (AKS, VM, etc), can use that credential method. Once it
is set up, update your environment variables according to the updated
documentation.
I added unit tests to the `FS.Init` function, then made my changes, then
updated the unit tests to capture the new branches.
I tested this in our environment, but with SAS token auth and managed
identity and it works as expected.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Justin Rush <jarush@epic.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Move the code responsible for relabelCtx clearing into deferred function.
This allows making more clear the remoteWriteCtx.TryPush code.
This is a follow-up for 879771808b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
While at it, clarify the description of the bugfix at docs/CHANGELOG.md
### Describe Your Changes
Added flag to sanitize graphite metrics
fixes#6077
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>