Commit graph

1949 commits

Author SHA1 Message Date
Yury Molodov
ddc5197bce
vmui: add metric relabel debug (#3889)
* feat: add metric relabel debug (#3807)

* fix: add link to relabeling cookbook

* lib/promrelabel: merge, fix conflicts

* lib/promrelabel: fix diff

* docs/vmui: add metric relabel playground

---------

Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
2023-05-08 14:59:35 -07:00
Roman Khavronenko
20b025dc88
http server: limit max concurrent requests (#4185)
* lib/httpserver: introduce `-http.maxConcurrentRequests` command-line flag

Introduce `-http.maxConcurrentRequests` command-line flag to protect
VM components from resource exhaustion during unexpected spikes of HTTP requests.
By default, the new flag's value is set to 0 which means no limits are applied.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/httpserver: mention http.maxConcurrentRequests in docs

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 13:13:58 -07:00
Zakhar Bessarab
79ee1749a1
lib/httpserver: add handler to serve /robots.txt and deny search indexing (#4143)
This handler will instruct search engines that indexing is not allowed for the content exposed to the internet. This should help to address issues like #4128 when instances are exposed to the internet without authentication.
2023-05-08 09:46:34 -07:00
Aliaksandr Valialkin
8b15f93426
lib/{mergeset,storage}: make mustReadPartNames() code more clear 2023-04-14 23:17:08 -07:00
Aliaksandr Valialkin
d739511f5b
lib/storage: replace OpenStorage() with MustOpenStorage()
Callers of OpenStorage() log the returned error and exit.
The error logging and exit can be performed inside MustOpenStorage()
alongside with printing the stack trace for better debuggability.
This simplifies the code at caller side.
2023-04-14 23:04:42 -07:00
Aliaksandr Valialkin
f26e480a77
lib/storage: fix a bug, which prevents from reading pre-v1.90.0 parts
The bug has been introduced in c0b852d50d
2023-04-14 22:33:29 -07:00
Aliaksandr Valialkin
cf4701db65
lib/fs: add MustReadDir() function
Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity.
The fs.MustReadDir() logs the error with the directory name and the call stack on error
before exit. This information should be enough for debugging the cause of the error.
2023-04-14 22:11:40 -07:00
Aliaksandr Valialkin
0a11c46cd2
lib/storage: validate rows in partition.AddRows() only during tests 2023-04-14 20:53:05 -07:00
Aliaksandr Valialkin
292b6a851f
all: consistently use fs.MustClose() for closing lock files 2023-04-14 20:16:11 -07:00
Aliaksandr Valialkin
a7678350ad
lib/fs: convert CreateFlockFile to MustCreateFlockFile
Callers of CreateFlockFile log the returned err and exit.
It is better to log the error inside the MustCreateFlockFile together with the path
to the specified directory and the call stack. This simplifies
the code at the callers' side while leaving the debuggability at the same level.
2023-04-14 19:51:52 -07:00
Aliaksandr Valialkin
e2de5bf763
lib/{storage,mergeset}: convert InitFromFilePart to MustInitFromFilePart
Callers of InitFromFilePart log the error and exit.
It is better to log the error with the path to the part and the call stack
directly inside the MustInitFromFilePart() function.
This simplifies the code at callers' side while leaving the same level of debuggability.
2023-04-14 15:47:20 -07:00
Aliaksandr Valialkin
df99965564
lib/filestream: change Create() to MustCreate()
Callers of this function log the returned error and exit.
It is better logging the error together with the path to the filename
and call stack directly inside the function. This simplifies
the code at callers' side without reducing the level of debuggability
2023-04-14 15:14:24 -07:00
Aliaksandr Valialkin
0bbb281c3d
lib/filestream: transform Open() -> MustOpen()
Callers of this function log the returned error and exit.
Let's log the error with the path to the filename and call stack
inside the function. This simplifies the code at callers' side
without reducing the level of debuggability.
2023-04-14 15:04:54 -07:00
Aliaksandr Valialkin
ee8be138b9
lib/fs: improve error logging at ReaderAt.MustReadAt()
- Add 'BUG:' prefix to error messages related to programming errors aka bugs.
- Consistently log the path to the file in all the messages in order to improve debuggability.
2023-04-14 14:52:14 -07:00
Aliaksandr Valialkin
b80d93d4b2
lib/fs: substitute ReadFullData with MustReadData
Callers of ReadFullData() log the error and then exit.
So let's log the error with the path to the filename and the call stack
inside MustReadData(). This simplifies the code at callers' side,
while leaving the debuggability at the same level.
2023-04-14 14:40:58 -07:00
Aliaksandr Valialkin
36559dfec2
lib/fs: improve error logging inside MustWriteData
Log the path to file on errors inside MustWriteData().
This improves debuggability of errors, which may occur inside MustWriteData().
2023-04-14 14:33:45 -07:00
Aliaksandr Valialkin
67df75484f
lib/{mergeset,storage}: remove isInMerge flag from parts only when they werent removed yet from the list of active parts
This prevents from possible panic during access to pw.p when it is set to nil at partWrapper.decRef() called inside swapSrcWithDstParts()
2023-04-14 00:16:18 -07:00
Aliaksandr Valialkin
7fb2b14ca0
docs/CHANGELOG.md: run at least 4 background mergers on systems with less than 4 CPU cores
This reduces the probability of sudden spike in the number of small parts when all the background mergers
are busy with big merges.
2023-04-13 23:37:05 -07:00
Aliaksandr Valialkin
8846ce5f1d
lib/{mergeset,storage}: make sure that getFlushToDiskDeadline() takes into account only in-memory parts 2023-04-13 23:17:24 -07:00
Aliaksandr Valialkin
f75b1b7a53
lib/fs: add Must prefix to CopyDirectory and CopyFile functions
Callers of these functions log the returned error and then exit.
Let's log the error with the call stack inside the function itself.
This simplifies the code at callers' side, while leaving the same
level of debuggability in case of errors.
2023-04-13 23:04:37 -07:00
Aliaksandr Valialkin
75b74aa837
lib/fs: rename SymlinkRelative to MustSymlinkRelative
Callers of this function log the returned error and then exit.
Let's log the error with the call stack inside the function itself.
This simplifies the code at callers' side, while leaving the same
level of debuggability in case of errors.
2023-04-13 22:53:11 -07:00
Aliaksandr Valialkin
624b86d065
lib/fs: rename HardLinkFiles to MustHardLinkFiles
Callers of this function log the returned error and then exit.
Let's log the error with the call stack inside the function itself.
This simplifies the code at callers' side, while leaving the same
level of debuggability in case of errors.
2023-04-13 22:49:38 -07:00
Aliaksandr Valialkin
c4638553a3
lib/fs: rename WriteFileAtomically to MustWriteAtomic
Callers of this function log the returned error and exit.
So let's just log the error with the given filepath and the call stack
inside the function itself and then exit. This simplifies the code
at callers' place while leaves the same level of debuggability in case of errors.
2023-04-13 22:43:30 -07:00
Aliaksandr Valialkin
aac3dccfd1
lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist
Callers of these functions log the returned error and then exit. The returned error already contains the path
to directory, which was failed to be created. So let's just log the error together with the call stack
inside these functions. This leaves the debuggability of the returned error at the same level
while allows simplifying the code at callers' side.

While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk().
It is expected that the inmemoryPart.MustStoreToDick() must fail if there is already a directory under the given path.
2023-04-13 22:22:08 -07:00
Aliaksandr Valialkin
b4c330ea2b
lib/fs: rename MustWriteFileAndSync to MustWriteSync in order to improve readability a bit
This is a follow-up for 2a8395be05
2023-04-13 22:20:31 -07:00
Aliaksandr Valialkin
cdee2cfc5c
lib/{mergeset,storage}: remove unused path field from blockStreamWriter
This is a follow-up after 42bba64aa7
2023-04-13 22:20:02 -07:00
Aliaksandr Valialkin
1cda542c48
lib/fs: replace WriteFileAndSync with MustWriteAndSync
When WriteFileAndSync fails, then the caller eventually logs the error message
and exits. The error message returned by WriteFileAndSync already contains the path
to the file, which couldn't be created. This information alongside the call stack
is enough for debugging the issue. So just use log.Panicf("FATAL: ...") inside MustWriteAndSync().
This simplifies error handling at caller side a bit.
2023-04-13 22:17:34 -07:00
Aliaksandr Valialkin
eb7df27e20
lib/{mergeset,storage}: properly fsync part directory listing after writing in-memory part to disk
This is a follow-up after 42bba64aa7

Previously the part directory listing was fsync'ed implicitly inside partHeader.WriteMetadata()
by calling fs.WriteFileAtomically(). Now it must be fsync'ed explicitly.

There is no need in fsync'ing the parent directory, since it is fsync'ed by the caller
when updating parts.json file.
2023-04-13 21:21:46 -07:00
Aliaksandr Valialkin
13d2350e6a
lib/{mergeset,storage}: explicitly fsync the created part directory listing
Previously the created part directory listing was fsynced implicitly
when storing metadata.json file in it.

Also remove superflouous fsync for part directory listing,
which was called at blockStreamWriter.MustClose().
After that the metadata.json file is created, so an additional fsync
for the directory contents is needed.
2023-04-13 21:07:33 -07:00
Aliaksandr Valialkin
cf53ce83a0
app/vmstorage: deprecate -bigMergeConcurrency command-line flag
Improperly configured -bigMergeConcurrency command-line flag usually leads to uncontrolled
growth of unmerged parts, which, in turn, increases CPU usage and query durations.

So it is better deprecating this flag. In rare cases -smallMergeConcurrency command-line flag
can be used instead for controlling the concurrency of background merges.
2023-04-13 20:42:22 -07:00
Aliaksandr Valialkin
e73dd1df2d
lib/{fs,persistentqueue}: use filepath.Join() instead of concatenating path parts with /
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
2023-04-13 20:14:07 -07:00
Aliaksandr Valialkin
e95e401e4d
app/vmbackupmanager: sync with enterprise-single-node branch after 41a54c775891c87e3d5ed59ff0769c869dd2fe71 2023-04-13 19:38:28 -07:00
Zakhar Bessarab
217eea6e15
lib/backup/actions: store metadata(creation and completion time) in backup files (#4117)
This makes it easier to understand exact point in time which is included in this backup.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-13 19:20:34 -07:00
Haleygo
7ee32ed06a
fix sort pendingDateMetricsIDs (#4102) 2023-04-10 10:16:36 -07:00
Dmytro Kozlov
8ec5e7f53a
app/vmctl: add multiple filters defined in --vm-native-filter-match flag to discovered metric names (#4063)
* app/vmctl: add multiple filters defined in `--vm-native-filter-match` flag to discovered metric names

* app/vmctl: fix comments

* app/vmctl: move function buildMatchWithFilter to the correct place

* app/vmctl: update CHANGELOG.md

* app/vmctl: fix CI, remove error wrapping

* app/vmctl: fix CI, simplify `Set()`
2023-04-06 15:11:53 -07:00
Aliaksandr Valialkin
bf545fcc14
lib/encoding: fix test after 4725549cb2 2023-04-05 21:38:48 -07:00
Aliaksandr Valialkin
52734c71fc
lib/storage: use shorter code after 03bde173b7 2023-04-02 21:35:34 -07:00
faceair
03bde173b7
lib/storage: fix reuse pendingMetricRow (#4049) 2023-04-02 21:28:43 -07:00
faceair
a4b4bda166
lib/storage: remove unused code (#4050) 2023-04-02 21:23:24 -07:00
Aliaksandr Valialkin
02ceebccc0
lib/promscrape: do not re-use previously loaded scrape targets on failed attempt to load updated scrape targets at file_sd_configs
The logic employed for re-using the previously loaded scrape target was broken initially.
The commit cc0427897c tried to fix it, but the new logic
became too complex and fragile. So it is better to just remove this logic,
since the targets from temporarily broken file should be eventually loaded on next
attempts every -promscrape.fileSDCheckInterval

This also allows removing fragile hacks around __vm_filepath label.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3989
2023-04-02 21:11:12 -07:00
Dmytro Kozlov
6f0512a81c
lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read (#4027)
* lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read

* lib/promscrape: clarified comment

* lib/promscrape: made better approach to handle a problem with growing []*ScrapeWork on each error when loading config

* lib/promscrape: added CHANGELOG.md

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-04-02 21:11:10 -07:00
Roman Khavronenko
5f95f9d453
lib/storage: check for free disk space before opening tables (#4035)
* lib/storage: check for free disk space before opening tables

We check for free disk space before call to `openTable`,
so `Storage` can be set to ReadOnly before mergeWorkers start.

Before the change, there was a chance that merges will start
even if Storage has to start in ReadOnly mode because of
`-storage.minFreeDiskSpaceBytes` limit.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4023
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/storage: chore

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update lib/storage/storage.go

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-31 23:50:56 -07:00
Aliaksandr Valialkin
29f376e916
lib/fs: follow-up for ec45f1bc5f
Properly close response body before checking for the response code.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4034
2023-03-31 22:54:33 -07:00
Aliaksandr Valialkin
dad13c0a91
lib/streamaggr: follow-up for ff72ca14b9
- Make sure that the last successfully loaded config is used on hot-reload failure
- Properly cleanup resources occupied by already initialized aggregators
  when the current aggregator fails to be initialized
- Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config
  This should simplify monitoring and debugging failed reloads
- Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa
  could be in use at realoadSaConfig()
- Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability
  on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push().
- Remove fine-grained aggregator reload - reload all the aggregators on config change instead.
  This simplifies the code a bit. The fine-grained aggregator reload may be returned back
  if there will be demand from real users for it.
- Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag
- Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639
2023-03-31 22:54:10 -07:00
Zakhar Bessarab
46c8be4f98
lib/fs: verify response code when reading configuration over HTTP (#4036)
Verifying status code helps to avoid misleading errors caused by attempt to parse unsuccessful response.

Related issue: #4034

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-31 22:33:53 -07:00
Alexander Marshalov
8c14d17694
added hot reload support for stream aggregation configs (#3969) (#3970)
added hot reload support for stream aggregation configs (#3969)

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-03-31 22:31:38 -07:00
Aliaksandr Valialkin
85ca077a88
lib/flagutil: ArrayString: support commas inside quoted strings and inside [], {} and () braces
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3915
2023-03-28 21:25:07 -07:00
Aliaksandr Valialkin
fd7efad69f
lib/persistentqueue: typo fix after aea6df8197 2023-03-27 20:05:51 -07:00
Aliaksandr Valialkin
6f5bbf096a
app/vmagent/remotewrite: cosmetic updates after f3a51e8b1d
- Compare directory names instead of paths to directory when determining which persistent queues must be deleted
  This is less error-prone solution, since paths to the same directory can differ, which could lead
  to accidental directory removal for the existing -remoteWrite.url

- Log the `removed %d dangling queues` message when at least a single queue has been removed

- Consistently use filepath.Join() for creating paths to persistent queues.
  This is needed for Windows support (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70 )

- Clarify the description of the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
2023-03-27 18:38:53 -07:00
Zakhar Bessarab
6ed6eb0c4c
app/vmagent: add -remoteWrite.removeDanglingQueues flag (#4017)
* app/vmagent: add `-remoteWrite.removeDanglingQueues` flag which allows to automatically remove dangling persistent queue contents

Related issue: #4014

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmagent: address review feedback

- remove persistent queues files by default
- rename `remoteWrite.removeDanglingQueues` to `remoteWrite.keepDanglingQueues`
- update docs to reflect changed behaviour

Related issue: #4014

* Apply suggestions from code review

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 18:38:51 -07:00