Commit graph

273 commits

Author SHA1 Message Date
Nikolay
7c19d01e9a
app/vminsert: properly close vmstorage connection (#4935)
* app/vminsert: properly close vmstorage connection
previously vmstorage may stuck in broken state until vminsert restarts
since vmstorage was marked as read-only and connection was broken to it.
checkReadonly function never marked connection as broken
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4870

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-01 17:56:41 +02:00
Nikolay
fbe2795670
app/vminsert: fixes readonly check (#4892)
* app/vminsert: fixes readonly check
previously vminsert doesn't check readOnly state for vmstorage, since check was never performed for nil buffer
In this case every 30 second storage node loss readonly state and received some data.
It caused re-routing and possible slow down for ingestion
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4870

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-08-30 16:24:24 +02:00
Aliaksandr Valialkin
3a2d035283
lib/auth: add NewTokenPossibleMultitenant() for parsing auth token, which can be multitenant
Disallow parsing multitenant token at auth.NewToken().

Use auth.NewTokenPossibleMultitenant() at vminsert only. All the other callers should call auth.NewToken(),
since they do not support multitenant token.

This is a follow-up for f0c06b428e

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4910
2023-08-30 14:13:51 +02:00
Aliaksandr Valialkin
19d61737c1
app/{vminsert,vmselect}: follow-up after 2b7b3293c1
- Document the change at docs/CHANGELOG.md
- Set the default value for -vmstorageUserTimeout to 3 seconds. This is much better
  than the 0 value, which means that TCP connection to unreachable vmstorage could block
  for up to 16 minutes.
- Document -vmstorageUserTimeout at docs/Cluster-VictoriaMetrics.md
2023-08-29 12:17:39 +02:00
Will Jordan
2b7b3293c1
Add vmstorageUserTimeout flags to configure TCP user timeout (Linux) (#4423)
`TCP_USER_TIMEOUT` (since Linux 2.6.37) specifies the maximum amount of
time that transmitted data may remain unacknowledged before TCP will
forcibly close the connection and return `ETIMEDOUT` to the application.

Setting a low TCP user timeout allows RPC connections quickly reroute
around unavailable storage nodes during network interruptions.
2023-08-29 11:46:39 +02:00
Aliaksandr Valialkin
4b1f01e45d
lib/promrelabel: properly replace : char with _ in metric names when -usePromCompatibleNaming command-line flag is set
This addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3113#issuecomment-1275077071 comment from @johnseekins
2023-08-14 16:18:17 +02:00
Nikolay
476286385f
opentelemetry: return human readable error for json encoding. (#4822)
Opentelemetry parser supports only protobuf atm.

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-08-12 05:06:19 -07:00
Nikolay
b0977b07fb
app/vminsert: adds note for dropSamplesOnOverload flag (#4797)
Adds note for dropSamplesOnOverload flag that are samples dropped before replication
2023-08-10 12:18:29 +02:00
Nikolay
85de94e85c
lib/protoparser: adds opentelemetry parser (#2570)
* lib/protoparser: adds opentelemetry parser
app/{vmagent,vminsert}: adds opentelemetry ingestion path

Adds ability to ingest data with opentelemetry protocol
protobuf and json encoding is supported
data converted into prometheus protobuf timeseries
each data type has own converter and it may produce multiple timeseries
from single datapoint (for summary and histogram).
only cumulative aggregationFamily is supported for sum(prometheus
counter) and histogram.

Apply suggestions from code review

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

updates deps

fixes tests

wip

wip

wip

wip

lib/protoparser/opentelemetry: moves to vtprotobuf generator

go mod vendor

lib/protoparse/opentelemetry: reduce memory allocations

* wip

- Remove support for JSON parsing, since it is too fragile and is rarely used in practice.
  The most clients send OpenTelemetry metrics in protobuf.
  The JSON parser can be added in the future if needed.
- Remove unused code from lib/protoparser/opentelemetry/pb and lib/protoparser/opentelemetry/proto
- Do not re-use protobuf message between ParseStream() calls, since there is high chance
  of high fragmentation of the re-used message because of too complex nested structure of the message.

* wip

* wip

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-07-27 13:37:15 -07:00
Aliaksandr Valialkin
992c300ce9
all: replace atomic.Value with atomic.Pointer[T]
This eliminates the need in .(*T) casting for results obtained from Load()

Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map,
because map is already a pointer type.
2023-07-19 17:48:26 -07:00
Roman Khavronenko
576e59d82c
cluster: standardize default HTTP responses (#4368)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-01 10:26:52 +02:00
Aliaksandr Valialkin
0397b3f0f7
lib/handshake: do not pollute logs with cannot read hello messages on TCP health checks
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1762
2023-05-18 10:37:59 -07:00
Nikolay
5514b5d552
app/vminsert: allow parsing tenant id from (#4144)
VictoriaMetrics_ProjectID and VictoriaMetrics_AccountID labels.
It should help to migrate for new labels vm_account_id vm_project_id without service downtime
2023-05-16 08:16:37 -07:00
Alexander Marshalov
d321ea91f2
fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 02:22:06 -07:00
Nikolay
82e2f19bc2
app/vminsert: correctly allocate buffer for storagenodes (#554)
in case of dynamic discovery number of nodes may change and we have to allocate new buffer for this case
otherwise vminsert may panic
2023-05-08 08:57:15 -07:00
Aliaksandr Valialkin
dad13c0a91
lib/streamaggr: follow-up for ff72ca14b9
- Make sure that the last successfully loaded config is used on hot-reload failure
- Properly cleanup resources occupied by already initialized aggregators
  when the current aggregator fails to be initialized
- Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config
  This should simplify monitoring and debugging failed reloads
- Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa
  could be in use at realoadSaConfig()
- Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability
  on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push().
- Remove fine-grained aggregator reload - reload all the aggregators on config change instead.
  This simplifies the code a bit. The fine-grained aggregator reload may be returned back
  if there will be demand from real users for it.
- Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag
- Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639
2023-03-31 22:54:10 -07:00
Aliaksandr Valialkin
fc3d826d7f
all: add Windows build for VictoriaMetrics
This commit changes background merge algorithm, so it becomes compatible with Windows file semantics.

The previous algorithm for background merge:

1. Merge source parts into a destination part inside tmp directory.
2. Create a file in txn directory with instructions on how to atomically
   swap source parts with the destination part.
3. Perform instructions from the file.
4. Delete the file with instructions.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since the remaining files with instructions is replayed on the next restart,
after that the remaining contents of the tmp directory is deleted.

Unfortunately this algorithm doesn't work under Windows because
it disallows removing and moving files, which are in use.

So the new algorithm for background merge has been implemented:

1. Merge source parts into a destination part inside the partition directory itself.
   E.g. now the partition directory may contain both complete and incomplete parts.
2. Atomically update the parts.json file with the new list of parts after the merge,
   e.g. remove the source parts from the list and add the destination part to the list
   before storing it to parts.json file.
3. Remove the source parts from disk when they are no longer used.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since incomplete partitions from step 1 or old source parts from step 3 are removed
on the next startup by inspecting parts.json file.

This algorithm should work under Windows, since it doesn't remove or move files in use.
This algorithm has also the following benefits:

- It should work better for NFS.
- It fits object storage semantics.

The new algorithm changes data storage format, so it is impossible to downgrade
to the previous versions of VictoriaMetrics after upgrading to this algorithm.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-19 23:28:26 -07:00
Aliaksandr Valialkin
54fe207cc0
all: follow-up for 7a3e16e774
- Sync the description for -httpListenAddr.useProxyProtocol command-line flag at vmagent and vmauth,
  so it is consistent with the description at vmauth and victoria-metrics
- Add a sample of panic text to docs/CHANGELOG.md, so it could be googled
- Mention the -httpListenAddr.useProxyProtocol command-line flag in the description for the bugfix

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-08 01:42:58 -08:00
Aliaksandr Valialkin
bbd5914eb1
all: add makefile rules for GOARCH=s390x for all the VictoriaMetrics components
This is a follow-up for 007530f882
2023-02-26 12:38:48 -08:00
Aliaksandr Valialkin
f579cac297
app/vmagent: automatically detect whether the remote storage supports VictoriaMetrics remote write protocol
Substitute -remoteWrite.useVMProto with -remoteWrite.forcePromProto command-line flag,
which can be used for forcing Prometheus remote write protocol in cases when the remote storage
supports VictoriaMetrics remote write protocol.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3847
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-23 17:38:47 -08:00
Aliaksandr Valialkin
0c60e4a30a
all: consistently use http.Method{Get,Post,Put} across the codebase
This is a follow-up after 9dec3c8f80
2023-02-22 19:01:09 -08:00
Artem Navoiev
522179fbde
docs: fix typo OpentTSDB -> OpenTSDB (#3854)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-02-22 12:11:48 -08:00
Aliaksandr Valialkin
80c6d1e24c
app/vmagent: add support for VictoriaMetrics remote write protocol, which allows saving up to 10x on network bandwidth costs under high load
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-20 18:40:40 -08:00
Aliaksandr Valialkin
f987fb9c8b
lib/protoparser/promremotewrite: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:48:11 -08:00
Aliaksandr Valialkin
c54d17b006
lib/protoparser/native: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:44:27 -08:00
Aliaksandr Valialkin
086516a02b
lib/protoparser/clusternative: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:38:02 -08:00
Aliaksandr Valialkin
75cf5a8939
lib/protoparser/graphite: extract stream parsing code into a separate stream package 2023-02-13 10:33:24 -08:00
Aliaksandr Valialkin
1801fa6c5c
lib/protoparser/csvimport: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:26:29 -08:00
Aliaksandr Valialkin
41feed813d
lib/protoparser/vmimport: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:22:00 -08:00
Aliaksandr Valialkin
66f0a78810
lib/protoparser/opentsdbhttp: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:15:15 -08:00
Aliaksandr Valialkin
67c0281535
lib/protoparser/opentsdb: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:04:14 -08:00
Aliaksandr Valialkin
1add6c3fa0
lib/protoparser/influx: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 09:59:56 -08:00
Aliaksandr Valialkin
b691d02b92
lib/protoparser/datadog: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 09:53:20 -08:00
Roman Khavronenko
867b7e5688
lib/protoparser/prometheus: move streamparser to subpackage (#3814)
`lib/protoparser/prometheus` is used by various applications,
such as `app/vmalert`. The recent change to the
`lib/protoparser/prometheus` package introduced a new dependency
of `lib/writeconcurrencylimiter` which exposes some metrics.
Because of the dependency, now all applications which have this
dependency also expose these metrics.

Creating a new `lib/protoparser/prometheus/stream` package helps
to remove these metrics from apps which use `lib/protoparser/prometheus`
as dependency.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 09:44:47 -08:00
Aliaksandr Valialkin
34379d4cf1
all: run apk update && apk upgrade in base Alpine Docker image in order to get all the recent security fixes 2023-02-09 14:03:02 -08:00
Nikolay
ebebaecd94
lib/netutil: init implimentation of proxy protocol (#3687)
* lib/netutil: init implimentation of proxy protocol
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-26 23:25:22 -08:00
Aliaksandr Valialkin
4b3a207705
app/{vmagent,vminsert}: follow-up for 1cfa183c2b
- Call httpserver.GetQuotedRemoteAddr() and httpserver.GetRequestURI() only when the error occurs.
  This saves CPU time on fast path when there are no parsing errors.
- Create a helper function - httpserver.LogError() - for logging the error with the request uri and remote addr context.
2023-01-23 22:41:08 -08:00
Artem Navoiev
0ac0cfdc69
add error handler for parsing prometheus text format to vmagent and v… (#3693)
* add error handler for parsing prometheus text format to vmagent and vminsert

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* fix typo

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* typo

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* fix variables naming and error message

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-23 22:36:23 -08:00
Aliaksandr Valialkin
556484a52f
app/vminsert: return 200 OK status code when importing data in pushgateway format
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1415
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3636
2023-01-15 13:26:09 -08:00
Aliaksandr Valialkin
b275983403
lib/writeconcurrencylimiter: improve the logic behind -maxConcurrentInserts limit
Previously the -maxConcurrentInserts was limiting the number of established client connections,
which write data to VictoriaMetrics. Some of these connections could be idle.
Such connections do not consume big amounts of CPU and RAM, so there is a little sense in limiting
the number of such connections. So now the -maxConcurrentInserts command-line option
limits the number of concurrently executed insert requests, not including idle connections.

It is recommended removing -maxConcurrentInserts command-line option, since the default value
for this option should work good for most cases.
2023-01-06 22:07:16 -08:00
Aliaksandr Valialkin
7d996ced51
app/{vmagent,vminsert}/datadog: make the host label optional in DataDog data ingestion protocol
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3432
2022-12-10 23:33:17 -08:00
Aliaksandr Valialkin
97b41e727c
lib/promscrape: implement target-level and metric-level relabel debugging
Target-level debugging is performed by clicking the 'debug' link at the corresponding target
on either http://vmagent:8429/targets page or on http://vmagent:8428/service-discovery page.

Metric-level debugging is perfromed at http://vmagent:8429/metric-relabel-debug page.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3407

See https://docs.victoriametrics.com/vmagent.html#relabel-debug
2022-12-10 02:25:56 -08:00
Pedro Gonçalves
84cfe4fcaf
Datadog - Add device as a tag if it's present as a field in the series object (#3431)
* Datadog - Add device as a tag if it's present as a field in the series object

* address PR comments
2022-12-05 23:10:42 -08:00
Aliaksandr Valialkin
3a25a4b1de
app/{vminsert,vmselect}: speed up TestInitStopNodes() 2022-12-03 23:53:14 -08:00
Aliaksandr Valialkin
c5eebaffd8
app/{vminsert,vmagent}: follow-up after 53a63c6c4c
Extend /api/v1/import/prometheus with the support for Pushgateway way of specifying additional labels.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1415
2022-11-25 16:42:38 -08:00
Aliaksandr Valialkin
afc35485c1
app/vminsert: add missing vm_relabel_config_* metrics after 03d88bc066
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3345
2022-11-22 00:48:13 +02:00
Aliaksandr Valialkin
fe8d40f12c
app/{vminsert,vmselect}: test initialization with different number of storage nodes
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3329
2022-11-09 11:48:39 +02:00
Aliaksandr Valialkin
8540dd669b
app/vminsert/netstorage: move nodesHash from global state to storageNodesBucket
This should prevent from panics when the list of discovered vmstorage nodes changes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3329
2022-11-09 11:45:24 +02:00
Zakhar Bessarab
fd36c9b3b1
docs/{vminsert,vmselect}: clarify flag description for clusternativeListenAddr (#3314)
docs/{vminsert,vmselect}: clarify flag description for `clusternativeListenAddr`
2022-11-04 09:49:53 +01:00
Zakhar Bessarab
7afa67dba0
{app/vmselect,app/vminsert}: ensure -storageNodes flag has no empty values (#3291) 2022-11-01 14:54:55 +02:00