Commit graph

83 commits

Author SHA1 Message Date
Andrii Chubatiuk
5a05788719
app/vlinsert: support floats for elasticseach timestamps (#8472)
### Describe Your Changes

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8470

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 67f8fa66ed)
2025-03-10 13:54:23 +01:00
Andrii Chubatiuk
c72d5690cc
lib/protoparser/opentelemetry: properly marshal nested attributes into JSON
Previously, opentelemetry attribute parsed added extra field names according to 
golang JSON parser spec for structs:

```
struct AnyValue{
 StringValue string
}
```
 Was serialized into:
```
{"StringValue": "some-string"}
```
 While opentelemetry-collector serializes it as
```
"some-string"
```

 This commit changes this behaviour it makes parses compatible with opentelemetry-collector format. See test cases for examples.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8384
2025-03-05 18:38:25 +01:00
Andrii Chubatiuk
6d62620c9d
vlinsert: accept ES ping requests to endpoint without trailing slash (#8354)
### Describe Your Changes

related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8353

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 7a1c84b6ec)
2025-02-25 19:13:30 +01:00
Zhu Jiekun
ea30d1d014
app/vlinsert: properly ingest journald logs with single-character name entity
This commit changes journald ingestion validation regex:
from `^[A-Z_][A-Z0-9_]+` to  `^[A-Z_][A-Z0-9_]*`.

 It's needed to properly support entities with single-character
names.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8314
2025-02-17 16:08:49 +01:00
Aliaksandr Valialkin
66b5c619fa
app/vlinsert: add a link to the pull request at systemd repository, which enables compression support
This should simplify maintenance of this code in the future.
While at it, clarify the change at the docs/VictoriaLogs/CHANGELOG.md.

This is a follow-up commit for 3c9f9f49b0.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8264
Updates https://github.com/systemd/systemd/pull/34822
2025-02-12 22:10:05 +01:00
Andrii Chubatiuk
e9ca6eaaf0
app/vlinsert: add OpenTelemetry ingested logs trace_id and span_id
This commit parses additional optional fields from OpenTelemetry logs protocol.

Related issue:

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8255
(cherry picked from commit 3a27073634)
2025-02-12 12:47:42 +01:00
Andrii Chubatiuk
6254810e12
app/vlinsert: add journald content negotiation, which enables compression on a client
Adding Accept-Encoding response header to support content negotiation,
which was introduced in [this
PR](https://github.com/systemd/systemd/pull/34822) and enables
compression on journald

(cherry picked from commit 3c9f9f49b0)
2025-02-12 12:47:42 +01:00
Aliaksandr Valialkin
237a3d60b0
app/vlinsert: continue parsing JSON lines in the input stream after parse errors
Previosly the parsing of the input stream was stopped after the first parse error.
This isn't what most users expect when ingesting JSON lines in a stream where some JSON lines may be invalid.

(cherry picked from commit ebac07bcf6)
2025-02-10 21:32:31 +04:00
Aliaksandr Valialkin
7690873787
app/vlinsert: accept timestamps with microsecond and nanosecond precision at _time field 2025-02-09 22:50:59 +01:00
Aliaksandr Valialkin
bacf58de76
lib/logstorage: remove unnecesary abstraction - RowsFormatter
It is better to use the AppendFieldsToJSON function directly
instead of hiding it under RowsFormatter abstraction.

(cherry picked from commit 95f182053b)
2025-01-29 13:29:23 +01:00
Aliaksandr Valialkin
40646a125e
lib/logstorage: ignore logs with too long field names during data ingestion
Previously too long field names were silently truncated. This is not what most users expect.
It is better ignoring the whole log entry in this case and logging it with the WARNING message,
so human operator could notice and fix the ingestion of incorrect logs ASAP.

The commit also adds and updates the following entries to VictoriaLogs faq:

- https://docs.victoriametrics.com/victorialogs/faq/#how-many-fields-a-single-log-entry-may-contain
- https://docs.victoriametrics.com/victorialogs/faq/#what-is-the-maximum-supported-field-name-length
- https://docs.victoriametrics.com/victorialogs/faq/#what-length-a-log-record-is-expected-to-have

These entries are referred at `-insert.maxLineSizeBytes` and `-insert.maxFieldsPerLine` command-line descriptions
and at the WARNING messages, which are emitted when log entries are ignored because of some of these limits
are exceeded.

(cherry picked from commit 3c036e0d31)
2025-01-29 13:29:22 +01:00
Aliaksandr Valialkin
2e61cff968
app/vlinsert/insertutils: avoid excess copying of lines at LineReader.buf
1. Do not copy every line from LineReader.buf to LineReader.Line - just refer the line at LineReader.buf.
2. Do not copy the next found line to the beginning of LineReader.buf - just track the next line start index with LineReader.bufOffset.

This reduces memory copying when many lines are read into LineReader.buf by a single read() syscall.
2025-01-13 07:23:18 +01:00
Andrii Chubatiuk
4b59f5e351
datadog-serverless: fixed metrics and logs ingestion from Datadog serverless extensions for AWS and GCP (#7769)
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7761

### Describe Your Changes

- datadog /api/v2/logs api supports message field in json format, which
is not documented and is used by serverless extension. This PR allows
message field to be both string and object type. Also added support of
not documented timestamp field
- added `-datadog.streamFields` and `-datadog.ignoreFields` flags to
configure default stream fields for datadog logs, where there's no
alternative option to pass extra headers and query args
- added ingest `max` and `min` values of data, which are ingested using
`datadogsketches` API, which is also actively used by serverless
extensions
- use default `.` separator instead of `_` for sketches metric names
until metrics are not sanitized
2024-12-23 19:45:04 +01:00
Andrii Chubatiuk
772c46fb05
app/vlinsert: loki healthcheck endpoint (#7864)
### Describe Your Changes

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7824

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-12-18 22:41:08 +01:00
Aliaksandr Valialkin
bd37715c7b
app/vlinsert: use default set of log stream fields for Loki and OpenTelemetry protocols if _stream_fields query arg is empty
Loki protocol supports a list of log stream labels - see https://grafana.com/docs/loki/latest/get-started/labels/

OpenTelemetry protocol also supports a list of log stream labels, which are named resource attributes there.
See https://opentelemetry.io/docs/concepts/resources/#semantic-attributes-with-sdk-provided-default-value

Simplify logs' ingestion into VictoriaLogs for these protocols by allowing the data ingestion without
the need to specify _stream_fields query arg or VL-Stream-Fields HTTP header. In this case the upstream log stream fields
are used during data ingestion. The set of log stream fields can be overriden via _stream_fields query arg
and via VL-Stream-Fields HTTP header if needed.

Thanks to @AndrewChubatiuk for the initial idea and implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7554
2024-12-05 15:16:53 +01:00
Aliaksandr Valialkin
74a314ef77
lib/logstorage: ignore logs with too many fields instead of trying to store them
The storage isn't designed to work efficiently with logs containing too many log fields.
It is better to emit a warning to the user and ignore such logs instead of trying to store them.
This will allow fixing the issue by the user ASAP, and won't lead to excess resource usage
at VictoriaLogs side, such as RAM, CPU, disk IO and disk space.

While at it, ignore too long logs with the size exceeding the maximum block size during data ingestion.
This should prevent from possible issues when dealing with such long logs if they were stored in the storage.
Emit a warning in this case, so the user could identify and fix the issue ASAP.

This is a follow-up for 22e6385f56

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7568
2024-12-05 15:16:53 +01:00
Aliaksandr Valialkin
ec9e8b3800
app/vlinsert: properly skip too long lines at Elasticsearch bulk import protocol
Previously too long line in Elasticsearch bulk import protocol resulted in clsoing
the client stream and ignoring the rest of log messages in the stream.

Now only the too long message is ignored properly, while the rest of log messages
are read successfully.

This is a follow-up for 61e7c77ce25967269192ed2e201f67d8c48b972e
2024-12-05 15:16:53 +01:00
Aliaksandr Valialkin
955b83bb9e
app/vlinsert: track vl_rows_ingested_total metric in a single place
Previously vl_rows_ingested_total metric was tracked individually per each supported data ingestion protocols.
It is better from maintainability PoV tracking this metric consistently in a single place - at logMessageProcessor.AddRow() function
in the same way as vl_bytes_ingested_total metric is tracked.

This is a follow-up for 50bfa689c9
2024-12-05 15:16:52 +01:00
Aliaksandr Valialkin
f162586915
app/vlinsert: continue parsing lines after too long lines in JSON line stream and Elasticsearch bulk import stream
Previously all the lines after the too long line in the stream were ignored. This wasn't expected by most users.
2024-12-05 15:16:52 +01:00
Aliaksandr Valialkin
b2555491bb
app/vlinsert: expose vl_bytes_ingested_total metric
This metric tracks an approximate amounts of bytes processed when parsing the ingested logs.
The metric is exposed individually per every supported data ingestion protocol. The protocol name
is exposed via "type" label in order to be consistent with vl_rows_ingested_total metric.

Thanks to @tenmozes for the initial idea and implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7682

While at it, remove the unneeded "format" label from vl_rows_ingested_total metric.
The "type" label must be enough for encoding the data ingestion format.
2024-11-30 17:26:17 +01:00
Aliaksandr Valialkin
8a86575128
app/vlinsert/loki: show the original request body on parse errors
This should simplify debugging.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7490
2024-11-14 17:21:27 +01:00
Aliaksandr Valialkin
ab2ce3728b
app/vlinsert/syslog: allow changing the default set of log fields to use as stream fields during syslog data ingestion
Thanks to @AndrewChubatiuk for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7488
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7480

See https://docs.victoriametrics.com/victorialogs/data-ingestion/syslog/#stream-fields
2024-11-14 17:21:27 +01:00
Aliaksandr Valialkin
bab5348b8b
app/vlinsert/syslog: add an ability to drop and add fields during data ingestion via Syslog protocol
See https://docs.victoriametrics.com/victorialogs/data-ingestion/syslog/#dropping-fields
and https://docs.victoriametrics.com/victorialogs/data-ingestion/syslog/#adding-extra-fields
2024-11-14 17:21:27 +01:00
Aliaksandr Valialkin
1c7d8adc65
app/vlinsert/loki: follow-up for 3aeb1b96a2
- Disallow more than 3 items in Loki line entry, since it must contain two mandatory entries: timestamp and message,
  plus one optional entry - structured metadata. See https://grafana.com/docs/loki/latest/reference/loki-http-api/#ingest-logs

- Update references to structured metadata docs in Loki, in order to simplify further maintenance of the code

- Move the change from bugfix to feature at docs/VictoriaLogs/CHANGELOG.md, since VictoriaLogs never supported
  structured metadata over JSON Loki protocol. The support for structured metadata in protobuf Loki protocol
  has been added in ac06569c49 , which has been included in v0.28.0-victorialogs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7431
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7432

(cherry picked from commit 3d75c39ff4)
2024-11-07 13:00:20 +01:00
Zhu Jiekun
e113a40142
app/vlinisert/loki: properly parse json logs with structured metadata
Loki protocol supports optional `metadata` object for each ingested line. It's added as 3rd field at the (ts,msg,metadata) tuple. Previously,  loki request json parsers rejected log line if tuple size != 2.

This commit allows optional tuple field. It parses it as json object and adds it as log metadata fields to the  log message stream.

related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7431

---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>

(cherry picked from commit 3aeb1b96a2)
2024-11-07 13:00:18 +01:00
Andrii Chubatiuk
23dcec3911
vlinsert: support datadog logs
This commit adds the following changes:

- Added support to push datadog logs with examples of how to ingest data
using Vector and Fluentbit
- Updated VictoriaLogs examples directory structure to have single
container image for victorialogs, agent (fluentbit, vector, etc) but
multiple configurations for different protocols

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6632

(cherry picked from commit e0930687f1)
2024-11-06 13:58:16 +01:00
Aliaksandr Valialkin
fced48d540
app/vlinsert: implement the ability to add extra fields to the ingested logs
This can be done via extra_fields query arg or via VL-Extra-Fields HTTP header.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7354#issuecomment-2448671445

(cherry picked from commit 4478e48eb6)
2024-11-04 10:23:16 -03:00
Aliaksandr Valialkin
0eaec9f376
app/vlinsert: typo fix after 16ee470da6
(cherry picked from commit d2dce13df6)
2024-10-31 14:11:07 +01:00
Aliaksandr Valialkin
b4f5e5c1b8
app/vlinsert: accept logs with empty _msg field
In this case the _msg field is set to the value specified in the -defaultMsgValue command-line flag.

This should simplify first-time migration to VictoriaLogs from other systems.

(cherry picked from commit 16ee470da6)
2024-10-30 15:19:52 +01:00
Aliaksandr Valialkin
8baa5177aa
app/vlinsert: allow specifying comma-separated list of fields containing log message via _msg_field query arg and VL-Msg-Field HTTP request header
This msy be useful when ingesting logs from different sources, which store the log message in different fields.
For example, `_msg_field=message,event.data,some_field` will get log message from the first non-empty field:
`message`, `event.data` and `some_field`.

(cherry picked from commit ed73f8350b)
2024-10-30 15:19:52 +01:00
Andrii Chubatiuk
da14c7b26e
app/vlinsert: adds journald ingestion support
This commit allows to ingest logs with journald format. 

https://www.freedesktop.org/software/systemd/man/latest/systemd-journal-remote.service.html

related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4618
2024-10-27 20:42:41 +01:00
Aliaksandr Valialkin
0881e5fd5c
app/vlselect: do not show empty fields in query results
Empty fields are treated as non-existing fields by VictoriaLogs data model.
So there is no sense in returning empty fields in query results, since they may mislead and confuse users.

(cherry picked from commit bac193e50b)
2024-10-15 11:49:32 +02:00
Aliaksandr Valialkin
0f1b3852dd
app/vlinsert: support unix timestamps in seconds and milliseconds in JSON stream data ingestion API 2024-09-28 21:57:19 +02:00
Aliaksandr Valialkin
b8fa213310
app/vlinsert: accept unix timestamp in seconds additionally to milliseconds at ElasticSearch bulk API
Timestamps in seconds are sometimes used for data ingestion via ElasticSearch bulk API
2024-09-28 21:21:19 +02:00
Aliaksandr Valialkin
4d27933041
app/vlinsert: support _time field without timezone information during data ingestion
Use local timezone of the host server in this case. The timezone can be overridden
with TZ environment variable if needed.

While at it, allow using whitespace instead of T as a delimiter between data and time
in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted
during data ingestion. This is valid ISO8601 format, which is used by some log shippers,
so it should be supported. This format is also known as SQL datetime format.

Also assume local time zone when time without timezone information is passed to querying APIs.
Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string
if the old behaviour is preferred.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721
2024-09-26 12:50:14 +02:00
Aliaksandr Valialkin
7309058282
app/vlinsert/insertutils: add a link to docs why _msg field must be non-empty 2024-09-26 09:53:24 +02:00
Zhu Jiekun
3fa72b2c1b
feature: [victorialogs] drop logs without non-empty _msg field (#7056)
### Describe Your Changes

VictoriaLogs allows logs without `_msg` field or `_msg` field is empty.
This lead to incorrect search result. See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785

This pull request search for non-empty `_msg` field before log entry is
added to `LogRows`.

New counter `vl_rows_dropped_total{reason="msg_not_exist"}` is
introduced.

Example log output:
```
2024-09-23T02:33:19.719Z        warn    app/vlinsert/insertutils/common_params.go:189   dropping log line without _msg field; [{@timestamp 2024-09-18T13:42:16.600000000Z} {Attributes.array.attribute ["many","values"]} {Attributes.boolean.attribute true} {Attributes.double.attribute 637.704} {Attributes.int.attribute 10} {Attributes.map.attribute.some.map.key some value} {Attributes.string.attribute some string} {Body Example ddddddddddlog record} {Resource.service.name my.service} {Scope.my.scope.attribute some scope attribute} {Scope.name my.library} {Scope.version 1.0.0} {SeverityNumber 10} {SeverityText Information} {SpanId eee19b7ec3c1b174} {TraceFlags 0} {TraceId 5b8efff798038103d269b633813fc60c}]
```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
- [ ] Benchmark for potential performance loss.

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-09-26 09:35:58 +02:00
f41gh7
64361c2d7a
follow-up after 01430a155c
* properly check SeverityNumber at FormatSeverity function
 it could be negative, which could cause panic for victorialogs
2024-09-04 15:39:55 +02:00
Andrii Chubatiuk
711f2cc4f2
vlinsert: added opentelemetry logs support
Commit adds the following changes:

* Adds support of OpenTelemetry logs for Victoria Logs with protobuf encoded messages

*  json encoding is not supported for the following reasons:
   - It brings a lot of fragile code, which works inefficiently.
   - json encoding is impossible to use with language SDK.

* splits metrics and logs structures at lib/protoparser/opentelemetry/pb package.

* adds docs with examples for opentelemetry logs.

---
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4839

Co-authored-by: AndrewChubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-09-03 20:24:01 +02:00
f41gh7
dcc525b388
follow-up after 1731c0eabf
* updates change log
* adds VL-Debug http header
* updates doc
* extracts only the first value of http headers for VL-Stream-Fields and VL-Ignore-Fields.
  It makes behaviour the same as Query string args. And allows to easily configure client applications.
  Since most of the client collectors don't support multi value headers.

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2024-09-03 20:24:01 +02:00
Andrii Chubatiuk
d5fe4566e5
app/vlinsert: support getting _msg_field, _time_field, _stream_fields and _ignore_fields from headers
*  Many collectors don't support forwarding url query params to the remote system. It makes impossible to define stream fields for it. Workaround with proxy between VictoriaLogs and log shipper is too complicated solution.

* This commit adds the following changes:
 * Adds fallback to to headers params, if query param is empty for:
     _msg_field -> VL-Msg-Field
    _stream_fields -> VL-Stream-Fields
    _ignore_fields -> VL-Ignore-Fields
    _time_field -> VL-Time-Field
 * removes deprecations from victorialogs compose files, added more
output format examples for logstash, telegraf, fluent-bit

 related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5310
2024-09-03 20:24:00 +02:00
Zakhar Bessarab
a3a0bafe76
app/vlinsert/elasticsearch: add fake response for logstash requests (#6742)
### Describe Your Changes

This is needed in order to support standard Elasticsearch output in
Logstash pipelines.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6660

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit 58b6c54da2)
2024-08-06 16:30:11 +02:00
Aliaksandr Valialkin
a1decb5ca1
app/vlinsert/loki: use easyproto instead for parsing Loki protobuf messages 2024-07-10 03:05:55 +02:00
Aliaksandr Valialkin
73ca22bb7d
app/vlinsert/loki: remove unused functions from the generated protobuf code 2024-07-10 00:22:10 +02:00
Aliaksandr Valialkin
0912a652d5
app/vlinsert/insertutils: flush the ingested logs from in-memory buffer to storage every second
Previously the in-memory buffer could remain unflushed for long periods of time under low ingestion rate.
The ingested logs weren't visible for search during this time.
2024-07-02 01:39:45 +02:00
Aliaksandr Valialkin
ab28a1f93e
app/vlinsert/syslog: add an ability to use log ingestion time as the _time field 2024-07-02 01:39:45 +02:00
Aliaksandr Valialkin
c9fc8079c4
app/vlinsert/syslog: properly skip empty lines in Syslog protocol
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6548
2024-06-28 14:09:45 +02:00
Aliaksandr Valialkin
f24123a776
lib/logstorage: parse syslog structured data into separate fields in order to simplify further querying of this data 2024-06-25 14:54:25 +02:00
Aliaksandr Valialkin
1716c4e609
lib/logstorage: properly parse timezone offset at TryParseTimestampRFC3339Nano()
The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets.

While at it, make tryParseTimestampISO8601 function private in order to prevent
from improper usage of this function from outside the lib/logstorage package.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508
2024-06-25 14:54:24 +02:00
Aliaksandr Valialkin
ee114ca59c
app/vlinsert: properly parse timestamps with nanosecond precision at /insert/jsonline HTTP endpoint
This has been broken in 2b6a634ec0
2024-06-18 00:24:11 +02:00