1. Do not copy every line from LineReader.buf to LineReader.Line - just reference the line inside LineReader.buf.
2. Do not copy the next found line to the beginning of LineReader.buf - just track the next line start index with LineReader.bufOffset.
This reduces memory copying when many lines are read into LineReader.buf by a single read() syscall.
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7761
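A minimal sketch of the zero-copy approach (the `buf`, `bufOffset` and `Line` fields come from the description above; the `NextLine` method and its shape are illustrative, not the actual implementation):

```go
import "bytes"

type LineReader struct {
	buf       []byte // data obtained from read() syscalls
	bufOffset int    // start index of the next line inside buf
	Line      []byte // current line; points into buf instead of holding a copy
}

// NextLine advances Line to the next newline-terminated line inside buf
// without copying its bytes.
func (lr *LineReader) NextLine() bool {
	n := bytes.IndexByte(lr.buf[lr.bufOffset:], '\n')
	if n < 0 {
		// No complete line left in buf; the caller must read more data
		// (moving the incomplete tail to the beginning of buf).
		return false
	}
	lr.Line = lr.buf[lr.bufOffset : lr.bufOffset+n]
	lr.bufOffset += n + 1
	return true
}
```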
### Describe Your Changes
- the Datadog /api/v2/logs API supports a `message` field in JSON format, which
is not documented and is used by the serverless extension. This PR allows the
`message` field to be either a string or an object (see the sketch after this list).
Also added support for the undocumented `timestamp` field
- added `-datadog.streamFields` and `-datadog.ignoreFields` flags to
configure default stream fields for Datadog logs, since there is no
alternative way to pass extra headers and query args
- added ingestion of `max` and `min` values of data ingested via the
`datadogsketches` API, which is also actively used by serverless
extensions
- use the default `.` separator instead of `_` for sketches metric names,
as long as metric names are not sanitized
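A hedged sketch of accepting a JSON field that is either a string or an object; `parseMessage` is a hypothetical helper, the real parser differs:

```go
import (
	"encoding/json"
	"fmt"
)

func parseMessage(raw json.RawMessage) (string, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s, nil // documented case: plain string message
	}
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		return "", fmt.Errorf("message must be a string or an object")
	}
	// Undocumented case used by the serverless extension: keep the object
	// as its JSON representation.
	b, err := json.Marshal(obj)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```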
The storage isn't designed to work efficiently with logs containing too many log fields.
It is better to emit a warning to the user and ignore such logs instead of trying to store them.
This allows the user to fix the issue ASAP and avoids excess resource usage
(RAM, CPU, disk IO and disk space) on the VictoriaLogs side.
While at it, ignore too long logs whose size exceeds the maximum block size during data ingestion.
This should prevent possible issues when dealing with such long logs if they were stored in the storage.
Emit a warning in this case, so the user can identify and fix the issue ASAP.
This is a follow-up for 22e6385f56
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7568
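A rough sketch of the described checks; the `Field` type, the limit parameters and the `logger` calls are assumptions, not the actual VictoriaLogs code:

```go
func mustAcceptRow(fields []Field, rowSize, maxFieldsPerRow, maxBlockSize int) bool {
	if len(fields) > maxFieldsPerRow {
		logger.Warnf("ignoring log entry with %d fields; the limit is %d fields", len(fields), maxFieldsPerRow)
		return false
	}
	if rowSize > maxBlockSize {
		logger.Warnf("ignoring too long log entry of %d bytes; the limit is %d bytes", rowSize, maxBlockSize)
		return false
	}
	return true
}
```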
Previously a too long line in the Elasticsearch bulk import protocol resulted in closing
the client stream and ignoring the rest of the log messages in the stream.
Now only the too long message is ignored, while the rest of the log messages
are read successfully.
This is a follow-up for 61e7c77ce25967269192ed2e201f67d8c48b972e
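A sketch of how the remainder of an over-long line can be drained so the stream stays readable; `bufio.Reader.ReadLine` reports partial reads via `isPrefix`:

```go
import "bufio"

// skipRestOfLine consumes the remainder of a too-long line so the caller
// can continue reading the next log messages from the same stream.
func skipRestOfLine(br *bufio.Reader) error {
	for {
		_, isPrefix, err := br.ReadLine()
		if err != nil {
			return err
		}
		if !isPrefix {
			return nil // the too-long line is fully consumed
		}
	}
}
```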
Previously the vl_rows_ingested_total metric was tracked individually for each supported data ingestion protocol.
It is better from a maintainability PoV to track this metric consistently in a single place - the logMessageProcessor.AddRow() function -
in the same way as the vl_bytes_ingested_total metric is tracked.
This is a follow-up for 50bfa689c9
This metric tracks the approximate number of bytes processed when parsing the ingested logs.
The metric is exposed individually for every supported data ingestion protocol. The protocol name
is exposed via the "type" label in order to be consistent with the vl_rows_ingested_total metric.
Thanks to @tenmozes for the initial idea and implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7682
While at it, remove the unneeded "format" label from the vl_rows_ingested_total metric -
the "type" label is enough for encoding the data ingestion format.
The Loki protocol supports an optional `metadata` object for each ingested line, added as the 3rd field in the (ts, msg, metadata) tuple. Previously, the Loki request JSON parser rejected a log line if the tuple size != 2.
This commit allows the optional tuple field. It is parsed as a JSON object and its entries are added as metadata fields to the log message stream.
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7431
---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
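A hedged sketch of the (ts, msg, metadata) tuple handling with encoding/json; the actual parser uses a different JSON library and `parseLokiLine` is illustrative:

```go
import (
	"encoding/json"
	"fmt"
)

func parseLokiLine(data []byte) (ts, msg string, metadata map[string]string, err error) {
	var tuple []json.RawMessage
	if err = json.Unmarshal(data, &tuple); err != nil {
		return
	}
	if len(tuple) != 2 && len(tuple) != 3 {
		err = fmt.Errorf("unexpected tuple size %d; want 2 or 3", len(tuple))
		return
	}
	if err = json.Unmarshal(tuple[0], &ts); err != nil {
		return
	}
	if err = json.Unmarshal(tuple[1], &msg); err != nil {
		return
	}
	if len(tuple) == 3 {
		// Optional 3rd field: a JSON object whose entries become
		// metadata fields of the log message.
		err = json.Unmarshal(tuple[2], &metadata)
	}
	return
}
```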
This commit adds the following changes:
- Added support for pushing Datadog logs, with examples of how to ingest data
using Vector and Fluentbit
- Updated the VictoriaLogs examples directory structure to have a single
container image for victorialogs and the agent (fluentbit, vector, etc.), but
multiple configurations for different protocols
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6632
In this case the _msg field is set to the value specified in the -defaultMsgValue command-line flag.
This should simplify first-time migration to VictoriaLogs from other systems.
This may be useful when ingesting logs from different sources, which store the log message in different fields.
For example, `_msg_field=message,event.data,some_field` will get the log message from the first
non-empty field among `message`, `event.data` and `some_field` (see the sketch below).
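A minimal sketch of the first-non-empty lookup combined with the -defaultMsgValue fallback; the helper name and the row representation are illustrative:

```go
func getMsgValue(row map[string]string, msgFields []string, defaultMsgValue string) string {
	// msgFields comes from _msg_field, e.g. ["message", "event.data", "some_field"].
	for _, name := range msgFields {
		if v := row[name]; v != "" {
			return v
		}
	}
	// No non-empty field found - fall back to the -defaultMsgValue flag value.
	return defaultMsgValue
}
```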
Empty fields are treated as non-existing fields by VictoriaLogs data model.
So there is no sense in returning empty fields in query results, since they may mislead and confuse users.
Use the local timezone of the host server in this case. The timezone can be overridden
with the TZ environment variable if needed.
While at it, allow using whitespace instead of T as the delimiter between date and time
in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted
during data ingestion. This is a valid ISO8601 format used by some log shippers,
so it should be supported. This format is also known as the SQL datetime format.
Also assume local time zone when time without timezone information is passed to querying APIs.
Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string
if the old behaviour is preferred.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721
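A sketch of the described parsing rules; the layouts are illustrative and the actual parser is hand-written:

```go
import (
	"fmt"
	"time"
)

func parseIngestedTime(s string) (time.Time, error) {
	// Timestamps with explicit timezone info are parsed as-is.
	if t, err := time.Parse(time.RFC3339Nano, s); err == nil {
		return t, nil
	}
	// Timestamps without timezone info are assumed to be in the local
	// timezone (overridable via the TZ environment variable); both "T"
	// and " " delimiters between date and time are accepted.
	for _, layout := range []string{"2006-01-02T15:04:05", "2006-01-02 15:04:05"} {
		if t, err := time.ParseInLocation(layout, s, time.Local); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("unsupported time format %q", s)
}
```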
### Describe Your Changes
VictoriaLogs allows logs without the `_msg` field or with an empty `_msg` field.
This leads to incorrect search results. See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785
This pull request checks for a non-empty `_msg` field before a log entry is
added to `LogRows`.
A new counter `vl_rows_dropped_total{reason="msg_not_exist"}` is
introduced.
Example log output:
```
2024-09-23T02:33:19.719Z warn app/vlinsert/insertutils/common_params.go:189 dropping log line without _msg field; [{@timestamp 2024-09-18T13:42:16.600000000Z} {Attributes.array.attribute ["many","values"]} {Attributes.boolean.attribute true} {Attributes.double.attribute 637.704} {Attributes.int.attribute 10} {Attributes.map.attribute.some.map.key some value} {Attributes.string.attribute some string} {Body Example ddddddddddlog record} {Resource.service.name my.service} {Scope.my.scope.attribute some scope attribute} {Scope.name my.library} {Scope.version 1.0.0} {SeverityNumber 10} {SeverityText Information} {SpanId eee19b7ec3c1b174} {TraceFlags 0} {TraceId 5b8efff798038103d269b633813fc60c}]
```
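A hedged sketch of the check behind the new counter; the `Field` type and the call site are assumptions:

```go
import "github.com/VictoriaMetrics/metrics"

var rowsDroppedMsgNotExist = metrics.NewCounter(`vl_rows_dropped_total{reason="msg_not_exist"}`)

func hasNonEmptyMsg(fields []Field) bool {
	for _, f := range fields {
		if f.Name == "_msg" && f.Value != "" {
			return true
		}
	}
	return false
}

// At ingestion time, before the entry is added to LogRows:
//   if !hasNonEmptyMsg(fields) {
//       rowsDroppedMsgNotExist.Inc()
//       // warn about the dropped log line, as in the output above
//       return
//   }
```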
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to the [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
- [ ] Benchmark for potential performance loss.
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This commit adds the following changes:
* Adds support for OpenTelemetry logs in VictoriaLogs with protobuf-encoded messages
* JSON encoding is not supported for the following reasons:
- It brings a lot of fragile code, which works inefficiently.
- JSON encoding is impossible to use with the language SDKs.
* splits metrics and logs structures into the lib/protoparser/opentelemetry/pb package.
* adds docs with examples for OpenTelemetry logs.
---
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4839
Co-authored-by: AndrewChubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
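A rough sketch of walking the OTLP logs hierarchy; the `pb` types follow the OpenTelemetry protobuf schema, but the actual lib/protoparser/opentelemetry/pb structures may differ in detail:

```go
func pushOTLPLogs(req *pb.ExportLogsServiceRequest) {
	for _, rl := range req.ResourceLogs {
		for _, sl := range rl.ScopeLogs {
			for _, lr := range sl.LogRecords {
				// Map TimeUnixNano -> _time, Body -> _msg and
				// Attributes -> individual log fields.
				_ = lr
			}
		}
	}
}
```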
* updates the changelog
* adds the VL-Debug HTTP header
* updates docs
* extracts only the first value of the VL-Stream-Fields and VL-Ignore-Fields HTTP headers.
This makes the behaviour the same as for query string args and allows client applications
to be configured easily, since most client collectors don't support multi-value headers.
Signed-off-by: f41gh7 <nik@victoriametrics.com>
* Many collectors don't support forwarding URL query params to the remote system, which makes it impossible to define stream fields for them. A workaround with a proxy between VictoriaLogs and the log shipper is too complicated a solution.
* This commit adds the following changes:
* Adds a fallback to header params if the corresponding query param is empty, for:
_msg_field -> VL-Msg-Field
_stream_fields -> VL-Stream-Fields
_ignore_fields -> VL-Ignore-Fields
_time_field -> VL-Time-Field
* removes deprecations from victorialogs compose files, adds more
output format examples for logstash, telegraf, fluent-bit
related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5310
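A sketch of the query-arg-to-header fallback; note that `http.Header.Get` already returns only the first value, matching the single-value behaviour described above:

```go
import (
	"net/http"
	"strings"
)

func getStreamFields(r *http.Request) []string {
	v := r.URL.Query().Get("_stream_fields")
	if v == "" {
		v = r.Header.Get("VL-Stream-Fields") // first header value only
	}
	if v == "" {
		return nil
	}
	return strings.Split(v, ",")
}
```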
Previously the in-memory buffer could remain unflushed for long periods of time under low ingestion rate.
The ingested logs weren't visible for search during this time.
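A minimal sketch of a periodic background flush, assuming a hypothetical in-memory buffer with a flush method and an illustrative interval:

```go
import "time"

// startFlusher periodically flushes the in-memory buffer so that recently
// buffered logs become visible for search; buf and the interval are hypothetical.
func startFlusher(buf *inmemoryBuf) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for range ticker.C {
		buf.flush()
	}
}

// started as: go startFlusher(buf)
```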
The TryParseTimestampRFC3339Nano() function must properly parse RFC3339 timestamps with timezone offsets.
While at it, make the tryParseTimestampISO8601 function private in order to prevent
improper usage of this function from outside the lib/logstorage package.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508
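A quick illustration of a timezone-offset timestamp that such a parser must handle:

```go
// 10:20:30 at the +02:00 offset corresponds to 08:20:30 UTC.
t, err := time.Parse(time.RFC3339Nano, "2024-09-20T10:20:30.123456789+02:00")
```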