### Describe Your Changes
Fix many spelling errors and some grammar, including misspellings in
filenames.
The change also renames the metric `vm_mmaped_files` to `vm_mmapped_files` to fix the typo in its name.
While this is a breaking change, the metric isn't used in alerts or dashboards,
so the impact on users should be low.
The change also deprecates `cspell`, since it is much heavier and less usable.
---------
Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
(cherry picked from commit 76d205feae)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Properly decode protobuf-encoded Loki requests that have no Content-Encoding header.
Protobuf Loki messages are snappy-encoded by default, so snappy decoding must be used
when the Content-Encoding header is missing.
- Restore the previous signatures of the parseJSONRequest and parseProtobufRequest functions.
This eliminates churn in tests for these functions. It also fixes the broken
benchmarks BenchmarkParseJSONRequest and BenchmarkParseProtobufRequest, which consumed
the whole request body on the first iteration and did nothing on subsequent iterations.
- Put the CHANGELOG entries into the correct places, since they were incorrectly put into already released
versions of VictoriaMetrics and VictoriaLogs.
- Add support for reading zstd-compressed data ingestion requests for the remaining protocols
in VictoriaLogs and VictoriaMetrics.
- Remove the `encoding` arg from PutUncompressedReader() - it already has enough information
about the passed reader to deal with it properly.
- Add ReadUncompressedData to lib/protoparser/common for reading uncompressed data from the reader until EOF.
This allows removing repeated code across request-based protocol parsers without streaming mode.
- Consistently limit the sizes of data ingestion requests that can be read by the ReadUncompressedData function (see the sketch below).
Previously this wasn't the case for all the supported protocols.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8416
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8380
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8300
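To illustrate the size-limiting behaviour, here is a minimal sketch of a ReadUncompressedData-style helper; the package name, maxSize parameter, and error messages are assumptions for the sketch, not the actual implementation.
```
package protoparserutil

import (
	"fmt"
	"io"
)

// readUncompressedData is a hypothetical sketch: it reads r until EOF,
// but returns an error once more than maxSize bytes have been read.
func readUncompressedData(r io.Reader, maxSize int64) ([]byte, error) {
	// Allow reading one extra byte to detect whether the limit is exceeded.
	lr := io.LimitReader(r, maxSize+1)
	data, err := io.ReadAll(lr)
	if err != nil {
		return nil, fmt.Errorf("cannot read request body: %w", err)
	}
	if int64(len(data)) > maxSize {
		return nil, fmt.Errorf("request body exceeds the configured limit of %d bytes", maxSize)
	}
	return data, nil
}
```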
The `ignore_fields` HTTP query args can contain prefixes ending with '*'.
For example, `ignore_fields=foo.*,bar` skips all the fields starting with `foo.`
during data ingestion.
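As an illustration of the prefix matching (the helper below is hypothetical, not the actual VictoriaLogs code):
```
package main

import (
	"fmt"
	"strings"
)

// shouldIgnoreField is a hypothetical sketch of matching a field name against
// ignore_fields entries, where entries ending with '*' are treated as prefixes.
func shouldIgnoreField(name string, ignoreFields []string) bool {
	for _, f := range ignoreFields {
		if strings.HasSuffix(f, "*") {
			if strings.HasPrefix(name, strings.TrimSuffix(f, "*")) {
				return true
			}
			continue
		}
		if name == f {
			return true
		}
	}
	return false
}

func main() {
	ignore := []string{"foo.*", "bar"}
	fmt.Println(shouldIgnoreField("foo.baz", ignore)) // true
	fmt.Println(shouldIgnoreField("bar", ignore))     // true
	fmt.Println(shouldIgnoreField("foobar", ignore))  // false
}
```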
This commit introduces common readers for multiple compression encoding algorithms.
Currently, the supported encodings are:
* zstd
* gzip
* deflate
* snappy
It adds the new common reader to all VictoriaLogs ingestion protocols
and updates OpenTelemetry metrics parsing for VictoriaMetrics components.
It also ports the zstd stream parser from the cluster branch.
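For illustration, selecting a decompressing reader based on the Content-Encoding value could look roughly like this; the package choices, function name, and exact set of encodings are assumptions for the sketch, not the actual shared reader.
```
package insertutils

import (
	"compress/flate"
	"compress/gzip"
	"fmt"
	"io"

	"github.com/golang/snappy"
	"github.com/klauspost/compress/zstd"
)

// newUncompressedReader is a hypothetical sketch of wrapping a request body
// with the decoder matching its Content-Encoding header value.
func newUncompressedReader(r io.Reader, encoding string) (io.Reader, error) {
	switch encoding {
	case "zstd":
		return zstd.NewReader(r)
	case "gzip":
		return gzip.NewReader(r)
	case "deflate":
		return flate.NewReader(r), nil
	case "snappy":
		return snappy.NewReader(r), nil
	case "", "none":
		return r, nil
	default:
		return nil, fmt.Errorf("unsupported Content-Encoding: %q", encoding)
	}
}
```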
Related issues:
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8380
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8300
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
Loki doesn't handle high-cardinality log fields well (e.g. fields with a big number of unique values).
That's why Promtail, Grafana Agent and Grafana Alloy encode such fields into JSON and push them
as a plaintext log message to the remote storage. This isn't an efficient way to store high-cardinality
log fields in VictoriaLogs, since it is optimized for storing and querying such fields when they are stored
distinctly as regular log fields according to the VictoriaLogs data model ( https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model ).
This commit enables automatic parsing of JSON-encoded log fields in the plaintext log messages received over the Loki protocol,
storing them as separate log fields. This should improve the data compression ratio and reduce disk space usage.
This should also improve query performance when the parsed log fields are used in queries for filtering and aggregation.
The old behaviour can be restored by passing the -loki.disableMessageParsing command-line flag to VictoriaLogs.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8486
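For illustration only (the field names below are invented), a Loki log line whose plaintext message is a JSON object such as:
```
{"level":"error","trace_id":"abc123","user_id":"42"}
```
is now stored as separate log fields rather than as a single opaque `_msg` string, unless the -loki.disableMessageParsing flag is set.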
- Properly handle negative timestamps (e.g. timestamps before 1970-01-01)
- Optimize parsing of floating-point timestamps by eliminating the memory allocation
needed for returning an error from strconv.ParseInt. Instead, check whether the string contains a dot,
and then parse it as a floating-point number (see the sketch below).
- Add tests for the ParseUnixTimestamp function.
- Make the code easier to understand and maintain by removing the unneeded generic function toNano().
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8470
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8472
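A minimal sketch of the dot-check idea, assuming seconds-based Unix timestamps and omitting range validation; this is not the actual ParseUnixTimestamp implementation.
```
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseUnixTimestampNs is a hypothetical sketch: integer strings are parsed
// with strconv.ParseInt, while strings containing a dot are parsed as
// floating-point seconds, so ParseInt never fails (and never allocates an error) for them.
func parseUnixTimestampNs(s string) (int64, error) {
	if strings.IndexByte(s, '.') < 0 {
		secs, err := strconv.ParseInt(s, 10, 64)
		if err != nil {
			return 0, fmt.Errorf("cannot parse unix timestamp %q: %w", s, err)
		}
		return secs * 1e9, nil
	}
	secs, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse unix timestamp %q: %w", s, err)
	}
	return int64(secs * 1e9), nil
}

func main() {
	// Negative values (times before 1970-01-01) are handled as well.
	for _, s := range []string{"1700000000", "-123.456", "1700000000.25"} {
		ns, err := parseUnixTimestampNs(s)
		fmt.Println(ns, err)
	}
}
```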
Previously, the OpenTelemetry attribute parser added extra field names according to
the Go JSON marshaling rules for structs:
```
type AnyValue struct {
	StringValue string
}
```
This was serialized as:
```
{"StringValue": "some-string"}
```
The opentelemetry-collector, however, serializes it as:
```
"some-string"
```
This commit changes this behaviour and makes the parser compatible with the opentelemetry-collector format. See test cases for examples.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8384
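A hedged sketch of the idea with a hypothetical attribute type (not the actual lib/protoparser code): when the value holds a string, emit the bare string instead of a struct-shaped object.
```
package main

import (
	"encoding/json"
	"fmt"
)

// anyValue is a hypothetical stand-in for the OpenTelemetry AnyValue type.
type anyValue struct {
	StringValue *string
	IntValue    *int64
}

// MarshalJSON emits the wrapped value directly instead of an object keyed
// by the Go field name, matching the format shown above.
func (av *anyValue) MarshalJSON() ([]byte, error) {
	switch {
	case av.StringValue != nil:
		return json.Marshal(*av.StringValue)
	case av.IntValue != nil:
		return json.Marshal(*av.IntValue)
	default:
		return []byte("null"), nil
	}
}

func main() {
	s := "some-string"
	data, _ := json.Marshal(&anyValue{StringValue: &s})
	fmt.Println(string(data)) // "some-string" instead of {"StringValue":"some-string"}
}
```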
This commit changes the journald ingestion validation regex
from `^[A-Z_][A-Z0-9_]+` to `^[A-Z_][A-Z0-9_]*`.
This is needed to properly support entries with single-character
names.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8314
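The practical difference, shown with Go's regexp package (illustration only):
```
package main

import (
	"fmt"
	"regexp"
)

func main() {
	oldRe := regexp.MustCompile(`^[A-Z_][A-Z0-9_]+`)
	newRe := regexp.MustCompile(`^[A-Z_][A-Z0-9_]*`)
	// Single-character names were rejected by the old regex.
	fmt.Println(oldRe.MatchString("A")) // false
	fmt.Println(newRe.MatchString("A")) // true
}
```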
Add the Accept-Encoding response header to support content negotiation,
which was introduced in [this
PR](https://github.com/systemd/systemd/pull/34822) and enables
compression for journald.
(cherry picked from commit 3c9f9f49b0)
Previously, the parsing of the input stream was stopped after the first parse error.
This isn't what most users expect when ingesting JSON lines in a stream where some JSON lines may be invalid.
(cherry picked from commit ebac07bcf6)
It is better to use the AppendFieldsToJSON function directly
instead of hiding it under the RowsFormatter abstraction.
(cherry picked from commit 95f182053b)
1. Do not copy every line from LineReader.buf to LineReader.Line - just reference the line inside LineReader.buf.
2. Do not copy the next found line to the beginning of LineReader.buf - just track the start index of the next line with LineReader.bufOffset.
This reduces memory copying when many lines are read into LineReader.buf by a single read() syscall (see the sketch below).
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7761
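A simplified sketch of the zero-copy idea; the field names mirror the description above, but this is not the actual LineReader implementation and it omits buffer refilling and growth.
```
package linereader

import "bytes"

// lineReader is a hypothetical sketch: Line references a sub-slice of buf
// instead of owning a copy, and bufOffset tracks where the next line starts.
type lineReader struct {
	Line      []byte // current line; points into buf, valid until the next NextLine call
	buf       []byte // data read from the underlying reader by read() syscalls
	bufOffset int    // start index of the next line inside buf
}

// NextLine advances to the next '\n'-terminated line already present in buf.
// It returns false when buf contains no more complete lines (refill omitted).
func (lr *lineReader) NextLine() bool {
	n := bytes.IndexByte(lr.buf[lr.bufOffset:], '\n')
	if n < 0 {
		return false
	}
	lr.Line = lr.buf[lr.bufOffset : lr.bufOffset+n]
	lr.bufOffset += n + 1
	return true
}
```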
### Describe Your Changes
- The DataDog /api/v2/logs API supports a message field in JSON format, which
is not documented and is used by the serverless extension. This PR allows the
message field to be either a string or an object (see the sketch after this list).
It also adds support for the undocumented timestamp field.
- Added `-datadog.streamFields` and `-datadog.ignoreFields` flags to
configure default stream fields for DataDog logs, where there's no
alternative option to pass extra headers and query args.
- Added ingestion of `max` and `min` values of data ingested via the
`datadogsketches` API, which is also actively used by serverless
extensions.
- Use the default `.` separator instead of `_` for sketches metric names
as long as metric names are not sanitized.
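To show only the string-or-object handling of the message field (hypothetical types and function, not the actual vlinsert code):
```
package main

import (
	"encoding/json"
	"fmt"
)

// decodeMessage is a hypothetical sketch: the "message" field is accepted
// either as a plain string or as a JSON object with arbitrary keys.
func decodeMessage(raw json.RawMessage) (string, map[string]any, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s, nil, nil
	}
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		return "", nil, fmt.Errorf("message must be a string or an object: %w", err)
	}
	return "", obj, nil
}

func main() {
	s, _, _ := decodeMessage(json.RawMessage(`"plain text line"`))
	_, obj, _ := decodeMessage(json.RawMessage(`{"level":"error","text":"oops"}`))
	fmt.Println(s, obj)
}
```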
The storage isn't designed to work efficiently with logs containing too many log fields.
It is better to emit a warning to the user and ignore such logs instead of trying to store them.
This allows the user to fix the issue ASAP and doesn't lead to excess resource usage
on the VictoriaLogs side, such as RAM, CPU, disk IO and disk space.
While at it, ignore too long logs with a size exceeding the maximum block size during data ingestion.
This should prevent possible issues when dealing with such long logs if they were stored in the storage.
Emit a warning in this case, so the user can identify and fix the issue ASAP.
This is a follow-up for 22e6385f56
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7568
Previously a too long line in the Elasticsearch bulk import protocol resulted in closing
the client stream and ignoring the rest of the log messages in the stream.
Now only the too long message is properly ignored, while the rest of the log messages
are read successfully.
This is a follow-up for 61e7c77ce25967269192ed2e201f67d8c48b972e
Previously the vl_rows_ingested_total metric was tracked individually for each supported data ingestion protocol.
From a maintainability PoV it is better to track this metric consistently in a single place - in the logMessageProcessor.AddRow() function -
in the same way as the vl_bytes_ingested_total metric is tracked.
This is a follow-up for 50bfa689c9
This metric tracks the approximate number of bytes processed when parsing the ingested logs.
The metric is exposed individually for every supported data ingestion protocol. The protocol name
is exposed via the "type" label in order to be consistent with the vl_rows_ingested_total metric.
Thanks to @tenmozes for the initial idea and implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7682
While at it, remove the unneeded "format" label from the vl_rows_ingested_total metric.
The "type" label is enough for encoding the data ingestion format.
The Loki protocol supports an optional `metadata` object for each ingested line. It is added as the 3rd field of the (ts, msg, metadata) tuple. Previously, the Loki JSON request parser rejected a log line if the tuple size != 2.
This commit allows the optional tuple field. It is parsed as a JSON object and its fields are added as log metadata fields to the log message stream.
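For illustration, a push request line carrying the optional metadata object looks roughly like this (stream labels and field names are made up):
```
{"streams":[{"stream":{"app":"demo"},"values":[["1700000000000000000","payment failed",{"trace_id":"abc123"}]]}]}
```
The third tuple element is parsed as a JSON object and its keys become metadata fields of the log entry.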
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7431
---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
(cherry picked from commit 3aeb1b96a2)
This commit adds the following changes:
- Added support for pushing DataDog logs, with examples of how to ingest data
using Vector and Fluentbit
- Updated the VictoriaLogs examples directory structure to have a single
container image for victorialogs and the agent (fluentbit, vector, etc.), but
multiple configurations for different protocols
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6632
(cherry picked from commit e0930687f1)
In this case the _msg field is set to the value specified in the -defaultMsgValue command-line flag.
This should simplify first-time migration to VictoriaLogs from other systems.
(cherry picked from commit 16ee470da6)
This may be useful when ingesting logs from different sources, which store the log message in different fields.
For example, `_msg_field=message,event.data,some_field` will take the log message from the first non-empty field among
`message`, `event.data` and `some_field`.
(cherry picked from commit ed73f8350b)
Empty fields are treated as non-existing fields by the VictoriaLogs data model.
So there is no sense in returning empty fields in query results, since they may mislead and confuse users.
(cherry picked from commit bac193e50b)
Use the local timezone of the host server in this case. The timezone can be overridden
with the TZ environment variable if needed.
While at it, allow using whitespace instead of T as the delimiter between date and time
in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted
during data ingestion. This is a widely used variation of the ISO8601 format (also known as
the SQL datetime format), which is used by some log shippers, so it should be supported.
Also assume the local time zone when a time without timezone information is passed to querying APIs.
Previously such a time was parsed in the UTC timezone. Add `Z` to the end of the time string
if the old behaviour is preferred.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721
### Describe Your Changes
VictoriaLogs allows logs without a `_msg` field or with an empty `_msg` field.
This leads to incorrect search results. See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785
This pull request checks for a non-empty `_msg` field before a log entry is
added to `LogRows`.
A new counter `vl_rows_dropped_total{reason="msg_not_exist"}` is
introduced.
Example log output:
```
2024-09-23T02:33:19.719Z warn app/vlinsert/insertutils/common_params.go:189 dropping log line without _msg field; [{@timestamp 2024-09-18T13:42:16.600000000Z} {Attributes.array.attribute ["many","values"]} {Attributes.boolean.attribute true} {Attributes.double.attribute 637.704} {Attributes.int.attribute 10} {Attributes.map.attribute.some.map.key some value} {Attributes.string.attribute some string} {Body Example ddddddddddlog record} {Resource.service.name my.service} {Scope.my.scope.attribute some scope attribute} {Scope.name my.library} {Scope.version 1.0.0} {SeverityNumber 10} {SeverityText Information} {SpanId eee19b7ec3c1b174} {TraceFlags 0} {TraceId 5b8efff798038103d269b633813fc60c}]
```
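A minimal sketch of the pre-insert check, assuming a hypothetical helper and simplified field representation (see the PR for the real code path):
```
package insertutils

import "github.com/VictoriaMetrics/metrics"

var rowsDroppedNoMsg = metrics.NewCounter(`vl_rows_dropped_total{reason="msg_not_exist"}`)

// hasNonEmptyMsg is a hypothetical sketch of the check: a row is only added
// to LogRows when its _msg field is present and non-empty; otherwise the
// drop counter is incremented and the row is skipped.
func hasNonEmptyMsg(fields map[string]string) bool {
	if fields["_msg"] != "" {
		return true
	}
	rowsDroppedNoMsg.Inc()
	return false
}
```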
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to the [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
- [ ] Benchmark for potential performance loss.
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>