Commit graph

24 commits

Author SHA1 Message Date
Aliaksandr Valialkin
ebac07bcf6
app/vlinsert: continue parsing JSON lines in the input stream after parse errors
Previosly the parsing of the input stream was stopped after the first parse error.
This isn't what most users expect when ingesting JSON lines in a stream where some JSON lines may be invalid.
2025-02-10 15:00:58 +01:00
Aliaksandr Valialkin
c26cbf57dd
app/vlinsert: accept timestamps with microsecond and nanosecond precision at _time field 2025-02-09 22:41:38 +01:00
Aliaksandr Valialkin
17b813ba28
app/vlinsert: use default set of log stream fields for Loki and OpenTelemetry protocols if _stream_fields query arg is empty
Loki protocol supports a list of log stream labels - see https://grafana.com/docs/loki/latest/get-started/labels/

OpenTelemetry protocol also supports a list of log stream labels, which are named resource attributes there.
See https://opentelemetry.io/docs/concepts/resources/#semantic-attributes-with-sdk-provided-default-value

Simplify logs' ingestion into VictoriaLogs for these protocols by allowing the data ingestion without
the need to specify _stream_fields query arg or VL-Stream-Fields HTTP header. In this case the upstream log stream fields
are used during data ingestion. The set of log stream fields can be overriden via _stream_fields query arg
and via VL-Stream-Fields HTTP header if needed.

Thanks to @AndrewChubatiuk for the initial idea and implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7554
2024-12-04 13:57:23 +01:00
Aliaksandr Valialkin
480a8be48f
app/vlinsert: track vl_rows_ingested_total metric in a single place
Previously vl_rows_ingested_total metric was tracked individually per each supported data ingestion protocols.
It is better from maintainability PoV tracking this metric consistently in a single place - at logMessageProcessor.AddRow() function
in the same way as vl_bytes_ingested_total metric is tracked.

This is a follow-up for 50bfa689c9
2024-12-04 12:18:30 +01:00
Aliaksandr Valialkin
c58d0549a8
app/vlinsert: continue parsing lines after too long lines in JSON line stream and Elasticsearch bulk import stream
Previously all the lines after the too long line in the stream were ignored. This wasn't expected by most users.
2024-12-04 12:18:28 +01:00
Aliaksandr Valialkin
50bfa689c9
app/vlinsert: expose vl_bytes_ingested_total metric
This metric tracks an approximate amounts of bytes processed when parsing the ingested logs.
The metric is exposed individually per every supported data ingestion protocol. The protocol name
is exposed via "type" label in order to be consistent with vl_rows_ingested_total metric.

Thanks to @tenmozes for the initial idea and implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7682

While at it, remove the unneeded "format" label from vl_rows_ingested_total metric.
The "type" label must be enough for encoding the data ingestion format.
2024-11-30 17:25:57 +01:00
Aliaksandr Valialkin
d2dce13df6
app/vlinsert: typo fix after 16ee470da6 2024-10-30 17:59:49 +01:00
Aliaksandr Valialkin
16ee470da6
app/vlinsert: accept logs with empty _msg field
In this case the _msg field is set to the value specified in the -defaultMsgValue command-line flag.

This should simplify first-time migration to VictoriaLogs from other systems.
2024-10-30 14:59:38 +01:00
Aliaksandr Valialkin
ed73f8350b
app/vlinsert: allow specifying comma-separated list of fields containing log message via _msg_field query arg and VL-Msg-Field HTTP request header
This msy be useful when ingesting logs from different sources, which store the log message in different fields.
For example, `_msg_field=message,event.data,some_field` will get log message from the first non-empty field:
`message`, `event.data` and `some_field`.
2024-10-30 14:17:33 +01:00
Aliaksandr Valialkin
bac193e50b
app/vlselect: do not show empty fields in query results
Empty fields are treated as non-existing fields by VictoriaLogs data model.
So there is no sense in returning empty fields in query results, since they may mislead and confuse users.
2024-10-14 23:43:58 +02:00
Aliaksandr Valialkin
037652d5ae
app/vlinsert: support _time field without timezone information during data ingestion
Use local timezone of the host server in this case. The timezone can be overridden
with TZ environment variable if needed.

While at it, allow using whitespace instead of T as a delimiter between data and time
in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted
during data ingestion. This is valid ISO8601 format, which is used by some log shippers,
so it should be supported. This format is also known as SQL datetime format.

Also assume local time zone when time without timezone information is passed to querying APIs.
Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string
if the old behaviour is preferred.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721
2024-09-26 12:49:35 +02:00
Aliaksandr Valialkin
3eda4617c0
app/vlinsert: properly parse timestamps with nanosecond precision at /insert/jsonline HTTP endpoint
This has been broken in 2b6a634ec0
2024-06-18 00:23:25 +02:00
Aliaksandr Valialkin
478468e6cd
app/vlinsert: properly parse length-delimited syslog messages sent over TCP according to RFC5425 2024-06-17 22:28:26 +02:00
Aliaksandr Valialkin
2b6a634ec0
lib/logstorage: work-in-progress 2024-06-17 12:13:18 +02:00
Aliaksandr Valialkin
22107421eb
lib/logstorage: work-in-progress 2024-05-22 21:01:20 +02:00
Aliaksandr Valialkin
ad505a7a9a
lib/logstorage: work-in-progress 2024-05-20 04:08:30 +02:00
Aliaksandr Valialkin
7b33a27874
lib/logstorage: follow-up for 8a23d08c21
- Compare the actual free disk space to the value provided via -storage.minFreeDiskSpaceBytes
  directly inside the Storage.IsReadOnly(). This should work fast in most cases.
  This simplifies the logic at lib/storage.

- Do not take into account -storage.minFreeDiskSpaceBytes during background merges, since
  it results in uncontrolled growth of small parts when the free disk space approaches -storage.minFreeDiskSpaceBytes.
  The background merge logic uses another mechanism for determining whether there is enough
  disk space for the merge - it reserves the needed disk space before the merge
  and releases it after the merge. This prevents from out of disk space errors during background merge.

- Properly handle corner cases for flushing in-memory data to disk when the storage
  enters read-only mode. This is better than losing the in-memory data.

- Return back Storage.MustAddRows() instead of Storage.AddRows(),
  since the only case when AddRows() can return error is when the storage is in read-only mode.
  This case must be handled by the caller by calling Storage.IsReadOnly()
  before adding rows to the storage.
  This simplifies the code a bit, since the caller of Storage.MustAddRows() shouldn't handle
  errors returned by Storage.AddRows().

- Properly store parsed logs to Storage if parts of the request contain invalid log lines.
  Previously the parsed logs could be lost in this case.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4945
2023-10-02 16:52:23 +02:00
Zakhar Bessarab
8a23d08c21
lib/logstorage: switch to read-only mode when running out of disk space (#4945)
* lib/logstorage: switch to read-only mode when running out of disk space

Added support of `--storage.minFreeDiskSpaceBytes` command-line flag to allow graceful handling of running out of disk space at `--storageDataPath`.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: fix error handling logic during merge

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: fix log level

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-29 11:55:38 +02:00
Aliaksandr Valialkin
e9647bb669
app/vlinsert: follow-up for d570763c91
- Switch from summary to histogram for vl_http_request_duration_seconds metric.
  This allows calculating request duration quantiles across multiple hosts
  via histogram_quantile(0.99, sum(vl_http_request_duration_seconds_bucket) by (vmrange)).
- Take into account only successfully processed data ingestion requests
  when updating vl_http_request_duration_seconds histogram.
  Failed requests are ignored, since they may significantly skew measurements.
- Clarify the description of the change at docs/VictoriaLogs/CHANGELOG.md.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4934
2023-09-19 00:02:43 +02:00
crossoverJie
d570763c91
app/vlinsert: Add vl_http_request_duration_seconds metrics (#4934) 2023-09-16 15:10:29 +02:00
Aliaksandr Valialkin
f548adce0b
app/vlinsert/loki: follow-up after 09df5b66fd
- Parse protobuf if Content-Type isn't set to `application/json` - this behavior is documented at https://grafana.com/docs/loki/latest/api/#push-log-entries-to-loki

- Properly handle gzip'ped JSON requests. The `gzip` header must be read from `Content-Encoding` instead of `Content-Type` header

- Properly flush all the parsed logs with the explicit call to vlstorage.MustAddRows() at the end of query handler

- Check JSON field types more strictly.

- Allow parsing Loki timestamp as floating-point number. Such a timestamp can be generated by some clients,
  which store timestamps in float64 instead of int64.

- Optimize parsing of Loki labels in Prometheus text exposition format.

- Simplify tests.

- Remove lib/slicesutil, since there are no more users for it.

- Update docs with missing info and fix various typos. For example, it should be enough to have `instance` and `job` labels
  as stream fields in most Loki setups.

- Allow empty of missing timestamps in the ingested logs.
  The current timestamp at VictoriaLogs side is then used for the ingested logs.
  This simplifies debugging and testing of the provided HTTP-based data ingestion APIs.

The remaining MAJOR issue, which needs to be addressed: victoria-logs binary size increased from 13MB to 22MB
after adding support for Loki data ingestion protocol at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4482 .
This is because of shitty protobuf dependencies. They must be replaced with another protobuf implementation
similar to the one used at lib/prompb or lib/prompbmarshal .
2023-07-20 16:48:21 -07:00
Aliaksandr Valialkin
dde9ceed07
app/vlinsert/jsonline: code prettifying 2023-06-21 19:39:22 -07:00
Alexander Marshalov
944793c0f7
removed debug message from jsonlines handler of victorialogs (#4492)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-06-21 09:34:12 -07:00
Alexander Marshalov
bf081d157e
jsonline support for data ingestion in vlinsert (#4487)
added json lines / json stream format for ingestion to vlinsert
2023-06-21 15:31:28 +02:00