---
sort: 2
weight: 2
title: VictoriaLogs key concepts
---
# VictoriaLogs key concepts
## Data model
VictoriaLogs works with both structured and unstructured logs.
Every log entry must contain at least a log message field plus an arbitrary number of additional `key=value` fields.
A single log entry can be expressed as a single-level JSON object with string keys and string values.
For example:
```json
{
  "job": "my-app",
  "instance": "host123:4567",
  "level": "error",
  "client_ip": "1.2.3.4",
  "trace_id": "1234-56789-abcdef",
  "_msg": "failed to serve the client request"
}
```
Empty values are treated the same as non-existing values. For example, the following log entries are equivalent, since they contain only one identical non-empty field, `_msg`:

```json
{
  "_msg": "foo bar",
  "some_field": "",
  "another_field": ""
}
```

```json
{
  "_msg": "foo bar",
  "third_field": ""
}
```

```json
{
  "_msg": "foo bar"
}
```
VictoriaLogs automatically transforms multi-level JSON (aka nested JSON) into single-level JSON during data ingestion according to the following rules:
- Nested dictionaries are flattened by concatenating dictionary keys with the `.` char. For example, the following multi-level JSON:

  ```json
  {
    "host": {
      "name": "foobar",
      "os": {
        "version": "1.2.3"
      }
    }
  }
  ```

  is transformed into the following single-level JSON:

  ```json
  {
    "host.name": "foobar",
    "host.os.version": "1.2.3"
  }
  ```
- Arrays, numbers and boolean values are converted into strings. This simplifies full-text search over such values. For example, the following JSON with an array, a number and a boolean value:

  ```json
  {
    "tags": ["foo", "bar"],
    "offset": 12345,
    "is_error": false
  }
  ```

  is converted into the following JSON with string values:

  ```json
  {
    "tags": "[\"foo\", \"bar\"]",
    "offset": "12345",
    "is_error": "false"
  }
  ```
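Both transformation rules can be sketched together in Python. This is a rough model of the ingestion-time behavior described above, not the actual VictoriaLogs implementation:

```python
import json


def flatten(entry, prefix=""):
    """Flatten nested dicts by joining keys with '.'; convert arrays,
    numbers and booleans into strings (a sketch of the rules above)."""
    flat = {}
    for key, value in entry.items():
        full_key = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key + "."))
        elif isinstance(value, str):
            flat[full_key] = value
        elif isinstance(value, bool):
            # check bool before generic numbers: bool is a subclass of int
            flat[full_key] = "true" if value else "false"
        elif isinstance(value, list):
            flat[full_key] = json.dumps(value)
        else:
            flat[full_key] = str(value)
    return flat


print(flatten({
    "host": {"name": "foobar", "os": {"version": "1.2.3"}},
    "tags": ["foo", "bar"],
    "offset": 12345,
    "is_error": False,
}))
```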
Both field names and field values may contain arbitrary chars. Such chars must be encoded during data ingestion according to JSON string encoding. Unicode chars must be encoded with UTF-8 encoding:

```json
{
  "field with whitespace": "value\nwith\nnewlines",
  "Поле": "价值"
}
```
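Standard JSON libraries perform this encoding automatically. For example, in Python:

```python
import json

entry = {"field with whitespace": "value\nwith\nnewlines", "Поле": "价值"}

# ensure_ascii=False keeps Unicode chars as UTF-8 text
# instead of \uXXXX escape sequences
encoded = json.dumps(entry, ensure_ascii=False)
print(encoded)
```

The newline chars inside the value are escaped as `\n` in the resulting JSON string, while the Unicode field name and value are emitted as-is in UTF-8.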
VictoriaLogs automatically indexes all the fields in all the ingested logs. This enables full-text search across all the fields.

VictoriaLogs supports the following special fields in addition to arbitrary other fields:
### Message field
Every ingested log entry must contain at least a `_msg` field with the actual log message. For example, this is the minimal log entry, which can be ingested into VictoriaLogs:

```json
{
  "_msg": "some log message"
}
```
If the actual log message has a field name other than `_msg`, then it is possible to specify the real log message field via the `_msg_field` query arg during data ingestion. For example, if the log message is located in the `event.original` field, then specify the `_msg_field=event.original` query arg during data ingestion.
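A client can attach this query arg when pushing newline-delimited JSON to the ingestion endpoint. The sketch below only builds the request URL and body; the `/insert/jsonline` path comes from the VictoriaLogs data ingestion docs, and the host and port are assumptions for a local instance:

```python
import json
from urllib.parse import urlencode

# Assumed local VictoriaLogs instance listening on the default port.
url = "http://localhost:9428/insert/jsonline?" + urlencode(
    {"_msg_field": "event.original"}
)

# One JSON object per line; the log message lives in event.original here.
line = json.dumps({
    "event.original": "failed to serve the client request",
    "level": "error",
})

print(url)
print(line)
```

Sending the request itself is then a plain HTTP POST of `line` to `url` (e.g. via `urllib.request` or `curl`).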
### Time field
The ingested log entries may contain a `_time` field with the timestamp of the ingested log entry. The timestamp must be in RFC3339 format. The most commonly used subset of ISO8601 is also supported. The seconds part of the timestamp may be specified with any precision up to nanoseconds. For example, the following log entry contains a valid timestamp with millisecond precision in the `_time` field:

```json
{
  "_msg": "some log message",
  "_time": "2023-04-12T06:38:11.095Z"
}
```
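Such timestamps can be validated client-side before ingestion, for example with Python's standard library (VictoriaLogs itself parses them server-side):

```python
from datetime import datetime, timezone

ts = "2023-04-12T06:38:11.095Z"

# Python versions before 3.11 do not accept the trailing "Z"
# in fromisoformat(), so replace it with an explicit UTC offset.
parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
print(parsed.isoformat())
```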
If the actual timestamp has a field name other than `_time`, then it is possible to specify the real timestamp field via the `_time_field` query arg during data ingestion. For example, if the timestamp is located in the `event.created` field, then specify the `_time_field=event.created` query arg during data ingestion.

If the `_time` field is missing, then the data ingestion time is used as the log entry timestamp.

The `_time` field is used in the time filter for quickly narrowing down the search to a particular time range.
### Stream fields
Some structured logging fields may uniquely identify the application instance, which generates log entries. This may be either a single field such as `instance="host123:456"` or a set of fields such as `{datacenter="...", env="...", job="...", instance="..."}` or `{kubernetes.namespace="...", kubernetes.node.name="...", kubernetes.pod.name="...", kubernetes.container.name="..."}`.
Log entries received from a single application instance form a log stream in VictoriaLogs. VictoriaLogs optimizes storing and querying of individual log streams. This provides the following benefits:

- Reduced disk space usage, since a log stream from a single application instance is usually compressed better than a mixed log stream from multiple distinct applications.
- Increased query performance, since VictoriaLogs needs to scan smaller amounts of data when searching by stream fields.
Every ingested log entry is associated with a log stream. The name of this stream is stored in the `_stream` field. This field has a format similar to labels in Prometheus metrics:

```
{field1="value1", ..., fieldN="valueN"}
```

For example, if the `host` and `app` fields are associated with the stream, then the `_stream` field will have the value `{host="host-123",app="my-app"}` for the log entry with `host="host-123"` and `app="my-app"` fields. The `_stream` field can be searched with stream filters.
By default the value of the `_stream` field is `{}`, since VictoriaLogs cannot automatically determine which fields uniquely identify every log stream. This may lead to suboptimal resource usage and query performance. Therefore it is recommended to specify stream-level fields via the `_stream_fields` query arg during data ingestion.
For example, if logs from Kubernetes containers have the following fields:

```json
{
  "kubernetes.namespace": "some-namespace",
  "kubernetes.node.name": "some-node",
  "kubernetes.pod.name": "some-pod",
  "kubernetes.container.name": "some-container",
  "_msg": "some log message"
}
```

then specify the `_stream_fields=kubernetes.namespace,kubernetes.node.name,kubernetes.pod.name,kubernetes.container.name` query arg during data ingestion in order to properly store per-container logs into distinct streams.
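The resulting `_stream` value for such an entry can be modeled like this. This is only a sketch of the Prometheus-like format described above; the canonical ordering and escaping used internally by VictoriaLogs are implementation details:

```python
def stream_name(entry, stream_fields):
    """Render non-empty stream fields in the {field="value", ...} format."""
    pairs = [f'{f}="{entry[f]}"' for f in stream_fields if entry.get(f)]
    return "{" + ",".join(pairs) + "}"


entry = {
    "kubernetes.namespace": "some-namespace",
    "kubernetes.node.name": "some-node",
    "kubernetes.pod.name": "some-pod",
    "kubernetes.container.name": "some-container",
    "_msg": "some log message",
}
fields = [
    "kubernetes.namespace",
    "kubernetes.node.name",
    "kubernetes.pod.name",
    "kubernetes.container.name",
]
print(stream_name(entry, fields))
```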
#### How to determine which fields must be associated with log streams?

Log streams must contain fields that uniquely identify the application instance that generates logs. For example, `container`, `instance` and `host` are good candidates for stream fields.
Additional fields may be added to log streams if they remain constant during the application instance lifetime. For example, `namespace`, `node`, `pod` and `job` are good candidates for additional stream fields. Adding such fields to log streams makes sense if you are going to use these fields during search and want to speed it up with stream filters. There is no need to add all the constant fields to log streams, since this may increase resource usage during data ingestion and querying.

Never add non-constant fields to streams if these fields may change with every log entry of the same stream. For example, `ip`, `user_id` and `trace_id` must never be associated with log streams, since this may lead to high-cardinality issues.
### High cardinality
Some fields in the ingested logs may contain a large number of unique values across log entries. For example, fields with names such as `ip`, `user_id` or `trace_id` tend to contain a large number of unique values. VictoriaLogs works perfectly with such fields unless they are associated with log streams.
Never associate high-cardinality fields with log streams, since this may lead to the following issues:
- Performance degradation during data ingestion and querying
- Increased memory usage
- Increased CPU usage
- Increased disk space usage
- Increased disk read / write IO
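Before associating a field with log streams, it can be worth estimating how many distinct streams it would produce from a sample of already-parsed logs, e.g.:

```python
def count_streams(sample_logs, stream_fields):
    """Count distinct stream-field combinations in a sample of logs."""
    return len({tuple(log.get(f, "") for f in stream_fields)
                for log in sample_logs})


sample = [
    {"host": "host-1", "app": "my-app", "trace_id": "aaa"},
    {"host": "host-1", "app": "my-app", "trace_id": "bbb"},
    {"host": "host-2", "app": "my-app", "trace_id": "ccc"},
]

# Constant fields produce few streams; a high-cardinality field
# such as trace_id produces roughly one stream per log entry.
print(count_streams(sample, ["host", "app"]))   # → 2
print(count_streams(sample, ["trace_id"]))      # → 3
```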
VictoriaLogs exposes the `vl_streams_created_total` metric, which shows the number of created streams since the last VictoriaLogs restart. If this metric grows at a rapid rate over a long period of time, then there is a high chance of the high-cardinality issues mentioned above.

VictoriaLogs can log all the newly registered streams when the `-logNewStreams` command-line flag is passed to it. This can help narrow down and eliminate high-cardinality fields from log streams.
### Other fields
Every ingested log entry may contain an arbitrary number of fields in addition to `_msg` and `_time`. For example, `level`, `ip`, `user_id`, `trace_id`, etc. Such fields can be used for simplifying and optimizing search queries. It is usually faster to search over a dedicated `trace_id` field instead of searching for the `trace_id` inside a long log message. E.g. the `trace_id:="XXXX-YYYY-ZZZZ"` query usually works faster than the `_msg:"trace_id=XXXX-YYYY-ZZZZ"` query.

See LogsQL docs for more details.
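Such a query can be issued over the HTTP querying API. The sketch below only builds the request URL; the `/select/logsql/query` path comes from the VictoriaLogs querying docs, and the host and port are assumptions for a local instance:

```python
from urllib.parse import urlencode

# Assumed local VictoriaLogs instance listening on the default port.
query = 'trace_id:="XXXX-YYYY-ZZZZ"'
url = "http://localhost:9428/select/logsql/query?" + urlencode({"query": query})
print(url)
```

The matching log entries are then returned as newline-delimited JSON in response to a GET or POST of this URL.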