### Describe Your Changes
`its'` -> `its`
### Checklist
The following checks are **mandatory**:
- [x] My change adheres to the [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
Fix many spelling errors and some grammar, including misspellings in
filenames.
The change also fixes a typo in the `vm_mmaped_files` metric name, renaming it to `vm_mmapped_files`.
While this is a breaking change, the metric isn't used in alerts or dashboards,
so the impact on users should be low.
The change also deprecates `cspell` as it is much heavier and less usable.
---------
Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
This reduces the size of LogRows.streamTagCanonicals by 1/3 because of the eliminated `cap` field
in the slice header (reflect.SliceHeader) compared to the string header (reflect.StringHeader).
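For reference, the two header sizes can be compared with a short Go snippet (sizes shown are for 64-bit platforms):

```go
package main

import (
	"fmt"
	"unsafe"
)

func main() {
	// string header: pointer + length.
	fmt.Println(unsafe.Sizeof(""))          // 16
	// slice header: pointer + length + cap.
	fmt.Println(unsafe.Sizeof([]byte(nil))) // 24, i.e. 8 bytes (1/3) more
}
```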
This commit adds lib/chunkedbuffer.Buffer - an in-memory chunked buffer
optimized for random access via the MustReadAt() function.
It is better than bytesutil.ByteBuffer for storing large volumes of data,
since it stores the data in chunks of a fixed size (4KiB at the moment)
instead of using a contiguous memory region. This has the following benefits over bytesutil.ByteBuffer:
- reduced memory fragmentation
- reduced memory re-allocations when new data is written to the buffer
- reduced memory usage, since the allocated chunks can be re-used
by other Buffer instances after Buffer.Reset() call
Performance tests show up to 2x memory reduction for VictoriaLogs
when ingesting logs with a large number of fields (aka wide events) at high ingestion rates.
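A minimal sketch of the idea (this is not the actual lib/chunkedbuffer code; pooling and bounds checks are elided):

```go
package chunkedbuffer

const chunkSize = 4 * 1024 // fixed chunk size, matching the 4KiB mentioned above

// Buffer stores data in fixed-size chunks instead of one contiguous region.
type Buffer struct {
	chunks [][]byte
	size   int
}

// MustWrite appends data, allocating a new fixed-size chunk whenever the
// current one is full; existing chunks are never re-allocated or copied.
func (b *Buffer) MustWrite(data []byte) {
	for len(data) > 0 {
		if b.size%chunkSize == 0 {
			// A real implementation would take the chunk from a sync.Pool.
			b.chunks = append(b.chunks, make([]byte, chunkSize))
		}
		chunk := b.chunks[len(b.chunks)-1]
		n := copy(chunk[b.size%chunkSize:], data)
		b.size += n
		data = data[n:]
	}
}

// MustReadAt fills p with the data stored at the given offset. Locating
// the chunk is a single division, so random access stays cheap.
func (b *Buffer) MustReadAt(p []byte, offset int) {
	for len(p) > 0 {
		chunk := b.chunks[offset/chunkSize]
		n := copy(p, chunk[offset%chunkSize:])
		p = p[n:]
		offset += n
	}
}

// Reset empties the buffer. A real implementation would return the chunks
// to a sync.Pool here, so other Buffer instances could re-use them.
func (b *Buffer) Reset() {
	b.chunks = nil
	b.size = 0
}
```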
Pre-allocate the needed slice of strings and then assign items to it by index
instead of appending them. This reduces the number of memory allocations
and improves performance a bit.
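The pattern in isolation (a minimal sketch with hypothetical helper names):

```go
package example

// quoteAllAppend grows the result incrementally; append may re-allocate
// and copy the backing array several times as the slice grows.
func quoteAllAppend(items []string) []string {
	var result []string
	for _, s := range items {
		result = append(result, `"`+s+`"`)
	}
	return result
}

// quoteAllPrealloc allocates the exact size once and assigns by index,
// so the result slice is allocated exactly once.
func quoteAllPrealloc(items []string) []string {
	result := make([]string, len(items))
	for i, s := range items {
		result[i] = `"` + s + `"`
	}
	return result
}
```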
The `ignore_fields` HTTP query arg can contain prefixes ending with `*`.
For example, `ignore_fields=foo.*,bar` skips all the fields starting with `foo.`
during data ingestion.
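A minimal sketch of such an ingestion request, assuming a VictoriaLogs instance on the default localhost:9428; the stream fields and the log line are illustrative:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// One JSON log line; the `foo.debug` and `bar` fields are dropped
	// during ingestion because of the ignore_fields query arg below.
	body := strings.NewReader(`{"_msg":"user logged in","app":"demo","foo.debug":"x","bar":"y"}`)
	resp, err := http.Post(
		"http://localhost:9428/insert/jsonline?_stream_fields=app&ignore_fields=foo.*,bar",
		"application/stream+json",
		body,
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```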
Long constant fields cannot be stored in columnsHeader as a const column,
because their size exceeds maxConstColumnValueSize, so they are stored as regular values.
This commit optimizes the storage of such fields: a single value is stored per block
instead of duplicating it across all rows in the block.
This should improve data ingestion performance a bit. It should also improve query
performance when a query accesses such fields, thanks to better cache locality.
The lengths of such constant values are likewise persisted only once.
Since the `ParseDuration` and `ParseTimeMsec` funcs are used in vlogs,
vmalert, victoriametrics and other components, importing promutils only
for these funcs forces those components to export the irrelevant
`vm_rows_invalid_total{type="prometheus"}` metric.
This change removes the `vm_rows_invalid_total{type="prometheus"}` metric
from the /metrics page of these components.
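The mechanism behind the fix, sketched with a hypothetical package layout (this is not the actual promutils source): a metric created at package level is registered the moment the package is imported:

```go
package promutils

import (
	"time"

	"github.com/VictoriaMetrics/metrics"
)

// Registered as soon as the package is imported: any binary that imports
// promutils just for ParseDuration also exposes this metric on /metrics.
var rowsInvalid = metrics.NewCounter(`vm_rows_invalid_total{type="prometheus"}`)

// ParseDuration is a stand-in for the helper the other components need.
func ParseDuration(s string) (time.Duration, error) {
	return time.ParseDuration(s)
}
```

Moving the parsing helpers into a package without such package-level registrations removes the metric from components that never ingest Prometheus data.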
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Detect trivial filters such as `field:~".+"`, `field:~".*"` or `field:""`,
and replace them with faster equivalents. For example, `field:~".*"` is replaced with `*`,
while `field:~".+"` is replaced with `field:*`.
These filters can be used for selecting logs where one field value is less than another field value.
They complement the `<=` and `<` filters for constant literals.
This allows using different intervals for flushing in-memory data among different mergeset.Table instances.
The initial user of this feature is lib/logstorage.Storage, which explicitly passes Storage.flushInterval
to every created mergeset.Table instance. Previously mergeset.Table instances used a fixed
5-second flush interval, which didn't depend on Storage.flushInterval.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
Make sure that the length of a log field name generated by JSONParser.ParseLogMessage
doesn't exceed the hardcoded maxFieldNameSize limit. Flattening of nested JSON objects
stops when the resulting field name becomes longer than maxFieldNameSize; the nested JSON object
is returned as a string instead.
This should prevent parse errors when ingesting deeply nested JSON logs with long field names.
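A sketch of the flattening rule (a hypothetical helper working on decoded maps; the actual parser operates on raw JSON bytes):

```go
package main

import (
	"encoding/json"
	"fmt"
)

const maxFieldNameSize = 128 // stand-in for the hardcoded limit in lib/logstorage

// flatten converts nested objects into dot-separated field names, but stops
// as soon as the resulting name would exceed maxFieldNameSize and keeps the
// nested object as a JSON-encoded string instead.
func flatten(prefix string, obj map[string]any, out map[string]string) {
	for k, v := range obj {
		name := k
		if prefix != "" {
			name = prefix + "." + k
		}
		if child, ok := v.(map[string]any); ok && len(name) <= maxFieldNameSize {
			flatten(name, child, out)
			continue
		}
		if s, ok := v.(string); ok {
			out[name] = s
			continue
		}
		// Either a non-object leaf, or a nested object whose flattened
		// names would exceed maxFieldNameSize: store it as a string.
		b, _ := json.Marshal(v)
		out[name] = string(b)
	}
}

func main() {
	var v map[string]any
	_ = json.Unmarshal([]byte(`{"a":{"b":{"c":"d"}}}`), &v)
	out := make(map[string]string)
	flatten("", v, out)
	fmt.Println(out) // map[a.b.c:d]
}
```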
- `contains_any` selects logs with fields containing at least one word/phrase from the provided list.
The provided list can be generated by a subquery.
- `contains_all` selects logs with fields containing all the words and phrases from the provided list.
The provided list can be generated by a subquery.
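For example (the field and the listed words are illustrative):
- `_msg:contains_any("timeout", "connection reset")` selects logs whose `_msg` contains either the word `timeout` or the phrase `connection reset`.
- `_msg:contains_all("timeout", "connection reset")` selects logs whose `_msg` contains both of them.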
Examples:
- `top 5 x, y` is equivalent to `top 5 by (x, y)`
- `uniq foo, bar` is equivalent to `uniq by (foo, bar)`
- `unroll foo, bar` is equivalent to `unroll (foo, bar)`
The `_time:<=max_time` filter must include logs with timestamps matching `max_time`.
For example, `_time:<=2025-02-24Z` must include logs with timestamps until the end of February 24, 2025.
Examples:
`_time:>=2025-02-24Z` selects logs with timestamps greater than or equal to 2025-02-24 UTC
`_time:>1d` selects logs with timestamps older than one day compared to the current time
This simplifies writing queries with _time filters.
See https://docs.victoriametrics.com/victorialogs/logsql/#time-filter