-
Notifications
You must be signed in to change notification settings - Fork 1
Description
Context
PR #27 (BOR-518) and PR #28 (BOR-519) introduced raw line ingestion and an ordered event parser pipeline. Code review identified the following issues to address.
High Priority
1. Duplicate Parser interfaces and data structures (#27 + #28)
pkg/ingest/raw.go defines Parser interface + ParseResult struct, while pkg/event/parser.go defines lineParser interface + ParsedLine struct. Both do the same thing — parse a raw log line. The data structures (ParseResult vs ParsedLine) are also highly overlapping. These should be consolidated into a single parser contract.
2. Unclosed quotes cause loss of parseable fields (#28)
In pkg/event/parser.go, when a quoted value is unclosed (e.g. level=INFO msg="status ok), the logfmt parser fails. The key-value parser also rejects it (requires sawQuotedValue == false). The line falls through to prefix/plaintext, losing the already-parseable level=INFO. Consider a partial-parse fallback that preserves successfully parsed fields.
Medium Priority
3. go-errors Errorf with %w may not support unwrap (#27)
pkg/ingest/raw.go:98 uses errors.Errorf("insert raw log line: %w", err) from go-errors. Unlike fmt.Errorf, go-errors.Errorf may not support the %w verb for error wrapping, which would break errors.Is/errors.As chains. Verify or switch to fmt.Errorf.
4. No support for epoch/millisecond timestamps (#28)
parseTimestamp in pkg/event/parser.go only tries string time layouts. Numeric unix timestamps (seconds or milliseconds) like "ts": 1741644000 are common in JSON logs. After stringifyValue converts them to strings, parsing silently fails and the timestamp is lost.
5. Empty string values silently discarded (#28)
assignField in pkg/event/parser.go returns early when value == "". JSON payloads with "message": "" will have the field dropped. Empty strings are valid log field values and should be preserved.
Low Priority
6. logfmt vs key-value distinction relies solely on quoted values (#28)
The only difference between logfmt and key-value parsing is whether a quoted value was seen. A logfmt line without any quoted values (e.g. level=info ts=2026-01-01T00:00:00Z) will be classified as key-value instead of logfmt. Clarify if this is acceptable or if a better heuristic is needed.
7. Prefix parser may false-positive on level extraction (#28)
levelPrefixPattern matches any line starting with a word. A line like info this is a normal sentence will have info extracted as the log level. Consider requiring stricter patterns (e.g. uppercase, brackets, or colon suffix).
8. LogEntry missing source file identifier (#27)
FromRawLine does not populate any source/filename field on LogEntry, making it impossible to trace which file produced a given record after ingestion.
References
- PR BOR-518: add raw line ingestion fallback #27: BOR-518 raw line ingestion fallback
- PR BOR-519: Implement ordered event parser pipeline #28: BOR-519 ordered event parser pipeline