Skip to content

Consolidate parser interfaces and fix edge cases in ingest/event pipeline #29

@STRRL

Description

@STRRL

Context

PR #27 (BOR-518) and PR #28 (BOR-519) introduced raw line ingestion and an ordered event parser pipeline. Code review identified the following issues to address.

High Priority

1. Duplicate Parser interfaces and data structures (#27 + #28)

pkg/ingest/raw.go defines Parser interface + ParseResult struct, while pkg/event/parser.go defines lineParser interface + ParsedLine struct. Both do the same thing — parse a raw log line. The data structures (ParseResult vs ParsedLine) are also highly overlapping. These should be consolidated into a single parser contract.

2. Unclosed quotes cause loss of parseable fields (#28)

In pkg/event/parser.go, when a quoted value is unclosed (e.g. level=INFO msg="status ok), the logfmt parser fails. The key-value parser also rejects it (requires sawQuotedValue == false). The line falls through to prefix/plaintext, losing the already-parseable level=INFO. Consider a partial-parse fallback that preserves successfully parsed fields.

Medium Priority

3. go-errors Errorf with %w may not support unwrap (#27)

pkg/ingest/raw.go:98 uses errors.Errorf("insert raw log line: %w", err) from go-errors. Unlike fmt.Errorf, go-errors.Errorf may not support the %w verb for error wrapping, which would break errors.Is/errors.As chains. Verify or switch to fmt.Errorf.

4. No support for epoch/millisecond timestamps (#28)

parseTimestamp in pkg/event/parser.go only tries string time layouts. Numeric unix timestamps (seconds or milliseconds) like "ts": 1741644000 are common in JSON logs. After stringifyValue converts them to strings, parsing silently fails and the timestamp is lost.

5. Empty string values silently discarded (#28)

assignField in pkg/event/parser.go returns early when value == "". JSON payloads with "message": "" will have the field dropped. Empty strings are valid log field values and should be preserved.

Low Priority

6. logfmt vs key-value distinction relies solely on quoted values (#28)

The only difference between logfmt and key-value parsing is whether a quoted value was seen. A logfmt line without any quoted values (e.g. level=info ts=2026-01-01T00:00:00Z) will be classified as key-value instead of logfmt. Clarify if this is acceptable or if a better heuristic is needed.

7. Prefix parser may false-positive on level extraction (#28)

levelPrefixPattern matches any line starting with a word. A line like info this is a normal sentence will have info extracted as the log level. Consider requiring stricter patterns (e.g. uppercase, brackets, or colon suffix).

8. LogEntry missing source file identifier (#27)

FromRawLine does not populate any source/filename field on LogEntry, making it impossible to trace which file produced a given record after ingestion.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions