Summary
Build the first version of an event-native log exploration pipeline for lapp.
The core idea is to treat each log line as an event with:
- raw text payload
- extracted attributes
- inferred metadata
This issue tracks the work needed to turn plain text logs into a searchable, filterable event stream with basic pattern grouping and drilldown-friendly metadata.
Goals
- Ingest plain text logs as structured events
- Extract stable metadata from text when no structured envelope exists
- Keep raw text as the source of truth
- Separate explicit parsed attributes from inferred metadata
- Enable basic exploration through timeline, facets, and event list views
Non-goals
- Perfect semantic extraction
- Full natural-language understanding of logs
- Complex multi-entity graph modeling in v1
- Advanced query language design
Proposed event model
```json
{
  "ts": "2026-03-10T21:00:00Z",
  "text": "raw log line",
  "attrs": {
    "level": "error",
    "service": "payments-api",
    "env": "prod",
    "request_id": "req_123",
    "trace_id": "trace_456",
    "user_id": "user_789",
    "endpoint": "/checkout"
  },
  "inferred": {
    "pattern": "user <id> failed to login",
    "entity": "payments-api"
  }
}
```
Design principles
- Raw text must always be preserved
- attrs and inferred must stay separate
- Favor deterministic extraction over clever guessing
- Entity detection is a navigation aid, not ground truth
- The first version should optimize for usefulness, not completeness
Execution plan
Phase 1: Event schema
- Define the v1 event schema
- Add sample event fixtures covering JSON, logfmt, key=value, and plain text logs
- Document required vs optional fields
Phase 2: Ingestion foundation
- Implement a raw log line -> event entry point
- Ensure every line can be ingested even when parsing fails
- Preserve original text unchanged in storage
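The Phase 2 entry point could be sketched as follows. This is a minimal illustration, not an existing lapp API: the name `ingest_line` and the `parsers` argument are hypothetical. The key property is that ingestion never fails; an unparseable line still becomes an event with its raw text preserved.

```python
from typing import Any


def ingest_line(text: str, parsers: list) -> dict[str, Any]:
    """Turn one raw log line into a v1 event (hypothetical entry point)."""
    event = {
        "ts": None,        # filled by attribute extraction when present
        "text": text,      # raw text is the source of truth, stored unchanged
        "attrs": {},       # explicit parsed attributes
        "inferred": {},    # inferred metadata, kept strictly separate
    }
    for parse in parsers:
        try:
            attrs = parse(text)
        except Exception:
            continue       # a broken parser must never drop the line
        if attrs:
            event["attrs"] = attrs
            break
    return event


# Even a line no parser understands is ingested with its text intact.
event = ingest_line("!!@@ totally unstructured line", parsers=[])
```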
Phase 3: Parser pipeline
- Add JSON parser
- Add logfmt parser
- Add key=value parser
- Add regex-based prefix parser for common timestamp/level formats
- Add plain text fallback parser
- Define parser ordering and fallback behavior
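One plausible ordering is a strict-to-loose chain where the first parser to succeed wins and the plain-text fallback always succeeds. A sketch under that assumption (function names are illustrative, and the logfmt parser here is deliberately simplistic):

```python
import json
import re


def parse_json(line):
    obj = json.loads(line)
    return obj if isinstance(obj, dict) else None


def parse_logfmt(line):
    # Simplified logfmt: space-separated key=value pairs, values may be quoted.
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    if not pairs:
        return None
    return {k: v.strip('"') for k, v in pairs}


def parse_plain(line):
    return {}  # fallback: no attrs, the raw text stands alone


PARSERS = [parse_json, parse_logfmt, parse_plain]  # strict -> loose


def parse_line(line):
    for parser in PARSERS:
        try:
            attrs = parser(line)
        except Exception:
            attrs = None   # parse errors just advance to the next parser
        if attrs is not None:
            return attrs
    return {}
```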
Phase 4: Stable attribute extraction
- Extract timestamp
- Extract severity / level
- Extract service name candidates
- Extract environment candidates
- Extract request ID / trace ID / span ID / correlation ID
- Extract endpoint / route candidates when possible
- Extract user / tenant identifiers when explicitly present
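Deterministic extraction of these fields could lean on a small set of regexes, in line with the "deterministic over clever" principle. The patterns below are illustrative assumptions, not the canonical rules:

```python
import re

# Hypothetical rules; real field names and patterns depend on lapp's schema.
LEVEL_RE = re.compile(r'\b(TRACE|DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)\b', re.I)
REQUEST_ID_RE = re.compile(r'\brequest_id[=:]\s*(\S+)')


def extract_stable_attrs(text: str) -> dict:
    """Pull level and request_id out of free text when explicitly present."""
    attrs = {}
    if (m := LEVEL_RE.search(text)):
        attrs["level"] = m.group(1).lower()
    if (m := REQUEST_ID_RE.search(text)):
        attrs["request_id"] = m.group(1)
    return attrs
```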
Phase 5: Canonical normalization
- Create canonical field mappings (for example service, service_name, service.name -> attrs.service)
- Normalize severity values to a fixed enum
- Normalize environment values (production -> prod, etc.)
- Normalize endpoint values where safe
- Add tests for alias resolution and value normalization
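A sketch of the alias resolution and value normalization steps. The alias and enum tables below are partial placeholders (the canonical field schema is still an open question):

```python
# Hypothetical alias tables; actual canonical names are an open question.
FIELD_ALIASES = {
    "service": "service", "service_name": "service", "service.name": "service",
    "severity": "level", "level": "level",
    "env": "env", "environment": "env",
}
ENV_VALUES = {"production": "prod", "prod": "prod", "staging": "staging"}
LEVEL_VALUES = {"warning": "warn", "err": "error"}  # fixed enum, partial


def canonicalize(raw_attrs: dict) -> dict:
    """Map aliased keys to canonical names and normalize known values."""
    attrs = {}
    for key, value in raw_attrs.items():
        canon = FIELD_ALIASES.get(key)
        if canon is None:
            attrs[key] = value        # unknown keys pass through untouched
            continue
        if canon == "env":
            value = ENV_VALUES.get(value, value)
        elif canon == "level":
            value = LEVEL_VALUES.get(value, value)
        attrs[canon] = value
    return attrs
```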
Phase 6: Inference layer
- Implement basic pattern extraction by replacing variable tokens (numbers, UUIDs, IDs, hashes)
- Store a normalized inferred.pattern
- Implement minimal entity inference
- Use attrs.service as the primary entity when available
- Fall back to heuristic text-based inference only when necessary
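The token-replacement step of pattern extraction can be sketched as an ordered list of substitutions. The specific regexes are assumptions; the important design point is ordering, with more specific tokens (UUIDs, hashes, prefixed IDs) replaced before generic numbers:

```python
import re

# Order matters: specific tokens first so IDs are not half-rewritten by
# the generic number rule. All rules here are illustrative placeholders.
TOKEN_RULES = [
    (re.compile(r'\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b'), "<uuid>"),
    (re.compile(r'\b[0-9a-f]{16,64}\b'), "<hash>"),
    (re.compile(r'\b(?:req|trace|user)_\w+\b'), "<id>"),
    (re.compile(r'\b\d+\b'), "<num>"),
]


def extract_pattern(text: str) -> str:
    """Replace variable tokens with placeholders to form inferred.pattern."""
    for regex, placeholder in TOKEN_RULES:
        text = regex.sub(placeholder, text)
    return text
```

Grouping events by the resulting string is what collapses repeated noisy lines into one pattern.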
Phase 7: Indexing and filtering
- Support filtering by time range
- Support filtering by level
- Support filtering by service
- Support filtering by environment
- Support filtering by request ID / trace ID
- Support filtering by pattern
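The filter semantics above can be pinned down with a naive linear scan; a real index would replace this, but the sketch shows how attrs-based and inferred-based filters compose (argument names are illustrative):

```python
def filter_events(events, *, level=None, service=None, pattern=None,
                  since=None, until=None):
    """Linear-scan filter over v1 events; an index would replace this."""
    out = []
    for e in events:
        attrs = e.get("attrs", {})
        if level and attrs.get("level") != level:
            continue
        if service and attrs.get("service") != service:
            continue
        if pattern and e.get("inferred", {}).get("pattern") != pattern:
            continue
        ts = e.get("ts")
        # ISO-8601 UTC timestamps compare correctly as strings.
        if since and (ts is None or ts < since):
            continue
        if until and (ts is None or ts > until):
            continue
        out.append(e)
    return out
```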
Phase 8: Minimal exploration UI
- Build a timeline view showing event counts over time
- Build a facet panel for top values (level, service, env, pattern)
- Build an event list view showing raw text and extracted metadata
- Add click-to-filter interactions from facets and event rows
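Behind the facet panel, the top-values computation is a straightforward count over one field. A sketch (in practice this would query the index rather than scan events):

```python
from collections import Counter


def facet_counts(events, field, top=5):
    """Top values for one attrs facet, e.g. level, service, or env."""
    values = (e.get("attrs", {}).get(field) for e in events)
    return Counter(v for v in values if v is not None).most_common(top)
```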
Phase 9: Quality and observability
- Record extraction source for each parsed field where useful
- Measure parser hit rates
- Measure missing-field rates for timestamp, level, and service
- Add fixture-based tests for common real-world log shapes
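Parser hit rates and missing-field rates could be computed from the events themselves, assuming each parsed event records which parser produced it. The `_source` bookkeeping field below is a hypothetical convention, not part of the v1 schema above:

```python
from collections import Counter


def measure_parser_hits(events):
    """Count parser hits (via a hypothetical attrs['_source'] field) and
    missing-field rates for the core ts/level/service fields."""
    hits = Counter(e.get("attrs", {}).get("_source", "fallback") for e in events)
    missing = Counter()
    for e in events:
        for field in ("ts", "level", "service"):
            value = e.get(field) if field == "ts" else e.get("attrs", {}).get(field)
            if not value:
                missing[field] += 1
    return hits, missing
```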
Suggested milestone split
Milestone 1: Ingestion + schema
- Event schema
- Raw ingestion path
- Parser pipeline scaffold
Milestone 2: Basic extraction
- Timestamp
- Level
- Service
- Request / trace identifiers
- Canonical normalization
Milestone 3: Usable exploration
- Pattern extraction
- Basic indexing
- Timeline + facets + event list
- Click-to-filter
Acceptance criteria for v1
- A plain text log line can always be ingested as an event
- Common structured log formats can populate attrs
- Users can filter events by time, level, service, and pattern
- Users can inspect raw text alongside extracted metadata
- Pattern grouping works well enough to reduce repeated noisy lines
Open questions
- What is the canonical field schema for lapp beyond the v1 core fields?
- Should inferred fields carry confidence scores in v1 or wait until v2?
- What storage/index model is best for raw text + extracted attrs + inferred fields?
- Should request/trace correlation be part of v1 or follow immediately after?