Tracking: event-native log ingestion and exploration foundation #25

@STRRL

Description

Summary

Build the first version of an event-native log exploration pipeline for lapp.

The core idea is to treat each log line as an event with:

  • raw text payload
  • extracted attributes
  • inferred metadata

This issue tracks the work needed to turn plain text logs into a searchable, filterable event stream with basic pattern grouping and drilldown-friendly metadata.

Goals

  • Ingest plain text logs as structured events
  • Extract stable metadata from text when no structured envelope exists
  • Keep raw text as the source of truth
  • Separate explicit parsed attributes from inferred metadata
  • Enable basic exploration through timeline, facets, and event list views

Non-goals

  • Perfect semantic extraction
  • Full natural-language understanding of logs
  • Complex multi-entity graph modeling in v1
  • Advanced query language design

Proposed event model

{
  "ts": "2026-03-10T21:00:00Z",
  "text": "raw log line",
  "attrs": {
    "level": "error",
    "service": "payments-api",
    "env": "prod",
    "request_id": "req_123",
    "trace_id": "trace_456",
    "user_id": "user_789",
    "endpoint": "/checkout"
  },
  "inferred": {
    "pattern": "user <id> failed to login",
    "entity": "payments-api"
  }
}

Design principles

  • Raw text must always be preserved
  • attrs and inferred must stay separate
  • Favor deterministic extraction over clever guessing
  • Entity detection is a navigation aid, not ground truth
  • The first version should optimize for usefulness, not completeness

Execution plan

Phase 1: Event schema

  • Define the v1 event schema
  • Add sample event fixtures covering JSON, logfmt, key=value, and plain text logs
  • Document required vs optional fields

Phase 2: Ingestion foundation

  • Implement a raw log line -> event entry point
  • Ensure every line can be ingested even when parsing fails
  • Preserve original text unchanged in storage
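One way to guarantee the "every line can be ingested" property is to make event construction infallible and treat parsing as best-effort. A sketch (Python for illustration; the inline JSON attempt stands in for the full parser pipeline):

```python
import json


def ingest(line: str) -> dict:
    """Every raw line becomes an event; a failed parse never drops the line."""
    event = {"ts": None, "text": line, "attrs": {}, "inferred": {}}
    try:
        parsed = json.loads(line)
        if isinstance(parsed, dict):
            event["attrs"] = parsed
    except ValueError:
        pass  # plain text fallback: the raw line alone is a valid event
    return event
```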

Phase 3: Parser pipeline

  • Add JSON parser
  • Add logfmt parser
  • Add key=value parser
  • Add regex-based prefix parser for common timestamp/level formats
  • Add plain text fallback parser
  • Define parser ordering and fallback behavior
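The ordering/fallback behavior could be as simple as a most-specific-first chain where each parser returns `None` on a miss and the plain text parser always matches. A sketch with three of the planned parsers (parser names and the single-regex logfmt/key=value handling are illustrative, not a spec):

```python
import json
import re


def parse_json(line):
    try:
        obj = json.loads(line)
        return obj if isinstance(obj, dict) else None
    except ValueError:
        return None


def parse_logfmt(line):
    # Covers both logfmt and bare key=value pairs; quoted values keep spaces.
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    if not pairs:
        return None
    return {k: v.strip('"') for k, v in pairs}


def parse_plain(line):
    return {}  # fallback: no attrs, raw text is the whole event


PARSERS = [parse_json, parse_logfmt, parse_plain]


def parse(line: str) -> dict:
    """Try parsers most-specific-first; the plain text fallback always matches."""
    for parser in PARSERS:
        attrs = parser(line)
        if attrs is not None:
            return attrs
    return {}
```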

Phase 4: Stable attribute extraction

  • Extract timestamp
  • Extract severity / level
  • Extract service name candidates
  • Extract environment candidates
  • Extract request ID / trace ID / span ID / correlation ID
  • Extract endpoint / route candidates when possible
  • Extract user / tenant identifiers when explicitly present
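For identifiers that are explicitly labeled in the text, deterministic regexes are likely enough for v1. A sketch (the patterns below are placeholders; real sources would need per-format tuning):

```python
import re

# Hypothetical patterns for explicitly labeled identifiers.
ID_PATTERNS = {
    "request_id": re.compile(r"\b(?:request_id|req_id)[=:]\s*(\S+)"),
    "trace_id": re.compile(r"\btrace_id[=:]\s*(\S+)"),
    "user_id": re.compile(r"\buser_id[=:]\s*(\S+)"),
}


def extract_ids(text: str) -> dict:
    """Pull explicitly labeled identifiers out of raw text; never guess unlabeled ones."""
    found = {}
    for key, pattern in ID_PATTERNS.items():
        m = pattern.search(text)
        if m:
            found[key] = m.group(1).rstrip(",;")
    return found
```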

Phase 5: Canonical normalization

  • Create canonical field mappings (for example service, service_name, service.name -> attrs.service)
  • Normalize severity values to a fixed enum
  • Normalize environment values (production -> prod, etc.)
  • Normalize endpoint values where safe
  • Add tests for alias resolution and value normalization
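The alias resolution and value normalization could both live in one canonicalization pass. A sketch (the alias tables below are examples, not the final canonical field list, which is still an open question):

```python
# Example alias tables; the real canonical field list is TBD.
FIELD_ALIASES = {
    "service": "service", "service_name": "service", "service.name": "service",
    "severity": "level", "level": "level", "lvl": "level",
    "environment": "env", "env": "env",
}

LEVELS = {"trace", "debug", "info", "warn", "error", "fatal"}
LEVEL_ALIASES = {"warning": "warn", "err": "error", "critical": "fatal"}
ENV_ALIASES = {"production": "prod", "staging": "stage", "development": "dev"}


def canonicalize(attrs: dict) -> dict:
    """Map field aliases to canonical names and normalize enum-like values."""
    out = {}
    for key, value in attrs.items():
        canon = FIELD_ALIASES.get(key, key)
        if canon == "level":
            v = str(value).lower()
            value = LEVEL_ALIASES.get(v, v)
            if value not in LEVELS:
                continue  # drop unrecognized severities rather than guess
        elif canon == "env":
            v = str(value).lower()
            value = ENV_ALIASES.get(v, v)
        out[canon] = value
    return out
```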

Phase 6: Inference layer

  • Implement basic pattern extraction by replacing variable tokens (numbers, UUIDs, IDs, hashes)
  • Store a normalized inferred.pattern
  • Implement minimal entity inference
  • Use attrs.service as the primary entity when available
  • Fall back to heuristic text-based inference only when necessary
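The token-replacement step above can be an ordered list of regex rules, applied most-specific-first so UUIDs and hashes are not shredded by the bare-number rule. A sketch (rules are illustrative; real corpora would need more):

```python
import re

# Order matters: replace specific tokens (UUIDs, hashes) before bare numbers.
TOKEN_RULES = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<uuid>"),
    (re.compile(r"\b[0-9a-f]{16,64}\b"), "<hash>"),
    (re.compile(r"\b\w+_[0-9]+\b"), "<id>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]


def extract_pattern(text: str) -> str:
    """Replace variable tokens so repeated lines collapse into one pattern."""
    for pattern, placeholder in TOKEN_RULES:
        text = pattern.sub(placeholder, text)
    return text
```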

Phase 7: Indexing and filtering

  • Support filtering by time range
  • Support filtering by level
  • Support filtering by service
  • Support filtering by environment
  • Support filtering by request ID / trace ID
  • Support filtering by pattern
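Before committing to a storage/index model, the filter semantics can be pinned down as a simple AND-combined predicate over one event. A sketch (assumes `ts` is stored as normalized UTC ISO 8601, so lexicographic string comparison matches chronological order):

```python
def match(event: dict, *, level=None, service=None, pattern=None,
          since=None, until=None) -> bool:
    """AND-combined filters over one event; None means 'no constraint'."""
    attrs = event.get("attrs", {})
    if level and attrs.get("level") != level:
        return False
    if service and attrs.get("service") != service:
        return False
    if pattern and event.get("inferred", {}).get("pattern") != pattern:
        return False
    ts = event.get("ts")
    # Normalized UTC ISO 8601 strings sort chronologically.
    if since and (ts is None or ts < since):
        return False
    if until and (ts is None or ts > until):
        return False
    return True
```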

Phase 8: Minimal exploration UI

  • Build a timeline view showing event counts over time
  • Build a facet panel for top values (level, service, env, pattern)
  • Build an event list view showing raw text and extracted metadata
  • Add click-to-filter interactions from facets and event rows
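The timeline view only needs per-bucket counts on the backend. A sketch of fixed-width bucketing over ISO 8601 timestamps (bucket width and the string-truncation approach are illustrative choices, not a design decision):

```python
from collections import Counter


def timeline_counts(events, bucket_minutes=5):
    """Count events per fixed time bucket; events without a timestamp are skipped."""
    counts = Counter()
    for e in events:
        ts = e.get("ts")
        if not ts:
            continue
        # Truncate "2026-03-10T21:07:00Z" to its bucket start, e.g. "2026-03-10T21:05".
        head = ts[:16]                       # "2026-03-10T21:07"
        minute = int(head[14:16])
        bucket_start = (minute // bucket_minutes) * bucket_minutes
        counts[head[:14] + f"{bucket_start:02d}"] += 1
    return counts
```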

Phase 9: Quality and observability

  • Record extraction source for each parsed field where useful
  • Measure parser hit rates
  • Measure missing-field rates for timestamp, level, and service
  • Add fixture-based tests for common real-world log shapes

Suggested milestone split

Milestone 1: Ingestion + schema

  • Event schema
  • Raw ingestion path
  • Parser pipeline scaffold

Milestone 2: Basic extraction

  • Timestamp
  • Level
  • Service
  • Request / trace identifiers
  • Canonical normalization

Milestone 3: Usable exploration

  • Pattern extraction
  • Basic indexing
  • Timeline + facets + event list
  • Click-to-filter

Acceptance criteria for v1

  • A plain text log line can always be ingested as an event
  • Common structured log formats can populate attrs
  • Users can filter events by time, level, service, and pattern
  • Users can inspect raw text alongside extracted metadata
  • Pattern grouping works well enough to reduce repeated noisy lines

Open questions

  • What is the canonical field schema for lapp beyond the v1 core fields?
  • Should inferred fields carry confidence scores in v1 or wait until v2?
  • What storage/index model is best for raw text + extracted attrs + inferred fields?
  • Should request/trace correlation be part of v1 or follow immediately after?
