Merged
4 changes: 4 additions & 0 deletions README.md
@@ -58,6 +58,10 @@ Workspace Notes / Analyze
| `workspace add-log --topic <topic> <file>` | Add log file and rebuild patterns/notes |
| `workspace analyze --topic <topic> [question]` | Run AI analysis (`--acp claude|codex|gemini`) |

## Event Schema

The initial normalized event contract is defined in [proto/lapp/event/v1/event.proto](proto/lapp/event/v1/event.proto) and documented in [docs/event-schema-v1.md](docs/event-schema-v1.md). Representative fixtures live under `fixtures/events/v1/` for JSON, logfmt, `key=value`, and plain text logs.

## Development

```bash
102 changes: 102 additions & 0 deletions docs/event-schema-v1.md
@@ -0,0 +1,102 @@
# Event Schema v1

`lapp` treats every parsed log record as a normalized event with three concerns kept separate:

- `text`: the raw source line that came in
- `attrs`: structured attributes parsed directly from the source
- `inferred`: metadata synthesized after parsing, such as a generalized pattern or owning entity

This keeps ingestion lossless while giving downstream steps a stable shape to work with even when log formats vary.

The canonical schema definition lives in `proto/lapp/event/v1/event.proto`. This document explains how to use that schema and how the JSON fixtures map onto it.

## Canonical Shape

### Protobuf

```proto
message Event {
  google.protobuf.Timestamp ts = 1;
  string text = 2;
  map<string, string> attrs = 3;
  Inferred inferred = 4;
}
```

```proto
message Fixture {
  string name = 1;
  SourceFormat source_format = 2;
  string description = 3;
  Event event = 4;
}
```

### JSON Fixture Encoding

```json
{
  "ts": "2026-03-10T21:00:00Z",
  "text": "ts=2026-03-10T21:00:00Z level=info service=auth-api request_id=req_123 msg=\"user user_456 authenticated\"",
  "attrs": {
    "level": "info",
    "service": "auth-api",
    "request_id": "req_123",
    "msg": "user user_456 authenticated"
  },
  "inferred": {
    "pattern": "user <*> authenticated",
    "entity": "auth-api"
  }
}
```
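
For readers consuming this encoding from Go, the following is a minimal sketch of decoding one event. The `event` struct and `decodeEvent` helper are hypothetical names used only for illustration, and `inferred` is modeled as a plain string map for brevity; this is not the project's own type definition.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// event is a hypothetical local mirror of the JSON encoding shown above.
type event struct {
	TS       *time.Time        `json:"ts,omitempty"` // nil when the source has no timestamp
	Text     string            `json:"text"`
	Attrs    map[string]string `json:"attrs"`
	Inferred map[string]string `json:"inferred"`
}

// decodeEvent parses one JSON-encoded event.
func decodeEvent(data []byte) (event, error) {
	var e event
	if err := json.Unmarshal(data, &e); err != nil {
		return event{}, fmt.Errorf("decode event: %w", err)
	}
	return e, nil
}

func main() {
	raw := []byte(`{
		"ts": "2026-03-10T21:00:00Z",
		"text": "level=info service=auth-api msg=\"user user_456 authenticated\"",
		"attrs": {"level": "info", "service": "auth-api"},
		"inferred": {"pattern": "user <*> authenticated", "entity": "auth-api"}
	}`)
	e, err := decodeEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(e.TS.UTC().Format(time.RFC3339), e.Attrs["level"], e.Inferred["pattern"])
}
```

Using a pointer for `ts` lets a decoder tell "no timestamp" apart from the zero time, which matters because the field is optional.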

## Top-Level Fields

| Field | Type | Required | Notes |
|---|---|---|---|
| `ts` | RFC3339 timestamp string | No | Optional because plain text logs may not expose a trustworthy timestamp. |
| `text` | string | Yes | Raw log line, preserved verbatim as the source of truth. |
| `attrs` | object of string to string | Yes | Parsed key/value attributes extracted directly from the log line. Use `{}` when nothing can be extracted. |
| `inferred` | object | Yes | Metadata derived from parsing or later enrichment. Use `{}` when nothing is inferred yet. |
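
One practical consequence of the `{}` rule for Go emitters: a nil map marshals as `null`, so `attrs` needs to be initialized explicitly. The sketch below shows one way to guarantee that; the `event` and `inferred` types and the `newEvent` and `encode` helpers are hypothetical and exist only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// inferred and event are hypothetical local types for this sketch.
type inferred struct {
	Pattern string `json:"pattern,omitempty"`
	Entity  string `json:"entity,omitempty"`
}

type event struct {
	TS       *time.Time        `json:"ts,omitempty"`
	Text     string            `json:"text"`
	Attrs    map[string]string `json:"attrs"`
	Inferred inferred          `json:"inferred"`
}

// newEvent returns an event whose attrs map is non-nil, so it encodes
// as {} rather than null when nothing was extracted.
func newEvent(text string) event {
	return event{Text: text, Attrs: map[string]string{}}
}

// encode marshals an event to its JSON string form.
func encode(e event) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", fmt.Errorf("encode event: %w", err)
	}
	return string(b), nil
}

func main() {
	out, err := encode(newEvent("ERROR worker pool stalled"))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Here `inferred` is a struct value with `omitempty` fields, so an event with nothing inferred still encodes the key as an empty object.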

## Parsed Attributes

`attrs` stays intentionally flat in v1. Values are strings so the schema remains stable across JSON, logfmt, `key=value`, and plain text sources.
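
As an illustration of the string-only rule, here is a rough sketch of flattening a JSON log line into `attrs`-style values. The `flattenAttrs` helper is hypothetical, and two behaviors in it are assumptions that this document does not specify: `null` values are skipped, and nested objects or arrays are kept as compact JSON text.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strconv"
)

// flattenAttrs decodes a JSON object and renders each top-level value as a string.
func flattenAttrs(line []byte) (map[string]string, error) {
	dec := json.NewDecoder(bytes.NewReader(line))
	dec.UseNumber() // keep numbers as written instead of converting to float64

	var raw map[string]any
	if err := dec.Decode(&raw); err != nil {
		return nil, fmt.Errorf("decode line: %w", err)
	}

	attrs := make(map[string]string, len(raw))
	for key, value := range raw {
		switch v := value.(type) {
		case nil:
			continue // assumption: null values are dropped
		case string:
			attrs[key] = v
		case json.Number:
			attrs[key] = v.String()
		case bool:
			attrs[key] = strconv.FormatBool(v)
		default:
			// Assumption: nested values are preserved as compact JSON text.
			encoded, err := json.Marshal(v)
			if err != nil {
				return nil, fmt.Errorf("encode %q: %w", key, err)
			}
			attrs[key] = string(encoded)
		}
	}
	return attrs, nil
}

func main() {
	attrs, err := flattenAttrs([]byte(`{"level":"error","status":502,"retry":true}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(attrs)
}
```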

Recommended canonical keys when they can be recovered confidently:

| Key | Required | Meaning |
|---|---|---|
| `level` | No | Severity such as `debug`, `info`, `warn`, or `error`. |
| `service` | No | Service, worker, or subsystem name. |
| `env` | No | Deployment environment such as `prod` or `staging`. |
| `request_id` | No | Request-scoped identifier. |
| `trace_id` | No | Distributed trace identifier. |
| `span_id` | No | Distributed tracing span identifier. |
| `correlation_id` | No | Cross-system correlation token when `request_id` is not the right semantic fit. |
| `user_id` | No | User identifier present in the source line. |
| `endpoint` | No | HTTP or RPC target when available. |
| `method` | No | HTTP or RPC verb when available. |

Additional keys are allowed when they represent source fields that are useful to preserve.
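
The `key-value-retry.json` fixture shows source keys such as `severity`, `service_name`, and `environment` being mapped onto the canonical keys above, with the level lowercased. A minimal sketch of that kind of mapping follows; the `aliases` table and the `canonicalize` helper are hypothetical, and the rule that an existing canonical key wins over an alias is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// aliases maps source spellings to canonical keys.
// Hypothetical table, based only on the spellings seen in the fixtures.
var aliases = map[string]string{
	"severity":     "level",
	"service_name": "service",
	"environment":  "env",
}

// canonicalize returns a copy of src with aliased keys renamed.
func canonicalize(src map[string]string) map[string]string {
	out := make(map[string]string, len(src))

	// First pass: keys that are not aliases are copied unchanged.
	for key, value := range src {
		if _, isAlias := aliases[key]; !isAlias {
			out[key] = value
		}
	}
	// Second pass: aliases fill canonical keys that are still missing.
	// Assumption: a canonical key already present in the source wins.
	for key, value := range src {
		if canon, isAlias := aliases[key]; isAlias {
			if _, taken := out[canon]; !taken {
				out[canon] = value
			}
		}
	}
	// Severity values are lowercased, as in the fixtures (WARN -> warn).
	if level, ok := out["level"]; ok {
		out["level"] = strings.ToLower(level)
	}
	return out
}

func main() {
	fmt.Println(canonicalize(map[string]string{
		"severity":       "WARN",
		"service_name":   "billing-worker",
		"environment":    "prod",
		"correlation_id": "corr_123",
	}))
}
```

Doing the rename in two passes keeps the result deterministic even though Go map iteration order is not.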

## Inferred Metadata

`inferred` is reserved for values that are not copied verbatim from the source.

| Key | Required | Meaning |
|---|---|---|
| `pattern` | No | Generalized event template such as `user <*> authenticated`. |
| `entity` | No | Owning component, domain object, or actor inferred from context. |
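
To make the `<*>` template notation concrete, here is a deliberately naive sketch that replaces any whitespace-separated token containing a digit with `<*>`. The `generalize` helper is hypothetical and is not how `lapp` derives patterns; real template mining also generalizes variable tokens without digits, such as the queue name in the plain text fixture.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// generalize replaces every token that contains a digit with the <*> wildcard.
// Naive heuristic for illustration only.
func generalize(msg string) string {
	tokens := strings.Fields(msg)
	for i, tok := range tokens {
		if strings.IndexFunc(tok, unicode.IsDigit) >= 0 {
			tokens[i] = "<*>"
		}
	}
	return strings.Join(tokens, " ")
}

func main() {
	fmt.Println(generalize("user user_456 authenticated"))
	fmt.Println(generalize("checkout failed for user user_789"))
}
```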

## Fixture Coverage

Representative fixtures live in `fixtures/events/v1/`:

- `json-checkout-failure.json`
- `logfmt-auth-success.json`
- `key-value-retry.json`
- `plain-text-worker-stall.json`

Each fixture wraps a normalized event with `name`, `source_format`, and `description` metadata. This allows future parser and schema tests to consume them directly.
22 changes: 22 additions & 0 deletions fixtures/events/v1/json-checkout-failure.json
@@ -0,0 +1,22 @@
{
  "name": "json_checkout_failure",
  "source_format": "json",
  "description": "Structured JSON log with canonical request metadata and inferred service ownership.",
  "event": {
    "ts": "2026-03-10T21:00:00Z",
    "text": "{\"ts\":\"2026-03-10T21:00:00Z\",\"level\":\"ERROR\",\"service\":\"payments-api\",\"env\":\"prod\",\"request_id\":\"req_123\",\"trace_id\":\"trace_456\",\"endpoint\":\"/checkout\",\"message\":\"checkout failed for user user_789\"}",
    "attrs": {
      "level": "error",
      "service": "payments-api",
      "env": "prod",
      "request_id": "req_123",
      "trace_id": "trace_456",
      "endpoint": "/checkout",
      "message": "checkout failed for user user_789"
    },
    "inferred": {
      "pattern": "checkout failed for user <*>",
      "entity": "payments-api"
    }
  }
}
22 changes: 22 additions & 0 deletions fixtures/events/v1/key-value-retry.json
@@ -0,0 +1,22 @@
{
  "name": "key_value_retry",
  "source_format": "key_value",
  "description": "Space-delimited key=value log line normalized into canonical service and correlation fields.",
  "event": {
    "ts": "2026-03-10T21:02:45Z",
    "text": "timestamp=2026-03-10T21:02:45Z severity=WARN service_name=billing-worker environment=prod correlation_id=corr_123 tenant_id=tenant_456 action=retrying charge_id=ch_789",
    "attrs": {
      "level": "warn",
      "service": "billing-worker",
      "env": "prod",
      "correlation_id": "corr_123",
      "tenant_id": "tenant_456",
      "action": "retrying",
      "charge_id": "ch_789"
    },
    "inferred": {
      "pattern": "retrying charge <*>",
      "entity": "billing-worker"
    }
  }
}
23 changes: 23 additions & 0 deletions fixtures/events/v1/logfmt-auth-success.json
@@ -0,0 +1,23 @@
{
  "name": "logfmt_auth_success",
  "source_format": "logfmt",
  "description": "logfmt line with request and trace identifiers plus a normalized auth pattern.",
  "event": {
    "ts": "2026-03-10T21:01:12Z",
    "text": "ts=2026-03-10T21:01:12Z level=INFO service=auth-api env=staging request_id=req_456 trace_id=trace_789 method=POST endpoint=/login msg=\"user user_123 authenticated\"",
    "attrs": {
      "level": "info",
      "service": "auth-api",
      "env": "staging",
      "request_id": "req_456",
      "trace_id": "trace_789",
      "method": "POST",
      "endpoint": "/login",
      "msg": "user user_123 authenticated"
    },
    "inferred": {
      "pattern": "user <*> authenticated",
      "entity": "auth-api"
    }
  }
}
15 changes: 15 additions & 0 deletions fixtures/events/v1/plain-text-worker-stall.json
@@ -0,0 +1,15 @@
{
  "name": "plain_text_worker_stall",
  "source_format": "plain_text",
  "description": "Unstructured plain text log that keeps the raw line and only a small amount of inferred metadata.",
  "event": {
    "text": "ERROR worker pool stalled after 3 retries while draining queue payments",
    "attrs": {
      "level": "error"
    },
    "inferred": {
      "pattern": "worker pool stalled after <*> retries while draining queue <*>",
      "entity": "worker-pool"
    }
  }
}
118 changes: 118 additions & 0 deletions pkg/event/v1.go
@@ -0,0 +1,118 @@
package event

import (
	"encoding/json"
	"os"
	"path/filepath"
	"sort"
	"strings"
	"time"

	goerrors "github.com/go-errors/errors"
)

const (
	SourceFormatJSON      = "json"
	SourceFormatLogfmt    = "logfmt"
	SourceFormatKeyValue  = "key_value"
	SourceFormatPlainText = "plain_text"
)

var allowedSourceFormats = map[string]struct{}{
	SourceFormatJSON:      {},
	SourceFormatLogfmt:    {},
	SourceFormatKeyValue:  {},
	SourceFormatPlainText: {},
}

// Event mirrors the canonical v1 schema in proto/lapp/event/v1/event.proto.
type Event struct {
	Timestamp *time.Time        `json:"ts,omitempty"`
	Text      string            `json:"text"`
	Attrs     map[string]string `json:"attrs"`
	Inferred  *Inferred         `json:"inferred"`
}

// Inferred contains metadata derived after parsing.
type Inferred struct {
	Pattern string `json:"pattern,omitempty"`
	Entity  string `json:"entity,omitempty"`
}

// Fixture mirrors the protobuf fixture contract for JSON-backed examples.
type Fixture struct {
	Name         string `json:"name"`
	SourceFormat string `json:"source_format"`
	Description  string `json:"description"`
	Event        Event  `json:"event"`
}

// Validate checks that the fixture satisfies the documented v1 contract.
func (f Fixture) Validate() error {
	if strings.TrimSpace(f.Name) == "" {
		return goerrors.New("name is required")
	}
	if strings.TrimSpace(f.Description) == "" {
		return goerrors.New("description is required")
	}
	if _, ok := allowedSourceFormats[f.SourceFormat]; !ok {
		return goerrors.Errorf("validate source_format: invalid format %q, must be one of %s", f.SourceFormat, strings.Join(allowedSourceFormatNames(), ", "))
	}
	if strings.TrimSpace(f.Event.Text) == "" {
		return goerrors.New("event.text is required")
	}
	if f.Event.Attrs == nil {
		return goerrors.New("event.attrs is required")
	}
	if f.Event.Inferred == nil {
		return goerrors.New("event.inferred is required")
	}
	for key := range f.Event.Attrs {
		if strings.TrimSpace(key) == "" {
			return goerrors.New("event.attrs contains an empty key")
		}
	}
	return nil
}

func allowedSourceFormatNames() []string {
	formats := make([]string, 0, len(allowedSourceFormats))
	for format := range allowedSourceFormats {
		formats = append(formats, format)
	}
	sort.Strings(formats)
	return formats
}

// LoadFixtures reads all JSON fixture files from a directory.
func LoadFixtures(dir string) ([]Fixture, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, goerrors.Errorf("read fixtures dir: %w", err)
	}

	fixtures := make([]Fixture, 0, len(entries))
	for _, entry := range entries {
		if entry.IsDir() || filepath.Ext(entry.Name()) != ".json" {
			continue
		}

		path := filepath.Join(dir, entry.Name())
		data, err := os.ReadFile(path)
		if err != nil {
			return nil, goerrors.Errorf("read fixture %s: %w", path, err)
		}

		var fixture Fixture
		if err := json.Unmarshal(data, &fixture); err != nil {
			return nil, goerrors.Errorf("decode fixture %s: %w", path, err)
		}
		if err := fixture.Validate(); err != nil {
			return nil, goerrors.Errorf("validate fixture %s: %w", path, err)
		}

		fixtures = append(fixtures, fixture)
	}

	return fixtures, nil
}
36 changes: 36 additions & 0 deletions pkg/event/v1_test.go
@@ -0,0 +1,36 @@
package event

import (
	"path/filepath"
	"testing"
)

func TestLoadFixtures(t *testing.T) {
	dir := filepath.Join("..", "..", "fixtures", "events", "v1")
	required := []string{
		SourceFormatJSON,
		SourceFormatLogfmt,
		SourceFormatKeyValue,
		SourceFormatPlainText,
	}

	fixtures, err := LoadFixtures(dir)
	if err != nil {
		t.Fatalf("LoadFixtures: %v", err)
	}

	if len(fixtures) != len(required) {
		t.Fatalf("expected %d fixtures, got %d", len(required), len(fixtures))
	}

	seen := make(map[string]bool, len(required))
	for _, fixture := range fixtures {
		seen[fixture.SourceFormat] = true
	}

	for _, format := range required {
		if !seen[format] {
			t.Fatalf("missing fixture for %s", format)
		}
	}
}