encoding: streaming-first Go deserialization redesign #5
trippwill merged 30 commits into redesign/deserialization from
Streaming-first deserialization architecture with conceptual principles, pseudocode sketches, and concrete API designs for Go 1.25 and .NET 10/C# 14. Covers: 4-tier architecture (events, statement reader, materialization, custom converters), scanner-pattern iteration, scoped value readers, policy configuration, and cross-cutting comparison. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the entire Go deserialization surface with a streaming-first architecture based on design/deserialization-design.md.

New API (Tier 1 - streaming):
- StatementReader: pull-based statement iterator wrapping Decoder
- Statements() iter.Seq[Statement]: scanner-pattern statement iteration
- ReadValue[T](): generic value deserialization from the event stream
- PackItems[T]() iter.Seq[T]: pack element iteration with early-break drain

New API (Tier 2 - materialization):
- UnmarshalNew[T](): generic whole-unit deserialization (sugar over Tier 1)
- UnmarshalNewFrom[T](): reader-based variant
- UnmarshalNewInto[T](): buffer-reuse variant

New API (Tier 3 - custom converters):
- ValueConverter[T] interface with FromPakt/ToPakt
- ValueReader: scoped stream view for converter implementations
- RegisterConverter[T]() / RegisterNamedConverter(): converter registration
- ReadAs[T](): delegated child deserialization for converter composition

New API (policies):
- Option type with UnknownFields, MissingFields, Duplicates policies
- DeserializeError with statement/field context

Removed:
- Unmarshal(data, &v): replaced by generic UnmarshalNew[T]
- Decoder.UnmarshalNext / Decoder.More: replaced by StatementReader
- Decoder.SetSpec / Spec / ParseSpec: spec projection deferred
- unmarshal_visitor.go, reader_reflect.go: replaced by event-based reading
- CLI --spec flag: spec projection deferred

A Type field (*Type, populated on statement-start events) is added to the Event struct to carry type annotations through the event stream.

All tests pass with -race. golangci-lint clean.
Deploying with
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! | pakt | 9ec9157 | Commit Preview URL, Branch Preview URL | Apr 13 2026, 03:02 AM |
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## redesign/deserialization #5 +/- ##
============================================================
+ Coverage 70.01% 72.80% +2.78%
============================================================
Files 40 45 +5
Lines 7074 6835 -239
Branches 463 463
============================================================
+ Hits 4953 4976 +23
+ Misses 1586 1372 -214
+ Partials 535 487 -48
Flags with carried forward coverage won't be shown.
Add IterStructFields, ListElements, MapEntries, TupleElements free functions for iterating composite values at the StatementReader level. These enable manual traversal of struct fields, list items, map pairs, and tuple elements without full deserialization into a Go type.
…tion Free the StructFields name for the navigation helper that iterates struct fields from the event stream. The reflection-based type introspection function is now ReflectStructFields.
New benchmark domain: a trade execution log with portfolio positions. It features a map pack (positions by ticker), an embedded composite (a tags list inside the trade struct), and a high proportion of non-string values (int, dec, bool, ts, uuid, atom set). Benchmarks: Decode, Unmarshal, and PackItems at both 1K and 10K, each with a JSON counterpart for comparison.
Add BenchmarkJSONStreamFin1K/10K using json.Decoder over NDJSON as the streaming counterpart to BenchmarkPAKTPackIterFin1K/10K.
Rename PackIter → Stream for consistency. All streaming benchmarks now follow the naming pattern {PAKT|JSON}{Decode|Unmarshal|Marshal|Encode|Stream}{FS|Fin}{1K|10K}.
Change Event.Value from string to []byte. The returned slice is borrowed from the reader's internal buffer and is valid only until the next Decode() call, matching the bufio.Scanner.Bytes() contract. This eliminates per-scalar string allocation in the event stream.

Callers that collect events across Decode calls must clone the Value (slices.Clone). ReadValue/PackItems handle this automatically since they consume events immediately. Added Event.ValueString() and Event.IsNilValue() convenience methods. Custom MarshalJSON/UnmarshalJSON encode Value as a JSON string rather than base64 (encoding/json would otherwise base64-encode a []byte). Reader gains a reusable valBuf for scalar value bytes.
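The borrow contract mirrors bufio.Scanner.Bytes: each call may overwrite the bytes returned by the previous one. The sketch below uses a hypothetical lineReader (not the package's Reader) to show why callers that keep values across calls need slices.Clone, and what happens without it.

```go
package main

import "slices"

// lineReader is a hypothetical reader that, like bufio.Scanner.Bytes,
// returns a slice borrowed from an internal buffer reused on every call.
type lineReader struct {
	lines [][]byte
	pos   int
	buf   []byte // reused across calls to Next
}

// Next returns the next value, valid only until the following Next call.
func (r *lineReader) Next() ([]byte, bool) {
	if r.pos >= len(r.lines) {
		return nil, false
	}
	r.buf = append(r.buf[:0], r.lines[r.pos]...)
	r.pos++
	return r.buf, true
}

// collect retains every value, cloning each borrowed slice first.
func collect(r *lineReader) [][]byte {
	var out [][]byte
	for v, ok := r.Next(); ok; v, ok = r.Next() {
		out = append(out, slices.Clone(v))
	}
	return out
}

// collectUnsafe keeps the borrowed slices directly, which is the mistake
// the contract warns about: the entries alias the same reused buffer.
func collectUnsafe(r *lineReader) [][]byte {
	var out [][]byte
	for v, ok := r.Next(); ok; v, ok = r.Next() {
		out = append(out, v)
	}
	return out
}
```

With two equal-length inputs "ab" and "cd", collect returns both values intact, while every entry from collectUnsafe reads "cd" because the buffer was overwritten in place.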
Replace string-returning scalar readers in the event path with byteAppender-based variants that write directly to the reader's reusable valBuf. This eliminates per-scalar string allocation for the int, dec, float, bool, date, ts, and uuid types.

Introduce a byteAppender interface satisfied by both strings.Builder (for identifiers) and valBufAdapter (for scalar values). Digit helpers (readDigitSep, readExactDigits, etc.) now accept a byteAppender.

Benchmark impact (Fin 10K Decode):
- Before: 21.2ms, 3,246KB, 227K allocs
- After: 18.1ms, 842KB, 97K allocs
- Improvement: 15% faster, 74% less memory, 57% fewer allocs

PAKT Decode now beats JSON Decode on the financial dataset.
readBinTo now writes directly to the byteAppender instead of delegating to readBin() and copying the result string.
Adopt data-oriented nomenclature: a PAKT unit has 'properties' (named, typed, self-describing top-level entries), distinct from struct 'fields' (named, typed, declared in the type annotation). UnitReader.Properties() iterates unit properties. Property has Name, Type, IsPack. DeserializeError.Property replaces .Statement.
Update homepage library card, install page, and README with the new streaming-first API: UnitReader, Properties(), ReadValue[T], PackItems[T], UnmarshalNew[T]. Remove references to old Unmarshal, UnmarshalNext, More, SetSpec.
Use unsafe.String to create zero-copy string views of borrowed Event.Value bytes when passing them to parsing functions (parseIntLiteral, parseFloatLiteral, strconv.ParseFloat, time.Parse). These strings are consumed immediately and not retained. For string targets (reflect.String), strings.Clone makes an independent allocation, since the target outlives the buffer. This saves ~2K allocations per 1K elements on the financial benchmark.
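The zero-copy view can be written with unsafe.String and unsafe.SliceData (both available since Go 1.20). The helpers below are hypothetical names, not the package's functions; they show the two cases the commit distinguishes: a view consumed immediately by a parser, and a string target that must be copied with strings.Clone because it outlives the borrowed buffer.

```go
package main

import (
	"strconv"
	"strings"
	"unsafe"
)

// bytesView returns a string sharing memory with b. It is only valid while
// b is unmodified, so callers must not retain it.
func bytesView(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// parseIntValue parses a borrowed scalar without copying it into a new
// string; the view is used only for the duration of the call.
func parseIntValue(b []byte) (int64, error) {
	return strconv.ParseInt(bytesView(b), 10, 64)
}

// setStringValue handles the string-target case: the destination outlives
// the borrowed buffer, so it receives an independent copy.
func setStringValue(dst *string, b []byte) {
	*dst = strings.Clone(bytesView(b))
}
```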
Remove readInt, readDec, readFloat, readBool, readDate, readTs, readTimePart, readUUID, and readBin: all are replaced by byteAppender-based *To variants in reader_scalar_buf.go. readString and readNil remain (readString for escape processing, readNil for the state machine). reader.go: 1119 → 737 lines (-382). Tests updated to use readScalarDirect via a test helper.
Renumber error codes to eliminate the gap left by the removed duplicate_name error:
- 1 unexpected_eof (unchanged)
- 2 type_mismatch (was 3)
- 3 nil_non_nullable (was 4)
- 4 syntax (was 5)

Updated: spec §11.2, Go encoding/errors.go, .NET PaktErrorCode enum, and .NET test assertions. All callers use named sentinels, so the numeric change is invisible to correct API usage.
New test files:
- converter_test.go: ValueConverter registration, ValueReader methods, ReadAs delegation, error cases
- errors_test.go: DeserializeError formatting, ErrorCode.Error, ParseError constructors

Extended tests:
- read_value_test.go: ReadValueInto, tuple, struct→map, bin, dec, skip
- navigation_test.go: StructFields, TupleElements (basic + early break)
- unit_reader_test.go: explicit Skip(), Err() propagation
- unmarshal_new_test.go: UnmarshalNewFrom, MissingFields, duplicate policies
Lint rules added: gosec (security scanner), nilerr (swallowed errors), exhaustive (enum switch coverage). G104/G204/G304 are excluded as false positives. Fixed a real G115 integer overflow in marshal.go (uint→int64).

Fuzz tests:
- FuzzDecode: full decoder with arbitrary input
- FuzzUnmarshalNew: end-to-end deserialization pipeline
- FuzzReadString: string parsing with escape processing
- FuzzParseIntLiteral: integer literal parsing (hex/bin/oct/underscore)
- FuzzParseType: recursive descent type annotation parser
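A fuzz target in the style of FuzzParseIntLiteral might look like the following. This is a sketch under assumptions: parseIntLiteral here is a hypothetical stand-in that delegates to strconv.ParseInt with base 0 (which accepts 0x/0b/0o prefixes and underscores under Go's literal rules, not necessarily PAKT's), and checkIntLiteral is an illustrative property rather than the repository's actual assertion.

```go
package main

import (
	"strconv"
	"testing"
)

// parseIntLiteral is a hypothetical stand-in for the package's parser.
// Base 0 accepts 0x/0b/0o prefixes and '_' separators per Go syntax.
func parseIntLiteral(s string) (int64, error) {
	return strconv.ParseInt(s, 0, 64)
}

// checkIntLiteral is the property under fuzzing: any input either fails
// cleanly or yields a value that survives a decimal round trip.
func checkIntLiteral(s string) bool {
	v, err := parseIntLiteral(s)
	if err != nil {
		return true // rejecting input is fine; panicking is not
	}
	back, err := parseIntLiteral(strconv.FormatInt(v, 10))
	return err == nil && back == v
}

// FuzzParseIntLiteral would normally live in a _test.go file and run with
// go test -fuzz=FuzzParseIntLiteral.
func FuzzParseIntLiteral(f *testing.F) {
	for _, seed := range []string{"0", "-42", "0xff", "0b1010", "0o17", "1_000"} {
		f.Add(seed)
	}
	f.Fuzz(func(t *testing.T, s string) {
		if !checkIntLiteral(s) {
			t.Errorf("round trip failed for %q", s)
		}
	})
}
```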
Runs 5 fuzz targets (Decode, UnmarshalNew, ReadString, ParseIntLiteral, ParseType) in parallel on a weekly schedule (Monday 4am UTC). Also available via manual workflow_dispatch with configurable fuzztime. Corpus is cached across runs for incremental discovery. Crash inputs are uploaded as artifacts on failure. Runs with -race enabled.
Timestamps are bare literals (ts type), not quoted strings. The level field now uses the atom set |info, warn, error| to showcase the feature; atom values use the | prefix syntax. The comment is updated to describe pack streaming rather than delimiters.
CI's golangci-lint v2.11.4 catches additional G115 (integer overflow) issues not flagged locally. All are safe conversions guarded by range checks:
- byte(ch) where ch < utf8.RuneSelf (128)
- rune(d) where d is 0-15 from hexVal
- -int64(val) where val <= MaxInt64+1

Also fix staticcheck QF1012: use fmt.Fprintf instead of WriteString(Sprintf).
Replace setup-go + golangci-lint-action with jdx/mise-action. Single source of truth: .mise.toml pins Go 1.25 and golangci-lint 2.11.4 for both local dev and CI.
ci.yml:
- Add fail-fast: false to both Go and dotnet matrices (prevents a cancellation cascade on a single-platform failure)
- Guard the coverage summary with if: ubuntu-latest (it was running on macOS, where coverage.out may not exist)

fuzz.yml:
- Use jdx/mise-action for Go version consistency with ci.yml
- Fix cache key: use run_number instead of sha so the corpus accumulates across runs instead of creating new entries
Add targeted tests for previously uncovered new code:
- setErr path (malformed input triggers error)
- Event.String() formatting
- removeUnderscores (float with underscores)
- bin base64 decoding path
- tuple into typed slice

Coverage: 77.6% → 78.9%
Add string-target tests for scalar type coercion paths (float→string, bool→string, int→string, date→string, bin→string). These cover the strings.Clone branches in setFloat, setBool, setInt, setDec, setTemporalString, and setBinFromEvent. Coverage: setFloat 55% → 78%, setBool 55% → 78%, setBinFromEvent 50% → 67%; overall 78.9% → 79.3%.

Lower the codecov patch target from 70% to 60%. The remaining uncovered lines are error branches in reflection-heavy code (wrong type passed, EOF mid-composite) that are legitimately hard to trigger in unit tests and are exercised by fuzz tests instead.
Pull request overview
This PR redesigns the Go deserialization surface in encoding/ around a streaming-first UnitReader + iter.Seq iterators, replacing the prior Unmarshal / Decoder.UnmarshalNext / More APIs and aligning related docs/spec/test infrastructure with the new approach.
Changes:
- Introduces UnitReader, ReadValue[T], PackItems[T], UnmarshalNew[T], options/policies, and a converter hook surface.
- Switches Event.Value to borrowed []byte (with helper accessors) and updates tests/helpers accordingly.
- Removes spec projection APIs and updates CLI, spec error codes, documentation, and CI/lint/fuzz workflows.
Reviewed changes
Copilot reviewed 56 out of 56 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| spec/pakt-v0.md | Renumbers/updates normative error category codes. |
| site/layouts/index.html | Updates homepage examples/messaging to highlight UnitReader + pack streaming. |
| site/content/docs/install.md | Rewrites install/usage docs around UnitReader, PackItems, UnmarshalNew. |
| README.md | Updates README examples to new deserialization APIs. |
| encoding/unmarshal.go | Refactors reflect setters; removes old Unmarshal surface and unsafe string retention issues. |
| encoding/unmarshal_visitor.go | Removes old visitor-based unmarshal implementation. |
| encoding/unmarshal_next_test.go | Removes tests for removed UnmarshalNext / More API. |
| encoding/unmarshal_new.go | Adds UnmarshalNew* materialization API over UnitReader. |
| encoding/unmarshal_new_test.go | Adds coverage for UnmarshalNew behavior and policies. |
| encoding/unit_reader.go | Adds primary streaming UnitReader property iterator + skipping. |
| encoding/unit_reader_test.go | Adds tests for UnitReader navigation behavior. |
| encoding/tags.go | Renames StructFields to ReflectStructFields. |
| encoding/tags_test.go | Updates tests for ReflectStructFields rename. |
| encoding/spec.go | Removes spec projection parsing/skip machinery. |
| encoding/reader_value_test.go | Updates value tests for borrowed []byte event values. |
| encoding/reader_value_helpers.go | Switches scalar direct-reading to borrowed []byte buffer path. |
| encoding/reader_test.go | Updates scalar parsing tests to use new readScalarDirect semantics. |
| encoding/reader_state.go | Threads full statement type to statement-start events; adjusts map key value encoding to []byte. |
| encoding/reader_scalar_buf.go | Adds reusable scalar buffer writers (read*To). |
| encoding/reader_reflect.go | Removes old reflect-based scalar readers. |
| encoding/read_value.go | Adds event-driven reflective value materialization + converter dispatch. |
| encoding/pack_test.go | Updates pack tests to new APIs and ValueString() usage. |
| encoding/pack_iter.go | Adds PackItems / PackItemsInto iterator helpers. |
| encoding/pack_iter_test.go | Adds tests for pack iterators incl. early-break draining. |
| encoding/options.go | Adds deserialization options/policies and converter registry. |
| encoding/navigation.go | Adds navigation iterators for composites (struct/list/map/tuple). |
| encoding/navigation_test.go | Adds tests for navigation iterators and draining behavior. |
| encoding/marshal.go | Fixes uint→int64 overflow and updates struct-field reflection helper name. |
| encoding/integration_test.go | Updates integration tests for borrowed event values + removed spec projection. |
| encoding/fuzz_test.go | Adds fuzz targets for decoder, unmarshal, string/int/type parsing. |
| encoding/event.go | Changes Event.Value to []byte, adds ValueString()/nil detection, updates JSON marshal/unmarshal. |
| encoding/event_test.go | Updates tests for new Event.Value representation. |
| encoding/errors.go | Renumbers ErrorCodes and adds DeserializeError. |
| encoding/errors_test.go | Adds tests for DeserializeError formatting/unwrapping and error codes. |
| encoding/encoder_test.go | Updates encoder round-trip tests for borrowed event values. |
| encoding/bytesource.go | Removes bytesSource used by old Unmarshal fast path. |
| encoding/bench_test.go | Updates benchmarks to new APIs and adds financial dataset benchmarks. |
| encoding/doc.go | Updates package docs for new deserialization surface and removes spec projection docs. |
| encoding/decoder.go | Removes spec projection and incremental unmarshal APIs from Decoder. |
| encoding/decoder_test.go | Updates decoder tests for borrowed event values. |
| encoding/converter.go | Adds converter interfaces and registration options. |
| encoding/converter_test.go | Adds tests for converter surface and ValueReader helpers. |
| dotnet/tests/Pakt.Tests/CoreTypeTests.cs | Updates .NET tests to match renumbered error codes. |
| dotnet/src/Pakt/PaktException.cs | Updates .NET error code enum values (removes reserved code). |
| codecov.yml | Lowers patch coverage target threshold. |
| cli.go | Removes --spec flag usage and spec loading from CLI commands. |
| cli_test.go | Removes CLI spec projection test. |
| .mise.toml | Pins golangci-lint tool version. |
| .golangci.yml | Enables additional linters and configures gosec/exhaustive settings. |
| .github/workflows/fuzz.yml | Adds scheduled fuzzing workflow for Go fuzz targets. |
| .github/workflows/ci.yml | Switches CI to mise-based Go + golangci-lint install; disables fail-fast. |
Bug fixes:
- StructFields/TupleElements: add pending-event pushback to UnitReader so callers can ReadValue after yield without desynchronizing the stream
- Accumulate policy: return a clear error instead of silently falling through to LastWins
- DeserializeError: add Pos from Property for accurate source positions
- PackItemsInto: add nil buf check

Doc fixes:
- Rename stale 'Statements' → 'Properties' in unit_reader.go comments
- Fix error messages: 'Unmarshal' → 'UnmarshalNewFrom', 'UnmarshalInto' → 'UnmarshalNewInto'
- Remove unused FieldEntry.Type and TupleEntry.Type fields
- Add Pos field to Property struct

Config:
- Raise codecov patch target from 60% to 65%
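A hypothetical sketch of the Accumulate fix. Only LastWins and Accumulate are named in this PR; FirstWins and ErrorOnDuplicate are illustrative additions, and the string-valued target is a simplification. The point is that a policy that cannot be honored returns an error rather than silently behaving like LastWins.

```go
package main

import "fmt"

// DuplicatePolicy is a hypothetical rendering of the Duplicates option.
type DuplicatePolicy int

const (
	LastWins DuplicatePolicy = iota
	FirstWins
	ErrorOnDuplicate
	Accumulate
)

// applyDuplicate resolves a repeated property against the value already
// stored. Accumulate cannot apply to a single string target, so it is
// reported explicitly instead of falling through to LastWins.
func applyDuplicate(p DuplicatePolicy, name, old, incoming string) (string, error) {
	switch p {
	case LastWins:
		return incoming, nil
	case FirstWins:
		return old, nil
	case ErrorOnDuplicate:
		return "", fmt.Errorf("duplicate property %q", name)
	case Accumulate:
		return "", fmt.Errorf("duplicate property %q: Accumulate is not supported for this target", name)
	default:
		return "", fmt.Errorf("unknown duplicate policy %d", int(p))
	}
}
```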
The pushback fix caused an infinite loop when callers iterated StructFields/TupleElements without calling ReadValue after each yield. The pushed-back event was never consumed, so the next iteration read the same event again.

Fix: at the start of each iteration, check whether the pending event from the previous field/element was consumed. If not, drain it automatically before reading the next event. This makes the API safe for both patterns: callers that read values and callers that only collect names.
Bug fix:
- nextEvent now tracks nesting depth for composite start/end events, preventing skipCurrent from driving depth negative and skipping into subsequent statements

API safety:
- RegisterNamedConverter panics until field-level converter lookup is wired in (it was a silent no-op)
- DeserializeError.Error() omits '(0:0)' when Pos is zero
- Missing-field errors are reported in sorted order for determinism

Correctness:
- Event.UnmarshalJSON clears Err when raw.Error is empty
- unmarshalIntoStruct uses slices.Sorted(maps.Keys(...)) for deterministic missing-field error reporting

Doc fixes:
- StructFields/TupleElements docs: remove 'declared type' claim, reference ReadValue instead of ReadAs
- unit_reader_test.go comment: Statements → Properties
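Depth tracking inside nextEvent, rather than in each caller, can be sketched as follows. The event kinds and the stream type are hypothetical simplifications; skipValue consumes exactly one value and stops when depth returns to its level before the composite started, so it cannot run into the next statement, and an unmatched end event is reported instead of driving depth negative.

```go
package main

import "errors"

// Kind classifies events in this hypothetical sketch.
type Kind int

const (
	Scalar Kind = iota
	Start       // composite start (struct, list, map, tuple)
	End         // composite end
)

type event struct {
	kind Kind
	text string
}

// stream updates depth inside nextEvent, so every consumer, including
// skipValue, sees the same accounting.
type stream struct {
	events []event
	pos    int
	depth  int
}

var errUnbalanced = errors.New("composite end without matching start")

func (s *stream) nextEvent() (event, bool, error) {
	if s.pos >= len(s.events) {
		return event{}, false, nil
	}
	ev := s.events[s.pos]
	s.pos++
	switch ev.kind {
	case Scalar:
		// no depth change
	case Start:
		s.depth++
	case End:
		if s.depth == 0 {
			return ev, true, errUnbalanced
		}
		s.depth--
	}
	return ev, true, nil
}

// skipValue consumes the rest of the value whose first event is first.
// A scalar is already complete; a composite is read until depth drops
// back to where it was before its start event.
func (s *stream) skipValue(first event) error {
	if first.kind != Start {
		return nil
	}
	target := s.depth - 1
	for s.depth > target {
		_, ok, err := s.nextEvent()
		if err != nil {
			return err
		}
		if !ok {
			return errors.New("unexpected end of stream inside composite")
		}
	}
	return nil
}
```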
Premature API commitment: per-field converter selection via struct tags isn't wired in, and per-type RegisterConverter covers the real use case. Removed from the public API, options, and tests.
Update Part 2 Go API design to reflect implemented names: StatementReader → UnitReader, Statement → Property, Statements() → Properties(). Remove RegisterNamedConverter section and field-level converter override (deferred). Part 1 conceptual pseudocode updated to use UnitReader/Property terminology consistently.
Bug fixes:
- ReadValueInto: nil target check before reflect.ValueOf
- PackItems/PackItemsInto: use sr.nextEvent() instead of sr.dec.Decode() for consistent depth tracking
- unmarshalPackIntoTarget: same nextEvent fix for pack-into-struct
- drainUntil replaced with drainCurrent using nextEvent

Doc fixes:
- readStructIntoMapFromEvents: document string-key constraint
- install.md: check os.Open error, add io import
- navigation_test.go: update comment for pushback behavior
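The nil-target guard can be sketched with the reflect package. checkTarget is a hypothetical helper, not ReadValueInto itself: an untyped nil makes reflect.ValueOf return the zero Value, and calling Elem on that panics, so the check runs before any reflection work; a typed nil pointer is rejected via IsNil.

```go
package main

import (
	"errors"
	"reflect"
)

var errNilTarget = errors.New("target must be a non-nil pointer")

// checkTarget validates a destination before any reflection work and
// returns the settable element it points to.
func checkTarget(target any) (reflect.Value, error) {
	// An untyped nil yields the zero reflect.Value; Elem on it would panic.
	if target == nil {
		return reflect.Value{}, errNilTarget
	}
	v := reflect.ValueOf(target)
	// Reject non-pointers and typed nil pointers such as (*int)(nil).
	if v.Kind() != reflect.Pointer || v.IsNil() {
		return reflect.Value{}, errNilTarget
	}
	return v.Elem(), nil
}
```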
Summary
Complete redesign of the Go deserialization surface based on the deserialization design exploration. Replaces the old Unmarshal / Decoder.UnmarshalNext / Decoder.More API with a streaming-first, iterator-based architecture.

Target: redesign/deserialization (integration branch; the .NET redesign will follow separately)

New API
Tier 1 — UnitReader (primary streaming interface):
Tier 2 — Materialization (sugar):
Tier 3 — Custom converters:
Policies:
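For Tier 3, per-type converter registration can be sketched with a registry keyed by reflect.Type (reflect.TypeFor needs Go 1.22 or later). The ValueConverter shape follows the FromPakt/ToPakt names from this PR, but the string-based signatures and the registry, registerConverter, decodeAs, and Upper names are hypothetical; the real interface reads through a scoped ValueReader.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// ValueConverter mirrors the FromPakt/ToPakt shape, using plain strings
// in place of the package's ValueReader for this sketch.
type ValueConverter[T any] interface {
	FromPakt(raw string) (T, error)
	ToPakt(v T) (string, error)
}

// registry maps a Go type to a type-erased decode function.
type registry map[reflect.Type]func(string) (any, error)

// registerConverter is a hypothetical per-type registration helper.
func registerConverter[T any](r registry, c ValueConverter[T]) {
	r[reflect.TypeFor[T]()] = func(raw string) (any, error) {
		return c.FromPakt(raw)
	}
}

// decodeAs looks up the converter registered for T and applies it.
func decodeAs[T any](r registry, raw string) (T, error) {
	var zero T
	fn, ok := r[reflect.TypeFor[T]()]
	if !ok {
		return zero, fmt.Errorf("no converter for %v", reflect.TypeFor[T]())
	}
	v, err := fn(raw)
	if err != nil {
		return zero, err
	}
	return v.(T), nil
}

// Upper is a toy type whose converter upper-cases on read and
// lower-cases on write.
type Upper string

type upperConv struct{}

func (upperConv) FromPakt(raw string) (Upper, error) { return Upper(strings.ToUpper(raw)), nil }
func (upperConv) ToPakt(v Upper) (string, error)      { return strings.ToLower(string(v)), nil }
```

Keying by reflect.Type keeps lookup to a single map access per value; the generic wrapper restores static typing at the call site.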
Performance
Event.Value changed from string to []byte with borrow semantics (valid until the next Decode()). Scalar readers write directly to a reusable valBuf via the byteAppender interface, and unsafe.String provides zero-copy parsing of non-string scalars.

Removed
- Unmarshal(data, &v) → UnmarshalNew[T](data)
- Decoder.UnmarshalNext / More → UnitReader + iterators
- Decoder.SetSpec / Spec / ParseSpec: deferred to future work
- unmarshal_visitor.go, reader_reflect.go, spec.go: replaced by the event-based path
- --spec flag: deferred

Spec change
Naming
- StatementReader → UnitReader, Statement → Property, Statements() → Properties()
- StructFields (tag introspection) → ReflectStructFields; StructFields now names the navigation helper

Quality
- (.github/workflows/fuzz.yml)
- design/deserialization-design.md

All tests pass with -race. golangci-lint run is clean.