Add lossless Decimal encoding and decoding #20

Draft
mattt wants to merge 15 commits into main from mattt/decimal

@mattt mattt commented May 22, 2026

Resolves #19.

This PR updates decoder logic to read numbers as raw text so Decimal and large integers decode with full precision instead of going through Double.
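To make the precision problem concrete, here is a minimal sketch that uses only Foundation, not this library's API. The sample values are chosen for illustration: the first is the smallest positive integer a `Double` cannot represent, and the second is a familiar binary floating-point rounding case.

```swift
import Foundation

let posix = Locale(identifier: "en_US_POSIX")

// 2^53 + 1 cannot be represented as a Double, so a Double round-trip changes it.
let text = "9007199254740993"
let viaDouble = Int64(Double(text)!)
let viaText = Decimal(string: text, locale: posix)!

print(viaDouble)                                        // 9007199254740992
print(viaText == Decimal(9007199254740993 as Int64))    // true

// Decimal arithmetic on parsed text stays exact where Double arithmetic does not.
let sum = Decimal(string: "0.1", locale: posix)! + Decimal(string: "0.2", locale: posix)!
print(sum == Decimal(string: "0.3", locale: posix)!)    // true
print(0.1 + 0.2 == 0.3)                                 // false
```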

Copilot AI left a comment


Pull request overview

This PR adds lossless Decimal support by preserving JSON number text during decoding (avoiding intermediate Double conversion) and by encoding Decimal values directly as JSON numbers rather than using Decimal’s default keyed-container encoding.

Changes:

  • Add Decimal-specific encoding paths (top-level + keyed/unkeyed/single-value containers) that emit raw JSON numeric text.
  • Update decoder to read numbers as raw text (.numberAsRaw) and add Decimal decoding that parses from the preserved numeric text.
  • Add extensive Decimal encoding/decoding test coverage, including precision and boundary/overflow scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| Tests/YYJSONTests/EncoderTests.swift | Adds YYJSONEncoder decimal encoding and round-trip precision tests. |
| Tests/YYJSONTests/DecoderTests.swift | Adds YYJSONDecoder decimal decoding tests (including precision, exponents, and error cases). |
| Sources/YYJSON/Encoder.swift | Implements Decimal encoding as raw JSON numbers (including top-level interception). |
| Sources/YYJSON/Decoder.swift | Enables raw-number reading and implements Decimal parsing from raw numeric text; updates numeric decoding paths accordingly. |

Comment thread Tests/YYJSONTests/DecoderTests.swift
Comment thread Sources/YYJSON/Decoder.swift Outdated
Comment thread Sources/YYJSON/Decoder.swift Outdated
Comment thread Sources/YYJSON/Decoder.swift Outdated
Comment thread Sources/YYJSON/Encoder.swift Outdated
mattt commented May 22, 2026

This branch enables YYJSON_READ_NUMBER_AS_RAW in YYJSONDecoder, so every numeric value is parsed from its original text instead of via yyjson's native Int64/Double getters.

Benchmarking the performance cost:

| Workload | Before | After | Δ | Δ % | Per-number |
| --- | ---: | ---: | ---: | ---: | ---: |
| [Int] × 10k | 2 044 µs | 2 195 µs | +151 µs | +7.4% | +15 ns |
| [Double] × 10k | 2 068 µs | 2 542 µs | +474 µs | +22.9% | +47 ns |
| [Mixed] × 10k ¹ | 3 963 µs | 4 551 µs | +588 µs | +14.8% | +20 ns |
| canada.coords (~200k doubles) | 41 320 µs | 50 070 µs | +8 750 µs | +21.2% | +44 ns |

The two double-heavy workloads (synthetic 10k and real-world 200k) agree on ~45 ns per double. Ints are cheaper at ~15 ns because Int("123") is faster than Double("0.1234567"). Mixed Codable shapes amortize to ~20 ns per number once the rest of the Codable machinery is paid.
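The per-number column can be reproduced from the Before and After timings. The sketch below assumes the Mixed figure divides by the three numeric fields per element (Int, Double, Int), and treats canada.coords as exactly 200k doubles even though the table only says "~200k". The helper name is hypothetical.

```swift
// Per-number overhead in nanoseconds: (after - before) / count of numbers decoded.
func perNumberNanoseconds(beforeMicros: Double, afterMicros: Double, numbers: Double) -> Double {
    (afterMicros - beforeMicros) * 1_000 / numbers
}

let intCost = perNumberNanoseconds(beforeMicros: 2_044, afterMicros: 2_195, numbers: 10_000)
let doubleCost = perNumberNanoseconds(beforeMicros: 2_068, afterMicros: 2_542, numbers: 10_000)
// Assumption: 10k Mixed elements with three numeric fields each is 30k numbers.
let mixedCost = perNumberNanoseconds(beforeMicros: 3_963, afterMicros: 4_551, numbers: 30_000)
// Assumption: exactly 200k doubles, so this figure is approximate.
let canadaCost = perNumberNanoseconds(beforeMicros: 41_320, afterMicros: 50_070, numbers: 200_000)

// Rounded to whole nanoseconds: 15, 47, 20, 44
print(intCost.rounded(), doubleCost.rounded(), mixedCost.rounded(), canadaCost.rounded())
```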

vs. Foundation (absolute, after the change)

| Workload | YYJSON | Foundation | Ratio |
| --- | ---: | ---: | --- |
| [Int] × 10k | 2 195 µs | 1 797 µs | 1.22× slower |
| [Double] × 10k | 2 542 µs | 2 034 µs | 1.25× slower |
| [Decimal] × 10k ² | 9 765 µs | 5 734 µs | 1.70× slower |
| [Mixed] × 10k | 4 551 µs | 9 937 µs | 2.18× faster |
| canada.coords | 50 070 µs | 35 717 µs | 1.40× slower |

Let's see if we can bring this down by using the fast-path when precision isn't needed.

Footnotes

  1. Struct of {String, Int, Double, Int}

  2. New capability — didn't decode at all on main.

mattt added 2 commits May 22, 2026 05:56
Replace the `String(decoding:as:) + Swift.Double(_:)` text round-trip
in the number-decoding helpers with direct `strtod`/`strtoll`/`strtoull`
calls on yyjson's null-terminated raw buffer.

Cuts per-number Codable overhead from ~47 ns to ~9 ns for doubles and
from ~15 ns to ~12 ns for ints, recovering most of the wall-clock cost
introduced by `YYJSON_READ_NUMBER_AS_RAW`. On a 200k-double real-world
payload the regression vs the native-getter path drops from +21% to +7%.

`yyNumberText` is retained for `Decimal` decoding and for the
number-to-string coercion path, which both genuinely need a `String`.
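As a rough illustration of parsing straight from a NUL-terminated buffer, the sketch below calls `strtod` on a C string and rejects partial matches. `parseDouble` is a hypothetical helper, not the PR's code, and it assumes the text was already validated as a JSON number, because `strtod` on its own also accepts leading whitespace and spellings such as `inf`, and honors the C locale's decimal separator.

```swift
import Foundation

/// Hypothetical helper: parses a NUL-terminated C string as a Double and
/// returns nil unless the entire buffer was consumed.
func parseDouble(_ text: UnsafePointer<CChar>) -> Double? {
    var end: UnsafeMutablePointer<CChar>? = nil
    let value = strtod(text, &end)
    // Require at least one consumed character and no trailing bytes.
    guard let end = end, UnsafePointer(end) != text, end.pointee == 0 else { return nil }
    return value
}

let parsed = "0.1234567".withCString { parseDouble($0) }   // Optional(0.1234567)
let rejected = "12abc".withCString { parseDouble($0) }     // nil, trailing "abc"
let empty = "".withCString { parseDouble($0) }             // nil, nothing consumed
```

No intermediate `String` is created per number here, which is the saving the commit message describes.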
Introduce `YYJSONDecoder.NumberDecodingStrategy` with two cases:

- `.lossless` (default) preserves the original input text of every number
  via `YYJSON_READ_NUMBER_AS_RAW`, keeping `Decimal` and large-integer
  decoding exact. This matches `JSONDecoder`'s precision contract.
- `.fast` skips the raw-number flag and lets yyjson parse numbers as
  native `Int64`/`UInt64`/`Double`, restoring the library's pre-fix
  throughput at the cost of fractional `Decimal` precision and arbitrary
  integer range.

The existing extraction helpers (`yyParseDouble`, `yyParseSignedInt`,
`yyParseUnsignedInt`, `yyNumberText`) already branch on whether the
value is stored as raw text or as a parsed number, so no further plumbing
is required: the strategy simply toggles whether the raw flag is set.

`.fast` recovers within ~2% of pre-fix throughput on number-heavy
payloads (10k double array: 2179 µs lossless → 2024 µs fast; 200k double
GeoJSON coordinate decode: 43.8 ms lossless → 41.0 ms fast).
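The "strategy simply toggles the raw flag" design can be sketched with stand-in types. Everything below is hypothetical and self-contained: `ReadFlags` only mimics yyjson's read flags (its raw value is a placeholder, not the real constant), and `readFlags(for:)` is an illustrative name rather than a function from this PR.

```swift
// Stand-in for yyjson's read flags. The raw value is a placeholder.
struct ReadFlags: OptionSet {
    let rawValue: UInt32
    static let numberAsRaw = ReadFlags(rawValue: 1 << 0)
}

// Mirrors the two cases described in the commit message.
enum NumberDecodingStrategy {
    case lossless  // keep each number's original text
    case fast      // let the parser produce native Int64/UInt64/Double
}

// The strategy maps to nothing more than the presence of one flag.
func readFlags(for strategy: NumberDecodingStrategy) -> ReadFlags {
    switch strategy {
    case .lossless: return [.numberAsRaw]
    case .fast: return []
    }
}

print(readFlags(for: .lossless).contains(.numberAsRaw))  // true
print(readFlags(for: .fast).isEmpty)                     // true
```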
mattt commented May 22, 2026

Alright, I think we can have our cake and eat it too!

With these changes, users now get correct behavior by default at a 6–10% penalty. Users with number-heavy workloads have a one-line opt-out (decoder.numberDecodingStrategy = .fast) that gets them all the speed back.

Time (wall clock), μs

| Benchmark | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| NumberDecode/DecimalArray/Foundation | 4334 | 4485 | 4559 | 4690 | 4858 | 5050 | 5327 | 218 |
| NumberDecode/DecimalArray/YYJSON | 6020 | 6230 | 6369 | 6570 | 6754 | 7193 | 7342 | 156 |
| NumberDecode/DoubleArray/Foundation | 1613 | 1736 | 1777 | 1848 | 2007 | 2173 | 2331 | 552 |
| NumberDecode/DoubleArray/YYJSON | 2038 | 2132 | 2179 | 2236 | 2376 | 2533 | 2622 | 454 |
| NumberDecode/DoubleArray/YYJSON-fast | 1874 | 1979 | 2024 | 2094 | 2220 | 2376 | 2505 | 487 |
| NumberDecode/IntArray/Foundation | 1555 | 1632 | 1668 | 1719 | 1821 | 2086 | 2223 | 591 |
| NumberDecode/IntArray/YYJSON | 1987 | 2111 | 2157 | 2214 | 2386 | 2587 | 3955 | 456 |
| NumberDecode/IntArray/YYJSON-fast | 1868 | 1969 | 2015 | 2091 | 2230 | 2390 | 2692 | 489 |
| NumberDecode/MixedArray/Foundation | 9354 | 9798 | 9978 | 10117 | 10314 | 10650 | 11391 | 101 |
| NumberDecode/MixedArray/YYJSON | 4067 | 4215 | 4313 | 4436 | 4542 | 4714 | 4862 | 231 |
| NumberDecode/MixedArray/YYJSON-fast | 3658 | 3813 | 3908 | 4076 | 4297 | 4760 | 6137 | 252 |
| NumberDecode/canada.coords/Foundation | 35140 | 35389 | 35652 | 36012 | 36405 | 36593 | 36593 | 28 |
| NumberDecode/canada.coords/YYJSON | 42122 | 43418 | 43778 | 44106 | 44466 | 44728 | 44728 | 23 |
| NumberDecode/canada.coords/YYJSON-fast | 39928 | 40665 | 40960 | 41288 | 41943 | 42629 | 42629 | 25 |

Working through Copilot feedback and updating the README.

mattt added 4 commits May 22, 2026 12:07
Range-check Double fallbacks with `T(exactly: d.rounded(.towardZero))`
instead of comparing against `Double(T.min)`/`Double(T.max)`, which round
for 64-bit bounds and could admit out-of-range values that then trap on
`T(d)`.
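The rounding problem is easy to reproduce in isolation. In this sketch, `truncatingInteger` is a hypothetical helper that applies the `T(exactly:)` check described above; it is not the PR's code.

```swift
/// Hypothetical helper: truncates a Double toward zero and returns nil when
/// the result does not fit in the target integer type.
func truncatingInteger<T: FixedWidthInteger>(_ d: Double, as type: T.Type) -> T? {
    T(exactly: d.rounded(.towardZero))
}

// Int64.max is not representable as a Double, so the conversion rounds up to 2^63.
let roundedBound = Double(Int64.max)
print(roundedBound == 9_223_372_036_854_775_808.0)             // true

// A `d <= Double(Int64.max)` guard would accept 2^63, and `Int64(d)` would then trap.
print(truncatingInteger(roundedBound, as: Int64.self) == nil)  // true

print(truncatingInteger(3.9, as: Int.self) == 3)               // true
print(truncatingInteger(-3.9, as: Int.self) == -3)             // true
print(truncatingInteger(256.0, as: UInt8.self) == nil)         // true
```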

Switch `strtoll`/`strtoull` to base 0 so JSON5 hex literals like `0xFF`,
preserved as raw text under `YYJSON_READ_NUMBER_AS_RAW`, decode as
integers instead of failing.
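A quick way to see what base 0 changes is to call `strtoll` directly. This sketch uses only the C library through Foundation; the literals are illustrative.

```swift
import Foundation

// Base 10 stops at the "x", so only the leading "0" is consumed.
let base10 = strtoll("0xFF", nil, 10)    // 0
// Base 0 infers the radix from the prefix, so the hex literal is read in full.
let base0 = strtoll("0xFF", nil, 0)      // 255
let negative = strtoll("-0X10", nil, 0)  // -16
// Base 0 also reads a leading "0" as octal.
let octal = strtoll("010", nil, 0)       // 8
```

The octal case should only matter for text that did not come through a validating parser, since JSON forbids leading zeros on numbers.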
`Decimal(string:)` uses the host's current locale by default, which can
mis-parse JSON numbers under locales whose decimal separator is `,`.
Pin every JSON-text-to-`Decimal` conversion (decoder, value accessor,
serialization) to `en_US_POSIX` so parsing matches JSON's locale-
independent `.` separator.
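A minimal sketch of the pinned parse, using only Foundation. The `de_DE` result is printed rather than asserted because what a comma-separator locale does with `3.14` is exactly the behavior the commit avoids depending on.

```swift
import Foundation

let posix = Locale(identifier: "en_US_POSIX")
let german = Locale(identifier: "de_DE")

// Pinned to POSIX, "." is always the decimal separator.
let pinned = Decimal(string: "3.14", locale: posix)
// Under a locale that uses ",", the same text may parse differently.
let localized = Decimal(string: "3.14", locale: german)

print(pinned as Any, localized as Any)
```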
`Decimal.description` uses the host's current locale and can emit `,` as
the decimal separator, producing invalid JSON in locales like de_DE.
Render through `NSDecimalNumber.description(withLocale:)` pinned to
`en_US_POSIX` so the encoded number always uses `.`.
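The rendering side can be sketched the same way. `jsonNumberText(for:)` is a hypothetical name for illustration; the Foundation calls are the ones named in the commit message.

```swift
import Foundation

/// Hypothetical helper: renders a Decimal with "." as the separator
/// regardless of the host's current locale.
func jsonNumberText(for decimal: Decimal) -> String {
    NSDecimalNumber(decimal: decimal)
        .description(withLocale: Locale(identifier: "en_US_POSIX"))
}

let value = Decimal(string: "1234.5", locale: Locale(identifier: "en_US_POSIX"))!
print(jsonNumberText(for: value))  // 1234.5
```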
`Decimal(string:)` returns an optional, but the assertions compared the
non-optional decoded value against it via implicit promotion, which
silently passes if construction ever returns nil. Force-unwrap the
expected value so a nil construction now fails the test with a clear
trap instead of masking the regression.

The non-raw fallback used `yyjson_val_write` to format parsed numbers,
which broke linking under the `noWriter` trait. Format the value through
the typed getters (`yyjson_get_sint`/`yyjson_get_uint`/`yyjson_get_real`)
and Swift's locale-independent `String` initializers, which already
produce shortest round-trippable representations. This also removes the
`malloc`/`free` from the hot path.
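The round-trip property of Swift's `String` initializers can be checked directly. This sketch covers only the Swift side and does not call the yyjson getters.

```swift
let real = 0.1
let text = String(real)

print(text)                  // 0.1
print(Double(text) == real)  // true: the printed text parses back to the same value
print(String(Int64.min))     // -9223372036854775808
print(String(UInt64.max))    // 18446744073709551615
// Non-finite values print as "inf" or "nan", which are not JSON numbers,
// so they would need separate handling.
```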
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Comment thread Sources/YYJSON/Decoder.swift
Comment thread Sources/YYJSON/Serialization.swift
Comment thread Sources/YYJSON/Value.swift
Comment thread Sources/YYJSON/Decoder.swift
Comment thread Tests/YYJSONTests/DecoderTests.swift
Comment thread Tests/YYJSONTests/EncoderTests.swift Outdated
Comment thread Tests/YYJSONTests/EncoderTests.swift Outdated
Comment thread Tests/YYJSONTests/ValueTests.swift Outdated
Comment thread Tests/YYJSONTests/SerializationTests.swift
Comment thread Sources/YYJSON/Decoder.swift Outdated
mattt added 5 commits May 26, 2026 04:00
Three near-identical `decodeDecimal(from:path:)` methods lived on the
keyed, unkeyed, and single-value containers, each routing numeric values
through `yyNumberText` and `Decimal(string:locale:)` with the same error
text. Extract a file-scope `yyDecodeDecimal` (and a shared `yyTypeString`
for diagnostics) and have every container delegate to it, eliminating
the duplication so future tweaks land in one place.

Under `.lossless` decoding (or `YYJSONSerialization`, which always
preserves number text), JSON5 hex literals like `0xFF` arrive as raw
text. The existing raw-number parsers handled decimal-only forms, so
hex (and `Infinity`/`NaN`) decoded into `Double`/`Foundation` returned
nil and silently became a type mismatch or `NSNull`.

- `yyParseDouble` now tries `strtoll`/`strtoull` with base 0 first so
  hex integers convert to `Double`; `strtod` continues to handle the
  fractional, exponential, and non-finite forms.
- `YYJSONValue.number` routes its `.numberRaw` case through
  `yyParseDouble`, picking up hex and non-finite spellings.
- `YYJSONSerialization` adds `yyParseStrictInteger` (whole-text
  `strtoll`/`strtoull` with base 0) for the `NSNumber` integer paths so
  hex maps to `NSNumber(Int)` while fractional text still falls through
  to `Decimal`/`Double` instead of being truncated to `0`.
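The integer-first ordering described for `yyParseDouble` might look roughly like the following. This is a simplified stand-in, not the PR's implementation: it uses only `strtoll`, so unsigned values above `Int64.max` fall through to `strtod`, and the function name is hypothetical.

```swift
import Foundation

/// Hypothetical sketch: try a whole-text integer parse with base 0 first
/// (so hex literals convert), then fall back to strtod for fractional,
/// exponential, and non-finite spellings.
func parseNumberAsDouble(_ text: String) -> Double? {
    text.withCString { ptr -> Double? in
        guard ptr.pointee != 0 else { return nil }
        var end: UnsafeMutablePointer<CChar>? = nil
        errno = 0
        let integer = strtoll(ptr, &end, 0)
        // Accept the integer only if it did not overflow and consumed all the text.
        if errno == 0, let e = end, e.pointee == 0 {
            return Double(integer)
        }
        end = nil
        let real = strtod(ptr, &end)
        guard let e = end, UnsafePointer(e) != ptr, e.pointee == 0 else { return nil }
        return real
    }
}

print(parseNumberAsDouble("0xFF") as Any)      // Optional(255.0)
print(parseNumberAsDouble("1.5") as Any)       // Optional(1.5)
print(parseNumberAsDouble("Infinity") as Any)  // Optional(inf)
print(parseNumberAsDouble("abc") as Any)       // nil
```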
Several Decimal precision tests interpolated `\(decimal)` (or compared
against `Decimal.description`) when building or asserting on JSON, both
of which use the host's current locale and emit `,` as the decimal
separator on locales like de_DE. That produces invalid JSON / mismatched
expectations and made these tests locale-flaky.

Render Decimals through `NSDecimalNumber(decimal:).description(
withLocale:)` pinned to `en_US_POSIX` in the decoder, encoder, value,
and serialization precision tests so the generated text always uses `.`,
matching JSON's locale-independent format and the encoder's own POSIX
output.

The original test iterated 0.00 → 99.99 by 0.01, performing ~10000
encode+decode roundtrips per run. Replace the sweep with a small
curated sample (zero, fractional, signed, and high-precision boundary
values) that still exercises the precision guarantee without paying
the 10k iteration cost on every CI run.

The doc claimed integers outside the `Int64`/`UInt64` range "fail to
decode, even into `Decimal`", but `.fast` actually parses them through
`Double` and decodes into `Decimal` with `Double` precision rather than
throwing. Rewrite the bullet to describe the precision loss so callers
aren't surprised when an oversized integer silently rounds instead of
raising an error.
@mattt mattt requested a review from Copilot May 26, 2026 11:25

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment on lines 1 to +12
import Cyyjson
import Foundation

#if !YYJSON_DISABLE_READER

// MARK: - Helper Functions

/// Locale used to parse JSON numbers into `Decimal`. JSON numbers always use
/// `.` as the decimal separator regardless of the host's user locale, so we
/// pin parsing to POSIX to avoid mis-decoding under locales that use `,`.
private let yyPOSIXLocale = Locale(identifier: "en_US_POSIX")

Comment on lines +86 to +106
func yyDecodeDecimal(from value: UnsafeMutablePointer<yyjson_val>?, path: String) throws -> Decimal {
    guard let value = value else {
        throw YYJSONError.missingValue(path: path)
    }
    guard yyIsNumeric(value) else {
        throw YYJSONError.typeMismatch(
            expected: "number",
            actual: yyTypeString(value),
            path: path
        )
    }
    guard let string = yyNumberText(value),
        let decimal = Decimal(string: string, locale: yyPOSIXLocale)
    else {
        throw YYJSONError.invalidData(
            "Could not parse number as Decimal",
            path: path
        )
    }
    return decimal
}
Comment on lines 1 to +37
import Cyyjson
import Foundation

/// Locale used to parse JSON numbers into `Decimal`. JSON numbers always use
/// `.` as the decimal separator regardless of the host's user locale, so we
/// pin parsing to POSIX to avoid mis-decoding under locales that use `,`.
private let yyPOSIXLocale = Locale(identifier: "en_US_POSIX")

#if !YYJSON_DISABLE_READER

/// Parses a JSON numeric literal as a fixed-width integer.
///
/// Accepts plain decimal integers (including a leading sign) and the
/// JSON5 hex spellings (`0xFF`, `-0X10`, `+0x2A`) preserved as raw text
/// under `YYJSON_READ_NUMBER_AS_RAW`. Returns `nil` for fractional or
/// exponential text so callers can fall through to `Decimal`/`Double`
/// instead of silently truncating through `Double`.
@inline(__always)
fileprivate func yyParseStrictInteger<T: FixedWidthInteger>(_ text: String) -> T? {
    return text.withCString { ptr -> T? in
        let len = strlen(ptr)
        guard len > 0 else { return nil }
        var end: UnsafeMutablePointer<CChar>?
        errno = 0
        if T.isSigned {
            let v = strtoll(ptr, &end, 0)
            guard errno == 0, let e = end, ptr.distance(to: UnsafePointer(e)) == Int(len)
            else { return nil }
            return T(exactly: v)
        }
        if ptr.pointee == 0x2D /* '-' */ { return nil }
        let v = strtoull(ptr, &end, 0)
        guard errno == 0, let e = end, ptr.distance(to: UnsafePointer(e)) == Int(len)
        else { return nil }
        return T(exactly: v)
    }
}
Development

Successfully merging this pull request may close these issues.

Decimal decoding expects object, not number?