[java] Tighten JSON number lexer to match RFC 8259 by shs96c · Pull Request #17739 · SeleniumHQ/selenium

shs96c · 2026-07-02T10:36:54Z

nextNumber() collected any of - + 0-9 . e E and handed the string to Long.valueOf / BigDecimal. It accepted several spec-invalid forms:

+5 (leading plus)
01, 007 (leading zeros)
.5 (no digit before decimal)
5. (no digit after decimal)
1e, - (sign / exponent without digits)

Very large exponents (1e9999) also silently produced Double.POSITIVE_INFINITY, which JSON cannot represent.

We now:

Have replaced the collector with a state machine that follows number = [minus] int [frac] [exp] (RFC 8259 §6).
Reject non-finite double values with a clear message.
Drop + from peek()'s NUMBER classifier — it never was a valid JSON number start.

The previous nextNumber() collected any subset of '- + 0-9 . e E' and then handed the string to Long.valueOf or BigDecimal. That accepted several spec-invalid forms: '+5' (leading plus), '01' (leading zero), '.5' (no digit before decimal), '5.' (no digit after decimal), '1e' (exponent without digits), '-' alone. Very large exponents also silently produced Double.POSITIVE_INFINITY, for which JSON has no representation. Replace the collector with a proper state machine that follows the grammar 'number = [minus] int [frac] [exp]', and reject non-finite doubles. Also drop '+' from the peek() classifier so it no longer claims to see a JSON number.

qodo-code-review · 2026-07-02T13:13:25Z

PR Summary by Qodo

Tighten JsonInput number lexing to RFC 8259 grammar and reject non-finite doubles

🐞 Bug fix 🧪 Tests 🕐 20-40 Minutes

AI Description

• Replace permissive number token collection with an RFC 8259 §6 state machine.
• Reject spec-invalid numeric forms (leading '+', leading zeros, missing digits).
• Fail fast on double overflow/NaN and add regression coverage for valid/invalid cases.

Diagram

graph TD
  T["JsonInputTest"] --> N["JsonInput.nextNumber()"] --> SM{"RFC 8259 number grammar"} --> OUT["Number (Long/double)"]
  N --> IN[("CharInput peek/read")]
  SM --> EX{{"JsonException"}}

  subgraph Legend
    direction LR
    _code["Code"] ~~~ _io[("Input")] ~~~ _decision{"Decision"} ~~~ _err{{"Error"}}
  end

High-Level Assessment

The following are alternative approaches to this PR:

1. Regex/grammar pre-check before parsing

➕ Compact validation logic that can be easier to audit against RFC grammar
➕ Keeps parsing (Long/BigDecimal) separate from validation concerns
➖ Harder to produce precise, localized error messages during streaming reads
➖ Still needs careful integration with incremental input consumption

2. Always parse as BigDecimal and validate by re-serializing

➕ Avoids double overflow concerns by keeping arbitrary precision longer
➕ Potentially simplifies numeric type decisions
➖ Behavioral change: callers currently receive Long for integers and double for decimals/exponents
➖ Re-serialization/normalization does not reliably detect all original lexical invalid forms (e.g., leading zeros) without extra checks

3. Delegate to a standard JSON parser for tokenization

➕ Leverages mature RFC-compliant number lexer implementations
➕ Reduces maintenance surface area for lexer correctness
➖ Likely invasive change to JsonInput streaming API and existing call sites
➖ Adds dependency/behavioral differences not aligned with current lightweight parser

Recommendation: The PR’s state-machine approach is the best fit for a streaming lexer: it enforces RFC 8259 at read time, preserves existing return-type behavior (Long fast-path vs double), and provides targeted failures for invalid forms and non-finite doubles. Alternatives either complicate streaming consumption (regex) or require broader API/behavior changes (BigDecimal-only or external parser).

Files changed (2) +115 / -36

Bug fix (1) +67 / -36

JsonInput.javaEnforce RFC 8259 number grammar and reject non-finite doubles +67/-36

Enforce RFC 8259 number grammar and reject non-finite doubles

• Removes '+' from NUMBER detection in peek(). Replaces the permissive character collector in nextNumber() with a grammar-driven state machine that rejects leading zeros, missing digits around '.'/exponent, and missing integer parts. Adds a finite-check for BigDecimal-to-double conversion and introduces a small digit helper.

java/src/org/openqa/selenium/json/JsonInput.java

Tests (1) +48 / -0

JsonInputTest.javaAdd coverage for spec-invalid numbers and double overflow +48/-0

Add coverage for spec-invalid numbers and double overflow

• Introduces tests that assert JsonInput rejects previously-accepted RFC-invalid numeric lexemes and still accepts valid forms. Adds an explicit regression test ensuring extreme exponents that overflow to Infinity are rejected with an out-of-range error message.

java/test/org/openqa/selenium/json/JsonInputTest.java

qodo-code-review · 2026-07-02T13:16:21Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📜 Skill insights (0)

Context used

✅ Compliance rules (platform): 18 rules

1. ~~EOF shown as U+FFFF~~ ✓ Resolved 🐞 Bug ◔ Observability

Description

In nextNumber(), several exception messages cast input.peek() (an int) to char; when the stream is
exhausted this renders EOF (-1) as '\uFFFF', producing misleading diagnostics for inputs like "-",
"5.", or "1e".
This is especially confusing because the codebase explicitly treats U+FFFF as a valid literal
character, so the error text can look like it saw a real character rather than end-of-input.

Code

java/src/org/openqa/selenium/json/JsonInput.java[R234-247]

+    int first = input.peek();
+    if (first == '0') {
+      builder.append((char) input.read());
+      // Leading zeros ("00", "01", ...) are not allowed.
+      if (isDigit(input.peek())) {
+        throw new JsonException("Leading zeros are not permitted in JSON numbers. " + input);
      }
-    } while (read);
+    } else if (first >= '1' && first <= '9') {
+      while (isDigit(input.peek())) {
+        builder.append((char) input.read());
+      }
+    } else {
+      throw new JsonException("Expected digit but saw '" + (char) first + "'. " + input);
+    }

Evidence
Input.peek() returns Input.EOF (-1) at end-of-input, and nextNumber() now casts these ints to
char inside error messages, which turns -1 into \uFFFF. The test suite also explicitly asserts
that literal U+FFFF is valid input, so reporting EOF as \uFFFF is ambiguous and confusing during
debugging.
java/src/org/openqa/selenium/json/JsonInput.java[223-281]
java/src/org/openqa/selenium/json/Input.java[31-38]
java/test/org/openqa/selenium/json/JsonInputTest.java[294-309]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`JsonInput.nextNumber()` builds error messages by casting `input.peek()` (an `int`) to `char`. When `peek()` returns `Input.EOF` (`-1`), the cast becomes `\uFFFF`, so errors at end-of-input misleadingly report that a U+FFFF character was seen.

### Issue Context
`Input.EOF` is intentionally `-1` to avoid colliding with any UTF-16 code unit, including U+FFFF; and there is a unit test ensuring U+FFFF is treated as a literal character.

### Fix Focus Areas
- java/src/org/openqa/selenium/json/JsonInput.java[223-281]
- java/src/org/openqa/selenium/json/Input.java[31-38]

### Suggested fix
- Add a small helper (e.g., `describeChar(int c)`) that returns `"<EOF>"` when `c == Input.EOF`, otherwise returns a quoted printable character.
- Use that helper in the three `JsonException` messages added in `nextNumber()` (expected digit, expected digit after '.', expected exponent digit), so EOF is reported clearly as EOF rather than `\uFFFF`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Previous review results

Review updated until commit e1f3ed2

Results up to commit 9f6599b

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

1. ~~EOF shown as U+FFFF~~ ✓ Resolved 🐞 Bug ◔ Observability

Description

In nextNumber(), several exception messages cast input.peek() (an int) to char; when the stream is
exhausted this renders EOF (-1) as '\uFFFF', producing misleading diagnostics for inputs like "-",
"5.", or "1e".
This is especially confusing because the codebase explicitly treats U+FFFF as a valid literal
character, so the error text can look like it saw a real character rather than end-of-input.

Code

java/src/org/openqa/selenium/json/JsonInput.java[R234-247]

+    int first = input.peek();
+    if (first == '0') {
+      builder.append((char) input.read());
+      // Leading zeros ("00", "01", ...) are not allowed.
+      if (isDigit(input.peek())) {
+        throw new JsonException("Leading zeros are not permitted in JSON numbers. " + input);
      }
-    } while (read);
+    } else if (first >= '1' && first <= '9') {
+      while (isDigit(input.peek())) {
+        builder.append((char) input.read());
+      }
+    } else {
+      throw new JsonException("Expected digit but saw '" + (char) first + "'. " + input);
+    }

Evidence
Input.peek() returns Input.EOF (-1) at end-of-input, and nextNumber() now casts these ints to
char inside error messages, which turns -1 into \uFFFF. The test suite also explicitly asserts
that literal U+FFFF is valid input, so reporting EOF as \uFFFF is ambiguous and confusing during
debugging.
java/src/org/openqa/selenium/json/JsonInput.java[223-281]
java/src/org/openqa/selenium/json/Input.java[31-38]
java/test/org/openqa/selenium/json/JsonInputTest.java[294-309]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`JsonInput.nextNumber()` builds error messages by casting `input.peek()` (an `int`) to `char`. When `peek()` returns `Input.EOF` (`-1`), the cast becomes `\uFFFF`, so errors at end-of-input misleadingly report that a U+FFFF character was seen.

### Issue Context
`Input.EOF` is intentionally `-1` to avoid colliding with any UTF-16 code unit, including U+FFFF; and there is a unit test ensuring U+FFFF is treated as a literal character.

### Fix Focus Areas
- java/src/org/openqa/selenium/json/JsonInput.java[223-281]
- java/src/org/openqa/selenium/json/Input.java[31-38]

### Suggested fix
- Add a small helper (e.g., `describeChar(int c)`) that returns `"<EOF>"` when `c == Input.EOF`, otherwise returns a quoted printable character.
- Use that helper in the three `JsonException` messages added in `nextNumber()` (expected digit, expected digit after '.', expected exponent digit), so EOF is reported clearly as EOF rather than `\uFFFF`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Casting Input.peek() to (char) turned end-of-input (-1) into U+FFFF, which is misleading now that U+FFFF is treated as a legitimate string character. Route the three sites in nextNumber() through a small helper that renders EOF as <EOF> and printable code units in quotes.

qodo-code-review · 2026-07-02T17:05:19Z

Code review by qodo was updated up to the latest commit e1f3ed2

selenium-ci added the C-java Java Bindings label Jul 2, 2026

shs96c force-pushed the shs-json-number-state-machine branch from 53d5bf5 to ee9fa0c Compare July 2, 2026 12:47

[java] Reformat JsonInput error messages to satisfy google-java-format

9f6599b

shs96c marked this pull request as ready for review July 2, 2026 13:12

qodo-code-review Bot reviewed Jul 2, 2026

View reviewed changes

Comment thread java/src/org/openqa/selenium/json/JsonInput.java

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[java] Tighten JSON number lexer to match RFC 8259#17739

[java] Tighten JSON number lexer to match RFC 8259#17739
shs96c wants to merge 3 commits into
SeleniumHQ:trunkfrom
shs96c:shs-json-number-state-machine

shs96c commented Jul 2, 2026 •

edited

Loading

Uh oh!

qodo-code-review Bot commented Jul 2, 2026

Uh oh!

qodo-code-review Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

qodo-code-review Bot commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Uh oh!

Conversation

shs96c commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qodo-code-review Bot commented Jul 2, 2026

PR Summary by Qodo

Uh oh!

qodo-code-review Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Previous review results

Uh oh!

Uh oh!

qodo-code-review Bot commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shs96c commented Jul 2, 2026 •

edited

Loading

qodo-code-review Bot commented Jul 2, 2026 •

edited

Loading