UTF-8 does not reset state when returning error #359

@ChALkeR

What is the issue with the Encoding Standard?

Same as #358, but for the Unicode BOM.
If the proposal in #358 is to reset state on errors, what should happen to the BOM seen flag?

I am not arguing that it should be reset, but there is clearly an issue and an inconsistency here.


Platform status is highly inconsistent:

// Helper: returns the decoded string's length, or 'e' if decode() throws.
const r = (d, ...a) => {
  try {
    return d.decode(...a).length
  } catch {}
  return 'e'
}

const a = new TextDecoder('utf8', { fatal: true })
console.log('A',
  r(a, Uint8Array.of(0xef, 0xbb, 0xbf, 0xff), { stream: true }),
  r(a, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }), // error does not stick in Chrome/Safari
)

const b = new TextDecoder('utf8', { fatal: true })
console.log('B',
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf, 0xef), { stream: true }),
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }), // error sticks in Chrome / Safari
  r(b, Uint8Array.of(0xbb, 0xbf), { stream: true }),
  r(b, Uint8Array.of(), { stream: true }),
)

const c = new TextDecoder('utf8', { fatal: true })
console.log('C',
  r(c, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(c, Uint8Array.of(0xff), { stream: true }),
  r(c, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
)

// Bonus: if BOM is not reset, is it processed on errors?
const d = new TextDecoder('utf8', { fatal: true })
const e = new TextDecoder('utf8', { fatal: true })
console.log('D',
  r(d, Uint8Array.of(0x20, 0xff), { stream: true }),
  r(d, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(d, Uint8Array.of(0xff), { stream: true }),
  r(d, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(e, Uint8Array.of(0xff), { stream: true }),
  r(e, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
)
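
As a key for reading the results below, here is a minimal sketch of the three outcomes the test distinguishes, using fresh non-streaming decoders so that no state carries over between calls. The variable names (`stripped`, `kept`, `threw`) are illustrative and not part of the original test. For a BOM-only chunk, a `0` suggests the BOM was consumed, a `1` suggests it was passed through as U+FEFF, and `e` means `decode()` threw.

```javascript
// Fresh decoders, no { stream: true }, so no state is shared between calls.
const bom = Uint8Array.of(0xef, 0xbb, 0xbf)

// Default behavior: a leading BOM is stripped, so the output is empty.
const stripped = new TextDecoder('utf-8', { fatal: true }).decode(bom).length

// With ignoreBOM: true the BOM is kept in the output as U+FEFF.
const kept = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true }).decode(bom).length

// 0xff never appears in valid UTF-8, so a fatal decoder throws a TypeError.
let threw = false
try {
  new TextDecoder('utf-8', { fatal: true }).decode(Uint8Array.of(0xff))
} catch (err) {
  threw = err instanceof TypeError
}

console.log(stripped, kept, threw) // 0 1 true
```

In the streaming tests a `0` can also come from bytes that are still buffered as an incomplete sequence, so each value has to be read in the context of the preceding calls.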

Chrome: first error did not get stuck, second error got stuck

A e 0
B 0 e e e e
C 0 e 1
D e 0 e 1 e 0

WebKit: first error did not get stuck, second error got stuck

A e 1
B 0 e e e e
C 0 e 1
D e 1 e 1 e 0

Firefox, Servo, Deno, Static Hermes: errors do not stick, the BOM seen flag is not reset, and BOM seen is set on errors

A e 1
B 0 e 1 e 0
C 0 e 1
D e 1 e 1 e 1

Node.js: errors do not stick

A e 0
B 0 e 1 e 0
C 0 e 1
D e 0 e 1 e 0

Bun: just broken

A e 0
B 0 0 0 1 0
C 0 e 0
D e 0 e 0 e 0
