feat: add configurable connection charset (lc_ctype) #164

reinhardt1053 · 2026-02-11T15:08:38Z

Problem

Legacy Firebird databases commonly use charset NONE on text columns. In these databases, text is stored as raw bytes in the application's encoding (typically WIN1252) without any charset metadata on the columns.

The driver currently hardcodes utf8 as the connection charset (lc_ctype in the DPB). When connecting to a database with charset NONE columns, Firebird does not transliterate the data: it sends the raw bytes as-is. The driver then incorrectly decodes these WIN1252 bytes as UTF-8, corrupting accented characters:

Tournée → Tourn�e
Café → Caf�

This affects a large number of production Firebird databases where charset NONE was the default.
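
The corruption described above can be reproduced without a database. The following is a minimal sketch, assuming the column bytes arrive exactly as WIN1252 text; the variable names are illustrative and not part of the driver:

```typescript
import { Buffer } from 'node:buffer';

// "Tournée" and "Café" as WIN1252 bytes: é is the single byte 0xE9.
const tourneeBytes = Buffer.from([0x54, 0x6f, 0x75, 0x72, 0x6e, 0xe9, 0x65]);
const cafeBytes = Buffer.from([0x43, 0x61, 0x66, 0xe9]);

// In UTF-8, 0xE9 announces a three-byte sequence. The continuation bytes
// never arrive, so the decoder substitutes U+FFFD (the replacement character).
const tourneeAsUtf8 = tourneeBytes.toString('utf8');
const cafeAsUtf8 = cafeBytes.toString('utf8');

console.log(tourneeAsUtf8);
console.log(cafeAsUtf8);
```

Once the replacement character has been substituted, the original byte cannot be recovered from the resulting string.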

Solution

Add a charset option to ConnectOptions that:

- Sets the DPB lc_ctype to the specified charset instead of the hardcoded utf8
- Propagates the charset to the data reader/writer so that string encoding/decoding matches the connection charset

Usage

const attachment = await client.connect('host:database', {
  username: 'SYSDBA',
  password: 'masterkey',
  charset: 'WIN1252',
});

How it works

- mapCharsetToEncoding() maps Firebird charset names to Node.js BufferEncoding values (utf8 for UTF8, latin1 for all single-byte charsets)
- latin1 encoding in Node.js provides a 1:1 byte-to-codepoint mapping, which correctly handles any single-byte Firebird charset (WIN1252, ISO8859_1, WIN1250, etc.)
- The encoding is stored on AbstractAttachment and passed through to createDataReader() and createDataWriter() via StatementImpl.prepare()
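
The mapping described above might look roughly like the sketch below. The function body is an assumption reconstructed from this description, not the code in the PR, and it treats every charset other than UTF8 as single-byte:

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical reconstruction of the helper described above; the real
// implementation in the PR may differ.
function mapCharsetToEncoding(charset?: string): BufferEncoding {
  // Assumption: a missing charset falls back to UTF8, as before the change.
  const name = (charset ?? 'UTF8').toUpperCase();
  return name === 'UTF8' ? 'utf8' : 'latin1';
}

// Round trip for a WIN1252 value: bytes -> JS string -> bytes.
const encoding = mapCharsetToEncoding('WIN1252');
const wireBytes = Buffer.from([0x43, 0x61, 0x66, 0xe9]);
const decoded = wireBytes.toString(encoding);
const reencoded = Buffer.from(decoded, encoding);

console.log(encoding, decoded, reencoded.equals(wireBytes));
```

The round trip preserves the bytes because latin1 maps each byte to exactly one code point and back.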

Changes

node-firebird-driver:
- ConnectOptions: add optional charset property
- createDpb(): use options.charset instead of hardcoded 'utf8'
- mapCharsetToEncoding(): new helper to map Firebird charset → Node.js encoding
- AbstractAttachment: add encoding property
- createDataReader() / createDataWriter(): accept encoding parameter
node-firebird-driver-native:
- AttachmentImpl.connect(): set encoding from mapCharsetToEncoding(options.charset)
- StatementImpl.prepare(): pass attachment.encoding to reader/writer

Backward compatible

When charset is not specified, the behavior is identical to before (defaults to utf8).

Add charset option to ConnectOptions allowing users to specify the connection character set used in the DPB (lc_ctype parameter). The charset is also propagated to the data reader and writer so that string encoding/decoding matches the connection charset.

This is essential for legacy Firebird databases (commonly created with Delphi/IBX) where columns use charset NONE. In these databases, text is stored as raw bytes in the application's encoding (typically WIN1252) without any charset metadata on the columns.

With the current hardcoded 'utf8' charset, the driver tells Firebird to communicate in UTF-8, but Firebird does not transliterate charset NONE columns. The raw WIN1252 bytes are then incorrectly decoded as UTF-8, corrupting accented characters (e.g. 'Tournée' becomes 'Tourn�e').

By setting charset: 'WIN1252' in ConnectOptions, Firebird sends the correct bytes and the driver decodes them using the matching Node.js encoding (latin1, which provides 1:1 byte-to-codepoint mapping for single-byte charsets).

Changes:
- ConnectOptions: add optional charset property
- createDpb(): use options.charset instead of hardcoded 'utf8'
- mapCharsetToEncoding(): map Firebird charset names to Node.js encodings
- AbstractAttachment: store encoding from connection charset
- createDataReader(): accept encoding parameter for string decoding
- createDataWriter(): accept encoding parameter for string encoding
- AttachmentImpl: set encoding on connect using mapCharsetToEncoding()
- StatementImpl: pass attachment.encoding to reader/writer

Backward compatible: defaults to 'utf8' when charset is not specified.

asfernandes · 2026-02-12T01:36:51Z

Aren't Node.js strings assumed to be UTF-8?
How would it work with strings that are just bytes?

reinhardt1053 · 2026-02-12T07:15:39Z

> Aren't Node.js strings assumed to be UTF-8?

JavaScript strings are Unicode internally, but the key issue is how raw bytes from the wire are decoded into JS strings, and how JS strings are encoded back to bytes when writing. Currently, for charset NONE columns, Firebird sends the raw bytes without transliteration. The byte 0xE9 (é in WIN1252) is not valid as a single-byte UTF-8 sequence, so StringDecoder('utf8') replaces it with �.

> How would it work with strings that are just bytes?

With NONE columns the data is indeed just bytes, and the driver can't know the encoding. That's why it's left to the user to specify it via the charset option: the user knows what encoding their application uses (e.g. Delphi apps typically use WIN1252). The driver then uses the corresponding Node.js encoding (latin1) to decode/encode correctly.
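
The same mismatch applies in the write direction. As a small illustration (plain Node.js, independent of the driver), the string "Café" produces different bytes depending on the encoding used when writing:

```typescript
import { Buffer } from 'node:buffer';

const text = 'Caf\u00e9'; // "Café"

// UTF-8 encodes é as two bytes (0xC3 0xA9).
const asUtf8 = Buffer.from(text, 'utf8');
// latin1 encodes é as the single byte 0xE9, which is what a WIN1252
// application expects to find in a charset NONE column.
const asLatin1 = Buffer.from(text, 'latin1');

console.log(asUtf8.toString('hex'));
console.log(asLatin1.toString('hex'));
```

A legacy application reading the UTF-8 bytes as WIN1252 would see two characters where one was intended, which is why the writer needs the same encoding as the reader.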

asfernandes · 2026-02-12T11:06:16Z

Usage of latin1 is wrong there.
Looks like TextDecoder would be the correct way.
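
The reviewer does not spell out the objection, but one plausible reading is that Node's latin1 follows ISO-8859-1, so bytes whose meaning differs in another single-byte charset decode to the wrong characters. The sketch below assumes a Node.js build with full ICU (needed for the windows-1250 label) and uses WIN1250 as the example:

```typescript
import { Buffer } from 'node:buffer';
import { TextDecoder } from 'node:util';

// In WIN1250, byte 0xE8 is č (U+010D). In ISO-8859-1 it is è (U+00E8).
const win1250Byte = new Uint8Array([0xe8]);

// Node's 'latin1' maps byte N straight to code point U+00NN.
const viaLatin1 = Buffer.from(win1250Byte).toString('latin1');

// TextDecoder looks the byte up in the windows-1250 table instead.
const viaTextDecoder = new TextDecoder('windows-1250').decode(win1250Byte);

// Byte 0x80 is the euro sign in WIN1252, but latin1 yields the C1 control U+0080.
const byte80ViaLatin1 = Buffer.from([0x80]).toString('latin1');

console.log(viaLatin1, viaTextDecoder, byte80ViaLatin1.codePointAt(0)?.toString(16));
```

With latin1 the byte round trip stays lossless, but the JavaScript string seen by application code holds different characters than the ones stored. TextDecoder only decodes, so the write path would need a separate encoder.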
