feat(wordwrap): support CJK line-breaking rules by Chronostasys · Pull Request #86 · muesli/reflow

Chronostasys · 2026-05-19T03:00:43Z

Problem

CJK (Chinese, Japanese, Korean) text wrapping is broken. In CJK typography, every character is a valid line-break point — unlike Latin scripts where only spaces and explicit breakpoints allow wrapping. The current implementation treats CJK+Latin sequences without spaces as a single word, causing:

Entire mixed-language segments wrap as one unit, wasting half the available line width
Text like "manual（手动触发），很可能没跑。" at limit=12 renders as a single long line that overflows, instead of breaking at CJK character boundaries

Before (limit=12)

manual（手动触发），很可能没跑。

The entire string is one "word" (no spaces between CJK chars) → never breaks.

After (limit=12)

manual（手动
触发），很可
能没跑。

Each CJK character is a break point. CJK↔Latin boundaries also break.
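The break positions in the example can be checked by hand if each CJK character and fullwidth punctuation mark is assumed to occupy two terminal cells. That assumption is consistent with the output above, but the PR does not state it. The sketch below uses a simplified, hypothetical width function (`wide` and `cells` are illustrative names, not part of reflow) covering only the ranges that occur in this example:

```go
package main

import "fmt"

// wide reports whether r is assumed to occupy two terminal cells.
// Simplified stand-in for a real width table: it only covers the
// ranges needed for the example in this PR.
func wide(r rune) bool {
	switch {
	case r >= 0x4E00 && r <= 0x9FFF: // CJK Unified Ideographs
		return true
	case r >= 0x3000 && r <= 0x303F: // CJK Symbols and Punctuation
		return true
	case r >= 0xFF01 && r <= 0xFF60: // fullwidth ASCII variants
		return true
	}
	return false
}

// cells returns the display width of s in terminal cells.
func cells(s string) int {
	n := 0
	for _, r := range s {
		if wide(r) {
			n += 2
		} else {
			n++
		}
	}
	return n
}

func main() {
	// Print the cell width of each line of the "After" output.
	for _, line := range []string{"manual（手动", "触发），很可", "能没跑。"} {
		fmt.Printf("%2d  %s\n", cells(line), line)
	}
}
```

Under this assumption the first two lines fill exactly 12 cells and the third takes 8, so appending one more CJK character to either of the first two lines would exceed the limit.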

Changes

Minimal changes to the Write() method in wordwrap.go:

isCJK(r rune) bool: Detects CJK characters by Unicode range (Han, Hiragana, Katakana, Hangul, CJK punctuation, fullwidth forms).
CJK characters are flushed immediately as individual words, making each one a valid break point — standard CJK typography rule.
CJK↔non-CJK boundaries trigger a word flush, enabling breaks between scripts (e.g., "这是" | "manual" | "触发").
Non-CJK behavior is completely unchanged — all existing tests pass.
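A minimal sketch of the detection and flush rules listed above, not the code from this PR. The exact Unicode ranges are assumptions, and `splitUnits` is a hypothetical helper that stands in for the flush logic inside Write() by returning the resulting break units as a slice:

```go
package main

import (
	"fmt"
	"unicode"
)

// isCJK approximates the check described in the PR. The ranges below are
// assumptions; the PR's actual table may differ.
func isCJK(r rune) bool {
	if unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul) {
		return true
	}
	return (r >= 0x3000 && r <= 0x303F) || // CJK Symbols and Punctuation
		(r >= 0xFF00 && r <= 0xFFEF) // Halfwidth and Fullwidth Forms
}

// splitUnits (hypothetical) splits s into break units: every CJK rune is its
// own unit, and runs of non-CJK, non-space runes form one unit each.
func splitUnits(s string) []string {
	var units []string
	var word []rune
	// flush emits the pending non-CJK word, if any.
	flush := func() {
		if len(word) > 0 {
			units = append(units, string(word))
			word = word[:0]
		}
	}
	for _, r := range s {
		switch {
		case unicode.IsSpace(r):
			flush() // whitespace ends the current word
		case isCJK(r):
			flush()                          // CJK/non-CJK boundary ends the word
			units = append(units, string(r)) // each CJK rune is a unit
		default:
			word = append(word, r)
		}
	}
	flush()
	return units
}

func main() {
	fmt.Printf("%q\n", splitUnits("这是manual触发"))
}
```

For the input "这是manual触发" this yields five units: 这, 是, manual, 触, 发. A wrapper can then break between any two adjacent units, which matches the rule that every CJK character and every script boundary is a break point.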

Test Cases

Added TestWordWrapCJK with 11 cases covering:

Pure CJK text (each char is a break point)
CJK mixed with Latin (boundary detection)
CJK punctuation (fullwidth forms)
Limit=0 passthrough (no wrap)
Latin-only (unchanged behavior)

=== RUN   TestWordWrapCJK
--- PASS: TestWordWrapCJK (0.00s)
=== RUN   TestWordWrapCJKNoWrap
--- PASS: TestWordWrapCJKNoWrap (0.00s)
=== RUN   TestWordWrapCJKString
--- PASS: TestWordWrapCJKString (0.00s)

All existing tests also pass (except a pre-existing failure in truncate unrelated to this change).

In CJK (Chinese, Japanese, Korean) typography, each character is a valid line-break point, unlike Latin scripts where only spaces and explicit breakpoints allow wrapping. The original implementation treats CJK+Latin sequences without spaces as a single word, causing entire mixed-language segments like "manual（手动触发）" to wrap as one unit and waste half the available line width.

Changes to Write():
- Add isCJK() to detect CJK characters by Unicode range (Han, Hiragana, Katakana, Hangul, CJK punctuation, fullwidth forms).
- CJK characters are immediately flushed as individual words, making each one a valid break point (standard CJK typography rule).
- CJK↔non-CJK boundaries trigger a word flush, enabling breaks between scripts (e.g., "这是" | "manual" | "触发").
- Non-CJK behavior is completely unchanged.

Tests: add TestWordWrapCJK with 11 cases covering pure CJK, CJK+Latin mix, CJK punctuation, and boundary detection.

Chronostasys wants to merge 1 commit into muesli:master from Chronostasys:fix/cjk-word-break
