feat(wordwrap): support CJK line-breaking rules #86
Open
Chronostasys wants to merge 1 commit into
In CJK (Chinese, Japanese, Korean) typography, each character is a valid line-break point, unlike Latin scripts where only spaces and explicit breakpoints allow wrapping. The original implementation treats CJK+Latin sequences without spaces as a single word, causing entire mixed-language segments like "manual(手动触发)" to wrap as one unit and waste half the available line width.

Changes to Write():
- Add isCJK() to detect CJK characters by Unicode range (Han, Hiragana, Katakana, Hangul, CJK punctuation, fullwidth forms).
- CJK characters are immediately flushed as individual words, making each one a valid break point (standard CJK typography rule).
- CJK↔non-CJK boundaries trigger a word flush, enabling breaks between scripts (e.g., "这是" | "manual" | "触发").
- Non-CJK behavior is completely unchanged.

Tests: add TestWordWrapCJK with 11 cases covering pure CJK, CJK+Latin mix, CJK punctuation, and boundary detection.
Problem
CJK (Chinese, Japanese, Korean) text wrapping is broken. In CJK typography, every character is a valid line-break point — unlike Latin scripts where only spaces and explicit breakpoints allow wrapping. The current implementation treats CJK+Latin sequences without spaces as a single word, causing:
- "manual(手动触发),很可能没跑。" at limit=12 renders as a single long line that overflows, instead of breaking at CJK character boundaries.

Before (limit=12)
The entire string is one "word" (no spaces between CJK chars), so it never breaks.

After (limit=12)
Each CJK character is a break point. CJK↔Latin boundaries also break.
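To make the before/after difference concrete, here is a rough sketch of a greedy wrapper working on pre-split break units. It is not the code in wordwrap.go: wrapUnits is a hypothetical helper, the units are written out by hand, and width is counted in runes as a simplifying assumption (a real terminal wrapper would more likely measure display cells, so the actual line contents can differ).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// wrapUnits greedily packs break units into lines of at most limit runes.
// A unit is never split, so a unit longer than limit overflows its line.
// Hypothetical helper: width is counted in runes, not terminal cells.
func wrapUnits(units []string, limit int) []string {
	var lines []string
	cur := ""
	for _, u := range units {
		// Start a new line when the next unit would not fit on this one.
		if cur != "" && utf8.RuneCountInString(cur)+utf8.RuneCountInString(u) > limit {
			lines = append(lines, cur)
			cur = ""
		}
		cur += u
	}
	if cur != "" {
		lines = append(lines, cur)
	}
	return lines
}

func main() {
	// Before: no spaces, so the whole string is one unbreakable unit.
	before := wrapUnits([]string{"manual(手动触发),很可能没跑。"}, 12)

	// After: each CJK character is its own unit; Latin runs stay together.
	after := wrapUnits([]string{
		"manual(", "手", "动", "触", "发", "),",
		"很", "可", "能", "没", "跑", "。",
	}, 12)

	fmt.Println("before:", len(before), "line(s)")
	fmt.Println("after: ", len(after), "line(s)")
}
```

With a single unit the wrapper has nowhere to break, so the line exceeds the limit. Once every CJK character is a unit, each line can be filled up to the limit.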
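Changes
Minimal changes to the Write() method in wordwrap.go:
- isCJK(r rune) bool: detects CJK characters by Unicode range (Han, Hiragana, Katakana, Hangul, CJK punctuation, fullwidth forms).
- CJK characters are flushed immediately as individual words, so each one is a valid break point.
- CJK↔non-CJK boundaries trigger a word flush, enabling breaks between scripts (e.g., "这是" | "manual" | "触发").
- Non-CJK behavior is unchanged.

Test Cases
Added TestWordWrapCJK with 11 cases covering:
- pure CJK
- CJK+Latin mix
- CJK punctuation
- boundary detection

All existing tests also pass (except a pre-existing failure in truncate unrelated to this change).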