Fix Unicode tab-completion hang for multibyte characters (closes #2) #13

Closed
Kunal-Darekar wants to merge 1 commit into mgubi:main from Kunal-Darekar:fix/issue-2-unicode-tab-completion

@Kunal-Darekar

Problem

Pressing Tab to autocomplete a Unicode variable like ρ hangs the session permanently (shows "busy", no further cells can be evaluated).

Root cause: TeXmacs sends the cursor as a byte offset. For ρ (U+03C1) that offset is 2, which lands on the second byte of the two-byte UTF-8 sequence rather than on a valid character boundary. Passing that raw offset to completions() causes Julia's internal word-boundary scan to loop indefinitely (older Julia) or throw a StringIndexError (Julia 1.11+).
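The boundary problem can be reproduced with Base string functions alone. The following is a minimal sketch, not code from this PR; the variable names are illustrative.

```julia
# Sketch: why a raw byte offset of 2 is unsafe for "ρ".
s = "ρ"                      # U+03C1, encoded in UTF-8 as the bytes 0xCF 0x81

nbytes = ncodeunits(s)       # 2 bytes
nchars = length(s)           # 1 character

valid_at_1 = isvalid(s, 1)   # true: byte 1 starts the character
valid_at_2 = isvalid(s, 2)   # false: byte 2 is a continuation byte

# Indexing at the continuation byte throws instead of returning a character.
threw = try
    s[2]
    false
catch err
    err isa StringIndexError
end

# thisind returns the start of the character that contains the given byte.
snapped = thisind(s, 2)      # 1
```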

A second bug existed in the completion suffix slicing: range.stop+2-range.start is byte arithmetic that gives the wrong slice index whenever the already-typed prefix itself contains multibyte characters.

Fix

  • Use thisind(str, cursor) to snap the incoming byte offset to the nearest valid character boundary before calling completions()
  • Replace the broken range.stop+2-range.start slice arithmetic with ncodeunits(prefix)+1, which is correct for any Unicode prefix
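The two changes above could look roughly like the sketch below. `snap_cursor` and `completion_suffix` are hypothetical helper names chosen for illustration; the PR's actual code is not shown on this page, and the `clamp` and `startswith` guards are assumptions added to keep the sketch safe on unexpected input.

```julia
# Hypothetical helpers; names are illustrative and not taken from the PR.

# Snap a byte offset to the start of the character that contains it.
# Clamping first (an assumption) keeps thisind from throwing a BoundsError
# when the offset is outside the string.
function snap_cursor(str::AbstractString, cursor::Integer)
    return thisind(str, clamp(cursor, 0, ncodeunits(str)))
end

# Return the part of `completion` that follows the already-typed `prefix`.
# ncodeunits(prefix) + 1 is the first byte after the prefix, so the slice
# starts on a character boundary even when the prefix is multibyte.
function completion_suffix(completion::AbstractString, prefix::AbstractString)
    startswith(completion, prefix) || return completion
    return completion[ncodeunits(prefix)+1:end]
end

cursor = snap_cursor("ρ", 2)                      # 1
suffix = completion_suffix("αβ_total", "αβ_t")    # "otal"
```

Because both helpers work in code units, they agree with the byte offsets TeXmacs sends and need no conversion to character counts.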

Verified

Added test/test_issue2_unicode_completion.jl with 8 tests, run against Julia 1.11.3:

  • Confirms isvalid("ρ", 2) == false (the invalid boundary that caused the hang)
  • Confirms thisind snaps it correctly to byte 1
  • Confirms completions return instantly for ρ, σ_val, αβ_t
  • Proves old slice formula gives wrong result for multibyte prefix ("" instead of correct suffix)
  • Confirms ASCII completions have no regression (Base.si still completes to sin, sign, etc.)
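The test file itself is not shown on this page. As a rough illustration of the cursor-snapping checks only, a property-style sketch with the Test standard library might look like this; the testset name and the choice of strings are assumptions, and it does not call completions().

```julia
using Test

# Sketch: every byte offset inside a string should snap to a valid index.
@testset "cursor snapping (sketch)" begin
    for s in ("ρ", "σ_val", "αβ_t", "sin")
        for offset in 1:ncodeunits(s)
            snapped = thisind(s, offset)
            @test isvalid(s, snapped)   # always lands on a character start
            @test snapped <= offset     # never moves the cursor forward
        end
    end
    # ASCII input is unchanged, so snapping cannot alter ASCII completions.
    @test all(thisind("sign", i) == i for i in 1:4)
end
```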

- Snap TeXmacs byte cursor to a valid character boundary using thisind()
  before calling completions(), preventing an infinite hang on multibyte
  Unicode symbols like ρ (2 bytes in UTF-8)
- Replace broken byte-arithmetic slicing (range.stop+2-range.start) with
  ncodeunits(prefix)+1 for correct multibyte Unicode handling
- Add early return when completions() returns no results
@Kunal-Darekar closed this by deleting the head repository on Mar 31, 2026