This is the changelog for Unicode String v2.1.0 released on May 1st, 2026. For older changelogs please consult the release tag on GitHub.
- Improve line break segmentation conformance and compatibility with ICU.
- Replaces the regex-based segmentation engine with a single-pass DFA evaluator. Sentence break on a 4 KB unbroken sentence drops from ~9,200 ms to ~11 ms (~840×); word break on a 4 KB sentence from ~7,000 ms to ~12 ms (~580×); scaling is now linear in input length instead of O(N²).
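A table-driven DFA scales linearly because it does a constant amount of work per code point and never backtracks. The minimal Python sketch below illustrates that idea and is independent of the Elixir library. It is a toy: the three states, the `char_class` helper and the transition table are hypothetical and far simpler than the Unicode sentence-break rules (UAX #29) that `unicode_string` evaluates.

```python
# Toy sentence-break DFA (hypothetical; far simpler than the UAX #29 rules).
IN_TEXT, AFTER_TERM, AFTER_SPACE = 0, 1, 2


def char_class(ch: str) -> str:
    """Map a code point to one of the toy DFA's input classes."""
    if ch in ".!?":
        return "term"
    if ch.isspace():
        return "space"
    return "other"


# Transition table: (state, input class) -> next state.
TRANSITIONS = {
    (IN_TEXT, "term"): AFTER_TERM,
    (IN_TEXT, "space"): IN_TEXT,
    (IN_TEXT, "other"): IN_TEXT,
    (AFTER_TERM, "term"): AFTER_TERM,
    (AFTER_TERM, "space"): AFTER_SPACE,
    (AFTER_TERM, "other"): IN_TEXT,  # e.g. "3.14": no break
    (AFTER_SPACE, "term"): AFTER_TERM,
    (AFTER_SPACE, "space"): AFTER_SPACE,
    (AFTER_SPACE, "other"): IN_TEXT,
}


def sentence_breaks(text: str) -> list[int]:
    """Return break offsets in one left-to-right pass: O(1) work per code point."""
    breaks = []
    state = IN_TEXT
    for i, ch in enumerate(text):
        cls = char_class(ch)
        # A terminator plus spaces, followed by anything else, starts a new sentence.
        if state == AFTER_SPACE and cls != "space":
            breaks.append(i)
        state = TRANSITIONS[(state, cls)]
    return breaks


def split_sentences(text: str) -> list[str]:
    """Cut the text at the offsets found by the DFA."""
    bounds = [0, *sentence_breaks(text), len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


print(split_sentences("Hi there. How are you? Fine."))
# ['Hi there. ', 'How are you? ', 'Fine.']
```

The input is scanned exactly once, so doubling the input roughly doubles the work. A backtracking regex engine can rescan the same span repeatedly.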
This is the changelog for Unicode String v2.0.1 released on April 29th, 2026. For older changelogs please consult the release tag on GitHub.
- Fix compilation, dialyzer and tests when the optional `:localize` dependency is not present.
This is the changelog for Unicode String v2.0.0 released on April 14th, 2026. For older changelogs please consult the release tag on GitHub.
- Unicode String version 2.0 and later is supported on Elixir 1.17 or later only.
- Replace `ex_cldr` with `localize` as the localization library.
- Fix titlecasing of the letter `i`, including correct handling in Turkic languages.
- Use `Localize.Locale.best_match/3` for locale matching.
- Fixes to the `Unicode.Break` module.
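In Turkic languages, titlecasing `i` is locale-sensitive: dotted `i` maps to `İ` (U+0130), and dotless `ı` (U+0131) pairs with plain `I`. The minimal Python sketch below is not the library's implementation. The `titlecase_word` helper and its mapping tables are hypothetical and follow the `tr`/`az` entries of Unicode's SpecialCasing data. It ignores the conditional "I followed by combining dot above" rule.

```python
# Hypothetical Turkic special cases after the tr/az SpecialCasing entries
# (the conditional "I before combining dot above" rule is ignored here).
TURKIC_LANGUAGES = {"tr", "az"}
TURKIC_UPPER = {"i": "\u0130", "\u0131": "I"}  # i -> İ; ı -> I (also the default)
TURKIC_LOWER = {"I": "\u0131", "\u0130": "i"}  # I -> ı, İ -> i


def titlecase_word(word: str, language: str = "en") -> str:
    """Titlecase the first code point of a word and lowercase the rest."""
    if not word:
        return word
    turkic = language in TURKIC_LANGUAGES
    first, rest = word[0], word[1:]
    if turkic and first in TURKIC_UPPER:
        head = TURKIC_UPPER[first]
    else:
        head = first.title()
    tail = "".join(
        TURKIC_LOWER[ch] if turkic and ch in TURKIC_LOWER else ch.lower()
        for ch in rest
    )
    return head + tail


print(titlecase_word("istanbul", "tr"))  # İstanbul
print(titlecase_word("istanbul", "en"))  # Istanbul
```

The same input also differs in the lowercased tail. For "IŞIK", Turkish gives "Işık", while the default mapping gives "Işik".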
This is the changelog for Unicode String v1.8.0 released on January 19th, 2026. For older changelogs please consult the release tag on GitHub.
- Updates to Unicode 17.0 data.
This is the changelog for Unicode String v1.7.0 released on March 29th, 2025. For older changelogs please consult the release tag on GitHub.
- Moves all regex compilation from compile time to runtime for compatibility with OTP 28.
This is the changelog for Unicode String v1.6.0 released on March 17th, 2025. For older changelogs please consult the release tag on GitHub.
- Fix word break detection when a `\p{word_break=extend}` code point is preceded by a letter and followed by a letter.
- Updated to CLDR 47 break rules and test data.
This is the changelog for Unicode String v1.5.0 released on January 1st, 2025. For older changelogs please consult the release tag on GitHub.
- Update to CLDR 46.1 segmentation data and tests.
- Pass dialyzer with the `:underspecs` flag set.
This is the changelog for Unicode String v1.4.1 released on March 14th, 2024. For older changelogs please consult the release tag on GitHub.
- Fix a performance regression in `Unicode.String.Break.next/4`. Added the script `bench/next.exs` to allow for regression testing. Thanks to @mntns for the report. Closes #6.
This is the changelog for Unicode String v1.4.0 released on March 10th, 2024. For older changelogs please consult the release tag on GitHub.
- Adds dictionary-based word breaking for Chinese (zh, zh-Hant, zh-Hans, zh-Hant-HK, yue, yue-Hans), Japanese (ja), Thai (th), Lao (lo), Khmer (km) and Burmese (my). These languages don't typically use whitespace to separate words, so a dictionary lookup is more appropriate, although not perfect. The same dictionary is used for Chinese and Japanese. The dictionaries implemented are those used in CLDR, since they are under an open source license and for consistency with ICU. Note that these dictionaries need to be downloaded with `mix unicode.string.download.dictionaries` prior to use. Each dictionary is parsed and loaded into `:persistent_term` on demand. Each dictionary has a sizable memory footprint, as measured by `:persistent_term.info/0`:
| Dictionary | Memory (MB) |
|---|---|
| Chinese | 104.8 |
| Thai | 9.6 |
| Lao | 11.4 |
| Khmer | 38.8 |
| Burmese | 23.1 |
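The simplest form of dictionary-based segmentation is greedy longest-match ("maximal matching"). The Python sketch below illustrates it. It is not necessarily the algorithm `unicode_string` uses; ICU, for example, uses more elaborate dictionary-based methods. The tiny word sets are hypothetical stand-ins for the CLDR dictionaries. Code points not found in the dictionary fall back to single-code-point segments.

```python
def longest_match_segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match ("maximal matching") word segmentation."""
    max_len = max([1, *map(len, dictionary)])
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first. A single code point is always
        # accepted so that unknown characters still make progress.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy dictionaries, not the CLDR/ICU word lists.
print(longest_match_segment("我喜欢苹果", {"我", "喜欢", "苹果"}))
# ['我', '喜欢', '苹果']

# Greedy matching is not perfect: the intended split is 中国 / 人民.
print(longest_match_segment("中国人民", {"中国", "中国人", "人民"}))
# ['中国人', '民']
```

The second example shows why dictionary segmentation is "not perfect". A locally longest match can force a worse split later in the text.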
This is the changelog for Unicode String v1.3.1 released on March 6th, 2024. For older changelogs please consult the release tag on GitHub.
- Fix `Unicode.String.split/2` and `Unicode.String.next/2` when the passing rule is a `:no_break` rule. Thanks to @GregLMcDonald for the report. Closes #5.
This is the changelog for Unicode String v1.3.0 released on February 27th, 2024. For older changelogs please consult the release tag on GitHub.
- Fix case folding for code points that fold to themselves.
- Adds case mapping functions `Unicode.String.upcase/2`, `Unicode.String.downcase/2` and `Unicode.String.titlecase/2`. These functions implement the full Unicode casing algorithm, including conditional mappings. They are locale-aware, and a locale can be specified as a string, an atom or a `Cldr.LanguageTag`, thereby providing basic integration between `unicode_string` and `ex_cldr`.
- Case folding always follows the `:full` path, which allows mapping of single code points to multiple code points. There is no practical reason to implement the `:simple` path. As a result, the `type` parameter to `Unicode.String.Case.Folding.fold/2` is no longer required or supported.
- Support an `ex_cldr` language tag as a parameter to `Unicode.String.Case.Folding.fold/2`. In fact, any map that has a `:language` key whose value is an ISO 639-1 language code as a lowercase atom may be passed as a parameter.
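The `:full` and `:simple` folding paths differ for characters such as German `ß`. Full folding maps it to the two-letter `ss`, while simple folding leaves it unchanged. Python's `str.casefold()` applies full case folding, so the sketch below uses it to show why one-to-many mappings matter for caseless matching. It does not use `unicode_string`, and `caseless_equal` is a hypothetical helper.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after Unicode full case folding (str.casefold)."""
    return a.casefold() == b.casefold()


# Full folding maps one code point to several: ß -> "ss", ﬁ (U+FB01) -> "fi".
print("\u00df".casefold())  # ss
print("\ufb01".casefold())  # fi

# Lowercasing alone is not enough for caseless matching.
print("Straße".lower() == "STRASSE".lower())  # False
print(caseless_equal("Straße", "STRASSE"))    # True
```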
This is the changelog for Unicode String v1.2.1 released on June 2nd, 2023. For older changelogs please consult the release tag on GitHub.
- Resolve the segments directory at runtime, not at compile time. Thanks to @crkent for the report. Closes #4.
This is the changelog for Unicode String v1.2.0 released on March 14th, 2023. For older changelogs please consult the release tag on GitHub.
- Adds `Unicode.String.stream/2` to support streaming graphemes, words, sentences and line breaks.
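Streaming lets a consumer stop after the first few segments without segmenting the whole input. The Python generator below sketches that idea with a deliberately simplified cluster rule: a base code point plus any following code points with a nonzero canonical combining class. This is not the full UAX #29 extended grapheme cluster algorithm. The `stream_clusters` name is hypothetical.

```python
import unicodedata
from itertools import islice
from typing import Iterator


def stream_clusters(text: str) -> Iterator[str]:
    """Lazily yield simplified clusters: a base plus trailing combining marks."""
    cluster = ""
    for ch in text:
        # A code point with canonical combining class 0 starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


# Take only the first two clusters; only the start of the input is examined.
print(list(islice(stream_clusters("e\u0301clair"), 2)))
```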
This is the changelog for Unicode String v1.1.0 released on September 21st, 2022. For older changelogs please consult the release tag on GitHub.
- Updates the segmentation supplemental data (including locales) from CLDR. This adds the "sv" and "fi" locale data for sentence break suppressions.
This is the changelog for Unicode String v1.0.1 released on September 15th, 2021. For older changelogs please consult the release tag on GitHub.
- Whoops, the `priv/segments` directory was not included in the build artifact.
This is the changelog for Unicode String v1.0.0 released on September 14th, 2021. For older changelogs please consult the release tag on GitHub.
- Update to use Unicode 14 release data.
This is the changelog for Unicode String v0.3.0 released on October 11th, 2020. For older changelogs please consult the release tag on GitHub.
- Correct deps and docs to align with Elixir 1.11 and recent releases of `ex_unicode`.
This is the changelog for Unicode String v0.2.0 released on July 12th, 2020. For older changelogs please consult the release tag on GitHub.
This release implements the Unicode break rules for graphemes, words, lines (word-wrapping) and sentences.
- Adds `Unicode.String.split/2`
- Adds `Unicode.String.break?/2`
- Adds `Unicode.String.break/2`
- Adds `Unicode.String.splitter/2`
- Adds `Unicode.String.next/2`
This is the changelog for Unicode String v0.1.0 released on May 17th, 2020. For older changelogs please consult the release tag on GitHub.
- Initial release