Changelog

Unicode String v2.1.0

This is the changelog for Unicode String v2.1.0 released on May 1st, 2026. For older changelogs please consult the release tag on GitHub

Bug Fixes

Improve line break segmentation conformance and compatibility with ICU.

Enhancements

Replaces the regex-based segmentation engine with a single-pass DFA evaluator. Sentence break on a 4 KB unbroken sentence drops from ~9,200 ms to ~11 ms (~840×); word break on a 4 KB sentence from ~7,000 ms to ~12 ms (~580×); scaling is now linear in input length instead of O(N²).
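
The linear scaling comes from visiting each code point once and carrying only a small state forward. The following is a minimal sketch of that idea, not the library's evaluator: the module name `DfaSketch`, its two states (`:word` and `:other`) and the ASCII-only classification are all assumptions made for illustration, and the input is assumed to be valid UTF-8.

```elixir
defmodule DfaSketch do
  # Hypothetical two-state automaton: :word inside a run of ASCII letters
  # or digits, :other everywhere else. A segment boundary is emitted
  # whenever the state changes, so the input is walked exactly once.

  def segments(string) when is_binary(string) do
    scan(string, nil, [], [])
  end

  # End of input: flush the pending segment and restore left-to-right order.
  defp scan(<<>>, _state, current, acc) do
    acc |> flush(current) |> Enum.reverse()
  end

  defp scan(<<cp::utf8, rest::binary>>, state, current, acc) do
    next = classify(cp)

    if state == nil or next == state do
      # Same state (or first code point): extend the current segment.
      scan(rest, next, [cp | current], acc)
    else
      # State change: close the current segment and start a new one.
      scan(rest, next, [cp], flush(acc, current))
    end
  end

  defp classify(cp) when cp in ?a..?z or cp in ?A..?Z or cp in ?0..?9, do: :word
  defp classify(_cp), do: :other

  # Code points are accumulated in reverse, so reverse before converting.
  defp flush(acc, []), do: acc
  defp flush(acc, current), do: [current |> Enum.reverse() |> List.to_string() | acc]
end
```

Calling `DfaSketch.segments("ab  cd9!")` returns `["ab", "  ", "cd9", "!"]`. Each code point is classified once and prepended in constant time, so the total work grows linearly with the input length.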

Unicode String v2.0.1

This is the changelog for Unicode String v2.0.1 released on April 29th, 2026. For older changelogs please consult the release tag on GitHub

Bug Fixes

Fix compilation, dialyzer and tests when the optional :localize dependency is not present.

Unicode String v2.0.0

This is the changelog for Unicode String v2.0.0 released on April 14th, 2026. For older changelogs please consult the release tag on GitHub

Breaking change

Unicode String version 2.0 and later is supported on Elixir 1.17 or later only.

Enhancements

Replace ex_cldr with localize as the localization library.
Fix titlecasing of the letter i, including correct handling in Turkic languages.
Use Localize.Locale.best_match/3 for locale matching.
Fixes to the Unicode.Break module.

Unicode String v1.8.0

This is the changelog for Unicode String v1.8.0 released on January 19th, 2026. For older changelogs please consult the release tag on GitHub

Enhancements

Updates to Unicode 17.0 data.

Unicode String v1.7.0

This is the changelog for Unicode String v1.7.0 released on March 29th, 2025. For older changelogs please consult the release tag on GitHub

Bug Fixes

Converts all compile-time regex compilation to runtime to be compatible with OTP 28.

Unicode String v1.6.0

This is the changelog for Unicode String v1.6.0 released on March 17th, 2025. For older changelogs please consult the release tag on GitHub

Bug Fixes

Fix word break detection when a \p{word_break=extend} codepoint is preceded by a letter and followed by a letter.

Enhancements

Updated to CLDR 47 break rules and test data.

Unicode String v1.5.0

This is the changelog for Unicode String v1.5.0 released on January 1st, 2025. For older changelogs please consult the release tag on GitHub

Enhancements

Update to CLDR 46.1 segmentation data and tests.
Pass dialyzer with :underspecs flag set.

Unicode String v1.4.1

This is the changelog for Unicode String v1.4.1 released on March 14th, 2024. For older changelogs please consult the release tag on GitHub

Bug Fixes

Fix performance regression in Unicode.String.Break.next/4. Added the script bench/next.exs to allow for regression testing. Thanks to @mntns for the report. Closes #6.

Unicode String v1.4.0

This is the changelog for Unicode String v1.4.0 released on March 10th, 2024. For older changelogs please consult the release tag on GitHub

Enhancements

Adds dictionary-based word breaking for Chinese (zh, zh-Hant, zh-Hans, zh-Hant-HK, yue, yue-Hans), Japanese (ja), Thai (th), Lao (lo), Khmer (km) and Burmese (my). These languages don't typically use whitespace to separate words, so a dictionary lookup is more appropriate, although not perfect. The same dictionary is used for Chinese and Japanese. The dictionaries implemented are those used in CLDR, since they are under an open source license and this keeps results consistent with ICU. Note that these dictionaries need to be downloaded with mix unicode.string.download.dictionaries prior to use. Each dictionary is parsed and loaded into persistent_term on demand. Each dictionary has a sizable memory footprint as measured by :persistent_term.info/0:

Dictionary	Memory (MB)
Chinese	104.8
Thai	9.6
Lao	11.4
Khmer	38.8
Burmese	23.1
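
A footprint like those in the table can be taken by reading :persistent_term.info/0 before and after a load. The sketch below shows the measurement only; it does not use the library's loader. The module name `DictionaryFootprint`, its storage key and the `MapSet` representation are assumptions for illustration, as is the choice of 1,048,576 bytes per megabyte.

```elixir
defmodule DictionaryFootprint do
  # Hypothetical key: the keys the library actually uses are not stated
  # in this changelog.
  @key {__MODULE__, :demo_dictionary}

  # Stores `words` in :persistent_term as a MapSet and returns the growth,
  # in bytes, of the memory reported by :persistent_term.info/0.
  def load_and_measure(words) do
    %{memory: before} = :persistent_term.info()
    :persistent_term.put(@key, MapSet.new(words))
    %{memory: after_load} = :persistent_term.info()
    after_load - before
  end

  # Looks a word up in the stored dictionary without copying it to the
  # calling process heap.
  def member?(word), do: MapSet.member?(:persistent_term.get(@key), word)

  # Removes the demo dictionary again.
  def unload, do: :persistent_term.erase(@key)

  # Converts bytes to megabytes, rounded to one decimal place.
  def megabytes(bytes), do: Float.round(bytes / 1_048_576, 1)
end
```

For example, `DictionaryFootprint.load_and_measure(for i <- 1..10_000, do: "word#{i}")` returns the number of bytes added, which `DictionaryFootprint.megabytes/1` converts for display. Terms in persistent_term are shared by all processes, which is why a one-time load cost of this size can be acceptable for read-heavy lookups.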

Unicode String v1.3.1

This is the changelog for Unicode String v1.3.1 released on March 6th, 2024. For older changelogs please consult the release tag on GitHub

Bug Fixes

Fix Unicode.String.split/2 and Unicode.String.next/2 when the passing rule is a :no_break rule. Thanks to @GregLMcDonald for the report. Closes #5.

Unicode String v1.3.0

This is the changelog for Unicode String v1.3.0 released on February 27th, 2024. For older changelogs please consult the release tag on GitHub

Bug Fixes

Fix case folding for codepoints that fold to themselves.

Enhancements

Adds case mapping functions Unicode.String.upcase/2, Unicode.String.downcase/2 and Unicode.String.titlecase/2. These functions implement the full Unicode Casing algorithm including conditional mappings. They are locale-aware and a locale can be specified as a string, atom or a Cldr.LanguageTag, thereby providing basic integration between unicode_string and ex_cldr.
Case folding always follows the :full path which allows mapping of single code points to multiple code points. There is no practical reason to implement the :simple path. As a result, the type parameter to Unicode.String.Case.Folding.fold/2 is no longer required or supported.
Support an ex_cldr Language Tag as a parameter to Unicode.String.Case.Folding.fold/2. In fact any map that has a :language key with a value that is an ISO 639-1 language code as a lower cased atom may be passed as a parameter.

Unicode String v1.2.1

This is the changelog for Unicode String v1.2.1 released on June 2nd, 2023. For older changelogs please consult the release tag on GitHub

Bug Fixes

Resolve segments dir at runtime, not compile time. Thanks to @crkent for the report. Closes #4.

Unicode String v1.2.0

This is the changelog for Unicode String v1.2.0 released on March 14th, 2023. For older changelogs please consult the release tag on GitHub

Enhancements

Adds Unicode.String.stream/2 to support streaming graphemes, words, sentences and line breaks.

Unicode String v1.1.0

This is the changelog for Unicode String v1.1.0 released on September 21st, 2022. For older changelogs please consult the release tag on GitHub

Enhancements

Updates the segmentation supplemental data (including locales) for CLDR. This adds the "sv" and "fi" locale data for sentence break suppressions.

Unicode String v1.0.1

This is the changelog for Unicode String v1.0.1 released on September 15th, 2021. For older changelogs please consult the release tag on GitHub

Bug Fixes

Whoops, the priv/segments directory was not included in the build artifact.

Unicode String v1.0.0

This is the changelog for Unicode String v1.0.0 released on September 14th, 2021. For older changelogs please consult the release tag on GitHub

Enhancements

Update to use Unicode 14 release data.

Unicode String v0.3.0

This is the changelog for Unicode String v0.3.0 released on October 11th, 2020. For older changelogs please consult the release tag on GitHub

Bug Fixes

Correct deps and docs to align with Elixir 1.11 and recent releases of ex_unicode.

Unicode String v0.2.0

This is the changelog for Unicode String v0.2.0 released on July 12th, 2020. For older changelogs please consult the release tag on GitHub

Enhancements

This release implements the Unicode break rules for graphemes, words, lines (word-wrapping) and sentences.

Adds Unicode.String.split/2
Adds Unicode.String.break?/2
Adds Unicode.String.break/2
Adds Unicode.String.splitter/2
Adds Unicode.String.next/2

Unicode String v0.1.0

This is the changelog for Unicode String v0.1.0 released on May 17th, 2020. For older changelogs please consult the release tag on GitHub

Enhancements

Initial release
