
feat(fts): add stemmer token filter based on Snowball 3.1.1 #513

Open
egolearner wants to merge 2 commits into alibaba:main from egolearner:feat/fts-stemmer-token-filter

@egolearner

Collaborator

Implement a stemmer token filter for FTS that reduces words to their root form using the Snowball stemming library. It supports 34+ languages, selected via stemmer_lang in extra_params (defaults to english).

Changes:

  • Integrate Snowball 3.1.1 as a thirdparty static library
  • Add StemmerTokenFilter with thread_local stemmer cache (lock-free)
  • Register 'stemmer' filter in TokenizerFactory
  • Add unit tests and FtsColumnIndexer end-to-end tests
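
To illustrate the thread_local cache idea from the change list, here is a minimal, self-contained sketch. It is not the code from this PR: `ToyStemmer`, `GetThreadLocalStemmer`, and `StemmerTokenFilterSketch` are hypothetical names, and the suffix-stripping rules are a toy stand-in for Snowball so that the example compiles without the library. A real implementation would presumably wrap a `struct sb_stemmer*` obtained from libstemmer's `sb_stemmer_new` and released with `sb_stemmer_delete`.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Snowball stemmer handle. The rules below are
// a toy suffix stripper, NOT the Snowball algorithm.
class ToyStemmer {
 public:
  explicit ToyStemmer(std::string lang) : lang_(std::move(lang)) {}

  std::string Stem(const std::string& word) const {
    if (lang_ != "english") return word;  // toy: only "english" has rules
    static const std::vector<std::string> kSuffixes = {"ing", "ed", "s"};
    for (const std::string& suffix : kSuffixes) {
      // Require at least three characters to remain after stripping.
      if (word.size() > suffix.size() + 2 &&
          word.compare(word.size() - suffix.size(), suffix.size(),
                       suffix) == 0) {
        return word.substr(0, word.size() - suffix.size());
      }
    }
    return word;
  }

 private:
  std::string lang_;
};

// Each thread owns its own map of language -> stemmer, so lookups and
// insertions need no lock and a stemmer is never shared across threads.
ToyStemmer& GetThreadLocalStemmer(const std::string& lang) {
  thread_local std::unordered_map<std::string, std::unique_ptr<ToyStemmer>>
      cache;
  std::unique_ptr<ToyStemmer>& slot = cache[lang];
  if (!slot) slot = std::make_unique<ToyStemmer>(lang);
  return *slot;
}

// Hypothetical filter: reads stemmer_lang from extra_params and falls back
// to "english" when the key is absent, mirroring the PR description.
class StemmerTokenFilterSketch {
 public:
  explicit StemmerTokenFilterSketch(
      const std::unordered_map<std::string, std::string>& extra_params) {
    auto it = extra_params.find("stemmer_lang");
    if (it != extra_params.end()) {
      lang_ = it->second;
    } else {
      lang_ = "english";
    }
  }

  std::vector<std::string> Filter(
      const std::vector<std::string>& tokens) const {
    ToyStemmer& stemmer = GetThreadLocalStemmer(lang_);
    std::vector<std::string> out;
    out.reserve(tokens.size());
    for (const std::string& token : tokens) out.push_back(stemmer.Stem(token));
    return out;
  }

 private:
  std::string lang_;
};
```

The cache is created lazily per thread and per language, which trades a little memory for the absence of synchronization. That trade-off matters most if stemmer handles are expensive to construct and not safe to share between threads, which is the assumption this sketch makes.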

egolearner force-pushed the feat/fts-stemmer-token-filter branch from 9ddb69a to dd53084 on June 22, 2026 11:48
egolearner force-pushed the feat/fts-stemmer-token-filter branch from dd53084 to 11961a4 on June 23, 2026 02:21