
@kakao-jenna-me

Problem

In the MeCab full-text parser, the limit optimization is not applied to Boolean-mode searches, even when the search term consists of a single token.

Root Cause Analysis

The issue originates from the mecab_parse() function in plugin_mecab.cc when handling Boolean mode (MYSQL_FTPARSER_FULL_BOOLEAN_INFO):

Token Counting Logic (lines 207-214):
The parser walks the node list of mecab_lattice, starting from bos_node(), and counts every node it visits.
Because the BOS (Beginning of Sentence) and EOS (End of Sentence) nodes are part of that list, the existing code incorrectly includes them in the total token_num.
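The miscount can be reproduced without MeCab itself. The sketch below models the lattice as a linked list running from a BOS node to an EOS node and counts nodes the way the loop above is described. MockNode, MockLattice, NodeStat and count_nodes_with_sentinels are hypothetical names invented for this illustration, not MeCab's or MySQL's API, and the loop paraphrases the described behavior rather than copying plugin_mecab.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-ins for MeCab's node status codes (in mecab.h the
// corresponding macros are MECAB_NOR_NODE, MECAB_BOS_NODE, MECAB_EOS_NODE).
enum NodeStat { kNormalNode = 0, kBosNode = 2, kEosNode = 3 };

// Minimal mock of a lattice node: only the fields this example needs.
struct MockNode {
  std::string surface;
  NodeStat stat;
  const MockNode *next;
};

// Mock lattice holding BOS -> surface tokens... -> EOS as a linked list.
class MockLattice {
 public:
  MockLattice(std::initializer_list<std::string> surfaces) {
    nodes_.push_back(MockNode{"", kBosNode, nullptr});
    for (const std::string &s : surfaces) {
      assert(!s.empty() && "surface tokens must be non-empty");
      nodes_.push_back(MockNode{s, kNormalNode, nullptr});
    }
    nodes_.push_back(MockNode{"", kEosNode, nullptr});
    // Link the nodes only after every push_back, so vector reallocation
    // cannot invalidate the stored pointers.
    for (std::size_t i = 0; i + 1 < nodes_.size(); ++i) {
      nodes_[i].next = &nodes_[i + 1];
    }
  }
  // Non-copyable: a copy's `next` pointers would still point into the original.
  MockLattice(const MockLattice &) = delete;
  MockLattice &operator=(const MockLattice &) = delete;

  const MockNode *bos_node() const { return &nodes_.front(); }

 private:
  std::vector<MockNode> nodes_;
};

// Counting loop as described: every node reachable from the BOS node is
// counted, so the BOS and EOS sentinels inflate the total by 2.
int count_nodes_with_sentinels(const MockLattice &lattice) {
  int token_num = 0;
  for (const MockNode *node = lattice.bos_node(); node != nullptr;
       node = node->next) {
    token_num += 1;
  }
  return token_num;
}
```

For `MockLattice single{"word"};`, count_nodes_with_sentinels returns 3 (BOS, "word", EOS), even though the query contains only one surface token.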

Phrase Conversion Condition (lines 217-227):
If token_num > 1, the search term is converted into a phrase search (FT_TOKEN_LEFT_PAREN).
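A sketch of the Boolean-mode decision described above, written against a hypothetical TokenKind enum and an emit_boolean_term helper. Neither is part of MySQL's plugin API; the names only echo FT_TOKEN_LEFT_PAREN and the related token types. It shows that the same one-word term is emitted either as a parenthesized phrase or as a plain word, depending solely on the token_num passed in. Closing the phrase with a right-parenthesis token is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical local stand-in for the token types a parser hands to the server.
enum class TokenKind { kWord, kLeftParen, kRightParen };

struct EmittedToken {
  TokenKind kind;
  std::string text;  // empty for parenthesis markers
};

// Sketch of the rule: if token_num > 1, wrap the term's words as a phrase.
// The trailing right parenthesis is an assumption, not taken from the PR.
std::vector<EmittedToken> emit_boolean_term(
    const std::vector<std::string> &surfaces, int token_num) {
  assert(token_num >= 0 && "token counts are non-negative");
  std::vector<EmittedToken> out;
  const bool as_phrase = token_num > 1;
  if (as_phrase) out.push_back({TokenKind::kLeftParen, ""});
  for (const std::string &s : surfaces) out.push_back({TokenKind::kWord, s});
  if (as_phrase) out.push_back({TokenKind::kRightParen, ""});
  return out;
}
```

With surfaces {"word"}, passing token_num = 3 (the count that includes BOS and EOS) yields three entries: left parenthesis, "word", right parenthesis. Passing token_num = 1 yields a single kWord entry.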

Impact:

Even for a single-token search, counting the BOS and EOS nodes yields a token_num of 3 (BOS, the token, EOS), so the token_num > 1 condition always holds.
This forces every Boolean-mode search to be treated as a phrase search, which requires maintaining positional information and consequently disables the limit optimization.

Changes

Modified the token counting loop to skip the BOS and EOS nodes, so that token_num reflects the actual number of meaningful tokens.
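The change can be sketched with a similar mock lattice, in which the counting loop skips nodes whose status marks them as BOS or EOS. This is an illustration, not the patch itself. With MeCab's real types, a comparable check would presumably test node->stat against MECAB_BOS_NODE and MECAB_EOS_NODE, but that is an assumption about the patch's shape. MockNode, MockLattice, NodeStat and count_surface_tokens are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-ins for MeCab's node status codes.
enum NodeStat { kNormalNode = 0, kBosNode = 2, kEosNode = 3 };

struct MockNode {
  std::string surface;
  NodeStat stat;
  const MockNode *next;
};

// Mock lattice holding BOS -> surface tokens... -> EOS as a linked list.
class MockLattice {
 public:
  MockLattice(std::initializer_list<std::string> surfaces) {
    nodes_.push_back(MockNode{"", kBosNode, nullptr});
    for (const std::string &s : surfaces) {
      assert(!s.empty() && "surface tokens must be non-empty");
      nodes_.push_back(MockNode{s, kNormalNode, nullptr});
    }
    nodes_.push_back(MockNode{"", kEosNode, nullptr});
    // Link only after all push_backs so reallocation cannot break pointers.
    for (std::size_t i = 0; i + 1 < nodes_.size(); ++i) {
      nodes_[i].next = &nodes_[i + 1];
    }
  }
  MockLattice(const MockLattice &) = delete;
  MockLattice &operator=(const MockLattice &) = delete;

  const MockNode *bos_node() const { return &nodes_.front(); }

 private:
  std::vector<MockNode> nodes_;
};

// Fixed counting loop (sketch): BOS and EOS sentinels are skipped, so
// token_num equals the number of surface tokens.
int count_surface_tokens(const MockLattice &lattice) {
  int token_num = 0;
  for (const MockNode *node = lattice.bos_node(); node != nullptr;
       node = node->next) {
    if (node->stat == kBosNode || node->stat == kEosNode) continue;
    token_num += 1;
  }
  return token_num;
}
```

For `MockLattice single{"word"};`, count_surface_tokens returns 1, so the token_num > 1 phrase condition is no longer met. A three-token lattice still returns 3 and is still converted into a phrase.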

Expected Effects

Prevention of unnecessary phrase conversion for single-token searches.
Performance improvement by enabling limit optimization for single-token queries in Boolean mode.
