fix: enable limit optimization for single-token searches in Boolean mode #5820
Problem
In the MeCab parser, the limit optimization is not applied to Boolean mode searches even when the search term consists of a single token.
Root Cause Analysis
The issue originates from the mecab_parse() function in plugin_mecab.cc when handling Boolean mode (MYSQL_FTPARSER_FULL_BOOLEAN_INFO):
Token Counting Logic (lines 207-214):
The parser iterates through the mecab_lattice starting from bos_node() to count nodes.
The existing code incorrectly includes BOS (Beginning of Sentence) and EOS (End of Sentence) nodes in the total token_num.
Phrase Conversion Condition (lines 217-227):
If token_num > 1, the search term is converted into a phrase search (FT_TOKEN_LEFT_PAREN).
Impact:
Even for a single-token search, the inclusion of BOS and EOS nodes results in a token_num of at least 3.
As a result, every Boolean mode search is treated as a phrase search. Phrase searches need positional information, which in turn disables limit optimization.
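The counting behaviour described above can be reproduced in a small standalone sketch. The `Node` struct and `NodeStat` enum are simplified stand-ins for MeCab's node type, and `count_all_nodes` and `needs_phrase_conversion` are hypothetical helpers that only mirror the description here; none of this is the plugin's actual code.

```cpp
#include <cassert>

// Simplified stand-ins for MeCab's node status values and node type
// (assumptions, so the sketch compiles without the MeCab headers).
enum NodeStat { NOR_NODE, UNK_NODE, BOS_NODE, EOS_NODE };

struct Node {
  NodeStat stat;
  const Node *next;
};

// Counts every node in the chain, sentence boundary markers included,
// as the existing loop is described as doing.
int count_all_nodes(const Node *bos) {
  int token_num = 0;
  for (const Node *node = bos; node != nullptr; node = node->next) {
    token_num += 1;
  }
  return token_num;
}

// Mirrors the phrase conversion condition described above.
bool needs_phrase_conversion(int token_num) { return token_num > 1; }

// A single-word search term: BOS -> word -> EOS.
void check_single_token_case() {
  Node eos{EOS_NODE, nullptr};
  Node word{NOR_NODE, &eos};
  Node bos{BOS_NODE, &word};

  // One real token, but three nodes are counted.
  assert(count_all_nodes(&bos) == 3);
  // So the single-token term is still converted to a phrase.
  assert(needs_phrase_conversion(count_all_nodes(&bos)));
}
```

Calling `check_single_token_case()` exercises the case from the Impact notes: with the boundary nodes counted, `token_num > 1` holds even for a one-word term.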
Changes
Modified the token counting loop to skip BOS and EOS nodes, so that token_num reflects the actual number of meaningful tokens.
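A minimal sketch of a counting loop that skips the boundary nodes, assuming the same kind of linked node chain. `Node` and `NodeStat` are simplified stand-ins and `count_tokens` is a hypothetical name; MeCab's own node type exposes a `stat` field with `MECAB_BOS_NODE` and `MECAB_EOS_NODE` values, but whether the actual patch is written this way is not shown here.

```cpp
#include <cassert>

// Simplified stand-ins for MeCab's node status values and node type
// (assumptions, so the sketch compiles without the MeCab headers).
enum NodeStat { NOR_NODE, UNK_NODE, BOS_NODE, EOS_NODE };

struct Node {
  NodeStat stat;
  const Node *next;
};

// Counts only meaningful tokens: BOS and EOS markers are skipped.
int count_tokens(const Node *bos) {
  int token_num = 0;
  for (const Node *node = bos; node != nullptr; node = node->next) {
    if (node->stat == BOS_NODE || node->stat == EOS_NODE) continue;
    token_num += 1;
  }
  return token_num;
}

void check_counts() {
  // Single-word term: BOS -> word -> EOS.
  Node eos{EOS_NODE, nullptr};
  Node word{NOR_NODE, &eos};
  Node bos{BOS_NODE, &word};
  // One token, so a token_num > 1 check no longer triggers.
  assert(count_tokens(&bos) == 1);

  // Two-word term: BOS -> first -> second -> EOS.
  Node eos2{EOS_NODE, nullptr};
  Node second{NOR_NODE, &eos2};
  Node first{NOR_NODE, &second};
  Node bos2{BOS_NODE, &first};
  // Multi-token terms are still counted as more than one token.
  assert(count_tokens(&bos2) == 2);
}
```

With this counting, a one-word term yields `token_num == 1` and stays a plain term, while multi-token terms still satisfy `token_num > 1` and keep their phrase conversion.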
Expected Effects
Prevention of unnecessary phrase conversion for single-token searches.
Performance improvement by enabling limit optimization for single-token queries in Boolean mode.