fix(api): restore CJK keyword search in workflow logs broken by PR #30450 #37542

Open
ifer47 wants to merge 1 commit into langgenius:main from ifer47:fix/cjk-workflow-search-regression

Conversation

ifer47 commented Jun 16, 2026

Summary

Root Cause

Before PR #30450, the code used:

keyword_like_val = f"%{keyword[:30].encode('unicode_escape').decode('utf-8')}%".replace(r"\u", r"\\u")

This converted CJK keywords to \uXXXX form, matching how json.dumps() (ensure_ascii=True by default) stores data in PostgreSQL. PR #30450 replaced this with escape_like_pattern(), which only escapes the LIKE special characters (%, _, \), so a search for raw 你好 cannot match the \u4f60\u597d stored in the database.
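
The mismatch can be reproduced without a database. The following is a minimal sketch using only the standard library; the `query` key is a hypothetical stand-in for whatever field the workflow log actually serializes, and plain substring checks stand in for the SQL LIKE match.

```python
import json

keyword = "你好"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
# The "query" key is hypothetical and only used for illustration.
stored_default = json.dumps({"query": keyword})

# With ensure_ascii=False the characters are kept as-is.
stored_raw = json.dumps({"query": keyword}, ensure_ascii=False)

# The keyword transformation used before PR #30450 (before LIKE escaping).
escaped_keyword = keyword.encode("unicode_escape").decode("utf-8")

print(stored_default)                     # {"query": "\u4f60\u597d"}
print(keyword in stored_default)          # False: raw CJK misses the escaped form
print(escaped_keyword in stored_default)  # True
print(keyword in stored_raw)              # True
```

A raw keyword only matches data written with ensure_ascii=False, while the escaped keyword only matches data written with the default, which is why neither form alone covers both storage paths.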

Fix

When the keyword contains non-ASCII characters:

  • Search for raw CJK form: ILIKE '%你好%' ESCAPE '\' — matches data stored with ensure_ascii=False (e.g., logstore backend)
  • Also search for unicode-escaped form: ILIKE '%\u4f60\u597d%' ESCAPE '\' — matches data stored with ensure_ascii=True (default)

Pure ASCII keywords skip the extra search to avoid unnecessary overhead.
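
As a rough illustration of the dual-search idea, the sketch below builds the LIKE patterns that would be OR'd together. It is not the code in this PR: `escape_like` is a simplified stand-in for `escape_like_pattern()`, `build_keyword_patterns` is a hypothetical helper name, and the 30-character cap is carried over from the pre-#30450 code as an assumption.

```python
def escape_like(value: str) -> str:
    """Simplified stand-in for escape_like_pattern(): escape \\, % and _.

    The backslash is escaped first so the escapes added for % and _
    are not doubled afterwards.
    """
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_keyword_patterns(keyword: str, max_len: int = 30) -> list[str]:
    """Return the LIKE patterns to OR together for one keyword (hypothetical helper)."""
    keyword = keyword[:max_len]

    # Raw form: matches rows serialized with ensure_ascii=False.
    patterns = [f"%{escape_like(keyword)}%"]

    # Unicode-escaped form: only needed when the keyword has non-ASCII characters.
    if not keyword.isascii():
        escaped = keyword.encode("unicode_escape").decode("ascii")
        patterns.append(f"%{escape_like(escaped)}%")

    return patterns


for kw in ("hello", "50%_off", "你好"):
    print(ascii(kw), [ascii(p) for p in build_keyword_patterns(kw)])
```

Each returned pattern would then be used in its own `ILIKE ... ESCAPE '\'` condition. A pure ASCII keyword yields a single pattern, so the query is unchanged for that case.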

Test plan

  • Unit tests added in test_workflow_app_log_cjk_search.py covering CJK, mixed ASCII+CJK, pure ASCII, and LIKE special character edge cases
  • Manual testing: create a workflow app with Chinese input, run it, search for Chinese keywords in workflow logs
  • Verify English keyword search still works as expected

Closes #37367

🤖 Generated with Claude Code Best

…nggenius#30450

PR langgenius#30450 replaced the unicode_escape encoding with escape_like_pattern()
for SQL injection safety, but this broke CJK keyword searches because
json.dumps() defaults to ensure_ascii=True, storing CJK characters as
\uXXXX escape sequences in the database. Searching for raw CJK characters
no longer matches the stored \uXXXX form.

This fix adds a dual-search strategy: when the keyword contains non-ASCII
characters, both the raw-CJK form and the unicode-escaped form are used
as LIKE patterns (OR'd together). This ensures matches regardless of
whether the data was stored with ensure_ascii=True (default repository)
or ensure_ascii=False (logstore backend). Pure ASCII keywords skip the
extra search to avoid unnecessary overhead.

Closes langgenius#37367

Co-Authored-By: zhipu/glm-5 <zai-org@claude-code-best.win>
dosubot (bot) added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Jun 16, 2026

Development

Successfully merging this pull request may close these issues.

Regression: Workflow log keyword search returns empty results for Chinese/CJK characters since v1.11.0 (PR #30450)
