fix(api): restore CJK keyword search in workflow logs broken by PR #30450 by ifer47 · Pull Request #37542 · langgenius/dify

ifer47 · 2026-06-16T16:44:50Z

Summary

Workflow log keyword search has returned empty results for Chinese/CJK characters since v1.11.0 (PR #30450).
PR #30450 (which refactored the SQL LIKE pattern escaping logic into a centralized utility function) replaced the unicode_escape encoding with escape_like_pattern() for SQL injection safety. That broke CJK search, because json.dumps() defaults to ensure_ascii=True and therefore stores CJK characters as \uXXXX escape sequences in the database.
This fix adds a dual-search strategy: when the keyword contains non-ASCII characters, both the raw CJK form and the unicode-escaped form are used as LIKE patterns (OR'd together), so matches are found regardless of serialization format.

Root Cause

Before PR #30450, the code used:

keyword_like_val = f"%{keyword[:30].encode('unicode_escape').decode('utf-8')}%".replace(r"\u", r"\\u")

This converted CJK keywords to \uXXXX format, matching how json.dumps(ensure_ascii=True) stores data in PostgreSQL. PR #30450 replaced this with escape_like_pattern() which only escapes SQL LIKE wildcards (%, _, \), so searching for raw 你好 cannot match \u4f60\u597d stored in the database.
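The mismatch can be reproduced without a database. The following is a minimal sketch using only the standard library; the `{"query": ...}` payload is a made-up stand-in for stored workflow data, and a plain substring check stands in for the LIKE match.

```python
import json

keyword = "你好"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
stored_default = json.dumps({"query": keyword})
# With ensure_ascii=False the characters are written as-is.
stored_raw = json.dumps({"query": keyword}, ensure_ascii=False)

# The pre-#30450 approach: convert the keyword to its \uXXXX form.
escaped_keyword = keyword.encode("unicode_escape").decode("utf-8")

print(stored_default)   # {"query": "\u4f60\u597d"}
print(stored_raw)       # {"query": "你好"}
print(escaped_keyword)  # \u4f60\u597d

# Substring checks standing in for the LIKE match.
raw_hits_default = keyword in stored_default              # False: the regression
escaped_hits_default = escaped_keyword in stored_default  # True
raw_hits_raw = keyword in stored_raw                      # True
```

The raw keyword only matches data written with ensure_ascii=False, while the escaped keyword only matches the default serialization, which is why neither form alone covers both storage backends.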

Fix

When the keyword contains non-ASCII characters:

Search for raw CJK form: ILIKE '%你好%' ESCAPE '\' — matches data stored with ensure_ascii=False (e.g., logstore backend)
Also search for unicode-escaped form: ILIKE '%\u4f60\u597d%' ESCAPE '\' — matches data stored with ensure_ascii=True (default)

Pure ASCII keywords skip the extra search to avoid unnecessary overhead.
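The dual-pattern idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `build_keyword_patterns` is a hypothetical helper, and `escape_like_pattern` here is a local stand-in that assumes the project's utility escapes `\`, `%` and `_` with a backslash.

```python
from typing import List


def escape_like_pattern(value: str) -> str:
    # Local stand-in (assumption): escape backslash first, then the
    # LIKE wildcards, for use with ESCAPE '\'.
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_keyword_patterns(keyword: str) -> List[str]:
    # Hypothetical helper: return the LIKE patterns to OR together.
    patterns = [f"%{escape_like_pattern(keyword)}%"]
    if not keyword.isascii():
        # \uXXXX form, as written by json.dumps(ensure_ascii=True).
        escaped = keyword.encode("unicode_escape").decode("utf-8")
        patterns.append(f"%{escape_like_pattern(escaped)}%")
    return patterns


cjk_patterns = build_keyword_patterns("你好")
ascii_patterns = build_keyword_patterns("hello")
wildcard_patterns = build_keyword_patterns("50%_off")

print(cjk_patterns)    # two patterns: raw form and escaped form
print(ascii_patterns)  # one pattern: no extra search for pure ASCII
```

In the escaped pattern each backslash is doubled, so that under ESCAPE '\' it matches a literal backslash in the stored text. With SQLAlchemy, the patterns could then be combined along the lines of `or_(*(column.ilike(p, escape="\\") for p in patterns))`, where `column` is whichever text column is being searched.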

Test plan

Unit tests added in test_workflow_app_log_cjk_search.py covering CJK, mixed ASCII+CJK, pure ASCII, and LIKE special character edge cases
Manual testing: create a workflow app with Chinese input, run it, search for Chinese keywords in workflow logs
Verify English keyword search still works as expected

Closes #37367

🤖 Generated with Claude Code Best

…nggenius#30450

PR langgenius#30450 replaced the unicode_escape encoding with escape_like_pattern() for SQL injection safety, but this broke CJK keyword searches because json.dumps() defaults to ensure_ascii=True, storing CJK characters as \uXXXX escape sequences in the database. Searching for raw CJK characters no longer matches the stored \uXXXX form.

This fix adds a dual-search strategy: when the keyword contains non-ASCII characters, both the raw-CJK form and the unicode-escaped form are used as LIKE patterns (OR'd together). This ensures matches regardless of whether the data was stored with ensure_ascii=True (default repository) or ensure_ascii=False (logstore backend). Pure ASCII keywords skip the extra search to avoid unnecessary overhead.

Closes langgenius#37367

Co-Authored-By: zhipu/glm-5 <zai-org@claude-code-best.win>

ifer47 requested review from QuantumGhost and laipz8200 as code owners June 16, 2026 16:44

dosubot Bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Jun 16, 2026

ifer47 wants to merge 1 commit into langgenius:main from ifer47:fix/cjk-workflow-search-regression
