-
Notifications
You must be signed in to change notification settings - Fork 2.9k
CN.1875084792839892:1c1cf413b873302f5fb459f485ff2f96_692e5f206cdc22648f7bdc86.693037da6cdc22648f7bdcce.693037dad6bed23c03ffddc3:Trae CN.T(2025/12/3 21:15:06) #797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #797 +/- ##
==========================================
- Coverage 84.30% 83.54% -0.77%
==========================================
Files 73 73
Lines 3263 3288 +25
==========================================
- Hits 2751 2747 -4
- Misses 512 541 +29
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements enhanced search functionality for a Django blog application, including keyword highlighting, multiple sort options (relevance, time, views), search history with localStorage, and a custom Chinese tokenizer for Whoosh. The changes also switch from MySQL to SQLite for the database and modify the slug generation library from django-uuslug to python-slugify.
Key Changes:
- Added search result sorting by relevance, time, and views with visual feedback
- Implemented keyword highlighting in search results with XSS protection attempts
- Added search history tracking using localStorage with auto-suggestions and popular searches
- Replaced jieba's built-in ChineseAnalyzer with custom ChineseTokenizer implementation
- Changed database from MySQL to SQLite and reconfigured Haystack to use Whoosh by default
Reviewed changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
| whoosh_index/_MAIN_0.toc | Generated Whoosh index file - binary data for the search index schema |
| templates/search/search.html | Added sorting controls UI and CSS styles for highlighting search keywords |
| templates/blog/tags/sidebar.html | Added search history, suggestions dropdown with JavaScript, active state styling for categories/tags, and extensive inline CSS |
| djangoblog/whoosh_cn_backend.py | Replaced jieba.analyse.ChineseAnalyzer import with custom ChineseTokenizer class implementation |
| djangoblog/settings.py | Switched database from MySQL to SQLite, added Whoosh configuration block, commented out mdeditor app |
| blog/views.py | Enhanced EsSearchView with sorting logic and keyword highlighting in titles/excerpts |
| blog/models.py | Changed import from uuslug.slugify to slugify.slugify (python-slugify library) |
Comments suppressed due to low confidence (2)
djangoblog/settings.py:157
- This assignment to 'HAYSTACK_CONNECTIONS' is unnecessary as it is redefined before this value is used.
This assignment to 'HAYSTACK_CONNECTIONS' is unnecessary as it is redefined before this value is used.
HAYSTACK_CONNECTIONS = {
djangoblog/settings.py:164
- This assignment to 'HAYSTACK_SIGNAL_PROCESSOR' is unnecessary as it is redefined before this value is used.
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # 替换HTML特殊字符以避免XSS攻击 | ||
| query = query.replace('&', '&').replace('<', '<').replace('>', '>') |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The XSS protection is applied to the query before escaping it in the regex, but the escaped query is then used in regex substitution which injects HTML directly into article titles and excerpts. If the original query contains double quotes or single quotes, they are not escaped, which could still lead to attribute injection vulnerabilities when the highlighted content is rendered in HTML attributes. Consider using a proper HTML escaping library or escaping quotes as well.
| # 高亮标题中的关键词 | ||
| if article.title: | ||
| article.title = regex.sub(r'<em class="highlight">\1</em>', article.title) | ||
|
|
||
| # 高亮摘要中的关键词 | ||
| if hasattr(article, 'excerpt') and article.excerpt: | ||
| article.excerpt = regex.sub(r'<em class="highlight">\1</em>', article.excerpt) | ||
|
|
||
| # 如果没有摘要,从正文中提取部分内容并高亮 | ||
| elif article.body: | ||
| # 提取前200个字符作为摘要 | ||
| excerpt = article.body[:200] | ||
| if len(article.body) > 200: | ||
| excerpt += '...' | ||
|
|
||
| # 高亮摘要中的关键词 | ||
| article.excerpt = regex.sub(r'<em class="highlight">\1</em>', excerpt) |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying the article.title and article.excerpt attributes directly in the search view mutates the model instances with HTML markup. This is problematic because: 1) It permanently modifies the model objects in memory, potentially affecting other views or code that accesses these same objects, 2) The highlighted HTML will be double-escaped if the template already applies escaping (which Django does by default), and 3) It violates separation of concerns by mixing presentation logic with data retrieval. Consider storing the highlighted versions as separate variables in the context or result objects instead of mutating the model attributes.
| setTimeout(() => { | ||
| suggestionsDiv.style.display = 'none'; | ||
| }, 200); | ||
| } | ||
|
|
||
| // 搜索框事件 | ||
| searchInput.addEventListener('focus', showSuggestions); | ||
| searchInput.addEventListener('input', showSuggestions); |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a 200ms setTimeout to hide suggestions is a workaround that can lead to race conditions. A better approach would be to check if the newly focused element is within the suggestions div using relatedTarget in the blur event, or use proper focus management.
| setTimeout(() => { | |
| suggestionsDiv.style.display = 'none'; | |
| }, 200); | |
| } | |
| // 搜索框事件 | |
| searchInput.addEventListener('focus', showSuggestions); | |
| searchInput.addEventListener('input', showSuggestions); | |
| suggestionsDiv.style.display = 'none'; | |
| } | |
| // 搜索框事件 | |
| searchInput.addEventListener('focus', showSuggestions); | |
| searchInput.addEventListener('input', showSuggestions); | |
| searchInput.addEventListener('blur', function(e) { | |
| // If focus moves to an element inside suggestionsDiv, do not hide | |
| if (!suggestionsDiv.contains(e.relatedTarget)) { | |
| hideSuggestions(); | |
| } | |
| }, true); |
| # 关键词高亮处理 | ||
| query = self.query | ||
| if query: | ||
| # 替换HTML特殊字符以避免XSS攻击 | ||
| query = query.replace('&', '&').replace('<', '<').replace('>', '>') | ||
|
|
||
| # 创建正则表达式,不区分大小写 | ||
| import re | ||
| regex = re.compile(r'(' + re.escape(query) + r')', re.IGNORECASE) | ||
|
|
||
| # 对搜索结果中的标题和摘要进行关键词高亮 | ||
| for result in page.object_list: | ||
| article = result.object | ||
|
|
||
| # 高亮标题中的关键词 | ||
| if article.title: | ||
| article.title = regex.sub(r'<em class="highlight">\1</em>', article.title) | ||
|
|
||
| # 高亮摘要中的关键词 | ||
| if hasattr(article, 'excerpt') and article.excerpt: | ||
| article.excerpt = regex.sub(r'<em class="highlight">\1</em>', article.excerpt) | ||
|
|
||
| # 如果没有摘要,从正文中提取部分内容并高亮 | ||
| elif article.body: | ||
| # 提取前200个字符作为摘要 | ||
| excerpt = article.body[:200] | ||
| if len(article.body) > 200: | ||
| excerpt += '...' | ||
|
|
||
| # 高亮摘要中的关键词 | ||
| article.excerpt = regex.sub(r'<em class="highlight">\1</em>', excerpt) |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The keyword highlighting logic in get_context lacks test coverage. Tests should verify that: 1) keywords are correctly highlighted in titles and excerpts, 2) HTML entities are properly escaped to prevent XSS, 3) case-insensitive matching works correctly, and 4) the excerpt truncation logic handles various body lengths appropriately.
| <div> | ||
| <label class="screen-reader-text" for="s">{% trans 'search' %}:</label> | ||
| <input type="text" value="" name="q" id="q"/> | ||
| <input type="text" value="" name="q" id="search-input" placeholder="搜索文章..."/> |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input field's ID attribute has changed from 'q' to 'search-input', which may break other JavaScript code or tests that rely on the original 'q' ID to interact with this search input field.
| # 使用jieba分词 | ||
| words = jieba.cut(text) | ||
| for word in words: | ||
| token = Token() | ||
| token.text = word |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ChineseTokenizer implementation doesn't set important Token attributes like 'pos' (position in the stream), 'startchar', 'endchar', which are used by Whoosh for features like highlighting and phrase queries. Without these attributes, search highlighting and position-dependent features may not work correctly.
| # 使用jieba分词 | |
| words = jieba.cut(text) | |
| for word in words: | |
| token = Token() | |
| token.text = word | |
| # 使用jieba.tokenize获取分词及其在原文中的位置 | |
| pos = 0 | |
| for word, start, end in jieba.tokenize(text): | |
| token = Token() | |
| token.text = word | |
| token.pos = pos | |
| token.startchar = start | |
| token.endchar = end | |
| pos += 1 |
| } else { | ||
| history.forEach(query => { | ||
| const li = document.createElement('li'); | ||
| li.innerHTML = `<a href="/search/?q=${encodeURIComponent(query)}">${query}</a>`; |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The search history rendering uses encodeURIComponent in JavaScript but doesn't sanitize the query text itself before inserting it into the innerHTML. If a user enters malicious content containing HTML/JavaScript, it will be stored in localStorage and could execute when rendered. Consider using textContent instead of innerHTML or sanitizing the query display text.
| li.innerHTML = `<a href="/search/?q=${encodeURIComponent(query)}">${query}</a>`; | |
| const a = document.createElement('a'); | |
| a.href = `/search/?q=${encodeURIComponent(query)}`; | |
| a.textContent = query; | |
| li.appendChild(a); |
| 'ENGINE': 'django.db.backends.sqlite3', | ||
| 'NAME': BASE_DIR / 'db.sqlite3', |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing from MySQL to SQLite is a significant downgrade for production environments. SQLite lacks concurrent write support, full-text search capabilities, and doesn't scale well. This change should only be for development/testing purposes and should not be committed to the main branch without proper configuration management (e.g., using environment variables to switch between databases).
| 'ENGINE': 'django.db.backends.sqlite3', | |
| 'NAME': BASE_DIR / 'db.sqlite3', | |
| 'ENGINE': os.environ.get('DJANGO_DB_ENGINE', 'django.db.backends.sqlite3'), | |
| 'NAME': os.environ.get('DJANGO_DB_NAME', BASE_DIR / 'db.sqlite3'), | |
| 'USER': os.environ.get('DJANGO_DB_USER', ''), | |
| 'PASSWORD': os.environ.get('DJANGO_DB_PASSWORD', ''), | |
| 'HOST': os.environ.get('DJANGO_DB_HOST', ''), | |
| 'PORT': os.environ.get('DJANGO_DB_PORT', ''), |
| } | ||
| else: | ||
| # 默认使用Whoosh作为搜索引擎 | ||
| import os |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The os module is imported again locally within the else block, but it's already imported at the top of the file (line 12). This redundant import should be removed.
| import os |
| 'django.contrib.sites', | ||
| 'django.contrib.sitemaps', | ||
| 'mdeditor', | ||
| # 'mdeditor', |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mdeditor app is commented out in INSTALLED_APPS, but the model field MDTextField is still imported and used in blog/models.py (line 11). This will cause an ImportError when the app tries to start. Either keep mdeditor enabled or replace MDTextField with an alternative field type.
| # 'mdeditor', | |
| 'mdeditor', |
No description provided.