2 changes: 1 addition & 1 deletion blog/models.py
@@ -9,7 +9,7 @@
 from django.utils.timezone import now
 from django.utils.translation import gettext_lazy as _
 from mdeditor.fields import MDTextField
-from uuslug import slugify
+from slugify import slugify
Copilot AI Dec 15, 2025

Changing the import from 'uuslug' (django-uuslug) to 'slugify' (python-slugify) changes the slug generation behavior. The uuslug library is specifically designed for Django and handles slug uniqueness automatically, while python-slugify does not. This could lead to duplicate slug violations or inconsistent URL generation. If this change is intentional, the slug uniqueness logic needs to be implemented manually.

Copilot uses AI. Check for mistakes.
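
If uniqueness does need to be handled by hand after this import change, one possible shape is a small helper that appends a numeric suffix until the slug is free. This is only a sketch: `unique_slug` and the `exists` callback are hypothetical names that do not appear in the PR, and in the real model `exists` would presumably wrap a queryset check against the slug field.

```python
from itertools import count


def unique_slug(base_slug, exists):
    """Return base_slug, or base_slug-N for the first N >= 2 that is free.

    `exists` is a hypothetical callback: it takes a candidate slug and
    returns True when that slug is already taken.
    """
    if not exists(base_slug):
        return base_slug
    for n in count(2):
        candidate = "%s-%d" % (base_slug, n)
        if not exists(candidate):
            return candidate


# Usage with an in-memory set standing in for the database lookup.
taken = {"hello-world", "hello-world-2"}
print(unique_slug("hello-world", taken.__contains__))
print(unique_slug("new-post", taken.__contains__))
```

A check-then-insert helper like this can still race under concurrent writes, so a unique constraint on the column remains the real guarantee.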

from djangoblog.utils import cache_decorator, cache
from djangoblog.utils import get_current_site
54 changes: 54 additions & 0 deletions blog/views.py
@@ -285,14 +285,68 @@ def get_queryset(self):


 class EsSearchView(SearchView):
+    def get_queryset(self):
+        queryset = super(EsSearchView, self).get_queryset()
+
+        # Read the sort parameter
+        sort_by = self.request.GET.get('sort', 'relevance')
+
+        # Order the results according to the sort parameter
+        if sort_by == 'time':
+            # Sort by time (newest first)
+            queryset = queryset.order_by('-pub_date')
+        elif sort_by == 'views':
+            # Sort by view count (most viewed first)
+            queryset = queryset.order_by('-views')
+        # Default: sort by relevance
+
+        return queryset
Comment on lines +288 to +303
Copilot AI Dec 15, 2025

The new sorting functionality (by time and views) in get_queryset lacks test coverage. Given that the codebase has comprehensive tests for search functionality (see blog/tests.py), tests should be added to verify that sorting by time, views, and relevance works correctly and that the sort parameter is properly validated and applied.
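
The "properly validated" part can be exercised without a search backend if the mapping from the `sort` parameter to an ordering lives in a plain function. A minimal sketch, assuming only the two orderings shown in the diff; `SORT_ORDERINGS` and `resolve_ordering` are hypothetical names, not part of the PR.

```python
# Allowed values of ?sort= and the ordering each one maps to.
# 'relevance' maps to None, meaning: keep the backend's own ranking.
SORT_ORDERINGS = {
    "relevance": None,
    "time": "-pub_date",
    "views": "-views",
}


def resolve_ordering(sort_param):
    """Return (normalized_sort, ordering); unknown values fall back to relevance."""
    if sort_param not in SORT_ORDERINGS:
        sort_param = "relevance"
    return sort_param, SORT_ORDERINGS[sort_param]


# Usage: the view would call order_by(ordering) only when ordering is not None.
for raw in ("time", "views", "relevance", "bogus", None):
    print(raw, resolve_ordering(raw))
```

Returning the normalized value as well means the template receives a known `sort_by` even when the request carried an unexpected one.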

     def get_context(self):
         paginator, page = self.build_page()
+
+        # Read the current sort parameter
+        sort_by = self.request.GET.get('sort', 'relevance')
Comment on lines +292 to +309
Copilot AI Dec 15, 2025

The sort_by parameter is retrieved twice in this class - once in get_queryset (line 292) and again in get_context (line 309). Consider storing it as an instance variable or creating a helper method to avoid duplication.
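
One way to remove the duplication is a single accessor that both methods call. The sketch below assumes the view exposes `self.request` as in the diff; `SortParamMixin` and `get_sort_by` are hypothetical names. It deliberately reads the request on each call rather than caching on the instance, in case a view instance is reused across requests.

```python
class SortParamMixin:
    """Single place to read ?sort= from the current request (hypothetical mixin)."""

    default_sort = "relevance"

    def get_sort_by(self):
        # Always read from the current request so nothing stale is kept
        # on the instance between requests.
        return self.request.GET.get("sort", self.default_sort)


# Usage with minimal stand-ins for the request and the view.
class FakeRequest:
    def __init__(self, params):
        self.GET = params


class DemoView(SortParamMixin):
    def __init__(self, request):
        self.request = request


print(DemoView(FakeRequest({"sort": "time"})).get_sort_by())
print(DemoView(FakeRequest({})).get_sort_by())
```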

+        # Keyword highlighting
+        query = self.query
+        if query:
+            # Replace HTML special characters to guard against XSS
+            query = query.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
Comment on lines +314 to +315
Copilot AI Dec 15, 2025

The XSS protection is applied to the query before escaping it in the regex, but the escaped query is then used in regex substitution which injects HTML directly into article titles and excerpts. If the original query contains double quotes or single quotes, they are not escaped, which could still lead to attribute injection vulnerabilities when the highlighted content is rendered in HTML attributes. Consider using a proper HTML escaping library or escaping quotes as well.
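
For reference, the standard library's `html.escape` already covers quotes as well as `&`, `<` and `>` when `quote=True` (its default). A short sketch of the difference from the chained `replace` calls in the diff; `escape_query` is a hypothetical wrapper name.

```python
import html


def escape_query(query):
    """Escape &, <, > and both quote characters for safe use in HTML."""
    return html.escape(query, quote=True)


# A query that tries to break out of an attribute value.
print(escape_query('"><script>alert(1)</script>'))
print(escape_query("it's"))
```

In a Django project, `django.utils.html.escape` fills the same role and returns a string already marked safe for templates.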

+            # Build a case-insensitive regular expression
+            import re
Copilot AI Dec 15, 2025

The re module is imported inline within the method, but it's more efficient and follows Python conventions to import it at the top of the file. This import should be moved to the module-level imports.

+            regex = re.compile(r'(' + re.escape(query) + r')', re.IGNORECASE)
+
+            # Highlight the keyword in each result's title and excerpt
+            for result in page.object_list:
+                article = result.object
+
+                # Highlight the keyword in the title
+                if article.title:
+                    article.title = regex.sub(r'<em class="highlight">\1</em>', article.title)
+
+                # Highlight the keyword in the excerpt
+                if hasattr(article, 'excerpt') and article.excerpt:
+                    article.excerpt = regex.sub(r'<em class="highlight">\1</em>', article.excerpt)
+
+                # No excerpt: take a slice of the body and highlight that
+                elif article.body:
+                    # Use the first 200 characters as the excerpt
+                    excerpt = article.body[:200]
+                    if len(article.body) > 200:
+                        excerpt += '...'
+
+                    # Highlight the keyword in the excerpt
+                    article.excerpt = regex.sub(r'<em class="highlight">\1</em>', excerpt)
Comment on lines +325 to +341
Copilot AI Dec 15, 2025

Modifying the article.title and article.excerpt attributes directly in the search view mutates the model instances with HTML markup. This is problematic because: 1) It permanently modifies the model objects in memory, potentially affecting other views or code that accesses these same objects, 2) The highlighted HTML will be double-escaped if the template already applies escaping (which Django does by default), and 3) It violates separation of concerns by mixing presentation logic with data retrieval. Consider storing the highlighted versions as separate variables in the context or result objects instead of mutating the model attributes.
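
A non-mutating alternative is a pure function that takes the raw text and the raw query, escapes every piece, and returns a new highlighted string. The sketch below is one way to do that under stated assumptions: `highlight` is a hypothetical helper, the `em.highlight` markup is copied from the diff, and where the result is stored (a context entry, an attribute on the search result) is left open.

```python
import html
import re


def highlight(text, query, css_class="highlight"):
    """Return an HTML-escaped copy of text with matches of query wrapped in <em>."""
    if not query:
        return html.escape(text)
    # Match the raw query against the raw text, case-insensitively.
    pattern = re.compile("(" + re.escape(query) + ")", re.IGNORECASE)
    # With one capturing group, split() alternates: non-match, match, non-match...
    parts = pattern.split(text)
    out = []
    for i, part in enumerate(parts):
        escaped = html.escape(part)
        if i % 2 == 1:
            out.append('<em class="%s">%s</em>' % (css_class, escaped))
        else:
            out.append(escaped)
    return "".join(out)


title = "Django <b> tips"
print(highlight(title, "django"))
print(title)  # the original string is untouched
```

Because the output is already escaped, a Django template would have to render it as safe (for example via `mark_safe` in the view); rendering it through default auto-escaping would show the `<em>` tags as text.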

Comment on lines +311 to +341
Copilot AI Dec 15, 2025

The keyword highlighting logic in get_context lacks test coverage. Tests should verify that: 1) keywords are correctly highlighted in titles and excerpts, 2) HTML entities are properly escaped to prevent XSS, 3) case-insensitive matching works correctly, and 4) the excerpt truncation logic handles various body lengths appropriately.
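
Point 4 is easiest to test if the truncation rule is isolated. A sketch that mirrors the logic in the diff (first 200 characters, `...` appended only when the body is longer); `make_excerpt` is a hypothetical name, and the assertions show the boundary cases a test would likely cover.

```python
def make_excerpt(body, limit=200, ellipsis="..."):
    """Return body unchanged if it fits, else its first `limit` chars plus ellipsis."""
    if len(body) <= limit:
        return body
    return body[:limit] + ellipsis


# Boundary cases: shorter than, exactly at, and just over the limit.
assert make_excerpt("short") == "short"
assert make_excerpt("x" * 200) == "x" * 200
assert make_excerpt("x" * 201) == "x" * 200 + "..."
```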

         context = {
             "query": self.query,
             "form": self.form,
             "page": page,
             "paginator": paginator,
             "suggestion": None,
+            "sort_by": sort_by,  # Pass the current sort parameter to the template
         }
         if hasattr(self.results, "query") and self.results.query.backend.include_spelling:
             context["suggestion"] = self.results.query.get_spelling_suggestion()
Binary file added db.sqlite3
Binary file not shown.
30 changes: 19 additions & 11 deletions djangoblog/settings.py
@@ -52,7 +52,7 @@ def env_to_bool(env, default):
     'django.contrib.staticfiles',
     'django.contrib.sites',
     'django.contrib.sitemaps',
-    'mdeditor',
+    # 'mdeditor',
Copilot AI Dec 15, 2025

The mdeditor app is commented out in INSTALLED_APPS, but the model field MDTextField is still imported and used in blog/models.py (line 11). This will cause an ImportError when the app tries to start. Either keep mdeditor enabled or replace MDTextField with an alternative field type.

Suggested change
-    # 'mdeditor',
+    'mdeditor',

'haystack',
'blog',
'accounts',
@@ -108,16 +108,10 @@ def env_to_bool(env, default):

 DATABASES = {
     'default': {
-        'ENGINE': 'django.db.backends.mysql',
-        'NAME': os.environ.get('DJANGO_MYSQL_DATABASE') or 'djangoblog',
-        'USER': os.environ.get('DJANGO_MYSQL_USER') or 'root',
-        'PASSWORD': os.environ.get('DJANGO_MYSQL_PASSWORD') or 'root',
-        'HOST': os.environ.get('DJANGO_MYSQL_HOST') or '127.0.0.1',
-        'PORT': int(
-            os.environ.get('DJANGO_MYSQL_PORT') or 3306),
-        'OPTIONS': {
-            'charset': 'utf8mb4'},
-    }}
+        'ENGINE': 'django.db.backends.sqlite3',
+        'NAME': BASE_DIR / 'db.sqlite3',
Comment on lines +111 to +112
Copilot AI Dec 15, 2025

Changing from MySQL to SQLite is a significant downgrade for production environments. SQLite lacks concurrent write support, full-text search capabilities, and doesn't scale well. This change should only be for development/testing purposes and should not be committed to the main branch without proper configuration management (e.g., using environment variables to switch between databases).

Suggested change
-        'ENGINE': 'django.db.backends.sqlite3',
-        'NAME': BASE_DIR / 'db.sqlite3',
+        'ENGINE': os.environ.get('DJANGO_DB_ENGINE', 'django.db.backends.sqlite3'),
+        'NAME': os.environ.get('DJANGO_DB_NAME', BASE_DIR / 'db.sqlite3'),
+        'USER': os.environ.get('DJANGO_DB_USER', ''),
+        'PASSWORD': os.environ.get('DJANGO_DB_PASSWORD', ''),
+        'HOST': os.environ.get('DJANGO_DB_HOST', ''),
+        'PORT': os.environ.get('DJANGO_DB_PORT', ''),
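
The same idea can be written as a function that builds the `default` entry from a mapping, which makes the switch between SQLite and a server database testable outside Django. This is a sketch under assumptions: the `DJANGO_DB_*` variable names come from the suggestion above, while `build_default_database` and the behaviour of dropping the connection keys for SQLite are illustrative choices of this example.

```python
import os
from pathlib import Path


def build_default_database(env, base_dir):
    """Build the DATABASES['default'] dict from an environment-like mapping."""
    engine = env.get("DJANGO_DB_ENGINE", "django.db.backends.sqlite3")
    if engine.endswith("sqlite3"):
        # SQLite only needs a file path; fall back to db.sqlite3 under base_dir.
        name = env.get("DJANGO_DB_NAME") or Path(base_dir) / "db.sqlite3"
        return {"ENGINE": engine, "NAME": name}
    return {
        "ENGINE": engine,
        "NAME": env.get("DJANGO_DB_NAME", ""),
        "USER": env.get("DJANGO_DB_USER", ""),
        "PASSWORD": env.get("DJANGO_DB_PASSWORD", ""),
        "HOST": env.get("DJANGO_DB_HOST", ""),
        "PORT": env.get("DJANGO_DB_PORT", ""),
    }


# Usage: in settings.py this would be called with os.environ and BASE_DIR.
print(build_default_database({}, "/srv/app"))
print(build_default_database(
    {"DJANGO_DB_ENGINE": "django.db.backends.mysql", "DJANGO_DB_NAME": "djangoblog"},
    "/srv/app",
))
```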

+    }
+}

# Password validation
# https://docs.djangoproject.com/en/1.10/ref/settings/#auth-password-validators
@@ -360,7 +354,9 @@ def env_to_bool(env, default):

DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'

+# Search engine configuration
 if os.environ.get('DJANGO_ELASTICSEARCH_HOST'):
+    # Use Elasticsearch as the search engine
     ELASTICSEARCH_DSL = {
         'default': {
             'hosts': os.environ.get('DJANGO_ELASTICSEARCH_HOST')
@@ -371,6 +367,18 @@ def env_to_bool(env, default):
             'ENGINE': 'djangoblog.elasticsearch_backend.ElasticSearchEngine',
         },
     }
+else:
+    # Use Whoosh as the search engine by default
+    import os
Copilot AI Dec 15, 2025

The os module is imported again locally within the else block, but it's already imported at the top of the file (line 12). This redundant import should be removed.

Suggested change
-    import os

+    HAYSTACK_CONNECTIONS = {
+        'default': {
+            'ENGINE': 'djangoblog.whoosh_cn_backend.WhooshEngine',
+            'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
+        },
+    }
+
+# Update the search index automatically
+HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

# Plugin System
PLUGINS_DIR = BASE_DIR / 'plugins'
18 changes: 16 additions & 2 deletions djangoblog/whoosh_cn_backend.py
@@ -22,7 +22,21 @@
 from haystack.utils import get_identifier, get_model_ct
 from haystack.utils import log as logging
 from haystack.utils.app_loading import haystack_get_model
-from jieba.analyse import ChineseAnalyzer
+import jieba
+from whoosh.analysis import Tokenizer, Token
+
+class ChineseTokenizer(Tokenizer):
+    def __call__(self, text, **kwargs):
+        # Tokenize with jieba
+        words = jieba.cut(text)
+        for word in words:
+            token = Token()
+            token.text = word
Comment on lines +30 to +34
Copilot AI Dec 15, 2025

The ChineseTokenizer implementation doesn't set important Token attributes like 'pos' (position in the stream), 'startchar', 'endchar', which are used by Whoosh for features like highlighting and phrase queries. Without these attributes, search highlighting and position-dependent features may not work correctly.

Suggested change
-        # Tokenize with jieba
-        words = jieba.cut(text)
-        for word in words:
-            token = Token()
-            token.text = word
+        # Use jieba.tokenize to get each word and its position in the source text
+        pos = 0
+        for word, start, end in jieba.tokenize(text):
+            token = Token()
+            token.text = word
+            token.pos = pos
+            token.startchar = start
+            token.endchar = end
+            pos += 1
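
To make the three attributes concrete without depending on jieba or Whoosh, the sketch below computes them for a list of already-segmented words. It assumes the words are consecutive slices of the original text (so joining them reproduces it); `with_offsets` is a hypothetical helper, not a jieba or Whoosh API.

```python
def with_offsets(words):
    """Yield (word, pos, startchar, endchar) for consecutive words.

    pos is the token's index in the stream; startchar and endchar are
    character offsets into the original text (endchar is exclusive).
    """
    start = 0
    for pos, word in enumerate(words):
        end = start + len(word)
        yield word, pos, start, end
        start = end


# Usage with a hand-segmented sentence standing in for tokenizer output.
text = "我爱北京"
words = ["我", "爱", "北京"]
for word, pos, start, end in with_offsets(words):
    # Each (start, end) pair slices the word back out of the text.
    print(pos, word, start, end, text[start:end] == word)
```

The offsets are what lets a highlighter map a matched token back to its span in the stored text, and `pos` is what position-dependent features such as phrase matching rely on.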

+            yield token
+
+# Create the Chinese tokenizer instance
+ChineseAnalyzer = ChineseTokenizer()

from whoosh import index
from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import BOOLEAN, DATETIME, IDLIST, KEYWORD, NGRAM, NGRAMWORDS, NUMERIC, Schema, TEXT
@@ -186,7 +200,7 @@ def build_schema(self, fields):
else:
# schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=StemmingAnalyzer(), field_boost=field_class.boost, sortable=True)
schema_fields[field_class.index_fieldname] = TEXT(
-    stored=True, analyzer=ChineseAnalyzer(), field_boost=field_class.boost, sortable=True)
+    stored=True, analyzer=ChineseAnalyzer, field_boost=field_class.boost, sortable=True)
if field_class.document is True:
content_field_name = field_class.index_fieldname
schema_fields[field_class.index_fieldname].spelling = True