Commit e46c805

Wang-Daoji, yuan.wang, and fridayL authored
Feat/fix playground bug (#680)
* fix playground bug, internet search judge
* fix playground internet bug
* modify delete mem
* modify tool resp bug in multi cube
* fix bug in playground chat handle and search inter
* modify prompt
* fix bug in playground
* fix bug playground
* fix bug
* fix code
* fix model bug in playground
* modify plan b
* llm param modify
* add logger in playground
* modify code
* fix bug
* modify code
* modify code
* fix bug
* fix search bug in playground
* fix bug
* move scheduler to back
* modify pref location
* modify fast net search
* add tags and new package
* modify prompt fix bug
---------
Co-authored-by: yuan.wang <yuan.wang@yuanwangdebijibendiannao.local>
Co-authored-by: chunyu li <78344051+fridayL@users.noreply.github.com>
1 parent 3edd095 commit e46c805


8 files changed: +82 -19 lines changed

docker/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -160,3 +160,5 @@ xlrd==2.0.2
 xlsxwriter==3.2.5
 prometheus-client==0.23.1
 pymilvus==2.5.12
+nltk==3.9.1
+rake-nltk==1.0.6

poetry.lock

Lines changed: 20 additions & 4 deletions
Some generated files are not rendered by default.

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -121,6 +121,8 @@ all = [
     "sentence-transformers (>=4.1.0,<5.0.0)",
     "qdrant-client (>=1.14.2,<2.0.0)",
     "volcengine-python-sdk (>=4.0.4,<5.0.0)",
+    "nltk (>=3.9.1,<4.0.0)",
+    "rake-nltk (>=1.0.6,<1.1.0)",

     # Uncategorized dependencies
 ]

src/memos/api/handlers/chat_handler.py

Lines changed: 12 additions & 12 deletions
@@ -395,16 +395,6 @@ def generate_chat_response() -> Generator[str, None, None]:
     [chat_req.mem_cube_id] if chat_req.mem_cube_id else [chat_req.user_id]
 )

-# for playground, add the query to memory without response
-self._start_add_to_memory(
-    user_id=chat_req.user_id,
-    writable_cube_ids=writable_cube_ids,
-    session_id=chat_req.session_id or "default_session",
-    query=chat_req.query,
-    full_response=None,
-    async_mode="sync",
-)
-
 # ====== first search text mem with parse goal ======
 search_req = APISearchPlaygroundRequest(
     query=chat_req.query,
@@ -450,7 +440,7 @@ def generate_chat_response() -> Generator[str, None, None]:
 pref_list = search_response.data.get("pref_mem") or []
 pref_memories = pref_list[0].get("memories", []) if pref_list else []
 pref_md_string = self._build_pref_md_string_for_playground(pref_memories)
-yield f"data: {json.dumps({'type': 'pref_md_string', 'data': pref_md_string})}\n\n"
+yield f"data: {json.dumps({'type': 'pref_md_string', 'data': pref_md_string}, ensure_ascii=False)}\n\n"

 # Use first readable cube ID for scheduler (backward compatibility)
 scheduler_cube_id = (
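
The `ensure_ascii=False` change in the hunk above affects how non-ASCII text (for example Chinese preference strings) is serialized into the streamed event. A minimal sketch of the difference follows; `sse_event` is a hypothetical helper written for illustration, since the handler itself builds the frame inline.

```python
import json


def sse_event(event_type: str, data: str, ensure_ascii: bool = True) -> str:
    """Format one Server-Sent Events frame carrying a JSON payload.

    Hypothetical helper for illustration; not part of the codebase.
    """
    payload = json.dumps({"type": event_type, "data": data}, ensure_ascii=ensure_ascii)
    return f"data: {payload}\n\n"


escaped = sse_event("pref_md_string", "喜欢咖啡")
readable = sse_event("pref_md_string", "喜欢咖啡", ensure_ascii=False)

print(escaped)   # non-ASCII characters are written as \uXXXX escapes
print(readable)  # non-ASCII characters are written as-is
```

Both frames decode to the same JSON object, so the change mainly matters for clients or logs that display the raw frame.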
@@ -531,6 +521,16 @@ def generate_chat_response() -> Generator[str, None, None]:
 )
 yield f"data: {json.dumps({'type': 'reference', 'data': reference})}\n\n"

+# for playground, add the query to memory without response
+self._start_add_to_memory(
+    user_id=chat_req.user_id,
+    writable_cube_ids=writable_cube_ids,
+    session_id=chat_req.session_id or "default_session",
+    query=chat_req.query,
+    full_response=None,
+    async_mode="sync",
+)
+
 # Step 2: Build system prompt with memories
 system_prompt = self._build_enhance_system_prompt(
     filtered_memories, pref_string
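
This hunk re-adds the `_start_add_to_memory` call that an earlier hunk removed, so the query is now written to memory after the search and the `reference` event rather than before the search. The commit does not state the motivation. One plausible effect, shown here as an assumption only, is that the search can no longer return the query it was just asked to find. `ToyMemory` is a hypothetical in-memory stand-in for the real memory cube.

```python
class ToyMemory:
    """Hypothetical in-memory store; stands in for the real memory cube."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def add(self, text: str) -> None:
        self.items.append(text)

    def search(self, query: str) -> list[str]:
        # Naive retrieval: return stored items sharing at least one word with the query.
        words = set(query.lower().split())
        return [item for item in self.items if words & set(item.lower().split())]


def add_then_search(mem: ToyMemory, query: str) -> list[str]:
    """Old ordering: write the query first, then search."""
    mem.add(query)
    return mem.search(query)


def search_then_add(mem: ToyMemory, query: str) -> list[str]:
    """New ordering: search first, then write the query."""
    hits = mem.search(query)
    mem.add(query)
    return hits


query = "recommend a quiet cafe"
old_order_hits = add_then_search(ToyMemory(), query)
new_order_hits = search_then_add(ToyMemory(), query)
```

With an empty store, the old ordering retrieves the query itself while the new ordering retrieves nothing.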
@@ -794,7 +794,7 @@ def _build_enhance_system_prompt(
     sys_body
     + "\n\n# Memories\n## PersonalMemory (ordered)\n"
     + mem_block_p
-    + "\n## OuterMemory (ordered)\n"
+    + "\n## OuterMemory (from Internet Search, ordered)\n"
     + mem_block_o
     + f"\n\n{pref_string}"
 )

src/memos/memories/textual/tree_text_memory/retrieve/bochasearch.py

Lines changed: 41 additions & 0 deletions
@@ -9,9 +9,11 @@
 import requests

 from memos.context.context import ContextThreadPoolExecutor
+from memos.dependency import require_python_package
 from memos.embedders.factory import OllamaEmbedder
 from memos.log import get_logger
 from memos.mem_reader.base import BaseMemReader
+from memos.mem_reader.read_multi_modal import detect_lang
 from memos.memories.textual.item import (
     SearchedTreeNodeTextualMemoryMetadata,
     SourceMessage,
@@ -121,6 +123,21 @@ def _post(self, url: str, body: dict) -> list[dict]:
 class BochaAISearchRetriever:
     """BochaAI retriever that converts search results into TextualMemoryItem objects"""

+    @require_python_package(
+        import_name="rake_nltk",
+        install_command="pip install rake_nltk",
+        install_link="https://pypi.org/project/rake-nltk/",
+    )
+    @require_python_package(
+        import_name="nltk",
+        install_command="pip install nltk",
+        install_link="https://www.nltk.org/install.html",
+    )
+    @require_python_package(
+        import_name="jieba",
+        install_command="pip install jieba",
+        install_link="https://github.com/fxsjy/jieba",
+    )
     def __init__(
         self,
         access_key: str,
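
The stacked `require_python_package` decorators gate `__init__` on three optional packages. The implementation in `memos.dependency` is not shown in this diff, so the following is only a sketch of the general pattern under that caveat: `require_package` is a hypothetical stand-in that checks for the module with `importlib` when the decorated function is called.

```python
import functools
import importlib.util


def require_package(import_name: str, install_command: str):
    """Hypothetical stand-in for a dependency guard; not the memos implementation."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Check at call time so that importing the module stays cheap.
            if importlib.util.find_spec(import_name) is None:
                raise ImportError(
                    f"Missing required package '{import_name}'. "
                    f"Install it with: {install_command}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


class Retriever:
    @require_package("json", "(part of the standard library)")
    def __init__(self) -> None:
        self.ready = True


class BrokenRetriever:
    @require_package("no_such_package_xyz", "pip install no_such_package_xyz")
    def __init__(self) -> None:
        self.ready = True
```

Constructing `Retriever()` succeeds, while `BrokenRetriever()` raises an `ImportError` that carries the install hint.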
@@ -137,9 +154,25 @@ def __init__(
             reader: MemReader instance for processing internet content
             max_results: Maximum number of search results to retrieve
         """
+        import nltk
+
+        try:
+            nltk.download("averaged_perceptron_tagger_eng")
+        except Exception as err:
+            raise Exception("Failed to download nltk averaged_perceptron_tagger_eng") from err
+        try:
+            nltk.download("stopwords")
+        except Exception as err:
+            raise Exception("Failed to download nltk stopwords") from err
+
+        from jieba.analyse import TextRank
+        from rake_nltk import Rake
+
         self.bocha_api = BochaAISearchAPI(access_key, max_results=max_results)
         self.embedder = embedder
         self.reader = reader
+        self.en_fast_keywords_extractor = Rake()
+        self.zh_fast_keywords_extractor = TextRank()

     def retrieve_from_internet(
         self, query: str, top_k: int = 10, parsed_goal=None, info=None, mode="fast"
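
The constructor above downloads two NLTK resources and wraps each call in `try`/`except`. As far as the NLTK downloader documentation describes, `nltk.download` by default signals failure through its boolean return value rather than by raising, so it may be worth checking the result as well; treat that as an assumption to verify against the installed version. The sketch below shows the return-value check with an injected downloader so it stays self-contained. `ensure_resources` and `fake_download` are hypothetical names.

```python
from typing import Callable, Iterable


def ensure_resources(names: Iterable[str], download: Callable[[str], bool]) -> None:
    """Fetch each resource and fail loudly if any download reports failure.

    `download` is injected; in real code it could wrap
    `nltk.download(name, quiet=True)` (an assumption, not what the commit does).
    """
    failed = [name for name in names if not download(name)]
    if failed:
        raise RuntimeError(f"Failed to download NLTK resources: {', '.join(failed)}")


# Fake downloader for illustration: pretend only "stopwords" is reachable.
def fake_download(name: str) -> bool:
    return name == "stopwords"


ensure_resources(["stopwords"], fake_download)  # succeeds silently
```

Calling it with `["averaged_perceptron_tagger_eng", "stopwords"]` and the same fake downloader raises a `RuntimeError` naming only the resource that failed.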
@@ -224,6 +257,13 @@ def _process_result(
         info_ = info.copy()
         user_id = info_.pop("user_id", "")
         session_id = info_.pop("session_id", "")
+        lang = detect_lang(summary)
+        tags = (
+            self.zh_fast_keywords_extractor.textrank(summary)[:3]
+            if lang == "zh"
+            else self.en_fast_keywords_extractor.extract_keywords_from_text(summary)[:3]
+        )
+
         return [
             TextualMemoryItem(
                 memory=(
@@ -244,6 +284,7 @@ def _process_result(
                     background="",
                     confidence=0.99,
                     usage=[],
+                    tags=tags,
                     embedding=self.embedder.embed([content])[0],
                     internet_info={
                         "title": title,

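The `_process_result` change in `bochasearch.py` picks a keyword extractor by detected language and keeps the first three results as tags. One point to verify: the rake-nltk documentation describes `extract_keywords_from_text` as populating internal state, with the phrases read back through `get_ranked_phrases()`. If that holds for 1.0.6, slicing its return value would fail for non-Chinese summaries. The sketch below shows the dispatch with a guard for extractors that return nothing; `detect_lang_simple`, `extract_tags`, and the toy extractors are hypothetical stand-ins, not the project's code.

```python
import re
from typing import Callable

# Hypothetical stand-in for detect_lang: text containing CJK ideographs counts as Chinese.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def detect_lang_simple(text: str) -> str:
    return "zh" if _CJK.search(text) else "en"


def extract_tags(
    summary: str,
    zh_extractor: Callable[[str], list[str]],
    en_extractor: Callable[[str], list[str]],
    limit: int = 3,
) -> list[str]:
    """Route the summary to a language-specific extractor and keep the top `limit` tags."""
    extractor = zh_extractor if detect_lang_simple(summary) == "zh" else en_extractor
    # Guard against extractors that return None instead of a list.
    return list(extractor(summary) or [])[:limit]


# Toy extractors for illustration only; real code would wrap TextRank and Rake.
def toy_en(text: str) -> list[str]:
    return [word for word in text.lower().split() if len(word) > 4]


def toy_zh(text: str) -> list[str]:
    return _CJK.findall(text)


en_tags = extract_tags("Memory systems improve retrieval quality over time", toy_zh, toy_en)
zh_tags = extract_tags("记忆系统", toy_zh, toy_en)
```

For the English extractor, a wrapper along the lines of calling `extract_keywords_from_text(summary)` and then returning `get_ranked_phrases()` would fit the `en_extractor` slot, assuming the documented rake-nltk behavior.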
src/memos/memories/textual/tree_text_memory/retrieve/utils.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 1. Keys: the high-level keywords directly relevant to the user’s task.
 2. Tags: thematic tags to help categorize and retrieve related memories.
 3. Goal Type: retrieval | qa | generation
-4. Rephrased instruction: Give a rephrased task instruction based on the former conversation to make it less confusing to look alone. Make full use of information related to the query. If you think the task instruction is easy enough to understand, or there is no former conversation, set "rephrased_instruction" to an empty string.
+4. Rephrased instruction: Give a rephrased task instruction based on the former conversation to make it less confusing to look alone. Make full use of information related to the query, including user's personal information. If you think the task instruction is easy enough to understand, or there is no former conversation, set "rephrased_instruction" to an empty string.
 5. Need for internet search: If the user's task instruction only involves objective facts or can be completed without introducing external knowledge, set "internet_search" to False. Otherwise, set it to True.
 6. Memories: Provide 2–5 short semantic expansions or rephrasings of the rephrased/original user task instruction. These are used for improved embedding search coverage. Each should be clear, concise, and meaningful for retrieval.
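
The prompt in `utils.py` asks the model for six fields. A minimal sketch of a container for that output follows. Only the key names `rephrased_instruction` and `internet_search` appear in the prompt text shown here; `keys`, `tags`, `goal_type`, and `memories` are assumed names, and `ParsedGoal` and `parse_goal` are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ParsedGoal:
    """Illustrative container for the parser output; field names partly assumed."""

    keys: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    goal_type: str = "retrieval"  # retrieval | qa | generation
    rephrased_instruction: str = ""
    internet_search: bool = False
    memories: list[str] = field(default_factory=list)


def parse_goal(raw: str) -> ParsedGoal:
    """Parse the model's JSON reply, ignoring unknown keys and defaulting missing ones."""
    data = json.loads(raw)
    known = ParsedGoal.__dataclass_fields__
    return ParsedGoal(**{k: v for k, v in data.items() if k in known})


reply = '{"keys": ["weather"], "goal_type": "qa", "internet_search": true, "extra": 1}'
goal = parse_goal(reply)
```

Defaulting `rephrased_instruction` to an empty string mirrors the prompt's instruction for the case where no rephrasing is needed.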

src/memos/memories/textual/tree_text_memory/retrieve/xinyusearch.py

Lines changed: 1 addition & 0 deletions
@@ -347,6 +347,7 @@ def _process_result(
     source="web",
     sources=[SourceMessage(type="web", url=url)] if url else [],
     visibility="public",
+    tags=self._extract_tags(title, content, summary),
     info=info_,
     background="",
     confidence=0.99,

src/memos/templates/mos_prompts.py

Lines changed: 3 additions & 2 deletions
@@ -65,7 +65,6 @@
 MEMOS_PRODUCT_BASE_PROMPT = """
 # System
 - Role: You are MemOS🧚, nickname Little M(小忆🧚) — an advanced Memory Operating System assistant by 记忆张量(MemTensor Technology Co., Ltd.), a Shanghai-based AI research company advised by an academician of the Chinese Academy of Sciences.
-- Date: {date}

 - Mission & Values: Uphold MemTensor’s vision of "low cost, low hallucination, high generalization, exploring AI development paths aligned with China’s national context and driving the adoption of trustworthy AI technologies. MemOS’s mission is to give large language models (LLMs) and autonomous agents **human-like long-term memory**, turning memory from a black-box inside model weights into a **manageable, schedulable, and auditable** core resource.

@@ -105,12 +104,14 @@
 - When using facts from memories, add citations at the END of the sentence with `[i:memId]`.
 - `i` is the order in the "Memories" section below (starting at 1). `memId` is the given short memory ID.
 - Multiple citations must be concatenated directly, e.g., `[1:sed23s], [
-2:1k3sdg], [3:ghi789]`. Do NOT use commas inside brackets.
+2:1k3sdg], [3:ghi789]`. Do NOT use commas inside brackets. Do not use wrong format like `[def456]`.
 - Cite only relevant memories; keep citations minimal but sufficient.
 - Do not use a connected format like [1:abc123,2:def456].
 - Brackets MUST be English half-width square brackets `[]`, NEVER use Chinese full-width brackets `【】` or any other symbols.
 - **When a sentence draws on an assistant/other-party memory**, mark the role in the sentence (“The assistant suggests…”) and add the corresponding citation at the end per this rule; e.g., “The assistant suggests choosing a midi dress and visiting COS in Guomao. [1:abc123]”

+# Current Date: {date}
+
 # Style
 - Tone: {tone}; Verbosity: {verbosity}.
 - Be direct, well-structured, and conversational. Avoid fluff. Use short lists when helpful.
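
The citation rules in this prompt (`[i:memId]`, no bare IDs such as `[def456]`, no comma-joined form, half-width brackets only) can be checked mechanically on model output. The following is a sketch under the assumption that memory IDs are alphanumeric, as in the prompt's examples. `find_citations` and `malformed_citations` are hypothetical helpers, and the malformed check flags any bracketed span that is not a valid citation, including ordinary bracketed text.

```python
import re

# Well-formed citation: half-width brackets, 1-based index, colon, short alphanumeric ID.
# Assumption: memory IDs are alphanumeric, as in the prompt's examples.
CITATION = re.compile(r"\[([1-9]\d*):([A-Za-z0-9]+)\]")
# Any span in half-width or full-width brackets, used to spot malformed citations.
BRACKETED = re.compile(r"\[[^\[\]]*\]|【[^【】]*】")


def find_citations(text: str) -> list[tuple[int, str]]:
    """Return (index, memory_id) pairs for every well-formed citation."""
    return [(int(index), mem_id) for index, mem_id in CITATION.findall(text)]


def malformed_citations(text: str) -> list[str]:
    """Return bracketed spans that do not match the required format."""
    return [span for span in BRACKETED.findall(text) if not CITATION.fullmatch(span)]


good = "The assistant suggests a midi dress. [1:abc123][2:def456]"
bad = "Likes coffee [def456] and tea [1:abc123,2:def456] 【3:ghi789】"
```

A post-processing step could use `malformed_citations` to decide whether a reply needs repair before references are rendered.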
