
feat: add integration test case for flow execution#62

Open
weijinglin wants to merge 4 commits into main from flow_test

Conversation

Collaborator

@weijinglin weijinglin commented Dec 9, 2025

Summary by CodeRabbit

Release Notes

  • Refactor

    • The scheduling flow now uses type-safe enum identifiers, improving reliability and maintainability.
  • Tests

    • Added a complete integration test suite covering vector indexing, graph data extraction, schema generation, prompt generation, multiple RAG workflows, example indexing, and text-to-query conversion, strengthening regression coverage.
  • Docs

    • Added a contributing guide (CONTRIBUTING.md) describing how to run integration tests locally and the submission workflow.

@github-actions github-actions Bot added the llm label Dec 9, 2025

github-actions Bot commented Dec 9, 2025

@codecov-ai-reviewer review


coderabbitai Bot commented Dec 9, 2025

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 71ec9720-31cf-477d-b4d9-603b3cfbfc32

📥 Commits

Reviewing files that changed from the base of the PR and between 5bdb06a and 680059b.

📒 Files selected for processing (1)
  • CONTRIBUTING.md
✅ Files skipped from review due to trivial changes (1)
  • CONTRIBUTING.md

Walkthrough

Replaces the flow name passed to the scheduler, previously a string literal, with a FlowName enum member, and adds an end-to-end integration test module that verifies the behavior of several flows (including vector indexing, graph extraction, RAG, and schema generation).

Changes

Cohort / File(s): Summary

Enum parameter replacement
hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py
scheduler.schedule_flow("build_vector_index", texts) becomes scheduler.schedule_flow(FlowName.BUILD_VECTOR_INDEX, texts); the flow name is now passed as a FlowName enum member, which affects the scheduler call signature.

New integration tests
hugegraph-llm/src/tests/integration/test_flows_integration.py
Adds a TestFlowsIntegration test class (initialized via an @pytest.fixture(autouse=True) fixture) with multiple test methods that run flows through SchedulerSingleton and assert the outputs of the BUILD_VECTOR_INDEX, GRAPH_EXTRACT, IMPORT_GRAPH_DATA, UPDATE_VID_EMBEDDINGS, BUILD_SCHEMA, PROMPT_GENERATE, RAG, and TEXT2GREMLIN flows.

Contributing guide (docs)
CONTRIBUTING.md
Adds a contributing guide describing the prerequisites and commands for running end-to-end integration tests locally, the integration tests that must pass, and the pre-submission checklist.
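The enum swap summarized above follows the standard str-mixin Enum pattern. The sketch below is hypothetical (a two-member FlowName and a stub schedule_flow); the real enum lives in hugegraph_llm/flows/__init__.py and defines more members:

```python
from enum import Enum

# Hypothetical sketch of the FlowName pattern; the real enum in
# hugegraph_llm/flows/__init__.py has more members.
class FlowName(str, Enum):
    BUILD_VECTOR_INDEX = "build_vector_index"
    GRAPH_EXTRACT = "graph_extract"

def schedule_flow(flow_name: str, *args):
    # Because FlowName mixes in str, enum members pass through string-typed
    # APIs unchanged, while callers get autocompletion and typo safety.
    return "running " + flow_name

# The enum member compares equal to the old string literal:
assert FlowName.BUILD_VECTOR_INDEX == "build_vector_index"
result = schedule_flow(FlowName.BUILD_VECTOR_INDEX)  # "running build_vector_index"
```

A mistyped enum attribute fails immediately with an AttributeError, whereas a mistyped string literal would only surface at flow-dispatch time.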

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 Enums hop free of strings of old,
Each scheduled step now clear and bold,
The tests run every flow's courtyard round,
Vectors and graphs in one scene bound,
I cheer from the grass with a joyful sound!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title accurately reflects the main goal of this pull request, adding integration test cases for flow execution, which matches the new test_flows_integration.py file and the integration tests it contains.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Tip

CodeRabbit can use OpenGrep to find security vulnerabilities and bugs across 17+ programming languages.

OpenGrep is compatible with Semgrep configurations. Add an opengrep.yml or semgrep.yml configuration file to your project to enable OpenGrep analysis.

@gemini-code-assist

Summary of Changes

Hello @weijinglin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust integration test suite to ensure the stability and correctness of the hugegraph-llm project's core LLM-driven workflows. By validating the interaction between different components, from data processing to complex query generation, it significantly improves the reliability of the system. Additionally, a minor refactoring was performed to standardize flow name usage, contributing to cleaner code.

Highlights

  • New Integration Test Suite: A comprehensive integration test file, test_flows_integration.py, has been added to validate the end-to-end execution of various LLM-related flows within the system.
  • Flow Execution Validation: The new tests cover critical flows such as building knowledge graphs, generating schemas, extracting prompts, various RAG (Retrieval Augmented Generation) scenarios, and text-to-Gremlin conversions, ensuring their correct functionality and interaction.
  • Improved Flow Name Consistency: The build_vector_index utility has been updated to use the FlowName enum for scheduling flows, enhancing type safety and code readability by replacing a hardcoded string with an enum member.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review: /gemini review performs a code review of the current pull request in its current state.
  • Pull Request Summary: /gemini summary provides a summary of the current pull request in its current state.
  • Comment: @gemini-code-assist responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help: /gemini help displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@github-actions github-actions Bot left a comment


license-eye has checked 410 files.

Valid: 339, Invalid: 1, Ignored: 70, Fixed: 0
Click to see the invalid file list
  • hugegraph-llm/src/tests/integration/test_flows_integration.py
Use this command to fix any missing license headers:

```bash
docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
```

Comment thread hugegraph-llm/src/tests/integration/test_flows_integration.py

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds integration tests for various flows, which is a great addition for ensuring system stability. The changes in vector_index_utils.py to use the FlowName enum instead of a hardcoded string are a good improvement for maintainability.

My review of the new test file test_flows_integration.py focuses on improving the test quality and correctness:

  • Several tests use a try...except pytest.fail() pattern which is an anti-pattern in pytest and should be removed to improve debuggability. Some of these also contain copy-pasted, incorrect error messages.
  • The test_rag method has significant code duplication and a bug that causes it to run with incorrect parameters. I've suggested a refactoring using pytest.mark.parametrize to address both issues.

Overall, the core logic of adding integration tests is sound, and the suggested changes will make them more robust and maintainable.
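To make the flagged anti-pattern concrete, here is a minimal, hypothetical sketch; run_flow stands in for the scheduler call and is not code from the PR:

```python
import pytest

def run_flow(ok: bool = True):
    # Stand-in for self.scheduler.schedule_flow(...); illustrative only.
    if not ok:
        raise ValueError("backend unreachable")
    return "result"

# Anti-pattern: wrapping the call in try/except and calling pytest.fail
# replaces the real exception and its traceback with a generic (and here
# easily mislabeled) message, which hurts debuggability.
def test_flow_antipattern():
    try:
        res = run_flow()
    except Exception as e:
        pytest.fail(f"flow failed: {e}")
    assert res is not None

# Preferred: let exceptions propagate; pytest reports the original
# exception type, message, and full traceback on failure.
def test_flow_preferred():
    res = run_flow()
    assert res is not None, "flow result should not be None"
```

When run_flow raises, the preferred version shows the ValueError and the exact failing line, while the anti-pattern version reports only the flattened message string.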

Comment on lines +247 to +415
}
"""

            self.scheduler.schedule_flow(FlowName.BUILD_SCHEMA, [self.index_text], query_examples, few_shot)
        except Exception as e:
            pytest.fail(f"BUILD_VECTOR_INDEX flow failed: {e}")

    def test_graph_extract_prompt(self):
        try:
            scenario = "social relationships"
            example_name = "Official Person-Relationship Extraction"

            res = self.scheduler.schedule_flow(FlowName.PROMPT_GENERATE, self.index_text, scenario, example_name)
            assert res is not None, "The result of PROMPT_GENERATE flow should not be None"
        except Exception as e:
            pytest.fail(f"BUILD_VECTOR_INDEX flow failed: {e}")

    def test_rag(self):
        answer_prompt = """
You are an expert in the fields of knowledge graphs and natural language processing.

Please provide precise and accurate answers based on the following context information, which is sorted in order of importance from high to low, without using any fabricated knowledge.

Given the context information and without using fictive knowledge,
answer the following query in a concise and professional manner.
Please write your answer using Markdown with MathJax syntax, where inline math is wrapped with `$...$`

Context information is below.
---------------------
{context_str}
---------------------
Query: {query_str}
Answer:

"""

        keywords_extract_prompt = """
Instructions:
Please perform the following tasks on the text below:
1. Extract, evaluate, and rank keywords from the text:
- Minimum 0, maximum MAX_KEYWORDS keywords.
- Keywords should be complete semantic words or phrases, ensuring information completeness, without any changes to the English capitalization.
- Assign an importance score to each keyword, as a float between 0.0 and 1.0. A higher score indicates a greater contribution to the core idea of the text.
- Keywords may contain spaces, but must not contain commas or colons.
- The final list of keywords must be sorted in descending order based on their importance score.
2. Identify keywords that need rewriting:
- From the extracted keywords, identify those that are ambiguous or lack information in the original context.
3. Generate synonyms:
- For these keywords that need rewriting, generate synonyms or similar terms in the given context.
- Replace the corresponding keywords in the original text with generated synonyms.
- If no suitable synonym exists for a keyword, keep the original keyword unchanged.

Requirements:
- Keywords should be meaningful and specific entities; avoid meaningless or overly broad terms, or single-character words (e.g., "items", "actions", "effects", "functions", "the", "he").
- Prioritize extracting subjects, verbs, and objects; avoid function words or auxiliary words.
- Maintain semantic integrity: Extracted keywords should preserve their semantic and informational completeness in the original context (e.g., "Apple computer" should be extracted as a whole, not split into "Apple" and "computer").
- Avoid generalization: Do not expand into unrelated generalized categories.

Notes:
- Only consider context-relevant synonyms: Only consider semantic synonyms and words with similar meanings in the given context.
- Adjust keyword length: If keywords are relatively broad, you can appropriately increase individual keyword length based on context (e.g., "illegal behavior" can be extracted as a single keyword, or as "illegal", but should not be split into "illegal" and "behavior").

Output Format:
- Output only one line, prefixed with KEYWORDS:, followed by a comma-separated list of items. Each item should be in the format keyword:importance_score(round to two decimal places). If a keyword has been replaced by a synonym, use the synonym as the keyword in the output.
- Format example:
KEYWORDS:keyword1:score1,keyword2:score2,...,keywordN:scoreN

MAX_KEYWORDS: {max_keywords}
Text:
{question}

"""

        query = "梁漱溟和梁济的关系是什么?"

        raw_answer = True
        vector_only_answer = False
        graph_only_answer = False
        graph_vector_answer = False
        graph_ratio = 0.6
        rerank_method = "bleu"
        near_neighbor_first = False
        custom_related_information = ""

        graph_search, gremlin_prompt, vector_search = update_ui_configs(
            answer_prompt,
            custom_related_information,
            graph_only_answer,
            graph_vector_answer,
            None,
            keywords_extract_prompt,
            query,
            vector_only_answer,
        )

        res = self.scheduler.schedule_flow(
            FlowName.RAG_RAW,
            query=query,
            vector_search=vector_search,
            graph_search=graph_search,
            raw_answer=raw_answer,
            vector_only_answer=vector_only_answer,
            graph_only_answer=graph_only_answer,
            graph_vector_answer=graph_vector_answer,
            graph_ratio=graph_ratio,
            rerank_method=rerank_method,
            near_neighbor_first=near_neighbor_first,
            custom_related_information=custom_related_information,
            answer_prompt=answer_prompt,
            keywords_extract_prompt=keywords_extract_prompt,
            gremlin_tmpl_num=-1,
            gremlin_prompt=gremlin_prompt,
        )
        assert res is not None, "The result of RAG flow should not be None"

        raw_answer = False
        vector_only_answer = True
        graph_only_answer = False
        graph_vector_answer = False
        res = self.scheduler.schedule_flow(
            FlowName.RAG_VECTOR_ONLY,
            query=query,
            vector_search=vector_search,
            graph_search=graph_search,
            raw_answer=raw_answer,
            vector_only_answer=vector_only_answer,
            graph_only_answer=graph_only_answer,
            graph_vector_answer=graph_vector_answer,
            graph_ratio=graph_ratio,
            rerank_method=rerank_method,
            near_neighbor_first=near_neighbor_first,
            custom_related_information=custom_related_information,
            answer_prompt=answer_prompt,
            keywords_extract_prompt=keywords_extract_prompt,
            gremlin_tmpl_num=-1,
            gremlin_prompt=gremlin_prompt,
        )
        assert res is not None, "The result of RAG flow should not be None"

        raw_answer = False
        vector_only_answer = False
        graph_only_answer = True
        graph_vector_answer = False
        res = self.scheduler.schedule_flow(
            FlowName.RAG_GRAPH_ONLY,
            query=query,
            vector_search=vector_search,
            graph_search=graph_search,
            raw_answer=raw_answer,
            vector_only_answer=vector_only_answer,
            graph_only_answer=graph_only_answer,
            graph_vector_answer=graph_vector_answer,
            graph_ratio=graph_ratio,
            rerank_method=rerank_method,
            near_neighbor_first=near_neighbor_first,
            custom_related_information=custom_related_information,
            answer_prompt=answer_prompt,
            keywords_extract_prompt=keywords_extract_prompt,
            gremlin_tmpl_num=-1,
            gremlin_prompt=gremlin_prompt,
        )
        assert res is not None, "The result of RAG flow should not be None"

        raw_answer = False
        vector_only_answer = False
        graph_only_answer = False
        graph_vector_answer = True
        res = self.scheduler.schedule_flow(
            FlowName.RAG_GRAPH_VECTOR,


high

This test can be significantly improved by using pytest.mark.parametrize to reduce code duplication. Currently, it repeats the call to schedule_flow four times with minor variations.

There's also a bug: update_ui_configs is called only once with initial boolean flags, so vector_search and graph_search are always False. This means the RAG flows are not being tested with the correct search modes.

The suggested refactoring uses parameterization to create a cleaner, more maintainable test and fixes the bug by ensuring update_ui_configs is called with the correct parameters for each test case.

    @pytest.mark.parametrize(
        "flow_name, raw_answer, vector_only_answer, graph_only_answer, graph_vector_answer",
        [
            (FlowName.RAG_RAW, True, False, False, False),
            (FlowName.RAG_VECTOR_ONLY, False, True, False, False),
            (FlowName.RAG_GRAPH_ONLY, False, False, True, False),
            (FlowName.RAG_GRAPH_VECTOR, False, False, False, True),
        ],
    )
    def test_rag(
        self,
        flow_name,
        raw_answer,
        vector_only_answer,
        graph_only_answer,
        graph_vector_answer,
    ):
        answer_prompt = """
        You are an expert in the fields of knowledge graphs and natural language processing.

        Please provide precise and accurate answers based on the following context information, which is sorted in order of importance from high to low, without using any fabricated knowledge.

        Given the context information and without using fictive knowledge,
        answer the following query in a concise and professional manner.
        Please write your answer using Markdown with MathJax syntax, where inline math is wrapped with `$...$`

        Context information is below.
        ---------------------
        {context_str}
        ---------------------
        Query: {query_str}
        Answer:

        """

        keywords_extract_prompt = """
        Instructions:
        Please perform the following tasks on the text below:
        1. Extract, evaluate, and rank keywords from the text:
        - Minimum 0, maximum MAX_KEYWORDS keywords.
        - Keywords should be complete semantic words or phrases, ensuring information completeness, without any changes to the English capitalization.
        - Assign an importance score to each keyword, as a float between 0.0 and 1.0. A higher score indicates a greater contribution to the core idea of the text.
        - Keywords may contain spaces, but must not contain commas or colons.
        - The final list of keywords must be sorted in descending order based on their importance score.
        2. Identify keywords that need rewriting:
        - From the extracted keywords, identify those that are ambiguous or lack information in the original context.
        3. Generate synonyms:
        - For these keywords that need rewriting, generate synonyms or similar terms in the given context.
        - Replace the corresponding keywords in the original text with generated synonyms.
        - If no suitable synonym exists for a keyword, keep the original keyword unchanged.

        Requirements:
        - Keywords should be meaningful and specific entities; avoid meaningless or overly broad terms, or single-character words (e.g., "items", "actions", "effects", "functions", "the", "he").
        - Prioritize extracting subjects, verbs, and objects; avoid function words or auxiliary words.
        - Maintain semantic integrity: Extracted keywords should preserve their semantic and informational completeness in the original context (e.g., "Apple computer" should be extracted as a whole, not split into "Apple" and "computer").
        - Avoid generalization: Do not expand into unrelated generalized categories.

        Notes:
        - Only consider context-relevant synonyms: Only consider semantic synonyms and words with similar meanings in the given context.
        - Adjust keyword length: If keywords are relatively broad, you can appropriately increase individual keyword length based on context (e.g., "illegal behavior" can be extracted as a single keyword, or as "illegal", but should not be split into "illegal" and "behavior").

        Output Format:
        - Output only one line, prefixed with KEYWORDS:, followed by a comma-separated list of items. Each item should be in the format keyword:importance_score(round to two decimal places). If a keyword has been replaced by a synonym, use the synonym as the keyword in the output.
        - Format example:
        KEYWORDS:keyword1:score1,keyword2:score2,...,keywordN:scoreN

        MAX_KEYWORDS: {max_keywords}
        Text:
        {question}

        """

        query = "梁漱溟和梁济的关系是什么?"

        graph_ratio = 0.6
        rerank_method = "bleu"
        near_neighbor_first = False
        custom_related_information = ""

        graph_search, gremlin_prompt, vector_search = update_ui_configs(
            answer_prompt,
            custom_related_information,
            graph_only_answer,
            graph_vector_answer,
            None,
            keywords_extract_prompt,
            query,
            vector_only_answer,
        )

        res = self.scheduler.schedule_flow(
            flow_name,
            query=query,
            vector_search=vector_search,
            graph_search=graph_search,
            raw_answer=raw_answer,
            vector_only_answer=vector_only_answer,
            graph_only_answer=graph_only_answer,
            graph_vector_answer=graph_vector_answer,
            graph_ratio=graph_ratio,
            rerank_method=rerank_method,
            near_neighbor_first=near_neighbor_first,
            custom_related_information=custom_related_information,
            answer_prompt=answer_prompt,
            keywords_extract_prompt=keywords_extract_prompt,
            gremlin_tmpl_num=-1,
            gremlin_prompt=gremlin_prompt,
        )
        assert res is not None, f"The result of {flow_name.value} flow should not be None"

Comment on lines +166 to +167

### Output example:


medium

This try...except block is an anti-pattern in pytest. It suppresses the original exception and traceback, making it harder to debug test failures. The error message is also misleading as it always blames the BUILD_VECTOR_INDEX flow. Please remove the try...except block (from line 21 and this block) and let pytest handle exceptions to get more informative error reports.

Comment on lines +234 to +235
"target_label": "person",
"properties": [


medium

This try...except block should be removed for the same reasons as in test_build_knowledge_graph. It hinders debugging. Additionally, the error message is a copy-paste error and is incorrect. It should refer to the BUILD_SCHEMA flow, not BUILD_VECTOR_INDEX.

Comment on lines +244 to +245
"properties": []
}


medium

This try...except block should also be removed. It hides valuable debugging information. The error message is another copy-paste error from test_build_knowledge_graph.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
hugegraph-llm/src/tests/integration/test_flows_integration.py (4)

30-35: Consider adding cleanup logic to ensure test isolation.

The current setup fixture has no corresponding teardown step. For integration tests, add a cleanup step so that tests stay isolated and do not affect each other's results.

     @pytest.fixture(autouse=True)
     def setup(self):
         self.index_text = """
         梁漱溟年轻时,一日,他与父亲梁济讨论当时一战欧洲的时局...
         """
         self.scheduler = SchedulerSingleton.get_instance()
+        yield
+        # Optional: add cleanup logic here, e.g. remove index data created by the tests
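The suggested setup/teardown shape can be sketched as a self-contained yield fixture. Here index_text and the scheduler object are placeholders, not the project's real values:

```python
import pytest

class TestFlowsIntegration:
    # Sketch only: the real fixture builds the test corpus and a
    # SchedulerSingleton; a plain object() stands in for the scheduler here.
    @pytest.fixture(autouse=True)
    def setup(self):
        yield from self._setup_teardown()

    def _setup_teardown(self):
        # setup phase: runs before each test method
        self.index_text = "sample corpus text"
        self.scheduler = object()  # placeholder for SchedulerSingleton.get_instance()
        yield
        # teardown phase: runs after each test, keeping tests isolated
        self.scheduler = None

    def test_scheduler_available(self):
        assert self.scheduler is not None
```

Everything before the yield runs before each test and everything after it runs afterward, so per-test state (such as index data created during a flow run) can be discarded in one place.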

37-184: This test method has too many responsibilities, and its error message is misleading.

  1. Single-responsibility issue: this one test method exercises 4 different flows (BUILD_VECTOR_INDEX, GRAPH_EXTRACT, IMPORT_GRAPH_DATA, UPDATE_VID_EMBEDDINGS). Consider splitting it into independent test methods so failures are easier to localize.

  2. Misleading error message: the message at line 184 always reports "BUILD_VECTOR_INDEX flow failed", even though any of the flows may be the one that failed.

-        except Exception as e:
-            pytest.fail(f"BUILD_VECTOR_INDEX flow failed: {e}")
+        except Exception as e:
+            pytest.fail(f"test_build_knowledge_graph failed: {e}")

Consider splitting this test into several independent test methods, or using a more precise error message that identifies the flow that actually failed.


264-432: Consider parameterized tests to reduce code duplication.

This test method makes nearly identical calls for the 4 RAG flows (RAG_RAW, RAG_VECTOR_ONLY, RAG_GRAPH_ONLY, RAG_GRAPH_VECTOR), varying only a few parameters. Unlike the other test methods, it also lacks try/except error handling.

Consider using @pytest.mark.parametrize to reduce the duplication:

@pytest.mark.parametrize("flow_name,raw_answer,vector_only,graph_only,graph_vector", [
    (FlowName.RAG_RAW, True, False, False, False),
    (FlowName.RAG_VECTOR_ONLY, False, True, False, False),
    (FlowName.RAG_GRAPH_ONLY, False, False, True, False),
    (FlowName.RAG_GRAPH_VECTOR, False, False, False, True),
])
def test_rag(self, flow_name, raw_answer, vector_only, graph_only, graph_vector):
    # ... test logic for a single flow

438-506: Consider extracting the large prompt string into a constant or configuration.

The gremlin_prompt_input string occupies roughly 60 lines of code. Extracting large prompt templates like this into a dedicated constant, fixture, or configuration file improves the readability of the test code.

For example, define it at class level:

class TestFlowsIntegration:
    GREMLIN_PROMPT_TEMPLATE = """..."""
    
    def test_text_2_gremlin(self):
        res = self.scheduler.schedule_flow(
            FlowName.TEXT2GREMLIN, query, example_num, schema, self.GREMLIN_PROMPT_TEMPLATE, None
        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7116b31 and 89e83f3.

📒 Files selected for processing (2)
  • hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py (2 hunks)
  • hugegraph-llm/src/tests/integration/test_flows_integration.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
hugegraph-llm/**/*.py

📄 CodeRabbit inference engine (hugegraph-llm/AGENTS.md)

hugegraph-llm/**/*.py: Adhere to ruff code style for Python code
Type-check Python code with mypy
Keep each Python file under 600 lines for maintainability

Files:

  • hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py
  • hugegraph-llm/src/tests/integration/test_flows_integration.py
hugegraph-llm/src/hugegraph_llm/utils/**/*.py

📄 CodeRabbit inference engine (hugegraph-llm/AGENTS.md)

Place utilities, logging, and decorators under src/hugegraph_llm/utils/

Files:

  • hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py
hugegraph-llm/src/tests/**/*.py

📄 CodeRabbit inference engine (hugegraph-llm/AGENTS.md)

Place unit tests under src/tests/ and ensure they are discoverable by unittest/pytest

Files:

  • hugegraph-llm/src/tests/integration/test_flows_integration.py
🧠 Learnings (6)
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/indices/**/*.py : Store vector and graph indexing code under src/hugegraph_llm/indices/

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py
📚 Learning: 2025-10-21T07:20:54.516Z
Learnt from: weijinglin
Repo: hugegraph/hugegraph-ai PR: 54
File: hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py:55-55
Timestamp: 2025-10-21T07:20:54.516Z
Learning: In hugegraph-llm flows, the `prepared_input.schema` field in RAG flows (rag_flow_raw.py, rag_flow_vector_only.py, rag_flow_graph_vector.py, rag_flow_graph_only.py) is intentionally assigned `huge_settings.graph_name` (a string graph name) instead of using `prepared_input.graph_name`. This is legacy design where the underlying Operator's schema field is polymorphic and accepts either JSON schema objects or graph name strings, branching internally based on content type. This pattern should not be flagged as incorrect.

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/demo/rag_demo/**/*.py : Implement the Gradio UI application under src/hugegraph_llm/demo/rag_demo/

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/tests/**/*.py : Place unit tests under src/tests/ and ensure they are discoverable by unittest/pytest

Applied to files:

  • hugegraph-llm/src/tests/integration/test_flows_integration.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/gremlin_generate_task.py : Maintain the Text2Gremlin pipeline in src/hugegraph_llm/operators/gremlin_generate_task.py

Applied to files:

  • hugegraph-llm/src/tests/integration/test_flows_integration.py
📚 Learning: 2025-06-25T09:50:06.213Z
Learnt from: day0n
Repo: hugegraph/hugegraph-ai PR: 16
File: hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py:124-137
Timestamp: 2025-06-25T09:50:06.213Z
Learning: Language-specific prompt attributes (answer_prompt_CN, answer_prompt_EN, extract_graph_prompt_CN, extract_graph_prompt_EN, gremlin_generate_prompt_CN, gremlin_generate_prompt_EN, keywords_extract_prompt_CN, keywords_extract_prompt_EN, doc_input_text_CN, doc_input_text_EN) are defined in the PromptConfig class in hugegraph-llm/src/hugegraph_llm/config/prompt_config.py, which inherits from BasePromptConfig, making these attributes accessible in the parent class methods.

Applied to files:

  • hugegraph-llm/src/tests/integration/test_flows_integration.py
🧬 Code graph analysis (2)
hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py (2)
hugegraph-llm/src/hugegraph_llm/flows/__init__.py (1)
  • FlowName (21-34)
hugegraph-llm/src/hugegraph_llm/flows/scheduler.py (1)
  • schedule_flow (106-141)
hugegraph-llm/src/tests/integration/test_flows_integration.py (4)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py (1)
  • update_ui_configs (123-150)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/text2gremlin_block.py (1)
  • build_example_vector_index (84-116)
hugegraph-llm/src/hugegraph_llm/flows/__init__.py (1)
  • FlowName (21-34)
hugegraph-llm/src/hugegraph_llm/flows/scheduler.py (3)
  • SchedulerSingleton (181-191)
  • get_instance (186-191)
  • schedule_flow (106-141)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build (3.10)
🔇 Additional comments (2)
hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py (1)

25-25: LGTM!

Using the FlowName.BUILD_VECTOR_INDEX enum instead of a string literal is a good improvement that raises type safety and maintainability. Since FlowName inherits from str and Enum, it remains compatible with the string parameter of the schedule_flow method.

Also applies to: 87-87

hugegraph-llm/src/tests/integration/test_flows_integration.py (1)

434-436: LGTM!

The test is concise and verifies the basic functionality of the build_example_vector_index function.

Comment thread hugegraph-llm/src/tests/integration/test_flows_integration.py
