
loader update#7

Open
kenssa4eedfd wants to merge 32 commits into master from loader-update

Conversation

@kenssa4eedfd
Collaborator

kenssa4eedfd commented Jan 16, 2026

…ader, and refactored the failure-handling logic for batch inserts.

Purpose of the PR

  • Adjusted several default parameters and descriptions in the Loader, and refactored the failure-handling logic for batch inserts.

Main Changes

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • xxx

Does this PR potentially affect the following parts?

  • Nope
  • Dependencies (add/update license info)
  • Modify configurations
  • The public API
  • Other effects (typed here)

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

Summary by CodeRabbit

Release Notes

New Features

  • Added --parser-threads (keeping --parallel-count as a compatible but deprecated alias); added batch-failure-fallback (enabled by default).
  • Connection pool defaults are now computed from the CPU count and can be auto-adjusted when the parse/batch thread counts change.

Bug Fixes

  • Improved batch-insert error handling: with fallback enabled, a failed batch falls back to single-record inserts; with it disabled, loading stops and the error is reported to the user.

Tests

  • Added/extended unit and functional tests covering concurrency, fallback, and connection pool auto-adjustment behavior.

Ops/Other

  • Removed the CI workflow that posted automatic PR comments; adjusted review rules in the project configuration file.


@github-actions

@codecov-ai-reviewer review

@coderabbitai

coderabbitai bot commented Jan 16, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Migrates the parse-thread configuration from parallelCount to parseThreads (keeping a compatible alias); adds connection pool default constants and auto-adjusts the pool after parsing based on the batch thread count; introduces an optional single-insert fallback path when a batch insert fails; adds/updates the related unit and functional tests; and removes one CI workflow file.

Changes

Cohort / File(s) — Change Summary

Thread configuration and connection pool management
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java
Added DEFAULT_MAX_CONNECTIONS and DEFAULT_MAX_CONNECTIONS_PER_ROUTE; replaced the public field parallelCount with parseThreads (accepting both --parallel-count and --parser-threads, logging a deprecation warning for the former); added the batchFailureFallback option; added a private method that auto-adjusts the connection pool after parsing, with related logging.
Core loader updates
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/HugeGraphLoader.java
Replaced in-method uses of parallelCount for the parallel parse threads with parseThreads; removed the in-method default assignment for parallelCount; updated logging and thread pool creation to use parseThreads.
Batch insert failure handling
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java
Added a stopped-state check in submitBatch to return early and release the semaphore; in the batch-insert failure callback, either fall back to single inserts or mark an error and stop loading, depending on batchFailureFallback; ensured the semaphore is released and time statistics are recorded on all paths, with corresponding logs/user notifications.
Functional test adjustments
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java, .../KafkaLoadTest.java
Added --parser-threads "1" to many test command lines (with FIXME comments) so tests parse with a fixed thread count for determinism.
LoadOptions unit tests
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.java
New unit tests (~213 lines) covering connection pool auto-adjustment, parse-thread defaults and validation, and handling of the deprecated --parallel-count parameter, with log assertions.
TaskManager unit tests
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/TaskManagerTest.java
New unit tests (~536 lines) simulating batch failure fallback (enabled/disabled), concurrent submission, semaphore behavior, and related metric and context-state assertions.
CI workflow
.github/workflows/auto-pr-review.yml
Deleted this GitHub Actions workflow file (removing the automatic PR comment job).
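The deprecated-alias handling summarized above (accepting both --parallel-count and --parser-threads while warning on the old name) can be sketched roughly as follows. This is an illustration only: the real LoadOptions uses JCommander's @Parameter with multiple names, while this hand-rolled parser and its field names are assumptions.

```java
// Illustrative sketch only: shows the alias-with-deprecation-warning idea,
// not the actual JCommander-based LoadOptions code.
public class ParserThreadsAlias {
    static int parseThreads = -1;          // -1 means "not set"
    static boolean deprecationWarned = false;

    static void parse(String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            switch (args[i]) {
                case "--parser-threads":   // new, preferred name
                    parseThreads = Integer.parseInt(args[i + 1]);
                    break;
                case "--parallel-count":   // deprecated alias, still accepted
                    deprecationWarned = true;
                    System.err.println("WARN: --parallel-count is deprecated, " +
                                       "use --parser-threads instead");
                    parseThreads = Integer.parseInt(args[i + 1]);
                    break;
                default:
                    break;
            }
        }
    }

    public static void main(String[] args) {
        parse(new String[]{"--parallel-count", "4"});
        System.out.println(parseThreads);  // the alias still sets the new field
    }
}
```

Both flags write to the same field, so downstream code only ever sees parseThreads regardless of which spelling the user typed.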

Sequence Diagram

sequenceDiagram
    participant Client as Client/User
    participant TaskMgr as TaskManager
    participant BatchTask as BatchInsertTask
    participant GraphMgr as GraphManager
    participant SingleTask as SingleInsertTask

    Client->>TaskMgr: submitBatch(edges)
    TaskMgr->>TaskMgr: acquireBatchSemaphore
    alt context stopped
        TaskMgr->>TaskMgr: releaseBatchSemaphore
        TaskMgr-->>Client: return early
    else context running
        TaskMgr->>BatchTask: submit batch insert
        BatchTask->>GraphMgr: insertInBatch()
        alt success
            GraphMgr-->>BatchTask: success
            BatchTask->>TaskMgr: update metrics & releaseSemaphore
        else failure
            GraphMgr-->>BatchTask: error
            BatchTask->>TaskMgr: whenComplete(error)
            alt batchFailureFallback enabled
                TaskMgr->>TaskMgr: log warning "try single insert"
                TaskMgr->>SingleTask: submit single inserts
                SingleTask->>GraphMgr: insertOne() x N
                GraphMgr-->>SingleTask: results
                TaskMgr->>TaskMgr: releaseBatchSemaphore
            else batchFailureFallback disabled
                TaskMgr->>TaskMgr: mark error in context
                TaskMgr->>TaskMgr: stop loading & print error
                TaskMgr->>TaskMgr: releaseBatchSemaphore
            end
        end
    end
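The failure branch in the diagram above can be condensed into a small runnable sketch. The names (submitBatch, SINGLE_INSERTS, STOPPED) are simplified stand-ins, not the real TaskManager API: on batch failure, the loader either retries records one by one or marks the load as stopped.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the fallback decision shown in the sequence diagram
// (assumed names, not the real TaskManager API).
public class FallbackSketch {
    static final AtomicInteger SINGLE_INSERTS = new AtomicInteger();
    static final AtomicBoolean STOPPED = new AtomicBoolean(false);

    static void submitBatch(boolean failBatch, boolean fallbackEnabled,
                            int batchSize) {
        CompletableFuture<Void> future = CompletableFuture
            .runAsync(() -> {
                if (failBatch) {
                    throw new RuntimeException("simulated batch failure");
                }
            })
            .whenComplete((r, e) -> {
                if (e != null) {
                    if (fallbackEnabled) {
                        for (int i = 0; i < batchSize; i++) {
                            SINGLE_INSERTS.incrementAndGet(); // retry one by one
                        }
                    } else {
                        STOPPED.set(true);                    // fail fast
                    }
                }
            });
        try {
            future.join(); // block so the effect is observable in this sketch
        } catch (CompletionException ignored) {
            // the failure was already handled inside whenComplete
        }
    }
}
```

With fallback enabled a failed 3-record batch yields 3 single inserts; with it disabled the same failure flips the stopped flag instead.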

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Create auto-pr-review.yml #1 — touches the same path .github/workflows/auto-pr-review.yml; may have a direct file-level relationship or conflict with this PR's deletion of that workflow.

Poem

🐰 A little rabbit hops across the green,
Parse threads renamed, both quiet and keen,
When batches tumble, single inserts glean,
The pool adapts, the logs stay soft, unseen,
With tests on guard, the code smiles serene :)

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title Check — ❓ Inconclusive: The title "loader update" is vague and generic, lacking specificity about the actual changes made to the codebase. Consider a more descriptive title such as "Refactor batch insert failure handling and standardize parser thread parameters" or "Add parser-threads option and improve batch insert fallback behavior".

✅ Passed checks (1 passed)

  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.



@gemini-code-assist

Summary of Changes

Hello @kenssa4eedfd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the HugeGraph Loader by refining its configuration options for parallelism and connection management, making it more robust and user-friendly. It also significantly improves the failure handling mechanism for batch inserts, ensuring that the loading process stops gracefully and informs the user immediately upon encountering critical errors, rather than attempting less efficient fallback strategies.

Highlights

  • Loader Parallelism Configuration: The parallelCount option has been renamed to parallelThreads (with --parser-threads as an alias) and its default behavior has been updated. It now automatically determines the number of parallel read pipelines based on the number of available CPU cores if not explicitly set, and includes validation to ensure the value is at least 1.
  • Connection Pool Defaults and Descriptions: New constants DEFAULT_MAX_CONNECTIONS and DEFAULT_MAX_CONNECTIONS_PER_ROUTE have been introduced. The descriptions for batch-insert-threads, max-conn, and max-conn-per-route parameters have been enhanced to clarify how connection limits may be auto-adjusted based on the batch-insert-threads setting.
  • Refactored Batch Insert Failure Handling: The error handling logic for batch inserts has been significantly changed. Instead of attempting single inserts after a batch failure, the system will now immediately log an error, mark the loading context as stopped, shut down the batch and single insert services, and print a user-friendly error message, ensuring a more decisive failure response.
  • Early Exit for Stopped Loading Context: An early exit condition has been added to the submitBatch method. If the loading context has been stopped (e.g., due to a previous error), subsequent batch submissions will be prevented, improving system stability and preventing unnecessary work.
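The early-exit behavior in the last bullet can be sketched with a plain Semaphore. Names like submitBatch and stopped here are simplified stand-ins for the real TaskManager/LoadContext API; the point is only that the permit must be released on the early-return path too.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the early-exit pattern (simplified names, not the
// real TaskManager): acquire the batch semaphore, then bail out and release
// the permit immediately if loading has already been stopped.
public class EarlyExitSketch {
    final Semaphore batchSemaphore = new Semaphore(1);
    final AtomicBoolean stopped = new AtomicBoolean(false);
    int submitted = 0;

    boolean submitBatch() throws InterruptedException {
        batchSemaphore.acquire();
        if (stopped.get()) {
            batchSemaphore.release(); // never leak the permit
            return false;             // early return, no work submitted
        }
        try {
            submitted++;              // stand-in for the real batch insert
            return true;
        } finally {
            batchSemaphore.release();
        }
    }
}
```

Without the release inside the early-return branch, each rejected submission would permanently consume a permit and eventually deadlock later submitters.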



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates several loader parameters and refactors the failure handling for batch inserts. The parameter changes, such as renaming --parallel-count to --parser-threads and updating default values and descriptions, improve clarity and usability. The core change is in the batch insert failure logic, which now adopts a fail-fast strategy instead of falling back to single inserts. While this can be a valid approach, I've found a couple of issues in the new implementation within TaskManager.java. Specifically, there's a bug in metrics reporting where the 'in-flight' count is not correctly updated on failure, and the new shutdown logic is decentralized, which could be improved for better robustness and maintainability. My review includes a suggestion to address these points.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java (1)

141-156: Race condition between acquire() and runAsync(): executor shutdown can leak the semaphore and metrics

context.stopped() is only checked before acquire; if the executor is shut down while waiting, the subsequent runAsync() throws RejectedExecutionException synchronously, so the whenComplete callback is never registered and the release and metric cleanup never run.

Concrete scenario: when a batch insert task fails, the whenComplete callback calls batchService.shutdown(); any batch submitted after acquire returns but before runAsync executes hits the shut-down executor, the exception goes uncaught, and the semaphore permit and the plusFlighting count leak permanently.

Suggestion: add a second stopped check after acquire, and add a safety net for runAsync submission failures.

🔧 Suggested fix
+import java.util.concurrent.RejectedExecutionException;
...
         try {
             this.batchSemaphore.acquire();
         } catch (InterruptedException e) {
             throw new LoadException("Interrupted while waiting to submit %s " +
                                     "batch in batch mode", e, mapping.type());
         }
+        if (this.context.stopped()) {
+            this.batchSemaphore.release();
+            return;
+        }
         LoadSummary summary = this.context.summary();
         summary.metrics(struct).plusFlighting(batch.size());
 
         InsertTask task = new BatchInsertTask(this.context, struct,
                                               mapping, batch);
-        CompletableFuture.runAsync(task, this.batchService).whenComplete(
-            (r, e) -> {
-                if (e != null) {
-                    LOG.error("Batch insert {} error, interrupting import", mapping.type(), e);
-                    this.context.occurredError();
-                    this.context.stopLoading();
-                    this.batchService.shutdown();
-                    this.singleService.shutdown();
-                    Printer.printError("Batch insert %s failed, stop loading, Please check the logs",
-                                       mapping.type().string());
-                } else {
-                    summary.metrics(struct).minusFlighting(batch.size());
-                }
-
-                this.batchSemaphore.release();
-                long end = System.currentTimeMillis();
-                this.context.summary().addTimeRange(mapping.type(), start, end);
-            });
+        try {
+            CompletableFuture.runAsync(task, this.batchService).whenComplete(
+                (r, e) -> {
+                    if (e != null) {
+                        LOG.error("Batch insert {} error, interrupting import", mapping.type(), e);
+                        this.context.occurredError();
+                        this.context.stopLoading();
+                        this.batchService.shutdown();
+                        this.singleService.shutdown();
+                        Printer.printError("Batch insert %s failed, stop loading, Please check the logs",
+                                           mapping.type().string());
+                    } else {
+                        summary.metrics(struct).minusFlighting(batch.size());
+                    }
+
+                    this.batchSemaphore.release();
+                    long end = System.currentTimeMillis();
+                    this.context.summary().addTimeRange(mapping.type(), start, end);
+                });
+        } catch (RejectedExecutionException ex) {
+            summary.metrics(struct).minusFlighting(batch.size());
+            this.batchSemaphore.release();
+            LOG.warn("Batch insert skipped because executor is shutdown", ex);
+        }
🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java`:
- Around line 158-168: The failure path in TaskManager's batch completion
handler currently skips releasing the flighting metric
(summary.metrics(struct).minusFlighting), causing inflated stats; update the
whenComplete handler for the batch future so that
summary.metrics(struct).minusFlighting(batch.size()) is invoked in all cases
(both when e != null and else), i.e., perform the minusFlighting call before or
alongside the error handling block that logs the error, calls
this.context.occurredError()/stopLoading(), and shuts down services, ensuring
the metric is always decremented regardless of success or failure.
🧹 Nitpick comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java (1)

188-192: Consider adding a parameter-validation annotation for consistency and a better user experience.

parallelThreads currently lacks a validateWith attribute on its @Parameter annotation, unlike batchSize. Although the downstream code (HugeGraphLoader lines 669-673) already handles the null fallback and range validation, adding a validator at the parsing layer catches bad user input earlier and avoids runtime exceptions.

Suggested change, consistent with the existing PositiveValidator pattern:

Suggested improvement
     @Parameter(names = {"--parallel-count", "--parser-threads"}, arity = 1,
+               validateWith = {PositiveValidator.class},
                description = "The number of parallel read pipelines. " +
                              "Default: auto (min(struct count, cpu)). " +
                              "Must be >= 1")
     public Integer parallelThreads = null;
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9683b47 and d3fa860.

📒 Files selected for processing (3)
  • hugegraph-loader/src/main/java/org/apache/hugegraph/loader/HugeGraphLoader.java
  • hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java
  • hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: hubble-ci (11, 3.11)
  • GitHub Check: check-dependency-license
  • GitHub Check: Analyze (javascript)
  • GitHub Check: loader-ci (11)
  • GitHub Check: Analyze (java)
🔇 Additional comments (6)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/HugeGraphLoader.java (2)

665-674: The logic is correct and reasonably implemented.

The auto-default logic and validation order are correct:

  1. Return early for an empty struct list, avoiding the min(0, availableProcessors) = 0 edge case
  2. Compute the default automatically when the value is null
  3. Validate last, so explicitly set user values must also satisfy the >= 1 constraint

678-685: Logging and thread pool creation correctly updated.

Both the log output and the thread pool creation now use the parallelThreads variable, consistent with the new option name.

hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java (3)

48-49: Extracting the default connection-count constants is clean, OK.


159-165: The description updates match the new behavior, OK.


171-181: The auto-adjustment feature is not fully implemented, but there is no default-value detection sync issue.

Inspection shows that the auto-adjustment logic in HugeClientBuilder.configPool() triggers on the parameter being == 0, not on comparison with the constants. Since LoadOptions initializes maxConnections and maxConnectionsPerRoute to CPUS * 4 and CPUS * 2 (non-zero values), the auto-adjustment branch in configPool() is never taken.

Although the parameter descriptions say "If left as default...may be auto-adjusted", the current code contains no implementation that dynamically adjusts these two values based on batchInsertThreads.

hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java (1)

29-29: The new Printer dependency is reasonable.
It is used for user-facing error messages, consistent with the error-handling path below.
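A small sketch of the CPU-derived pool defaults under discussion. The values CPUS * 4 and CPUS * 2 come from the review comment above, and the adjustment rule shown here is a hypothetical illustration of what "auto-adjusted based on batch-insert-threads" could mean, not the actual LoadOptions/HugeClientBuilder code:

```java
// Sketch of CPU-derived connection pool defaults. The multipliers follow
// the review comment above; the adjustment rule is a hypothetical example,
// not the real implementation.
public class PoolDefaults {
    static final int CPUS = Runtime.getRuntime().availableProcessors();
    static final int DEFAULT_MAX_CONNECTIONS = CPUS * 4;
    static final int DEFAULT_MAX_CONNECTIONS_PER_ROUTE = CPUS * 2;

    // Hypothetical auto-adjustment: ensure the pool can serve every
    // batch-insert thread, but only when the user left the default in place.
    static int adjustedMaxConnections(int batchInsertThreads,
                                      int currentMaxConnections) {
        if (currentMaxConnections == DEFAULT_MAX_CONNECTIONS
                && batchInsertThreads > currentMaxConnections) {
            return batchInsertThreads;
        }
        return currentMaxConnections;
    }
}
```

The "only when left at default" guard is exactly the kind of value-comparison heuristic the nitpick below criticizes: it cannot distinguish a user who explicitly set the default value from one who never set it at all.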


@kenssa4eedfd
Collaborator Author

@codecov-ai-reviewer review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several improvements to the HugeGraph loader. The parameter handling for parallel threads and HTTP connections has been refined for better flexibility and user clarity. A significant change is the refactoring of the batch insert failure logic to a fail-fast approach, which is a sound design choice to avoid long, partially failing loads. My review includes one suggestion to further improve the new failure handling logic by centralizing the shutdown process for better maintainability.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`:
- Around line 188-193: The parallelThreads field in LoadOptions lacks a
positive-value validator despite the description "Must be >= 1"; add the same
PositiveValidator used by other params to the `@Parameter` annotation for the
public Integer parallelThreads (e.g., include validator =
PositiveValidator.class in the annotation), and ensure the PositiveValidator
class is imported/available so invalid values (0 or negative) are rejected at
parse time.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`:
- Around line 171-181: Add positive-value validation to the HTTP connection
parameters to prevent zero/negative inputs: in class LoadOptions, update the
`@Parameter` annotations for the fields maxConnections and maxConnectionsPerRoute
to include the PositiveValidator (same validator used by other numeric options)
so that values <= 0 are rejected at CLI parsing time and avoid connection pool
initialization failures.

In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/HugeGraphLoader.java`:
- Around line 665-680: parallelThreads is declared as an Integer and may be
null, causing an NPE when autounboxed in the call to
ExecutorUtil.newFixedThreadPool; ensure a non-null primitive int is passed by
coalescing null to a safe default (e.g. Math.max(2, CPUS) or
context.options().defaultParallel) before calling newFixedThreadPool. Modify the
code around the use of this.context.options().parallelThreads (or change its
declaration) so you compute an int threads =
(this.context.options().parallelThreads != null ?
this.context.options().parallelThreads : <default>) and then call
ExecutorUtil.newFixedThreadPool(threads, "loader"); also consider validating
threads > 0 and logging a warning if a null/invalid value was replaced.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`:
- Around line 190-195: The parameter annotation for parallelThreads has a
mismatched default description ("Default: auto max(2, cpu)") versus the actual
default (public int parallelThreads = Math.max(2, CPUS/2)); update the
description in the `@Parameter` annotation to reflect the real default (e.g.,
"Default: max(2, cpu/2)") so the doc and the code match; locate the `@Parameter`
on the parallelThreads field in LoadOptions and edit only the description text
to match the Math.max(2, CPUS/2) semantics.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java`:
- Around line 164-167: When a batch insert fails the code calls
this.context.stopLoading() and prints an error via Printer.printError("Batch
insert %s failed, stop loading...", mapping.type().string()) but does not set
the error flag; call this.context.occurredError() as well to mark the task
context as having errored. Update the failure branch (around the existing
this.context.stopLoading() / Printer.printError(...) block in TaskManager) to
invoke this.context.occurredError() so the loader state/reporting reflects the
error consistently.
- Around line 158-170: The flighting metric is being decremented twice when
batchFailureFallback is true (submitBatch calls plusFlighting(batch.size()) then
on batch failure calls submitInSingle, and both submitBatch's whenComplete and
submitInSingle's whenComplete call summary.metrics(...).minusFlighting), causing
incorrect counts; to fix, remove the minusFlighting call from submitInSingle's
whenComplete so only submitBatch balances the plusFlighting, leaving
submitBatch, submitInSingle, summary.metrics(...).minusFlighting, and the
batchFailureFallback path as the referenced places to edit; if submitInSingle
must also be used independently elsewhere, instead add a boolean flag parameter
to submitInSingle (e.g., skipFlightingDecrement) or document callers so that
only the original submitBatch caller performs minusFlighting in the fallback
flow.
🧹 Nitpick comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java (1)

428-430: The auto-adjustment logic has an edge-case problem.

When batchInsertThreads is smaller than CPUS (for example, the user explicitly sets a small value), this logic is triggered but may make no adjustment, since the default connection counts are usually already sufficient. But when batchInsertThreads happens to equal CPUS, the adjustment logic is not triggered even if the user set that value explicitly.

Consider tracking whether the user explicitly set batchInsertThreads with a flag, rather than relying on value comparison.

♻️ Optional improvement

A batchInsertThreadsUserSet flag could be added to LoadOptions, or JCommander's mechanisms could be used in parseOptions to detect whether the parameter was explicitly set:

// Example: detect explicit setting after parsing.
// The current logic works in most cases, but if more precise control
// is needed, explicit-setting detection could be introduced in a
// future version.

Also applies to: 434-451


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java (1)

156-176: The success path is missing a minusFlighting call, so the metric inflates permanently.

When a batch insert succeeds (e == null), the whenComplete callback performs no minusFlighting call, yet plusFlighting is always invoked at line 152. The flighting metric therefore grows without bound and the statistics become badly distorted.

The three current paths:

  • Success: no minusFlighting
  • Failure + fallback: submitInSingle calls minusFlighting internally (line 194) ✓
  • Failure + no fallback: minusFlighting is called at line 165

🐛 Suggested fix
         CompletableFuture.runAsync(task, this.batchService).whenComplete(
             (r, e) -> {
+                boolean fallbackToSingle = false;
                 if (e != null) {
                     LOG.error("Batch insert {} error, interrupting import", mapping.type(), e);
                     if (this.options.batchFailureFallback) {
                         LOG.warn("Batch insert {} error, try single insert",
                                  mapping.type(), e);
                         this.submitInSingle(struct, mapping, batch);
+                        fallbackToSingle = true;
                     } else {
-                        summary.metrics(struct).minusFlighting(batch.size());
                         this.context.occurredError();
                         this.context.stopLoading();
                         Printer.printError("Batch insert %s failed, stop loading. Please check the logs",
                                            mapping.type().string());
                     }
                 }
+                // Only decrement when we did not fall back to single inserts
+                // (submitInSingle is responsible for the decrement on the fallback path)
+                if (!fallbackToSingle) {
+                    summary.metrics(struct).minusFlighting(batch.size());
+                }
 
                 this.batchSemaphore.release();
                 long end = System.currentTimeMillis();
                 this.context.summary().addTimeRange(mapping.type(), start, end);
             });
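The invariant the suggested fix enforces — every plusFlighting is eventually balanced by exactly one minusFlighting, on every outcome — can be checked in isolation with a toy counter. The names here (submitBatch, completeBatch, flighting) are assumptions modeling the review's description, not the real LoadSummary metrics API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the flighting metric around one batch, mirroring the fixed
// logic: decrement on every path except the fallback, where submitInSingle
// (modeled as a separate decrement) is responsible for balancing.
public class FlightingModel {
    final AtomicLong flighting = new AtomicLong();

    // fixed completion behavior: balance plusFlighting on non-fallback paths
    void completeBatch(int batchSize, boolean failed, boolean fallback) {
        boolean fellBackToSingle = failed && fallback;
        if (!fellBackToSingle) {
            flighting.addAndGet(-batchSize); // minusFlighting
        }
        // on fallback, submitInSingle owns the decrement
    }

    void submitBatch(int batchSize, boolean failed, boolean fallback) {
        flighting.addAndGet(batchSize);      // plusFlighting on submit
        completeBatch(batchSize, failed, fallback);
        if (failed && fallback) {
            // stand-in for submitInSingle's internal minusFlighting
            flighting.addAndGet(-batchSize);
        }
    }
}
```

Running all four (failed, fallback) combinations through submitBatch should leave the counter at zero; the pre-fix behavior (skipping the decrement on success) would leave it positive.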


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/TaskManagerTest.java`:
- Around line 503-507: The test currently swallows InterruptedException after
calling latch.await(5, TimeUnit.SECONDS) and does not restore the thread's
interrupted status; update the catch block that catches InterruptedException in
TaskManagerTest (the latch.await(...) call) to call
Thread.currentThread().interrupt() so the thread's interrupted flag is
preserved, while still allowing the test to proceed or exit as intended.
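The interrupt-restoration fix described above is a standard Java pattern; a minimal self-contained version (with a simplified helper name) looks like this:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Minimal demonstration of restoring the interrupt flag after catching
// InterruptedException, as the review suggests for TaskManagerTest.
public class AwaitWithInterrupt {
    static boolean awaitQuietly(CountDownLatch latch) {
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Restore the flag so callers can still observe the interrupt;
            // catching InterruptedException clears it otherwise.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Swallowing the exception without re-interrupting hides the interrupt from the rest of the thread, which can make shutdown logic or test harnesses hang.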
🧹 Nitpick comments (3)
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/TaskManagerTest.java (3)

268-269: Consider adding a comment explaining the range assertion.

The range assertion batchCalls >= 2 && batchCalls <= 3 may confuse future maintainers. Add a comment explaining why the expected value is a range rather than an exact count (concurrency race conditions).

♻️ Suggested change
             int batchCalls = FailingConcurrentGraphManager.BATCH_CALLS.get();
+            // Due to race conditions, the number of batch calls can be 2 or 3:
+            // - At least 2: first success call + second failure call that triggers stop
+            // - At most 3: one more call might slip through before stop is observed
             Assert.assertTrue(batchCalls >= 2 && batchCalls <= 3);

330-338: Thread.sleep(50) may make the test flaky.

Waiting a fixed amount of time to ensure the thread is blocked on the semaphore can fail in CI environments or on heavily loaded systems. Consider a more reliable synchronization mechanism, such as an observable synchronization point at the submitBatch entry, or increase the wait time and document it with a comment.

Please confirm this test's stability in CI. If intermittent failures occur, consider:

-            Thread.sleep(50);
+            // Allow sufficient time for the thread to block on semaphore acquisition.
+            // Increase if flaky on slow CI systems.
+            Thread.sleep(100);

462-467: Consider resetting static state in a unified @Before method.

The test doubles store their state in static fields, currently reset manually inside individual test methods. Adding a @Before method that resets everything in one place improves maintainability and prevents missed resets from causing cross-test interference.

♻️ Suggested change

Add an @Before method to the class:

@Before
public void resetTestDoubles() {
    FailingBatchGraphManager.BATCH_CALLS.set(0);
    FailingBatchGraphManager.SINGLE_CALLS.set(0);
    FailingConcurrentGraphManager.BATCH_CALLS.set(0);
    FailingConcurrentGraphManager.FIRST_STARTED = null;
    FailingConcurrentGraphManager.ALLOW_FIRST_FINISH = null;
    FailingConcurrentGraphManager.FAILURE_CALLED = null;
}

And add the import:

import org.junit.Before;

Also applies to: 511-514

@kenssa4eedfd
Collaborator Author

@codecov-ai-reviewer review


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/loader-ci.yml:
- Around line 88-92: The workflow uploads test report artifacts but only
includes failsafe plain text files; add the missing XML failsafe reports so CI
collects full JUnit data by modifying the artifact path list (the path block in
.github/workflows/loader-ci.yml) to include **/target/failsafe-reports/*.xml
alongside the existing **/target/failsafe-reports/*.txt and the surefire globs
so both surefire and failsafe XML reports are uploaded for complete diagnostics.

Comment on lines +88 to +92
path: |
**/target/surefire-reports/*.txt
**/target/surefire-reports/*.xml
**/target/failsafe-reports/*.txt



⚠️ Potential issue | 🟡 Minor

Also upload the Failsafe XML reports for complete diagnostics.
Currently only failsafe-reports/*.txt is uploaded; without failsafe-reports/*.xml the standard JUnit report data is lost, leaving integration-test failure investigations incomplete.

🔧 Suggested patch
         path: |
           **/target/surefire-reports/*.txt
           **/target/surefire-reports/*.xml
           **/target/failsafe-reports/*.txt
+          **/target/failsafe-reports/*.xml
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         path: |
           **/target/surefire-reports/*.txt
           **/target/surefire-reports/*.xml
           **/target/failsafe-reports/*.txt
+          **/target/failsafe-reports/*.xml
🤖 Prompt for AI Agents
In @.github/workflows/loader-ci.yml around lines 88 - 92, The workflow uploads
test report artifacts but only includes failsafe plain text files; add the
missing XML failsafe reports so CI collects full JUnit data by modifying the
artifact path list (the path block in .github/workflows/loader-ci.yml) to
include **/target/failsafe-reports/*.xml alongside the existing
**/target/failsafe-reports/*.txt and the surefire globs so both surefire and
failsafe XML reports are uploaded for complete diagnostics.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/loader-ci.yml:
- Around line 77-81: Restore the disabled Maven test profiles in the CI workflow
by re-enabling the commented lines for the "unit", "file", "jdbc", and "kafka"
profiles (so the pipeline runs mvn test -P unit, -P file, -P jdbc, -P kafka in
addition to -P hdfs); if they were intentionally disabled only temporarily, add
a TODO comment in the workflow and create a matching issue referencing the
reason and owner to track fixing the failing tests rather than skipping them.
Ensure the CI runs all relevant profiles to validate changes to batch insert
failure handling and LoadOptions parameter changes.

Comment on lines +77 to +81
# mvn test -P unit -ntp
# mvn test -P file
mvn test -P hdfs
mvn test -P jdbc
mvn test -P kafka
# mvn test -P jdbc
# mvn test -P kafka

⚠️ Potential issue | 🟠 Major

Disabling tests in CI creates regression risk.

Four test profiles (unit, file, jdbc, kafka) are commented out, leaving only the hdfs tests. This sharply reduces CI coverage, especially since this PR refactors the batch-insert failure handling and changes LoadOptions parameters, which are exactly the changes the disabled tests should verify.

If the tests fail, fix them rather than skip them; if this is a temporary measure, add a TODO comment or open a tracking issue so the disabled state is not merged into master.

🔧 Suggested fix: restore the test runs
-          # mvn test -P unit -ntp
-          # mvn test -P file
+          mvn test -P unit -ntp
+          mvn test -P file
           mvn test -P hdfs
-          # mvn test -P jdbc
-          # mvn test -P kafka
+          mvn test -P jdbc
+          mvn test -P kafka
📝 Committable suggestion


Suggested change
# mvn test -P unit -ntp
# mvn test -P file
mvn test -P hdfs
mvn test -P jdbc
mvn test -P kafka
# mvn test -P jdbc
# mvn test -P kafka
mvn test -P unit -ntp
mvn test -P file
mvn test -P hdfs
mvn test -P jdbc
mvn test -P kafka
🤖 Prompt for AI Agents
In @.github/workflows/loader-ci.yml around lines 77 - 81, Restore the disabled
Maven test profiles in the CI workflow by re-enabling the commented lines for
the "unit", "file", "jdbc", and "kafka" profiles (so the pipeline runs mvn test
-P unit, -P file, -P jdbc, -P kafka in addition to -P hdfs); if they were
intentionally disabled only temporarily, add a TODO comment in the workflow and
create a matching issue referencing the reason and owner to track fixing the
failing tests rather than skipping them. Ensure the CI runs all relevant
profiles to validate changes to batch insert failure handling and LoadOptions
parameter changes.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`:
- Around line 429-432: The deprecation warning check in LoadOptions.java
currently uses Arrays.asList(args).contains("--parallel-count") which misses
forms like "--parallel-count=1"; update the condition (in the static check where
args are inspected) to detect both exact and assignment forms by using a
prefix/any-match test (e.g. any arg that equals "--parallel-count" or startsWith
"--parallel-count="). Keep the same LOG.warn message and ensure the check covers
both "--parallel-count" and "--parallel-count=...".
🧹 Nitpick comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java (1)

437-453: Auto-adjustment can leave max-conn-per-route > max-conn.

When the user explicitly sets a small max-conn but leaves max-conn-per-route at its default, the current logic may raise the per-route limit above the total. Clamp or synchronize the two values after adjustment to keep the configuration consistent.

♻️ One optional fix (clamp per-route)
         if (maxConnPerRoute == DEFAULT_MAX_CONNECTIONS_PER_ROUTE && maxConnPerRoute < batchThreads * 2) {
             options.maxConnectionsPerRoute = batchThreads * 2;
             LOG.info("Auto adjusted max-conn-per-route to {} based on batch-insert-threads({})",
                      options.maxConnectionsPerRoute, batchThreads);
         }
+
+        if (options.maxConnectionsPerRoute > options.maxConnections) {
+            LOG.warn("max-conn-per-route({}) > max-conn({}), clamping to max-conn",
+                     options.maxConnectionsPerRoute, options.maxConnections);
+            options.maxConnectionsPerRoute = options.maxConnections;
+        }
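The adjust-then-clamp logic suggested above can be sketched as a standalone helper; the method and parameter names here are illustrative and are not the actual LoadOptions API:

```java
// Minimal sketch of the auto-adjust + clamp logic described above.
// Names are illustrative, not the real LoadOptions fields.
public class ConnClamp {

    // Returns {maxConn, maxConnPerRoute} after adjustment.
    static int[] adjust(int maxConn, int maxConnPerRoute, int batchThreads) {
        // Auto-raise per-route based on batch-insert threads
        if (maxConnPerRoute < batchThreads * 2) {
            maxConnPerRoute = batchThreads * 2;
        }
        // Clamp: per-route must never exceed the total pool size
        if (maxConnPerRoute > maxConn) {
            maxConnPerRoute = maxConn;
        }
        return new int[]{maxConn, maxConnPerRoute};
    }

    public static void main(String[] args) {
        // User sets a small total pool but leaves per-route at default
        int[] r = adjust(8, 4, 16);
        System.out.println(r[0] + " " + r[1]); // prints "8 8": clamped
    }
}
```

Without the final clamp, the same inputs would yield a per-route limit of 32 against a total pool of 8, which a pooling HTTP client would silently cap anyway, so making the clamp explicit keeps the logged configuration honest.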

Comment on lines +429 to +432
if (Arrays.asList(args).contains("--parallel-count")) {
LOG.warn("Parameter --parallel-count is deprecated, " +
"please use --parser-threads instead");
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's find and read the LoadOptions.java file
fd -n "LoadOptions.java" --type f

Repository: hugegraph/hugegraph-toolchain

Length of output: 303


🏁 Script executed:

# Let's check the structure of the repository
git ls-files | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 676


🏁 Script executed:

# Search for the specific file mentioned in the review
find . -name "LoadOptions.java" -type f

Repository: hugegraph/hugegraph-toolchain

Length of output: 159


🏁 Script executed:

# Read the LoadOptions.java file to understand the argument parsing context
wc -l ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 163


🏁 Script executed:

# Read around the target lines
sed -n '410,450p' ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 2102


🏁 Script executed:

# Look for the main method and argument parsing logic
grep -n "public static void main\|String\[\] args" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 136


🏁 Script executed:

# Read the entire parseOptions method to understand how arguments are parsed
sed -n '376,432p' ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 2755


🏁 Script executed:

# Search for how --parallel-count is actually parsed (look for property setting)
rg -n "parallel-count|parallelCount" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 393


🏁 Script executed:

# Look for the argument parsing framework being used
head -50 ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 2122


🏁 Script executed:

# Search for other deprecated parameters to see how they're handled
rg -n "deprecated" -i ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 245


🏁 Script executed:

# Look for tests of LoadOptions to see how arguments are actually passed
find . -path "*/test*" -name "*LoadOptions*" -type f

Repository: hugegraph/hugegraph-toolchain

Length of output: 164


🏁 Script executed:

# Search for how CLI arguments are invoked/used in tests
rg -l "parallel-count|parser-threads" --type java | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 434


🏁 Script executed:

# Check if there are any integration tests or examples
find . -path "*/test*" -name "*.java" -type f | xargs grep -l "parallel-count\|parser-threads" 2>/dev/null

Repository: hugegraph/hugegraph-toolchain

Length of output: 355


🏁 Script executed:

# Read the LoadOptionsTest file to understand how arguments are tested
wc -l ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 168


🏁 Script executed:

# Search for tests related to parallel-count or deprecated parameters
rg -n "parallel-count|parser-threads|deprecated" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 406


🏁 Script executed:

# Look at the entire test file
cat ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 7577


🏁 Script executed:

# Let's verify JCommander's behavior - search for documentation or check how JCommander handles arguments
rg -n "arity" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -10

Repository: hugegraph/hugegraph-toolchain

Length of output: 734


🏁 Script executed:

# Check if there are any CLI usage examples or documentation
find . -name "README*" -o -name "*.md" | xargs grep -l "parallel-count\|parser-threads" 2>/dev/null

Repository: hugegraph/hugegraph-toolchain

Length of output: 57


🏁 Script executed:

# Let's check how JCommander is used elsewhere in the codebase with similar parameters
rg -n "@Parameter" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 1345


🏁 Script executed:

# Check if there are any examples showing --param=value format usage
rg -n "=.*\"" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 444


🏁 Script executed:

# Look for main method in LoadOptions or tests that might use the CLI
grep -rn "public static void main" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Check JCommander documentation/behavior - look at how other parameters might be passed
rg -B5 -A5 "arity = 1" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -30

Repository: hugegraph/hugegraph-toolchain

Length of output: 1513


🏁 Script executed:

# Let's also check if there's any documentation or comments about command-line usage
find . -name "*.md" -o -name "*.txt" | xargs grep -l "loader\|parallel-count\|parser-threads" 2>/dev/null | head -5

Repository: hugegraph/hugegraph-toolchain

Length of output: 301


🏁 Script executed:

# Search for JCommander usage documentation or examples in the codebase
rg -n "JCommander\|parse" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -10

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Let's create a test to verify JCommander's behavior with both formats
python3 << 'EOF'
# Check JCommander documentation/behavior
# JCommander typically supports both --param value and --param=value formats
# Let's verify by checking if there are any tests with = format

import subprocess
result = subprocess.run(['rg', '--type', 'java', '-n', r'--\w+\s*=', './hugegraph-loader/src/test/java/'], 
                       capture_output=True, text=True)
print("Test files with --param=value format:")
print(result.stdout if result.stdout else "(no matches found)")

# Also search in main code
result2 = subprocess.run(['rg', '--type', 'java', '-n', r'--\w+\s*=', './hugegraph-loader/src/main/java/'], 
                        capture_output=True, text=True)
print("\nMain code files with --param=value format:")
print(result2.stdout if result2.stdout else "(no matches found)")
EOF

Repository: hugegraph/hugegraph-toolchain

Length of output: 159


🏁 Script executed:

# Let's verify by looking at the actual test to see what formats are tested
sed -n '115,135p' ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 755


🏁 Script executed:

# Let's check the JCommander version and documentation about supported formats
rg -n "jcommander\|beust" ./hugegraph-loader/pom.xml

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Let's examine if there's any indication of how arguments are passed in CLI
grep -rn "testDeprecatedParallelCountParameter\|parallel-count" ./hugegraph-loader/src/test/ | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 378


🏁 Script executed:

# Let's verify JCommander behavior by checking the dependency version
find . -name "pom.xml" | xargs grep -A2 "jcommander" | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 919


🏁 Script executed:

# Let's verify the suggested fix is syntactically correct by checking similar patterns in the codebase
rg -n "anyMatch\|startsWith" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Let's search for similar deprecation warning patterns in the codebase
rg -rn "deprecated.*LOG.warn\|LOG.warn.*deprecated" ./hugegraph-loader/src/main/java/

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Check if there are other places where args array is checked with similar patterns
rg -rn "Arrays.asList(args)" ./hugegraph-loader/src/main/java/

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


The deprecation warning does not fire for the --parallel-count=... form.

The check only matches arguments exactly equal to --parallel-count, but JCommander also accepts the --parallel-count=1 format. In that form the args element is "--parallel-count=1", so contains() never matches and the warning is silently dropped. Use a prefix match to cover both forms.

🛠️ Suggested fix
-        if (Arrays.asList(args).contains("--parallel-count")) {
+        if (Arrays.stream(args).anyMatch(arg ->
+                arg.equals("--parallel-count") || arg.startsWith("--parallel-count="))) {
             LOG.warn("Parameter --parallel-count is deprecated, " +
                      "please use --parser-threads instead");
         }
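As a self-contained illustration of the prefix match above (a sketch, not the Loader's actual code), the check can be written so both argument forms trigger the warning:

```java
import java.util.Arrays;

// Standalone sketch of a prefix-aware deprecated-flag check.
public class DeprecatedFlagCheck {

    // True if args contains "--flag" (space-separated value follows)
    // or "--flag=value" (assignment form).
    static boolean usesFlag(String[] args, String flag) {
        return Arrays.stream(args)
                     .anyMatch(a -> a.equals(flag) || a.startsWith(flag + "="));
    }

    public static void main(String[] args) {
        String[] spaced = {"--parallel-count", "4", "-f", "struct.json"};
        String[] assigned = {"--parallel-count=4"};
        String[] none = {"--parser-threads", "4"};
        System.out.println(usesFlag(spaced, "--parallel-count"));   // true
        System.out.println(usesFlag(assigned, "--parallel-count")); // true
        System.out.println(usesFlag(none, "--parallel-count"));     // false
    }
}
```

Appending "=" to the flag before the prefix test also avoids false positives on hypothetical longer flags that merely share the prefix.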
🤖 Prompt for AI Agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`
around lines 429 - 432, The deprecation warning check in LoadOptions.java
currently uses Arrays.asList(args).contains("--parallel-count") which misses
forms like "--parallel-count=1"; update the condition (in the static check where
args are inspected) to detect both exact and assignment forms by using a
prefix/any-match test (e.g. any arg that equals "--parallel-count" or startsWith
"--parallel-count="). Keep the same LOG.warn message and ensure the check covers
both "--parallel-count" and "--parallel-count=...".

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java`:
- Around line 2575-2576: The struct.json used by the FileLoadTest is missing a
charset setting for the software vertex, causing GBK-encoded software records
(written in the test data) to be parsed as UTF-8 and fail; update the
struct.json entry for the software vertex's "input" to include "charset": "GBK"
so it matches the other test configs (see the incremental_mode_and_load_failure
and matched_encoding_charset test setups) and allows the test to load 7 vertices
as expected.
🧹 Nitpick comments (8)
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java (8)

1200-1203: Track the FIXME with a formal issue.

The FIXME indicates a known NPE bug (triggered when parser-threads > 1), but a temporary workaround like this should have a corresponding issue tracking the fix. Create an issue in the repository and reference its number in the comment.

Would you like me to create an issue to track this NPE bug?


1337-1339: Duplicated FIXME comment; manage these in one place.

This FIXME is identical to the one in testMultiFilesHaveHeader. Replace the duplicates with a reference to a single issue so tracking and fixing stay centralized.


1635-1637: Same as above; reference a single issue number.


2067-2068: Missing the FIXME comment, inconsistent with the other test changes.

The other tests that add --parser-threads "1" include a FIXME explaining why, but it is missing here. Add the same comment for consistency.


2269-2270: Same as above; reference a single issue number.


3059-3060: Same as above; reference a single issue number.


3074-3075: Same as above; reference a single issue number.


1200-1203: Overall: these changes expose an architectural problem that needs a root-cause fix.

Multiple tests must set --parser-threads "1" to avoid an NPE, which points to a thread-safety problem in the multi-threaded parser. Adding per-test workarounds:

  1. Reduces test coverage (the multi-threaded path goes untested)
  2. Hides a potential production issue
  3. Accumulates technical debt

Suggestions:

  • Create a dedicated issue tracking the NPE when parser-threads > 1
  • Record all affected tests in that issue
  • Prioritize fixing the root cause rather than adding more workarounds

Comment on lines +2575 to +2576
"--test-mode", "true",
"--parser-threads", "1"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the test file
fd -t f "FileLoadTest.java" | head -5

Repository: hugegraph/hugegraph-toolchain

Length of output: 165


🏁 Script executed:

# Once we find the file, let's examine the specific lines mentioned
fd -t f "FileLoadTest.java" -exec wc -l {} \;

Repository: hugegraph/hugegraph-toolchain

Length of output: 259


🏁 Script executed:

# Search for testSourceOrTargetPrimaryValueNull test method
rg -n "testSourceOrTargetPrimaryValueNull" --type java

Repository: hugegraph/hugegraph-toolchain

Length of output: 225


🏁 Script executed:

# Read the test method and surrounding code
cat -n hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | sed -n '2550,2600p'

Repository: hugegraph/hugegraph-toolchain

Length of output: 2657


🏁 Script executed:

# Search for test data configuration files related to source_or_target_pk_value_null
find . -type d -name "*source_or_target_pk_value_null*" 2>/dev/null

Repository: hugegraph/hugegraph-toolchain

Length of output: 141


🏁 Script executed:

# Examine the struct.json configuration for this test
cat ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json

Repository: hugegraph/hugegraph-toolchain

Length of output: 829


🏁 Script executed:

# Also check the schema.groovy to understand vertex/edge definitions
cat ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/schema.groovy

Repository: hugegraph/hugegraph-toolchain

Length of output: 2063


🏁 Script executed:

# Check git history for changes to the struct.json file
git log --oneline -20 -- "**/source_or_target_pk_value_null/struct.json"

Repository: hugegraph/hugegraph-toolchain

Length of output: 103


🏁 Script executed:

# Check git diff to see recent changes to struct.json
git diff HEAD~5 -- "**/source_or_target_pk_value_null/struct.json" 2>/dev/null || echo "No recent changes"

Repository: hugegraph/hugegraph-toolchain

Length of output: 90


🏁 Script executed:

# Look for any charset or encoding configuration that might have been removed
rg -n "charset" ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/ -A 2 -B 2

Repository: hugegraph/hugegraph-toolchain

Length of output: 1245


🏁 Script executed:

# Check if there are other test data directories with similar patterns to compare
ls -la ./hugegraph-loader/src/test/resources/ | grep -E "source_|target_" | head -10

Repository: hugegraph/hugegraph-toolchain

Length of output: 402


🏁 Script executed:

# Check similar test configurations to see how they handle charset for software vertices
find ./hugegraph-loader/src/test/resources -name "struct.json" -exec grep -l "software" {} \; | head -5

Repository: hugegraph/hugegraph-toolchain

Length of output: 1089


🏁 Script executed:

# Check one of these files to see if they specify charset for software vertex
cat ./hugegraph-loader/src/test/resources/*/struct.json | grep -A 10 '"label": "software"' | head -30

Repository: hugegraph/hugegraph-toolchain

Length of output: 685


🏁 Script executed:

# Look for GBK encoding in test files to understand the pattern
rg -n "GBK" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 801


🏁 Script executed:

# Check git history of the struct.json file more thoroughly
git log --oneline --all -- "**/source_or_target_pk_value_null/struct.json" | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 214


🏁 Script executed:

# Get the full diff of changes to this specific file
git show 3706a3b4:"./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json" 2>/dev/null | head -50

Repository: hugegraph/hugegraph-toolchain

Length of output: 829


🏁 Script executed:

# Compare with a working example that has charset GBK for software
cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | grep -A 10 '"label": "software"'

Repository: hugegraph/hugegraph-toolchain

Length of output: 289


🏁 Script executed:

# Check the commit that added unit tests to see what was in struct.json originally
git show 3706a3b4:"./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json" 2>/dev/null

Repository: hugegraph/hugegraph-toolchain

Length of output: 829


🏁 Script executed:

# Check if there's a PR or commit that modified the charset configuration
git log -p -- "**/source_or_target_pk_value_null/struct.json" | grep -A 15 -B 5 "charset" | head -50

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Check the parent commit to see if charset was there
git show 5579e18c:"./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json" 2>/dev/null || echo "File may not exist in that commit"

Repository: hugegraph/hugegraph-toolchain

Length of output: 106


🏁 Script executed:

# Check the current HEAD to see if there's been any recent change to add charset for software
git log --oneline -1 -- "**/source_or_target_pk_value_null/struct.json"

Repository: hugegraph/hugegraph-toolchain

Length of output: 103


🏁 Script executed:

# Look at the actual test to understand better - check lines around 2580-2593 more carefully
cat -n ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | sed -n '2578,2593p'

Repository: hugegraph/hugegraph-toolchain

Length of output: 942


🏁 Script executed:

# Search for other similar tests with GBK encoding to see if they have charset defined
rg -B 15 'ioUtil.write.*vertex_software.*GBK' ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 20 'testCustomizedSchema\|testBatchInsertThreads' | head -50

Repository: hugegraph/hugegraph-toolchain

Length of output: 797


🏁 Script executed:

# Check customized_schema struct.json to see the full configuration
cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | head -50

Repository: hugegraph/hugegraph-toolchain

Length of output: 1172


🏁 Script executed:

# Look for any recent changes or issues related to charset in struct.json files
git log --all -p -- "**struct.json" | grep -A 5 -B 5 "charset.*GBK" | head -80

Repository: hugegraph/hugegraph-toolchain

Length of output: 2120


🏁 Script executed:

# Check if there's a pattern - search how the test framework handles default charset
rg -n "defaultCharset\|UTF-8" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/parser/*.java | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Check the very latest state of the struct.json file to see if charset has been added for software
cat ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json | grep -A 8 '"label": "software"'

Repository: hugegraph/hugegraph-toolchain

Length of output: 259


🏁 Script executed:

# Let's trace back - check what the exact commit message says
git show --stat 3706a3b4 | head -30

Repository: hugegraph/hugegraph-toolchain

Length of output: 1707


🏁 Script executed:

# Check if there have been any uncommitted changes to this file
git status ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json

Repository: hugegraph/hugegraph-toolchain

Length of output: 138


🏁 Script executed:

# Look at the full test context - what test names mention parser-threads
rg -n "parser-threads.*1" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java

Repository: hugegraph/hugegraph-toolchain

Length of output: 798


🏁 Script executed:

# Let's verify the exact expectation - look at the test assertion more carefully
cat -n ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | sed -n '2550,2593p'

Repository: hugegraph/hugegraph-toolchain

Length of output: 2370


🏁 Script executed:

# Check if software vertex should have charset GBK by looking at similar passing tests
rg -A 8 '"label": "software"' ./hugegraph-loader/src/test/resources/customized_schema/struct.json

Repository: hugegraph/hugegraph-toolchain

Length of output: 259


🏁 Script executed:

# Compare: check if there are passing tests with the same pattern - write software.csv in GBK with charset GBK in config
rg -B 5 'ioUtil.write.*vertex_software.*GBK' ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 50 "testCustomizedSchema" | head -60

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Let's check if this struct.json should have charset: GBK for software by looking at pattern
# Check what happens when we look at the exact lines where software vertex is written in GBK in the test
rg -B 20 "testCustomizedSchema" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 25 "public void testCustomizedSchema" | head -35

Repository: hugegraph/hugegraph-toolchain

Length of output: 113


🏁 Script executed:

# Get the customized_schema struct.json to compare patterns
cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | grep -A 10 '"label": "software"' | head -15

Repository: hugegraph/hugegraph-toolchain

Length of output: 289


🏁 Script executed:

# Check the FIXME comments to understand the parser-threads issue better
rg -B 2 "FIXME.*parser-threads" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 636


🏁 Script executed:

# Let's check if customized_schema has the same issue or is it configured differently
cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | python3 << 'EOF'
import json
import sys

data = json.load(sys.stdin)
for vertex in data.get('vertices', []):
    if vertex.get('label') == 'software':
        print(f"Software vertex config: {json.dumps(vertex, indent=2)}")
EOF

Repository: hugegraph/hugegraph-toolchain

Length of output: 159


🏁 Script executed:

# Now let's verify the test expectation - does testCustomizedSchema expect 7 vertices?
rg -A 20 "public void testCustomizedSchema" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -E "Assert.assertEquals.*vertices"

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Check if this is the initial commit that added these tests
git show --stat 3706a3b4 | grep -E "FileLoadTest|struct.json" | head -20

Repository: hugegraph/hugegraph-toolchain

Length of output: 1316


🏁 Script executed:

# Let's check if the test is supposed to expect an exception and then still load vertices
# This would help understand the intent
rg -B 5 -A 15 "AsyncThrowsAssert.assertThrows" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 20 "testSourceOrTargetPrimaryValueNull" | head -30

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Check if there are any comments in the test about why 7 vertices are expected after an exception
rg -B 30 "Assert.assertEquals\(7, vertices.size" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | tail -40

Repository: hugegraph/hugegraph-toolchain

Length of output: 1782


🏁 Script executed:

# Most importantly, let's verify: does the software vertex config need charset GBK?
# Check another test that writes software in GBK and passes
rg -B 30 "testBatchInsertThreads\|testMultipleSourcesAndTargets" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 30 "ioUtil.write.*vertex_software.*GBK" | head -50

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Let me check what the default charset behavior is when not specified
rg -n "defaultCharset\|Charset.defaultCharset" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/ -A 2

Repository: hugegraph/hugegraph-toolchain

Length of output: 55


🏁 Script executed:

# Verify: does the struct.json missing charset config for software cause parsing to fail?
# Check similar tests that work - do they have charset specified for software when written in GBK?
find ./hugegraph-loader/src/test/resources -name "struct.json" -exec grep -l '"label": "software"' {} \; | while read f; do echo "=== $f ==="; cat "$f" | grep -A 6 '"label": "software"'; done | head -80

Repository: hugegraph/hugegraph-toolchain

Length of output: 2977


🏁 Script executed:

# The key question: is the struct.json missing "charset": "GBK" for software vertex?
diff <(cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | jq '.vertices[] | select(.label == "software")') \
     <(cat ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json | jq '.vertices[] | select(.label == "software")')

Repository: hugegraph/hugegraph-toolchain

Length of output: 160


🏁 Script executed:

# Confirm: check if incremental_mode_and_load_failure has the same test pattern with GBK charset
rg -B 10 "incremental_mode_and_load_failure" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 30 "ioUtil.write.*vertex_software" | head -40

Repository: hugegraph/hugegraph-toolchain

Length of output: 1536


🏁 Script executed:

# Also check if matched_encoding_charset test has the same structure and if it passes
rg -B 10 "matched_encoding_charset" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 30 "ioUtil.write.*vertex_software" | head -40

Repository: hugegraph/hugegraph-toolchain

Length of output: 833


struct.json is missing the charset configuration for the software vertex.

The test data writes the software vertex in GBK (line 2558), but struct.json does not specify a charset, so the file is parsed as UTF-8 by default and the software vertices fail to parse. Only 5 vertices (person) are loaded instead of the expected 7.

Add "charset": "GBK" to the software vertex's input config in struct.json, consistent with the configurations in incremental_mode_and_load_failure and matched_encoding_charset.

--parser-threads "1" is an unrelated workaround for the known NPE (see the FIXME in the code comments) and is not the cause of this test failure.
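The decoding failure described above can be reproduced in isolation: GBK-encoded bytes do not survive a UTF-8 decode. The sample string below is illustrative, not taken from the test fixtures:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows why reading GBK-encoded records with the default UTF-8 charset
// corrupts non-ASCII values such as Chinese software names.
public class CharsetMismatch {
    public static void main(String[] args) {
        String name = "中文"; // illustrative non-ASCII value
        byte[] gbkBytes = name.getBytes(Charset.forName("GBK"));

        String decodedAsGbk = new String(gbkBytes, Charset.forName("GBK"));
        String decodedAsUtf8 = new String(gbkBytes, StandardCharsets.UTF_8);

        System.out.println(name.equals(decodedAsGbk));  // true: round-trips
        System.out.println(name.equals(decodedAsUtf8)); // false: corrupted
    }
}
```

The GBK lead bytes are invalid UTF-8 sequences, so the UTF-8 decoder substitutes replacement characters and the record no longer matches its expected value, which is why the vertex rows are dropped rather than loaded.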

🤖 Prompt for AI Agents
In
`@hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java`
around lines 2575 - 2576, The struct.json used by the FileLoadTest is missing a
charset setting for the software vertex, causing GBK-encoded software records
(written in the test data) to be parsed as UTF-8 and fail; update the
struct.json entry for the software vertex's "input" to include "charset": "GBK"
so it matches the other test configs (see the incremental_mode_and_load_failure
and matched_encoding_charset test setups) and allows the test to load 7 vertices
as expected.

@github-actions

Due to the lack of activity, the current PR is marked as stale and will be closed after 180 days; any update will remove the stale label


Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants