Conversation
…ader, and refactored the failure-handling logic for batch inserts.
@codecov-ai-reviewer review
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client/User
    participant TaskMgr as TaskManager
    participant BatchTask as BatchInsertTask
    participant GraphMgr as GraphManager
    participant SingleTask as SingleInsertTask
    Client->>TaskMgr: submitBatch(edges)
    TaskMgr->>TaskMgr: acquireBatchSemaphore
    alt context stopped
        TaskMgr->>TaskMgr: releaseBatchSemaphore
        TaskMgr-->>Client: return early
    else context running
        TaskMgr->>BatchTask: submit batch insert
        BatchTask->>GraphMgr: insertInBatch()
        alt success
            GraphMgr-->>BatchTask: success
            BatchTask->>TaskMgr: update metrics & releaseSemaphore
        else failure
            GraphMgr-->>BatchTask: error
            BatchTask->>TaskMgr: whenComplete(error)
            alt batchFailureFallback enabled
                TaskMgr->>TaskMgr: log warning "try single insert"
                TaskMgr->>SingleTask: submit single inserts
                SingleTask->>GraphMgr: insertOne() x N
                GraphMgr-->>SingleTask: results
                TaskMgr->>TaskMgr: releaseBatchSemaphore
            else batchFailureFallback disabled
                TaskMgr->>TaskMgr: mark error in context
                TaskMgr->>TaskMgr: stop loading & print error
                TaskMgr->>TaskMgr: releaseBatchSemaphore
            end
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: 1 passed, 2 failed (1 warning, 1 inconclusive)
Summary of Changes

Hello @kenssa4eedfd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the HugeGraph Loader by refining its configuration options for parallelism and connection management, making it more robust and user-friendly. It also significantly improves the failure-handling mechanism for batch inserts, ensuring that the loading process stops gracefully and informs the user immediately upon encountering critical errors, rather than attempting less efficient fallback strategies.
Code Review
This pull request updates several loader parameters and refactors the failure handling for batch inserts. The parameter changes, such as renaming --parallel-count to --parser-threads and updating default values and descriptions, improve clarity and usability. The core change is in the batch insert failure logic, which now adopts a fail-fast strategy instead of falling back to single inserts. While this can be a valid approach, I've found a couple of issues in the new implementation within TaskManager.java. Specifically, there's a bug in metrics reporting where the 'in-flight' count is not correctly updated on failure, and the new shutdown logic is decentralized, which could be improved for better robustness and maintainability. My review includes a suggestion to address these points.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java (1)
141-156: Race condition between acquire() and runAsync(): executor shutdown can leak the semaphore and metrics.

context.stopped() is only checked before acquire; if the executor is shut down while the thread is waiting, the subsequent runAsync() throws RejectedExecutionException synchronously, before the whenComplete callback is registered, so neither the release nor the metric rollback ever runs. Concrete scenario: when a batch insert task fails, the whenComplete callback calls batchService.shutdown(); any batch submitted after acquire returns but before runAsync executes hits the shut-down executor, the exception goes uncaught, and the semaphore permit and the plusFlighting count leak permanently. Suggestion: add a second stopped check after acquire, and add a fallback handler for runAsync submission failures.

🔧 Suggested fix

```diff
+import java.util.concurrent.RejectedExecutionException;
 ...
 try {
     this.batchSemaphore.acquire();
 } catch (InterruptedException e) {
     throw new LoadException("Interrupted while waiting to submit %s " +
                             "batch in batch mode", e, mapping.type());
 }
+if (this.context.stopped()) {
+    this.batchSemaphore.release();
+    return;
+}
 LoadSummary summary = this.context.summary();
 summary.metrics(struct).plusFlighting(batch.size());
 InsertTask task = new BatchInsertTask(this.context, struct, mapping, batch);
-CompletableFuture.runAsync(task, this.batchService).whenComplete(
-        (r, e) -> {
-            if (e != null) {
-                LOG.error("Batch insert {} error, interrupting import", mapping.type(), e);
-                this.context.occurredError();
-                this.context.stopLoading();
-                this.batchService.shutdown();
-                this.singleService.shutdown();
-                Printer.printError("Batch insert %s failed, stop loading, Please check the logs",
-                                   mapping.type().string());
-            } else {
-                summary.metrics(struct).minusFlighting(batch.size());
-            }
-
-            this.batchSemaphore.release();
-            long end = System.currentTimeMillis();
-            this.context.summary().addTimeRange(mapping.type(), start, end);
-        });
+try {
+    CompletableFuture.runAsync(task, this.batchService).whenComplete(
+            (r, e) -> {
+                if (e != null) {
+                    LOG.error("Batch insert {} error, interrupting import", mapping.type(), e);
+                    this.context.occurredError();
+                    this.context.stopLoading();
+                    this.batchService.shutdown();
+                    this.singleService.shutdown();
+                    Printer.printError("Batch insert %s failed, stop loading, Please check the logs",
+                                       mapping.type().string());
+                } else {
+                    summary.metrics(struct).minusFlighting(batch.size());
+                }
+
+                this.batchSemaphore.release();
+                long end = System.currentTimeMillis();
+                this.context.summary().addTimeRange(mapping.type(), start, end);
+            });
+} catch (RejectedExecutionException ex) {
+    summary.metrics(struct).minusFlighting(batch.size());
+    this.batchSemaphore.release();
+    LOG.warn("Batch insert skipped because executor is shutdown", ex);
+}
```
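To make the leak concrete, here is a self-contained JDK-only sketch (illustrative names, not the loader's actual code) of the release-on-reject pattern this fix introduces: runAsync against a shut-down executor throws before whenComplete can be registered, so the catch block must release the permit itself.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

public class GuardedSubmitDemo {
    // Returns the available permits after a guarded submit against a
    // shut-down executor; 1 means the permit did not leak.
    static int submitAfterShutdown() {
        Semaphore permits = new Semaphore(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.shutdown();                       // simulate the failure path

        permits.acquireUninterruptibly();
        try {
            // Throws RejectedExecutionException synchronously: whenComplete
            // never registers, so it can never release the permit.
            CompletableFuture.runAsync(() -> {}, pool)
                             .whenComplete((r, e) -> permits.release());
        } catch (RejectedExecutionException ex) {
            permits.release();                 // without this the permit leaks
        }
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println("available permits: " + submitAfterShutdown());  // 1
    }
}
```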
🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java`:
- Around line 158-168: The failure path in TaskManager's batch completion
handler currently skips releasing the flighting metric
(summary.metrics(struct).minusFlighting), causing inflated stats; update the
whenComplete handler for the batch future so that
summary.metrics(struct).minusFlighting(batch.size()) is invoked in all cases
(both when e != null and else), i.e., perform the minusFlighting call before or
alongside the error handling block that logs the error, calls
this.context.occurredError()/stopLoading(), and shuts down services, ensuring
the metric is always decremented regardless of success or failure.
🧹 Nitpick comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java (1)
188-192: Consider adding a parameter-validation annotation for consistency and user experience. parallelThreads currently lacks the validateWith attribute on its @Parameter annotation, inconsistent with how batchSize is implemented. Although downstream code (HugeGraphLoader lines 669-673) already handles the null fallback and range check correctly, adding a validator at the parsing layer catches user input errors earlier and avoids runtime exceptions. Suggested change, matching the existing PositiveValidator pattern:

```diff
 @Parameter(names = {"--parallel-count", "--parser-threads"}, arity = 1,
+           validateWith = {PositiveValidator.class},
            description = "The number of parallel read pipelines. " +
                          "Default: auto (min(struct count, cpu)). " +
                          "Must be >= 1")
 public Integer parallelThreads = null;
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- hugegraph-loader/src/main/java/org/apache/hugegraph/loader/HugeGraphLoader.java
- hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java
- hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: hubble-ci (11, 3.11)
- GitHub Check: check-dependency-license
- GitHub Check: Analyze (javascript)
- GitHub Check: loader-ci (11)
- GitHub Check: Analyze (java)
🔇 Additional comments (6)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/HugeGraphLoader.java (2)
665-674: Logic is correct, implementation reasonable. The auto-default logic and validation order are right:
- Returns early for an empty struct list, avoiding the min(0, availableProcessors) = 0 edge case
- Computes the auto default when the value is null
- Validates last, so an explicitly set user value must also satisfy the >= 1 constraint

678-685: Logging and thread-pool creation correctly updated. Both the log output and the thread-pool creation now use the parallelThreads variable, consistent with the new option name.

hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java (3)

48-49: Extracting the default connection-count constants is clean, OK.

159-165: Description update matches the new behavior, OK.

171-181: The auto-adjustment feature is not fully implemented, but there is no default-value detection inconsistency. Inspection shows that the auto-adjust logic in HugeClientBuilder.configPool() triggers on a parameter value of == 0, not on a comparison against the constants. Since LoadOptions initializes maxConnections and maxConnectionsPerRoute to CPUS * 4 and CPUS * 2 (non-zero), the auto-adjust branch in configPool() is never taken. Although the parameter description says "If left as default... may be auto-adjusted", the current code contains no implementation that adjusts these two values based on batchInsertThreads.

hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java (1)

29-29: The new Printer dependency is reasonable. It is used for user-facing error messages, consistent with the error-handling path below.
@codecov-ai-reviewer review
Code Review
This pull request introduces several improvements to the HugeGraph loader. The parameter handling for parallel threads and HTTP connections has been refined for better flexibility and user clarity. A significant change is the refactoring of the batch insert failure logic to a fail-fast approach, which is a sound design choice to avoid long, partially failing loads. My review includes one suggestion to further improve the new failure handling logic by centralizing the shutdown process for better maintainability.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`:
- Around line 188-193: The parallelThreads field in LoadOptions lacks a
positive-value validator despite the description "Must be >= 1"; add the same
PositiveValidator used by other params to the `@Parameter` annotation for the
public Integer parallelThreads (e.g., include validator =
PositiveValidator.class in the annotation), and ensure the PositiveValidator
class is imported/available so invalid values (0 or negative) are rejected at
parse time.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`:
- Around line 171-181: Add positive-value validation to the HTTP connection
parameters to prevent zero/negative inputs: in class LoadOptions, update the
`@Parameter` annotations for the fields maxConnections and maxConnectionsPerRoute
to include the PositiveValidator (same validator used by other numeric options)
so that values <= 0 are rejected at CLI parsing time and avoid connection pool
initialization failures.
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/HugeGraphLoader.java`:
- Around line 665-680: parallelThreads is declared as an Integer and may be
null, causing an NPE when autounboxed in the call to
ExecutorUtil.newFixedThreadPool; ensure a non-null primitive int is passed by
coalescing null to a safe default (e.g. Math.max(2, CPUS) or
context.options().defaultParallel) before calling newFixedThreadPool. Modify the
code around the use of this.context.options().parallelThreads (or change its
declaration) so you compute an int threads =
(this.context.options().parallelThreads != null ?
this.context.options().parallelThreads : <default>) and then call
ExecutorUtil.newFixedThreadPool(threads, "loader"); also consider validating
threads > 0 and logging a warning if a null/invalid value was replaced.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`:
- Around line 190-195: The parameter annotation for parallelThreads has a
mismatched default description ("Default: auto max(2, cpu)") versus the actual
default (public int parallelThreads = Math.max(2, CPUS/2)); update the
description in the `@Parameter` annotation to reflect the real default (e.g.,
"Default: max(2, cpu/2)") so the doc and the code match; locate the `@Parameter`
on the parallelThreads field in LoadOptions and edit only the description text
to match the Math.max(2, CPUS/2) semantics.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java`:
- Around line 164-167: When a batch insert fails the code calls
this.context.stopLoading() and prints an error via Printer.printError("Batch
insert %s failed, stop loading...", mapping.type().string()) but does not set
the error flag; call this.context.occurredError() as well to mark the task
context as having errored. Update the failure branch (around the existing
this.context.stopLoading() / Printer.printError(...) block in TaskManager) to
invoke this.context.occurredError() so the loader state/reporting reflects the
error consistently.
- Around line 158-170: The flighting metric is being decremented twice when
batchFailureFallback is true (submitBatch calls plusFlighting(batch.size()) then
on batch failure calls submitInSingle, and both submitBatch's whenComplete and
submitInSingle's whenComplete call summary.metrics(...).minusFlighting), causing
incorrect counts; to fix, remove the minusFlighting call from submitInSingle's
whenComplete so only submitBatch balances the plusFlighting, leaving
submitBatch, submitInSingle, summary.metrics(...).minusFlighting, and the
batchFailureFallback path as the referenced places to edit; if submitInSingle
must also be used independently elsewhere, instead add a boolean flag parameter
to submitInSingle (e.g., skipFlightingDecrement) or document callers so that
only the original submitBatch caller performs minusFlighting in the fallback
flow.
🧹 Nitpick comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java (1)
428-430: The auto-adjust logic has a boundary-condition issue. When batchInsertThreads is smaller than CPUS (for example, the user explicitly set a lower value), the logic triggers but may not produce any adjustment, since the default connection counts are usually already sufficient. But when batchInsertThreads happens to equal CPUS, the adjustment logic does not trigger even if the user explicitly specified that value. Consider tracking whether the user explicitly set batchInsertThreads with a flag instead of relying on value comparison.

♻️ Optional improvement

Add a batchInsertThreadsUserSet flag to LoadOptions, or detect in parseOptions whether the parameter was explicitly set via JCommander's mechanisms:

```java
// Example: detect explicit settings after parsing.
// The current logic works in most cases, but if more precise control is
// needed, explicit-setting detection could be introduced in a future version.
```

Also applies to: 434-451
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java (1)
156-176: The success path is missing a minusFlighting call, permanently inflating the metric. When a batch insert succeeds (e == null), the whenComplete callback performs no minusFlighting call, yet plusFlighting is always invoked at line 152. The flighting metric therefore grows without bound and the statistics become badly distorted. The three current paths:
- Success: no minusFlighting ❌
- Failure + fallback: submitInSingle calls minusFlighting internally (line 194) ✓
- Failure + no fallback: minusFlighting is called at line 165 ✓

🐛 Suggested fix

```diff
 CompletableFuture.runAsync(task, this.batchService).whenComplete(
         (r, e) -> {
+            boolean fallbackToSingle = false;
             if (e != null) {
                 LOG.error("Batch insert {} error, interrupting import", mapping.type(), e);
                 if (this.options.batchFailureFallback) {
                     LOG.warn("Batch insert {} error, try single insert", mapping.type(), e);
                     this.submitInSingle(struct, mapping, batch);
+                    fallbackToSingle = true;
                 } else {
-                    summary.metrics(struct).minusFlighting(batch.size());
                     this.context.occurredError();
                     this.context.stopLoading();
                     Printer.printError("Batch insert %s failed, stop loading. Please check the logs",
                                        mapping.type().string());
                 }
             }
+            // Only decrement when not falling back to single inserts
+            // (submitInSingle handles the decrement in the fallback path)
+            if (!fallbackToSingle) {
+                summary.metrics(struct).minusFlighting(batch.size());
+            }
             this.batchSemaphore.release();
             long end = System.currentTimeMillis();
             this.context.summary().addTimeRange(mapping.type(), start, end);
         });
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/TaskManagerTest.java`:
- Around line 503-507: The test currently swallows InterruptedException after
calling latch.await(5, TimeUnit.SECONDS) and does not restore the thread's
interrupted status; update the catch block that catches InterruptedException in
TaskManagerTest (the latch.await(...) call) to call
Thread.currentThread().interrupt() so the thread's interrupted flag is
preserved, while still allowing the test to proceed or exit as intended.
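The requested change is the standard restore-the-interrupt idiom. A minimal sketch (awaitQuietly is a hypothetical helper, not the test's actual code): catching InterruptedException clears the thread's interrupted flag, so the catch block sets it again before returning.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class InterruptDemo {
    // Await a latch without swallowing the interrupt: restore the flag so
    // callers (or the test runner) can still observe it.
    static boolean awaitQuietly(CountDownLatch latch, long seconds) {
        try {
            return latch.await(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve interrupted status
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(1);
        Thread.currentThread().interrupt();      // simulate an interrupt
        boolean done = awaitQuietly(latch, 1);   // throws promptly, flag restored
        System.out.println("done=" + done +
                           ", interrupted=" + Thread.interrupted());
    }
}
```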
🧹 Nitpick comments (3)
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/TaskManagerTest.java (3)

268-269: Consider adding a comment explaining the range assertion. The range assertion batchCalls >= 2 && batchCalls <= 3 may confuse future maintainers. Add a comment explaining why the expected value is a range rather than an exact number (concurrency race conditions).

♻️ Suggested change

```diff
 int batchCalls = FailingConcurrentGraphManager.BATCH_CALLS.get();
+// Due to race conditions, the number of batch calls can be 2 or 3:
+// - At least 2: first success call + second failure call that triggers stop
+// - At most 3: one more call might slip through before stop is observed
 Assert.assertTrue(batchCalls >= 2 && batchCalls <= 3);
```

330-338: Thread.sleep(50) may make the test flaky. Using a fixed wait to ensure a thread is blocked on the semaphore can fail in CI environments or on heavily loaded systems. Consider a more reliable synchronization mechanism, such as an observable sync point at the submitBatch entry, or increase the wait and add an explanatory comment. Please confirm this test's stability in CI; if intermittent failures occur, consider:

```diff
-Thread.sleep(50);
+// Allow sufficient time for the thread to block on semaphore acquisition.
+// Increase if flaky on slow CI systems.
+Thread.sleep(100);
```

462-467: Consider a @Before method to reset static state uniformly. The test doubles store state in static fields, currently reset manually in each test method. Adding a @Before method to reset them improves maintainability and prevents cross-test interference from missed resets.

♻️ Suggested change

Add a @Before method to the class:

```java
@Before
public void resetTestDoubles() {
    FailingBatchGraphManager.BATCH_CALLS.set(0);
    FailingBatchGraphManager.SINGLE_CALLS.set(0);
    FailingConcurrentGraphManager.BATCH_CALLS.set(0);
    FailingConcurrentGraphManager.FIRST_STARTED = null;
    FailingConcurrentGraphManager.ALLOW_FIRST_FINISH = null;
    FailingConcurrentGraphManager.FAILURE_CALLED = null;
}
```

Also add the import: import org.junit.Before;

Also applies to: 511-514
@codecov-ai-reviewer review
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/loader-ci.yml:
- Around line 88-92: The workflow uploads test report artifacts but only
includes failsafe plain text files; add the missing XML failsafe reports so CI
collects full JUnit data by modifying the artifact path list (the path block in
.github/workflows/loader-ci.yml) to include **/target/failsafe-reports/*.xml
alongside the existing **/target/failsafe-reports/*.txt and the surefire globs
so both surefire and failsafe XML reports are uploaded for complete diagnostics.
.github/workflows/loader-ci.yml
```yaml
path: |
  **/target/surefire-reports/*.txt
  **/target/surefire-reports/*.xml
  **/target/failsafe-reports/*.txt
```
Include Failsafe's XML reports for complete diagnostics.
Currently only failsafe-reports/*.txt is uploaded; without failsafe-reports/*.xml the standard JUnit report data is lost, which makes investigating IT failures incomplete.

🔧 Suggested patch

```diff
 path: |
   **/target/surefire-reports/*.txt
   **/target/surefire-reports/*.xml
   **/target/failsafe-reports/*.txt
+  **/target/failsafe-reports/*.xml
```
Force-pushed: cc737b8 to 572cc94
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/loader-ci.yml:
- Around line 77-81: Restore the disabled Maven test profiles in the CI workflow
by re-enabling the commented lines for the "unit", "file", "jdbc", and "kafka"
profiles (so the pipeline runs mvn test -P unit, -P file, -P jdbc, -P kafka in
addition to -P hdfs); if they were intentionally disabled only temporarily, add
a TODO comment in the workflow and create a matching issue referencing the
reason and owner to track fixing the failing tests rather than skipping them.
Ensure the CI runs all relevant profiles to validate changes to batch insert
failure handling and LoadOptions parameter changes.
.github/workflows/loader-ci.yml
```diff
 # mvn test -P unit -ntp
 # mvn test -P file
 mvn test -P hdfs
-mvn test -P jdbc
-mvn test -P kafka
+# mvn test -P jdbc
+# mvn test -P kafka
```
Disabling tests in CI creates regression risk.
Four test profiles (unit, file, jdbc, kafka) are now commented out, keeping only the hdfs tests. This significantly reduces CI test coverage, especially since this PR refactors the batch-insert failure-handling logic and changes LoadOptions parameters — changes that should be validated by the disabled tests.
If the tests fail, fix them rather than skipping them; if this is a temporary measure, add a TODO comment or create a tracking issue so the disabled state is not merged into the master branch.

🔧 Suggested fix: restore test execution
```diff
-# mvn test -P unit -ntp
-# mvn test -P file
+mvn test -P unit -ntp
+mvn test -P file
 mvn test -P hdfs
-# mvn test -P jdbc
-# mvn test -P kafka
+mvn test -P jdbc
+mvn test -P kafka
```
Force-pushed: 6c5a3ad to ed816e3
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`:
- Around line 429-432: The deprecation warning check in LoadOptions.java
currently uses Arrays.asList(args).contains("--parallel-count") which misses
forms like "--parallel-count=1"; update the condition (in the static check where
args are inspected) to detect both exact and assignment forms by using a
prefix/any-match test (e.g. any arg that equals "--parallel-count" or startsWith
"--parallel-count="). Keep the same LOG.warn message and ensure the check covers
both "--parallel-count" and "--parallel-count=...".
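A minimal sketch of the broadened check the prompt asks for (usesDeprecatedFlag is a hypothetical helper name): any argument equal to --parallel-count or starting with --parallel-count= counts as use of the deprecated form.

```java
import java.util.Arrays;

public class DeprecatedFlagDemo {
    // Detect both "--parallel-count 4" and "--parallel-count=4" forms,
    // which a plain contains("--parallel-count") check would miss.
    static boolean usesDeprecatedFlag(String[] args) {
        return Arrays.stream(args)
                     .anyMatch(a -> a.equals("--parallel-count") ||
                                    a.startsWith("--parallel-count="));
    }

    public static void main(String[] args) {
        System.out.println(usesDeprecatedFlag(new String[]{"--parallel-count", "4"})); // true
        System.out.println(usesDeprecatedFlag(new String[]{"--parallel-count=4"}));    // true
        System.out.println(usesDeprecatedFlag(new String[]{"--parser-threads", "4"})); // false
    }
}
```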
🧹 Nitpick comments (1)
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java (1)
437-453: After auto-adjustment, max-conn-per-route may exceed max-conn. When the user explicitly sets a smaller max-conn but leaves max-conn-per-route at its default, the current logic can raise the per-route value above the total. Consider clamping/synchronizing after adjustment to keep the configuration consistent.

♻️ One optional fix (clamp per-route)

```diff
 if (maxConnPerRoute == DEFAULT_MAX_CONNECTIONS_PER_ROUTE &&
     maxConnPerRoute < batchThreads * 2) {
     options.maxConnectionsPerRoute = batchThreads * 2;
     LOG.info("Auto adjusted max-conn-per-route to {} based on batch-insert-threads({})",
              options.maxConnectionsPerRoute, batchThreads);
 }
+
+if (options.maxConnectionsPerRoute > options.maxConnections) {
+    LOG.warn("max-conn-per-route({}) > max-conn({}), clamping to max-conn",
+             options.maxConnectionsPerRoute, options.maxConnections);
+    options.maxConnectionsPerRoute = options.maxConnections;
+}
```
```java
if (Arrays.asList(args).contains("--parallel-count")) {
    LOG.warn("Parameter --parallel-count is deprecated, " +
             "please use --parser-threads instead");
}
```
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's find and read the LoadOptions.java file
fd -n "LoadOptions.java" --type fRepository: hugegraph/hugegraph-toolchain
Length of output: 303
🏁 Script executed:
# Let's check the structure of the repository
git ls-files | head -20Repository: hugegraph/hugegraph-toolchain
Length of output: 676
🏁 Script executed:
# Search for the specific file mentioned in the review
find . -name "LoadOptions.java" -type fRepository: hugegraph/hugegraph-toolchain
Length of output: 159
🏁 Script executed:
# Read the LoadOptions.java file to understand the argument parsing context
wc -l ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.javaRepository: hugegraph/hugegraph-toolchain
Length of output: 163
🏁 Script executed:
# Read around the target lines
sed -n '410,450p' ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.javaRepository: hugegraph/hugegraph-toolchain
Length of output: 2102
🏁 Script executed:
# Look for the main method and argument parsing logic
grep -n "public static void main\|String\[\] args" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -20Repository: hugegraph/hugegraph-toolchain
Length of output: 136
🏁 Script executed:
# Read the entire parseOptions method to understand how arguments are parsed
sed -n '376,432p' ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.javaRepository: hugegraph/hugegraph-toolchain
Length of output: 2755
🏁 Script executed:
# Search for how --parallel-count is actually parsed (look for property setting)
rg -n "parallel-count|parallelCount" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.javaRepository: hugegraph/hugegraph-toolchain
Length of output: 393
🏁 Script executed:
# Look for the argument parsing framework being used
head -50 ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.javaRepository: hugegraph/hugegraph-toolchain
Length of output: 2122
🏁 Script executed:
# Search for other deprecated parameters to see how they're handled
rg -n "deprecated" -i ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.javaRepository: hugegraph/hugegraph-toolchain
Length of output: 245
🏁 Script executed:
# Look for tests of LoadOptions to see how arguments are actually passed
find . -path "*/test*" -name "*LoadOptions*" -type fRepository: hugegraph/hugegraph-toolchain
Length of output: 164
🏁 Script executed:
# Search for how CLI arguments are invoked/used in tests
rg -l "parallel-count|parser-threads" --type java | head -20Repository: hugegraph/hugegraph-toolchain
Length of output: 434
🏁 Script executed:
# Check if there are any integration tests or examples
find . -path "*/test*" -name "*.java" -type f | xargs grep -l "parallel-count\|parser-threads" 2>/dev/nullRepository: hugegraph/hugegraph-toolchain
Length of output: 355
🏁 Script executed:
# Read the LoadOptionsTest file to understand how arguments are tested
wc -l ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.javaRepository: hugegraph/hugegraph-toolchain
Length of output: 168
🏁 Script executed:
# Search for tests related to parallel-count or deprecated parameters
rg -n "parallel-count|parser-threads|deprecated" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.javaRepository: hugegraph/hugegraph-toolchain
Length of output: 406
🏁 Script executed:
# Look at the entire test file
cat ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.javaRepository: hugegraph/hugegraph-toolchain
Length of output: 7577
🏁 Script executed:
# Let's verify JCommander's behavior - search for documentation or check how JCommander handles arguments
rg -n "arity" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -10Repository: hugegraph/hugegraph-toolchain
Length of output: 734
🏁 Script executed:
# Check if there are any CLI usage examples or documentation
find . -name "README*" -o -name "*.md" | xargs grep -l "parallel-count\|parser-threads" 2>/dev/nullRepository: hugegraph/hugegraph-toolchain
Length of output: 57
🏁 Script executed:
# Let's check how JCommander is used elsewhere in the codebase with similar parameters
rg -n "@Parameter" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -20Repository: hugegraph/hugegraph-toolchain
Length of output: 1345
🏁 Script executed:
# Check if there are any examples showing --param=value format usage
rg -n "=.*\"" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.java
Length of output: 444
🏁 Script executed:
# Look for main method in LoadOptions or tests that might use the CLI
grep -rn "public static void main" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/
Length of output: 55
🏁 Script executed:
# Check JCommander documentation/behavior - look at how other parameters might be passed
rg -B5 -A5 "arity = 1" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -30
Length of output: 1513
🏁 Script executed:
# Let's also check if there's any documentation or comments about command-line usage
find . -name "*.md" -o -name "*.txt" | xargs grep -l "loader\|parallel-count\|parser-threads" 2>/dev/null | head -5
Length of output: 301
🏁 Script executed:
# Search for JCommander usage documentation or examples in the codebase
rg -n "JCommander\|parse" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | head -10
Length of output: 55
🏁 Script executed:
# Let's create a test to verify JCommander's behavior with both formats
python3 << 'EOF'
# Check JCommander documentation/behavior
# JCommander typically supports both --param value and --param=value formats
# Let's verify by checking if there are any tests with = format
import subprocess
result = subprocess.run(['rg', '--type', 'java', '-n', r'--\w+\s*=', './hugegraph-loader/src/test/java/'],
capture_output=True, text=True)
print("Test files with --param=value format:")
print(result.stdout if result.stdout else "(no matches found)")
# Also search in main code
result2 = subprocess.run(['rg', '--type', 'java', '-n', r'--\w+\s*=', './hugegraph-loader/src/main/java/'],
capture_output=True, text=True)
print("\nMain code files with --param=value format:")
print(result2.stdout if result2.stdout else "(no matches found)")
EOF
Length of output: 159
🏁 Script executed:
# Let's verify by looking at the actual test to see what formats are tested
sed -n '115,135p' ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.java
Length of output: 755
🏁 Script executed:
# Let's check the JCommander version and documentation about supported formats
rg -n "jcommander\|beust" ./hugegraph-loader/pom.xml
Length of output: 55
🏁 Script executed:
# Let's examine if there's any indication of how arguments are passed in CLI
grep -rn "testDeprecatedParallelCountParameter\|parallel-count" ./hugegraph-loader/src/test/ | head -20
Length of output: 378
🏁 Script executed:
# Let's verify JCommander behavior by checking the dependency version
find . -name "pom.xml" | xargs grep -A2 "jcommander" | head -20
Length of output: 919
🏁 Script executed:
# Let's verify the suggested fix is syntactically correct by checking similar patterns in the codebase
rg -n "anyMatch\|startsWith" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java
Length of output: 55
🏁 Script executed:
# Let's search for similar deprecation warning patterns in the codebase
rg -rn "deprecated.*LOG.warn\|LOG.warn.*deprecated" ./hugegraph-loader/src/main/java/
Length of output: 55
🏁 Script executed:
# Check if there are other places where args array is checked with similar patterns
rg -rn "Arrays.asList(args)" ./hugegraph-loader/src/main/java/
Length of output: 55
The deprecation warning is not triggered for the --parallel-count=... form.
The current check only matches an argument exactly equal to --parallel-count, but JCommander also accepts the --parallel-count=1 format. When an argument is passed in that form, the element in the args array is "--parallel-count=1", which contains() cannot match, so the warning is lost. Switch to a prefix match to cover both formats.
🛠️ Suggested fix
- if (Arrays.asList(args).contains("--parallel-count")) {
+ if (Arrays.stream(args).anyMatch(arg ->
+         arg.equals("--parallel-count") || arg.startsWith("--parallel-count="))) {
      LOG.warn("Parameter --parallel-count is deprecated, " +
               "please use --parser-threads instead");
  }
🤖 Prompt for AI Agents
In
`@hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java`
around lines 429 - 432, The deprecation warning check in LoadOptions.java
currently uses Arrays.asList(args).contains("--parallel-count") which misses
forms like "--parallel-count=1"; update the condition (in the static check where
args are inspected) to detect both exact and assignment forms by using a
prefix/any-match test (e.g. any arg that equals "--parallel-count" or startsWith
"--parallel-count="). Keep the same LOG.warn message and ensure the check covers
both "--parallel-count" and "--parallel-count=...".
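The prefix-match idea from the suggested fix can be sketched as a standalone check. This is a minimal sketch: the class and helper names are hypothetical, and only the matching logic mirrors the suggested diff.

```java
import java.util.Arrays;

public class DeprecatedArgCheck {

    // Hypothetical helper mirroring the suggested fix: detect both the
    // "--parallel-count <n>" and "--parallel-count=<n>" argument forms.
    static boolean usesDeprecatedParallelCount(String[] args) {
        return Arrays.stream(args).anyMatch(arg ->
                arg.equals("--parallel-count") || arg.startsWith("--parallel-count="));
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("unexpected match result");
        }
    }

    public static void main(String[] args) {
        // Space-separated form: matched by equals()
        check(usesDeprecatedParallelCount(new String[]{"--parallel-count", "4"}));
        // Assignment form: matched by startsWith(), missed by the old contains() check
        check(usesDeprecatedParallelCount(new String[]{"--parallel-count=4"}));
        // New parameter only: no deprecation warning expected
        check(!usesDeprecatedParallelCount(new String[]{"--parser-threads", "4"}));
        System.out.println("ok");
    }
}
```

Because JCommander accepts both separators, any argv-level deprecation sniffing has to handle both forms, which is exactly what the stream-based check above does.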
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java`:
- Around line 2575-2576: The struct.json used by the FileLoadTest is missing a
charset setting for the software vertex, causing GBK-encoded software records
(written in the test data) to be parsed as UTF-8 and fail; update the
struct.json entry for the software vertex's "input" to include "charset": "GBK"
so it matches the other test configs (see the incremental_mode_and_load_failure
and matched_encoding_charset test setups) and allows the test to load 7 vertices
as expected.
🧹 Nitpick comments (8)
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java (8)
1200-1203: The FIXME comment should be tracked by a formal issue. The FIXME indicates a known NPE bug (triggered when parser-threads > 1), but a temporary workaround like this should have a corresponding issue to track progress on the fix. Consider creating an issue in the repository and referencing its number in the comment. Would you like me to create an issue to track this NPE bug?
1337-1339: Duplicated FIXME comment; manage these centrally. The FIXME here is identical to the one in testMultiFilesHaveHeader. Replace the duplicated comments with a reference to a single shared issue so the fix can be tracked in one place.
1635-1637: Same as above; reference a single shared issue number.
2067-2068: Missing FIXME comment, inconsistent with the other test changes. The other tests that add --parser-threads "1" all include a FIXME comment explaining why, but it is missing here. Add the same comment for consistency.
2269-2270: Same as above; reference a single shared issue number.
3059-3060: Same as above; reference a single shared issue number.
3074-3075: Same as above; reference a single shared issue number.
1200-1203: Overall suggestion: these changes reveal an architectural problem that needs a root-cause fix. Multiple tests must set --parser-threads "1" to avoid an NPE, which points to a concurrency-safety issue in the multi-threaded parser. Adding workarounds test by test will:
- Reduce test coverage (the multi-threaded path is never exercised)
- Hide a potential production issue
- Accumulate technical debt
Recommendations:
- Create a dedicated issue to track the NPE when parser-threads > 1
- Record all affected tests in that issue
- Prioritize fixing the root cause instead of adding more workarounds
"--test-mode", "true",
"--parser-threads", "1"
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the test file
fd -t f "FileLoadTest.java" | head -5
Length of output: 165
🏁 Script executed:
# Once we find the file, let's examine the specific lines mentioned
fd -t f "FileLoadTest.java" -exec wc -l {} \;
Length of output: 259
🏁 Script executed:
# Search for testSourceOrTargetPrimaryValueNull test method
rg -n "testSourceOrTargetPrimaryValueNull" --type java
Length of output: 225
🏁 Script executed:
# Read the test method and surrounding code
cat -n hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | sed -n '2550,2600p'
Length of output: 2657
🏁 Script executed:
# Search for test data configuration files related to source_or_target_pk_value_null
find . -type d -name "*source_or_target_pk_value_null*" 2>/dev/null
Length of output: 141
🏁 Script executed:
# Examine the struct.json configuration for this test
cat ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json
Length of output: 829
🏁 Script executed:
# Also check the schema.groovy to understand vertex/edge definitions
cat ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/schema.groovy
Length of output: 2063
🏁 Script executed:
# Check git history for changes to the struct.json file
git log --oneline -20 -- "**/source_or_target_pk_value_null/struct.json"
Length of output: 103
🏁 Script executed:
# Check git diff to see recent changes to struct.json
git diff HEAD~5 -- "**/source_or_target_pk_value_null/struct.json" 2>/dev/null || echo "No recent changes"
Length of output: 90
🏁 Script executed:
# Look for any charset or encoding configuration that might have been removed
rg -n "charset" ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/ -A 2 -B 2
Length of output: 1245
🏁 Script executed:
# Check if there are other test data directories with similar patterns to compare
ls -la ./hugegraph-loader/src/test/resources/ | grep -E "source_|target_" | head -10
Length of output: 402
🏁 Script executed:
# Check similar test configurations to see how they handle charset for software vertices
find ./hugegraph-loader/src/test/resources -name "struct.json" -exec grep -l "software" {} \; | head -5
Length of output: 1089
🏁 Script executed:
# Check one of these files to see if they specify charset for software vertex
cat ./hugegraph-loader/src/test/resources/*/struct.json | grep -A 10 '"label": "software"' | head -30
Length of output: 685
🏁 Script executed:
# Look for GBK encoding in test files to understand the pattern
rg -n "GBK" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | head -20
Length of output: 801
🏁 Script executed:
# Check git history of the struct.json file more thoroughly
git log --oneline --all -- "**/source_or_target_pk_value_null/struct.json" | head -20
Length of output: 214
🏁 Script executed:
# Get the full diff of changes to this specific file
git show 3706a3b4:"./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json" 2>/dev/null | head -50
Length of output: 829
🏁 Script executed:
# Compare with a working example that has charset GBK for software
cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | grep -A 10 '"label": "software"'
Length of output: 289
🏁 Script executed:
# Check the commit that added unit tests to see what was in struct.json originally
git show 3706a3b4:"./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json" 2>/dev/null
Length of output: 829
🏁 Script executed:
# Check if there's a PR or commit that modified the charset configuration
git log -p -- "**/source_or_target_pk_value_null/struct.json" | grep -A 15 -B 5 "charset" | head -50
Length of output: 55
🏁 Script executed:
# Check the parent commit to see if charset was there
git show 5579e18c:"./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json" 2>/dev/null || echo "File may not exist in that commit"
Length of output: 106
🏁 Script executed:
# Check the current HEAD to see if there's been any recent change to add charset for software
git log --oneline -1 -- "**/source_or_target_pk_value_null/struct.json"
Length of output: 103
🏁 Script executed:
# Look at the actual test to understand better - check lines around 2580-2593 more carefully
cat -n ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | sed -n '2578,2593p'
Length of output: 942
🏁 Script executed:
# Search for other similar tests with GBK encoding to see if they have charset defined
rg -B 15 'ioUtil.write.*vertex_software.*GBK' ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 20 'testCustomizedSchema\|testBatchInsertThreads' | head -50
Length of output: 797
🏁 Script executed:
# Check customized_schema struct.json to see the full configuration
cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | head -50
Length of output: 1172
🏁 Script executed:
# Look for any recent changes or issues related to charset in struct.json files
git log --all -p -- "**struct.json" | grep -A 5 -B 5 "charset.*GBK" | head -80
Length of output: 2120
🏁 Script executed:
# Check if there's a pattern - search how the test framework handles default charset
rg -n "defaultCharset\|UTF-8" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/parser/*.java | head -20
Length of output: 55
🏁 Script executed:
# Check the very latest state of the struct.json file to see if charset has been added for software
cat ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json | grep -A 8 '"label": "software"'
Length of output: 259
🏁 Script executed:
# Let's trace back - check what the exact commit message says
git show --stat 3706a3b4 | head -30
Length of output: 1707
🏁 Script executed:
# Check if there have been any uncommitted changes to this file
git status ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json
Length of output: 138
🏁 Script executed:
# Look at the full test context - what test names mention parser-threads
rg -n "parser-threads.*1" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java
Length of output: 798
🏁 Script executed:
# Let's verify the exact expectation - look at the test assertion more carefully
cat -n ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | sed -n '2550,2593p'
Length of output: 2370
🏁 Script executed:
# Check if software vertex should have charset GBK by looking at similar passing tests
rg -A 8 '"label": "software"' ./hugegraph-loader/src/test/resources/customized_schema/struct.json
Length of output: 259
🏁 Script executed:
# Compare: check if there are passing tests with the same pattern - write software.csv in GBK with charset GBK in config
rg -B 5 'ioUtil.write.*vertex_software.*GBK' ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 50 "testCustomizedSchema" | head -60
Length of output: 55
🏁 Script executed:
# Let's check if this struct.json should have charset: GBK for software by looking at pattern
# Check what happens when we look at the exact lines where software vertex is written in GBK in the test
rg -B 20 "testCustomizedSchema" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 25 "public void testCustomizedSchema" | head -35
Length of output: 113
🏁 Script executed:
# Get the customized_schema struct.json to compare patterns
cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | grep -A 10 '"label": "software"' | head -15
Length of output: 289
🏁 Script executed:
# Check the FIXME comments to understand the parser-threads issue better
rg -B 2 "FIXME.*parser-threads" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | head -20
Length of output: 636
🏁 Script executed:
# Let's check if customized_schema has the same issue or is it configured differently
cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | python3 << 'EOF'
import json
import sys
data = json.load(sys.stdin)
for vertex in data.get('vertices', []):
if vertex.get('label') == 'software':
print(f"Software vertex config: {json.dumps(vertex, indent=2)}")
EOF
Length of output: 159
🏁 Script executed:
# Now let's verify the test expectation - does testCustomizedSchema expect 7 vertices?
rg -A 20 "public void testCustomizedSchema" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -E "Assert.assertEquals.*vertices"
Length of output: 55
🏁 Script executed:
# Check if this is the initial commit that added these tests
git show --stat 3706a3b4 | grep -E "FileLoadTest|struct.json" | head -20
Length of output: 1316
🏁 Script executed:
# Let's check if the test is supposed to expect an exception and then still load vertices
# This would help understand the intent
rg -B 5 -A 15 "AsyncThrowsAssert.assertThrows" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 20 "testSourceOrTargetPrimaryValueNull" | head -30Repository: hugegraph/hugegraph-toolchain
Length of output: 55
🏁 Script executed:
# Check if there are any comments in the test about why 7 vertices are expected after an exception
rg -B 30 "Assert.assertEquals\(7, vertices.size" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | tail -40
Length of output: 1782
🏁 Script executed:
# Most importantly, let's verify: does the software vertex config need charset GBK?
# Check another test that writes software in GBK and passes
rg -B 30 "testBatchInsertThreads\|testMultipleSourcesAndTargets" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 30 "ioUtil.write.*vertex_software.*GBK" | head -50
Length of output: 55
🏁 Script executed:
# Let me check what the default charset behavior is when not specified
rg -n "defaultCharset\|Charset.defaultCharset" ./hugegraph-loader/src/main/java/org/apache/hugegraph/loader/ -A 2
Length of output: 55
🏁 Script executed:
# Verify: does the struct.json missing charset config for software cause parsing to fail?
# Check similar tests that work - do they have charset specified for software when written in GBK?
find ./hugegraph-loader/src/test/resources -name "struct.json" -exec grep -l '"label": "software"' {} \; | while read f; do echo "=== $f ==="; cat "$f" | grep -A 6 '"label": "software"'; done | head -80
Length of output: 2977
🏁 Script executed:
# The key question: is the struct.json missing "charset": "GBK" for software vertex?
diff <(cat ./hugegraph-loader/src/test/resources/customized_schema/struct.json | jq '.vertices[] | select(.label == "software")') \
<(cat ./hugegraph-loader/src/test/resources/source_or_target_pk_value_null/struct.json | jq '.vertices[] | select(.label == "software")')
Length of output: 160
🏁 Script executed:
# Confirm: check if incremental_mode_and_load_failure has the same test pattern with GBK charset
rg -B 10 "incremental_mode_and_load_failure" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 30 "ioUtil.write.*vertex_software" | head -40
Length of output: 1536
🏁 Script executed:
# Also check if matched_encoding_charset test has the same structure and if it passes
rg -B 10 "matched_encoding_charset" ./hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java | grep -A 30 "ioUtil.write.*vertex_software" | head -40
Length of output: 833
struct.json is missing the charset configuration for the software vertex.
In the test data, the software vertices are written in GBK encoding (line 2558), but the struct.json config does not specify a charset, so the file is parsed as UTF-8 by default and the software vertices fail to parse. Only 5 vertices (person) are actually loaded, not the expected 7.
Add "charset": "GBK" to the software vertex's input config in struct.json, consistent with the configs in incremental_mode_and_load_failure and matched_encoding_charset.
--parser-threads "1" is an independent workaround for the known NPE issue (see the FIXME in the code comments) and is unrelated to this test failure.
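The encoding failure can be reproduced in isolation. This minimal sketch is independent of the loader code and only illustrates why GBK-encoded bytes cannot be decoded as UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    public static void main(String[] args) {
        // "中文" (Chinese text) encoded as GBK, like the software records in the test data
        byte[] gbkBytes = "中文".getBytes(Charset.forName("GBK"));

        // Decoding those bytes as UTF-8 (the default when no charset is configured)
        // produces replacement characters instead of the original text
        String wrong = new String(gbkBytes, StandardCharsets.UTF_8);
        String right = new String(gbkBytes, Charset.forName("GBK"));

        System.out.println(right.equals("中文"));   // true
        System.out.println(wrong.equals("中文"));   // false
    }
}
```

GBK byte sequences for CJK characters are not valid UTF-8, so the UTF-8 decoder substitutes U+FFFD replacement characters, which is why the GBK-encoded software rows fail to match the expected values.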
🤖 Prompt for AI Agents
In
`@hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java`
around lines 2575 - 2576, The struct.json used by the FileLoadTest is missing a
charset setting for the software vertex, causing GBK-encoded software records
(written in the test data) to be parsed as UTF-8 and fail; update the
struct.json entry for the software vertex's "input" to include "charset": "GBK"
so it matches the other test configs (see the incremental_mode_and_load_failure
and matched_encoding_charset test setups) and allows the test to load 7 vertices
as expected.
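A sketch of what the fixed input block for the software vertex in struct.json might look like — only the "charset" line is the actual fix; the surrounding fields (path, format, header) are illustrative and should match the existing test config:

```json
{
  "label": "software",
  "input": {
    "type": "file",
    "path": "vertex_software.csv",
    "format": "CSV",
    "charset": "GBK"
  }
}
```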
Update required review count and collaborators
Due to the lack of activity, this PR has been marked as stale and will be closed after 180 days; any update will remove the stale label
…ader, and refactored the failure-handling logic for batch inserts.
Purpose of the PR
Main Changes
Verifying these changes
Does this PR potentially affect the following parts?
Documentation Status
Doc - TODO / Doc - Done / Doc - No Need
Summary by CodeRabbit
Release Notes
New Features
- Parser thread setting renamed to --parser-threads (compatible with, and deprecating, --parallel-count); added batch-failure-fallback (enabled by default).
Bug Fixes
Tests
Operations/Other
✏️ Tip: You can customize this high-level summary in your review settings.