
Fix computation graph reuse issue by preventing kernel eviction#198

Merged
niconiconi merged 2 commits into master from hc/reuse_graph on Feb 5, 2026

Conversation

hczphn (Collaborator) commented Feb 3, 2026

Summary: Fixed a bug where kernels were missing during graph execution due to a conflict between insertion deduplication and extraction popping.

Technical Details:

The Bug: During kernel extraction, the code used pop(), which removed the kernel from the graph. A single kernel entry might be needed by multiple operations. Since the insertion logic deduplicates kernels, popping it on the first call caused subsequent calls in the same graph to fail.

The Fix: Access kernels without removing them from the graph container.

Result: Fixes errors when a kernel is reused and allows the graph to be reused for different witness inputs.
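The bug and the fix can be sketched with a minimal, self-contained example (illustrative names only: `Graph`, `take_kernel`, and `get_kernel` here are stand-ins for the actual `expander_compiler` types, not the real API):

```rust
use std::collections::HashMap;

// A deduplicated kernel store: one entry may back several kernel calls.
#[derive(Clone)]
struct Kernel(String);

struct Graph {
    kernels: HashMap<usize, Kernel>,
}

impl Graph {
    // Buggy pattern: removing the entry (like `drain(..1).next().unwrap()`)
    // evicts the kernel, so a second call with the same id finds nothing.
    fn take_kernel(&mut self, kernel_id: usize) -> Option<Kernel> {
        self.kernels.remove(&kernel_id)
    }

    // Fixed pattern: borrow the entry and clone it if ownership is needed,
    // leaving the container untouched so the graph stays reusable.
    fn get_kernel(&self, kernel_id: usize) -> Option<Kernel> {
        self.kernels.get(&kernel_id).cloned()
    }
}

fn main() {
    let mut g = Graph { kernels: HashMap::from([(0, Kernel("add_2".into()))]) };
    assert!(g.take_kernel(0).is_some());
    assert!(g.take_kernel(0).is_none()); // second use fails: kernel evicted

    let g = Graph { kernels: HashMap::from([(0, Kernel("add_2".into()))]) };
    assert!(g.get_kernel(0).is_some());
    assert!(g.get_kernel(0).is_some()); // reuse succeeds
}
```

The same non-consuming access is what lets the whole graph be run again with fresh witness inputs: nothing is destroyed by the first execution.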

gemini-code-assist (Contributor) commented

Summary of Changes

Hello @hczphn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fixes a significant defect in computation graph execution: because kernels were extracted improperly, they could not be reused across multiple operations. By changing the kernel-access logic to fetch kernels directly instead of removing them from the container, this change ensures that kernels persist. It not only fixes errors that occurred when a kernel was reused within the same computation graph, but also allows a compiled computation graph to be reused with different input data, improving stability and efficiency.

Highlights

  • Kernel extraction fix: Fixed a critical bug in computation graph execution where kernels were evicted prematurely by an improper extraction mechanism, preventing the same computation graph from being reused.
  • Non-consuming kernel access: Changed the kernel extraction logic from a consuming operation (drain().next().unwrap()) to a non-consuming one (get().clone()), ensuring kernels remain available for the entire lifetime of the computation graph and can be reused.
  • New graph-reuse test: Added a test file, test_bn254_load_graph_with_new_data.rs, to verify the fix and demonstrate that a computation graph can be reused with new data, including two independent compute, prove, and verify flows.


Changelog
  • expander_compiler/src/zkcuda/context.rs
    • Changed the kernel-fetching logic in Context::build_computation_graph from cg_kernels.drain(..1).next().unwrap() to self.kernels.get(kernel_call.kernel_id).clone(), so the kernel is no longer removed.
    • Removed the assertion on cg_kernels.is_empty(), since cg_kernels is no longer consumed.
  • expander_compiler/tests/test_bn254_new_data.rs
    • Added a test file containing two kernel definitions, add_2_macro and add_16_macro.
    • Implemented the test function test_bn254_load_graph_with_new_data_impl, which verifies that the computation graph can be reused after loading new data, covering two independent compute, prove, and verify flows and checking both result correctness and graph reusability.
Activity
  • There is no comment or review activity on this pull request yet.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request effectively resolves a critical bug concerning the reuse of computation graphs by preventing the premature eviction of kernels. The core logic change in context.rs is correct and directly addresses the issue. The addition of a new test case in test_bn254_new_data.rs is excellent, as it thoroughly validates the fix by simulating graph reuse with different inputs. I've included a couple of suggestions: one to enhance performance in context.rs by avoiding a potentially costly clone operation, and another to improve the maintainability of the new test file by standardizing the language used in comments and output messages.

Comment on lines 619 to 622
let kernel = if cg_kernels.is_some() {
// Fetch the kernel from the loaded kernels by kernel_id
self.kernels.get(kernel_call.kernel_id).clone()
} else {

medium

While this change correctly fixes the bug by not consuming the kernel, the use of .clone() on Kernel<C> could be inefficient if the kernel object is large, as it involves a deep copy of the circuit structure. This might impact performance when loading large, pre-compiled computation graphs.

Consider refactoring to avoid this clone. One approach is to use std::borrow::Cow<Kernel<C>> to hold either a borrowed reference (in the if case) or an owned kernel (in the else case). This would prevent the unnecessary clone when a kernel is already available in self.kernels.

Example using Cow:

use std::borrow::Cow;

// ...

let kernel: Cow<Kernel<C>> = if cg_kernels.is_some() {
    Cow::Borrowed(self.kernels.get(kernel_call.kernel_id))
} else {
    // ... logic to compile primitive ...
    Cow::Owned(compile_primitive(kernel_primitive, &psi, &pso)?)
};

// ... later

let kernel_id = self.kernels.add(&*kernel);
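A self-contained sketch of this `Cow` pattern (the `Kernel` struct and `compile` function below are simplified stand-ins, not the real `expander_compiler` types):

```rust
use std::borrow::Cow;

// Simplified stand-in for the kernel type; deriving `Clone` gives it
// the `ToOwned` implementation that `Cow` requires.
#[derive(Clone)]
struct Kernel {
    name: String,
}

// Stand-in for the compile-on-demand path.
fn compile(name: &str) -> Kernel {
    Kernel { name: name.to_string() }
}

// Borrow the cached kernel when one exists; otherwise own a fresh one.
fn pick_kernel(cached: Option<&Kernel>) -> Cow<'_, Kernel> {
    match cached {
        Some(k) => Cow::Borrowed(k), // no deep copy of the circuit
        None => Cow::Owned(compile("fresh")),
    }
}

fn main() {
    let cached = Kernel { name: "cached".into() };
    // Cache hit: a cheap borrow, no clone.
    assert!(matches!(pick_kernel(Some(&cached)), Cow::Borrowed(_)));
    // Cache miss: compilation produces an owned value.
    assert!(matches!(pick_kernel(None), Cow::Owned(_)));
    // `&*kernel` yields `&Kernel` in both cases, matching a call
    // like `self.kernels.add(&*kernel)`.
    let kernel = pick_kernel(Some(&cached));
    assert_eq!((&*kernel).name, "cached");
}
```

The design point is that `Cow` defers the clone decision: callers that only need `&Kernel` never pay for a deep copy, while the compile-on-demand branch still returns an owned value through the same type.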

Comment on lines 26 to 29
println!("\n===== 第一次执行:创建并保存图(BN254) =====");
let mut ctx1: Context<C> = Context::default();

// 第一组输入数据(BN254 field 元素)

medium

The comments and println! messages in this new test file are in Chinese, while the rest of the codebase appears to be in English. To maintain consistency and ensure the code is easily understandable by all contributors, please translate all Chinese text in this file to English.

Suggested change
println!("\n===== 第一次执行:创建并保存图(BN254) =====");
let mut ctx1: Context<C> = Context::default();
// 第一组输入数据(BN254 field 元素)
println!("\n===== First run: Create and save graph (BN254) =====");
let mut ctx1: Context<C> = Context::default();
// First set of input data (BN254 field elements)

hczphn (Collaborator, Author) commented Feb 3, 2026

/gemini review

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly fixes a bug where kernels were being evicted during computation graph reuse, preventing the graph from being executed with new inputs. The change to access kernels without removing them from their container is sound and directly addresses the issue. The addition of a new test case is excellent, as it thoroughly validates the fix by simulating the exact scenario of reusing a compiled graph with different data. I have one minor suggestion to improve documentation around the assumptions for graph reuse, which will enhance future maintainability.

let dm_shapes = self.propagate_and_get_shapes();

let (mut cg_kernels, cg_proof_templates, cg_commitments_lens) = if let Some(cg) = cg {
let (cg_kernels, cg_proof_templates, cg_commitments_lens) = if let Some(cg) = cg {
medium

To improve maintainability and prevent future misuse, it would be beneficial to add a comment explaining the key assumption for reusing a computation graph. Specifically, that call_kernel must be invoked with the same sequence of kernel primitives as when the graph was originally compiled. This ensures that kernel_ids align correctly when fetching pre-compiled kernels from the loaded graph.
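The id-alignment assumption can be illustrated with a minimal deduplicating store (illustrative names, not the real `expander_compiler` API): ids are handed out in first-seen order, so replaying `call_kernel` in a different order would fetch the wrong pre-compiled kernel.

```rust
// A deduplicating kernel store: each distinct kernel gets an id in
// first-seen order, and repeated kernels reuse their existing id.
struct KernelStore {
    names: Vec<String>,
}

impl KernelStore {
    fn add(&mut self, name: &str) -> usize {
        if let Some(i) = self.names.iter().position(|n| n.as_str() == name) {
            return i; // deduplicated: same kernel, same id
        }
        self.names.push(name.to_string());
        self.names.len() - 1
    }
}

fn main() {
    let mut store = KernelStore { names: Vec::new() };
    // The call order at compile time fixes the ids...
    assert_eq!(store.add("add_2"), 0);
    assert_eq!(store.add("add_16"), 1);
    assert_eq!(store.add("add_2"), 0); // dedup reuses id 0

    // ...so a reused graph must replay its kernel calls in the same
    // order, or kernel_id 0 would no longer refer to "add_2".
}
```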

@niconiconi niconiconi merged commit d96dad4 into master Feb 5, 2026
24 checks passed