Fix computation graph reuse issue by preventing kernel eviction #198
niconiconi merged 2 commits into master
Conversation
Summary of Changes

Hello @hczphn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a significant defect in computation graph execution: because kernels were extracted improperly, they could not be reused across multiple operations. By changing the kernel access logic to fetch kernels directly instead of removing them from their container, this change ensures kernel persistence. This not only fixes errors when a kernel is reused within the same computation graph, but also allows a compiled computation graph to be flexibly reused with different input data, improving the system's stability and efficiency.

Highlights
Changelog
Activity
Code Review
This pull request effectively resolves a critical bug concerning the reuse of computation graphs by preventing the premature eviction of kernels. The core logic change in context.rs is correct and directly addresses the issue. The addition of a new test case in test_bn254_new_data.rs is excellent, as it thoroughly validates the fix by simulating graph reuse with different inputs. I've included a couple of suggestions: one to enhance performance in context.rs by avoiding a potentially costly clone operation, and another to improve the maintainability of the new test file by standardizing the language used in comments and output messages.
let kernel = if cg_kernels.is_some() {
    // Fetch by kernel_id from the already-loaded kernels
    self.kernels.get(kernel_call.kernel_id).clone()
} else {
While this change correctly fixes the bug by not consuming the kernel, the use of .clone() on Kernel<C> could be inefficient if the kernel object is large, as it involves a deep copy of the circuit structure. This might impact performance when loading large, pre-compiled computation graphs.
Consider refactoring to avoid this clone. One approach is to use std::borrow::Cow<Kernel<C>> to hold either a borrowed reference (in the if case) or an owned kernel (in the else case). This would prevent the unnecessary clone when a kernel is already available in self.kernels.
Example using Cow:

use std::borrow::Cow;
// ...
let kernel: Cow<Kernel<C>> = if cg_kernels.is_some() {
    Cow::Borrowed(self.kernels.get(kernel_call.kernel_id))
} else {
    // ... logic to compile primitive ...
    Cow::Owned(compile_primitive(kernel_primitive, &psi, &pso)?)
};
// ... later
let kernel_id = self.kernels.add(&*kernel);

println!("\n===== 第一次执行:创建并保存图(BN254) =====");
let mut ctx1: Context<C> = Context::default();

// 第一组输入数据(BN254 field 元素)
The comments and println! messages in this new test file are in Chinese, while the rest of the codebase appears to be in English. To maintain consistency and ensure the code is easily understandable by all contributors, please translate all Chinese text in this file to English.
Suggested change:

println!("\n===== 第一次执行:创建并保存图(BN254) =====");
let mut ctx1: Context<C> = Context::default();
// 第一组输入数据(BN254 field 元素)

becomes:

println!("\n===== First run: Create and save graph (BN254) =====");
let mut ctx1: Context<C> = Context::default();
// First set of input data (BN254 field elements)
|
/gemini review
Code Review
This pull request correctly fixes a bug where kernels were being evicted during computation graph reuse, preventing the graph from being executed with new inputs. The change to access kernels without removing them from their container is sound and directly addresses the issue. The addition of a new test case is excellent, as it thoroughly validates the fix by simulating the exact scenario of reusing a compiled graph with different data. I have one minor suggestion to improve documentation around the assumptions for graph reuse, which will enhance future maintainability.
let dm_shapes = self.propagate_and_get_shapes();

- let (mut cg_kernels, cg_proof_templates, cg_commitments_lens) = if let Some(cg) = cg {
+ let (cg_kernels, cg_proof_templates, cg_commitments_lens) = if let Some(cg) = cg {
To improve maintainability and prevent future misuse, it would be beneficial to add a comment explaining the key assumption for reusing a computation graph. Specifically, that call_kernel must be invoked with the same sequence of kernel primitives as when the graph was originally compiled. This ensures that kernel_ids align correctly when fetching pre-compiled kernels from the loaded graph.
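A minimal sketch of such a comment, with illustrative wording (not the actual source):

```rust
// Graph reuse assumption: when a pre-compiled graph is loaded,
// kernels are fetched by `kernel_id`, which is assigned in call order.
// `call_kernel` must therefore be invoked with the same sequence of
// kernel primitives as when the graph was originally compiled, so that
// each `kernel_id` resolves to the matching pre-compiled kernel.
```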
Summary: Fixed a bug where kernels were missing during graph execution due to a conflict between insertion deduplication and extraction popping.
Technical Details:
The Bug: During kernel extraction, the code used pop(), which removed the kernel from the graph. A single kernel entry might be needed by multiple operations. Since the insertion logic deduplicates kernels, popping it on the first call caused subsequent calls in the same graph to fail.
The Fix: Access kernels without removing them from the graph container.
Result: Fixes errors when a kernel is reused and allows the graph to be reused for different witness inputs.