Summary of Changes

Hello @hczphn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the proving system by introducing a flexible task scheduling mechanism for MPI processes, optimizing the proving pipeline for better performance and a reduced memory footprint, and expanding the capabilities of LogUp queries. These changes aim to provide more control over resource utilization and improve the overall efficiency and adaptability of the system.
Code Review
This pull request introduces several significant optimizations and features, including a more efficient proving pipeline, scheduled GKR proving with dynamic MPI process allocation, and extended LogUp support. The changes are well-aligned with the description and aim to improve performance and flexibility.
My review has identified a few areas for improvement. Most critically, there is a memory leak in the new schedule-based execution logic due to the use of Box::leak when creating MPI subgroups. I've also noted some minor issues such as unused variables, opportunities for more robust error handling in file parsing, and a potentially unreachable code block that could be clarified or removed for better maintainability. The addition of a new test case for reusing computation graphs is a great way to validate the optimizations made.
// Only participants return a valid MPIConfig
if my_position_opt.is_some() {
    let split_comm_static: &'static Option<_> = Box::leak(Box::new(split_comm));
Using Box::leak here leaks memory for every MPI communicator created. Since create_mpi_subgroup_for_task is called in a loop, once per unique task, this leaks an allocation for every task, which can become significant for large schedules.
While obtaining a 'static lifetime for the MPI communicator might be necessary due to the mpi crate's API, this approach leaks memory that is never reclaimed.
Consider managing the lifetime of the communicators more carefully. Perhaps they can be stored in the mpi_prove_no_oversubscribe_with_schedule function and explicitly dropped when they are no longer needed. If mpi-rs requires 'static, another approach might be to use an arena allocator or a custom pool for communicators that can be cleaned up at the end of the proving process.
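For illustration only, one possible shape for such a pool, assuming the mpi API really does force a `'static` borrow; `CommPool` and its methods are hypothetical and not part of this PR:

```rust
/// Hypothetical pool that hands out 'static references while remembering the
/// allocations, so they can be reclaimed once proving is finished.
struct CommPool<T> {
    leaked: Vec<*mut T>,
}

impl<T> CommPool<T> {
    fn new() -> Self {
        Self { leaked: Vec::new() }
    }

    /// Leaks `value` to obtain a 'static reference, but records the pointer
    /// for later cleanup instead of forgetting it.
    fn lease(&mut self, value: T) -> &'static T {
        let ptr = Box::into_raw(Box::new(value));
        self.leaked.push(ptr);
        // SAFETY: the allocation stays alive until `release_all`, which the
        // caller must only invoke after all borrowers are done with it.
        unsafe { &*ptr }
    }

    /// Reclaims every leased allocation, e.g. at the end of
    /// mpi_prove_no_oversubscribe_with_schedule.
    ///
    /// SAFETY: callers must guarantee no leased references are still in use.
    unsafe fn release_all(&mut self) {
        for ptr in self.leaked.drain(..) {
            unsafe { drop(Box::from_raw(ptr)) };
        }
    }
}
```

The schedule runner could own a single `CommPool`, lease each per-task communicator from it, and call `release_all` once every proof has been produced, so nothing outlives the proving run.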
let mut inputs = vec![builder.constant(self.table.len() as u32)];
// append table keys
for i in 0..self.table.len() {
    inputs.push(self.table[i][0]);
}
// append query keys
inputs.extend(self.query_keys.clone());

    query_count: &[Variable],
) {
    let alpha = builder.get_random_value();
    let inputs = self.query_keys.clone();
let parts: Vec<&str> = line.split(':').collect();
if parts.len() != 2 {
    continue;
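The review summary above also mentions room for more robust error handling in file parsing; silently `continue`-ing past malformed lines can hide mistakes in a hand-written schedule file. A minimal sketch of a stricter parse (the helper and its signature are hypothetical, not code from this PR):

```rust
/// Hypothetical helper: reports malformed schedule lines instead of skipping
/// them silently, and returns the parsed key/value pair otherwise.
fn parse_schedule_line(line_no: usize, line: &str) -> Option<(String, String)> {
    let parts: Vec<&str> = line.split(':').collect();
    if parts.len() != 2 {
        eprintln!(
            "[schedule] warning: line {} is malformed (expected `key: value`): {:?}",
            line_no, line
        );
        return None;
    }
    Some((parts[0].trim().to_string(), parts[1].trim().to_string()))
}
```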
} else {
    eprintln!(
        "[RANK {}] Executing task {} solo (creating single-rank MPI config)",
        my_rank, task_name
    );

    // Create a single-rank MPI config for solo tasks
    // We use the pre-created config from task_mpi_configs
    let solo_config = task_mpi_configs.get(task_name).and_then(|c| c.as_ref());

    if let Some(config) = solo_config {
        prove_kernel_gkr_no_oversubscribe::<
            GetFieldConfig<ZC>,
            GetTranscript<ZC>,
            ZC::ECCConfig,
        >(
            config,
            &computation_graph.kernels()[template.kernel_id()],
            &commitment_values,
            next_power_of_two(template.parallel_count()),
            template.is_broadcast(),
            n_bytes_profiler,
        )
    } else {
        // Fallback: use global MPI config for solo task
        prove_kernel_gkr_no_oversubscribe::<
            GetFieldConfig<ZC>,
            GetTranscript<ZC>,
            ZC::ECCConfig,
        >(
            global_mpi_config,
            &computation_graph.kernels()[template.kernel_id()],
            &commitment_values,
            next_power_of_two(template.parallel_count()),
            template.is_broadcast(),
            n_bytes_profiler,
        )
    }
};
This else block for handling "solo" tasks seems confusing. The i_am_participant check at line 1015 should ensure that this code is only run by ranks participating in the current task. For a participant, local_mpi_config (retrieved from task_mpi_configs) should be Some(MPIConfig).
This else branch, which is taken when local_mpi_config is None, appears to be unreachable for a participating rank. If it is reachable, it suggests a potential logic issue in how MPI subgroups are created or tracked.
Could you clarify the conditions under which a participating rank would have a None local_mpi_config? If this branch is indeed unreachable, it should be removed to improve clarity.
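If the branch really is unreachable, one way to make that assumption explicit rather than silently falling back to the global config is sketched below; the function, its generic parameter, and its arguments are placeholders standing in for the identifiers in the diff above:

```rust
/// Sketch: a participating rank is assumed to always have a per-task config,
/// so a missing one is treated as a logic error instead of a silent fallback.
fn pick_config<'a, C>(solo_config: Option<&'a C>, my_rank: usize, task_name: &str) -> &'a C {
    solo_config.unwrap_or_else(|| {
        unreachable!(
            "[RANK {}] participating rank has no MPI config for task {}",
            my_rank, task_name
        )
    })
}
```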
This PR introduces several optimizations to the proving system, improves the scheduling flexibility for MPI processes, and extends the LogUp query capabilities.
Key Changes:
Simplified Setup: The setup result was previously computed redundantly and never used, effectively causing the system to perform setup twice. I have modified it to return a default value instead, significantly reducing setup time.
Lightweight Prove Interface: Optimized the prove interface to eliminate unnecessary dependencies on setup and the full computation graph. The new interface is lightweight and only requires device_memory, improving efficiency and reducing memory overhead.
Dynamic Process Scheduling: Introduced a scheduled GKR proving mechanism.
If a schedule.txt file is provided, MPI processes are allocated and tasks are executed in the order specified in the file.
If no schedule file is provided, the system falls back to the default execution order.
Location: Implementation can be found in expander_compiler/src/zkcuda/proving_system/expander_no_oversubscribe/prove_impl.rs.
Extended LogUp Support
Hintless LogUp Queries: Added support for LogUp query interfaces that do not include hints, providing more flexibility for different lookup configurations.
MPI & Environment Configuration
Configurable MPI Limits: Users can now set the maximum number of MPI processes via an environment variable, allowing for better resource management in different hardware environments.
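For reference, reading such a limit typically looks like the sketch below; the variable name `ZKCUDA_MAX_MPI_PROCESSES` is a placeholder, not necessarily the name this PR defines:

```rust
use std::env;

/// Sketch: caps the number of MPI processes via an environment variable,
/// falling back to `default` when the variable is unset or invalid.
fn max_mpi_processes(default: usize) -> usize {
    env::var("ZKCUDA_MAX_MPI_PROCESSES")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}
```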
Impact:
Performance: Drastic reduction in setup time and memory footprint during the proving phase.
Flexibility: Granular control over how MPI ranks handle GKR tasks via external schedule files.
Usability: Improved configuration through environment variables and more versatile query interfaces.