feat: prototype async API, with demonstrable perf improvements via benchmark#73
Pr0methean wants to merge 162 commits into master from
Conversation
This reverts commit 1da7060.
# Conflicts:
#	Cargo.toml
#	examples/write_dir.rs
#	src/compression.rs
#	src/crc32.rs
#	src/lib.rs
#	src/read.rs
#	src/result.rs
#	src/spec.rs
#	src/types.rs
#	src/write.rs
Check notice — Code scanning / devskim: A "TODO" or similar was left in source code, possibly indicating incomplete functionality. Flagged locations include:

- `/* /\* FIXME: i'm pretty sure this can cause a segfault with the current */`, next to the commented-out `/* let write_data = ready!(s.ring.poll_write(cx, s.ring.capacity())); */`
- `/* TODO: make this have a parameterized Self::Index type, used e.g. with RangeInclusive or */` in `pub trait KnownExpanse` (near `use std::{cmp, task::ready};`)
- `/* TODO: zero-copy!! */` under `if writer.is_write_vectored() {`, after `let fname = file.file_name.as_bytes();` (flagged twice)
- `/* TODO: ends_with('\\') as well? */` where a trailing slash is appended to a directory name: `if !name_as_string.ends_with('/') {`, following `let mut name_as_string = name.into();`
The same devskim notice was raised at several more locations:

- `/* TODO: zero-copy?? */` under `if writer.is_write_vectored() {` (flagged twice)
- `/* FIXME: make this a stream with a known length! */` in the method returning `impl Stream<Item = ZipResult<Pin<Box<ZipFile<'_, S, Limiter<S>, Sh>>>>> + '_`, after `let len = self.shared.len();` and `let s = std::cell::UnsafeCell::new(self);`
- `/* TODO: zero-copy!! */` under `if writer.is_write_vectored() {`
- `/* TODO: '\\' too? */` in `#[inline] pub fn is_dir(&self) -> bool {`
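The recurring zero-copy TODOs sit next to `is_write_vectored()` checks (tokio's `AsyncWrite` exposes that probe; std's equivalent on `Write` is still unstable). The idea is to hand the fixed header and the file name to one vectored write as separate `IoSlice`s instead of concatenating them into a temporary buffer first. A self-contained sketch with hypothetical names, using std's stable `write_vectored`:

```rust
use std::io::{IoSlice, Write};

// Sketch of the "zero-copy" idea: submit header and name as two IoSlices in a
// single vectored write. `write_vectored` may write only part of the data, so
// we loop, re-slicing both buffers past what has already been written.
fn write_vectored_all<W: Write>(
    writer: &mut W,
    header: &[u8],
    fname: &[u8],
) -> std::io::Result<()> {
    let mut written = 0usize;
    let total = header.len() + fname.len();
    while written < total {
        let (h, f) = if written < header.len() {
            (&header[written..], fname)
        } else {
            (&[][..], &fname[written - header.len()..])
        };
        let bufs = [IoSlice::new(h), IoSlice::new(f)];
        let n = writer.write_vectored(&bufs)?;
        if n == 0 {
            return Err(std::io::ErrorKind::WriteZero.into());
        }
        written += n;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let mut out: Vec<u8> = Vec::new();
    write_vectored_all(&mut out, b"PK\x01\x02", b"hello.txt")?;
    assert_eq!(out, b"PK\x01\x02hello.txt");
    Ok(())
}
```

On a plain `Vec` the vectored call degenerates to sequential copies; the payoff is on writers (sockets, files) where the OS can take both slices in one syscall.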
|
Is this still in progress, @Pr0methean? I'd love to see parallelization in this crate! Thanks for your work :D
You'll have to ask @cosmicexplorer, since he's the PR author.
Thanks for the ping! I think #72 is probably worth pursuing first, as it achieves parallelism without the separate effort of an async API (which via tokio includes …). I will definitely take a look at updating #72, since it makes use of several techniques I initially developed for pex-tool/pex#2175 (which I should also get back to) and which I don't believe have yet been implemented elsewhere.

@matthewgapp, let me know if synchronous parallel extraction/zipping serves your purposes. I think an async version of that (which is what this change does) would be very cool, but it requires a lot of extra code and is slower on all the benchmarks I've tried. If some user performing a very large number of parallel zip extractions finds this PR more performant than #72, it might be worth taking another look, but I would at least want to know what performance scenario could possibly produce that, because I wasn't able to figure out how to create such a scenario on my laptop.
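For readers wondering what "synchronous parallel extraction" looks like in outline: split the entry list across OS threads, extract each subset independently, and join. Everything involved is `Send`, so it can also be driven from async code via something like `spawn_blocking`. A generic sketch with hypothetical helpers (`extract_one` stands in for real per-entry decompression; this is not the #72 code):

```rust
use std::thread;

// Split the entries into roughly equal chunks, one per worker thread, and
// extract each chunk independently. thread::scope lets the workers borrow
// `entry_names` without any 'static bounds.
fn extract_all(entry_names: &[String], num_workers: usize) -> Vec<String> {
    let chunk_size = entry_names.len().div_ceil(num_workers).max(1);
    let mut done: Vec<String> = Vec::new();
    thread::scope(|s| {
        let handles: Vec<_> = entry_names
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|n| extract_one(n)).collect::<Vec<_>>())
            })
            .collect();
        for h in handles {
            done.extend(h.join().unwrap());
        }
    });
    done
}

// Hypothetical stand-in for per-entry decompression work.
fn extract_one(name: &str) -> String {
    format!("extracted {name}")
}

fn main() {
    let names: Vec<String> = (0..8).map(|i| format!("file{i}.txt")).collect();
    let out = extract_all(&names, 3);
    assert_eq!(out.len(), 8);
}
```

In a real extractor each worker would open its own handle to the archive so the readers don't contend on a shared seek position.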
The massive amount of changes required here means I would prefer not to maintain this myself or force @Pr0methean to do so unless there is a very specific use case for which this achieves superior performance to the synchronous work in #72. In general, I'm thinking that if #72 does its job right, it should be fast enough that we don't need to rely on …

Pull request was closed
@cosmicexplorer, sync parallel execution is just fine! We'd be using it in async code, so it would have to be Send.
Ok, great!! Now that I have @Pr0methean reviewing my work, I'm thinking a full async API might not be a terrible idea at some point, but I might want to try an mmap-based implementation first. I started playing around with the Linux-specific splice and copy_file_range, and I suspect those would accelerate the split/merge operations used for bulk creation/extraction, while mmap may be more performant for random-access operations. In general, the performance improvements from here on out probably require using a higher-level and less-general API (as per my research in https://github.com/cosmicexplorer/medusa-zip), and I don't expect they'd replace the standard API of this crate. |
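The split/merge operations mentioned above can be sketched portably with `io::copy` and `Read::take`; on Linux, the per-chunk copies are the part that splice/copy_file_range could replace to avoid moving bytes through userspace (`std::fs::copy` already uses copy_file_range where it can). Names here are illustrative, not this crate's API:

```rust
use std::io::{self, Read, Write};

// Split an input into fixed-size chunks that could be compressed/extracted
// independently. Read::take bounds each chunk; io::copy drains it.
fn split<R: Read>(mut input: R, chunk_size: u64) -> io::Result<Vec<Vec<u8>>> {
    let mut chunks = Vec::new();
    loop {
        let mut chunk = Vec::new();
        let n = io::copy(&mut (&mut input).take(chunk_size), &mut chunk)?;
        if n == 0 {
            break;
        }
        chunks.push(chunk);
    }
    Ok(chunks)
}

// Merge is plain concatenation; this is where a kernel-side copy
// (splice/copy_file_range) could avoid the userspace round trip.
fn merge<W: Write>(chunks: &[Vec<u8>], mut output: W) -> io::Result<()> {
    for chunk in chunks {
        output.write_all(chunk)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let data: Vec<u8> = (0u8..=255).cycle().take(10_000).collect();
    let chunks = split(io::Cursor::new(&data), 4096)?;
    assert_eq!(chunks.len(), 3); // 4096 + 4096 + 1808 bytes
    let mut merged = Vec::new();
    merge(&chunks, &mut merged)?;
    assert_eq!(merged, data);
    Ok(())
}
```

The round trip (split then merge) is lossless, which is what makes the bulk creation/extraction pipeline safe to parallelize per chunk.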
This replaces zip-rs/zip-old#409.