feat!: Enhance compression codec enum.#2288
Conversation
- Add optional compression level to gzip and zstd (needed for when avro compression usage). - Add Snappy as a compression codec (also will be used for Avro)
|
@liurenjie1024 @blackmwk would appreciate your thoughts on this approach. |
CTTY
left a comment
There was a problem hiding this comment.
LGTM! Left some minor comments
| ), | ||
| )), | ||
| CompressionCodec::None | CompressionCodec::Gzip(_) => Ok(codec), | ||
| CompressionCodec::Lz4 | CompressionCodec::Zstd(_) | CompressionCodec::Snappy => { |
There was a problem hiding this comment.
nit: since we only support None and Gzip, this can be _ => to fail all other cases
crates/iceberg/src/compression.rs
Outdated
| Zstd, | ||
| /// Gzip compression | ||
| Gzip, | ||
| /// Zstandard single compression frame with content size present. Optional level 0–22. |
There was a problem hiding this comment.
nit: We should mention that level 0 for zstd means default compression level but not no compression like Gzip
crates/iceberg/src/compression.rs
Outdated
| CompressionCodec::Gzip => { | ||
| let mut encoder = GzEncoder::new(Vec::new(), Compression::default()); | ||
| CompressionCodec::Gzip(level) => { | ||
| let compression = Compression::new(level.unwrap_or(6).min(9) as u32); |
There was a problem hiding this comment.
We can just call Compression::default() when level is None
| CompressionCodec::Snappy => Err(Error::new( | ||
| ErrorKind::FeatureUnsupported, | ||
| "Snappy decompression is not supported currently", | ||
| )), |
There was a problem hiding this comment.
Do we plan to address this in a follow up PR? If so, could you create a tracking issue?
There was a problem hiding this comment.
As far as I know the only place Snappy is used is within Avro, and Avro has its own code for doing the compression.
blackmwk
left a comment
There was a problem hiding this comment.
Thanks @emkornfield for this pr!
crates/iceberg/src/compression.rs
Outdated
| Gzip, | ||
| /// Zstandard single compression frame with content size present. Optional level 0–22, | ||
| /// where 0 means default compression level (not no compression, unlike Gzip). | ||
| Zstd(Option<u8>), |
There was a problem hiding this comment.
If there is no no compression option, why this has to be Option?
crates/iceberg/src/compression.rs
Outdated
| CompressionCodec::Zstd(level) => { | ||
| let writer = Vec::<u8>::new(); | ||
| let mut encoder = zstd::stream::Encoder::new(writer, 3)?; | ||
| let mut encoder = zstd::stream::Encoder::new(writer, level.unwrap_or(3) as i32)?; |
There was a problem hiding this comment.
Not related to this pr, but I don't think we should use a magic number here. We should create a constant.
There was a problem hiding this comment.
moved to a private constant used in default construction.
crates/iceberg/src/compression.rs
Outdated
| let mut encoder = GzEncoder::new(Vec::new(), Compression::default()); | ||
| CompressionCodec::Gzip(level) => { | ||
| let compression = | ||
| level.map_or_else(Compression::default, |l| Compression::new(l.min(9) as u32)); |
| let json = serde_json::to_string(&original_metadata).unwrap(); | ||
|
|
||
| let compressed = CompressionCodec::Gzip | ||
| let compressed = CompressionCodec::Gzip(None) |
There was a problem hiding this comment.
I think this is incorrect? It should be 9?
There was a problem hiding this comment.
If 9 is default compression level, you could create a method CompressionCodec::gzip_default for it.
There was a problem hiding this comment.
9 I think is max decompression level, 6 (is the default https://docs.rs/flate2/latest/src/flate2/lib.rs.html#251), by assigning None here I think we end up with 6. I'm in the process of changing the code to have a default constructor without the None.
crates/iceberg/src/puffin/writer.rs
Outdated
| let blobs = vec![blob_0(), blob_1()]; | ||
| let blobs_with_compression = blobs_with_compression(blobs.clone(), CompressionCodec::Zstd); | ||
| let blobs_with_compression = | ||
| blobs_with_compression(blobs.clone(), CompressionCodec::Zstd(None)); |
There was a problem hiding this comment.
This is incorrect, similar to the one with gzip.
There was a problem hiding this comment.
I'm sorry could you elaborate on why you think was incorrect, it appears to test only code and the actual compress call would use the default value?
There was a problem hiding this comment.
I was thinking not changing testing behavior.
…ld/iceberg-rust into add_compression_level
blackmwk
left a comment
There was a problem hiding this comment.
Thanks @emkornfield for this pr!
Which issue does this PR close?
This is an intermediate PR for #1731
I'm splitting out changes from #1851 to the compression codec to make it easier to review. Once we decide on approach here and merge it I'll update #1851 accordingly.
What changes are included in this PR?
AI helped with an initial version of this PR.
Are these changes tested?
Additional unit tests