Skip to content

feat!: Enhance compression codec enum.#2288

Merged
blackmwk merged 7 commits intoapache:mainfrom
emkornfield:add_compression_level
Mar 31, 2026
Merged

feat!: Enhance compression codec enum.#2288
blackmwk merged 7 commits intoapache:mainfrom
emkornfield:add_compression_level

Conversation

@emkornfield
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

This is an intermediate PR for #1731

I'm splitting out changes from #1851 to the compression codec to make it easier to review. Once we decide on approach here and merge it I'll update #1851 accordingly.

What changes are included in this PR?

  • Add optional compression level to gzip and zstd (needed for when avro compression usage).
  • Add Snappy as a compression codec (also will be used for Avro)
  • Manually code up some previously auto-generated methods as a result.

AI helped with an initial version of this PR.

Are these changes tested?

Additional unit tests

- Add optional compression level to gzip and zstd (needed for when avro
  compression usage).
- Add Snappy as a compression codec (also will be used for Avro)
@emkornfield
Copy link
Copy Markdown
Contributor Author

@liurenjie1024 @blackmwk would appreciate your thoughts on this approach.

Copy link
Copy Markdown
Collaborator

@CTTY CTTY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Left some minor comments

),
)),
CompressionCodec::None | CompressionCodec::Gzip(_) => Ok(codec),
CompressionCodec::Lz4 | CompressionCodec::Zstd(_) | CompressionCodec::Snappy => {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since we only support None and Gzip, this can be _ => to fail all other cases

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Zstd,
/// Gzip compression
Gzip,
/// Zstandard single compression frame with content size present. Optional level 0–22.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We should mention that level 0 for zstd means default compression level but not no compression like Gzip

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

CompressionCodec::Gzip => {
let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
CompressionCodec::Gzip(level) => {
let compression = Compression::new(level.unwrap_or(6).min(9) as u32);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just call Compression::default() when level is None

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

CompressionCodec::Snappy => Err(Error::new(
ErrorKind::FeatureUnsupported,
"Snappy decompression is not supported currently",
)),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to address this in a follow up PR? If so, could you create a tracking issue?

Copy link
Copy Markdown
Contributor Author

@emkornfield emkornfield Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know the only place Snappy is used is within Avro, and Avro has its own code for doing the compression.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense

Copy link
Copy Markdown
Contributor

@blackmwk blackmwk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @emkornfield for this pr!

Gzip,
/// Zstandard single compression frame with content size present. Optional level 0–22,
/// where 0 means default compression level (not no compression, unlike Gzip).
Zstd(Option<u8>),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no no compression option, why this has to be Option?

CompressionCodec::Zstd(level) => {
let writer = Vec::<u8>::new();
let mut encoder = zstd::stream::Encoder::new(writer, 3)?;
let mut encoder = zstd::stream::Encoder::new(writer, level.unwrap_or(3) as i32)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this pr, but I don't think we should use a magic number here. We should create a constant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to a private constant used in default construction.

let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
CompressionCodec::Gzip(level) => {
let compression =
level.map_or_else(Compression::default, |l| Compression::new(l.min(9) as u32));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

let json = serde_json::to_string(&original_metadata).unwrap();

let compressed = CompressionCodec::Gzip
let compressed = CompressionCodec::Gzip(None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect? It should be 9?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 9 is default compression level, you could create a method CompressionCodec::gzip_default for it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9 I think is max decompression level, 6 (is the default https://docs.rs/flate2/latest/src/flate2/lib.rs.html#251), by assigning None here I think we end up with 6. I'm in the process of changing the code to have a default constructor without the None.

let blobs = vec![blob_0(), blob_1()];
let blobs_with_compression = blobs_with_compression(blobs.clone(), CompressionCodec::Zstd);
let blobs_with_compression =
blobs_with_compression(blobs.clone(), CompressionCodec::Zstd(None));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect, similar to the one with gzip.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry could you elaborate on why you think was incorrect, it appears to test only code and the actual compress call would use the default value?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking not changing testing behavior.

@emkornfield emkornfield requested a review from blackmwk March 30, 2026 22:27
Copy link
Copy Markdown
Contributor

@blackmwk blackmwk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @emkornfield for this pr!

@blackmwk blackmwk merged commit d8011a0 into apache:main Mar 31, 2026
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants