Skip to content

Move Iceberg-only API classes to iceberg-api module #3241

@andygrove

Description

@andygrove

Summary

Move classes that are exclusively used by Iceberg (not used by any non-@IcebergApi Comet code) from common into the new iceberg-api module, added in #3240. This will improve the separation of concerns and make the Iceberg API boundary clearer.

Classes to Move

The following 4 classes are only referenced by other @IcebergApi classes and exist purely for Iceberg integration:

Class Current Location Purpose
ParquetColumnSpec common/src/main/java/org/apache/comet/parquet/ Bridge class to pass column descriptors between Comet and Iceberg (since Iceberg shades Parquet)
RowGroupReader common/src/main/java/org/apache/comet/parquet/ PageReadStore implementation for row group reading
WrappedInputFile common/src/main/java/org/apache/comet/parquet/ Wraps Iceberg's InputFile to implement Parquet's InputFile interface
AbstractCometSchemaImporter common/src/main/java/org/apache/arrow/c/ Base class for schema import functionality

Important Considerations

Packaging/Shading Concerns

⚠️ This change requires careful attention to packaging and shading:

  1. Iceberg shades Parquet - ParquetColumnSpec exists specifically because Iceberg uses shaded Parquet classes. Any module structure changes must preserve this compatibility.

  2. JAR structure - Ensure the classes remain accessible to Iceberg after the move. The final packaged JAR must include these classes in a location Iceberg can access.

  3. Maven shade plugin configuration - Review and update shade plugin configuration if needed to ensure:

    • Classes are included in the shaded JAR
    • Package relocations don't break Iceberg's access
    • No duplicate classes across modules
  4. Dependency direction - The iceberg-public-api module currently only contains tests. If we move source classes there, we need to:

    • Update module dependencies
    • Ensure common module can still depend on these classes if needed
    • Consider if a separate iceberg-api module (non-test) is more appropriate

Testing Requirements

  • Verify Iceberg integration tests pass after the move
  • Verify shaded JAR contains all necessary classes
  • Test with actual Iceberg to ensure no classloader issues
  • Run iceberg-public-api module tests

Alternative Approaches

  1. Keep in common, just document - Leave classes in common but document they're Iceberg-specific
  2. New iceberg-api source module - Create a separate source module (not test-only) for Iceberg API classes
  3. Subpackage organization - Move to org.apache.comet.iceberg.* package within common

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions