[BUG] scan.filter after reading it as an Arrow table throws #2179

@smaheshwar-pltr

Description

Apache Iceberg version

Most recent PyIceberg

Please describe the bug 🐞

See here and the description below for a failing test.

    table = catalog.load_table(f"default.{identifier}")

    scan = table.scan()
    # assert len(scan.to_arrow()) > 0

    scan = scan.filter("ts >= '2023-03-05T00:00:00+00:00'")
    assert len(scan.to_arrow()) > 0

This code works fine, but uncommenting the first assertion causes the filter call to throw. The stack trace is immediately helpful:

pyiceberg/table/__init__.py:1710: in filter
    return self.update(row_filter=And(self.row_filter, _parse_row_filter(expr)))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pyiceberg.table.DataScan object at 0x11c065cd0>
overrides = {'row_filter': GreaterThanOrEqual(term=Reference(name='ts'), literal=literal('2023-03-05T00:00:00+00:00'))}

    def update(self: S, **overrides: Any) -> S:
        """Create a copy of this table scan with updated fields."""
>       return type(self)(**{**self.__dict__, **overrides})
E       TypeError: TableScan.__init__() got an unexpected keyword argument 'partition_filters'

pyiceberg/table/__init__.py:1694: TypeError

DataScan has a cached_property partition_filters (see here) that will turn up in self.__dict__ below in the update method:

    def update(self: S, **overrides: Any) -> S:
        """Create a copy of this table scan with updated fields."""
        return type(self)(**{**self.__dict__, **overrides})

This happens once the cached property has been accessed, i.e. once plan_files has been called on the scan (essentially, once it has been read).
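For illustration, here is a minimal standalone sketch of the mechanism, with no PyIceberg involved. The `Scan` class and its `update_safe` method are hypothetical stand-ins, not PyIceberg code. Filtering the state down to the parameters `__init__` accepts is only one possible workaround, and not necessarily how the library should fix it.

```python
import inspect
from functools import cached_property
from typing import Any, Dict, Optional


class Scan:
    """Hypothetical stand-in for a table scan (not PyIceberg's DataScan)."""

    def __init__(self, row_filter: str = "true", limit: Optional[int] = None) -> None:
        self.row_filter = row_filter
        self.limit = limit

    @cached_property
    def partition_filters(self) -> Dict[str, str]:
        # On first access, functools stores the result in self.__dict__
        # under the key "partition_filters".
        return {"spec-0": self.row_filter}

    def update(self, **overrides: Any) -> "Scan":
        # Same pattern as in the issue: every __dict__ entry becomes a kwarg,
        # including any cached property that has already been computed.
        return type(self)(**{**self.__dict__, **overrides})

    def update_safe(self, **overrides: Any) -> "Scan":
        # Assumed workaround: forward only the names that __init__ accepts.
        accepted = set(inspect.signature(type(self).__init__).parameters) - {"self"}
        state = {k: v for k, v in self.__dict__.items() if k in accepted}
        return type(self)(**{**state, **overrides})


scan = Scan()
scan.update(row_filter="ts >= 1")      # fine: the cache is not populated yet

_ = scan.partition_filters             # populates the cache in scan.__dict__
try:
    scan.update(row_filter="ts >= 1")  # now fails like the report above
except TypeError as exc:
    print(f"update failed: {exc}")

copy = scan.update_safe(row_filter="ts >= 1")  # cached entry is not forwarded
```

The copy returned by `update_safe` starts without a cached `partition_filters`, so the value is recomputed from the new `row_filter` on next access rather than carried over from the old scan.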

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
