
DataScan count method does not respect limit #2121

@jayceslesar

Description


Apache Iceberg version

0.9.1 (latest release)

Please describe the bug 🐞

When calling count() on a DataScan, the limit is not respected. This may seem trivial, but if I set a limit of 5, I expect a count of 5 or fewer rows, at least from a scan-like implementation.
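The expected behavior can be stated as a tiny sketch. `expected_count` is a hypothetical helper written only to pin down the semantics described above; it is not part of pyiceberg.

```python
from typing import Optional


def expected_count(total_rows: int, limit: Optional[int] = None) -> int:
    """Hypothetical helper: the count a limited scan should report."""
    # With no limit, count() should report every matching row.
    if limit is None:
        return total_rows
    # With a limit, the reported count should never exceed it.
    return min(total_rows, limit)
```

For example, a table with 100 matching rows scanned with a limit of 5 should count 5, while a table with only 3 matching rows should still count 3.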

The limit parameter is not passed to the underlying ArrowScan:

https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1940

Because the limit is ignored, count() reads more data than necessary, so scans take longer than they should.

The fix will involve more than just passing the limit to the ArrowScan.
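To illustrate why honoring the limit saves work, here is a minimal sketch of a count that stops early. It assumes, purely for illustration, that the scan is split into tasks whose row counts are known exactly (for example, no residual row filter applies); `count_with_limit` and its inputs are hypothetical and do not reflect pyiceberg's actual internals or the eventual fix.

```python
from typing import Iterable, Optional, Tuple


def count_with_limit(
    task_row_counts: Iterable[int], limit: Optional[int] = None
) -> Tuple[int, int]:
    """Hypothetical sketch: return (row_count, tasks_read).

    Assumes each element of task_row_counts is the exact number of rows
    a scan task would produce. Stops consuming tasks once `limit` rows
    have been counted.
    """
    total = 0
    tasks_read = 0
    for rows in task_row_counts:
        if limit is not None and total >= limit:
            break  # limit already satisfied; skip the remaining tasks
        tasks_read += 1
        total += rows
    if limit is not None:
        # The last task may overshoot the limit, so cap the result.
        total = min(total, limit)
    return total, tasks_read
```

With task row counts of `[3, 4, 10]` and a limit of 5, only the first two tasks are read and the count is 5; without a limit, all three tasks are read and the count is 17.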

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
