Skip to content

fix: handle LakeFS pagination to return all results beyond default 100-item limit#4349

Open
xuang7 wants to merge 2 commits intoapache:mainfrom
xuang7:fix/uncommitted-objects-100-limit
Open

fix: handle LakeFS pagination to return all results beyond default 100-item limit#4349
xuang7 wants to merge 2 commits intoapache:mainfrom
xuang7:fix/uncommitted-objects-100-limit

Conversation

@xuang7
Copy link
Copy Markdown
Contributor

@xuang7 xuang7 commented Apr 6, 2026

What changes were proposed in this PR?

This PR fixes the issue where retrieveUncommittedObjects and retrieveObjectsOfVersion only returned the first page of results (default 100 items). It adds a fetchAllPages method to retrieve the full result set across all pages. Each request sets .amount(1000) to reduce the number of round trips when fetching paginated results.

When uploading 141 files:

Before (only display 0-99) After (0-140)
before after

Any related issues, documentation, discussions?

Resolves #4343

How was this PR tested?

Added a test that uploads 110 files to a temporary repo and verifies that both retrieveUncommittedObjects (before commit) and retrieveObjectsOfVersion (after commit) return all 110 items.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.6

@chenlica
Copy link
Copy Markdown
Contributor

chenlica commented Apr 6, 2026

@carloea2 Please review it before @aicam does the review.

.asScala
.toList
fetchAllPages[Diff] { after =>
val request = branchesApi.diffBranch(repoName, branchName).amount(1000)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make 1000 configurable? Or why 1000? Thanks

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1000 here is the page size for pagination. The fetchallpage will continue requesting pages until all results are retrieved, so I chose a relatively large value to reduce the number of requests. Would you recommend making it configurable, or would using a named constant be sufficient?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it to a named constant PageSize.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I still wonder why 1000?

Copy link
Copy Markdown
Contributor Author

@xuang7 xuang7 Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Edited] 1000 is the maximum page size allowed by lakers (default is 100). I picked 1000 to reduce the number of requests. It can be adjusted if needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks 👍

Copy link
Copy Markdown
Contributor

@aicam aicam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! left one minor comment?

.getResults
.asScala
.toList
fetchAllPages[Diff] { after =>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetchAllPages itself is supposed to return all pages, why do we still check after here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetchAllPages returns all pages as the final result, and it relies on the provided fetch function to retrieve one page at a time. The after parameter is the cursor passed into each request. I will add a comment to make this clearer. Thanks for the suggestion.

@carloea2
Copy link
Copy Markdown
Contributor

carloea2 commented Apr 8, 2026

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Large directory upload shows only 100 completed files

4 participants