feat: prefer upload_large_folder when many files #300

Open
drbh wants to merge 1 commit into main from prefer-upload-large-folder-when-needed

@drbh
Collaborator

@drbh drbh commented Feb 27, 2026

This PR adds a path that uses the upload_large_folder API when the build output contains more than 200 files, which helps avoid timeouts for builds with many files. Otherwise, the regular upload_folder is preferred, since it offers more flexibility around delete_patterns and commit_message.
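A minimal sketch of the selection described above, assuming the build variants live in subdirectories of a local build_dir whose names start with "torch" (as the allow_patterns=["torch*"] in the diff suggests) and that files are counted recursively. The helpers count_build_files and choose_upload_method are hypothetical, not part of kernels or huggingface_hub.

```python
from pathlib import Path

# File-count threshold taken from the PR description.
LARGE_FOLDER_THRESHOLD = 200


def count_build_files(build_dir: Path) -> int:
    """Count files under build_dir whose relative path starts with "torch"."""
    return sum(
        1
        for p in build_dir.rglob("*")
        if p.is_file() and p.relative_to(build_dir).as_posix().startswith("torch")
    )


def choose_upload_method(build_dir: Path, threshold: int = LARGE_FOLDER_THRESHOLD) -> str:
    """Return the name of the HfApi method to use for this build."""
    if count_build_files(build_dir) > threshold:
        return "upload_large_folder"
    return "upload_folder"
```

Note that the comparison is strict, so exactly 200 files still goes through upload_folder.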

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sayakpaul sayakpaul left a comment

Should we add a test as well? No strong opinions, of course.

Comment on lines +79 to +90
if file_count > 200:
    print(
        f"⚠️ Found {file_count} files to upload, which exceeds the 200 file limit for a single commit. Deleting old build files and re-uploading the whole build folder to avoid hitting file limits."
    )
    kernel_root_dir = build_dir.parent
    api.upload_large_folder(
        repo_id=repo_id,
        folder_path=kernel_root_dir,
        revision=branch,
        repo_type="model",
        allow_patterns=["build/torch*"],
    )
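The two code paths scope the upload differently: upload_folder uses allow_patterns=["torch*"] (apparently relative to the build directory, judging from the diff), while upload_large_folder walks kernel_root_dir with allow_patterns=["build/torch*"]. A small sketch for checking that both select the same files, assuming Unix shell-style wildcard matching (modelled here with Python's fnmatch) and a repository layout where builds sit under build/. The helper names are hypothetical.

```python
from fnmatch import fnmatch


def _matches(rel_path, patterns):
    # fnmatch's "*" also matches "/", so "torch*" covers nested files too.
    return any(fnmatch(rel_path, pattern) for pattern in patterns)


def selected_by_upload_folder(repo_paths):
    """Files picked by folder_path=build_dir, allow_patterns=["torch*"]."""
    prefix = "build/"
    return {
        p for p in repo_paths
        if p.startswith(prefix) and _matches(p[len(prefix):], ["torch*"])
    }


def selected_by_upload_large_folder(repo_paths):
    """Files picked by folder_path=kernel_root_dir, allow_patterns=["build/torch*"]."""
    return {p for p in repo_paths if _matches(p, ["build/torch*"])}
```

Here repo_paths are POSIX paths relative to the kernel root directory; equal result sets for a representative tree suggest the allow patterns are equivalent, although this says nothing about deletions.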
Member

We should not just remove old build files, since it can break our version contract. Also, if we automatically switch to a different upload type, the behavior should be exactly the same.

Is it possible to add delete_patterns to upload_large_folder?
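upload_large_folder in huggingface_hub does not appear to accept delete_patterns or commit_message (it takes allow_patterns and ignore_patterns), so matching upload_folder would likely mean computing the deletions separately and applying them in a follow-up commit. A minimal sketch of that computation, assuming delete patterns use shell-style wildcards (modelled with fnmatch), that files being re-uploaded should not be deleted, and that .gitattributes is always kept; files_to_delete is a hypothetical helper, and paths are repo-relative rather than relative to a path_in_repo.

```python
from fnmatch import fnmatch


def files_to_delete(remote_files, local_files, delete_patterns):
    """Return remote paths that an upload_folder-style delete would remove.

    remote_files: paths currently in the target revision (repo-relative, POSIX).
    local_files: repo-relative paths about to be uploaded.
    delete_patterns: shell-style wildcards matched against remote paths.
    """
    uploading = set(local_files)
    return sorted(
        path
        for path in remote_files
        # Assumption: .gitattributes is never deleted, even if it matches.
        if path != ".gitattributes"
        # Files that are re-uploaded get overwritten, not deleted.
        and path not in uploading
        and any(fnmatch(path, pattern) for pattern in delete_patterns)
    )
```

Even with this, the upload and the deletion land in separate commits, so the result is not exactly equivalent to a single upload_folder commit.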

if p.is_file() and p.relative_to(build_dir).as_posix().startswith("torch")
)

if file_count > 200:
Member

Docs:

When dealing with a large folder (thousands of files or hundreds of GB), we recommend using upload_large_folder() instead.
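The quoted documentation describes "large" in terms of both file count and total size. A sketch of a decision that considers both, assuming the caller already has the list of local files to upload; apart from the 200-file limit in the diff, the size threshold below is a placeholder, not a value from huggingface_hub or this PR, and needs_large_folder_upload is hypothetical.

```python
import os


def needs_large_folder_upload(paths, max_files=200, max_bytes=50 * 1024**3):
    """Decide whether to switch to upload_large_folder.

    paths: local file paths that would be uploaded.
    max_files: file-count limit (200 matches the diff).
    max_bytes: total-size limit; the 50 GiB default is a placeholder.
    """
    count = 0
    total = 0
    for path in paths:
        count += 1
        total += os.path.getsize(path)
        # Stop early once either limit is exceeded.
        if count > max_files or total > max_bytes:
            return True
    return False
```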

delete_patterns=list(delete_patterns),
commit_message="Build uploaded using `kernels`.",
allow_patterns=["torch*"],
file_count = sum(
Member

I think this should be a separate function.
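One way the split could look, sketched under assumptions: api is a huggingface_hub.HfApi (or any object accepting the same keyword arguments for upload_folder and upload_large_folder), build_dir is named build so that upload_folder with path_in_repo="build" and upload_large_folder from build_dir.parent produce the same repo layout, and all function names are hypothetical. Passing api in also makes the dispatch easy to unit-test with a fake object, which would address the test question raised earlier.

```python
from pathlib import Path

LARGE_FOLDER_THRESHOLD = 200  # file-count threshold from the PR


def _upload_regular(api, repo_id: str, build_dir: Path, branch: str, delete_patterns) -> None:
    # Single commit, with deletions and a custom commit message.
    api.upload_folder(
        repo_id=repo_id,
        folder_path=build_dir,
        path_in_repo="build",  # assumption: builds live under build/ in the repo
        revision=branch,
        repo_type="model",
        allow_patterns=["torch*"],
        delete_patterns=list(delete_patterns),
        commit_message="Build uploaded using `kernels`.",
    )


def _upload_large(api, repo_id: str, build_dir: Path, branch: str) -> None:
    # Multi-commit upload; there is no delete_patterns or commit_message here,
    # so stale files are NOT removed by this path.
    api.upload_large_folder(
        repo_id=repo_id,
        folder_path=build_dir.parent,
        revision=branch,
        repo_type="model",
        allow_patterns=["build/torch*"],
    )


def upload_build(api, repo_id: str, build_dir: Path, branch: str, delete_patterns, file_count: int) -> str:
    """Pick an upload strategy from the file count and return its name."""
    if file_count > LARGE_FOLDER_THRESHOLD:
        _upload_large(api, repo_id, build_dir, branch)
        return "upload_large_folder"
    _upload_regular(api, repo_id, build_dir, branch, delete_patterns)
    return "upload_folder"
```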

4 participants