@crypdick
Contributor

Description

Automatically exclude common directories (.git, .venv, venv, __pycache__) when uploading working_dir in runtime environment packages.

At a minimum we need to exclude .git/ because unlike the others, nobody includes .git/ in .gitignore. This causes Ray to throw a ray.exceptions.RuntimeEnvSetupError if your .git dir is larger than 512 MiB.
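Without built-in defaults, the same effect can be approximated by passing the patterns explicitly through the existing `excludes` field of `runtime_env`. The following is a minimal sketch: `COMMON_EXCLUDES` and `build_runtime_env` are hypothetical names used only for illustration, the list simply mirrors the directories named in this description (the exact defaults in the PR may differ), and the `ray.init` call is left commented out so the snippet runs without a cluster.

```python
# Hypothetical list mirroring the directories named in this PR description;
# the exact defaults added by the PR may differ.
COMMON_EXCLUDES = [".git", ".venv", "venv", "__pycache__"]


def build_runtime_env(working_dir, extra_excludes=None):
    """Return a runtime_env dict that skips common bulky directories."""
    excludes = list(COMMON_EXCLUDES)
    for pattern in extra_excludes or []:
        # Avoid listing the same pattern twice.
        if pattern not in excludes:
            excludes.append(pattern)
    return {"working_dir": working_dir, "excludes": excludes}


runtime_env = build_runtime_env(".", extra_excludes=["data/*.parquet"])

# With Ray installed, the dict would be passed like this:
# import ray
# ray.init(runtime_env=runtime_env)
```

The helper only builds the dictionary; whether a given pattern matches a path is decided by Ray when it packages working_dir.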

I also updated the documentation in handling-dependencies.rst and improved the error message if the env exceeds the GCS_STORAGE_MAX_SIZE limit.

Related issues

N/A

Additional information

This PR pytorch/tutorials#3709 was failing to run because the PyTorch tutorials .git/ folder is huge.

@crypdick crypdick requested review from a team as code owners December 19, 2025 03:28
Ricardo Decal added 2 commits December 18, 2025 19:29
Signed-off-by: Ricardo Decal <public@ricardodecal.com>
Signed-off-by: Ricardo Decal <public@ricardodecal.com>
@crypdick crypdick force-pushed the bugfix/default-excludes-working-dir branch from 57c19af to 631fa2e Compare December 19, 2025 03:29
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a helpful feature to automatically exclude common directories like .git and venv from working_dir uploads, preventing common errors with large repositories. The implementation is clean, and it's great to see that it's accompanied by thorough documentation updates and both unit and integration tests. My only suggestion is a minor improvement to the type hinting for better code clarity.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ricardo Decal <crypdick@users.noreply.github.com>
@ray-gardener ray-gardener bot added docs An issue or change related to documentation core Issues that should be addressed in Ray Core labels Dec 19, 2025
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
@iamjustinhsu
Contributor

Nice nice. Just some questions:

Can you clarify the behavior in the following scenarios:

  • excludes=[], .rayignore=["file.txt"]
  • excludes=["file.txt"], .rayignore=[]
  • excludes=["file.txt"], .rayignore=["file.txt"]
  • excludes=["file.txt"], .rayignore=["file2.txt"]

I'm wondering

  1. Why do we have "excludes" when we had ".gitignore" previously?
  2. Is the use-case necessary if users can specify venv, .git, __pycache__ in their .rayignore file?

@github-actions

github-actions bot commented Jan 3, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 3, 2026
@crypdick crypdick added the unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. label Jan 4, 2026
@crypdick
Contributor Author

crypdick commented Jan 5, 2026

Thanks for the review @iamjustinhsu. Some thoughts:

  1. The sources are unioned, so a file is excluded if it matches any source. In your first three examples, file.txt and the default directories are excluded; in the last example, file.txt, file2.txt, and the default directories are excluded.
  2. The excludes param existed before this PR. It's a programmatic way to specify exclusions instead of having to edit static files (.gitignore, .rayignore). This PR just adds some default values.
  3. So yes, users are able to get this behavior by manually creating a .rayignore. This PR is about improving the default UX.
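To make the union semantics concrete, here is a small sketch of the four scenarios from the question. It is not Ray's implementation: `DEFAULT_EXCLUDES` and `is_excluded` are hypothetical names, the default list is assumed from the PR description, and the matching uses Python's `fnmatch` for simplicity rather than full gitignore-style rules.

```python
from fnmatch import fnmatch

# Assumed defaults, for illustration only.
DEFAULT_EXCLUDES = [".git", ".venv", "venv", "__pycache__"]


def is_excluded(path, excludes=(), rayignore=()):
    """Return True if the path matches a pattern from ANY source (union)."""
    patterns = set(DEFAULT_EXCLUDES) | set(excludes) | set(rayignore)
    parts = path.split("/")
    for pattern in patterns:
        # Match the whole relative path, or any single path component.
        if fnmatch(path, pattern) or any(fnmatch(p, pattern) for p in parts):
            return True
    return False


# The four scenarios from the review comment: (excludes, .rayignore).
scenarios = [
    ([], ["file.txt"]),
    (["file.txt"], []),
    (["file.txt"], ["file.txt"]),
    (["file.txt"], ["file2.txt"]),
]
results = [
    (is_excluded("file.txt", ex, ri), is_excluded("file2.txt", ex, ri))
    for ex, ri in scenarios
]
```

In this sketch, file.txt is excluded in all four scenarios, file2.txt only in the last one, and anything under .git/ is excluded regardless of what the user passes.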

I don't think users should have to learn about .rayignore in order to use Ray for the first time. For example, I am in the process of submitting a bunch of tutorials to the official PyTorch docs, and anyone who tries to run them will immediately get a RuntimeEnvSetupError because pytorch/tutorials/.git/ is so large. .git/ is never included in .gitignore, so users would always have to create a .rayignore file in their repos to prevent this behavior.

If I may flip the question: why is it desirable for Ray to upload .git/, .venv/, and __pycache__/ to workers by default? I don't see any value in doing this, only downsides: extra overhead, and the potential for production workloads to break. Ray should just work.

