Use moveTo API to move files for standard buckets#800
Use moveTo API to move files for standard buckets#800Yonghui-Lee wants to merge 7 commits intofsspec:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #800 +/- ##
==========================================
+ Coverage 76.11% 76.66% +0.55%
==========================================
Files 15 15
Lines 2679 2704 +25
==========================================
+ Hits 2039 2073 +34
+ Misses 640 631 -9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
How long has this API been available? :) |
|
Can you attach the benchmark values of standard buckets for both file and folder renames in the description or in a comment? |
|
|
||
| # mv method is already available as sync method in the fsspec.py | ||
| # Async version of it is introduced here so that mv can be used in async methods. | ||
| # TODO: Add async mv method in the async.py and remove from GCSFileSystem. |
There was a problem hiding this comment.
We can remove this ToDo now, as this is GCS specific custom implementation, we can't copy this method to async.py. We will have to keep it in GCSFileSystem only.
There was a problem hiding this comment.
Actually, the implementation looks generic - but no need to take any action about it now. In fact, I don't expect too much of a performance gain (but maybe some) - we just need to make sure it passes the abstract tests for mv.
There was a problem hiding this comment.
ohh yes agree that it is generic, I missed that moveTo API is abstracted by mv_file.
There was a problem hiding this comment.
Should we consider moving the implementation from GCSFileSystem to fsspec? We might also see performance gains if other fsspec implementations also include atomic mv_file like we see in case of GCSFS.
There is also a significant code duplication in mv method from other fsspec methods like copy which can be extracted into a helper.
There was a problem hiding this comment.
I think it's a good idea. But the default implementation of mv_file of AsyncFileSystem isn't atomic. If we use mv_file for mv by default, there may be performance drop for some fsspec implementations.
There was a problem hiding this comment.
The existing mv implementation in AbstractFileSystem is also not atomic, it relies on a copy+delete logic. Therefore IMO changing from copy+delete to mv_file (which is also not atomic in AsyncFileSystem) should not have any performance implications.
@martindurant Let us know your thoughts about this
It has been available for about one year. https://docs.cloud.google.com/storage/docs/release-notes#February_24_2025 |
|
(e2e test error is not showing up for me) |
Tests didn't run yet, we need to add |
|
/gcbrun |
|
(the CI runs should show up as pending if they have not begun yet, but now I get it) |


I update the implementation of move method for standard bucket to use the atomic moveTo API instead of the non-atomic copy-and-delete.
Key Changes
Performance
For previous copy-and-delete implementation
For moveTo API, tasks are executed in coroutines.
The micro benchmark results show that moveTo is 10-20% faster than copy-and-delete using the same batch size.