-
Notifications
You must be signed in to change notification settings - Fork 110
feat: add fsspec support #444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
❌ DCO Check Failed Hi @Nintorac, your pull request has failed the Developer Certificate of Origin (DCO) check. This repository supports remediation commits, so you can fix this without rewriting history — but you must follow the required message format. 🛠 Quick Fix: Add a remediation commitRun this command: git commit --allow-empty -s -m "DCO Remediation Commit for Nintorac <Nintorac@users.noreply.github.com>
I, Nintorac <Nintorac@users.noreply.github.com>, hereby add my Signed-off-by to this commit: 0ab64663e8039dc85cbfe01fc46f275be779c034"
git push🔧 Advanced: Sign off each commit directlyFor the latest commit: git commit --amend --signoff
git push --force-with-leaseFor multiple commits: git rebase --signoff origin/main
git push --force-with-leaseMore info: DCO check report |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🔴 Require two reviewer for test updatesThis rule is failing.When test data is updated, we require two reviewers
🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
|
@Nintorac this looks interesting, but we would need to validate the dependency impact it brings to docling-core. We are trying to keep docling-core dependencies light and make them optional where possible. Do you think Also, please note we have created a layer for handling workloads on non-local file storage already with docling-jobkit. It implements a number of connectors for different cloud storage. |
|
No need for any further dependencies, the main changes here are to make sure all filesystem calls run through pathlib, UPath implements the pathlib api - so tests also only need to test that it works with pathlib.Path I think if this lib were to implement anything it would be optional and at the fsspec level but this is a simple enough way to get compatibility without extra dependencies and minimal code changes. In case you haven't gone down the rabbit hole fsspec is quite well supported - |
|
Ah sorry, didn't mean to send the last message from that account |
ceberam
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Nintorac for creating this PR.
Please review the contributing guidelines and ensure that the PR follows the conventional commits and that the code checks pass.
|
@Nintorac This is an interesting PR, for finalizing we need
Additionally, (optional) we could add some tests which uses the |
cca2acd to
dd5e1c2
Compare
Enable cloud storage backends (S3, GCS, Azure, etc.) for document serialization methods via fsspec/universal-pathlib integration.
dd5e1c2 to
0ab6466
Compare
|
Sorry, got a little busy. I have fixed the commit message and added some testing :) |
Unfortunately the commit message still doesn't contain the sign-off line. FYI, we usually add it by doing the commit with |
Great project! Thank you :)
Here is a quick and dirty AI sketchup to resolve #443 - still need some tests and polish, but wanted to get some initial feedback. Is this something you're interested in? Is there anything missed?
Cheers :)
some manual test code, instal upath with
pip install universal_pathlib