Support attaching SQLite files residing on S3 #198
Closed
staticlibs wants to merge 4 commits into
When a DuckDB file on S3 is attached as read-only, the remote file system instance (`S3FileSystem` from the `httpfs` extension) is used to perform remote scans over that file. This does not work for SQLite files, because the file path is passed to `sqlite3_open_v2`, which does not know about remote file systems.

Since a remote `ATTACH` is expected to be read-only, one possible workaround is to download the whole `.sqlite` file locally and then attach it as read-only.

This PR implements automatic download of remote SQLite files (`https://`, `s3://`, `abfss://`, etc.) into the system temp directory and opens the file from there (the file is deleted on `DETACH`). It is intended for files of reasonable size in scenarios where a manual download is not possible or convenient, for example opening a remote DuckLake catalog (residing on S3) from a BI tool.

Testing: a new test was added that opens an SQLite file from GitHub; DuckLake catalogs on `s3://` and `abfss://` were also tested manually.

Fixes: duckdb/ducklake#912
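The PR's actual change lives in the extension's C++ code, which is not reproduced here. As a rough illustration only, the download-then-attach-read-only flow can be sketched in Python with the standard `sqlite3` module. `DownloadedSQLiteAttachment` and the `open_remote` callable (a stand-in for a remote file system such as `httpfs`) are hypothetical names, and the sketch assumes the remote object fits comfortably on local disk.

```python
import os
import shutil
import sqlite3
import tempfile
from pathlib import Path
from typing import BinaryIO, Callable


class DownloadedSQLiteAttachment:
    """Hypothetical helper: copy a remote SQLite file into the system temp
    directory, open the local copy read-only, and delete it on detach."""

    def __init__(self, open_remote: Callable[[str], BinaryIO], remote_path: str):
        # open_remote is an assumed stand-in for a remote file system reader.
        fd, self.local_path = tempfile.mkstemp(suffix=".sqlite")
        try:
            with os.fdopen(fd, "wb") as local, open_remote(remote_path) as remote:
                # Download the whole file, since sqlite3 can only open local paths.
                shutil.copyfileobj(remote, local)
            # SQLite URI filename with mode=ro: the copy is opened read-only,
            # mirroring the read-only semantics of a remote ATTACH.
            uri = Path(self.local_path).as_uri() + "?mode=ro"
            self.conn = sqlite3.connect(uri, uri=True)
        except BaseException:
            # Do not leave a partial download behind if anything fails.
            os.remove(self.local_path)
            raise

    def detach(self) -> None:
        # Mirror DETACH: close the connection first, then delete the temp copy.
        self.conn.close()
        os.remove(self.local_path)
```

In use, `open_remote` could be any callable returning a readable binary stream; in a local test, `lambda p: open(p, "rb")` over a plain file is enough to exercise the copy, the read-only open, and the cleanup on `detach()`. Any attempt to write through `conn` should fail with `sqlite3.OperationalError`, because the copy is opened with `mode=ro`.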
Contributor
Thanks! Getting this to work is cool, but I think the VFS direction seems more promising. See #66; maybe that can be picked up again or reworked to get this working? That approach has a bunch of other nice outcomes (e.g. making this extension work in WASM as well).
Member
Author
A VFS would indeed be much nicer than this workaround; closing this one in favour of it.
Contributor
Just wanted to mention that there's what I believe is now a complete VFS implementation in #154.