fix(security): eliminate SQL injection via config interpolation #21
Merged
Every path that interpolated user-supplied connection config into
DuckDB SQL (SET s3_*, iceberg_scan, iceberg_metadata, read_text,
CREATE SECRET, ATTACH, SHOW TABLES) is now protected by a combination
of:
1. Strict Pydantic validators on ConnectionConfig that reject any
value containing SQL-breaking characters (quotes, semicolons,
whitespace, control chars). Patterns are scoped per field:
- endpoint: URL-safe chars
- region: alphanumeric + dashes
- sessionToken: base64url + common token chars
- catalogEndpoint: http(s):// URL
- namespace: SQL identifier
- tablePath: s3:// URL, normalised to strip trailing / and
/metadata suffix so downstream code can rely
on a canonical value
storageType / catalogType are now Literal enums.
2. DuckDB ? parameter binding for every site that supports it: all
SET statements (endpoint, region, session token, access/secret
keys), iceberg_scan(), iceberg_metadata(), read_text().
3. A _sql_string_literal helper used only for CREATE SECRET and
ATTACH (which don't accept prepared-statement parameters). It
doubles embedded single quotes and refuses control characters —
defence in depth on top of the regex validators.
The test_connection flow no longer mutates config.tablePath; the
Pydantic validator normalises once at ingress.
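The ingress normalisation can be sketched as a plain function (hypothetical name; per the description above, the PR does this inside the Pydantic validator):

```python
def normalise_table_path(path: str) -> str:
    """Canonicalise a tablePath: strip trailing slashes and a trailing
    /metadata segment, so every downstream call site can assume the
    bare table root and no later code needs to mutate the config."""
    path = path.rstrip("/")
    if path.endswith("/metadata"):
        path = path[: -len("/metadata")]
    return path
```

Both `s3://bucket/tbl/` and `s3://bucket/tbl/metadata` normalise to the same canonical `s3://bucket/tbl`.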
Manually verified with the demo stack: happy-path queries return
37,537 rows; attempts to inject into endpoint, tablePath, region, or
namespace, or to pass an unknown storageType, all return HTTP 422
before any SQL is built.
Closes #5
## Summary
Every site where user-supplied connection config was f-string-interpolated into DuckDB SQL is now safe. Defence is layered:

- Strict Pydantic validators on `ConnectionConfig` reject SQL-breaking characters at the API boundary. Per-field patterns:
  - `endpoint` — URL-safe chars
  - `region` — alphanumeric + dashes
  - `sessionToken` — base64url + common token chars
  - `catalogEndpoint` — `http(s)://...` URL
  - `namespace` — SQL identifier (`[A-Za-z_][A-Za-z0-9_]*`)
  - `tablePath` — `s3://...`, normalised to strip trailing `/` and `/metadata` so downstream code gets a canonical value
  - `storageType` / `catalogType` are now `Literal` enums
- DuckDB `?` parameter binding for every statement that supports it: all `SET s3_*`, `iceberg_scan(?)`, `iceberg_metadata(?)`, `read_text(?)`.
- A `_sql_string_literal` helper for the two DDLs that don't support `?` (`CREATE SECRET`, `ATTACH`). It doubles embedded single quotes and rejects control chars — defence in depth on top of the validators.
- `test_connection` no longer mutates `config.tablePath`; the Pydantic validator normalises once at ingress.
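The per-field allow-list idea can be illustrated with plain `re` (a sketch with hypothetical patterns and function name; the real validators live on the Pydantic `ConnectionConfig` model):

```python
import re

# Hypothetical per-field allow-list patterns, mirroring the ones
# described above; none of these admit quotes, semicolons, or spaces.
FIELD_PATTERNS = {
    "region": re.compile(r"[A-Za-z0-9-]+"),              # alphanumeric + dashes
    "namespace": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),  # SQL identifier
    "catalogEndpoint": re.compile(r"https?://\S+"),      # http(s):// URL
}

def validate_field(name: str, value: str) -> str:
    """Reject any value that doesn't match its field's allow-list,
    before any SQL is built (mirrors the HTTP 422 behaviour)."""
    if not FIELD_PATTERNS[name].fullmatch(value):
        raise ValueError(f"invalid {name}: {value!r}")
    return value
```

Rejecting at the boundary means injection attempts never reach the SQL-building code at all, regardless of how the value would later be quoted.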
## Before / after

| Interpolation site | Now |
| --- | --- |
| `SET s3_endpoint='{endpoint}'` (MinIO, R2) | `?` binding |
| `SET s3_region='{region}'` | `?` binding |
| `SET s3_session_token='{token}'` | `?` binding |
| `iceberg_metadata('{path}')` | `?` binding |
| `read_text('{path}/...')` | `?` binding |
| `iceberg_scan('{path}')` | `?` binding |
| `CREATE SECRET ... TOKEN '{k}:{s}'` | `_sql_string_literal` |
| `ATTACH '{ns}' ... ENDPOINT '{ep}'` | `_sql_string_literal` |
| `SHOW TABLES FROM iceberg_catalog.{ns}` | validated identifier |
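The `?`-binding rows follow the standard placeholder pattern: the value travels out-of-band and is never spliced into the SQL text. A self-contained sketch using stdlib `sqlite3` (DuckDB's Python client accepts the same `?` placeholders, but sqlite3 keeps the example runnable anywhere):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE paths (p TEXT)")

# A hostile value full of SQL metacharacters; because it is bound as a
# parameter rather than interpolated, quotes and semicolons are inert data.
hostile = "s3://bucket/t'; DROP TABLE paths; --"
con.execute("INSERT INTO paths VALUES (?)", (hostile,))

row = con.execute("SELECT p FROM paths WHERE p = ?", (hostile,)).fetchone()
```

The hostile string round-trips unchanged and the table survives, which is exactly the property the binding rows above rely on.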
## Verification (done locally)

Full `docker compose down -v && docker compose up --build`. Tests against the running stack:

| Case | Result |
| --- | --- |
| `POST /api/query` happy path against demo | `37537` rows |
| `POST /api/connect/test` with demo (no `tablePath`) | `200` success |
| `POST /api/connect/test` with valid `tablePath` | `200` success + `suggestedQuery` |
| `tablePath` with trailing `/` | normalised, accepted |
| `tablePath` ending in `/metadata` | normalised, accepted |
| `endpoint` containing `'; DROP TABLE x; --` | `422` |
| `tablePath` containing injection | `422` |
| `region` containing injection | `422` |
| `namespace` containing `;` | `422` |
| `storageType: "hacker"` | `422` |
## Test plan for reviewer

1. `docker compose down -v && docker compose up --build`
2. `curl` the demo query (see PR description) — expect 37,537 rows
3. `curl` an injection attempt — expect 422
## Follow-ups (out of scope)

- #6 (sqlglot validator) addresses the user's raw SQL, which is a separate surface. The `_convert_to_iceberg_query` rewriter still operates on user SQL strings — acceptable because the output goes through DuckDB's own parser, and "Replace substring-based destructive keyword check with AST validation" (#6) will enforce SELECT-only semantics.
- #10 (tests) will cover the validators and binding sites with automated cases.

Closes #5