Skip to content

fix(security): eliminate SQL injection via config interpolation#21

Merged
gordonmurray merged 1 commit into
mainfrom
fix/sql-injection
Apr 15, 2026
Merged

fix(security): eliminate SQL injection via config interpolation#21
gordonmurray merged 1 commit into
mainfrom
fix/sql-injection

Conversation

@gordonmurray
Copy link
Copy Markdown
Owner

Summary

Every site where user-supplied connection config was f-string-interpolated into DuckDB SQL is now safe. Defence is layered:

  1. Pydantic validators on ConnectionConfig reject SQL-breaking characters at the API boundary. Per-field patterns:
    • endpoint — URL-safe chars
    • region — alphanumeric + dashes
    • sessionToken — base64url + common token chars
    • catalogEndpointhttp(s)://... URL
    • namespace — SQL identifier ([A-Za-z_][A-Za-z0-9_]*)
    • tablePaths3://..., and normalised to strip trailing / and /metadata so downstream code gets a canonical value
    • storageType / catalogType are now Literal enums
  2. DuckDB ? parameter binding for every statement that supports it: all SET s3_*, iceberg_scan(?), iceberg_metadata(?), read_text(?).
  3. _sql_string_literal helper for the two DDLs that don't support ? (CREATE SECRET, ATTACH). It doubles embedded single quotes and rejects control chars — defence in depth on top of the validators.

test_connection no longer mutates config.tablePath; the Pydantic validator normalises once at ingress.

Before / after

Site Before After
SET s3_endpoint='{endpoint}' (MinIO, R2) f-string ? binding
SET s3_region='{region}' f-string ? binding
SET s3_session_token='{token}' f-string ? binding
iceberg_metadata('{path}') f-string ? binding
read_text('{path}/...') f-string ? binding
iceberg_scan('{path}') f-string ? binding
CREATE SECRET ... TOKEN '{k}:{s}' f-string validated + escaped literal
ATTACH '{ns}' ... ENDPOINT '{ep}' f-string validated + escaped literal
SHOW TABLES FROM iceberg_catalog.{ns} f-string validated SQL identifier

Verification (done locally)

Full docker compose down -v && docker compose up --build. Tests against the running stack:

Scenario Result
POST /api/query happy path against demo 37537 rows
POST /api/connect/test with demo (no tablePath) 200 success
POST /api/connect/test with valid tablePath 200 success + suggestedQuery
tablePath with trailing / normalised, query succeeds
tablePath ending in /metadata normalised, query succeeds
endpoint containing '; DROP TABLE x; -- 422
tablePath containing injection 422
region containing injection 422
namespace containing ; 422
storageType: "hacker" 422

Test plan for reviewer

  • docker compose down -v && docker compose up --build
  • curl the demo query (see PR description) — expect 37,537
  • curl an injection attempt — expect 422
  • Check CI goes green (lint + two docker builds)

Follow-ups (out of scope)

  • #6 (sqlglot validator) addresses the user's raw SQL, which is a separate surface. The _convert_to_iceberg_query rewriter still operates on user SQL strings — acceptable because the output goes through DuckDB's own parser, and Replace substring-based destructive keyword check with AST validation #6 will enforce SELECT-only semantics.
  • #10 (tests) will cover the validators and binding sites with automated cases.

Closes #5

Every path that interpolated user-supplied connection config into
DuckDB SQL (SET s3_*, iceberg_scan, iceberg_metadata, read_text,
CREATE SECRET, ATTACH, SHOW TABLES) is now protected by a combination
of:

1. Strict Pydantic validators on ConnectionConfig that reject any
   value containing SQL-breaking characters (quotes, semicolons,
   whitespace, control chars). Patterns are scoped per field:
     - endpoint:        URL-safe chars
     - region:          alphanumeric + dashes
     - sessionToken:    base64url + common token chars
     - catalogEndpoint: http(s):// URL
     - namespace:       SQL identifier
     - tablePath:       s3:// URL, normalised to strip trailing / and
                        /metadata suffix so downstream code can rely
                        on a canonical value
   storageType / catalogType are now Literal enums.

2. DuckDB ? parameter binding for every site that supports it: all
   SET statements (endpoint, region, session token, access/secret
   keys), iceberg_scan(), iceberg_metadata(), read_text().

3. A _sql_string_literal helper used only for CREATE SECRET and
   ATTACH (which don't accept prepared-statement parameters). It
   doubles embedded single quotes and refuses control characters —
   defence in depth on top of the regex validators.

The test_connection flow no longer mutates config.tablePath; the
Pydantic validator normalises once at ingress.

Manually verified with the demo stack: happy-path queries return
37,537 rows; attempts to inject into endpoint, tablePath, region,
namespace, or pass an unknown storageType all return HTTP 422
before any SQL is built.

Closes #5
@gordonmurray gordonmurray merged commit f8ddd12 into main Apr 15, 2026
3 checks passed
@gordonmurray gordonmurray deleted the fix/sql-injection branch April 15, 2026 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: user-supplied values interpolated into DuckDB SQL via f-string

1 participant