Query your Apache Iceberg data lake in seconds. No clusters. No ops. Just SQL.
You have data in Apache Iceberg. You just want to query it.
But here's what you face:
- Trino/Presto — Heavy clusters, complex setup, operational overhead
- AWS Athena — Vendor lock-in, slower iteration, costs add up
- Local DuckDB — Works great solo, painful to share and collaborate
- Spark — Overkill for exploratory queries, slow startup
- Direct Parquet reads — Fast but dangerous: bypasses Iceberg metadata, can return deleted rows
You don't need a hammer when you need a magnifying glass.
Cloudfloe is a lightweight, browser-based SQL interface for Apache Iceberg data lakes, powered by DuckDB.
- Reads Iceberg correctly — uses the metadata layer, validates snapshots
- Instant queries on S3, R2, or MinIO — no data movement
- Browser-based SQL editor — no CLI, no local setup
- Zero lock-in — your data stays where it is
- Sub-second startup — no cluster spin-up time
- Read-only by design — query, don't mutate
Think of it as a web-based scratchpad for your Iceberg data lake.
| Feature | Description |
|---|---|
| Iceberg Native | Reads via `iceberg_scan()` — respects metadata and snapshots |
| Table Validation | Auto-detects row-level deletes and rejects unsafe tables |
| Multi-Cloud | AWS S3, Cloudflare R2, MinIO — any S3-compatible storage |
| Web SQL Editor | Syntax highlighting, query history, sample queries |
| Query Stats | Execution time, bytes scanned, rows returned |
| Docker Ready | One command to run locally |
- Docker and Docker Compose
- S3-compatible storage with an Iceberg table (or use the included demo data)
```bash
git clone https://github.com/gordonmurray/cloudfloe
cd cloudfloe
docker compose up --build
```

Wait about 30 seconds for initialization, then open http://localhost:3000.
On first start, the bundled demo seeds a 37,537-row Iceberg table at s3://movies/warehouse/demo/movies in the local MinIO so you can query it immediately.
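Once it's up, you can query the demo table straight away; this is just the documented `iceberg_scan()` call pointed at the demo path above:

```sql
-- Query the bundled demo table in the local MinIO
SELECT * FROM iceberg_scan('s3://movies/warehouse/demo/movies') LIMIT 10;
```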
In the Connection panel, enter your details:

```text
Storage Type: AWS S3
Endpoint:     s3.amazonaws.com (or leave blank for default)
Table Path:   s3://your-bucket/warehouse/db/table_name
Access Key:   your-access-key
Secret Key:   your-secret-key
Region:       us-east-1
```
Notes:
- Table Path should point to the Iceberg table root (where the `/metadata` folder is located)
- Do not include `/metadata` in the path — Cloudfloe adds it automatically
- Trailing slashes are removed automatically
Click Test Connection. On success, the Connection panel shows the table's Iceberg format version, row count, file count, and last snapshot time — plus a sample query loaded into the editor.
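If you want the same snapshot details from SQL, the metadata functions covered below can fetch them; a minimal sketch, assuming the `snapshot_id` and `timestamp_ms` columns the DuckDB iceberg extension exposes:

```sql
-- Most recent snapshot, per the DuckDB iceberg extension's output columns
SELECT snapshot_id, timestamp_ms
FROM iceberg_snapshots('s3://your-bucket/warehouse/db/table_name')
ORDER BY timestamp_ms DESC
LIMIT 1;
```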
After connection succeeds, a query like this will be auto-loaded:

```sql
SELECT * FROM iceberg_scan('s3://your-bucket/warehouse/db/table_name') LIMIT 10;
```

Click Run Query to see your data.
```sql
SELECT * FROM iceberg_scan('s3://bucket/warehouse/db/table_name') LIMIT 100;
```

```sql
SELECT user_id, event_type, timestamp
FROM iceberg_scan('s3://bucket/warehouse/events/user_events')
WHERE event_type = 'purchase'
  AND timestamp > '2024-01-01'
ORDER BY timestamp DESC;
```

```sql
SELECT
    date_trunc('day', timestamp) as day,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users
FROM iceberg_scan('s3://bucket/warehouse/events/user_events')
GROUP BY day
ORDER BY day DESC;
```

```sql
-- View table snapshots
SELECT * FROM iceberg_snapshots('s3://bucket/warehouse/db/table_name');

-- View manifests and partitions
SELECT * FROM iceberg_metadata('s3://bucket/warehouse/db/table_name');
```

Your AWS credentials need these permissions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```

Before using Cloudfloe, confirm your credentials work:
```bash
aws s3 ls s3://your-bucket/warehouse/db/table_name/metadata/
aws s3 cp s3://your-bucket/warehouse/db/table_name/metadata/version-hint.text -
```

If these work, Cloudfloe will too.
User query credentials are sent per-request in the API body, applied to a short-lived in-memory DuckDB session, and discarded when the connection closes. They are not:
- read from environment variables
- written to disk
- logged
- stored in a database
That means a `docker inspect` on the backend container will not surface the S3 keys a user is querying with.
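To illustrate what that short-lived session looks like at the DuckDB level, here is a sketch using DuckDB's secret manager; it shows the general pattern, not necessarily the backend's exact implementation:

```sql
-- Hypothetical per-request session: the secret exists only inside this
-- in-memory DuckDB connection and vanishes when the connection closes
CREATE SECRET user_s3 (
    TYPE S3,
    KEY_ID 'your-access-key',
    SECRET 'your-secret-key',
    REGION 'us-east-1'
    -- for R2/MinIO, add: ENDPOINT 'xxx.r2.cloudflarestorage.com'
);

SELECT * FROM iceberg_scan('s3://your-bucket/warehouse/db/table_name') LIMIT 10;
```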
For self-hosted deployments, we recommend the same discipline for any service-level credentials you add (e.g. upstream databases, metadata stores):
- Docker Swarm / Compose: use Docker secrets with the `*_FILE` env var convention. Most upstream images (MinIO, Postgres, etc.) support it natively.
- Kubernetes: mount a `Secret` as a file under `/run/secrets/` or a writable config dir.
- AWS: prefer IAM roles for the compute layer (EC2 instance profile, ECS task role, EKS IRSA) over baking AKIAs into env vars.
The bundled `docker-compose.yml` uses plain env vars for the demo MinIO (public credentials `cloudfloe` / `cloudfloe123`) — not because that's the right pattern for real data, but to keep `docker compose up` a single command. Don't copy the demo pattern for production storage.
Supported:
- Iceberg v1 and v2 table formats
- Append-only tables (no deletes)
- Parquet data files
- Time travel queries via snapshots (see the sketch after this list)
- Partition pruning
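Time travel goes through `iceberg_snapshots()` plus the scan's snapshot parameters; a sketch, assuming the `snapshot_from_id` / `snapshot_from_timestamp` options of DuckDB's `iceberg_scan()`:

```sql
-- Pick a snapshot ID from iceberg_snapshots(), then pin the scan to it
SELECT * FROM iceberg_scan(
    's3://bucket/warehouse/db/table_name',
    snapshot_from_id => 1234567890123456789
) LIMIT 10;

-- Or pin to a point in time
SELECT * FROM iceberg_scan(
    's3://bucket/warehouse/db/table_name',
    snapshot_from_timestamp => TIMESTAMP '2024-01-01 00:00:00'
) LIMIT 10;
```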
Not yet supported:
- Row-level deletes (position or equality deletes) — tables with deletes will be rejected
- Write operations — read-only for now
- REST Catalog — direct S3 path access only
- Complex schema evolution
If your table has deletes, compact it first using Spark, Trino, or the Iceberg CLI before querying with Cloudfloe.
Cloudfloe refuses to read tables with position or equality delete manifests — silently returning removed rows would break the "reads Iceberg correctly" promise. Compact the table first:
- Spark: `CALL system.rewrite_data_files('<catalog>.<db>.<table>')`
- Trino: `ALTER TABLE <table> EXECUTE optimize`
- Iceberg CLI: `iceberg rewrite_data_files`
Then re-run the query.
The probe couldn't read any Iceberg metadata at the path. Most common causes, roughly in order:
- Wrong table path. Point at the table root (the directory containing `metadata/` and `data/`), not at `metadata/` itself. Trailing slashes and a trailing `/metadata` are stripped automatically, but a typo in the bucket or table name won't be.
- Missing S3 permissions. Cloudfloe needs `s3:ListBucket` on the bucket and `s3:GetObject` on everything under the table path. See S3 Access Setup. Verify with `aws s3 ls s3://your-bucket/warehouse/db/table_name/metadata/` — if that fails, Cloudfloe will too.
- Wrong region. The Region field must match the bucket's region (AWS S3). For MinIO and R2 the Region value is generally ignored but must still be set.
- Wrong endpoint (R2 / MinIO). Use the full endpoint hostname, without a scheme: `xxx.r2.cloudflarestorage.com`, not `https://xxx.r2.cloudflarestorage.com`.
- Lots of small files — DuckDB can get sluggish past ~10,000 files. The Connection panel shows the file count after a successful probe; if it's high, compact the table.
- No partition filter — `iceberg_scan()` reads all partitions unless your `WHERE` clause prunes them. Always include a partition column predicate on large tables (see the sketch after this list).
- Cold extension — the first query after starting the backend has to download and load the `httpfs` and `iceberg` DuckDB extensions. Subsequent queries are much faster.
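A minimal pruned-query sketch; `event_date` is a hypothetical partition column here, so substitute your table's real partition key:

```sql
-- Hypothetical: user_events partitioned by event_date, so this predicate
-- lets the scan skip every non-matching partition
SELECT COUNT(*) AS events
FROM iceberg_scan('s3://bucket/warehouse/events/user_events')
WHERE event_date = DATE '2024-06-01';
```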
Cloudfloe is read-only by design — the backend parses every query and rejects anything that isn't a single SELECT/WITH/UNION/VALUES statement. Rewrite the query as a SELECT, or use Spark / Trino / DuckDB CLI directly for write workloads.
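As an example of what the guard allows, a single `WITH ... SELECT` still passes, while anything mutating does not; a sketch:

```sql
-- Rejected by the backend: any mutating statement
-- DELETE FROM user_events WHERE event_type = 'test';

-- Accepted: one read-only WITH ... SELECT statement
WITH purchases AS (
    SELECT user_id
    FROM iceberg_scan('s3://bucket/warehouse/events/user_events')
    WHERE event_type = 'purchase'
)
SELECT COUNT(DISTINCT user_id) AS buyers
FROM purchases;
```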
```text
+-----------------+
|    Frontend     |  Nginx + HTML/CSS/JS
|   (Port 3000)   |  CodeMirror SQL Editor
+--------+--------+
         |
         v  HTTP
+--------+--------+
|     Backend     |  FastAPI + Python
|   (Port 8000)   |  DuckDB 1.4.1 + Iceberg Extension
+--------+--------+
         |
         v  S3 API
+--------+--------+
|   S3 Storage    |  AWS S3 / R2 / MinIO
|                 |  Iceberg table (metadata + data)
+-----------------+
```
Run the full stack with Docker:

```bash
docker compose up --build
```

Or run the services directly for development. Backend:

```bash
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload
```

Backend runs on http://localhost:8000.

Frontend:

```bash
cd frontend
python3 -m http.server 3000
```

Frontend runs on http://localhost:3000.
