Subset a PostgreSQL database and anonymize sensitive columns in one step. Extract referentially-intact slices of production data, mask PII, and seed dev, test, or staging environments.
extract- Pick a root table and optional filtering clauses. Fixturize follows foreign keys in both directions (parents and children) to collect a referentially-intact subset of the data. Supports composite FKs, self-referencing tables, and circular dependencies.apply- Load the snapshot into another database. Tables are inserted in FK-dependency order, constraints are deferred.inspect- Display schema structure with FK relationships without extracting any data.analyze- Discover PII columns automatically by scanning column names and types. Covers emails, names, phones, addresses, financial data, API keys, and more. Outputs ready-to-use--maskexpressions.
go build -o ~/bin/fixturize ./cmd/fixturize# one org and everything it touches
fixturize extract --connection "$DB" \
--root "organizations WHERE id = 42"
# 3 random orgs, cap child tables at 500 rows
fixturize extract --connection "$DB" \
--root "organizations ORDER BY random() LIMIT 3" \
--limit 500
# pull in lookup tables that have no FK path, skip audit noise
fixturize extract --connection "$DB" \
--root "organizations WHERE id = 42" \
--include "roles,permissions" \
--exclude "audit_log,event_log"
# preview without writing a file
fixturize extract --connection "$DB" \
--root "users LIMIT 5" --dry-run
# only completed orders for each org
fixturize extract --connection "$DB" \
--root "organizations WHERE id = 42" \
--filter "orders=status='completed'"The --root flag accepts any valid SQL fragment after the table name: WHERE, ORDER BY, LIMIT.
--filter adds a WHERE condition to a specific child table during traversal (repeatable). Format: table=condition.
--include pulls entire tables (all rows) - useful for enums and lookups that aren't FK-linked.
--exclude skips tables completely. You'll get a warning if an excluded table is a FK parent (dangling references).
# load fixture into test DB
fixturize apply --connection "postgresql://..." fixtures/org-42.json
# Wipe target tables first
fixturize apply --connection "postgresql://..." --force fixtures/org-42.jsonPreview the table structure before extraction.
# show all tables with columns, PKs, FKs, unique constraints
fixturize inspect --connection "$DB"
# only tables reachable from users (2 FK hops)
fixturize inspect --connection "$DB" --root users --depth 2Scan schema for PII columns and get ready-to-use --mask expressions:
fixturize analyze --connection "$DB"public.users
email character varying(255) Email MED 'user_' || "id" || '@test.com'
first_name character varying(100) First Name MED 'First' || "id"
last_name character varying(100) Last Name MED 'Last' || "id"
public.contacts
phone character varying(20) Phone MED '+1555' || LPAD(("id" % 10000000)::text, 7, '0')
4 PII column(s) in 2 table(s)
Filter by confidence level or scope to a subgraph:
# only medium+ confidence (skip noisy matches)
fixturize analyze --connection "$DB" --min-confidence medium
# scope to tables reachable from users
fixturize analyze --connection "$DB" --root users --depth 2Detection uses column name patterns (word-level matching, e.g. user_email matches but emailed_at does not) combined with PostgreSQL type checking. Columns that are boolean, timestamp, integer, PK, or FK are automatically excluded to reduce false positives.
Pass --connection or set DATABASE_URL env variable:
export DATABASE_URL="postgresql://user:pass@localhost/mydb"
fixturize extract --root "users LIMIT 10"Replace PII and sensitive columns with SQL expressions during extraction (static data masking — the fixture never contains real values). Expressions run in the SELECT and can reference the same row:
fixturize extract --connection "$DB" \
--root "organizations WHERE id = 42" \
--mask "auth.users.email='user_' || id || '@test.com'" \
--mask "auth.users.name='User ' || id" \
--mask "billing.cards.number='4111111111111111'"Format: schema.table.column=sql_expression (or table.column=expr for public schema).
Since the expression runs as raw SQL in the SELECT, you can use CASE to preserve NULLs:
--mask "users.email=CASE WHEN email IS NOT NULL THEN 'user_' || id || '@test.com' END"Masks are recorded in the fixture metadata (columns masked, expressions used, extraction timestamp) so you have an audit trail of what was anonymized.
Use fixturize analyze to auto-detect which columns need masking and get suggested expressions.
Store default flags in a YAML file instead of typing them every time. Fixturize auto-loads .fixturize.yaml from the current directory, or you can point to a specific file with --profile:
fixturize extract --profile staging.yaml --root "users LIMIT 5"CLI flags always take precedence over profile values.
connection: "postgresql://${DB_USER}:${DB_PASS}@localhost/mydb"
schema: public
masks:
pii:
- "users.email='user_' || id || '@test.com'"
- "users.first_name='First' || id"
- "users.last_name='Last' || id"
billing:
- "billing.cards.number='4111111111111111'"
extract:
root: "organizations WHERE id = 42"
output: fixtures/org-42.json
format: json
limit: 500
depth: 0
statement_timeout: 60
transaction: false
on_conflict_do_nothing: false
include:
- roles
- permissions
exclude:
- audit_log
- event_log
mask_policies:
- pii
- billing
masks:
- "contacts.phone='+1555' || LPAD((id % 10000000)::text, 7, '0')"
apply:
force: false
disable_triggers: false
sync_sequences: true.fixturize.yamlin the working directory is loaded automatically. Pass--profile path/to/file.yamlto use a different file.- Environment variables in the
connectionfield are expanded ($VARand${VAR}syntax). Unresolved variables cause an error so you don't accidentally connect with a blank password. masksat the top level defines reusable mask policies - named groups of mask expressions. Reference them fromextract.mask_policiesto compose sets of masks without repeating yourself. Inlineextract.masksare appended after policy masks.- All fields are optional. Only set what you need - everything else falls back to CLI defaults.
- extraction runs under REPEATABLE READ isolation - consistent snapshot, but holds a transaction open. Use
--statement-timeout(default 30s) to bound query time. --forceon apply truncates target tables before insert. Don't point it at production.- Circular FK dependencies are detected and warned about. The tool handles them via deferred constraints, but review the output.
- generated/identity columns are excluded from extraction and use
OVERRIDING SYSTEM VALUEon apply. - Always mask PII before sharing fixtures across environments, staging databases, or CI/CD pipelines.