diff --git a/.gitignore b/.gitignore
index 6cf9326..fea65f3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,6 +5,8 @@
 
 # Testing artefacts
 .temp-profile
+tests/.env
+tests/.env.local
 
 # logs
 geckodriver.log
diff --git a/docs/test-plan.md b/docs/test-plan.md
new file mode 100644
index 0000000..a4265eb
--- /dev/null
+++ b/docs/test-plan.md
@@ -0,0 +1,162 @@
+# Selenium Test Harness — Improvement Plan
+
+Date: 2026-04-30
+
+Overview
+
+This document captures an actionable plan to improve the Selenium-based integration tests in `tests/test.py` for the Zeeschuimer Firefox extension. The goals are to:
+
+- Make profile handling reliable and reusable (so logged-in sessions persist across runs).
+- Preserve and export captured data per platform for offline analysis and for passing to 4CAT.
+- Add optional automated upload to a 4CAT instance for mapping/validation tests.
+- Reduce fragility caused by popups and interactive dialogs (pausing/dismissal patterns).
+- Improve robustness, error handling, and machine-readable results.
+
+Scope
+
+All changes are confined to the test harness and test metadata (`tests/test.py` and `tests/tests.json`) and to this planning document. No changes are required in the extension source for the planned items (the test harness will interact with the extension's UI pages and background DB).
+
+Phases & Changes
+
+Phase 1 — Profile management
+
+- Problem: copying an entire profile can race with a running Firefox and the current ignore rule hides potentially useful session data.
+- Changes:
+  - Detect if the selected profile directory appears locked (presence of `lock` or `.parentlock`; on Windows, `parent.lock`) and warn if Firefox is running.
+  - Replace the naive ignore lambda used in `shutil.copytree` with a function that only excludes `storage`, `extensions`, and `signedInUser.json` at the profile root.
+  - Add CLI flags: `--profile-name NAME` (choose profile by display name from `profiles.ini`), `--save-profile PATH` (save the temp profile for reuse), and `--no-cleanup` (do not remove `.temp-profile` after run).
+
+Implementation note (copytree ignore example):
+
+```python
+def _profile_ignore(root, names):
+    # Only ignore these entries in the root profile dir
+    if os.path.abspath(root) == os.path.abspath(profile_dir):
+        return {"storage", "extensions", "signedInUser.json"}
+    return set()
+
+shutil.copytree(profile_dir, profile_file, ignore=_profile_ignore)
+```
+
+Phase 2 — Data preservation & export
+
+- Problem: `reset-all` wipes the DB before each URL; no artifacts are kept for post-mortem or mapping tests.
+- Decision: export a single combined NDJSON file per platform containing items collected while testing that platform.
+- Changes:
+  - Add CLI `--export-dir PATH` (default `./zeeschuimer-exports/{timestamp}/`).
+  - Before clicking `reset-all` for each URL, read the current DB contents from the extension background page (Dexie) via `execute_async_script` and append those items to a per-platform in-memory list in Python. After all URLs for a platform are done, write `{export-dir}/{platform}.ndjson`.
+  - Optionally add `--no-reset` to skip the `reset-all` call entirely (default behavior remains to reset before each URL).
+
+`execute_async_script` pattern (example):
+
+```python
+script = '''
+const cb = arguments[arguments.length - 1];  // Selenium passes the callback as the last argument
+background.db.items.toArray().then(items => cb(JSON.stringify(items))).catch(e => cb(JSON.stringify({error: String(e)})));
+'''
+items_json = driver.execute_async_script(script)
+items = json.loads(items_json)
+```
+
+Phase 3 — 4CAT integration (optional)
+
+- Problem: mapping tests live in 4CAT and need NDJSON input.
+- Changes:
+  - Add CLI flags: `--4cat-url URL` and `--4cat-key KEY` (API key). Require both for upload.
+  - After writing the per-platform NDJSON, POST it to `{4cat_url.rstrip('/')}/api/import-dataset/` with header `X-Zeeschuimer-Platform: {platform}` and `Authorization: {key}` (confirm header with your 4CAT instance; alternative is to trigger the extension UI upload button when cookie-based auth is required).
+  - Do not fail the test run on 4CAT errors — print status and continue.
+
+Example upload with `requests`:
+
+```python
+import requests
+with open(ndjson_path, 'rb') as f:
+    headers = {
+        'X-Zeeschuimer-Platform': platform,
+        'Authorization': fourcat_key  # raw key; confirm the header format with your 4CAT instance
+    }
+    r = requests.post(f"{fourcat_url.rstrip('/')}/api/import-dataset/", headers=headers, data=f, timeout=60)
+    # check r.status_code and r.text for details
+```
+
+Phase 4 — Interactive controls & popup dismissals
+
+- Problem: cookie banners, paywall prompts, and other popups frequently interfere with automated navigation and can cause false failures.
+- Decision: pause by default **once per platform** (not before every URL) so the tester can clear residual prompts; provide opt-out and finer-grained options.
+- Changes:
+  - CLI flags: `--no-interactive` (disable all pauses), `--pause-before-url` (pause before each URL), `--pause-on-fail` (pause on failure), `--extra-wait N` (add N seconds to every wait), `--screenshot-dir PATH` (capture screenshots on fail/warning).
+  - Add a `dismiss-selectors` optional field in `tests.json` per URL: a list of CSS selectors to click to dismiss known popups. Example:
+
+```json
+"dismiss-selectors": ["button.cookie-accept", ".modal .close"]
+```
+
+  - Add per-URL `timeout` (page load timeout override).
+
+Phase 5 — Runner robustness & reporting
+
+- Problem: unhandled exceptions abort the run; final runtime is calculated incorrectly; no machine-readable results.
+- Changes:
+  - Wrap each URL test body in try/except, increment `failed` on exceptions, and continue.
+  - Move the global `start_time = time.time()` to before the outer platform loop so the final elapsed time is for the full run.
+  - Add CLI flags: `--results-file PATH` (write JSON summary), `--resume-from PLATFORM` (skip earlier platforms), and `--screenshot-dir PATH` (as noted).
+  - Fix small test metadata issues (e.g., `more-after-scrolll` typo in `tests.json`).
+
+tests.json schema additions
+
+- Per-URL optional fields:
+  - `dismiss-selectors`: array of CSS selectors to click after page load
+  - `timeout`: numeric page load timeout seconds for this URL
+  - `extra-wait`: per-URL additional wait seconds
+
+CLI flags (summary)
+
+- `--profiledir PATH` — explicit profile path (existing)
+- `--profile-name NAME` — choose Firefox profile by display name
+- `--save-profile PATH` — persist the copied profile for reuse
+- `--no-cleanup` — keep `.temp-profile`
+- `--export-dir PATH` — where to write NDJSON exports
+- `--no-reset` — do not click `reset-all` between URLs
+- `--4cat-url URL` — base URL for 4CAT server
+- `--4cat-key KEY` — API key for 4CAT uploads
+- `--4cat-per-url` — upload per URL instead of per platform (optional)
+- `--no-interactive` — disable pausing (default is to pause per-platform)
+- `--pause-before-url` — pause before each URL
+- `--pause-on-fail` — pause when a test fails
+- `--extra-wait N` — add N seconds to every URL wait
+- `--screenshot-dir PATH` — save screenshots on fail/warning
+- `--results-file PATH` — write machine-readable results JSON
+- `--resume-from PLATFORM` — resume a run from a platform
+
+Verification checklist
+
+1. `python tests/test.py --sources instagram.com --export-dir ./exports` -> `exports/instagram.com.ndjson` exists and contains NDJSON with captured items.
+2. `python tests/test.py --save-profile .saved-profile --login` -> creates a saved profile that can be reused with `--profiledir .saved-profile`.
+3. Run with default interactive behavior and confirm one pause per platform.
+4. `python tests/test.py --results-file results.json` -> JSON summary produced with per-URL status and counts.
+5. Test 4CAT upload using a local mock server and `--4cat-url http://localhost:8000 --4cat-key KEY`.
+
+Implementation steps (recommended order)
+
+1. Docs and small fixes (this document + tests.json typo fix).
+2. Profile management changes (`--profile-name`, improved copy ignore, `--save-profile`, lock detection).
+3. Export behavior: `--export-dir` + `execute_async_script` collection and NDJSON write.
+4. Runner robustness: try/except around URL loop, `--results-file`, fix `start_time` placement.
+5. Interactive and dismissal features (`dismiss-selectors`, pause flags, screenshots).
+6. 4CAT upload integration (optional, requires confirmation of auth header).
+
+Estimated effort: 6–10 hours of focused work to implement and test everything end-to-end; can be split into 3–4 incremental PRs.
+
+Open questions / confirmations needed
+
+- Confirm 4CAT API key header format (currently suggested: `Authorization: {key}`). If your 4CAT requires cookie-based auth, we should emulate the extension upload button via Selenium instead.
+- Confirm desired default for interactive mode. (Current recommendation: pause once per platform by default; provide `--no-interactive` to run unattended, without any pauses.)
+
+Next steps
+
+- I have created a matching TODO list in the session tracker and written this document to `docs/test-plan.md`.
+- If you want, I can start implementing Phase 1 (profile management) in `tests/test.py` now and submit incremental changes.
+
+---
+
+Requested file: `docs/test-plan.md`
diff --git a/js/lib.js b/js/lib.js
index e38430e..c618a6a 100644
--- a/js/lib.js
+++ b/js/lib.js
@@ -57,6 +57,12 @@ class MissingMappedField {
     toString() {
         return `${this.value}`;
     }
+
+    // Mirror 4CAT's API serialization so JSON.stringify produces the same
+    // tagged form on both sides. See docs/4cat-map-item-api.md.
+    toJSON() {
+        return { __missing: true, value: this.value };
+    }
 }
 
 /**
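
Note on the `toJSON()` addition to `js/lib.js`: the sketch below illustrates how `JSON.stringify` picks up the tagged form. The class here is a cut-down stand-in, and its constructor is an assumption since the real one is not shown in this hunk.

```javascript
// Stand-in for MissingMappedField from js/lib.js; the constructor is assumed.
class MissingMappedField {
    constructor(value) {
        this.value = value;
    }

    toString() {
        return `${this.value}`;
    }

    // JSON.stringify calls toJSON() automatically when it is defined.
    toJSON() {
        return { __missing: true, value: this.value };
    }
}

// Hypothetical mapped item with one missing field.
const mapped = { id: '123', author: new MissingMappedField('') };
const serialized = JSON.stringify(mapped);
console.log(serialized);
// {"id":"123","author":{"__missing":true,"value":""}}
```

Without `toJSON()`, the same call would serialize the instance's own enumerable properties, so the `__missing` tag would be absent.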
diff --git a/modules/package.json b/modules/package.json
new file mode 100644
index 0000000..3dbc1ca
--- /dev/null
+++ b/modules/package.json
@@ -0,0 +1,3 @@
+{
+  "type": "module"
+}
diff --git a/tests/.env.example b/tests/.env.example
new file mode 100644
index 0000000..2e021bb
--- /dev/null
+++ b/tests/.env.example
@@ -0,0 +1,9 @@
+# 4CAT API config for the map_item comparison tests.
+# Copy this file to .env in this directory and fill in real values.
+# .env is gitignored; .env.example is the committed template.
+
+# Base URL of the 4CAT instance to hit. No trailing slash.
+FOURCAT_URL=http://localhost
+
+# API key for that 4CAT instance. Get one from the 4CAT UI; tied to your user.
+FOURCAT_API_KEY=your-api-key-here
diff --git a/tests/__pycache__/test.cpython-39.pyc b/tests/__pycache__/test.cpython-39.pyc
new file mode 100644
index 0000000..745e2b4
Binary files /dev/null and b/tests/__pycache__/test.cpython-39.pyc differ
diff --git a/tests/_module-info.js b/tests/_module-info.js
new file mode 100644
index 0000000..e261e4e
--- /dev/null
+++ b/tests/_module-info.js
@@ -0,0 +1,45 @@
+/**
+ * Shared helper for the map_item test drivers.
+ *
+ * Pre-validates a module by:
+ *   1. Running `node --check` on its file (syntax check; avoids the
+ *      worker-killing experimental-ESM crash when a syntax error reaches
+ *      the dynamic importer).
+ *   2. Dynamically importing it and checking for a `map_item` export.
+ *
+ * Returns one of four states the test driver can branch on:
+ *   { state: 'ok',           map_item: <fn> }
+ *   { state: 'no_map_item' }
+ *   { state: 'syntax_error', error: <string> }
+ *   { state: 'import_error', error: <Error> }
+ */
+
+import { spawnSync } from 'node:child_process';
+import { join, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const MODULES_ROOT = join(__dirname, '..', 'modules');
+
+function check_module_syntax(module_name) {
+    const module_path = join(MODULES_ROOT, `${module_name}.js`);
+    const result = spawnSync(process.execPath, ['--check', module_path], { encoding: 'utf8' });
+    if (result.status === 0) return null;
+    return (result.stderr || result.stdout || `exit code ${result.status}`).trim();
+}
+
+export async function inspect_module(module_name) {
+    const syntax_error = check_module_syntax(module_name);
+    if (syntax_error) {
+        return { state: 'syntax_error', error: syntax_error };
+    }
+    try {
+        const mod = await import(`../modules/${module_name}.js`);
+        if (typeof mod.map_item !== 'function') {
+            return { state: 'no_map_item' };
+        }
+        return { state: 'ok', map_item: mod.map_item };
+    } catch (e) {
+        return { state: 'import_error', error: e };
+    }
+}
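
Note on `tests/_module-info.js`: a minimal sketch of how a caller might branch on the four states returned by `inspect_module()`. The helper name `summarize_module_state` and the sample result objects are hypothetical; only the state names and the shape of `error` come from the file above.

```javascript
// Hypothetical helper: turn an inspect_module() result into a one-line summary.
function summarize_module_state(module_name, info) {
    switch (info.state) {
        case 'ok':
            return `${module_name}: map_item available`;
        case 'no_map_item':
            return `${module_name}: no map_item export, nothing to test`;
        case 'syntax_error':
            // error is the trimmed stderr text from `node --check` (a string)
            return `${module_name}: syntax error: ${info.error}`;
        case 'import_error':
            // error is the Error thrown by the dynamic import
            return `${module_name}: import failed: ${info.error.message}`;
        default:
            throw new Error(`unknown state: ${info.state}`);
    }
}

// Hand-written sample results, one per state.
const samples = [
    ['tiktok', { state: 'ok', map_item: item => item }],
    ['example', { state: 'no_map_item' }],
    ['broken', { state: 'syntax_error', error: 'SyntaxError: Unexpected token' }],
    ['missing', { state: 'import_error', error: new Error('Cannot find module') }],
];
const summaries = samples.map(([name, info]) => summarize_module_state(name, info));
summaries.forEach(line => console.log(line));
```

The point of the sketch is the asymmetry between the two failure states: `syntax_error` carries a string, while `import_error` carries an `Error` object, so callers read `.message` only in the latter case.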
diff --git a/tests/duplicate-behavior.test.js b/tests/duplicate-behavior.test.js
index 031f663..9f0662b 100644
--- a/tests/duplicate-behavior.test.js
+++ b/tests/duplicate-behavior.test.js
@@ -5,8 +5,9 @@
  * update or merge behaviors to duplicates across navigation boundaries.
  */
 
+import 'fake-indexeddb/auto';
+
 let Dexie;
-require('fake-indexeddb/auto');
 
 // Mock browser extension APIs
 global.browser = {
diff --git a/tests/fixtures/.gitignore b/tests/fixtures/.gitignore
new file mode 100644
index 0000000..8e89a83
--- /dev/null
+++ b/tests/fixtures/.gitignore
@@ -0,0 +1,5 @@
+# Ignore everything in this directory
+*
+# Except these files
+!.gitignore
+!README.md
\ No newline at end of file
diff --git a/tests/fixtures/README.md b/tests/fixtures/README.md
new file mode 100644
index 0000000..d24fe06
--- /dev/null
+++ b/tests/fixtures/README.md
@@ -0,0 +1,29 @@
+# Test fixtures for `map_item`
+
+Real captured items used to exercise each module's auto-generated `map_item`
+function.
+
+## Layout
+
+```
+tests/fixtures/
+  <module_name>/
+    <whatever>.ndjson
+    <whatever-else>.ndjson
+```
+
+`<module_name>` matches the filename in `modules/` without `.js` —
+e.g. `tiktok/` → `modules/tiktok.js`, `pinterest/` → `modules/pinterest.js`.
+You can drop multiple `.ndjson` files in a module folder; each gets its own
+`describe` block and each line becomes its own `test`.
+
+Filenames are free-form — the auto-export filename from the popup
+(`zeeschuimer-export-<platform>-<timestamp>.ndjson`) is fine.
+
+## Privacy / committing
+
+These files contain real captured platform data — usernames, post
+content, URLs, sometimes images and other PII. 
+
+If we want to commit synthetic test exports or anonymized real exports,
+add exceptions for them to `.gitignore`.
\ No newline at end of file
diff --git a/tests/jest.config.js b/tests/jest.config.cjs
similarity index 64%
rename from tests/jest.config.js
rename to tests/jest.config.cjs
index 7dd5b02..ea72b10 100644
--- a/tests/jest.config.js
+++ b/tests/jest.config.cjs
@@ -3,6 +3,7 @@ module.exports = {
   testMatch: ['**/*.test.js'],
   transform: {},
   moduleFileExtensions: ['js', 'json'],
-  collectCoverageFrom: ['duplicate-behavior.test.js'],
+  collectCoverageFrom: ['*.test.js'],
+  setupFiles: ['<rootDir>/setup-globals.cjs'],
   verbose: true
 };
diff --git a/tests/map_item.test.js b/tests/map_item.test.js
new file mode 100644
index 0000000..2dc1bb6
--- /dev/null
+++ b/tests/map_item.test.js
@@ -0,0 +1,121 @@
+/**
+ * Smoke test driver for module `map_item` functions.
+ *
+ * Convention:
+ *   tests/fixtures/<module_name>/*.ndjson
+ *
+ * <module_name> matches a file in modules/ (e.g. "tiktok" maps to modules/tiktok.js).
+ * Each .ndjson line is one Zeeschuimer-stored item exported from the popup.
+ *
+ * Each item is wrapped via wrap_for_map_item to mirror how 4CAT's importer
+ * presents items to a map_item function, then run through the module's
+ * map_item. Tests assert: function returns a non-null object, and any fields
+ * listed in REQUIRED_NON_EMPTY for that module are present and non-empty.
+ *
+ * Module-level state is determined upfront by inspect_module():
+ *   - 'ok'            → register per-item tests
+ *   - 'no_map_item'   → register a single skipped test (not applicable)
+ *   - 'syntax_error'  → register a single failing test pointing at the line
+ *   - 'import_error'  → register a single failing test with the message
+ */
+
+import { readdirSync, readFileSync, statSync, existsSync } from 'node:fs';
+import { join, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+import { inspect_module } from './_module-info.js';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const FIXTURE_ROOT = join(__dirname, 'fixtures');
+
+const REQUIRED_NON_EMPTY = {
+    tiktok: ['id', 'author', 'unix_timestamp'],
+};
+
+/**
+ * Local mirror of wrap_for_map_item from js/lib.js. lib.js is loaded by
+ * the browser as a plain script and so cannot be imported from Node; this
+ * small mirror is cheaper than restructuring lib.js into a module.
+ */
+function wrap_for_map_item(stored_item) {
+    const { data, ...meta } = stored_item;
+    return { ...data, __import_meta: meta };
+}
+
+function list_module_dirs() {
+    if (!existsSync(FIXTURE_ROOT)) return [];
+    return readdirSync(FIXTURE_ROOT).filter(name => {
+        try { return statSync(join(FIXTURE_ROOT, name)).isDirectory(); }
+        catch { return false; }
+    });
+}
+
+const module_dirs = list_module_dirs();
+
+// Pre-pass: resolve each module's state up front (awaited one at a time) so
+// we can branch on it at describe/test registration time. Top-level await is
+// supported in Jest's experimental-vm-modules mode.
+const module_info = {};
+for (const module_name of module_dirs) {
+    module_info[module_name] = await inspect_module(module_name);
+}
+
+let total_fixtures = 0;
+
+for (const module_name of module_dirs) {
+    const fixture_dir = join(FIXTURE_ROOT, module_name);
+    const fixture_files = readdirSync(fixture_dir).filter(f => f.endsWith('.ndjson'));
+    if (fixture_files.length === 0) continue;
+    total_fixtures += fixture_files.length;
+
+    const info = module_info[module_name];
+
+    if (info.state === 'no_map_item') {
+        describe(`map_item: ${module_name}`, () => {
+            test.skip(`modules/${module_name}.js does not export a map_item function — nothing to smoke test`, () => {});
+        });
+        continue;
+    }
+
+    if (info.state === 'syntax_error' || info.state === 'import_error') {
+        const msg = info.state === 'syntax_error'
+            ? `syntax error:\n${info.error}`
+            : `import failed: ${info.error.message}`;
+        describe(`map_item: ${module_name}`, () => {
+            test(`module loads`, () => { throw new Error(msg); });
+        });
+        continue;
+    }
+
+    // state === 'ok' — register per-item tests
+    const map_item = info.map_item;
+
+    describe(`map_item: ${module_name}`, () => {
+        for (const fixture_file of fixture_files) {
+            const lines = readFileSync(join(fixture_dir, fixture_file), 'utf8')
+                .split('\n')
+                .filter(line => line.trim().length > 0);
+
+            describe(fixture_file, () => {
+                lines.forEach((line, i) => {
+                    test(`item ${i} maps without throwing`, () => {
+                        const stored_item = JSON.parse(line);
+                        const mapped = map_item(wrap_for_map_item(stored_item));
+                        expect(mapped).not.toBeNull();
+                        expect(typeof mapped).toBe('object');
+                        for (const field of REQUIRED_NON_EMPTY[module_name] ?? []) {
+                            expect(mapped[field]).toBeDefined();
+                            expect(mapped[field]).not.toBe('');
+                            expect(mapped[field]).not.toBeNull();
+                        }
+                    });
+                });
+            });
+        }
+    });
+}
+
+if (total_fixtures === 0) {
+    describe('map_item', () => {
+        test.skip('no fixtures found under tests/fixtures/<module_name>/*.ndjson', () => {});
+    });
+}
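
Note on `tests/map_item.test.js`: a small sketch of what `wrap_for_map_item` does to one stored item before it reaches `map_item`. The function body is copied from the test file; the field names in the sample item (`source_platform`, `timestamp_collected`, `desc`) are made up for illustration and may not match real exports.

```javascript
// Same body as the mirror in tests/map_item.test.js.
function wrap_for_map_item(stored_item) {
    const { data, ...meta } = stored_item;
    return { ...data, __import_meta: meta };
}

// Hypothetical stored item: storage metadata at the top level,
// the captured platform object under `data`.
const stored_item = {
    id: 1,
    source_platform: 'tiktok.com',
    timestamp_collected: 1700000000000,
    data: { id: 'abc', desc: 'hello' },
};

const wrapped = wrap_for_map_item(stored_item);
console.log(JSON.stringify(wrapped));
// {"id":"abc","desc":"hello","__import_meta":{"id":1,"source_platform":"tiktok.com","timestamp_collected":1700000000000}}
```

The platform object's own fields end up at the top level, and the storage metadata moves under `__import_meta`, so a storage-level `id` cannot shadow the platform item's `id`.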
diff --git a/tests/map_item_compare.test.js b/tests/map_item_compare.test.js
new file mode 100644
index 0000000..37e3e4c
--- /dev/null
+++ b/tests/map_item_compare.test.js
@@ -0,0 +1,283 @@
+/**
+ * @jest-environment node
+ *
+ * This file runs in Node test environment (not jsdom) because undici's
+ * fetch implementation uses Node-internal APIs (`clearImmediate`,
+ * `markResourceTiming`, fast-now timers, etc.) that jsdom shadows or
+ * doesn't expose. Polyfilling them into jsdom is whack-a-mole; node env
+ * has them all natively.
+ *
+ * Trade-off: no DOMParser in node env. The four modules that use
+ * `strip_tags` (gab, pinterest, rednote, truth) will need a DOMParser
+ * polyfill (e.g. via linkedom) before the comparator can run against
+ * them. Other modules (including instagram) work as-is.
+ */
+/**
+ * Compare JS map_item output against 4CAT's Python map_item via the API.
+ *
+ * For every line in every fixture, runs the JS map_item locally AND sends
+ * the same stored item to 4CAT's /api/map-item/<datasource>/ endpoint, then
+ * diffs the two outputs field-by-field. Each item is its own Jest test —
+ * failures point at exactly which item and which fields diverge.
+ *
+ * Skips itself entirely if FOURCAT_URL / FOURCAT_API_KEY aren't set, so
+ * `npm test` keeps working without 4CAT configuration. Drop real values in
+ * tests/.env to enable.
+ *
+ * Datasource id mapping: tests/zeeschuimer-to-4cat.json (Zeeschuimer
+ * module filename → 4CAT datasource id, for the few names that diverge).
+ *
+ * Module-level state is determined upfront by inspect_module() (no
+ * map_item / syntax errors / import errors are handled before tests are
+ * registered, so they appear once per module, not once per item).
+ */
+
+import 'dotenv/config';
+import { jest } from '@jest/globals';
+import { readdirSync, readFileSync, statSync, existsSync } from 'node:fs';
+import { join, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+import { inspect_module } from './_module-info.js';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+
+const FOURCAT_URL = process.env.FOURCAT_URL?.replace(/\/$/, '');
+const FOURCAT_API_KEY = process.env.FOURCAT_API_KEY;
+const HAS_4CAT = Boolean(
+    FOURCAT_URL && FOURCAT_API_KEY && FOURCAT_API_KEY !== 'your-api-key-here'
+);
+
+// When true (default), once any item in a module fails, subsequent items
+// in that same module skip the HTTP + map_item work and fail fast with a
+// "halted" message. Saves time when generator output is broken at the top.
+// Set FAIL_FAST=0 in env to run all items regardless.
+// Trim because cmd.exe's `set FAIL_FAST=0 && ...` includes the trailing
+// space in the variable value, which would otherwise defeat `!== '0'`.
+const FAIL_FAST = (process.env.FAIL_FAST ?? '').trim() !== '0';
+const halted_modules = new Set();
+
+const FIXTURE_ROOT = join(__dirname, 'fixtures');
+const ID_MAP_PATH = join(__dirname, 'zeeschuimer-to-4cat.json');
+const ID_MAP = existsSync(ID_MAP_PATH)
+    ? JSON.parse(readFileSync(ID_MAP_PATH, 'utf8'))
+    : {};
+
+function wrap_for_map_item(stored_item) {
+    const { data, ...meta } = stored_item;
+    return { ...data, __import_meta: meta };
+}
+
+async function call_4cat_map_item(datasource_id, item) {
+    const res = await fetch(`${FOURCAT_URL}/api/map-item/${datasource_id}/`, {
+        method: 'POST',
+        headers: {
+            // 4CAT accepts the raw key without a `Bearer ` prefix (observed by probing the API)
+            'Authorization': FOURCAT_API_KEY,
+            'Content-Type': 'application/json',
+        },
+        body: JSON.stringify({ item }),
+    });
+    const text = await res.text();
+    if (!res.ok) {
+        throw new Error(`HTTP ${res.status} from 4CAT: ${text}`);
+    }
+    return JSON.parse(text);
+}
+
+// Round-trip a value through JSON so MappedItem, MissingMappedField, etc.
+// become plain JSON-compatible objects matching what 4CAT emits.
+function normalize(value) {
+    return JSON.parse(JSON.stringify(value));
+}
+
+// Recursive structural equality. Doesn't care about object key order, which
+// matters for nested values like {__missing: true, value: ""} where JS and
+// Python might emit keys in different orders.
+function deep_equal(a, b) {
+    if (a === b) return true;
+    if (a === null || b === null) return a === b;
+    if (typeof a !== typeof b) return false;
+    if (typeof a !== 'object') return false;
+    if (Array.isArray(a) !== Array.isArray(b)) return false;
+    if (Array.isArray(a)) {
+        if (a.length !== b.length) return false;
+        return a.every((v, i) => deep_equal(v, b[i]));
+    }
+    const a_keys = Object.keys(a);
+    const b_keys = Object.keys(b);
+    if (a_keys.length !== b_keys.length) return false;
+    return a_keys.every(k => k in b && deep_equal(a[k], b[k]));
+}
+
+function diff_objects(js_obj, py_obj) {
+    const diffs = [];
+    const keys = new Set([...Object.keys(js_obj ?? {}), ...Object.keys(py_obj ?? {})]);
+    for (const key of keys) {
+        const in_js = js_obj && key in js_obj;
+        const in_py = py_obj && key in py_obj;
+        if (!in_js) {
+            diffs.push({ key, kind: 'only_python', python: py_obj[key] });
+        } else if (!in_py) {
+            diffs.push({ key, kind: 'only_js', js: js_obj[key] });
+        } else if (!deep_equal(js_obj[key], py_obj[key])) {
+            diffs.push({ key, kind: 'mismatch', js: js_obj[key], python: py_obj[key] });
+        }
+    }
+    return diffs;
+}
+
+function format_diffs(diffs) {
+    return diffs.map(d => {
+        if (d.kind === 'only_js') {
+            return `  + only in JS:     ${d.key} = ${JSON.stringify(d.js)}`;
+        }
+        if (d.kind === 'only_python') {
+            return `  - only in Python: ${d.key} = ${JSON.stringify(d.python)}`;
+        }
+        return `  ~ ${d.key}\n      JS:     ${JSON.stringify(d.js)}\n      Python: ${JSON.stringify(d.python)}`;
+    }).join('\n');
+}
+
+// Pull out the first few module-frame lines from an error's stack so the
+// failure message points at where in modules/<name>.js the throw happened.
+function format_error_with_location(err) {
+    if (!err) return String(err);
+    const message = err.message || String(err);
+    const stack = err.stack || '';
+    const module_frames = stack.split('\n')
+        .filter(l => l.includes('/modules/') || l.includes('\\modules\\'))
+        .slice(0, 3)
+        .map(l => l.trim());
+    return module_frames.length
+        ? `${message}\n  ${module_frames.join('\n  ')}`
+        : message;
+}
+
+function list_module_dirs() {
+    if (!existsSync(FIXTURE_ROOT)) return [];
+    return readdirSync(FIXTURE_ROOT).filter(name => {
+        try { return statSync(join(FIXTURE_ROOT, name)).isDirectory(); }
+        catch { return false; }
+    });
+}
+
+// Per-test timeout: each test does one HTTP round-trip to 4CAT. Jest's
+// default 5s is tight under load.
+jest.setTimeout(30000);
+
+if (!HAS_4CAT) {
+    describe('map_item compare (JS vs 4CAT Python)', () => {
+        test.skip('FOURCAT_URL / FOURCAT_API_KEY not configured — set them in tests/.env to enable', () => {});
+    });
+} else {
+    const module_dirs = list_module_dirs();
+
+    // Pre-pass: resolve each module's state up front (awaited one at a time)
+    // so we can branch on it at registration time.
+    const module_info = {};
+    for (const module_name of module_dirs) {
+        module_info[module_name] = await inspect_module(module_name);
+    }
+
+    let any_fixtures = false;
+
+    for (const module_name of module_dirs) {
+        const fixture_dir = join(FIXTURE_ROOT, module_name);
+        const fixture_files = readdirSync(fixture_dir).filter(f => f.endsWith('.ndjson'));
+        if (fixture_files.length === 0) continue;
+        any_fixtures = true;
+
+        const datasource_id = ID_MAP[module_name] ?? module_name;
+        const info = module_info[module_name];
+
+        if (info.state === 'no_map_item') {
+            // eslint-disable-next-line no-console
+            console.log(`[compare] skipping ${module_name}: modules/${module_name}.js does not export a map_item`);
+            continue;
+        }
+
+        if (info.state === 'syntax_error' || info.state === 'import_error') {
+            const msg = info.state === 'syntax_error'
+                ? `syntax error:\n${info.error}`
+                : `import failed: ${info.error.message}`;
+            describe(`map_item compare: ${module_name}`, () => {
+                test(`module loads`, () => { throw new Error(msg); });
+            });
+            continue;
+        }
+
+        // state === 'ok' — register per-item comparison tests
+        const map_item = info.map_item;
+
+        describe(`map_item compare: ${module_name} (4CAT id: ${datasource_id})`, () => {
+            for (const fixture_file of fixture_files) {
+                const lines = readFileSync(join(fixture_dir, fixture_file), 'utf8')
+                    .split('\n')
+                    .filter(line => line.trim().length > 0);
+
+                describe(fixture_file, () => {
+                    lines.forEach((line, i) => {
+                        test(`item ${i}`, async () => {
+                            if (FAIL_FAST && halted_modules.has(module_name)) {
+                                throw new Error(
+                                    '[halted after prior failure in this module — set FAIL_FAST=0 to run all items]'
+                                );
+                            }
+                            try {
+                                const stored_item = JSON.parse(line);
+
+                                // 4CAT side
+                                const response = await call_4cat_map_item(datasource_id, stored_item);
+
+                                // JS side
+                                let js_result;
+                                let js_error;
+                                try {
+                                    js_result = map_item(wrap_for_map_item(stored_item));
+                                } catch (e) {
+                                    js_error = e;
+                                }
+
+                                if (response.status === 'mapped') {
+                                    if (js_error) {
+                                        throw new Error(
+                                            `4CAT mapped this item but JS threw: ${format_error_with_location(js_error)}`
+                                        );
+                                    }
+                                    const js_obj = normalize(js_result);
+                                    const py_obj = normalize(response.item);
+                                    const diffs = diff_objects(js_obj, py_obj);
+                                    if (diffs.length > 0) {
+                                        throw new Error(
+                                            `${diffs.length} field(s) differ between JS and 4CAT:\n${format_diffs(diffs)}`
+                                        );
+                                    }
+                                } else if (response.status === 'skipped') {
+                                    if (!js_error) {
+                                        throw new Error(
+                                            `4CAT skipped this item ("${response.reason}") but JS produced a result`
+                                        );
+                                    }
+                                    // Both rejected — good. Skip reasons may differ in wording.
+                                } else if (response.status === 'error') {
+                                    throw new Error(`4CAT errored on this item: ${response.message}`);
+                                } else {
+                                    throw new Error(`unexpected 4CAT response status: ${JSON.stringify(response)}`);
+                                }
+                            } catch (e) {
+                                if (FAIL_FAST) halted_modules.add(module_name);
+                                throw e;
+                            }
+                        });
+                    });
+                });
+            }
+        });
+    }
+
+    if (!any_fixtures) {
+        describe('map_item compare (JS vs 4CAT Python)', () => {
+            test.skip('no fixtures under tests/fixtures/<module>/*.ndjson', () => {});
+        });
+    }
+}
diff --git a/tests/package-lock.json b/tests/package-lock.json
index cc8f457..7758e9f 100644
--- a/tests/package-lock.json
+++ b/tests/package-lock.json
@@ -9,9 +9,11 @@
       "version": "1.0.0",
       "devDependencies": {
         "dexie": "^3.2.4",
+        "dotenv": "^16.4.5",
         "fake-indexeddb": "^5.0.1",
         "jest": "^29.7.0",
-        "jest-environment-jsdom": "^29.7.0"
+        "jest-environment-jsdom": "^29.7.0",
+        "undici": "^6.20.0"
       }
     },
     "node_modules/@babel/code-frame": {
@@ -1758,6 +1760,19 @@
         "node": ">=12"
       }
     },
+    "node_modules/dotenv": {
+      "version": "16.6.1",
+      "resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.6.1.tgz",
+      "integrity": "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==",
+      "dev": true,
+      "license": "BSD-2-Clause",
+      "engines": {
+        "node": ">=12"
+      },
+      "funding": {
+        "url": "https://dotenvx.com"
+      }
+    },
     "node_modules/dunder-proto": {
       "version": "1.0.1",
       "resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
@@ -4183,6 +4198,16 @@
         "url": "https://github.com/sponsors/sindresorhus"
       }
     },
+    "node_modules/undici": {
+      "version": "6.26.0",
+      "resolved": "https://registry.npmjs.org/undici/-/undici-6.26.0.tgz",
+      "integrity": "sha512-4yqz8a3n5HmGTlsbADNtr/dJlhkh/55Rq798G6ibiULcXbDtaLpTl1pvdqcbFfeoj3iSi52lePFM7h9H21cw/A==",
+      "dev": true,
+      "license": "MIT",
+      "engines": {
+        "node": ">=18.17"
+      }
+    },
     "node_modules/undici-types": {
       "version": "7.16.0",
       "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.16.0.tgz",
diff --git a/tests/package.json b/tests/package.json
index dc3654c..390fdd3 100644
--- a/tests/package.json
+++ b/tests/package.json
@@ -2,14 +2,18 @@
   "name": "zeeschuimer-db-tests",
   "version": "1.0.0",
   "description": "Unit tests for Zeeschuimer duplicate handling logic",
+  "type": "module",
   "scripts": {
-    "test": "jest",
-    "test:watch": "jest --watch"
+    "test": "node --experimental-vm-modules node_modules/jest/bin/jest.js",
+    "test:watch": "node --experimental-vm-modules node_modules/jest/bin/jest.js --watch",
+    "probe": "node probe-4cat.mjs"
   },
   "devDependencies": {
     "dexie": "^3.2.4",
+    "dotenv": "^16.4.5",
     "fake-indexeddb": "^5.0.1",
     "jest": "^29.7.0",
-    "jest-environment-jsdom": "^29.7.0"
+    "jest-environment-jsdom": "^29.7.0",
+    "undici": "^6.20.0"
   }
 }
diff --git a/tests/probe-4cat.mjs b/tests/probe-4cat.mjs
new file mode 100644
index 0000000..0bf4e4d
--- /dev/null
+++ b/tests/probe-4cat.mjs
@@ -0,0 +1,140 @@
+/**
+ * Manually exercise 4CAT's /api/map-item/ endpoint against a fixture item.
+ *
+ * Usage:
+ *   node probe-4cat.mjs <module_name> [<fixture_filename>] [--index N]
+ *
+ * <module_name> is the Zeeschuimer module filename without `.js` (e.g.
+ *   "tiktok", "pinterest"). If <fixture_filename> is omitted, the first
+ *   .ndjson in tests/fixtures/<module_name>/ is used. --index selects which
+ *   line of the fixture to send (default 0).
+ *
+ * Requires tests/.env with FOURCAT_URL and FOURCAT_API_KEY.
+ */
+
+import 'dotenv/config';
+import { readFileSync, existsSync, readdirSync } from 'node:fs';
+import { join, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+
+const FOURCAT_URL = process.env.FOURCAT_URL?.replace(/\/$/, '');
+const FOURCAT_API_KEY = process.env.FOURCAT_API_KEY;
+
+if (!FOURCAT_URL || !FOURCAT_API_KEY || FOURCAT_API_KEY === 'your-api-key-here') {
+    console.error('error: FOURCAT_URL and FOURCAT_API_KEY must be set in tests/.env');
+    console.error('       (copy tests/.env.example to tests/.env and fill in real values)');
+    process.exit(1);
+}
+
+const ID_MAP_PATH = join(__dirname, 'zeeschuimer-to-4cat.json');
+const ID_MAP = existsSync(ID_MAP_PATH)
+    ? JSON.parse(readFileSync(ID_MAP_PATH, 'utf8'))
+    : {};
+
+function auth_headers() {
+    return { 'Authorization': FOURCAT_API_KEY };
+}
+
+async function list_datasources() {
+    const res = await fetch(`${FOURCAT_URL}/api/datasources/`, { headers: auth_headers() });
+    if (!res.ok) {
+        throw new Error(`GET /api/datasources/ → ${res.status}: ${await res.text()}`);
+    }
+    const body = await res.json();
+    return body.datasources ?? [];
+}
+
+async function map_item(datasource_id, item) {
+    const res = await fetch(`${FOURCAT_URL}/api/map-item/${datasource_id}/`, {
+        method: 'POST',
+        headers: { ...auth_headers(), 'Content-Type': 'application/json' },
+        body: JSON.stringify({ item }),
+    });
+    const text = await res.text();
+    let body;
+    try { body = JSON.parse(text); } catch { body = { raw: text }; }
+    return { status_code: res.status, body };
+}
+
+function parse_args(argv) {
+    const args = { module: null, fixture: null, index: 0 };
+    const positional = [];
+    for (let i = 2; i < argv.length; i++) {
+        if (argv[i] === '--index') {
+            args.index = parseInt(argv[++i], 10);
+        } else if (argv[i].startsWith('--index=')) {
+            args.index = parseInt(argv[i].split('=')[1], 10);
+        } else {
+            positional.push(argv[i]);
+        }
+    }
+    args.module = positional[0];
+    args.fixture = positional[1];
+    return args;
+}
+
+async function main() {
+    const args = parse_args(process.argv);
+    if (!args.module) {
+        console.error('Usage: node probe-4cat.mjs <module_name> [<fixture_filename>] [--index N]');
+        process.exit(1);
+    }
+
+    const datasource_id = ID_MAP[args.module] ?? args.module;
+    const fixture_dir = join(__dirname, 'fixtures', args.module);
+
+    if (!existsSync(fixture_dir)) {
+        console.error(`error: no fixture dir at ${fixture_dir}`);
+        process.exit(1);
+    }
+
+    const candidates = readdirSync(fixture_dir).filter(f => f.endsWith('.ndjson'));
+    if (candidates.length === 0) {
+        console.error(`error: no .ndjson fixtures under ${fixture_dir}`);
+        process.exit(1);
+    }
+    const fixture_name = args.fixture ?? candidates[0];
+    const fixture_path = join(fixture_dir, fixture_name);
+    if (!existsSync(fixture_path)) {
+        console.error(`error: fixture ${fixture_path} not found`);
+        process.exit(1);
+    }
+
+    const lines = readFileSync(fixture_path, 'utf8').split('\n').filter(l => l.trim().length > 0);
+    if (!Number.isInteger(args.index) || args.index < 0 || args.index >= lines.length) {
+        console.error(`error: --index ${args.index} is not a valid item index (fixture has ${lines.length} items)`);
+        process.exit(1);
+    }
+    const item = JSON.parse(lines[args.index]);
+
+    console.log(`Module:        ${args.module}`);
+    console.log(`Datasource id: ${datasource_id}${ID_MAP[args.module] ? ' (mapped via zeeschuimer-to-4cat.json)' : ''}`);
+    console.log(`URL:           ${FOURCAT_URL}/api/map-item/${datasource_id}/`);
+    console.log(`Fixture:       ${fixture_name}, item ${args.index} (item_id=${item.item_id ?? item.id})`);
+    console.log('');
+
+    const { status_code, body } = await map_item(datasource_id, item);
+    console.log(`HTTP ${status_code}`);
+    console.log(JSON.stringify(body, null, 2));
+
+    if (status_code === 404) {
+        console.error('');
+        console.error('Hint: datasource id may be wrong. Available Zeeschuimer-origin datasources:');
+        try {
+            const datasources = await list_datasources();
+            datasources
+                .filter(d => d.is_from_zeeschuimer && d.has_map_item)
+                .forEach(d => console.error(`  - ${d.id}  (${d.name})`));
+        } catch (e) {
+            console.error(`  (couldn't fetch list: ${e.message})`);
+        }
+        process.exit(2);
+    }
+}
+
+main().catch(e => {
+    console.error(`probe failed: ${e.message}`);
+    process.exit(2);
+});
diff --git a/tests/setup-globals.cjs b/tests/setup-globals.cjs
new file mode 100644
index 0000000..6793cc0
--- /dev/null
+++ b/tests/setup-globals.cjs
@@ -0,0 +1,53 @@
+/**
+ * Make js/lib.js's helpers available as globals inside the Jest test
+ * environment, mirroring how the browser sees them after the manifest
+ * loads lib.js as a plain script.
+ *
+ * map_item bodies reference these as free identifiers (MappedItem,
+ * MissingMappedField, strip_tags, normalize_url_encoding, ...). Without this
+ * shim they'd hit ReferenceError as soon as a test invokes map_item.
+ *
+ * Approach: read lib.js, wrap it in a new Function() body that returns the
+ * named helpers, call the function, and assign the returned object onto
+ * globalThis. (Earlier attempt with vm.runInThisContext failed because in
+ * the jsdom env the vm context's global differs from jsdom's window.)
+ *
+ * If a new helper is added to lib.js, append its name to EXPOSED_NAMES.
+ */
+
+const fs = require('node:fs');
+const path = require('node:path');
+
+const EXPOSED_NAMES = [
+    'traverse_data',
+    'MappedItem',
+    'MissingMappedField',
+    'MapItemException',
+    'wrap_for_map_item',
+    'strip_tags',
+    'normalize_url_encoding',
+    'formatUtcTimestamp',
+];
+
+const lib_source = fs.readFileSync(
+    path.join(__dirname, '..', 'js', 'lib.js'),
+    'utf8',
+);
+
+const factory = new Function(`
+${lib_source}
+return { ${EXPOSED_NAMES.join(', ')} };
+`);
+
+Object.assign(globalThis, factory());
+
+// No fetch polyfill is installed here. jsdom doesn't expose fetch, and
+// Jest's jsdom env shadows Node's global fetch, so code running under
+// jsdom can't reach 4CAT. Tests that need fetch (e.g.
+// map_item_compare.test.js) instead declare `@jest-environment node` at
+// the top of the file; the Node env has fetch natively.
+// Don't try to polyfill fetch into jsdom from the npm `undici` package
+// (the separately installable HTTP client; the copy bundled inside Node
+// isn't require()-able by name): its internals use Node-specific globals
+// that jsdom shadows (clearImmediate, markResourceTiming, fast timers),
+// and polyfilling them all is brittle.
diff --git a/tests/translation-errors.md b/tests/translation-errors.md
new file mode 100644
index 0000000..fcc160d
--- /dev/null
+++ b/tests/translation-errors.md
@@ -0,0 +1,430 @@
+# Auto-generator translation errors
+
+Patterns of incorrect Python → JavaScript translation observed in
+auto-generated `modules/*.js` files. Each entry has a search pattern so
+this doc doubles as a checklist when reviewing a new auto-generator PR.
+
+When an entry is fixed at the generator level (no longer appears in
+fresh output), mark it `[fixed]` and keep the entry around — useful
+history when something regresses.
+
+## How to use
+
+- Found a new pattern? Add an entry below following the template.
+- Reviewing a generator PR? `grep` each `Search pattern` against the
+  changed module files. Anything that hits is worth a manual look.
+- Iterating on the generator prompt? The "Why" lines are the
+  feedback to add — they describe the exact Python-vs-JS semantic
+  difference the LLM keeps missing.
+
+## Template
+
+```
+### <short-name>
+
+**Status:** open | fixed in generator | accepted
+
+**Why it happens:** <one-line description of the Python-vs-JS difference>
+
+**Wrong JS:**
+```js
+<the broken pattern>
+```
+
+**Correct JS:**
+```js
+<what it should look like>
+```
+
+**Example:** `modules/<file>.js:<line>`
+
+**Search pattern:** `<grep-able regex>`
+```
+
+---
+
+## Observed patterns
+
+### `in` operator on strings
+
+**Status:** open
+
+**Why it happens:** In Python, `"x" in some_string` is a substring check.
+In JavaScript, the `in` operator only works on **objects** and checks for
+property/key existence; using it with a string on the right-hand side
+throws `TypeError: cannot use 'in' operator to search for "x" in <string>`.
+
+**Wrong JS:**
+```js
+const is_polaris = '__typename' in item && 'polaris' in item.__typename.toLowerCase();
+```
+
+**Correct JS:**
+```js
+const is_polaris = '__typename' in item && item.__typename.toLowerCase().includes('polaris');
+```
+
+**Example:** `modules/instagram.js:513`
+
+**Search pattern:** `'[^']+' in [a-zA-Z_$][\w$]*\.` — quoted string followed
+by `in` followed by a method call. Quick rough check: `grep -E "' in [a-zA-Z]" modules/`
+
+**Watch out for partial fixes:** seen as `'polaris' in (item.__typename ?? '').toLowerCase()`
+— adding `?? ''` guards against `undefined` but the `in` operator itself
+still throws on the resulting *string*. The fix is `.includes()`, not just
+defaulting the operand.
+
+---
+
+### Python f-string syntax left in single-quoted JS strings
+
+**Status:** open
+
+**Why it happens:** Python `f"... {var} ..."` interpolates. JS uses
+template literals (backticks) with `${var}`. The auto-generator leaves the
+`{var}` notation in a regular single- or double-quoted JS string, which is
+just literal text — no interpolation happens.
+
+**Wrong JS:**
+```js
+throw new MapItemException('Unable to parse item: different user {user.id} and owner {owner.id}');
+```
+
+**Correct JS:**
+```js
+throw new MapItemException(`Unable to parse item: different user ${user.id} and owner ${owner.id}`);
+```
+
+**Example:** `modules/instagram.js:754`
+
+**Search pattern:** `'[^']*\{[a-zA-Z_$][\w$.]*\}[^']*'` or `"[^"]*\{[a-zA-Z_$][\w$.]*\}[^"]*"`
+— a non-template-literal string containing `{identifier}` or `{identifier.path}`.
+Quick check: `grep -nE "['\"][^'\"]*\{[a-zA-Z_][a-zA-Z0-9_.]*\}[^'\"]*['\"]" modules/`
+
+---
+
+### `?? {}` default that defeats subsequent truthy checks
+
+**Status:** open
+
+**Why it happens:** When porting Python's `node.get('user') or {}` (which is
+intended to make subsequent property access safe), the generator emits
+`node.user ?? {}`. That is a reasonable translation on its own, **but** an
+empty dict is falsy in Python while `{}` is always truthy in JS, so a
+following `if (user && owner) { ... }` guard never short-circuits. The check
+ends up reading "if user and owner *objects* exist" when the intent was "if
+user and owner data exist." The comparisons inside then pit real
+ids/usernames against `undefined` on the missing side, often throwing.
+
+**Wrong JS:**
+```js
+const user  = node.user  ?? {};
+const owner = node.owner ?? {};
+if (user && owner) {
+    if (user.id === owner.id) { /* … */ }
+    else if (user.username !== owner.username) {
+        throw new MapItemException('different user and owner');
+    }
+}
+```
+
+**Correct JS** (depending on intent — pick one):
+```js
+// (a) drop the defaults so truthy guard means "both present"
+const user  = node.user;
+const owner = node.owner;
+if (user && owner) { /* compare */ }
+```
+```js
+// (b) check for actual content, not just object identity
+const user  = node.user  ?? {};
+const owner = node.owner ?? {};
+if (Object.keys(user).length && Object.keys(owner).length) { /* compare */ }
+```
+
+**Example:** `modules/instagram.js:748-756`
+
+**Search pattern:** `\?\?\s*\{\s*\}` — any `?? {}` occurrence is worth a
+review of subsequent guards. Quick check: `grep -nE "\?\?\s*\{\s*\}" modules/`
+
+---
+
+### Bare relative path as a statement (junk auto-imports section)
+
+**Status:** open
+
+**Why it happens:** The generator emits an "auto-generated imports" marker
+block at the top of the module but writes the import target as a bare
+relative path on its own line (`../js/lib.js`) instead of a real `import`
+statement. No JS statement can begin with `..`, so it is a syntax error.
+
+**Wrong JS:**
+```js
+// === auto-generated imports for map_item — DO NOT EDIT BY HAND ===
+../js/lib.js
+// === end auto-generated imports ===
+```
+
+**Correct JS** (one of):
+```js
+// === auto-generated imports — DO NOT EDIT BY HAND ===
+// Provided as globals by js/lib.js (loaded via manifest.json):
+//   MappedItem, MissingMappedField, MapItemException, traverse_data,
+//   strip_tags, normalize_url_encoding, formatUtcTimestamp
+// === end auto-generated imports ===
+```
+
+Or, if a real import is intended, an ESM import with named bindings:
+```js
+import { MappedItem, MissingMappedField } from '../js/lib.js';
+```
+
+**Example:** seen historically in `modules/tiktok.js:2`
+
+**Search pattern:** `^\.\./` at the start of a line in module files.
+Quick check: `grep -nE "^\.\." modules/*.js`
+
+---
+
+### Key-existence check (`'X' in obj`) used where Python intended value-truthiness (`obj.get('X')`)
+
+**Status:** open
+
+**Why it happens:** Python's `if node.get('usertags'):` is a *truthy check on
+the value*: it is falsy if the key is missing **or** if the value is
+`None`/empty/falsy. The generator translates this to `if ('usertags' in
+node)`, which in JS is a *key-existence check* that evaluates to `true` even
+when the value is `null`. Subsequent property accesses on the null value
+then throw `Cannot read properties of null`.
+
+**Wrong JS:**
+```js
+const usertags = 'usertags' in node ? node.usertags.in.map(...).join(',') : '';
+// node.usertags can be null → .in.map blows up
+```
+
+**Correct JS:**
+```js
+const usertags = node.usertags ? node.usertags.in.map(...).join(',') : '';
+```
+
+**Example:** `modules/instagram.js:777`
+
+**Search pattern:** `'[^']+' in [a-zA-Z_$][\w$]*\s*\?` — quoted-string `in`
+identifier followed by `?` (ternary). Quick check:
+`grep -nE "'[^']+' in [a-zA-Z_]+ \?" modules/`
+
+---
+
+### Datetime serialization format mismatch
+
+**Status:** open
+
+**Why it happens:** Python's `datetime.utcfromtimestamp(t).strftime('%Y-%m-%d %H:%M:%S')`
+produces `"2026-05-13 21:27:31"` — space-separated, no timezone marker. JS's
+`new Date(t * 1000).toISOString()` produces `"2026-05-13T21:27:31.000Z"` — T
+separator, milliseconds, Z. The generator emits the JS `.toISOString()` form
+instead of using the existing `formatUtcTimestamp` helper from lib.js that
+mimics Python's output exactly.
+
+**Wrong JS:**
+```js
+collected_at = new Date(node.taken_at * 1000).toISOString();
+```
+
+**Correct JS:**
+```js
+collected_at = formatUtcTimestamp(node.taken_at);
+// formatUtcTimestamp is defined in js/lib.js as:
+//   new Date(unixSeconds * 1000).toISOString().replace('T', ' ').slice(0, 19)
+```
+
+**Example:** `modules/instagram.js:782`
+
+**Search pattern:** `new Date\([^)]+\)\.toISOString\(\)` — any use of
+`.toISOString()`. The helper should be used instead. Quick check:
+`grep -nE "\.toISOString\(\)" modules/`
+
+---
+
+### `re.findall` capture groups vs JS `.match` with /g flag
+
+**Status:** open
+
+**Why it happens:** Python's `re.findall(r'#(\w+)', s)` returns the **capture
+group contents**: `['lotr', 'woodart']`. JS's `s.match(/#(\w+)/g)` (with the
+global flag) returns the **full matches**: `['#lotr', '#woodart']` — capture
+groups are ignored. The generator translates the regex literally without
+adjusting for this semantic difference, so the resulting strings keep
+prefixes/wrappers that Python would have stripped.
+
+**Wrong JS:**
+```js
+hashtags: caption.match(/#([^\s!@#$%^&*()_+{}:"|<>?;',./`~]+)/g)?.join(',')
+// produces "#lotr,#woodart"
+```
+
+**Correct JS:**
+```js
+// Option A: strip the literal prefix from each full match
+hashtags: caption.match(/#([^\s...]+)/g)?.map(h => h.slice(1)).join(',') ?? ''
+// Option B: use matchAll to get capture groups properly
+hashtags: [...caption.matchAll(/#([^\s...]+)/g)].map(m => m[1]).join(',')
+```
+
+**Example:** `modules/instagram.js:812` (also 766, 870 — three copies)
+
+**Search pattern:** `\.match\(/[^/]*\([^/]*\)[^/]*/g\)` — any `.match()` with
+a global-flag regex containing a capture group. Quick check:
+`grep -nE "\.match\(/.*\(.*\).*\/g\)" modules/`
+
+---
+
+### `undefined` field values get dropped from JSON, but Python's `None` becomes `null`
+
+**Status:** open
+
+**Why it happens:** When `JSON.stringify` encounters an object property whose
+value is `undefined`, it **omits the key entirely** from the output. Python's
+`json.dumps` serializes `None` as `null`, keeping the key. The generator
+writes assignments like `location.city = node.location.city` where the
+right-hand side can be `undefined`, producing missing keys in JS output
+that show up as `only in Python: <field> = null` diffs against 4CAT.
+
+**Wrong JS:**
+```js
+location.city = node.location.city;  // undefined if .city missing
+// JSON.stringify({location_city: undefined}) → "{}" (key omitted)
+
+body: caption,  // null if no caption — Python returns "" here, not null
+```
+
+**Correct JS:**
+```js
+// Whichever fallback Python uses for that specific field:
+location.city = node.location.city ?? null;   // some fields → null
+body: caption ?? '',                          // other fields → ""
+```
+
+**Example:** `modules/instagram.js:745, 853` (`null` flavor),
+559, 648, 798 (`""` flavor for `body`)
+
+**Note:** Python's choice of `None` vs `""` is per-field — there's no
+universal rule. When the comparator reports `~ X  JS: null  Python: ""` use
+`?? ''`. When it reports `- only in Python: X = null` use `?? null`. The
+distinction matters because the JS output should match Python's choice
+exactly for that field.
+
+**Search pattern:** harder to grep automatically — any property assignment
+where the RHS could be `undefined`/`null` and the resulting field is
+expected to appear in the mapped output. Look at "only in Python: X = null"
+and "~ X  JS: null  Python: \"\"" diffs in the comparator output to find
+specific cases.
+
+---
+
+### Object-reference inequality used as type check
+
+**Status:** open
+
+**Why it happens:** The generator emits `caption !== new MissingMappedField('')`
+to mean "caption is not a missing-marker", but `new MissingMappedField('')`
+creates a fresh object every time, and `!==` on objects compares references.
+The expression is **always true**, so the conditional never takes the
+"missing" branch. Likely originates from Python idioms like `caption != ""`
+or `caption is not None`, mistranslated through the MissingMappedField
+abstraction.
+
+**Wrong JS:**
+```js
+hashtags: caption !== new MissingMappedField('') ? caption.match(...) : '',
+// !== between two different object references is always true
+```
+
+**Correct JS:**
+```js
+// If the intent was "if caption has content", just truthy-check it:
+hashtags: caption ? caption.match(...) : '',
+// If the intent was "if caption is not a MissingMappedField instance":
+hashtags: !(caption instanceof MissingMappedField) ? caption.match(...) : '',
+```
+
+**Example:** `modules/instagram.js:812` (and two other copies)
+
+**Search pattern:** `!== new [A-Z]` or `=== new [A-Z]` — any equality
+comparison with a freshly-constructed object. Quick check:
+`grep -nE "(!==|===) new [A-Z]" modules/`
+
+---
+
+### `.method()` chain on potentially-null result
+
+**Status:** open
+
+**Why it happens:** In Python, calling a method on `None` raises
+`AttributeError`, which 4CAT sometimes catches. In JS, calling a method on
+`null`/`undefined` throws `TypeError: Cannot read properties of null
+(reading '<method>')`. The generator emits the same dotted chain without
+optional-chaining (`?.`) protection.
+
+**Wrong JS:**
+```js
+hashtags: caption !== new MissingMappedField('')
+    ? caption.match(/#([^\s!@#$%^&*()_+{}:"|<>?;',./`~]+)/g)?.join(',')
+    : '',
+```
+(here `caption` is allowed to be `null`, so `caption.match(...)` blows up
+on null caption)
+
+**Correct JS:**
+```js
+hashtags: caption
+    ? caption.match(/#([^\s!@#$%^&*()_+{}:"|<>?;',./`~]+)/g)?.join(',') ?? ''
+    : '',
+```
+
+**Example:** `modules/instagram.js:809`
+
+**Search pattern:** harder to grep — needs reading. Worth manual review of
+any field that uses `caption.match`, `something.split`, `something.join`
+without `?.` on a value that could be null/undefined.
+
+---
+
+## Generator prompt feedback (running list)
+
+Concrete things to fold into the generator's prompt over time:
+
+1. **Python `x in y` where `y` is a string** → use `y.includes(x)` in JS,
+   never `x in y`.
+2. **Python f-strings** → use JS template literals (backticks) with
+   `${...}` syntax. Never leave `{...}` in single- or double-quoted strings.
+3. **`?? {}` after a `.get(...) or {}` translation** → only use this if the
+   following code does property-access. If the following code does a
+   truthy guard (`if (x && y)`), drop the default and use just `node.user`.
+4. **Method chains on possibly-null values** → use `?.` (optional
+   chaining) instead of `.` whenever the receiver could be null/undefined.
+5. **The auto-imports header block** → emit either real `import { ... }`
+   statements with valid relative paths, or a comment-only header.
+   Never emit bare paths as JS statements.
+6. **Python `node.get('X')` truthy check** → in JS, use `node.X` (or
+   `node.X != null` if only null/undefined must be excluded), not
+   `'X' in node`, which is true even for explicit-null values.
+7. **Datetime serialization** → use the `formatUtcTimestamp` helper from
+   lib.js (which mimics Python's `strftime('%Y-%m-%d %H:%M:%S')` format),
+   not `new Date(...).toISOString()` (which has a different output shape:
+   T separator, milliseconds, Z suffix).
+8. **`re.findall` with capture groups** → in JS, `.match(/.../g)` returns
+   full matches, NOT capture groups. To get capture-group behavior, use
+   either `[...s.matchAll(/.../g)].map(m => m[1])` or post-process the
+   full matches with `.map(...)` to strip the literal parts.
+9. **Object-reference equality (`!== new X(...)`)** → never. Creating an
+   object with `new` produces a fresh reference; `===`/`!==` compares
+   identity. Use `instanceof X` for type checks, or compare values
+   directly. The MissingMappedField "is this missing?" check should be
+   `caption instanceof MissingMappedField` or just truthy-check the value.
+10. **Python `None` → JSON `null` vs JS `undefined` → omitted** — when a
+    field's value could be missing and Python returns `null` for it,
+    JS must explicitly assign `null` (not leave the value as `undefined`).
+    `JSON.stringify` drops `undefined` keys silently. Use `value ?? null`
+    when the field is expected to appear in the mapped output.
diff --git a/tests/zeeschuimer-to-4cat.json b/tests/zeeschuimer-to-4cat.json
new file mode 100644
index 0000000..f7de942
--- /dev/null
+++ b/tests/zeeschuimer-to-4cat.json
@@ -0,0 +1,7 @@
+{
+  "_comment": "Maps Zeeschuimer module filenames (without .js) to 4CAT datasource ids when they differ. Default behavior is identity — only include entries where the two diverge. Discovered via http://localhost/api/datasources/.",
+  "9gag": "ninegag",
+  "truth": "truthsocial",
+  "rednote": "xiaohongshu",
+  "rednote-comments": "xiaohongshu-comments"
+}