digitalmethodsinitiative · dale-wahl · May 5, 2026 · May 6, 2026 · May 6, 2026 · May 6, 2026
diff --git a/.gitignore b/.gitignore
@@ -5,6 +5,8 @@
 
 # Testing artefacts
 .temp-profile
+tests/.env
+tests/.env.local
 
 # logs
 geckodriver.log
diff --git a/docs/test-plan.md b/docs/test-plan.md
@@ -0,0 +1,162 @@
+# Selenium Test Harness — Improvement Plan
+
+Date: 2026-04-30
+
+Overview
+
+This document captures an actionable plan to improve the Selenium-based integration tests in `tests/test.py` for the Zeeschuimer Firefox extension. The goals are to:
+
+- Make profile handling reliable and reusable (so logged-in sessions persist across runs).
+- Preserve and export captured data per platform for offline analysis and for passing to 4CAT.
+- Add optional automated upload to a 4CAT instance for mapping/validation tests.
+- Reduce fragility caused by popups and interactive dialogs (pausing/dismissal patterns).
+- Improve robustness, error handling, and machine-readable results.
+
+Scope
+
+All changes are confined to the test harness and test metadata (`tests/test.py` and `tests/tests.json`) and to this planning document. No changes are required in the extension source for the planned items (the test harness will interact with the extension's UI pages and background DB).
+
+Phases & Changes
+
+Phase 1 — Profile management
+
+- Problem: copying an entire profile can race with a running Firefox and the current ignore rule hides potentially useful session data.
+- Changes:
+  - Detect if the selected profile directory appears locked (presence of `lock` or `.parentlock`) and warn if Firefox is running.
+  - Replace the naive ignore lambda used in `shutil.copytree` with a function that only excludes `storage`, `extensions`, and `signedInUser.json` at the profile root.
+  - Add CLI flags: `--profile-name NAME` (choose profile by display name from `profiles.ini`), `--save-profile PATH` (save the temp profile for reuse), and `--no-cleanup` (do not remove `.temp-profile` after run).
+
+Implementation note (copytree ignore example):
+
+```python
+import os
+import shutil
+
+def _profile_ignore(root, names):
+    # Only ignore these entries in the root profile dir
+    if os.path.abspath(root) == os.path.abspath(profile_dir):
+        return {"storage", "extensions", "signedInUser.json"}
+    return set()
+
+# profile_dir is the source profile; profile_file is the copy destination
+shutil.copytree(profile_dir, profile_file, ignore=_profile_ignore)
+```
+
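Review note: the lock check in Phase 1 could look roughly like the sketch below. It assumes only the two file names listed in the plan (`lock`, `.parentlock`); the helper name `profile_lock_files` is hypothetical and not part of `tests/test.py`. A lock file can be left behind after the browser exits, so the result is a hint that justifies a warning, not proof that Firefox is running.

```python
import os
import tempfile

# File names taken from the plan; other platforms may use different names.
LOCK_NAMES = ("lock", ".parentlock")


def profile_lock_files(profile_dir):
    """Return the lock-file names present in profile_dir (a heuristic only)."""
    # lexists() is used instead of exists() so that a lock stored as a
    # dangling symlink is still reported.
    return [name for name in LOCK_NAMES
            if os.path.lexists(os.path.join(profile_dir, name))]


# Demo on a throwaway directory instead of a real Firefox profile.
with tempfile.TemporaryDirectory() as demo_profile:
    before = profile_lock_files(demo_profile)
    open(os.path.join(demo_profile, ".parentlock"), "w").close()
    after = profile_lock_files(demo_profile)

if after:
    print(f"Warning: profile may be in use (found {', '.join(after)})")
```

Returning the list of names rather than a boolean lets the warning say which file triggered it.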
+Phase 2 — Data preservation & export
+
+- Problem: `reset-all` wipes the DB before each URL; no artifacts are kept for post-mortem or mapping tests.
+- Decision: export a single combined NDJSON file per platform containing items collected while testing that platform.
+- Changes:
+  - Add CLI `--export-dir PATH` (default `./zeeschuimer-exports/{timestamp}/`).
+  - Before clicking `reset-all` for each URL, read the current DB contents from the extension background page (Dexie) via `execute_async_script` and append those items to a per-platform in-memory list in Python. After all URLs for a platform are done, write `{export-dir}/{platform}.ndjson`.
+  - Optionally add `--no-reset` to skip the `reset-all` call entirely (default behavior remains to reset before each URL).
+
+`execute_async_script` pattern (example):
+
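+```python
+import json
+
+script = '''
+// Selenium passes the completion callback as the last argument
+const cb = arguments[arguments.length - 1];
+background.db.items.toArray().then(items => cb(JSON.stringify(items))).catch(e => cb(JSON.stringify({error: String(e)})));
+'''
+items_json = driver.execute_async_script(script)
+items = json.loads(items_json)
+# The script reports failures as {"error": "..."} instead of a list of items
+if isinstance(items, dict) and "error" in items:
+    raise RuntimeError(f"Could not read extension DB: {items['error']}")
+```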
+
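Review note: the per-platform write step described in Phase 2 might be sketched as follows. The function name `write_ndjson` and the sample items are made up for illustration; the only things taken from the plan are the `{export-dir}/{platform}.ndjson` naming and the idea of collecting items in a Python list first.

```python
import json
import os
import tempfile


def write_ndjson(items, export_dir, platform):
    """Write one JSON object per line to {export_dir}/{platform}.ndjson."""
    os.makedirs(export_dir, exist_ok=True)
    path = os.path.join(export_dir, f"{platform}.ndjson")
    with open(path, "w", encoding="utf-8") as handle:
        for item in items:
            # json.dumps escapes newlines inside strings, so each item
            # stays on a single line.
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")
    return path


# Demo with made-up items standing in for rows read from the extension DB.
platform_items = [{"id": 1, "text": "first\nline"}, {"id": 2, "text": "second"}]
with tempfile.TemporaryDirectory() as export_dir:
    ndjson_path = write_ndjson(platform_items, export_dir, "instagram.com")
    with open(ndjson_path, encoding="utf-8") as handle:
        round_trip = [json.loads(line) for line in handle]
```

Writing once per platform keeps the file handling simple; appending per URL would be an alternative if losing in-memory items on a crash is a concern.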
+Phase 3 — 4CAT integration (optional)
+
+- Problem: mapping tests live in 4CAT and need NDJSON input.
+- Changes:
+  - Add CLI flags: `--4cat-url URL` and `--4cat-key KEY` (API key). Require both for upload.
+  - After writing the per-platform NDJSON, POST it to `{4cat_url.rstrip('/')}/api/import-dataset/` with header `X-Zeeschuimer-Platform: {platform}` and `Authorization: {key}` (confirm header with your 4CAT instance; alternative is to trigger the extension UI upload button when cookie-based auth is required).
+  - Do not fail the test run on 4CAT errors — print status and continue.
+
+Example upload with `requests`:
+
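+```python
+import requests
+
+headers = {
+    'X-Zeeschuimer-Platform': platform,
+    'Authorization': fourcat_key,  # header format still to be confirmed
+}
+try:
+    with open(ndjson_path, 'rb') as f:
+        r = requests.post(f"{fourcat_url.rstrip('/')}/api/import-dataset/", headers=headers, data=f, timeout=60)
+    print(f"4CAT upload for {platform}: {r.status_code} {r.text[:200]}")
+except requests.RequestException as e:
+    # Per the plan, a failed upload must not fail the test run
+    print(f"4CAT upload for {platform} failed: {e}")
+```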
+
+Phase 4 — Interactive controls & popup dismissals
+
+- Problem: cookie banners, paywall prompts, and other popups frequently interfere with automated navigation and can cause false failures.
+- Decision: pause by default **once per platform** (not before every URL) so the tester can clear residual prompts; provide opt-out and finer-grained options.
+- Changes:
+  - CLI flags: `--no-interactive` (disable all pauses), `--pause-before-url` (pause before each URL), `--pause-on-fail` (pause on failure), `--extra-wait N` (add N seconds to every wait), `--screenshot-dir PATH` (capture screenshots on fail/warning).
+  - Add a `dismiss-selectors` optional field in `tests.json` per URL: a list of CSS selectors to click to dismiss known popups. Example:
+
+```json
+"dismiss-selectors": ["button.cookie-accept", ".modal .close"]
+```
+
+  - Add per-URL `timeout` (page load timeout override).
+
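Review note: one possible shape for the `dismiss-selectors` handling is sketched below. The helper name `dismiss_popups` is hypothetical, and the `FakeDriver`/`FakeElement` classes exist only so the sketch runs without a browser. With a real Selenium driver the same `find_elements` call applies, since `"css selector"` is the string value of `By.CSS_SELECTOR`.

```python
def dismiss_popups(driver, selectors):
    """Click the first clickable match per selector; return selectors that worked."""
    dismissed = []
    for selector in selectors:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        for element in driver.find_elements("css selector", selector):
            try:
                element.click()
            except Exception:
                # Hidden or stale elements are expected here; try the next match.
                continue
            dismissed.append(selector)
            break
    return dismissed


# Stand-ins for a WebDriver and its elements (illustration only).
class FakeElement:
    def __init__(self, clickable=True):
        self.clickable = clickable

    def click(self):
        if not self.clickable:
            raise RuntimeError("element not interactable")


class FakeDriver:
    def __init__(self, page):
        self.page = page

    def find_elements(self, by, value):
        return self.page.get(value, [])


driver = FakeDriver({
    "button.cookie-accept": [FakeElement(clickable=False), FakeElement()],
    ".modal .close": [FakeElement(clickable=False)],
})
dismissed = dismiss_popups(driver, ["button.cookie-accept", ".modal .close", "#absent"])
```

Swallowing click errors is deliberate in this sketch: a popup that is absent or not clickable should not count as a test failure.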
+Phase 5 — Runner robustness & reporting
+
+- Problem: unhandled exceptions abort the run; final runtime is calculated incorrectly; no machine-readable results.
+- Changes:
+  - Wrap each URL test body in try/except, increment `failed` on exceptions, and continue.
+  - Move the global `start_time = time.time()` to before the outer platform loop so the final elapsed time is for the full run.
+  - Add CLI flags: `--results-file PATH` (write JSON summary), `--resume-from PLATFORM` (skip earlier platforms), and `--screenshot-dir PATH` (as noted).
+  - Fix small test metadata issues (e.g., `more-after-scrolll` typo in `tests.json`).
+
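Review note: the Phase 5 runner changes (per-URL try/except, a single `start_time` before the platform loop, and a JSON summary) could fit together roughly as below. `run_all`, `fake_url_test`, and the summary keys are hypothetical names, and the real per-URL test body in `tests/test.py` is replaced by a stub.

```python
import json
import time


def run_all(tests, run_url_test):
    """Run every URL test, recording failures instead of aborting."""
    start_time = time.time()  # set once, before the outer platform loop
    summary = {"passed": 0, "failed": 0, "urls": []}
    for platform, urls in tests.items():
        for url in urls:
            entry = {"platform": platform, "url": url, "status": "passed"}
            try:
                run_url_test(platform, url)
                summary["passed"] += 1
            except Exception as exc:
                entry["status"] = "failed"
                entry["error"] = str(exc)
                summary["failed"] += 1
            summary["urls"].append(entry)
    summary["elapsed_seconds"] = round(time.time() - start_time, 3)
    return summary


def fake_url_test(platform, url):
    # Stub standing in for the real Selenium test body.
    if "broken" in url:
        raise RuntimeError("no items captured")


summary = run_all(
    {"example.com": ["https://example.com/ok", "https://example.com/broken"]},
    fake_url_test,
)
# What --results-file would write to disk.
results_json = json.dumps(summary, indent=2)
```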
+tests.json schema additions
+
+- Per-URL optional fields:
+  - `dismiss-selectors`: array of CSS selectors to click after page load
+  - `timeout`: numeric page load timeout seconds for this URL
+  - `extra-wait`: per-URL additional wait seconds
+
+CLI flags (summary)
+
+- `--profiledir PATH` — explicit profile path (existing)
+- `--profile-name NAME` — choose Firefox profile by display name
+- `--save-profile PATH` — persist the copied profile for reuse
+- `--no-cleanup` — keep `.temp-profile`
+- `--export-dir PATH` — where to write NDJSON exports
+- `--no-reset` — do not click `reset-all` between URLs
+- `--4cat-url URL` — base URL for 4CAT server
+- `--4cat-key KEY` — API key for 4CAT uploads
+- `--4cat-per-url` — upload per URL instead of per platform (optional)
+- `--no-interactive` — disable pausing (default is to pause per-platform)
+- `--pause-before-url` — pause before each URL
+- `--pause-on-fail` — pause when a test fails
+- `--extra-wait N` — add N seconds to every URL wait
+- `--screenshot-dir PATH` — save screenshots on fail/warning
+- `--results-file PATH` — write machine-readable results JSON
+- `--resume-from PLATFORM` — resume a run from a platform
+
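Review note: a partial `argparse` sketch for a few of the flags above, assuming the harness uses `argparse` (the plan does not say). The main point is the explicit `dest=` for the `--4cat-*` flags; the `fourcat_url`/`fourcat_key` names mirror the variables in the upload example, and the defaults shown are placeholders.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Zeeschuimer Selenium tests (sketch)")
    parser.add_argument("--export-dir", default=None)
    # Without dest=, argparse would store these as "4cat_url"/"4cat_key",
    # which cannot be read with normal attribute access.
    parser.add_argument("--4cat-url", dest="fourcat_url", default=None)
    parser.add_argument("--4cat-key", dest="fourcat_key", default=None)
    parser.add_argument("--no-interactive", action="store_true")
    parser.add_argument("--extra-wait", type=float, default=0.0)
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # The plan requires both 4CAT flags for an upload.
    if bool(args.fourcat_url) != bool(args.fourcat_key):
        parser.error("--4cat-url and --4cat-key must be given together")
    return args


args = parse_args(["--4cat-url", "http://localhost:8000/", "--4cat-key", "KEY",
                   "--extra-wait", "2"])
upload_enabled = args.fourcat_url is not None
```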
+Verification checklist
+
+1. `python tests/test.py --sources instagram.com --export-dir ./exports` -> `exports/instagram.com.ndjson` exists and contains NDJSON with captured items.
+2. `python tests/test.py --save-profile .saved-profile --login` -> creates a saved profile that can be reused with `--profiledir .saved-profile`.
+3. Run with default interactive behavior and confirm one pause per platform.
+4. `python tests/test.py --results-file results.json` -> JSON summary produced with per-URL status and counts.
+5. Test 4CAT upload using a local mock server and `--4cat-url http://localhost:8000 --4cat-key KEY`.
+
+Implementation steps (recommended order)
+
+1. Docs and small fixes (this document + tests.json typo fix).
+2. Profile management changes (`--profile-name`, improved copy ignore, `--save-profile`, lock detection).
+3. Export behavior: `--export-dir` + `execute_async_script` collection and NDJSON write.
+4. Runner robustness: try/except around URL loop, `--results-file`, fix `start_time` placement.
+5. Interactive and dismissal features (`dismiss-selectors`, pause flags, screenshots).
+6. 4CAT upload integration (optional, requires confirmation of auth header).
+
+Estimated effort: 6–10 hours of focused work to implement and test everything end-to-end; can be split into 3-4 incremental PRs.
+
+Open questions / confirmations needed
+
+- Confirm 4CAT API key header format (currently suggested: `Authorization: {key}`). If your 4CAT requires cookie-based auth, we should emulate the extension upload button via Selenium instead.
+- Confirm desired default for interactive mode. (Current recommendation: pause once per platform by default; provide `--no-interactive` to run unattended, without pauses.)
+
+Next steps
+
+- Begin with Phase 1 (profile management) in `tests/test.py` and submit the remaining phases as incremental changes.
diff --git a/js/lib.js b/js/lib.js
@@ -57,6 +57,12 @@ class MissingMappedField {
     toString() {
         return `${this.value}`;
     }
+
+    // Mirror 4CAT's API serialization so JSON.stringify produces the same
+    // tagged form on both sides. See docs/4cat-map-item-api.md.
+    toJSON() {
+        return { __missing: true, value: this.value };
+    }
 }
 
 /**

diff --git a/modules/package.json b/modules/package.json
@@ -0,0 +1,3 @@
+{
+  "type": "module"
+}
diff --git a/tests/.env.example b/tests/.env.example
@@ -0,0 +1,9 @@
+# 4CAT API config for the map_item comparison tests.
+# Copy this file to .env in this directory and fill in real values.
+# .env is gitignored; .env.example is the committed template.
+
+# Base URL of the 4CAT instance to hit. No trailing slash.
+FOURCAT_URL=http://localhost
+
+# API key for that 4CAT instance. Get one from the 4CAT UI; tied to your user.
+FOURCAT_API_KEY=your-api-key-here
diff --git a/tests/__pycache__/test.cpython-39.pyc b/tests/__pycache__/test.cpython-39.pyc
diff --git a/tests/_module-info.js b/tests/_module-info.js
@@ -0,0 +1,45 @@
+/**
+ * Shared helper for the map_item test drivers.
+ *
+ * Pre-validates a module by:
+ *   1. Running `node --check` on its file (syntax check; avoids the
+ *      worker-killing experimental-ESM crash when a syntax error reaches
+ *      the dynamic importer).
+ *   2. Dynamically importing it and checking for a `map_item` export.
+ *
+ * Returns one of four states the test driver can branch on:
+ *   { state: 'ok',           map_item: <fn> }
+ *   { state: 'no_map_item' }
+ *   { state: 'syntax_error', error: <string> }
+ *   { state: 'import_error', error: <Error> }
+ */
+
+import { spawnSync } from 'node:child_process';
+import { join, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const MODULES_ROOT = join(__dirname, '..', 'modules');
+
+function check_module_syntax(module_name) {
+    const module_path = join(MODULES_ROOT, `${module_name}.js`);
+    const result = spawnSync(process.execPath, ['--check', module_path], { encoding: 'utf8' });
+    if (result.status === 0) return null;
+    return (result.stderr || result.stdout || `exit code ${result.status}`).trim();
+}
+
+export async function inspect_module(module_name) {
+    const syntax_error = check_module_syntax(module_name);
+    if (syntax_error) {
+        return { state: 'syntax_error', error: syntax_error };
+    }
+    try {
+        const mod = await import(`../modules/${module_name}.js`);
+        if (typeof mod.map_item !== 'function') {
+            return { state: 'no_map_item' };
+        }
+        return { state: 'ok', map_item: mod.map_item };
+    } catch (e) {
+        return { state: 'import_error', error: e };
+    }
+}
diff --git a/tests/duplicate-behavior.test.js b/tests/duplicate-behavior.test.js
@@ -5,8 +5,9 @@
  * update or merge behaviors to duplicates across navigation boundaries.
  */
 
+import 'fake-indexeddb/auto';
+
 let Dexie;
-require('fake-indexeddb/auto');
 
 // Mock browser extension APIs
 global.browser = {

diff --git a/tests/fixtures/.gitignore b/tests/fixtures/.gitignore
@@ -0,0 +1,5 @@
+# Ignore everything in this directory
+*
+# Except these files
+!.gitignore
+!README.md
diff --git a/tests/fixtures/README.md b/tests/fixtures/README.md
@@ -0,0 +1,29 @@
+# Test fixtures for `map_item`
+
+Real captured items used to exercise each module's auto-generated `map_item`
+function.
+
+## Layout
+
+```
+tests/fixtures/
+  <module_name>/
+    <whatever>.ndjson
+    <whatever-else>.ndjson
+```
+
+`<module_name>` matches the filename in `modules/` without `.js` —
+e.g. `tiktok/` → `modules/tiktok.js`, `pinterest/` → `modules/pinterest.js`.
+You can drop multiple `.ndjson` files in a module folder; each gets its own
+`describe` block and each line becomes its own `test`.
+
+Filenames are free-form — the auto-export filename from the popup
+(`zeeschuimer-export-<platform>-<timestamp>.ndjson`) is fine.
+
+## Privacy / committing
+
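+These files contain real captured platform data: usernames, post
+content, URLs, sometimes images and other PII. The `.gitignore` in this
+directory therefore ignores everything except itself and this README.
+
+If we want to commit synthetic test exports or anonymized real exports,
+add exceptions for them to that `.gitignore`.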
diff --git a/tests/jest.config.js → tests/jest.config.cjs b/tests/jest.config.js → tests/jest.config.cjs
@@ -3,6 +3,7 @@ module.exports = {
   testMatch: ['**/*.test.js'],
   transform: {},
   moduleFileExtensions: ['js', 'json'],
-  collectCoverageFrom: ['duplicate-behavior.test.js'],
+  collectCoverageFrom: ['*.test.js'],
+  setupFiles: ['<rootDir>/setup-globals.cjs'],
   verbose: true
 };