overhaul: audit-and-cleanup architecture + accuracy corpus + agent API #121

Merged
sidmohan0 merged 8 commits into dev from overhaul/audit-and-cleanup
Feb 13, 2026

@sidmohan0
Contributor

Summary

This PR executes the architectural overhaul spec for datafog-python and ships the v4.3.0 audit release shape.

Core changes

  • Added clean internal engine boundary in datafog/engine.py:
    • scan()
    • redact()
    • scan_and_redact()
    • dataclasses Entity, ScanResult, RedactResult
  • Added EngineNotAvailable and clear optional dependency behavior.
  • Rewired compatibility layers (DataFog, core helpers, CLI text redaction paths) to delegate through the engine API.
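
A minimal sketch of what such an engine boundary can look like. The names scan(), redact(), scan_and_redact(), Entity, ScanResult, RedactResult and EngineNotAvailable come from the list above; every field name, parameter, the email pattern and the `engine` argument are assumptions made for illustration and may not match datafog/engine.py.

```python
import re
from dataclasses import dataclass, field


class EngineNotAvailable(RuntimeError):
    """Raised when a requested engine needs an optional dependency that is missing."""


@dataclass(frozen=True)
class Entity:
    # Field names are hypothetical; the PR only names the dataclass.
    type: str
    text: str
    start: int
    end: int


@dataclass
class ScanResult:
    text: str
    entities: list[Entity] = field(default_factory=list)


@dataclass
class RedactResult:
    redacted_text: str
    entities: list[Entity] = field(default_factory=list)


# Toy email pattern used only for this sketch, not the library's regex.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def scan(text: str, engine: str = "regex") -> ScanResult:
    """Detect entities; only a regex engine exists in this sketch."""
    if engine != "regex":
        # Stand-in for "optional extra not installed".
        raise EngineNotAvailable(f"engine {engine!r} is not available in this sketch")
    entities = [
        Entity("EMAIL", m.group(), m.start(), m.end()) for m in _EMAIL.finditer(text)
    ]
    return ScanResult(text, entities)


def redact(text: str, entities: list[Entity]) -> RedactResult:
    """Replace each entity span with a [TYPE] token, right to left so offsets stay valid."""
    out = text
    for e in sorted(entities, key=lambda e: e.start, reverse=True):
        out = out[: e.start] + f"[{e.type}]" + out[e.end :]
    return RedactResult(out, list(entities))


def scan_and_redact(text: str, engine: str = "regex") -> RedactResult:
    """Convenience wrapper: scan, then redact what was found."""
    return redact(text, scan(text, engine).entities)


result = scan_and_redact("Contact jane@example.com today.")
```

Keeping detection and redaction as separate functions that exchange plain dataclasses is what lets compatibility layers delegate to one place without sharing internal state.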

Accuracy infrastructure

  • Added corpus fixtures under tests/corpus/:
    • structured_pii.json (70)
    • unstructured_pii.json (20)
    • mixed_pii.json (20)
    • negative_cases.json (15)
    • edge_cases.json (20)
  • Added corpus-driven test suite tests/test_detection_accuracy.py.
  • Added explicit xfail cases for known model limitations with reasons.
  • Regex accuracy fixes (email boundaries, SSN adjacency, strict IPv4 handling, date/year behavior).
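
To illustrate how a corpus-driven check and a strict IPv4 rule fit together, here is a small standalone sketch. The JSON schema (`id`, `text`, `expected`), the `run_corpus` helper and the pattern itself are assumptions for illustration; they are not the fixtures under tests/corpus/ or the regex shipped in this PR.

```python
import json
import re

# Each octet must be 0-255; the lookarounds reject matches embedded in
# longer dotted-number runs such as version strings.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"(?<![\d.])(?:{_OCTET}\.){{3}}{_OCTET}(?!\.?\d)")

# Hypothetical corpus entries, inlined here instead of being read from a file.
CORPUS_JSON = """
[
  {"id": "ip-valid", "text": "Server at 192.168.1.1.", "expected": ["192.168.1.1"]},
  {"id": "ip-octet-range", "text": "Bad address 256.1.1.1", "expected": []},
  {"id": "ip-version-like", "text": "Release 1.2.3.4.5 shipped", "expected": []}
]
"""


def run_corpus(cases):
    """Return the ids of cases whose detected values differ from the expected ones."""
    failures = []
    for case in cases:
        found = IPV4.findall(case["text"])
        if found != case["expected"]:
            failures.append(case["id"])
    return failures


cases = json.loads(CORPUS_JSON)
failures = run_corpus(cases)
```

In a pytest suite the same cases would typically be fed through `pytest.mark.parametrize`, with known limitations marked `xfail` and given a reason, as the PR describes.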

Agent API

  • Added datafog/agent.py and top-level exports:
    • sanitize()
    • scan_prompt()
    • filter_output()
    • create_guardrail()
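
The four helpers could compose roughly as follows. This is a sketch under stated assumptions: only the function names come from the list above, while the signatures, the `on_detect` parameter, the decorator form of the guardrail and the toy SSN pattern are hypothetical and may differ from datafog/agent.py.

```python
import re
from functools import wraps
from typing import Callable

# Toy SSN pattern for the sketch; the digit lookarounds avoid matching
# inside longer digit runs.
_SSN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def scan_prompt(prompt: str) -> list[str]:
    """Return PII-looking values found in a prompt (hypothetical return type)."""
    return _SSN.findall(prompt)


def sanitize(text: str) -> str:
    """Replace detected values with a placeholder token."""
    return _SSN.sub("[SSN]", text)


def filter_output(output: str) -> str:
    """Apply the same redaction to model output before it leaves the agent."""
    return sanitize(output)


def create_guardrail(on_detect: str = "redact") -> Callable:
    """Build a decorator that guards both the prompt and the response."""
    if on_detect not in {"redact", "block"}:
        raise ValueError("on_detect must be 'redact' or 'block'")

    def guard(fn: Callable[[str], str]) -> Callable[[str], str]:
        @wraps(fn)
        def wrapper(prompt: str) -> str:
            if on_detect == "block" and scan_prompt(prompt):
                raise ValueError("prompt contains PII")
            return filter_output(fn(sanitize(prompt)))

        return wrapper

    return guard


@create_guardrail()
def echo_model(prompt: str) -> str:
    # Stand-in for an LLM call.
    return "You said: " + prompt


reply = echo_model("My SSN is 123-45-6789")
```

Wrapping the model call keeps input sanitization and output filtering in one place, so callers cannot forget one of the two steps.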

Testing / CI / Docs

  • Added tests/test_engine_api.py and tests/test_agent_api.py.
  • Updated CI matrix in .github/workflows/ci.yml:
    • Python 3.10/3.11/3.12
    • core / nlp / nlp-advanced profiles
    • coverage enforcement
    • accuracy corpus run
  • Updated README and CHANGELOG.
  • Added audit deliverables in docs/audit/.

Validation

  • Full test run: 802 passed, 3 skipped, 27 xfailed, 0 failed
  • Accuracy suite: 534 passed, 27 xfailed, 0 failed
  • Coverage (final): 87.47% line, 76.63% branch
  • Build/install and backward-compat checks passed.

Related

Fixes #118

@sidmohan0 sidmohan0 merged commit d421cf4 into dev Feb 13, 2026
13 checks passed

Development

Successfully merging this pull request may close these issues.

Basic Usage Example Doesn't Work