Skip to content

fix(security): auto-cleanup merged_data_model.yaml, remove CLI option and eliminate redundant re-read (#677 #704)#727

Open
oboehmer wants to merge 31 commits intomainfrom
fix/677-704-merged-data-cleanup-refactor
Open

fix(security): auto-cleanup merged_data_model.yaml, remove CLI option and eliminate redundant re-read (#677 #704)#727
oboehmer wants to merge 31 commits intomainfrom
fix/677-704-merged-data-cleanup-refactor

Conversation

@oboehmer
Copy link
Copy Markdown
Collaborator

@oboehmer oboehmer commented Mar 27, 2026

Description

merged_data_model_test_variables.yaml is written to the output directory on every run and contains the full merged data model, which may include credentials and other sensitive values. Prior to this PR the file was never deleted, leaving it on disk after test execution. This PR registers it with CleanupManager for automatic deletion on normal exit, SIGTERM, and SIGINT.

Per-device job scripts (.py) and testbed files (.yaml) created by DeviceExecutor with NamedTemporaryFile(delete=False) were similarly never deleted, accumulating in the OS temp directory across runs. These are also now registered with CleanupManager.

As part of the same cleanup, a performance regression is also fixed: RobotWriter was re-reading and re-parsing the merged data YAML from disk after it had just been written — costing ~51 s and ~400 MB peak RAM on large data models. The already-loaded dict is now passed through the call chain (CombinedOrchestratorRobotOrchestratorRobotWriter) directly, eliminating the redundant file read entirely.

Additional simplifications made possible by these changes:

  • RobotWriter._convert_data_model_for_templates() deleted — this method performed a JSON round-trip (json.dumpsjson.loads) to convert CommentedMap (ruamel.yaml's dict subclass) into plain Python dicts before passing data to Jinja2. This was originally necessary to avoid duplicate-key issues with older ruamel.yaml behaviour. It is now safe to remove because nac-yaml 2.0 guarantees plain dict / list outputCommentedMap and CommentedSeq are fully unwrapped before being returned to callers. Jinja2 receives ordinary Python types directly, so no conversion layer is needed. Removing the JSON round-trip is also the primary source of the ~51 s / ~400 MB saving on large data models.
  • DataMerger.load_yaml_file() removed — no remaining callers.
  • DataMerger.write_merged_data_model() filename parameter removed — the output filename is now always MERGED_DATA_FILENAME (renamed from DEFAULT_MERGED_DATA_FILENAME; the DEFAULT_ prefix implied variability that no longer exists).
  • DataMerger.merged_data_path() added as single query point for the file location — callers no longer reconstruct output_dir / MERGED_DATA_FILENAME independently.
  • write_merged_data_model() returns the Path it wrote and now owns the chmod 0o600 call — "write this sensitive file securely" is a single responsibility.
  • DeviceExecutor receives merged_data_path as a constructor argument from the orchestrator (which already holds it) rather than re-deriving it from base_output_dir, removing the DataMerger import from device_executor.py entirely.
  • CleanupManager.register() gains a keep_if_debug flag. Files registered with keep_if_debug=True are retained when NAC_TEST_DEBUG is set, letting developers inspect intermediate artifacts (merged data YAML, job scripts, testbed YAMLs) without any extra flags or code changes.
  • logger.exception() added to the top-level exception handler in main() — the full stack trace (including chained causes) is now captured in the log file while terminal output remains a clean single line for end users ( unrelated to this PR, but I touched this area so I fixed Add logger.exception() to main.py exception handler for full stack traces #573 while at it)
  • MERGED_DATA_FILENAME constant introduced as the single source of truth for the merged data filename, replacing hardcoded string literals across all call sites.
  • DataMerger unit and integration tests added (tests/unit/test_data_merger.py).

Closes

Related Issue(s)

  • N/A

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring / Technical debt (internal improvements with no user-facing changes)
  • Documentation update
  • Chore (build process, CI, tooling, dependencies)
  • Other (please describe):

Breaking Changes

The --merged-data-filename CLI option has been removed. The output filename for the merged data model is now always merged_data_model_test_variables.yaml and cannot be customised. Any scripts or CI pipelines referencing --merged-data-filename must remove that argument.

Test Framework Affected

  • PyATS
  • Robot Framework
  • Both
  • N/A (not test-framework specific)

Network as Code (NaC) Architecture Affected

  • All architectures

Platform Tested

nac-test supports macOS and Linux only

  • macOS (version tested: macOS 15 / Python 3.12)
  • Linux (distro/version tested: )

Key Changes

  • CleanupManager: new utility that deletes registered files on exit, SIGTERM, and SIGINT; register() gains keep_if_debug flag
  • main.py: registers merged_data_model_test_variables.yaml with CleanupManager immediately after writing; --merged-data-filename CLI option removed; exit_with_cleanup() closure removed;
  • DeviceExecutor: registers per-device job scripts and testbed YAMLs with CleanupManager; receives merged_data_path injected by orchestrator rather than re-deriving it
  • RobotWriter: constructor now accepts merged_data: dict instead of merged_data_path: Path; no file I/O or type conversion on init
  • RobotOrchestrator / CombinedOrchestrator: accept and forward merged_data dict
  • DataMerger: removes filename parameter from write_merged_data_model(), removes load_yaml_file() entirely, adds merged_data_path() helper, returns Path from write_merged_data_model(), owns chmod 0o600
  • _convert_data_model_for_templates() deleted — safe due to nac-yaml 2.0 plain-dict output guarantee
  • MERGED_DATA_FILE_MODE = 0o600 constant added; DEFAULT_MERGED_DATA_FILENAME renamed to MERGED_DATA_FILENAME
  • Data merge fixtures moved from tests/integration/fixtures/ to tests/unit/fixtures/

Testing Done

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed:
    • PyATS tests executed successfully
    • Robot Framework tests executed successfully
    • D2D/SSH tests executed successfully (if applicable)
    • HTML reports generated correctly
  • All existing tests pass (pytest / pre-commit run -a)

Test Commands Used

uv run pytest -n auto --dist loadscope tests/
uv run pre-commit run -a

Additional Notes

PyATS subprocesses still read the merged data file from disk during their execution window — the file is only deleted after all test execution completes, so this is safe.

🤖 Generated with Claude Code

oboehmer added 30 commits March 22, 2026 22:29
Disable EnvironmentDebugPlugin which was writing environment variables
(including passwords) to env.txt files in PyATS result archives.

Add E2E regression test that scans all output artifacts (including ZIP
contents) for a sentinel password value to catch future credential leaks.
Automatically delete merged_data_model_test_variables.yaml after test
execution to prevent potential credential exposure in shared environments.

Changes:
- Add CleanupManager singleton for registering files to delete on exit
- Register merged data file for cleanup immediately after creation
- Remove duplicate data merging in RobotWriter (CLI already merges)
- Simplify RobotWriter to accept merged_data_path instead of data_paths
- Handle cleanup on normal exit (atexit) and SIGTERM signals

The merged data file contains sensitive test variables that should not
persist after test execution completes.

Closes #677
Eliminates the redundant YAML re-read in RobotWriter (fixes #704).
Previously, main.py merged and wrote the data model to disk, then
RobotWriter re-read and re-parsed it — costing ~51 s and ~400 MB peak
RAM on large data models. The already-loaded dict is now passed through
CombinedOrchestrator → RobotOrchestrator → RobotWriter directly.

DataMerger is simplified in line with the removal of --merged-data-filename
in #677: the filename parameter is removed from write_merged_data_model()
(hardcoded to DEFAULT_MERGED_DATA_FILENAME), and load_yaml_file() is
removed entirely as it had no remaining callers.

RobotWriter._convert_data_model_for_templates() is deleted: the JSON
round-trip was intended to convert CommentedMap to plain dicts for
Jinja2, but CommentedMap is a dict subclass and is fully compatible
natively. The redundant self.template_data alias is collapsed into
self.data.

Tests:
- Add tests/unit/test_data_merger.py covering merge content correctness
  and write/roundtrip behaviour, replacing coverage lost when the
  commented-out integration tests were cleaned up
- Add tests/unit/robot/test_robot_writer.py covering constructor
  semantics and render_template context selection
- Move data_merge fixtures from tests/integration/fixtures/ to
  tests/unit/fixtures/ where they belong
signal.SIG_DFL has the integer value 0, which is falsy. The previous
`original if original else signal.SIG_DFL` expression would silently
treat a legitimate SIG_DFL original disposition as "not set" and still
return SIG_DFL — technically a no-op here, but masking the real issue:
any None (unrecorded handler) and SIG_DFL were indistinguishable.
Switch to `original if original is not None else signal.SIG_DFL` so the
intent is explicit and the two cases are correctly distinguished.

Add parametrized tests covering SIG_DFL (falsy), SIG_IGN (truthy), and
None (fallback) to pin the correct behaviour.
Consolidate all knowledge of where the merged data model file lives into
DataMerger, eliminating independent path reconstruction across callers.

- Rename DEFAULT_MERGED_DATA_FILENAME → MERGED_DATA_FILENAME; the
  DEFAULT_ prefix implied variability that no longer exists since the
  --merged-data-filename CLI option was removed
- Add MERGED_DATA_FILE_MODE = 0o600 constant (owner read/write only)
- Add DataMerger.merged_data_path() as the single query point for the
  file location; callers no longer construct output_dir / MERGED_DATA_FILENAME
- write_merged_data_model() now returns the Path it wrote, owns the
  chmod call, and takes no filename argument — "write this sensitive
  file securely" is a single responsibility
- Move merged_data_path injection into DeviceExecutor constructor so the
  orchestrator (which already holds the path) passes it down rather than
  DeviceExecutor re-deriving it from base_output_dir; removes the
  DataMerger import from device_executor.py entirely
typer.Exit is caught by Click internally, which calls sys.exit() —
this triggers atexit handlers reliably on every exit path. The
exit_with_cleanup closure (which called cleanup_now() before raising
typer.Exit) was therefore redundant with the atexit handler already
registered by CleanupManager.

Replace all exit_with_cleanup() call sites with plain raise typer.Exit(),
remove the closure, and drop the now-unused NoReturn import.
CleanupManager.register() gains a skip_if_debug parameter. Files
registered with skip_if_debug=True are deleted on normal exit but
retained when NAC_TEST_DEBUG is set, letting developers inspect
intermediate artifacts without any extra flags or code changes.

The merged data model YAML is the first consumer: it now registers with
skip_if_debug=True so it survives for inspection in debug runs. The
same mechanism is available for job scripts, testbed YAMLs, and any
other artifacts tracked in PR #697.

Internally, _files changes from set[Path] to dict[Path, bool], keeping
the flag co-located with the path and avoiding two collections to sync.

Remove the exit_with_cleanup() closure from main() — typer.Exit is
caught by Click which calls sys.exit(), reliably triggering atexit
handlers on every exit path. The closure and all cleanup_now() pre-exit
calls were redundant with the already-registered atexit handler.
…ut clean (#573)

Add logger.exception() before the typer.echo() in the top-level exception
handler so the full stack trace (including chained causes) is captured in
the log file. Terminal output remains a single clean line for end users.
Call .absolute() when assigning self.merged_data_path in PyATSOrchestrator
so all use sites get a consistent absolute path without redundant calls.
Move the explanatory comment to the assignment and rephrase it to not imply
a cwd change is always in effect.
The per-device job (.py) and testbed (.yaml) files created with
NamedTemporaryFile(delete=False) were never deleted, accumulating in the
OS temp directory across runs. Register them with CleanupManager so they
are removed on normal exit, SIGTERM, and SIGINT.

skip_if_debug=True is used consistently with merged_data_model_test_variables.yaml
so that NAC_TEST_DEBUG retains all intermediate artifacts for debugging.

Also removes the longstanding "UNCOMMENT ME" commented-out deletion block
that this replaces.
The mock-wiring and artifact-creation block (pabot return value,
RobotReportGenerator stub, robot_results/ stub files) was copy-pasted
verbatim across test_verbose_flag_passed_to_pabot and
test_include_exclude_tags_passed_to_pabot. Extract it into a
_setup_run_tests_mocks static helper so future tests can reuse it
without duplicating the setup.
test_returns_path_to_written_file already asserts the same equality
(with operands swapped) and additionally checks that the file exists,
making the duplicate redundant.
Parametrized subprocess test covering the production exit path with
skip_if_debug in both debug and non-debug modes — the atexit handler
can't be exercised from in-process unit tests.
The old name required parsing "skip [deletion] if debug" to understand
the intent. keep_if_debug=True reads directly as what it does.
Fill the contract gap where unit tests call _signal_handler() directly,
bypassing the _install_signal_handlers() registration path.

- Add test_cleanup_on_signal parametrized over SIGTERM and SIGINT:
  spawns a subprocess, waits for a "ready" handshake, delivers the
  signal, and asserts the sentinel file was deleted
- Add timeout=30 to the existing subprocess.run() call
- Rename _SCRIPT → _ATEXIT_SCRIPT, add _SIGNAL_SCRIPT at module level
- Move @pytest.mark.windows to the atexit test only; signal test runs
  Linux-only by CI convention
…nstant

- robot_writer.py: raise ValueError with file/directive context when
  iterate_list_chunked chunk_size cannot be parsed — propagates through
  the existing except Exception handler in combined_orchestrator, aborting
  execution cleanly instead of silently continuing with a bad directive
- robot/orchestrator.py: remove stale step "3. Creates merged data model
  file" from run_tests() docstring; renumber remaining steps 4-6 → 3-5
- combined_orchestrator.py: replace bare "log.html" with LOG_HTML constant
Remove the private _cleanup() method and the cleanup_now() wrapper —
the split served no purpose since both had identical behaviour and
there was no reason to bypass the public API in atexit.register() or
_signal_handler(). Inline the implementation directly into run_cleanup(),
which is now the single method registered with atexit, called from the
signal handler, and available for manual invocation.
- Parametrize test_unregister_removes_path over keep_if_debug values
- Remove test_register_multiple_files (only tested dict insertion count)
- Strengthen test_cleanup_is_idempotent — assert file deleted on first call
- Replace test_concurrent_registration with test_concurrent_register_and_cleanup:
  races register() against run_cleanup() to verify the lock prevents silent
  data loss, rather than just testing concurrent dict insertions
@oboehmer oboehmer added enhancement New feature or request mvp-tech-debt Technical debt from MVP that needs addressing refactor Code refactoring without changing functionality labels Mar 28, 2026
@oboehmer oboehmer self-assigned this Mar 28, 2026
@oboehmer oboehmer marked this pull request as ready for review March 28, 2026 16:29
@oboehmer oboehmer changed the title fix(security): auto-cleanup merged_data_model_test_variables.yaml and eliminate redundant re-read (#677 #704) fix(security): auto-cleanup merged_data_model.yaml, remove CLI option and eliminate redundant re-read (#677 #704) Mar 28, 2026
@oboehmer oboehmer requested review from aitestino and danischm March 28, 2026 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request mvp-tech-debt Technical debt from MVP that needs addressing prio: high refactor Code refactoring without changing functionality

Projects

None yet

1 participant