feat: add summary generation and export functionality for results #341
Conversation
- Add _extract_table_info() helper method to extract variables and record counts from table outputs
- Add _extract_regression_info() helper method to extract observation counts from regression outputs
- Add _mark_diff_risk() helper method to identify outputs with differencing risk
- Add write_summary() method to export summary to CSV
- Enhance finalise_excel() to include summary sheet as first sheet
- Add comprehensive test coverage for all helper methods and edge cases
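The differencing-risk idea behind _mark_diff_risk() can be illustrated with a minimal sketch (the PR's actual implementation is not shown here, so the function body and the dictionary layout are assumptions): two outputs built from the same variables can be subtracted from one another to recover suppressed cells, so any outputs whose variable sets coincide are flagged.

```python
def mark_diff_risk(outputs):
    """Hypothetical sketch: flag outputs whose variable set matches
    another output's, since identical variables allow differencing."""
    # group output names by their (frozen) set of variables
    by_vars = {}
    for name, info in outputs.items():
        by_vars.setdefault(frozenset(info["variables"]), []).append(name)
    # an output is at risk if at least one other output shares its variables
    return {
        name: len(group) > 1
        for group in by_vars.values()
        for name in group
    }

risky = mark_diff_risk({
    "output_0": {"variables": ["sex", "grant_type"]},
    "output_1": {"variables": ["sex", "grant_type"]},  # same vars: at risk
    "output_2": {"variables": ["year"]},
})
# output_0 and output_1 are flagged; output_2 is not
```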
Needs further updating
- Add generate_summary() to provide high-level output overview for checkers
- Add multi-layered 'DO NOT RELEASE' warnings (filename, comment)
- Integrate session summary into JSON and Excel outputs

Resolves #224
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #341 +/- ##
==========================================
- Coverage 99.70% 96.96% -2.74%
==========================================
Files 9 9
Lines 1354 1584 +230
==========================================
+ Hits 1350 1536 +186
- Misses 4 48 +44
force-pushed from 552f10d to e3665ff
Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
force-pushed from b50d8a2 to 5f14a04
…kers
- Add multi-layered 'DO NOT RELEASE' warnings (filename, comment)
- Integrate session summary into JSON and Excel outputs
for more information, see https://pre-commit.ci
Remove tests for extracting table info based on index and column names.
Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
for more information, see https://pre-commit.ci
- All core functionality preserved and tested with real DataFrame scenarios
Hi @JessUWE three things:

1: I just ran this on a demo which has the same tables with and without suppression. It should therefore be reporting diff risk, as both have identical variables, but that is showing as False.

2: I think we're nearly ready to go out to the community saying "this is what we can produce; is this helpful / how should we tweak it?"

3: Based on the original meetings with PHS, I think we may want to add another table which has one column for each variable (and an extra column for the output type) and one row for each output. Entries in 'output type' are 'table', 'regression', 'plot' etc. (pulled from the summary); then the rest of the table is binary: 'is the variable (column) in the list of variables for that output?'
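The table suggested in point 3 could be sketched with pandas along these lines (the summary structure, output names, and variable names here are invented for illustration, not taken from the PR):

```python
import pandas as pd

# hypothetical per-output summary: output type plus variables used
summary = {
    "output_0": ("table", ["sex", "grant_type"]),
    "output_1": ("regression", ["inc_grants", "inc_donations"]),
    "output_2": ("plot", ["year", "inc_grants"]),
}

# one column per variable seen anywhere, plus one for the output type
all_vars = sorted({v for _, vars_used in summary.values() for v in vars_used})
rows = [
    {"output type": kind, **{v: v in vars_used for v in all_vars}}
    for kind, vars_used in summary.values()
]
table = pd.DataFrame(rows, index=list(summary))
print(table)
```

Each row is then a binary membership vector, so a checker can see at a glance which outputs draw on overlapping variables.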
…DC/ACRO into feature/224-session-summary
…emove made changes to the comments
Signed-off-by: Jim-smith <jim-smith@users.noreply.github.com>
Refactored summary generation methods to address reviewer feedback:
- Removed unused parameters from _extract_table_info method: eliminated 'method' and 'properties' parameters with noqa pragmas
- Fixed _extract_regression_info to return only total_records: removed the unused 'variables' list that was never populated, changed the return type from tuple[list[str], int] to int, and updated the docstring to reflect actual behavior
- Added documentation for differencing risk detection
- Removed test_extract_table_info_exception_handling
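After that refactor, _extract_regression_info reduces to something of this shape (a sketch only: the real method's input is the PR's stored regression output, and the 'nobs' key used here is an assumption about how the observation count is stored):

```python
def extract_regression_info(output: dict) -> int:
    """Hypothetical sketch: return only total_records (an int),
    replacing the old tuple[list[str], int] return type."""
    # 'nobs' is an assumed key for the number of observations
    return int(output.get("nobs", 0))

extract_regression_info({"nobs": 811})
# 811
```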
assert not acro.suppress

def test_crosstab_std_dropna(data, acro):

is this deleted test not needed any more?
jim-smith left a comment
Some changes needed to what variables are extracted for regression analyses.
Don't understand what the diff is telling me about insertions and deletions within the test file. I think those are easy yes/no answers.
one option to consider is to extract the variables involved at the time that the query is first dealt with, e.g. for a crosstab, inside our acro.crosstab method we do:

vars_used = []
# get names of variables used in rows
if isinstance(index, pd.Series):
    vars_used.append(index.name)
else:  # index must be a list of pandas Series defining the row hierarchy
    for var in index:
        vars_used.append(var.name)
# get names of variables used in columns
if isinstance(columns, pd.Series):
    vars_used.append(columns.name)
else:  # columns must be a list of pandas Series defining the column hierarchy
    for var in columns:
        vars_used.append(var.name)
# get name of the series reported on by the aggregation function, if present
if values is not None:
    vars_used.append(values.name)
# now save your array of values in the json
for regression commands specified with the trailing 'r', the variables are all held in the lists. For regression commands specified with a formula, it has the form of a string and can be parsed.
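For the formula case, a minimal parse might look like this (a sketch, not the PR's actual _get_formula_variables implementation; a full solution would use a formula parser such as patsy, since naive splitting ignores functions like C(x) or np.log(x)):

```python
import re

def formula_variables(formula: str) -> list[str]:
    """Hypothetical sketch: pull variable names out of an R-style
    formula such as 'y ~ x1 + x2 + x1:x2'."""
    lhs, rhs = formula.split("~")
    # split the right-hand side on the common formula operators
    terms = re.split(r"[+\-*:]", rhs)
    names = [lhs.strip()] + [t.strip() for t in terms]
    # drop intercept markers and empties; keep order without duplicates
    seen = []
    for name in names:
        if name and name not in {"0", "1"} and name not in seen:
            seen.append(name)
    return seen

formula_variables("inc_activity ~ inc_grants + inc_donations")
# ["inc_activity", "inc_grants", "inc_donations"]
```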
force-pushed from 0e9f08b to 10b8ccd
Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
Add assertions to verify expected variables in summary row Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
- Add _get_endog_exog_variables() to extract variable names from endog/exog
- Add _get_formula_variables() to parse R-style formulas
- Add _split_formula_terms() helper for formula parsing
- Update regression methods to track variables in properties
- Add comprehensive tests for formula and variable extraction
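The endog/exog extraction in the first bullet could be sketched as follows, assuming pandas inputs as in statsmodels-style regression calls (the helper name comes from the commit message; the body is an assumption):

```python
import pandas as pd

def get_endog_exog_variables(endog, exog) -> list[str]:
    """Hypothetical sketch: collect variable names from the dependent
    variable (endog) and the regressors (exog)."""
    variables = []
    for data in (endog, exog):
        if isinstance(data, pd.Series):
            variables.append(data.name)
        elif isinstance(data, pd.DataFrame):
            variables.extend(data.columns)
    # unnamed Series contribute None; drop them
    return [v for v in variables if v is not None]

y = pd.Series([1.0, 2.0], name="inc_activity")
x = pd.DataFrame({"inc_grants": [3.0, 4.0], "inc_donations": [5.0, 6.0]})
get_endog_exog_variables(y, x)
# ["inc_activity", "inc_grants", "inc_donations"]
```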
force-pushed from 90b5dab to b4a3871
Refactor tests and update docstrings for clarity. Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
Rather than leaving the PR open longer, I'm closing it for now as this approach isn't giving us the desired result. I'll revisit this after doing some further research and testing. Thanks for the reviews so far; I'll incorporate this feedback when I come back to it.

This PR introduces a session-level summary of outputs to help output checkers get an overview of all results generated during a session.