Skip to content

feat: add summary generation and export functionality for results#341

Closed
JessUWE wants to merge 33 commits intomainfrom
feature/224-session-summary
Closed

feat: add summary generation and export functionality for results#341
JessUWE wants to merge 33 commits intomainfrom
feature/224-session-summary

Conversation

@JessUWE
Copy link
Copy Markdown
Contributor

@JessUWE JessUWE commented Feb 19, 2026

This PR introduces a session-level summary of outputs to help output checkers get an overview of all results generated during a session.

  • Implemented generate_summary() method to create session output summary DataFrame
  • Added write_summary() method to export summary to CSV file
  • Enhanced finalise_excel() to include a 'summary' sheet in Excel results
  • Integrated summary generation into finalise() workflow

- Add _extract_table_info() helper method to extract variables and record counts from table outputs
- Add _extract_regression_info() helper method to extract observation counts from regression outputs
- Add _mark_diff_risk() helper method to identify outputs with differencing risk
- Add write_summary() method to export summary to CSV
- Enhance finalise_excel() to include summary sheet as first sheet
- Add comprehensive test coverage for all helper methods and edge cases
@JessUWE JessUWE requested a review from rpreen February 19, 2026 18:44
@JessUWE JessUWE linked an issue Feb 19, 2026 that may be closed by this pull request
@JessUWE JessUWE removed the request for review from rpreen February 24, 2026 13:57
@JessUWE
Copy link
Copy Markdown
Contributor Author

JessUWE commented Feb 24, 2026

Needs further updating

- Add generate_summary() to provide high-level output overview for checkers

- Add multi-layered 'DO NOT RELEASE' warnings (filename, comment)
- Integrate session summary into JSON and Excel outputs

Resolves #224
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 82.40000% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.96%. Comparing base (fe1f204) to head (9002c7f).

Files with missing lines Patch % Lines
acro/record.py 78.57% 30 Missing ⚠️
acro/acro_regression.py 80.28% 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #341      +/-   ##
==========================================
- Coverage   99.70%   96.96%   -2.74%     
==========================================
  Files           9        9              
  Lines        1354     1584     +230     
==========================================
+ Hits         1350     1536     +186     
- Misses          4       48      +44     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JessUWE JessUWE force-pushed the feature/224-session-summary branch from 552f10d to e3665ff Compare March 5, 2026 11:04
Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
@JessUWE JessUWE force-pushed the feature/224-session-summary branch 2 times, most recently from b50d8a2 to 5f14a04 Compare March 5, 2026 13:37
JessUWE and others added 3 commits March 5, 2026 13:43
…kers

- Add multi-layered 'DO NOT RELEASE' warnings (filename, comment)
- Integrate session summary into JSON and Excel outputs
- Integrate session summary into JSON and Excel outputs
@JessUWE JessUWE requested review from jim-smith and rpreen March 5, 2026 14:00
JessUWE and others added 2 commits March 6, 2026 06:00
Remove tests for extracting table info based on index and columns names.

Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
@JessUWE JessUWE requested a review from rpreen March 6, 2026 06:00
- All core functionality preserved and tested with real DataFrame scenarios
@jim-smith
Copy link
Copy Markdown
Contributor

Hi @JessUWE three things:

1: Just run this on a same demo which has the same tables with and without suppression. Should there fore be reporting diff risk as both have identical variables, but that is showing as False.

  • will look into the logic there.

2: think we re nearly ready to go out to the community saying this is what we can produce, is this helpful / how should we tweak it?

3: Based on the original meetings with PHS, I think we ma y want to add another table which has one column for each variable (and an extra for the output type) and one row for each output. Entryies big 'output type are 'table', 'regression', 'plot' etc pulled from summary) then the rest of the table is a binary -'is the variable (column) in the list of variables for that output.

@JessUWE
Copy link
Copy Markdown
Contributor Author

JessUWE commented Mar 10, 2026

Screenshot 2026-03-10 at 12 00 10 @jim-smith

Signed-off-by: Jim-smith <jim-smith@users.noreply.github.com>
@jim-smith
Copy link
Copy Markdown
Contributor

@JessUWE @rpreen hope to have time to look at this tomorrow.
Would like to be able to make screenshot and put it on uk-tre and sec-reboot email lists asking for input/feedback from output checkers about how we present this info and whether it is problematic if this got accidentally released.

Refactored summary generation methods to address reviewer feedback:

- Removed unused parameters from _extract_table_info method
   Eliminated 'method' and 'properties' parameters with noqa pragmas

- Fixed _extract_regression_info to return only total_records
   Removed unused 'variables' list that was never populated
   Changed return type from tuple[list[str], int] to int
   Updated docstring to reflect actual behavior

- Added  documentation for differencing risk detection

- Removed test_extract_table_info_exception_handling
assert not acro.suppress


def test_crosstab_std_dropna(data, acro):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this deleted test not needed any more?

Copy link
Copy Markdown
Contributor

@jim-smith jim-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes needed to what variables are extracted for regression analyses.

Don;t understand what the diff is telling me about insertions and deletions within the test file. think those are easy yes/no answers

@jim-smith
Copy link
Copy Markdown
Contributor

jim-smith commented Mar 17, 2026

one option to consider is to extract the variables involved at the time that the query is first dealt with e.g.

for a crosstab: called as

acro.crosstab((df.rowvar1,df.rowvar2), 
                               (df.colvar1,df.colvar2),
                                values=valvar,
                                 aggfunc="mean")

inside our acro.crosstab method we do:

....
vars_used = []
#get names of variables used in rows
if isinstance (index, pd.series):
     vars_used .append(index.name)
else: #index must be a list of pandas series to define row hierarchy)
     for var in index: 
             vars_used .append(var.name)

#get names of variables used in columns
if isinstance (columns, pd.series):
     vars_used .append(columns)
else: #columns must be a list of pandas series to define columns hierarchy)
     for var in columns: 
             vars_used .append(var.name)
#get name of series reported on by aggregation function if present
if values is not None:
    vars_used.append(values.name)

# now save your array of values in the json

@jim-smith
Copy link
Copy Markdown
Contributor

for regression commands specified with the trailing 'r' , the variables are all held in the lists endog and exog,

For regression commands specified with a formula, it has the form of a string and can be parsed

cheatsheet to help with that parsing

@JessUWE JessUWE force-pushed the feature/224-session-summary branch from 0e9f08b to 10b8ccd Compare March 26, 2026 14:12
JessUWE and others added 3 commits March 26, 2026 14:27
Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
Add assertions to verify expected variables in summary row

Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
- Add _get_endog_exog_variables() to extract variable names from endog/exog
- Add _get_formula_variables() to parse R-style formulas
- Add _split_formula_terms() helper for formula parsing
- Update regression methods to track variables in properties
- Add comprehensive tests for formula and variable extraction
Refactor tests and update docstrings for clarity.

Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
@JessUWE
Copy link
Copy Markdown
Contributor Author

JessUWE commented Mar 27, 2026

Closing this PR for now as this approach isn't giving us the desired result instead of leaving the PR longer. I'll revisit this after doing some further research and testing. Thanks for the reviews so far, I'll incorporate this feedback when I come back to it

@JessUWE JessUWE closed this Mar 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants