feat: add summary generation and export functionality for results #341
Conversation
- Add _extract_table_info() helper method to extract variables and record counts from table outputs
- Add _extract_regression_info() helper method to extract observation counts from regression outputs
- Add _mark_diff_risk() helper method to identify outputs with differencing risk
- Add write_summary() method to export summary to CSV
- Enhance finalise_excel() to include summary sheet as first sheet
- Add comprehensive test coverage for all helper methods and edge cases
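The differencing-risk idea behind _mark_diff_risk() can be illustrated with a minimal sketch (the PR's actual implementation is not shown here, so the function body and the dictionary layout are assumptions): two outputs built from the same variables can be subtracted from one another to recover suppressed cells, so any outputs whose variable sets coincide are flagged.

```python
def mark_diff_risk(outputs):
    """Hypothetical sketch: flag outputs whose variable set matches
    another output's, since identical variables allow differencing."""
    # group output names by their (frozen) set of variables
    by_vars = {}
    for name, info in outputs.items():
        by_vars.setdefault(frozenset(info["variables"]), []).append(name)
    # an output is at risk if at least one other output shares its variables
    return {
        name: len(group) > 1
        for group in by_vars.values()
        for name in group
    }

risky = mark_diff_risk({
    "output_0": {"variables": ["sex", "grant_type"]},
    "output_1": {"variables": ["sex", "grant_type"]},  # same vars: at risk
    "output_2": {"variables": ["year"]},
})
# output_0 and output_1 are flagged; output_2 is not
```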
Needs further updating
- Add generate_summary() to provide high-level output overview for checkers
- Add multi-layered 'DO NOT RELEASE' warnings (filename, comment)
- Integrate session summary into JSON and Excel outputs

Resolves #224
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #341 +/- ##
==========================================
- Coverage 99.70% 96.96% -2.74%
==========================================
Files 9 9
Lines 1354 1584 +230
==========================================
+ Hits 1350 1536 +186
- Misses 4 48 +44
force-pushed from 552f10d to e3665ff
Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
force-pushed from b50d8a2 to 5f14a04
…kers
- Add multi-layered 'DO NOT RELEASE' warnings (filename, comment)
- Integrate session summary into JSON and Excel outputs
for more information, see https://pre-commit.ci
Remove tests for extracting table info based on index and column names.
Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
for more information, see https://pre-commit.ci
- All core functionality preserved and tested with real DataFrame scenarios
Hi @JessUWE three things:

1: I just ran this on a demo which has the same tables with and without suppression. It should therefore be reporting diff risk, as both have identical variables, but that is showing as False.

2: I think we're nearly ready to go out to the community saying "this is what we can produce; is this helpful / how should we tweak it?"

3: Based on the original meetings with PHS, I think we may want to add another table which has one column for each variable (and an extra column for the output type) and one row for each output. Entries in 'output type' are 'table', 'regression', 'plot' etc. (pulled from the summary); then the rest of the table is binary: 'is the variable (column) in the list of variables for that output?'
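The table suggested in point 3 could be sketched with pandas along these lines (the summary structure, output names, and variable names here are invented for illustration, not taken from the PR):

```python
import pandas as pd

# hypothetical per-output summary: output type plus variables used
summary = {
    "output_0": ("table", ["sex", "grant_type"]),
    "output_1": ("regression", ["inc_grants", "inc_donations"]),
    "output_2": ("plot", ["year", "inc_grants"]),
}

# one column per variable seen anywhere, plus one for the output type
all_vars = sorted({v for _, vars_used in summary.values() for v in vars_used})
rows = [
    {"output type": kind, **{v: v in vars_used for v in all_vars}}
    for kind, vars_used in summary.values()
]
table = pd.DataFrame(rows, index=list(summary))
print(table)
```

Each row is then a binary membership vector, so a checker can see at a glance which outputs draw on overlapping variables.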
…DC/ACRO into feature/224-session-summary
…emove made changes to the comments
Signed-off-by: Jim-smith <jim-smith@users.noreply.github.com>
Refactored summary generation methods to address reviewer feedback:
- Removed unused parameters from _extract_table_info method: eliminated 'method' and 'properties' parameters with noqa pragmas
- Fixed _extract_regression_info to return only total_records: removed the unused 'variables' list that was never populated, changed the return type from tuple[list[str], int] to int, and updated the docstring to reflect actual behavior
- Added documentation for differencing risk detection
- Removed test_extract_table_info_exception_handling
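After that refactor, _extract_regression_info reduces to something of this shape (a sketch only: the real method's input is the PR's stored regression output, and the 'nobs' key used here is an assumption about how the observation count is stored):

```python
def extract_regression_info(output: dict) -> int:
    """Hypothetical sketch: return only total_records (an int),
    replacing the old tuple[list[str], int] return type."""
    # 'nobs' is an assumed key for the number of observations
    return int(output.get("nobs", 0))

extract_regression_info({"nobs": 811})
# 811
```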
assert not acro.suppress

def test_crosstab_std_dropna(data, acro):

is this deleted test not needed any more?
jim-smith left a comment
Some changes needed to what variables are extracted for regression analyses.
Don't understand what the diff is telling me about insertions and deletions within the test file. I think those are easy yes/no answers.
one option to consider is to extract the variables involved at the time that the query is first dealt with, e.g. for a crosstab, inside our acro.crosstab method we do:

vars_used = []
# get names of variables used in rows
if isinstance(index, pd.Series):
    vars_used.append(index.name)
else:  # index must be a list of pandas Series defining the row hierarchy
    for var in index:
        vars_used.append(var.name)
# get names of variables used in columns
if isinstance(columns, pd.Series):
    vars_used.append(columns.name)
else:  # columns must be a list of pandas Series defining the column hierarchy
    for var in columns:
        vars_used.append(var.name)
# get name of the series reported on by the aggregation function, if present
if values is not None:
    vars_used.append(values.name)
# now save your array of values in the json
for regression commands specified with the trailing 'r', the variables are all held in the lists. For regression commands specified with a formula, it has the form of a string and can be parsed.
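For the formula case, a minimal parse might look like this (a sketch, not the PR's actual _get_formula_variables implementation; a full solution would use a formula parser such as patsy, since naive splitting ignores functions like C(x) or np.log(x)):

```python
import re

def formula_variables(formula: str) -> list[str]:
    """Hypothetical sketch: pull variable names out of an R-style
    formula such as 'y ~ x1 + x2 + x1:x2'."""
    lhs, rhs = formula.split("~")
    # split the right-hand side on the common formula operators
    terms = re.split(r"[+\-*:]", rhs)
    names = [lhs.strip()] + [t.strip() for t in terms]
    # drop intercept markers and empties; keep order without duplicates
    seen = []
    for name in names:
        if name and name not in {"0", "1"} and name not in seen:
            seen.append(name)
    return seen

formula_variables("inc_activity ~ inc_grants + inc_donations")
# ["inc_activity", "inc_grants", "inc_donations"]
```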
force-pushed from 0e9f08b to 10b8ccd
Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
Add assertions to verify expected variables in summary row Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
- Add _get_endog_exog_variables() to extract variable names from endog/exog
- Add _get_formula_variables() to parse R-style formulas
- Add _split_formula_terms() helper for formula parsing
- Update regression methods to track variables in properties
- Add comprehensive tests for formula and variable extraction
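The endog/exog extraction in the first bullet could be sketched as follows, assuming pandas inputs as in statsmodels-style regression calls (the helper name comes from the commit message; the body is an assumption):

```python
import pandas as pd

def get_endog_exog_variables(endog, exog) -> list[str]:
    """Hypothetical sketch: collect variable names from the dependent
    variable (endog) and the regressors (exog)."""
    variables = []
    for data in (endog, exog):
        if isinstance(data, pd.Series):
            variables.append(data.name)
        elif isinstance(data, pd.DataFrame):
            variables.extend(data.columns)
    # unnamed Series contribute None; drop them
    return [v for v in variables if v is not None]

y = pd.Series([1.0, 2.0], name="inc_activity")
x = pd.DataFrame({"inc_grants": [3.0, 4.0], "inc_donations": [5.0, 6.0]})
get_endog_exog_variables(y, x)
# ["inc_activity", "inc_grants", "inc_donations"]
```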
force-pushed from 90b5dab to b4a3871
Refactor tests and update docstrings for clarity. Signed-off-by: Jessica Ikechukwu <Jessica.Ikechukwu@uwe.ac.uk>
Rather than leaving the PR open longer, I'm closing it for now as this approach isn't giving us the desired result. I'll revisit this after doing some further research and testing. Thanks for the reviews so far; I'll incorporate this feedback when I come back to it.

This PR introduces a session-level summary of outputs to help output checkers get an overview of all results generated during a session.