diff --git a/.github/workflows/actions.yml b/.github/workflows/actions.yml index d554b7d..d9e29d0 100644 --- a/.github/workflows/actions.yml +++ b/.github/workflows/actions.yml @@ -33,7 +33,7 @@ jobs: coverage html - name: Upload coverage html report artifact - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: coverage-html-report path: htmlcov/ diff --git a/docs/recommendations.md b/docs/recommendations.md new file mode 100644 index 0000000..145eaae --- /dev/null +++ b/docs/recommendations.md @@ -0,0 +1,43 @@ +# Recommendations for SurPyval Documentation and Package Structure + +Based on an architectural and codebase review, here are several recommendations for improving SurPyval's package structure and documentation, specifically regarding Accelerated Failure Time (AFT) modeling, Proportional Hazards (PH) modeling, and Counting Processes. + +## 1. Package Structure Recommendations + +### Consolidate Regression and Recurrence Regression Models +Currently, there is a divergence in how regression models are structured. +* Standard survival regression models are located in `surpyval/regression/` (e.g., `cox_ph.py`, `accelerated_failure_time.py`, `proportional_hazards_fitter.py`). +* Recurrence/Counting regression models are located in `surpyval/recurrence/regression/` (e.g., `proportional_intensity.py`, `hpp_proportional_intensity.py`, `nhpp_proportional_intensity.py`). + +**Recommendation:** It would improve discoverability and logical flow to consolidate all regression-based modeling under `surpyval/regression/`. You could introduce sub-packages such as `surpyval/regression/survival/` for classic survival models (PH, AFT) and `surpyval/regression/recurrence/` for proportional intensity and recurrence regressions. + +### Standardize Naming Conventions +There are naming inconsistencies between class definitions and what is exposed to the user. +* `AcceleratedFailureTimeFitter` is defined, but when used with life models, the dynamically generated classes are suffixed with `AL` (Accelerated Life) instead of `AFT` (e.g., `WeibullPowerAL`). However, the explicitly defined `WeibullInversePowerAFT` uses `AFT`. +* **Recommendation:** Unify terminology. If standardizing on Accelerated Failure Time, ensure dynamically generated class names use the `AFT` suffix (e.g., `WeibullPowerAFT`) rather than `AL`. + +### Extracting `lifemodels` +The `lifemodels` directory under `surpyval/regression/` contains multiple models (`power`, `exponential`, `linear`, etc.). These are primarily used for AFT modeling. +* **Recommendation:** Move `lifemodels` into a more descriptive submodule like `surpyval/regression/aft_models/` or keep it as `surpyval/regression/lifemodels/` but ensure it is explicitly linked to AFT in the documentation. + +## 2. Documentation Improvements + +### Accelerated Failure Time (AFT) Modeling +The documentation for AFT modeling is currently very sparse or completely missing. +* In `docs/regression/parametric.rst`, there is a detailed section for "Parametric Proportional Hazards," but there is no equivalent section for AFT models, despite `surpyval/regression/__init__.py` creating numerous AFT models dynamically (e.g., `WeibullInversePowerAFT`, `NormalLinearAL`). +* The `docs/surpyval.regression.rst` file lists "Accelerated Time Models" and "Accelerated Life Models" as headers but contains no links to sub-pages or content under them. +* **Recommendation:** Create a dedicated file `docs/regression/accelerated_failure_time.rst` to document the AFT modeling capabilities. Explain the difference between `AFT` and `AL` terminology if both are kept. Provide examples of how to fit data using these dynamic classes. + +### Proportional Hazards (PH) Modeling +The Proportional Hazards documentation is present but can be expanded. +* `docs/regression/parametric.rst` provides a good overview of `ExponentialPH` and `WeibullPH` but doesn't fully detail how the baseline hazard is parameterized or how covariates are handled under the hood. +* The Semi-Parametric Cox PH model (`surpyval.regression.cox_ph.CoxPH_`) is documented but lacks clear examples in `docs/regression/cox_ph.rst` of how to handle left-truncation, right-censoring, and tie-breaking methods (Breslow vs. Efron). +* **Recommendation:** Add comprehensive code examples demonstrating `CoxPH.fit()` and `CoxPH.fit_from_df()`, illustrating tie-handling options. + +### Counting Processes (Recurrence/Renewal Models) +The Counting Processes models (HPP, NHPP, Proportional Intensity) have robust docstrings in the source code (e.g., `ProportionalIntensityNHPP.fit()`), but this isn't fully bubbled up to the Sphinx documentation in a user-friendly tutorial format. +* **Recommendation:** Create a dedicated tutorial in the docs (e.g., `docs/Recurrent Event Regression Modelling with SurPyval.rst`) that leverages the `rossi_static` dataset examples currently hidden in the docstrings. Show how to interpret the `beta` coefficients and the baseline rate parameters. +* Ensure the difference between `ProportionalIntensityHPP` and `ProportionalIntensityNHPP` is explicitly documented with mathematical formulations, just like the PH models. + +## Conclusion +SurPyval has an incredibly rich set of features, particularly the dynamic generation of `AL`/`AFT` parameter substitution fitters and counting process regressions. Standardizing the terminology, consolidating regression modules, and expanding the documentation to showcase these advanced capabilities will significantly lower the barrier to entry for new users. diff --git a/requirements.txt b/requirements.txt index f29c293..e3877f7 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,7 +1,8 @@ # Production requirements # For development, use requirements_dev.txt instead -lifelines==0.27.4 +lifelines>=0.28.0 numba==0.56.4 numpy-indexed==0.3.5 reliability==0.8.6 -matplotlib==3.6 \ No newline at end of file +matplotlib==3.6 +numdifftools \ No newline at end of file diff --git a/setup.py b/setup.py index 41e6789..57481d5 100755 --- a/setup.py +++ b/setup.py @@ -30,6 +30,7 @@ "numpy_indexed", "numba", "formulaic", + "numdifftools", ], include_package_data=True, package_data={"": ["datasets/*.csv", "py.typed"]},