Deriving the term-structure of loan write-off risk under IFRS 9 by using survival analysis: A benchmark study
The estimation of marginal loan write-off probabilities is a non-trivial task when modelling the loss given default (LGD) risk parameter in credit risk. We explore two types of survival models in estimating the overall write-off probability over default spell time, where these probabilities form the term-structure of write-off risk in aggregate. These survival models include a discrete-time hazard (DtH) model and a conditional inference survival tree. Both models are compared to a cross-sectional logistic regression model for write-off risk. All of these (first-stage) models are then embedded in a broader two-stage LGD-modelling approach, wherein a loss severity model is estimated in the second stage. In expanding the model suite, a novel dichotomisation step is introduced for collapsing the write-off probability into a 0/1-value, prior to LGD-calculation. A benchmark study is subsequently conducted amongst the resulting LGD-models. We find that the DtH-model outperforms the other two-stage LGD-models across most diagnostics. However, a single-stage LGD-model still had the best results, likely due to the peculiar 'L-shaped' LGD-distribution in our data. Ultimately, we believe that our tutorial-style work can enhance LGD-modelling practices when estimating the expected credit loss under IFRS 9.
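To make the idea of a write-off term-structure concrete, the following is a minimal sketch of a discrete-time hazard model fitted on simulated data, and not the study's actual implementation. The data, the names `true_hazard`, `pp`, `threshold` and `severity`, the 12-month horizon, the 0.5 cut-off, and the way the dichotomised flag feeds into the LGD are all assumptions made purely for illustration.

```r
set.seed(42)

# Hypothetical hazard of write-off in month t of a default spell (an assumption)
true_hazard <- function(t) plogis(-3 + 0.15 * t)

# Simulate person-period data: one row per spell per month, until the spell
# ends in write-off (event = 1) or is censored after 12 months (event = 0)
spells <- lapply(seq_len(2000), function(i) {
  for (t in 1:12) {
    if (runif(1) < true_hazard(t)) {
      return(data.frame(id = i, t = 1:t, event = c(rep(0, t - 1), 1)))
    }
  }
  data.frame(id = i, t = 1:12, event = 0)
})
pp <- do.call(rbind, spells)

# Discrete-time hazard model: logistic regression on the person-period rows
fit <- glm(event ~ t, family = binomial, data = pp)

# Term-structure: hazard h(t), survival S(t), and marginal probability f(t)
h <- predict(fit, newdata = data.frame(t = 1:12), type = "response")
S <- cumprod(1 - h)            # probability of no write-off up to month t
f <- h * c(1, head(S, -1))     # f(t) = h(t) * S(t - 1)
term_structure <- data.frame(t = 1:12, hazard = h, survival = S, marginal = f)

# Illustrative dichotomisation: collapse the cumulative write-off probability
# into a 0/1-value using a hypothetical threshold, then scale by a
# hypothetical second-stage loss severity
p_wo      <- 1 - S
threshold <- 0.5
severity  <- 0.6
wo_flag   <- as.integer(p_wo > threshold)
lgd_prob  <- p_wo * severity      # probability-weighted variant
lgd_dich  <- wo_flag * severity   # dichotomised variant
```

The marginal probabilities and the final survival probability sum to one by construction, which is a useful sanity check on any estimated term-structure.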
This R-codebase can be run sequentially, using the file numbering itself as a structure. Delinquency measures are algorithmically defined in DelinqM.R as data-driven functions, which may be valuable to the practitioner outside of the study's current scope. These delinquency measures were formulated and empirically tested in Botha22, as part of a loss optimisation exercise of recovery decision times, as implemented in the corresponding R-codebase. A simulation study from Botha2021 also demonstrated these delinquency measures at length, with its corresponding R-codebase. Similarly, the TruEnd-procedure from Botha2024, together with its corresponding R-codebase, is implemented in the TruEnd.R script, which includes a small variety of functions for running the TruEnd-procedure in practice.
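As a rough illustration of running numbered scripts in sequence, the sketch below sources every R file whose name starts with a digit, in sorted order. The helper `run_in_order` is hypothetical and not part of this codebase, and the naming pattern is an assumption; scripts without a leading digit (such as DelinqM.R and TruEnd.R) are skipped on the assumption that the numbered scripts load them where needed.

```r
# Hypothetical helper: source all scripts in `dir` whose names start with a
# digit, in sorted filename order
run_in_order <- function(dir = ".") {
  scripts <- sort(list.files(dir, pattern = "^[0-9].*\\.R$", full.names = TRUE))
  for (s in scripts) {
    message("Running ", basename(s))
    source(s)  # evaluated in the global environment by default
  }
  invisible(basename(scripts))
}

# Usage (not run): run_in_order("path/to/codebase")
```

Note that filenames are sorted as text, so a script named `10_...R` would run before `2_...R` unless the numbering is zero-padded.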
This R-codebase assumes that monthly loan performance data is available. Naturally, the data itself cannot be made publicly available given its sensitive nature, as well as various data privacy laws, particularly the Protection of Personal Information (POPI) Act of 2013 in South Africa. However, the structure and type of data required for reproducing this study are sufficiently described in the commentary within the scripts. This should enable the practitioner to extract and prepare data accordingly. Moreover, this codebase assumes that South African macroeconomic data is available, as sourced and collated by internal staff of the bank in question.
All code and scripts are hereby released under an MIT licence. Similarly, all graphs produced by the relevant scripts, as well as those published here, are hereby released under a Creative Commons Attribution (CC-BY 4.0) licence.