A Comparison of Machine Learning Techniques for Predicting Payment Probability
Abstract
In credit risk, scoring models based on logistic regression have been developed to optimize default risk assessment. However, these models require complex feature engineering, and their accuracy deteriorates as arrears progress. This study proposes the use of machine learning techniques (XGBoost and artificial neural networks) to generate scores in different arrears segments (No Arrears Segment, 1–30 Days of Arrears Segment, 31–90 Days of Arrears Segment, and All Segments). The Kolmogorov–Smirnov (KS) metric is used to assess the efficiency and predictive power of the models. To ensure the accuracy and reliability of the models, a five-step methodology is employed: (1) formulation of the problem; (2) selection of a data sample and definition of the target variable; (3) descriptive analysis of the data to support data cleaning; (4) training and testing of the models; and (5) analysis of the results and interpretation of the models obtained. The results show that both XGBoost and artificial neural network models outperform logistic regression in most of the arrears segments. In the No Arrears Segment, XGBoost performs best with KS = 63.36%. In the 1–30 Days of Arrears Segment, XGBoost again performs best with KS = 51.38%. In the 31–90 Days of Arrears Segment, the artificial neural network model performs best with KS = 38.77%. Finally, across All Segments, XGBoost is once more the best model with KS = 74.05%.
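The KS metric used to compare the models is the largest gap between the empirical cumulative distributions of scores for the two outcome classes. Multiplying it by 100 gives the percentage form reported in the abstract. The following is a minimal, dependency-free sketch in Python. It assumes binary labels with 1 = paid and 0 = not paid, because the abstract does not state the label encoding. `ks_statistic` is a hypothetical helper written for illustration and is not the authors' code.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic between the score distributions
    of class 1 (assumed: paid) and class 0 (assumed: not paid).

    Returns max over thresholds t of |F_1(t) - F_0(t)|, a value in [0, 1].
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Sort observations by score so each distinct score acts as a threshold.
    pairs = sorted(zip(scores, labels))
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every observation tied at this score before measuring the gap,
        # so tied scores are treated as a single threshold.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        # Gap between the two empirical CDFs at threshold s.
        gap = abs(cum_pos / n_pos - cum_neg / n_neg)
        best = max(best, gap)
    return best


if __name__ == "__main__":
    # Illustrative toy data only, not data from the study.
    scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    print(f"KS = {100 * ks_statistic(scores, labels):.2f}%")  # prints "KS = 75.00%"
```

A perfectly separating model reaches KS = 100%, and a model whose scores carry no information about the outcome stays near 0%. This is why a higher KS in a segment indicates stronger discrimination between payers and non-payers.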
[....]