ROC-Tree Algorithm for Stratification of Binary Classifier Sets with Varied Discrimination Threshold
Abstract
Y.M. Ganushchak, P.J.C. Barenburg, J.G. Maessen, P. Sardari Nia
Binary classifier systems are used in many practical settings. Evaluating the diagnostic ability of a binary classifier as its discrimination threshold is varied often requires transforming the data by aggregation. One of the most widely used aggregation methods is division by percentiles, which splits the data set into subgroups of equal size blindly, regardless of the structure of the data. We developed a ROC-tree algorithm for selecting threshold values: each group is recursively split, top-down, into two subgroups (branches) at the cut-off point of its ROC curve. We show that the proposed ROC-tree algorithm makes it possible to define optimal (natural) boundaries and the number of groups.
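The recursive splitting described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the ROC cut-off of a group is the threshold maximizing the Youden index, and the stopping parameters `min_size`, `min_j` and `max_depth`, as well as the function names, are hypothetical choices made only for this illustration.

```python
import math
from typing import List, Sequence, Tuple


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A case is called positive when score >= threshold. Labels are 0/1.
    Returns (nan, 0.0) when no threshold gives J > 0, for example when
    the group contains only one class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        return float("nan"), 0.0
    best_t, best_j = float("nan"), 0.0
    # Simple O(n^2) scan over candidate thresholds; fine for a sketch.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def roc_tree(scores: Sequence[float], labels: Sequence[int],
             min_size: int = 20, min_j: float = 0.1,
             depth: int = 0, max_depth: int = 3) -> List[float]:
    """Recursively split a group at its ROC cut-off; return sorted thresholds."""
    # Assumed stopping rules: depth limit and minimum group size.
    if depth >= max_depth or len(scores) < 2 * min_size:
        return []
    t, j = best_cutoff(scores, labels)
    if math.isnan(t) or j < min_j:
        return []  # no useful cut-off in this group
    low = [(s, y) for s, y in zip(scores, labels) if s < t]
    high = [(s, y) for s, y in zip(scores, labels) if s >= t]
    if len(low) < min_size or len(high) < min_size:
        return []
    low_s, low_y = zip(*low)
    high_s, high_y = zip(*high)
    # Every threshold found in the low branch is below t, and every
    # threshold found in the high branch is above t, so the result is sorted.
    return (roc_tree(low_s, low_y, min_size, min_j, depth + 1, max_depth)
            + [t]
            + roc_tree(high_s, high_y, min_size, min_j, depth + 1, max_depth))
```

The returned thresholds are the group boundaries, so the number of groups is the number of thresholds plus one. Unlike percentiles, it is not fixed in advance but follows from the data and the stopping rules.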
Two methods of data aggregation (percentiles and the ROC-tree algorithm) were tested on the ‘Credit Card Fraud Detection’ dataset (https://www.kaggle.com/mlg-ulb/creditcardfraud). The results of one-vs-one reduction for the assessment of the multiclass classifications are presented as macro-averages of hybrid threshold performance metrics. The macro-averages of metrics such as the Youden index, accuracy, optimized precision, and geometric mean differed significantly between the two aggregation algorithms. The differences between the macro-averaged metrics of the ROC-tree and quartile stratification algorithms were preserved under 10-fold stratified cross-validation.
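For reference, the metrics named above can be computed from a confusion matrix as in the following sketch. It uses the commonly cited definitions (Youden index = sensitivity + specificity - 1; geometric mean = square root of sensitivity times specificity; optimized precision = accuracy - |specificity - sensitivity| / (specificity + sensitivity)), which are assumed here and may differ in detail from those used in the study. The function names are hypothetical, and the macro-average is taken as the unweighted mean over the one-vs-one class pairs.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List


def threshold_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Metrics for one binary (one-vs-one) comparison.

    Assumes at least one positive and one negative case in the pair.
    """
    sens = tp / (tp + fn)                      # sensitivity (TPR)
    spec = tn / (tn + fp)                      # specificity (TNR)
    acc = (tp + tn) / (tp + fp + tn + fn)      # accuracy
    # Penalty term of optimized precision; defined as 0 when sens + spec == 0.
    penalty = abs(spec - sens) / (spec + sens) if (spec + sens) > 0 else 0.0
    return {
        "youden": sens + spec - 1,
        "accuracy": acc,
        "optimized_precision": acc - penalty,
        "g_mean": sqrt(sens * spec),
    }


def macro_average(per_pair: List[Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric over all one-vs-one pairs."""
    return {k: mean(m[k] for m in per_pair) for k in per_pair[0]}
```

A typical use would be to call `threshold_metrics` once per pair of strata and pass the resulting list to `macro_average`, once for the percentile-based strata and once for the ROC-tree strata.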
An algorithm that is sensitive to distribution patterns, such as the ROC-tree algorithm, produced adequate stratification into groups at natural cut-off points determined by the composition of the data set. This method provides effective aggregation for summarizing or analyzing data in various fields of science. In health care, the described algorithm allows effective evaluation of causes of mortality and quality control of specialized medical care provided by hospitals.