Performance Evaluation of State-of-the-Art Texture Feature Extraction Techniques on Medical Imagery Tasks
Samuel Kusi-Duah, Obed Appiah, Peter Appiahene
Abstract
Interpreting medical images is a complex task that requires extensive knowledge [1]. Computer-Aided Diagnosis (CAD) serves as a second opinion that helps radiologists in diagnosis, while Content-Based Image Retrieval (CBIR) uses visual content to help users browse, search, and retrieve similar medical images from a database based on the user’s interest [2-4]. The competency of a Content-Based Medical Image Retrieval (CBMIR) system depends on its feature extraction methods [5]. Textural features are very important in determining the content of a medical image: they provide scenic depth, the spatial distribution of tonal variation, and surface orientation [6]. Therefore, this study compares and evaluates some of the hand-crafted texture feature extraction techniques used in CBMIR. The aim is to help those working to enhance CBIR systems make informed decisions when selecting the best textural feature extraction technique.
There is no clear indication of which texture feature extraction technique is best suited to a given performance metric when choosing a technique for a particular study in CBMIR systems. The objective of this work, therefore, is to comparatively evaluate the performance of the following texture feature extraction techniques: Local Binary Pattern (LBP), Gabor Filter, Gray-Level Co-occurrence Matrix (GLCM), Haralick Descriptor, Features from Accelerated Segment Test (FAST), and FAST combined with Binary Robust Independent Elementary Features (FAST & BRIEF), using the metrics precision, recall, F1-score, mean squared error (MSE), accuracy, and time. Each technique is coupled with a specific similarity measure to obtain results.
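To make the pairing of a texture descriptor with a similarity measure concrete, the following is a minimal sketch rather than the implementation evaluated in this study. It assumes the basic 3x3 LBP variant and, as a further assumption, the chi-square distance as the similarity measure; the abstract does not state which LBP configuration or which measure was used. The function names lbp_histogram, chi_square_distance, and rank_database are hypothetical.

```python
import numpy as np


def lbp_histogram(image: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of basic 3x3 LBP codes (assumed variant)."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Eight neighbours, clockwise from top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the bit when the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def chi_square_distance(p: np.ndarray, q: np.ndarray, eps: float = 1e-10) -> float:
    """Chi-square distance between two histograms (assumed similarity measure)."""
    return float(0.5 * np.sum((p - q) ** 2 / (p + q + eps)))


def rank_database(query: np.ndarray, database: list) -> list:
    """Return database indices ordered from most to least similar to the query."""
    q_hist = lbp_histogram(query)
    distances = [chi_square_distance(q_hist, lbp_histogram(img)) for img in database]
    return [int(i) for i in np.argsort(distances)]
```

In a retrieval setting, the histograms of the database images would normally be computed once and stored, so that only the query histogram is extracted at search time.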
The results showed that LBP had the best precision and accuracy, the Haralick Descriptor the best time, FAST the best F1-score, and GLCM the best recall. LBP scored 82.05% for precision and 88.23% for accuracy. The Haralick Descriptor, FAST, and GLCM models scored 0.88 s, 38.7%, and 44.82% on their respective best metrics. These test scores were obtained from datasets ranging in size from 1k to 10.5k.
Aside from outperforming the other five models evaluated, LBP also outperformed the following proposed models: ‘ [7]’, ‘Tamura texture feature and wavelet transform combined with Hausdorff distance - [8]’, and ‘ [9]’ in terms of (precision, accuracy, and recall) and (precision and recall) respectively, and probably F1-score as well (since F1-score is the harmonic mean of precision and recall). It is believed that an ensemble of LBP, Haralick descriptors, and Support Vector Machine (SVM) can represent a robust system for both medical image retrieval and classification.
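The dependence of F1-score on precision and recall noted above follows from its standard definition, F1 = 2PR / (P + R), the harmonic mean of precision P and recall R. The short sketch below computes the three scores from retrieval counts. The function name retrieval_scores and the example counts are illustrative assumptions, not values from this study.

```python
def retrieval_scores(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall, and F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Illustrative counts only: 8 relevant images retrieved, 2 irrelevant retrieved, 8 relevant missed.
precision, recall, f1 = retrieval_scores(tp=8, fp=2, fn=8)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model that leads on both precision and recall necessarily leads on F1-score, whereas a lead on only one of them does not guarantee it.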