EXPLAINABLE ARTIFICIAL INTELLIGENCE MODEL FOR EARLY PREDICTION OF HEART ATTACK USING LIME ON CLINICAL DATA
Ipek Balikci Cicek* and Zeynep Kucukakcali
ABSTRACT
Objective: Cardiovascular diseases (CVD) have become one of the leading causes of mortality worldwide, claiming almost 17.9 million lives each year. According to studies, they are responsible for around 35% of all deaths worldwide. Heart attack (HA) diagnosis and prognosis are critical medical tasks: accurate classification allows cardiologists to deliver appropriate therapy to patients. Machine learning (ML) applications have grown in popularity in the medical field because they can detect patterns in data, and using ML to classify the occurrence of HA can help reduce misdiagnosis. This study aims to build a model that accurately predicts HA in order to reduce the number of deaths from it.
Method: An open-access dataset was used to investigate the risk factors associated with HA. The dataset contains records of 1319 patients and 8 input variables. The extreme gradient boosting (XGBoost) model was chosen to predict and classify patients, and an explainable approach based on the Local Interpretable Model-Agnostic Explanations (LIME) method was used to generate individual explanations for the model's decisions. The dataset was split 80%:20% into training and test sets, and 10-fold cross-validation was used in the modeling. Model performance was assessed with accuracy (ACC), balanced accuracy (b-ACC), sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), and F1-score.
Results: Comparison of the HA-positive and HA-negative groups showed statistically significant differences in the age, CK-MB, and troponin variables. The XGBoost model achieved ACC, b-ACC, SE, SP, PPV, NPV, and F1-score values of 98.5%, 98.6%, 99.3%, 97.9%, 96.8%, 99.6%, and 98.1%, respectively. According to the variable importance values obtained from the model, the variables that best explain HA and are most strongly associated with it are troponin and CK-MB, in that order. In the individual-level LIME results for the patients in the test set, the conditions troponin >= 0.01, kcm > 5.79, 117.00 < glucose, 58.00 144.00 increase the likelihood of being HA positive.
Conclusion: ML combined with LIME may provide a clear explanation of personalised risk prediction and give clinicians an intuitive understanding of the effect of the key model variables.
Keywords: Heart attack, classification, machine learning, risk factor, Local Interpretable Model-Agnostic Explanations.
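
The performance measures named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' implementation; the names `ConfusionCounts` and `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # HA-positive patients predicted positive
    fp: int  # HA-negative patients predicted positive
    tn: int  # HA-negative patients predicted negative
    fn: int  # HA-positive patients predicted negative


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return ACC, b-ACC, SE, SP, PPV, NPV and F1 as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. both classes occur
    and both labels are predicted at least once.
    """
    se = c.tp / (c.tp + c.fn)            # sensitivity (recall)
    sp = c.tn / (c.tn + c.fp)            # specificity
    ppv = c.tp / (c.tp + c.fp)           # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)           # negative predictive value
    acc = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    b_acc = (se + sp) / 2                # balanced accuracy
    f1 = 2 * ppv * se / (ppv + se)       # harmonic mean of PPV and SE
    return {"ACC": acc, "b-ACC": b_acc, "SE": se, "SP": sp,
            "PPV": ppv, "NPV": npv, "F1": f1}


# Illustrative counts only; these are NOT the study's confusion matrix.
counts = ConfusionCounts(tp=90, fp=20, tn=80, fn=10)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because b-ACC is the mean of SE and SP, the reported values can be cross-checked: (99.3% + 97.9%) / 2 = 98.6%, which agrees with the b-ACC given in the Results.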