Heart Disease Patients: A Prediction Analysis Using Machine Learning Models
DOI:
https://doi.org/10.64252/8y1eq713
Keywords:
Heart disease, Prediction analysis, Machine learning model, Cosine similarity, Logistic Regression, Decision Tree classifier
Abstract
Cardiovascular disease is the leading cause of death worldwide, accounting for up to a third of deaths recorded each year. In 2022, it claimed 19.8 million lives in Eastern Europe alone, with mortality rates equal or higher in the United States, Southeast Asia, and India. Early recognition is important for reducing these figures, and machine learning (ML) can facilitate it by drawing on large amounts of health data. This paper investigates heart disease prediction using five ML models and a dataset of 70,000 registered patients. The data comprised 13 clinical and lifestyle features, such as age, cholesterol, blood pressure, glucose, smoking, and alcohol consumption. The dataset was pre-processed, including feature selection and normalization, before the models were trained and tested on a 70:30 split. The Gradient Boosting model produced the best performance, with an accuracy of 73.48%, precision of 0.7637, recall of 0.690, and F1-score of 0.7249. These results were supported by the confusion matrix, which showed high counts of true positives and true negatives. The paper shows that ML models, especially Gradient Boosting, can assist in the early detection of heart disease, which suggests their use in clinical diagnostics provided that the models are further refined and tested.
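The reported metrics are related by standard definitions, and a short sketch may help readers see how they fit together. The Python below is only an illustration, not the authors' code: the function names are hypothetical, and the confusion-matrix counts in the first example are made up for demonstration because the abstract does not give the actual counts. The second example uses only the precision and recall stated in the abstract.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the formulas
# (NOT taken from the paper).
example_precision, example_recall, example_f1 = precision_recall_f1(tp=3, fp=1, fn=1)

# Consistency check using the precision and recall reported in the abstract.
recomputed_f1 = f1_from_precision_recall(0.7637, 0.690)
print(round(recomputed_f1, 3))  # about 0.725, in line with the reported 0.7249
```

Recomputing F1 from the reported precision and recall gives a value that agrees with the reported F1-score to within rounding, so the three figures are mutually consistent.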