An Application Of K-Nearest Neighbor Using Cross-Validation Methods For Prediction Of Diabetes
DOI:
https://doi.org/10.64252/jvhx5k59
Keywords:
KNN, Diabetes, Cross validation, Logistic Regression, Stratified K Fold, Train-Test Split.
Abstract
Diabetes is a group of chronic metabolic diseases in which a person's blood sugar levels remain consistently high. The disorder develops either because the patient’s body does not produce enough insulin or because the cells do not respond to insulin as expected. The disease has three subtypes: Type-1, Type-2, and Gestational. In Type-1 diabetes, the patient's body cannot produce insulin. In Type-2 diabetes, the body's cells do not use insulin as they should. Gestational diabetes can develop during pregnancy. Numerous methods are employed to study this illness. For the analysis of diabetes, we applied machine learning to the ‘Pima diabetes dataset’, which consists of 768 records. This study evaluated the K-nearest neighbor approach under various cross-validation methods. Stratified K-fold cross-validation outperformed the other methods in terms of accuracy and other performance measures. A comparative analysis of the results achieved with KNN and Logistic Regression under the different cross-validation approaches is also presented.
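The kind of comparison described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' actual pipeline. The data is a synthetic stand-in generated with `make_classification` to mirror only the size of the dataset (768 records), and the feature count, class imbalance, `n_neighbors=5`, the 80/20 split, the 10 folds, and the helper name `evaluate` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in, NOT the Pima data: 768 rows, with an assumed
# feature count and an arbitrary class imbalance for illustration.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)

# Both models are wrapped with feature scaling; KNN in particular is
# distance-based and sensitive to feature scale.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}


def evaluate(model, X, y, n_splits=10, seed=0):
    """Return (hold-out accuracy, per-fold stratified K-fold accuracies)."""
    # Single train-test split: one accuracy estimate from one partition.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    holdout = model.fit(X_tr, y_tr).score(X_te, y_te)

    # Stratified K-fold: each fold keeps roughly the overall class ratio.
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_scores = cross_val_score(model, X, y, cv=skf, scoring="accuracy")
    return holdout, fold_scores


results = {name: evaluate(model, X, y) for name, model in models.items()}

for name, (holdout, folds) in results.items():
    print(f"{name}: hold-out={holdout:.3f}, "
          f"stratified 10-fold mean={folds.mean():.3f} (std={folds.std():.3f})")
```

The numbers printed for synthetic data say nothing about the reported results; the sketch only shows how a single train-test split yields one accuracy estimate while stratified K-fold yields a mean and spread over folds that each preserve the class ratio.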