Jaira Venessa C. Obmina, Madeleine S. Tisang.

An Enhancement of Nikhath, Subrahmanyam, Vasavi's K-nearest neighbor algorithm's data pre-processing for dataset classifications in predicting multiple medical diseases.

Undergraduate Thesis : (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2024.








ABSTRACT: This research aims to improve the data preparation of the K-Nearest Neighbor (KNN) algorithm, with an emphasis on improving disease prediction across datasets of varied sizes by addressing imbalanced datasets and optimizing the selection of an effective k value. The researchers used the ADASYN oversampling technique, PCA, and GridSearch to address these challenges. ADASYN produces a close-to-equally balanced dataset to prevent inaccurate representations, while PCA extracts and captures the most significant variations in the data. GridSearch improved the selection of the k value, reducing the problems caused by a constant, fixed k value. Together, these techniques contribute to the study's overall effectiveness in accurately predicting diseases. When evaluated on eight datasets, the improved KNN algorithm consistently surpassed the previous approach in terms of accuracy, precision, RMSE, MSE, and t-test evaluation. The findings suggest that the enhanced KNN algorithm outperformed the existing KNN method in terms of prediction. Incorporating ADASYN, PCA, and GridSearch into the data processing enhanced the algorithm's overall performance and resulted in improved prediction of a wide range of medical problems across the eight datasets. In conclusion, the study aimed to boost the performance of the KNN algorithm in categorizing medical conditions through enhanced data pre-processing techniques, and its findings show that the enhanced KNN algorithm is effective in accurately predicting medical disease across a variety of datasets. The researchers recommend employing high-dimensional datasets to address the curse of dimensionality; to further ascertain the significance of this study, they also propose exploring its applicability beyond medical datasets.
The results of this study will help improve medical diagnostics by predicting diseases more accurately. This will improve patient outcomes in healthcare settings by potentially enabling earlier detection and specific treatments.
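The k-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a pure-Python stand-in for a grid search over k, scored with leave-one-out accuracy (an assumption; the abstract does not state the validation scheme). The ADASYN and PCA steps are omitted, the function names are hypothetical, and the toy data is made up for illustration.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of KNN for a given k."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


def grid_search_k(X, y, candidates):
    """Score every candidate k; return (best_k, scores). Ties go to the smallest k."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    best_k = max(sorted(scores), key=scores.get)
    return best_k, scores


# Made-up toy data: two well-separated, balanced clusters of four points each.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
     (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

best_k, scores = grid_search_k(X, y, candidates=[1, 3, 5, 7])
print(best_k, scores)
```

On this toy data a large fixed k fails outright: with k = 7, every held-out point is outvoted by the four points of the other class. This is the kind of problem a search over k is meant to avoid.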
































