Patricia Nicole C. Trajano, Ayra Shane C. Villacarlos.
An Enhanced Hartigan-Wong algorithm applied for determining the crime rates per area in the Philippines.
Undergraduate Thesis : (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2024.
ABSTRACT: Clustering algorithms are crucial in data analysis, where they are used to find patterns within datasets. The Hartigan-Wong algorithm, a variant of K-means, is widely used in many applications, but it has limitations in the accuracy of the final clusters for high-dimensional data, in identifying outliers in the data, and in its initialization of centroids. To address these issues, the authors first integrate the Isolation Forest algorithm to identify and remove outliers in the dataset, thereby improving the quality of the input data. Second, Principal Component Analysis (PCA) is employed as a dimensionality reduction method to address the issue of high-dimensional data. Lastly, the Quintile Method is used to determine the optimal initial centroids, especially for high-dimensional datasets. The researchers evaluate these enhancements using real-world crime data from the Philippines, comparing them against existing Hartigan-Wong implementations. The results demonstrate improved accuracy and efficiency over the existing implementations. Overall, the study contributes to the advancement of data clustering by enhancing the Hartigan-Wong algorithm to better suit the complexities of high-dimensional data. The enhancements offer valuable insights for improving the applicability and performance of the algorithm in real-world scenarios involving high-dimensional data.
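The three-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions. scikit-learn's KMeans runs Lloyd's algorithm, not Hartigan-Wong, and stands in for the final clustering step only. The abstract does not specify the quintile-based seeding, so the helper `quantile_initial_centroids` is a hypothetical interpretation. The parameter values (`k`, `n_components`, `contamination`) and the synthetic input are placeholders, not the thesis's crime data or settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler


def quantile_initial_centroids(X, k):
    """Hypothetical quantile-based seeding (an assumption, not the thesis's
    exact rule): sort points along the first column, split them into k
    equal-sized groups, and use each group's mean as an initial centroid."""
    order = np.argsort(X[:, 0])
    groups = np.array_split(order, k)
    return np.vstack([X[idx].mean(axis=0) for idx in groups])


def cluster_pipeline(X, k=5, n_components=2, contamination=0.05, seed=0):
    """Outlier removal -> PCA -> quantile seeding -> k-means (Lloyd stand-in)."""
    X = StandardScaler().fit_transform(X)

    # Stage 1: Isolation Forest labels inliers as +1 and outliers as -1.
    forest = IsolationForest(contamination=contamination, random_state=seed)
    inlier_mask = forest.fit_predict(X) == 1
    X_inliers = X[inlier_mask]

    # Stage 2: PCA projects the inliers onto the leading components.
    Z = PCA(n_components=n_components).fit_transform(X_inliers)

    # Stage 3: deterministic seeds, ordered along the first principal component.
    init = quantile_initial_centroids(Z, k)

    # Final clustering. NOTE: Lloyd's algorithm, used here in place of
    # Hartigan-Wong, which scikit-learn does not provide.
    model = KMeans(n_clusters=k, init=init, n_init=1).fit(Z)
    return inlier_mask, Z, model.labels_


if __name__ == "__main__":
    # Synthetic placeholder data: 200 rows, 8 features.
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(200, 8))
    mask, Z, labels = cluster_pipeline(X_demo)
    print("rows kept:", int(mask.sum()), "reduced shape:", Z.shape)
```

With k=5 the seeding groups correspond to quintiles of the first principal component, which is one plausible reading of the name. To run Hartigan-Wong itself, R's `kmeans` function offers it as its default algorithm and accepts a matrix of initial centers.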