Francis Earl S. Fojas and Ana Mikaela A. Vilar.

A further enhancement of K-Medoid clustering algorithm applied on blood donor behavior segmentation / Francis Earl S. Fojas and Ana Mikaela A. Vilar. - 101 pp. 28cm.

Thesis (BSCS major in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2018.








ABSTRACT: Cluster analysis plays a significant role in unsupervised machine learning, which explores data for new and undiscovered knowledge. Cluster analysis, commonly known as clustering, is the grouping of similar data objects in a set. As technology has evolved, the number of algorithms used in clustering has increased. One of these clustering algorithms is the k-medoids algorithm, which belongs to the same family as k-means. Presently, the latest version of k-medoid was formulated in a book entitled Foundations of Intelligent Systems by M. Kryszkiewicz, A. Appice, D. Slezak, H. Rybinski, A. Skowron and Z.W. Ras, published in June 2017. In this enhanced k-medoid clustering, the algorithm produces clustering results with higher intra-cluster similarity and lower inter-cluster similarity. However, three problems were still identified in this algorithm. First, the algorithm has no method for deciding where best to assign a data point that is equidistant from multiple clusters. Second, the existing algorithm is not guaranteed to provide an optimal medoid for each cluster because it uses greedy search. Lastly, the algorithm clusters every data point without identifying potential outliers. For these reasons, the proponents decided to further enhance this algorithm and apply it in an application that would be beneficial for blood donation advertisers. The researchers aimed to integrate algorithms into the existing k-medoid in order to solve the problems stated. The proponents gathered foreign and local documents, lectures and studies to support and substantiate the problems stated, as well as to find the most appropriate solution for each problem. For the first problem, the formula for population variance was used.
For the second problem, instead of finding new medoids by computing the total cost of each clustering, the researchers' solution was to select as the medoid of each cluster the data point that has the minimum average distance. Lastly, a distance-based method was used to determine whether a data point is an outlier. As a result, variance was effective in determining where to assign a data point, since the tightness of the cluster is taken into account when the data point is assigned. For the second solution, results show that clusters have tighter data points and all medoids are always in the centremost part of each cluster. For the third and final solution, all data points whose distance to their medoid is greater than the maximum inter-cluster distance are considered outliers. Although it is more complex than the existing k-medoid, this algorithm provides better and more effective clustering results.
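
The three changes described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names are hypothetical, Euclidean distance is assumed, and several details the abstract leaves open are filled in by assumption. The tie-break is assumed to pick the cluster whose member-to-medoid distances have the lowest population variance once the point is added. "Average distance" is assumed to mean the average over members of the same cluster. "Maximum inter-cluster distance" is assumed to mean the largest distance between any two medoids.

```python
from math import dist
from statistics import pvariance


def pick_medoid(members):
    """Return the member with the smallest average distance to its cluster.

    Assumption: the average is taken over all members of the same cluster.
    """
    return min(members, key=lambda p: sum(dist(p, q) for q in members) / len(members))


def break_tie(point, tied_clusters):
    """Return the index of the tied cluster that stays tightest with `point` added.

    tied_clusters is a list of (medoid, members) pairs that are all equally
    far from `point`. Assumption: tightness is the population variance of
    the member-to-medoid distances after tentatively adding the point.
    """
    def variance_if_added(cluster):
        medoid, members = cluster
        return pvariance([dist(medoid, m) for m in list(members) + [point]])

    return min(range(len(tied_clusters)), key=lambda i: variance_if_added(tied_clusters[i]))


def find_outliers(clusters):
    """Return points farther from their own medoid than the threshold.

    clusters is a list of (medoid, members) pairs. Assumption: the threshold
    is the largest medoid-to-medoid distance, so at least two clusters are
    needed for the rule to be meaningful.
    """
    medoids = [medoid for medoid, _ in clusters]
    threshold = max(dist(a, b) for a in medoids for b in medoids)
    return [p for medoid, members in clusters for p in members if dist(medoid, p) > threshold]
```

As a usage example, `pick_medoid([(0, 0), (1, 0), (2, 0)])` selects `(1, 0)`, the point in the middle of the cluster, without evaluating a total swap cost for every candidate clustering.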













