Ivan Joshua Francisco and Marry Anne Gutierrez.

An enhancement of bisecting K-means algorithm applied in exploratory data analysis for student performance monitoring system / Ivan Joshua Francisco and Marry Anne Gutierrez. - 161 pp. 28 cm.

Thesis: (BSCS major in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2016.








ABSTRACT: Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. It has different parameters or methods. One of these is cluster analysis, or clustering, which groups data according to similarities and dissimilarities. Clustering algorithms directly influence the clustering results. This study discusses the Bisecting K-Means algorithm, one of the popular algorithms applied in various fields of cluster analysis, and analyzes its shortcomings. These involve the random selection of centroids, the bisection of the largest cluster in the dataset, and the complete execution of the total number of iterations inputted. This study probes the implications of these shortcomings for the clustering's effectiveness and efficiency. To address these problems, this study aims to improve the algorithm by providing an enhanced version, supported by the following objectives: (1) to append a formula, Sum of Squared Errors, for initial centroid selection; (2) to modify the algorithm's bisection criterion through the application of the Standard Deviation formula; and (3) to incorporate a new basis for terminating the iterations. Subsequently, both the existing Bisecting K-Means and the researchers' enhanced version were applied to a Student Performance Monitoring System, which classifies students as low, average, or high performers according to their exam scores, to assay and compare the algorithms. Through various tests and simulations on data of different sizes (small: 10 students; medium: 30 students), the study found that the researchers' modifications to the Bisecting K-Means algorithm were relatively effective: the enhanced version clustered the data faster and more accurately. Further research is recommended to assess other approaches (other formulas or criteria) and to apply the proposed algorithm to larger data sets and other applications in order to enrich it.
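
The three objectives named in the abstract can be illustrated with a minimal Python sketch on one-dimensional exam scores. The abstract does not give the thesis's actual formulas or procedure, so everything below is an assumption made for illustration: the function names (`sse`, `two_means`, `bisecting_kmeans`), using the Sum of Squared Errors to pick the best of several candidate centroid pairs, bisecting the cluster with the largest standard deviation, and stopping as soon as the centroids no longer change. It is not the researchers' implementation.

```python
import random
import statistics


def sse(points, center):
    """Sum of Squared Errors of 1-D points around a center."""
    return sum((p - center) ** 2 for p in points)


def two_means(points, trials=5, max_iter=100):
    """Split 1-D points in two; keep the trial with the lowest total SSE.

    Assumption: SSE is used to choose among candidate initial centroids.
    The caller must pass points with at least two distinct values.
    """
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    distinct = sorted(set(points))
    best = None
    for _ in range(trials):
        c1, c2 = rng.sample(distinct, 2)  # two different starting centroids
        for _ in range(max_iter):
            left = [p for p in points if abs(p - c1) <= abs(p - c2)]
            right = [p for p in points if abs(p - c1) > abs(p - c2)]
            n1, n2 = statistics.fmean(left), statistics.fmean(right)
            # Assumed termination rule: stop once the centroids are stable
            # instead of always running all max_iter iterations.
            if (n1, n2) == (c1, c2):
                break
            c1, c2 = n1, n2
        total = sse(left, c1) + sse(right, c2)
        if best is None or total < best[0]:
            best = (total, left, right)
    return best[1], best[2]


def bisecting_kmeans(points, k):
    """Bisect repeatedly until k clusters exist (or nothing can be split)."""
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        # Assumed criterion: bisect the cluster with the largest standard
        # deviation rather than the one with the most members.
        target = max(splittable, key=statistics.pstdev)
        clusters.remove(target)
        clusters.extend(two_means(target))
    return sorted(clusters, key=statistics.fmean)


# Hypothetical exam scores, grouped into low, average, and high performers.
scores = [35, 40, 42, 70, 72, 75, 95, 97, 98]
low, average, high = bisecting_kmeans(scores, 3)
print(low, average, high)
```

Choosing the cluster to bisect by its spread, not its size, means a small but widely scattered group is split before a large, tightly packed one, which is one plausible reading of the second objective.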





















© Copyright 2024 Phoenix Library Management System - Pinnacle Technologies, Inc. All Rights Reserved.