| 000 -LEADER |
| fixed length control field |
02497nam a22002417a 4500 |
| 003 - CONTROL NUMBER IDENTIFIER |
| control field |
FT8875 |
| 005 - DATE AND TIME OF LATEST TRANSACTION |
| control field |
20251216130016.0 |
| 008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
| fixed length control field |
251216b ||||| |||| 00| 0 eng d |
| 041 ## - LANGUAGE CODE |
| Language code of text/sound track or separate title |
engtag |
| 050 ## - LIBRARY OF CONGRESS CALL NUMBER |
| Classification number |
QA76.9 A43 B39 2025 |
| 082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER |
| Classification number |
. |
| 100 1# - MAIN ENTRY--PERSONAL NAME |
| Personal name |
Arpon, Jasmia C.; Japson, Denise H. |
| 245 ## - TITLE STATEMENT |
| Title |
Enhancement of K-means algorithm applied to movie recommendation system |
| 264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE |
| Place of production, publication, distribution, manufacture |
. |
| Name of producer, publisher, distributor, manufacturer |
. |
| Date of production, publication, distribution, manufacture, or copyright notice |
c2025 |
| 300 ## - PHYSICAL DESCRIPTION |
| Other physical details |
Undergraduate Thesis: (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025 |
| 336 ## - CONTENT TYPE |
| Source |
text |
| Content type term |
text |
| Content type code |
text |
| 337 ## - MEDIA TYPE |
| Source |
unmediated |
| Media type term |
unmediated |
| Media type code |
unmediated |
| 338 ## - CARRIER TYPE |
| Source |
volume |
| Carrier type term |
volume |
| Carrier type code |
volume |
| 505 ## - FORMATTED CONTENTS NOTE |
| Formatted contents note |
ABSTRACT: This study aims to enhance the traditional K-Means clustering algorithm, which is known for its sensitivity to outliers, its reliance on a manually selected number of clusters, and its difficulty in clustering data of varying sizes and densities. To address these issues, the enhanced algorithm integrates three changes: optimal cluster selection using the Calinski-Harabasz Index (CHI), outlier detection through Local Outlier Factor (LOF), and the use of Cosine Similarity as the distance metric. The CHI determined that only 2 clusters were optimal, compared to the 5 clusters used in the original method, simplifying interpretation and automating the selection of k. To address the algorithm's difficulty with data of varying size and density, the enhanced method uses Cosine Similarity, which allowed it to handle clusters with irregular shapes and varying densities more effectively than Euclidean distance. This resulted in clearer boundaries and reduced overlap between user groups. Lastly, to address the algorithm's sensitivity to outliers, LOF was implemented, which identified and removed 51 outliers from the original 610-user dataset. This resulted in tighter, less noisy clusters. Together, these enhancements improved the silhouette score from 0.01012 to 0.1359, indicating greater intra-cluster cohesion and inter-cluster separation. The results, visualized through comparative plots, highlight the performance advantage of the enhanced algorithm in generating cleaner and more meaningful clusters. Overall, the enhanced K-Means method is more effective in capturing user preferences by generating accurate and robust clusters, making it a valuable tool for recommendation systems and user behavior analysis. |
| 526 ## - STUDY PROGRAM INFORMATION NOTE |
| Classification |
Filipiniana |
| 655 ## - INDEX TERM--GENRE/FORM |
| Genre/form data or focus term |
academic writing |
| 942 ## - ADDED ENTRY ELEMENTS |
| Source of classification or shelving scheme |
|
| Item type |
Thesis/Dissertation |
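
Illustrative sketch (not part of the catalogued thesis): the abstract in field 505 names three steps, namely LOF outlier removal, cosine-similarity clustering, and selection of k by the Calinski-Harabasz Index. The Python below shows one way these steps might be chained, assuming only NumPy. The function names, the LOF threshold of 1.5, the neighbor count, the candidate range for k, and the synthetic data are all hypothetical choices made for illustration; the thesis's own implementation, parameters, and 610-user dataset are not reproduced here.

```python
import numpy as np

def unit_rows(X):
    """L2-normalize rows so that dot products equal cosine similarity."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def lof_scores(X, n_neighbors=5):
    """Local Outlier Factor per row (Euclidean); values well above 1 suggest outliers."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                    # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :n_neighbors]    # indices of the k nearest neighbors
    nn_dist = np.take_along_axis(d, nn, axis=1)    # distances to those neighbors
    k_dist = nn_dist[:, -1]                        # k-distance of every point
    reach = np.maximum(nn_dist, k_dist[nn])        # reachability distance to each neighbor
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)       # local reachability density
    return lrd[nn].mean(axis=1) / lrd

def cosine_kmeans(U, k, n_iter=50, seed=0):
    """K-means on unit rows U, assigning each row to the most cosine-similar center."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(U @ centers.T, axis=1)
        for j in range(k):
            members = U[labels == j]
            if len(members):                       # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
        centers = unit_rows(centers)
    return labels

def calinski_harabasz(X, labels):
    """Ratio of between-cluster to within-cluster dispersion; higher is better."""
    ks = np.unique(labels)
    if len(ks) < 2:
        return 0.0
    overall = X.mean(axis=0)
    between = within = 0.0
    for j in ks:
        members = X[labels == j]
        center = members.mean(axis=0)
        between += len(members) * np.sum((center - overall) ** 2)
        within += np.sum((members - center) ** 2)
    return (between / (len(ks) - 1)) / (within / (len(X) - len(ks)))

def enhanced_kmeans(X, k_candidates=range(2, 7), lof_threshold=1.5):
    """Drop LOF outliers, cluster by cosine similarity, keep the k with the best CHI."""
    keep = lof_scores(X) <= lof_threshold          # hypothetical cutoff, not from the thesis
    U = unit_rows(X[keep])
    results = {k: cosine_kmeans(U, k) for k in k_candidates}
    best_k = max(results, key=lambda k: calinski_harabasz(U, results[k]))
    return keep, best_k, results[best_k]

# Synthetic stand-in for a user-by-feature matrix: two taste groups plus one extreme row.
rng = np.random.default_rng(42)
group_a = rng.normal([5, 1, 0, 1], 0.3, size=(30, 4))
group_b = rng.normal([1, 5, 1, 0], 0.3, size=(30, 4))
X = np.vstack([group_a, group_b, [[40.0, 40.0, 40.0, 40.0]]])

keep, best_k, labels = enhanced_kmeans(X)
print("rows kept:", int(keep.sum()), "of", len(X), "| chosen k:", best_k)
```

Normalizing rows to unit length before clustering is a common way to make a K-Means style update behave like cosine-based clustering, since squared Euclidean distance between unit vectors equals 2(1 - cosine similarity). Whether the thesis used this approach or a different cosine formulation is not stated in the record.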