000 02712nam a22001817a 4500
003 FT8937
005 20260112125804.0
050 _aQA76.9 A43 M35 2025
100 1 _aMalaga, Karl Benedict K.; Verdillo, Korinne L.
245 _aAn enhancement of the Jaro-Winkler fuzzy searching algorithm applied in library search engine
264 1 _cc2025
300 _bUndergraduate Thesis: (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025
336 _2text
_atext
_btext
337 _2 unmediated
_a unmediated
_b unmediated
338 _2 volume
_a volume
_b volume
505 _aABSTRACT: The Jaro-Winkler algorithm is an approximate string-matching algorithm that calculates the similarity between two strings based on matching characters within a defined window (Ranzjin, 2013). While the algorithm generally performs well in string matching, it has several limitations. When a shorter string matches a later portion of a longer string, the pair often receives a low similarity score, so the shorter string is excluded from the results despite being a valid match. Similarly, when comparing longer strings in which the words appear in different orders, the algorithm assigns a lower score, leading to incorrect exclusions. Additionally, the Winkler scale introduces a bias by favoring words with similar prefixes. This study enhances the Jaro-Winkler algorithm to improve its accuracy in handling text variations and to mitigate bias, particularly in a library search engine. The enhancements integrate a Rabin-Karp-inspired technique, a Jaccard similarity method, and a suffix weighting method to better recognize relevant matches. A dataset of over 100,000 book titles obtained from Kaggle was used to evaluate the enhanced Jaro-Winkler’s performance. Key metrics analyzed include the average number of search results returned, the number of exact matches in search results, matching accuracy, and execution time. The enhanced algorithm was compared against other fuzzy matching techniques, including Soundex, Levenshtein, Jaccard, Jaro Distance, and the traditional Jaro-Winkler. Additionally, its performance was tested across three similarity thresholds: 0.7, 0.8, and 0.9. Results indicate that while the enhancements introduced additional execution time, they significantly improved the algorithm’s ability to handle text variations. The enhanced Jaro-Winkler achieved a matching accuracy between 99.90% and 100%, leading to an increase in relevant search results.
Overall, it outperformed all other fuzzy matching algorithms, including its traditional counterpart, achieving an average matching accuracy of 99.83% across all tests conducted in this study.
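The limitations named in the abstract can be illustrated with a minimal sketch of the traditional Jaro-Winkler score. This is the textbook formulation, not the thesis's enhanced algorithm; the function names, the prefix scale of 0.1, and the four-character prefix cap are conventional defaults assumed here, not values taken from the thesis.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity (illustrative sketch, not the thesis code)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted by the Winkler prefix scale (assumed defaults)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


# A shared prefix raises the score above plain Jaro (the prefix bias).
print(round(jaro("MARTHA", "MARHTA"), 4), round(jaro_winkler("MARTHA", "MARHTA"), 4))
# A short query matching the end of a longer title scores low.
print(round(jaro_winkler("potter", "harry potter"), 4))
```

In this sketch the query "potter" shares no prefix with "harry potter" and most of its characters fall outside the matching window, so the pair scores below the lowest threshold (0.7) mentioned in the abstract even though the query is a valid match for the title.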
942 _2lcc
_cMS
999 _c37419
_d37419