An enhancement of the Jaro-Winkler fuzzy searching algorithm applied in library search engine

By: Malaga, Karl Benedict K.; Verdillo, Korinne L
Publisher: c2025
Description: Undergraduate Thesis: (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025
Content type: text
Media type: unmediated
Carrier type: volume
LOC classification: QA76.9 A43 M35 2025
Contents:
ABSTRACT: The Jaro-Winkler algorithm is an approximate string-matching algorithm that calculates the similarity between two strings based on matching characters within a defined window (Ranzjin, 2013). While the algorithm generally performs well in string matching, it has several limitations. When a shorter string matches a later portion of a longer string, the pair often receives a lower similarity score, causing it to be excluded from the results despite being a valid match. Similarly, when comparing longer strings in which words appear in different orders, the algorithm assigns a lower score, leading to incorrect exclusions. Additionally, the Winkler scale introduces a bias by favoring words with similar prefixes. This study enhances the Jaro-Winkler algorithm to improve its accuracy in handling text variations and to mitigate bias, particularly in a library search engine. The enhancements integrate a Rabin-Karp-inspired technique, a Jaccard similarity method, and a suffix weighting method to better recognize relevant matches. A dataset of over 100,000 book titles obtained from Kaggle was used to evaluate the enhanced Jaro-Winkler’s performance. Key metrics analyzed include the average number of search results returned, the number of exact matches in search results, matching accuracy, and execution time. The enhanced algorithm was compared against other fuzzy matching techniques, including Soundex, Levenshtein, Jaccard, Jaro Distance, and the traditional Jaro-Winkler. Additionally, its performance was tested across three similarity thresholds: 0.7, 0.8, and 0.9. Results indicate that while the enhancements introduced additional execution time, they significantly improved the algorithm’s ability to handle text variations. The enhanced Jaro-Winkler achieved a matching accuracy between 99.90% and 100%, leading to an increase in relevant search results. Overall, it outperformed all other fuzzy matching algorithms, including its traditional counterpart, achieving an average matching accuracy of 99.83% across all tests conducted in this study.
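
The limitations named in the abstract can be illustrated with a minimal Python sketch of the traditional Jaro-Winkler measure alongside a word-level Jaccard similarity. This is not the thesis's enhanced algorithm, whose details are not given in this record. The function names, the prefix scale `p = 0.1`, the four-character prefix cap, and the sample titles are assumptions chosen for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (assumed p = 0.1)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def token_jaccard(s1: str, s2: str) -> float:
    """Jaccard similarity over lowercase word sets; ignores word order."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical queries and titles, not taken from the thesis dataset.
    print(jaro_winkler("algorithms", "introduction to algorithms"))
    print(jaro_winkler("data structures and algorithms",
                       "algorithms and data structures"))
    print(token_jaccard("data structures and algorithms",
                        "algorithms and data structures"))
    print(jaro("algorithm", "algebra"), jaro_winkler("algorithm", "algebra"))
```

Under these assumptions, the query "algorithms" scores well below the 0.7 threshold against "introduction to algorithms" because the matching text lies outside the character window, while the word-level Jaccard score for the reordered title is 1.0. The last line shows the prefix boost: the Jaro-Winkler score for "algorithm" and "algebra" is higher than their plain Jaro score only because they share the prefix "alg".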
Item type: Thesis/Dissertation
Current location: PLM
Home library: PLM, Filipiniana Section
Collection: Filipiniana-Thesis
Call number: QA76.9 A43 M35 2025
Status: Available
Barcode: FT8937

