An enhanced of Boyer-Moore algorithm applied to profanity detection in Filipino-text
- Undergraduate Thesis: (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025
ABSTRACT: The Boyer-Moore algorithm’s performance can degrade when handling large datasets or complex patterns, particularly due to high memory usage and inefficiencies in processing repetitive sequences. The algorithm requires storing large tables or arrays, which increases resource consumption. Additionally, it may struggle when the text or patterns contain repetitions, leading to increased mismatch checks and inefficient shifts. This study proposes an enhancement to the Boyer-Moore algorithm by integrating hash tables to improve efficiency in handling large datasets and complex patterns while addressing execution time and memory usage challenges. The proposed method optimizes pattern preprocessing and matchmaking redundant computations and improving data access. Performance evaluation on datasets ranging from 100,000 to over 5 million characters demonstrated a 16.39% improvement in execution time, with the enhanced algorithm processing 5,289,118 characters in 1.64 seconds compared to 1.96 seconds for the original. Additionally, memory usage was reduced by approximately 1,210 KB across different dataset sizes, highlighting its scalability and resource efficiency. The enhancement also improves performance in scenarios with repetitive characters or sequences by reducing the number of mismatch checks and optimizing shifts, ensuring better adaptability to diverse text structures. These improvements make the algorithm highly suitable for real-time applications, including content moderation, search engines, and large-scale data analytics. The findings suggest that industries reliant on fast and efficient text processing can adopt this enhanced algorithm to meet growing computational demands. Future research should explore hybrid models, advanced data structures, and multilingual datasets to further optimize performance and validate its adaptability across different domains. The broader implications include supporting industries with scalable analytics tools, enhancing real-time processing capabilities, and fostering safer online environments through more effective content moderation.