Masan, Jhon Patrick D.; Molon, Miriam Juliene F.

Enhancement of naïve Bayes classifier algorithm applied to email spam filtering - Undergraduate Thesis: (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025

ABSTRACT: This study focuses on enhancing the Naïve Bayes classifier for email spam detection by addressing its core limitations: high dimensionality, zero-probability issues, and class imbalance. Specifically, the enhancement aims to reduce the impact of high dimensionality through Term Frequency-Inverse Document Frequency (TF-IDF) weighting, resolve the zero-probability problem using Laplace smoothing for more reliable probability estimation, and address class imbalance by applying the Synthetic Minority Over-sampling Technique (SMOTE) to improve spam recognition. A labeled dataset of email messages was used for training and evaluation. Results showed that these enhancements significantly improved classification performance. Accuracy increased from 0.96 to 0.99, demonstrating an overall improvement in correct predictions. The macro-average F1 score rose from 0.90 to 0.98, indicating better balance across both classes. Recall for the spam class improved from 0.71 to 0.99, while its F1 score increased from 0.83 to 0.96, reflecting greater success in detecting and classifying spam. Cross-validation F1 scores also improved from approximately 0.91 to 0.99, confirming enhanced model generalization and stability. These findings demonstrate that the proposed enhancements make the Naïve Bayes classifier significantly more robust, accurate, and effective for spam filtering applications.
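
The zero-probability problem named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (train_nb, predict), the alpha parameter, and the five toy messages are all assumptions made for illustration, and the TF-IDF and SMOTE steps are left out. It shows only how additive (Laplace) smoothing keeps a word never seen in one class from forcing that class's probability to zero.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive (Laplace) smoothing.

    Hypothetical helper for illustration; alpha=1.0 is classic Laplace smoothing.
    """
    vocab = sorted({w for d in docs for w in d.split()})
    classes = sorted(set(labels))
    # Per-class word counts.
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(doc.split())
    # Log prior: fraction of training documents in each class.
    n_docs = Counter(labels)
    log_prior = {c: math.log(n_docs[c] / len(docs)) for c in classes}
    # Smoothed log likelihood: (count + alpha) / (total + alpha * |vocab|).
    log_like = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((counts[c][w] + alpha) / denom) for w in vocab}
    return vocab, log_prior, log_like

def predict(model, doc):
    """Return the class with the highest log posterior; unknown words are skipped."""
    _, log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in doc.split() if w in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Toy, made-up training data (not the dataset used in the thesis).
docs = [
    "win cash prize now",
    "free cash offer",
    "meeting agenda attached",
    "lunch meeting tomorrow",
    "project agenda update",
]
labels = ["spam", "spam", "ham", "ham", "ham"]
model = train_nb(docs, labels)

# "meeting" never occurs in a spam message, yet smoothing leaves the spam
# class with a small non-zero likelihood instead of ruling it out entirely.
print(predict(model, "free cash meeting"))
print(predict(model, "lunch agenda"))
```

Without smoothing (alpha set to 0), the unseen word "meeting" would give the spam class a likelihood of zero for the first message, regardless of how strongly "free" and "cash" point to spam.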

QA76.9 A43 M37 2025