Enhancement of naïve Bayes classifier algorithm applied to email spam filtering

By: Masan, Jhon Patrick D.; Molon, Miriam Juliene F.
Publisher: c2025
Description: Undergraduate Thesis: (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025
Content type: text
Media type: unmediated
Carrier type: volume
LOC classification: QA76.9 A43 M37 2025
Contents:
ABSTRACT: This study enhances the Naïve Bayes classifier for email spam detection by addressing three of its core limitations: high dimensionality, zero-probability estimates, and class imbalance. Specifically, the enhancement aims to reduce the impact of high dimensionality through Term Frequency-Inverse Document Frequency (TF-IDF) weighting, resolve the zero-probability problem using Laplace smoothing for more reliable probability estimation, and address class imbalance by applying the Synthetic Minority Over-sampling Technique (SMOTE) to improve spam recognition. A labeled dataset of email messages was used for training and evaluation. Results showed that these enhancements significantly improved classification performance. Accuracy increased from 0.96 to 0.99, demonstrating an overall improvement in correct predictions. The macro-average F1 score rose from 0.90 to 0.98, indicating better balance across both classes. Recall for the spam class improved from 0.71 to 0.99, while its F1 score increased from 0.83 to 0.96, reflecting greater success in detecting and classifying spam. Cross-validation F1 scores also improved from approximately 0.91 to 0.99, indicating better model generalization and stability. These findings demonstrate that the proposed enhancements make the Naïve Bayes classifier significantly more robust, accurate, and effective for spam filtering applications.
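The zero-probability problem named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, which this record does not include; it is a plain-Python multinomial Naïve Bayes under stated assumptions. The function names (train_nb, predict), the alpha parameter, and the five toy messages are all hypothetical. With alpha = 1 (Laplace smoothing), a word never seen in a class still receives a small nonzero likelihood, so a single unseen word does not force that class's score to zero.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with additive smoothing (alpha must be > 0)."""
    vocab = sorted({w for d in docs for w in d.split()})
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d.split())
        # Smoothed estimate: (count + alpha) / (class tokens + alpha * |V|)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik

def predict(model, text):
    """Return the class with the highest log-posterior score."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in text.split() if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy data, for illustration only (not the thesis dataset).
docs = [
    "win cash prize now",
    "cheap cash offer",
    "meeting agenda attached",
    "lunch meeting tomorrow",
    "project agenda review",
]
labels = ["spam", "spam", "ham", "ham", "ham"]

model = train_nb(docs, labels, alpha=1.0)
# "meeting" never appears in a spam message; smoothing keeps the spam score finite.
print(predict(model, "cash prize meeting"))
print(predict(model, "lunch agenda"))
```

Words outside the training vocabulary are skipped at prediction time in this sketch; that is one common convention, chosen here as an assumption. TF-IDF weighting and SMOTE are not shown.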


