Enhancement of logistic regression algorithm applied in email spam detection

By: Carlos, Vince Anthony S.; Pancho, John Cedric C
Publisher: c2025
Description: Undergraduate Thesis: (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025
Carrier type: volume
LOC classification: QA76.9 A43 C37 2025
Contents:
ABSTRACT: Logistic regression is a popular binary classification approach, but like any machine learning algorithm, it has limitations. Class imbalance, large datasets, and overfitting can each reduce its accuracy and efficiency. This study enhanced the Logistic Regression algorithm’s performance for email spam detection by addressing these problems with three techniques: Term Frequency-Inverse Document Frequency (TF-IDF) for class imbalance, Recursive Feature Elimination (RFE) for large datasets, and Principal Component Analysis (PCA) for overfitting. TF-IDF improves feature representation by highlighting the key terms that differentiate spam from non-spam. RFE systematically eliminates irrelevant features, reducing computational complexity and improving efficiency, particularly on large datasets. PCA mitigates overfitting by reducing the dimensionality of the feature space, which helps the model generalize to unseen data. Experimental results showed a significant improvement in spam detection accuracy for the enhanced Logistic Regression model, which achieved up to 98% accuracy with TF-IDF compared to the baseline model’s 91%. RFE reduced training time by 35% while maintaining robust performance on large datasets, and PCA improved model generalization by reducing variance in predictions. The proposed enhancements address the key limitations of traditional Logistic Regression models in spam detection. The refined approach improves predictive accuracy, computational efficiency, and robustness, making it applicable to real-world email security systems and spam filtering.
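The abstract does not reproduce the thesis implementation, so the following is only a minimal sketch of the TF-IDF plus logistic regression idea it describes, written against the Python standard library. The toy messages, the smoothed IDF formula, the L2 normalisation, the learning rate, and all function names are assumptions made for illustration; RFE and PCA are left out to keep the example short.

```python
import math
from collections import Counter

# Toy corpus (hypothetical messages, not the thesis dataset): 1 = spam, 0 = non-spam.
TRAIN = [
    ("win free prize now", 1),
    ("free cash offer click now", 1),
    ("meeting agenda for monday", 0),
    ("please review the project report", 0),
]

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    counts = Counter(t for t in doc.split() if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(vec, weights, bias):
    """Estimated probability that the vectorised message is spam."""
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in vec.items()))

def train(vectors, labels, lr=1.0, epochs=1000):
    """Full-batch gradient descent on the mean log-loss (assumed settings)."""
    weights, bias = {}, 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for vec, y in zip(vectors, labels):
            err = predict_proba(vec, weights, bias) - y
            grad_b += err
            for t, v in vec.items():
                grad_w[t] += err * v
        bias -= lr * grad_b / n
        for t, g in grad_w.items():
            weights[t] = weights.get(t, 0.0) - lr * g / n
    return weights, bias

idf = fit_idf([doc for doc, _ in TRAIN])
X = [tfidf(doc, idf) for doc, _ in TRAIN]
y = [label for _, label in TRAIN]
weights, bias = train(X, y)

def is_spam(message):
    """Classify a new message with the trained toy model."""
    return predict_proba(tfidf(message, idf), weights, bias) >= 0.5
```

Calling `is_spam("free prize offer")` on this toy model flags the message as spam, while `is_spam("project meeting monday")` does not. Terms that occur in fewer messages receive a larger IDF weight, which is the sense in which TF-IDF highlights distinguishing terms; the accuracy figures quoted in the abstract come from the thesis experiments and cannot be reproduced from this sketch.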
Item type: Thesis/Dissertation
Current location: PLM
Home library: PLM
Collection: Filipiniana Section
Call number: Filipiniana-Thesis QA76.9 A43 C37 2025
Status: Available
Barcode: FT8905

