Jerhica Kim T. Canaya and Jolina P. Escolano.

A further enhancement of Paul Graham's Bayesian algorithm applied in spam filtering - Undergraduate Thesis: (BSCS major in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2017.

ABSTRACT: Spam is now commonplace: spammers send random and irrelevant messages to our e-mail accounts, knowing that we have to check our received mail each day. These spam messages may contain malicious words or attachments, links that redirect the reader to an unwanted website, and links that carry viruses capable of harming a computer without the user knowing it. They are a threat to users, because a spammer can obtain information simply by having the recipient open the message. This research paper presents variations in the kinds of token that may be considered in filtering and in the number of tokens to test. This will benefit everyone who uses e-mail to send messages, as it may keep users from receiving unnecessary files and viruses attached to e-mail. We performed manual and computerized simulations to determine the possible results of stricter mail filtering. Paul Graham's Bayesian algorithm is a machine learning algorithm that trains on and classifies data, or tokens, assigning each a different score. We conclude that treating HTML tags and multi-word phrases as tokens is more effective for determining whether a message is spam or non-spam. By applying this algorithm to stricter spam filtering, we can distinguish whether a received mail is spam or not.



Index Terms--Genre/Form:
academic writing

LC Class. No.: QA76.9 C36 2017

Dewey Class. No.: / 2