An enhancement of the gibberish classification algorithm for detecting gibberish content in text document / Jasmine P. Laurente, Carla Johnica D. Quilop.

By: Jasmine P. Laurente, Carla Johnica D. Quilop.

Series: March 2016
Description: 92 pp.; 28 cm.
Content type: text
Media type: unmediated
Carrier type: volume

ABSTRACT: The Gibberish Classification algorithm aims to detect whether a text is valid or was typed randomly on a keyboard. It returns a percentage: a low value means valid text and a high value means gibberish text. If the result is lower than 50%, the text is likely valid; if it is higher than 50%, the text is likely gibberish. The algorithm is optimized for the English language and for longer texts. It still works for shorter texts, for example a single sentence, but the result is less accurate. The algorithm does not return a percentage lower than 1%, except when the input string is null or empty, in which case it returns 0%. The algorithm checks three things. First, it checks whether the number of unique characters in a chunk of 35 characters is in the usual range. Second, it checks whether the number of vowels among the letters is in the usual range. Third, it checks whether the word/char ratio is in the usual range. The final percentage is computed from these three checks. The researchers' purpose is to improve the Gibberish Classification Algorithm so that it gives a more accurate final percentage for the text. The enhanced algorithm is based on the correct spelling or structure of words in the English language rather than on the ranges of unique characters, vowels, and word/char ratio. Because the Gibberish Classification Algorithm is still at an early stage, it still returns some incorrect values: in some cases it produces a high percentage for a clearly valid sentence and, conversely, a low percentage for gibberish input. While studying the existing algorithm, the researchers encountered the following problems. First, correctly spelled words are considered gibberish: 15 out of 25 sample valid inputs were evaluated as gibberish (60% incorrect results). Second, the algorithm returns a valid percentage for sentences that use numerous punctuation marks: 17 out of 25 sample invalid inputs were evaluated as valid (68% incorrect results). Third, words that mix uppercase and lowercase letters are considered valid. To improve the existing algorithm, the researchers propose the following solutions. First, reduce the 60% incorrect results to 10% by adding computations for words and for sentences. Second, reduce the 68% incorrect results regarding punctuation to 18% by adding computation for punctuation marks. Third, check the case of the letters in each word and consider a word gibberish if different cases are detected. Improving the Gibberish Classification Algorithm will be a great help to people, especially English proficiency teachers and anyone who wants to detect gibberish content in their documents.
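
The three checks of the existing algorithm, as described in the abstract, can be sketched roughly as follows. This is a minimal illustration in Python, not the thesis authors' implementation and not the original algorithm's code. The "usual ranges" in USUAL_RANGES, the deviation formula, and the equal-weight average are assumptions made for this sketch, since the abstract does not state them. Only the chunk size of 35, the 1% floor, and the 0% result for null or empty input come from the abstract.

```python
CHUNK_SIZE = 35  # chunk length stated in the abstract

# Hypothetical "usual ranges" in percent (lower, upper).
# These numbers are placeholders for illustration, not values from the thesis.
USUAL_RANGES = {
    "unique_chars": (45.0, 50.0),
    "vowels": (35.0, 45.0),
    "word_char_ratio": (15.0, 20.0),
}


def unique_chars_percent(text):
    """Average share of unique characters per 35-character chunk, in percent."""
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    per_chunk = [len(set(chunk)) / len(chunk) * 100 for chunk in chunks]
    return sum(per_chunk) / len(per_chunk)


def vowels_percent(text):
    """Share of vowels among the letters of the text, in percent."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in "aeiou" for ch in letters) / len(letters) * 100


def word_char_ratio_percent(text):
    """Number of whitespace-separated words per character, in percent."""
    return len(text.split()) / len(text) * 100


def deviation(value, lower, upper):
    """Assumed scoring rule: 0 inside the range, up to 100 far outside it."""
    if value < lower:
        return (lower - value) / lower * 100
    if value > upper:
        return (value - upper) / (100 - upper) * 100
    return 0.0


def gibberish_score(text):
    """Return a gibberish percentage: 0 for null/empty input, otherwise 1 to 100."""
    if not text:
        return 0.0
    measurements = {
        "unique_chars": unique_chars_percent(text),
        "vowels": vowels_percent(text),
        "word_char_ratio": word_char_ratio_percent(text),
    }
    deviations = [
        deviation(value, *USUAL_RANGES[name])
        for name, value in measurements.items()
    ]
    score = sum(deviations) / len(deviations)  # assumed equal weighting
    return max(1.0, min(100.0, score))         # never below 1% for non-empty text


if __name__ == "__main__":
    for sample in ["", "The weather in Manila is warm for most of the year.", "xkqzvwpt"]:
        verdict = "gibberish" if gibberish_score(sample) > 50 else "valid"
        print(repr(sample), round(gibberish_score(sample), 1), verdict)
```

Because all three checks measure only character statistics, a sketch like this cannot tell whether a word is spelled correctly, which is consistent with the misclassifications the abstract reports for the existing algorithm.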

Item type	Current location	Home library	Collection	Call number	Status	Date due	Barcode	Item holds
Book	PLM	PLM Filipiniana Section	Filipiniana-Thesis	T QA76.9.L38.2016 (Browse shelf)	Available		FT6071

Thesis: (BSCS major in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2016.
