000 05941nam a2201225Ia 4500
000 03772ntm a2200205 i 4500
001 87877
003 0
005 20250920173719.0
008 230216n 000 0 eng d
040 _e
_erda
_a
_d
_b
_c
100 _e
_e
_aJasmine P. Laurente, Carla Johnica D. Quilop.
_d
_b4
_u
_c0
_q16
245 0 _a
_aAn enhancement of the gibberish classification algorithm for detecting gibberish content in text document /
_d
_b
_n
_cJasmine P. Laurente, Carla Johnica D. Quilop.
_h6
_p
264 _3
_3
_a
_d
_b
_cMarch 2016.
300 _e
_e
_c28 cm.
_a92 pp.
_b
336 _b
_atext
_2rdacontent
337 _3
_30
_b
_aunmediated
_2rdamedia
338 _3
_30
_b
_avolume
_2rdacarrier
500 _a
_aThesis: (BSCS major in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2016.
_d
_b
_c56
520 _b
_b
_c
_aABSTRACT: The Gibberish Classification algorithm aims to detect whether a text is valid or was typed randomly on a keyboard. It returns a percentage: a low value indicates valid text and a high value indicates gibberish text. A result below 50% means the text is likely valid; a result above 50% means the text is likely gibberish. The algorithm is optimized for the English language and for longer texts. It still works for shorter texts, such as a single sentence, but the result is less accurate. The algorithm never returns a percentage lower than 1%, except when the input string is null or empty, in which case it returns 0%. The algorithm checks three things. First, it checks whether the number of unique characters in a chunk of 35 characters is in the usual range. Second, it checks whether the number of vowels among the letters is in the usual range. Third, it checks whether the word/char ratio is in the usual range. The final percentage is computed from these three checks. The researchers' purpose is to improve the Gibberish Classification Algorithm so that its final percentage is more accurate. The enhanced algorithm is based on the correct spelling or structure of words in the English language rather than on the ranges of unique characters, vowels, and word/char ratio. Because the Gibberish Classification Algorithm is still at an early stage, it still returns some incorrect values: it sometimes produces a high percentage for a clearly valid sentence and, conversely, a low percentage for gibberish input. While studying the existing gibberish classification algorithm, the researchers encountered the following problems. First, correctly spelled words are considered gibberish: 15 of the 25 sample valid inputs were evaluated as gibberish (60% incorrect results).
Second, the algorithm returns a valid percentage for sentences that use numerous punctuation marks: 17 of the 25 sample invalid inputs were evaluated as valid (68% incorrect results). Third, words that mix uppercase and lowercase letters are considered valid. To improve the existing algorithm, the researchers propose the following solutions. First, reduce the 60% rate of incorrect results for correctly spelled words to 10% by adding computations for words and for sentences. Second, reduce the 68% rate of incorrect results regarding punctuation to 18% by adding a computation for punctuation marks. Third, check the case of the letters in each word and consider a word gibberish if mixed cases are detected. Improving the Gibberish Classification Algorithm will be a great help to people, especially English proficiency teachers and anyone who wants to detect gibberish content in their documents.
_u
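The three checks of the existing algorithm, as summarized in the abstract, can be illustrated with a minimal Python sketch. This is not the thesis authors' implementation. The 35-character chunk size, the 1% floor, and the 0% result for empty input come from the abstract; the "usual range" thresholds, the deviation formula, the equal weighting of the three checks, and all names (`gibberish_percentage`, `_deviation`, the `*_RANGE` constants) are assumptions made only for illustration.

```python
CHUNK_SIZE = 35  # chunk length stated in the abstract

# Hypothetical "usual ranges": the abstract does not give the real thresholds.
UNIQUE_CHARS_RANGE = (7, 28)          # distinct characters per 35-char chunk
VOWEL_RATIO_RANGE = (0.25, 0.55)      # vowels / letters
WORD_CHAR_RATIO_RANGE = (0.12, 0.30)  # words / characters


def _deviation(value, low, high):
    """Return 0.0 inside [low, high], rising towards 1.0 outside it."""
    if low <= value <= high:
        return 0.0
    if value < low:
        return min(1.0, (low - value) / low)
    return min(1.0, (value - high) / high)


def gibberish_percentage(text):
    """Score text from 0 to 100; higher means more likely gibberish."""
    if not text:  # null or empty input returns 0%, per the abstract
        return 0.0

    # Check 1: unique characters per 35-character chunk.
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    # Ignore a trailing partial chunk when at least one full chunk exists.
    full = [c for c in chunks if len(c) == CHUNK_SIZE] or chunks
    mean_unique = sum(len(set(c.lower())) for c in full) / len(full)

    # Check 2: share of vowels among the letters.
    letters = [c for c in text.lower() if c.isalpha()]
    vowels = [c for c in letters if c in "aeiou"]
    vowel_ratio = len(vowels) / len(letters) if letters else 0.0

    # Check 3: word/char ratio.
    word_char_ratio = len(text.split()) / len(text)

    # Assumed combination: unweighted mean of the three deviations.
    score = 100.0 * (
        _deviation(mean_unique, *UNIQUE_CHARS_RANGE)
        + _deviation(vowel_ratio, *VOWEL_RATIO_RANGE)
        + _deviation(word_char_ratio, *WORD_CHAR_RATIO_RANGE)
    ) / 3

    # Non-empty input never scores below 1%, per the abstract.
    return max(1.0, min(100.0, score))


valid = gibberish_percentage(
    "This is a simple sentence written in plain English for testing."
)
mashed = gibberish_percentage("sdfkjh" * 8)
print(valid, mashed)
```

Under these assumed thresholds the plain English sentence scores below 50% and the repeated key mash scores above it. Because all three checks rely only on character statistics, a correctly spelled but unusual input can still fall outside the ranges, which is the kind of misclassification the abstract reports and the motivation for the spelling-based enhancement.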
942 _a
_alcc
_cBK
999 _c25386
_d25386