An Enhancement of Tagalog Stemming Algorithm (TAGSA) applied in Tagalog Dictionary Searching

By: Matthew Paulo B. Buena; Mary Ruth J. Datu and Abegail C. Estobanez
Language: English
Publisher: c2015
Description: Undergraduate Thesis (BSCS major in Computer Science) - Pamantasan ng Lungsod ng Maynila. 2015
Content type: text
Media type: unmediated
Carrier type: volume
Genre/Form: academic writing
LOC classification: QA76.9 B84 2015
Contents:
ABSTRACT: Tagalog Stemming Algorithm, or TagSa, is an algorithm designed to accept all forms of Tagalog words as input. It is mainly used in information retrieval systems to improve performance. In this study it is used as a morphological analyser that extracts the root word from Filipino words conjugated in different forms and reports the affixes used and the tense of the original input word. After studying and analysing the existing algorithm, the researchers found three problems in how it derives the root word, and formulated three specific objectives to solve them: 1. To solve the understemming and overstemming errors of the existing algorithm. 2. To provide rules in the Partial Reduplication Routine for words that start with consonant blends. 3. To include sub-steps in the Context Sensitive Attribute for words that should end with “o”. The study applied the descriptive research method and used a survey questionnaire as the instrument for gathering data from the target population. The collected data were processed according to the requirements of the study, and this method helped the researchers improve the existing algorithm. The researchers conducted intensive research to produce an enhanced algorithm that solves the observed problems and accomplishes the objectives. For the first problem and objective, the researchers improved the stemming methods for Tagalog words to minimize understemming and overstemming errors. For the second problem and objective, the researchers added sub-steps to the Partial Reduplication Routine that handle a first syllable beginning with a consonant cluster, so that the correct stem is produced. Depending on the input word, there are two cases: the word reduplicates either the first consonant and the first vowel of the stem, or the whole consonant cluster together with the vowel that follows it.
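The two reduplication cases described above can be illustrated with a minimal Python sketch. This is not the thesis's routine, which the abstract does not spell out. The function names, the example words, and the simplification of treating every letter as a separate consonant (so the digraph "ng" is not handled specially) are all assumptions made for illustration.

```python
VOWELS = set("aeiou")


def leading_onset(word):
    """Return (onset, vowel): the initial consonant run and the vowel after it."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    if i == len(word):
        # No vowel found: the whole word is treated as the onset.
        return word, ""
    return word[:i], word[i]


def strip_partial_reduplication(word):
    """Remove a reduplicated first syllable when the stem starts with a cluster.

    Hypothetical helper: it expects a word whose prefixes were already removed.
    """
    word = word.lower()
    onset, vowel = leading_onset(word)
    if not onset or not vowel:
        return word  # vowel-initial or vowel-less input is left unchanged
    syllable = onset + vowel
    rest = word[len(syllable):]

    # Case 2: the whole cluster plus the vowel was copied, e.g. "pla" + "plantsa".
    if len(onset) >= 2 and rest.startswith(syllable):
        return rest

    # Case 1: only the first consonant and first vowel were copied,
    # e.g. "pa" + "plantsa". The remainder must start with a cluster.
    if len(onset) == 1:
        rest_onset, rest_vowel = leading_onset(rest)
        if len(rest_onset) >= 2 and rest_onset[0] == onset and rest_vowel == vowel:
            return rest

    return word
```

Under these assumptions, both "paplantsa" and "plaplantsa" reduce to "plantsa", while a word such as "sulat" is returned unchanged because its remainder does not begin with a consonant cluster.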
For the third problem and objective, the enhanced algorithm provides steps to check whether the stemmed word should originally end with “o” and whether the ending of the stemmed word should be changed to “u”. The existing and enhanced algorithms were compared using a simulator, and a Tagalog Dictionary Searching application was developed using the enhanced algorithm.
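One way to picture the “o” check is the sketch below. It assumes the stemmer has access to a list of known root words, which the abstract does not state. The function name `restore_final_o` and the tiny lexicon are hypothetical, and the thesis's actual sub-steps may differ.

```python
def restore_final_o(stem, lexicon):
    """Return the root form of a stem whose final "o" surfaced as "u".

    Assumption: `lexicon` is a set of known Tagalog root words.
    """
    if stem in lexicon:
        return stem  # already a known root, nothing to repair
    if stem.endswith("u"):
        candidate = stem[:-1] + "o"
        if candidate in lexicon:
            return candidate  # e.g. "laru" (from "laruan") becomes "laro"
    return stem  # no supported correction, leave the stem as it is


# Illustrative lexicon only, not data from the thesis.
ROOTS = {"laro", "turo", "sulat"}
```

With this lexicon, `restore_final_o("laru", ROOTS)` yields "laro", and a stem with no matching root is returned unchanged rather than guessed at.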
Item type Current location Home library Collection Call number Status Date due Barcode Item holds
Archival materials PLM
PLM
Archives
Filipiniana-Thesis QA76.9 B84 2015 Available FT6103
Total holds: 0



