Enhancement to low-resource text classification via sequential transfer learning / Neil Christian R. Riego and Danny Bell Villarba.

By: Riego, Neil Christian R. | Villarba, Danny Bell
Description: 91 p.; 28 cm. June 2023.
Content type: text | Media type: unmediated | Carrier type: volume
LOC classification: QA76.9.A43 R54 2023
Summary: ABSTRACT: Textual data on many platforms has increased dramatically in recent years. With this volume of data, anyone may perform text classification tasks such as sentiment analysis and hate-speech recognition. However, the scarcity of NLP tools for low-resource languages in regions such as Asia and Africa limits how this data can be leveraged. We provide three (3) contributions. First, we provide a Tagalog product-review dataset as a baseline for the sentiment analysis task. Second, we pretrain and fine-tune a Tagalog variant of XLNet on two datasets, reaching 78.05% accuracy on the hate-speech dataset and 95.02% on the Shopee dataset, 0.33% and 3.87% higher than the benchmark RoBERTa-Tagalog model, respectively. Third, in the fine-tuning step, an improvement using bootstrap aggregation (bagging) is implemented, which boosts accuracy by 0.16% when 70% of the data is used to fine-tune three XLNet-Tagalog models. Furthermore, combining RoBERTa-Tagalog with XLNet-Tagalog fine-tuned on 100% of the data yields an accuracy of 74.47%, a 1.26% improvement over the best-performing single XLNet-Tagalog setup. Finally, XLNet-Tagalog degrades more slowly than the benchmark model by 4.53. We make all our models and datasets available to the research community.
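As a rough illustration of the transfer-learning step described in the abstract, the sketch below fine-tunes a pretrained XLNet-style checkpoint for binary Tagalog sentiment classification with Hugging Face Transformers. The checkpoint name, file names, and hyperparameters are placeholders, not the thesis's actual configuration.

    # A minimal sketch, assuming Hugging Face Transformers/Datasets and a CSV
    # with "text" and "label" columns. "xlnet-base-cased" is a stand-in: the
    # thesis pretrains its own Tagalog XLNet, which is not published under
    # this name.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    MODEL_NAME = "xlnet-base-cased"  # placeholder for the Tagalog-pretrained XLNet

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

    # Hypothetical file names; any text/label CSV pair works here.
    data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

    data = data.map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir="xlnet-tagalog-sentiment",
        num_train_epochs=3,              # illustrative hyperparameters, not the thesis's
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    )

    trainer = Trainer(model=model, args=args,
                      train_dataset=data["train"], eval_dataset=data["test"])
    trainer.train()
    print(trainer.evaluate())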
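The bagging step can be sketched the same way: train several classifiers on bootstrap resamples of the training set and combine their predictions by majority vote. Here a lightweight scikit-learn TF-IDF + logistic-regression pipeline stands in for each fine-tuned XLNet-Tagalog model; the Tagalog toy texts and labels are illustrative only.

    # A minimal sketch of bootstrap aggregation (bagging), assuming scikit-learn.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["maganda ang produkto", "pangit ang serbisyo",
             "mabilis ang delivery", "sira ang dumating na item"]
    labels = np.array([1, 0, 1, 0])  # 1 = positive, 0 = negative

    rng = np.random.default_rng(0)
    models = []
    for _ in range(3):  # three base models, mirroring the thesis's setup
        idx = rng.integers(0, len(texts), size=len(texts))  # bootstrap resample
        while len(set(labels[idx])) < 2:                    # guard: need both classes to fit
            idx = rng.integers(0, len(texts), size=len(texts))
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
        clf.fit([texts[i] for i in idx], labels[idx])
        models.append(clf)

    def bagged_predict(models, batch):
        # Stack per-model label votes, then take the majority label per text.
        votes = np.stack([m.predict(batch) for m in models])  # (n_models, n_texts)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

    print(bagged_predict(models, ["maganda at mabilis", "sira at pangit"]))

The same voting helper would also accept heterogeneous base models, which is how the abstract's RoBERTa-Tagalog plus XLNet-Tagalog combination could slot in.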
Item type: Book
Current location: PLM
Home library: PLM
Collection: Filipiniana Section
Call number: Filipiniana-Thesis QA76.9.A43 R54 2023
Status: Available
Barcode: FT7754
Total holds: 0

Undergraduate thesis (Bachelor of Science in Computer Science), Pamantasan ng Lungsod ng Maynila, 2023.

