000 02202nam a22001817a 4500
003 FT8933
005 20260112135035.0
050 _aQA76.9 N38 M34 2025
100 1 _aMagno, Martin Angelo M.
700 1 _aPangindian, Elmer Joaqi F.
245 _aEnhancement of BERT model with predictive text generation and attention mechanisms for improvements in text autocomplete
264 1 _cc2025
502 _aUndergraduate thesis (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025.
336 _2rdacontent
_atext
_btxt
337 _2rdamedia
_aunmediated
_bn
338 _2rdacarrier
_avolume
_bnc
520 _aABSTRACT: Transformer-based fine-tuning strategies have shown effectiveness in low-resource and low-data contexts. Still, the lack of properly established baselines and benchmark datasets makes it difficult to compare different ways of dealing with low-resource settings. This research consists of three contributions. First, two previously unreleased datasets that serve as benchmarks for text classification and low-resource multilabel text classification in the Filipino language. Second, improved BERT models pre-trained for application in the Filipino context. Third, a simple degradation test that measures a model’s susceptibility to performance decline as the number of training samples decreases. The deterioration rates of the pre-trained model are examined, along with the use of this method to compare models designed for low-resource environments. The findings show that BERT models, even when distilled into a smaller model for low-resource contexts, maintain high performance with minimal fine-tuning and degrade slowly in low-data conditions, making them well suited to such restrictions. There has also been research showing effective transfer from supervised tasks with large datasets, such as natural language inference (Conneau et al., 2022) and machine translation (McCann et al., 2022). Computer vision research has likewise demonstrated the importance of transfer learning from large pre-trained models, where an effective recipe is to fine-tune models pre-trained on ImageNet (Yosinski et al., 2019).
942 _2ddc
_cMS
999 _c37424
_d37424