Enhanced named entity recognition algorithm for Filipino cultural and heritage texts

By: Robantes, Jhan Lou P.; Serrano, Andreo A.
Language: English
Publisher: c2025
Description: Undergraduate Thesis (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025
Content type: text
Media type: unmediated
Carrier type: volume
Genre/Form: academic writing
LOC classification: QA76.9 A43 R63 2025
Contents:
ABSTRACT: Named Entity Recognition (NER) is a crucial natural language processing task that extracts named entities from unstructured text and classifies them into predefined categories. While existing NER methods have shown success in general domains, they often face significant challenges when applied to specialized contexts such as Filipino cultural and historical texts. These challenges stem from unique linguistic features and diverse naming conventions. This research introduces an enhanced rule-based NER approach that specifically addresses these challenges. At its core, the system uses the curated Corpus of Historical Filipino and Philippine English (COHFIE), which serves as both training and evaluation data. The approach builds on pattern-learning methods, incorporates character and token features, and uses positive and negative example sets. To enrich the classification process, we used the International Committee for Documentation – Conceptual Reference Model (CIDOC-CRM), a cultural heritage framework, to provide a more nuanced categorization of entities based on their historical and cultural significance. Tested against existing Filipino-based models (calamanCy and RoBERTa), the enhanced model shows improvement in identifying entities related to Filipino culture (CUL) and history-related terms (PER, ORG, LOC).
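The abstract does not list the thesis's actual rules or features, so the following is only a minimal sketch of the general idea it names: scoring a candidate rule over token and character features against positive and negative example sets. Every name here (`token_features`, `rule_precision`, `per_rule`), the choice of features, and the toy sentences are assumptions made for illustration, not the authors' implementation or data from COHFIE.

```python
from typing import Dict, List, Tuple

# An example is a tokenized sentence plus the index of the candidate token.
Example = Tuple[List[str], int]


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical token- and character-level features for tokens[i]."""
    tok = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return {
        "is_capitalized": tok[:1].isupper(),  # character feature
        "has_hyphen": "-" in tok,             # character feature
        "suffix3": tok[-3:].lower(),          # character feature
        "prev_token": prev,                   # token-context feature
    }


def rule_precision(rule, positives: List[Example], negatives: List[Example]) -> float:
    """Fraction of a rule's matches that fall in the positive set."""
    tp = sum(1 for toks, i in positives if rule(token_features(toks, i)))
    fp = sum(1 for toks, i in negatives if rule(token_features(toks, i)))
    return tp / (tp + fp) if (tp + fp) else 0.0


def per_rule(f: Dict[str, object]) -> bool:
    """Assumed PER rule: capitalized token after a Tagalog personal marker."""
    return bool(f["is_capitalized"]) and f["prev_token"] in {"si", "ni", "kay"}


def capitalized_only(f: Dict[str, object]) -> bool:
    """Looser baseline rule: any capitalized token."""
    return bool(f["is_capitalized"])


# Toy example sets (illustrative sentences, not corpus data).
positives = [
    ("Dumating si Rizal sa Maynila".split(), 2),  # Rizal: a person
    ("Ang aklat ni Bonifacio".split(), 3),        # Bonifacio: a person
]
negatives = [
    ("Dumating si Rizal sa Maynila".split(), 4),  # Maynila: a place, not PER
    ("Pumunta sila sa Cavite".split(), 3),        # Cavite: a place, not PER
]

strict_score = rule_precision(per_rule, positives, negatives)
loose_score = rule_precision(capitalized_only, positives, negatives)
```

On these toy sets the marker-based rule matches only the positive examples, while the capitalization-only rule also matches both place names. That is the kind of contrast a pattern learner can use to keep or discard candidate rules.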
Item type: Thesis/Dissertation
Current location: PLM
Home library: PLM, Filipiniana Section
Collection: Filipiniana-Thesis
Call number: QA76.9 A43 R63 2025
Status: Available
Barcode: FT8874
