| 000 | 02194nam a22001817a 4500 | ||
|---|---|---|---|
| 003 | FT8935 | ||
| 005 | 20260112133338.0 | ||
| 050 | _aQA76.9 A43 B46 2025 | ||
| 100 | 1 | _aBenavides, Sean Lester C.; Masapol, Cid Antonio F. | |
| 245 | _aAn enhancement of Jiang, Z., et al.,’s compression-based classification algorithm applied to news article categorization | ||
| 264 | 1 | _cc2025 | |
| 300 | _bUndergraduate Thesis: (Bachelor of Science in Computer Science) - Pamantasan ng Lungsod ng Maynila, 2025 | ||
| 336 | _2text _atext _btext | ||
| 337 | _2 unmediated _a unmediated _b unmediated | ||
| 338 | _2volume _avolume _bvolume | ||
| 505 | _aABSTRACT: This study enhances the compression-based classification algorithm proposed by Jiang et al., specifically for news article categorization, by improving classification accuracy and computational efficiency. The original algorithm faces challenges related to resilience against stopwords, limitations in detecting semantic similarities, and inefficiencies in classification due to the computational cost of k-nearest neighbors (k-NN). Online news sites, for example, may miscategorize articles, which reduces searchability and hinders users’ ability to find relevant content. To address these issues, the study implements preprocessing techniques such as stopword removal and unigram extraction to refine feature selection and reduce redundancy. The gzip compression method is optimized to detect textual patterns more efficiently, improving classification performance. Additionally, the k-NN algorithm is replaced with Approximate Nearest Neighbors Oh Yeah (ANNOY), significantly enhancing scalability and reducing execution time. Experiments conducted on multiple datasets demonstrate substantial improvements, with classification accuracy increasing by an average of 6.47% and processing time decreasing by an average of 2.45% on large datasets. These results highlight gzip’s effectiveness as a lightweight, training-free method for text classification. The proposed enhancements offer a practical and computationally efficient approach to news article categorization, particularly in resource-constrained environments. | ||
| 942 | _2lcc _cMS | ||
| 999 | _c37422 _d37422 | ||
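
The pipeline summarized in the abstract (stopword removal, gzip-based similarity, nearest-neighbor voting) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stopword list, the function names, and the use of normalized compression distance are assumptions made for illustration, and an exact nearest-neighbor search stands in for ANNOY so that the example depends only on the standard library.

```python
import gzip
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def preprocess(text):
    """Lowercase, split into unigrams on whitespace, and drop stopwords."""
    return " ".join(w for w in text.lower().split() if w not in STOPWORDS)


def clen(text):
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x, y):
    """Normalized compression distance between two strings."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query, train, k=1):
    """Majority label among the k training items closest to the query.

    `train` is a list of (text, label) pairs. The search here is exact
    and linear in the training set; the thesis replaces this step with
    an approximate index, which this sketch does not reproduce.
    """
    q = preprocess(query)
    ranked = sorted(train, key=lambda item: ncd(q, preprocess(item[0])))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

A call such as `classify(article_text, labeled_articles, k=3)` returns the majority label among the three closest training articles. Because every query is compared against every training item, the cost grows linearly with the training set, which is the bottleneck the abstract attributes to k-NN.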