Research Article Open Access

A Hybridized BERT-Based Approach for Crime News Collection and Classification from Online Newspapers

Ashour Ali¹, Shahrul Azman Mohd Noah¹, Lailatul Qadri Zakaria¹ and Saeed Amer Al Ameri¹
  • ¹ Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia

Abstract

Crime news analysis is crucial for understanding criminal activity, enhancing public safety, and informing policy decisions. However, the exponential growth and unstructured nature of online news articles make efficient and accurate information extraction difficult. This study aims to improve the efficiency and accuracy of crime news data collection and classification using advanced Natural Language Processing (NLP) techniques and pre-trained language models. We propose a hybridized approach that combines topic modelling, an external knowledge base, and a BERT-based pre-trained model fine-tuned specifically for crime-related content. Our experiments show that this method outperforms existing models, achieving a new state-of-the-art result with a 0.58% increase in accuracy for crime news classification. These findings underscore the practical applicability of our approach in real-world scenarios for improving public safety and crime awareness.

Journal of Computer Science
Volume 21 No. 9, 2025, 2000-2015

DOI: https://doi.org/10.3844/jcssp.2025.2000.2015

Submitted On: 30 August 2024
Published On: 16 October 2025

How to Cite: Ali, A., Noah, S. A. M., Zakaria, L. Q. & Al Ameri, S. A. (2025). A Hybridized BERT-Based Approach for Crime News Collection and Classification from Online Newspapers. Journal of Computer Science, 21(9), 2000-2015. https://doi.org/10.3844/jcssp.2025.2000.2015

Keywords

  • BERT
  • Crime News Classification
  • Natural Language Processing
  • Web Scraping
  • Topic Modeling
  • Knowledge Bases
  • Deep Learning
  • Text Classification
  • Data Filtering
  • Online News