Research Article Open Access

Comparative Study: Algorithms for Short Message Service Classification

Evaristus Didik Madyatmadja1, Aldi1, Fiona Fheren1, Helen Angelica1, Hanny Juwitasary1 and David Jumpa Malem Sembiring2
  • 1 Department of Information Systems, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia
  • 2 Teknik Informatika, Institut Teknologi Dan Bisnis Indonesia, Medan, Indonesia

Abstract

This research classifies Short Message Service (SMS) messages as spam or ham using classification models trained on SMS data. The models are built with two data mining algorithms: Naive Bayes and support vector machine. Before either algorithm is applied, the SMS data pass through a text preprocessing stage comprising data cleaning (removal of whitespace, punctuation, and numbers), case folding, stemming, tokenizing, and stop word removal. The accuracy of the two algorithms is then compared to identify the better classifier. Several experiments were also carried out, comparing testing data proportions of 20% and 30% and comparing preprocessing with and without stemming. The study found that the support vector machine algorithm, using 20% testing data with the stemming stage applied, had the highest accuracy, 97.5%.
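The preprocessing stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the `toy_stem` suffix stripper, and the function names are hypothetical stand-ins, and the order used here (tokenize, then remove stop words, then stem) is an assumption made for simplicity. The `stem` flag mirrors the with-stemming and without-stemming comparison described above.

```python
import re
import string
from typing import List

# Tiny illustrative stop-word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"a", "an", "the", "to", "is", "are", "you", "your", "for", "of", "and", "in", "on"}


def clean(text: str) -> str:
    """Data cleaning: remove punctuation and numbers, then collapse whitespace."""
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    return re.sub(r"\s+", " ", text).strip()


def toy_stem(token: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, stem: bool = True) -> List[str]:
    """Clean, case-fold, tokenize, drop stop words, and optionally stem."""
    folded = clean(text).lower()                          # cleaning + case folding
    tokens = folded.split()                               # whitespace tokenizing
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop word removal
    if stem:
        tokens = [toy_stem(t) for t in tokens]            # optional stemming
    return tokens


if __name__ == "__main__":
    message = "WINNER!! You have won 1000 prizes, claiming now"
    print(preprocess(message, stem=True))
    print(preprocess(message, stem=False))
```

The resulting token lists would then be converted to feature vectors and passed to a Naive Bayes or support vector machine classifier; a production pipeline would more likely use an established stemmer and stop-word list than the toy versions shown here.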

Journal of Computer Science
Volume 19 No. 11, 2023, 1333-1344

DOI: https://doi.org/10.3844/jcssp.2023.1333.1344

Submitted On: 9 February 2023
Published On: 29 September 2023

How to Cite: Madyatmadja, E. D., Aldi, Fheren, F., Angelica, H., Juwitasary, H. & Sembiring, D. J. M. (2023). Comparative Study: Algorithms for Short Message Service Classification. Journal of Computer Science, 19(11), 1333-1344. https://doi.org/10.3844/jcssp.2023.1333.1344

Keywords

  • SMS Spam
  • SMS HAM
  • Naive Bayes
  • Support Vector Machine
  • Classification
  • Data Mining
  • Text Mining