Comparative Study: Algorithms for Short Message Service Classification
- 1 Department of Information Systems, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia
- 2 Teknik Informatika, Institut Teknologi Dan Bisnis Indonesia, Medan, Indonesia
Abstract
This research classifies Short Message Service (SMS) messages as spam or ham using classification models trained on SMS data. The models are built with two data mining algorithms: Naive Bayes and support vector machine. Before either algorithm is applied, the SMS data pass through a text preprocessing stage consisting of data cleaning (removal of whitespace, punctuation, and numbers), case folding, stemming, tokenizing, and stop word removal. The accuracy of the two algorithms is then compared to identify the better classification algorithm. Several experiments were also carried out, comparing testing-data proportions of 20% and 30% and comparing preprocessing with and without stemming. The study found that the support vector machine algorithm, using 20% testing data with the stemming stage applied, achieved the highest accuracy, 97.5%.
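The preprocessing stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the `crude_stem` suffix stripper, and the function names are hypothetical stand-ins chosen for illustration. The sketch also tokenizes before stemming for convenience, which may differ from the order the study used. The `use_stemming` flag mirrors the with-stemming and without-stemming experiments.

```python
import re
import string

# Illustrative stop-word list (assumption: the abstract does not give the list used).
STOP_WORDS = {"a", "an", "the", "to", "is", "are", "and", "or", "of", "in", "on", "for"}

# Suffixes removed by the toy stemmer below (assumption: a stand-in for a real stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(message, use_stemming=True):
    """Return the cleaned token list for one SMS message."""
    # Data cleaning: remove punctuation, remove numbers, collapse extra whitespace.
    text = message.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Case folding: lowercase everything.
    text = text.lower()
    # Tokenizing: split on single spaces (whitespace was collapsed above).
    tokens = text.split(" ") if text else []
    # Stemming (optional, to compare runs with and without it).
    if use_stemming:
        tokens = [crude_stem(t) for t in tokens]
    # Stop word removal.
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    sms = "Free entry!!  Call 08001234 now, winners are waiting."
    print(preprocess(sms, use_stemming=True))
    print(preprocess(sms, use_stemming=False))
```

The resulting token lists would then be turned into feature vectors and passed to the Naive Bayes and support vector machine classifiers, with 20% or 30% of the messages held out for testing.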
DOI: https://doi.org/10.3844/jcssp.2023.1333.1344
Copyright: © 2023 Evaristus Didik Madyatmadja, Aldi, Fiona Fheren, Helen Angelica, Hanny Juwitasary and David Jumpa Malem Sembiring. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- SMS Spam
- SMS Ham
- Naive Bayes
- Support Vector Machine
- Classification
- Data Mining
- Text Mining