Machine Learning and Deep Learning for Phishing Email Classification using One-Hot Encoding

Sikha Bagui; Debarghya Nandi; Subhash Bagui; Robert Jamie White

doi:10.3844/jcssp.2021.610.623

Research Article Open Access

Sikha Bagui¹, Debarghya Nandi², Subhash Bagui³ and Robert Jamie White⁴

¹ University of West Florida, United States
² University of Illinois at Chicago, United States
³ University of West Florida, United States
⁴ AppRiver, Pensacola, United States

Abstract

Text representation is a central task in Natural Language Processing (NLP). In recent years, Deep Learning (DL) and Machine Learning (ML) have been widely used in NLP tasks such as topic classification, sentiment analysis and language translation. Until very recently, however, little work had been devoted to semantic analysis in phishing detection or phishing email detection. The novelty of this study lies in using deep semantic analysis to capture inherent characteristics of the text body. One-hot encoding was used with DL and ML techniques to classify emails as phishing or non-phishing, and various parameters and hyperparameters were compared for the DL models. Results are presented for the ML models (Naïve Bayes, SVM and Decision Tree) and for the DL models (Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM)). The DL models outperformed the ML models in accuracy, whereas the ML models required less computation time. CNN with Word Embedding achieved the highest accuracy (96.34%), demonstrating the effectiveness of semantic analysis in phishing email detection.
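For readers unfamiliar with the term, the following is a minimal sketch of what word-level one-hot encoding of email text can look like. It is an illustration under stated assumptions, not the authors' pipeline: the function names `build_vocabulary` and `one_hot_sequence`, the whitespace tokenization, and the two toy emails are all hypothetical and do not come from the study.

```python
from typing import Dict, List


def build_vocabulary(emails: List[str]) -> Dict[str, int]:
    """Map each distinct lowercase token to an integer index, in first-seen order."""
    vocab: Dict[str, int] = {}
    for text in emails:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def one_hot_sequence(text: str, vocab: Dict[str, int]) -> List[List[int]]:
    """Encode a text as one vector per known token, each containing a single 1."""
    vectors: List[List[int]] = []
    for token in text.lower().split():
        index = vocab.get(token)
        if index is None:
            continue  # this sketch simply skips out-of-vocabulary tokens
        row = [0] * len(vocab)
        row[index] = 1
        vectors.append(row)
    return vectors


# Toy, made-up emails (assumption: not taken from the study's dataset).
emails = ["verify your account now", "meeting notes attached"]
vocab = build_vocabulary(emails)
encoded = one_hot_sequence("verify account", vocab)
```

Each row of `encoded` has the same length as the vocabulary and marks exactly one word. A sequence of such rows (or of the corresponding integer indices) could then be passed to a classifier; how the study itself did this is not specified in the abstract.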

Journal of Computer Science

Volume 17 No. 7, 2021, 610-623

Submitted On: 1 May 2021

Published On: 23 July 2021

How to Cite: Bagui, S., Nandi, D., Bagui, S. & White, R. J. (2021). Machine Learning and Deep Learning for Phishing Email Classification using One-Hot Encoding. Journal of Computer Science, 17(7), 610-623. https://doi.org/10.3844/jcssp.2021.610.623

Copyright: © 2021 Sikha Bagui, Debarghya Nandi, Subhash Bagui and Robert Jamie White. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

One-Hot Encoding
Phishing Email Classification
Deep Learning
Machine Learning
Convolutional Neural Networks
Long Short Term Memory