A Hybrid Method of Long Short-Term Memory and Auto-Encoder Architectures for Sarcasm Detection
- 1 Universiti Kebangsaan Malaysia (UKM), Malaysia
Abstract
Sarcasm detection is considered one of the most challenging tasks in sentiment analysis and opinion mining applications in the social media. Sarcasm identification is therefore essential for a good public opinion decision. There are some studies on sarcasm detection that apply standard word2vec model and have shown great performance with word-level analysis. However, once a sequence of terms is being tackled, the performance drops. This is because averaging the embedding of each term in a sentence to get the general embedding would discard the important embedding of some terms. LSTM showed significant improvement in terms of document embedding. However, within the classification LSTM requires adding additional information in order to precisely classify the document into sarcasm or not. This study aims to propose two technique based on LSTM and Auto-Encoder for improving the sarcasm detection. A benchmark dataset has been used in the experiments along with several pre-processing operations that have been applied. These include stop word removal, tokenization and special character removal with LSTM which can be represented by configuring the document embedding and using Auto-Encoder the classifier that was trained on the proposed LSTM. Results showed that the proposed LSTM with Auto-Encoder outperformed the baseline by achieving 84% of f-measure for the dataset. The main reason behind the superiority is that the proposed auto encoder is processing the document embedding as input and attempt to output the same embedding vector. This will enable the architecture to learn the interesting embedding that have significant impact on sarcasm polarity.
DOI: https://doi.org/10.3844/jcssp.2021.1093.1098
Copyright: © 2021 Mohammed M. AL-Ani, Nazlia Omar and Ahmed Adil Nafea. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- 3,365 Views
- 1,344 Downloads
- 8 Citations
Download
Keywords
- Sarcasm Detection
- Irony
- LSTM
- Auto-Encoder
- Sentiment Analysis