A CURE Algorithm for Vietnamese Sentiment Classification in a Parallel Environment
- 1 Nguyen Tat Thanh University, Vietnam
- 2 Vietnam National University, Vietnam
- 3 Sumatra University, Thailand
Abstract
Solutions for processing big data are essential for many fields of research and for commercial applications. In this paper, we propose a new model for sentiment classification of large data sets in the Cloudera parallel network environment. Clustering Using Representatives (CURE), combined with Hadoop Map (M)/Reduce (R) in Cloudera, a parallel network system, was applied to a Vietnamese testing data set of 20,000 documents: 10,000 positive and 10,000 negative. On this data set, the model achieved a sentiment classification accuracy of 62.92%. Although our data set is small, the proposed model is able to process millions of Vietnamese documents, as well as data in other languages, and to shorten execution time in the distributed environment.
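The abstract names the two ingredients, CURE and Hadoop Map/Reduce, without showing how they fit together. The following is a hypothetical, single-machine sketch under stated assumptions, not the authors' implementation. It assumes documents have already been turned into numeric vectors. The mapper simply emits (label, vector) pairs. The reducer summarizes each sentiment class by up to `c` well-scattered points shrunk toward the class centroid by a factor `alpha`, which is the core CURE idea. A new document then takes the label of its nearest representative. The names `map_phase`, `reduce_phase`, `c`, `alpha` and the toy vectors are illustrative only.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Vector = Tuple[float, ...]
Labeled = Tuple[str, Vector]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def cure_representatives(points: Sequence[Vector], c: int = 4, alpha: float = 0.5) -> List[Vector]:
    """Choose up to c well-scattered points, then shrink each toward the centroid by alpha."""
    mean = centroid(points)
    # Start with the point farthest from the centroid.
    scattered = [max(points, key=lambda p: math.dist(p, mean))]
    while len(scattered) < min(c, len(points)):
        # Add the point whose nearest already-chosen representative is farthest away.
        scattered.append(max(points, key=lambda p: min(math.dist(p, q) for q in scattered)))
    # Shrinking toward the centroid damps the influence of outliers.
    return [tuple(q_i + alpha * (m_i - q_i) for q_i, m_i in zip(q, mean)) for q in scattered]


def map_phase(partition: Iterable[Labeled]) -> Iterator[Labeled]:
    """Hypothetical mapper: emit (sentiment label, document vector) pairs."""
    for label, vector in partition:
        yield label, vector


def reduce_phase(pairs: Iterable[Labeled], c: int = 4, alpha: float = 0.5) -> Dict[str, List[Vector]]:
    """Hypothetical reducer: group vectors by label, summarize each group by CURE representatives."""
    groups: Dict[str, List[Vector]] = defaultdict(list)
    for label, vector in pairs:
        groups[label].append(vector)
    return {label: cure_representatives(vs, c, alpha) for label, vs in groups.items()}


def classify(doc: Vector, reps: Dict[str, List[Vector]]) -> str:
    """Assign the label whose nearest representative point is closest to doc."""
    return min(reps, key=lambda label: min(math.dist(doc, r) for r in reps[label]))


def accuracy(test: Sequence[Labeled], reps: Dict[str, List[Vector]]) -> float:
    """Fraction of labeled test vectors that classify() labels correctly."""
    correct = sum(classify(vec, reps) == label for label, vec in test)
    return correct / len(test)


if __name__ == "__main__":
    # Toy 2-D vectors (positive-term count, negative-term count), invented for illustration.
    partitions = [
        [("pos", (3.0, 0.0)), ("neg", (0.0, 3.0)), ("pos", (4.0, 1.0))],
        [("neg", (1.0, 4.0)), ("pos", (5.0, 0.0)), ("neg", (0.0, 5.0))],
    ]
    pairs = [pair for part in partitions for pair in map_phase(part)]
    reps = reduce_phase(pairs, c=2, alpha=0.5)
    test = [("pos", (4.0, 0.5)), ("neg", (0.5, 4.0))]
    print(accuracy(test, reps))
```

In a real Hadoop job the map and reduce steps would run on separate nodes over partitions of the corpus; here they are plain Python functions so the data flow can be followed on one machine.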
DOI: https://doi.org/10.3844/jcssp.2019.1355.1377
Copyright: © 2019 Vo Ngoc Phu, Vo Thi Ngoc Tran and Jack Max. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Sentiment Classification
- Vietnamese Sentiment Classification
- Vietnamese Sentence Sentiment Classification
- Opinion Mining
- Vietnamese Opinion Mining
- Vietnamese Document Opinion Mining
- Clustering Using Representatives
- CURE
- Cloudera
- Parallel Environment
- Parallel Network
- Parallel Network Environment