A Multi Layer Perceptron Along with Memory Efficient Feature Extraction Approach for Bengali Document Categorization
- Shahjalal University of Science and Technology, Bangladesh
Abstract
Bengali is the seventh most spoken language in the world, with approximately 265 million speakers. A growing number of people express their views and opinions in Bengali on digital platforms such as blogs and social media, covering a wide range of topics. Despite this, very little work has been done to organize these electronic documents by category. In this paper, we develop a methodology for automatically classifying Bengali news articles into twelve predefined categories using a Multi Layer Perceptron (MLP) model. We also explore the optimization opportunities within the feature space and illustrate the difficulties that arise when neural networks handle large feature spaces. We show that optimizing the feature space improves accuracy: using our modified feature extraction technique, we reduced the feature space and achieved an accuracy of 93.3%.
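The abstract does not spell out the modified feature extraction technique. As a rough illustration of the general idea (TF-IDF features with a capped vocabulary, stored sparsely and fed into an MLP's first layer), a minimal Python sketch follows. The pruning rule (keep the `max_features` terms with the highest document frequency), the smoothed IDF formula, and the helper names `build_vocabulary`, `tfidf_sparse` and `first_layer_relu` are all assumptions chosen for illustration. They are not the authors' implementation.

```python
import math
from collections import Counter


def build_vocabulary(docs, max_features):
    """Map the max_features terms with the highest document frequency to column indices.

    This pruning rule is an illustrative assumption, not the paper's method.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    # Highest document frequency first; ties broken alphabetically for determinism.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = {term: idx for idx, (term, _) in enumerate(ranked[:max_features])}
    return vocab, df


def tfidf_sparse(docs, max_features):
    """Return (vocab, vectors) where each vector is a {column: weight} dict.

    Only nonzero weights are stored, so memory grows with the number of
    distinct in-vocabulary terms per document rather than with
    len(docs) * len(vocab) as a dense matrix would.
    """
    vocab, df = build_vocabulary(docs, max_features)
    n_docs = len(docs)
    vectors = []
    for tokens in docs:
        counts = Counter(t for t in tokens if t in vocab)
        vec = {}
        for term, count in counts.items():
            tf = count / len(tokens)  # term frequency within the document
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF (assumed variant)
            vec[vocab[term]] = tf * idf
        # L2-normalize so documents of different lengths are comparable.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {j: w / norm for j, w in vec.items()}
        vectors.append(vec)
    return vocab, vectors


def first_layer_relu(vec, weights, bias):
    """Hypothetical first MLP layer: ReLU(W x + b) for a sparse input x.

    weights[i][j] connects input column j to hidden unit i; only the
    columns present in vec are visited.
    """
    out = list(bias)
    for j, x in vec.items():
        for i in range(len(out)):
            out[i] += weights[i][j] * x
    return [max(0.0, v) for v in out]


# Toy, whitespace-tokenized Bengali "documents" (made up for illustration).
docs = [
    "বাংলাদেশ ক্রিকেট দল জয় পেয়েছে".split(),
    "সরকার নির্বাচন ঘোষণা করেছে".split(),
    "ক্রিকেট দল নির্বাচন করেছে".split(),
]
vocab, vectors = tfidf_sparse(docs, max_features=4)
print(len(vocab), [sorted(v) for v in vectors])
```

With `max_features=4`, only the four terms that appear in two documents survive, so each document becomes a dict of at most four entries instead of a row over the full nine-term vocabulary. A real pipeline would pass these vectors through further hidden layers to a twelve-way output layer; that part is omitted here.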
DOI: https://doi.org/10.3844/jcssp.2020.378.390
Copyright: © 2020 Quazi Ishtiaque Mahmud, Noymul Islam Chowdhury and Md Masum. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Document Categorization
- TF-IDF
- Multi Layer Perceptron
- Activation Functions