Review Article Open Access

A Systematic Literature Review on English and Bangla Topic Modeling

Md. Basim Uddin Ahmed, Ananta Akash Podder, Mahruba Sharmin Chowdhury and Mohammad Abdullah Al Mumin
  • Shahjalal University of Science and Technology, Bangladesh

Abstract

Owing to the enormous growth of information and technology, digitized text and data are being generated in vast volumes. Identifying the main topics in a large collection of documents by hand is therefore nearly impossible. Topic modeling is a statistical framework that infers the latent topics underlying text documents, corpora, or electronic archives through a probabilistic approach. It is a promising field in Natural Language Processing (NLP). Although many researchers have worked in this field, only a few significant studies have addressed Bangla. In this literature review, we follow a systematic approach to review topic modeling studies published from 2003 to 2020. We analyze topic modeling methods from different aspects and identify the research gap between topic modeling in English and in Bangla. From these papers, we identify several types of topic modeling techniques, such as Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Support Vector Machine (SVM), and Bi-term Topic Modeling (BTM). This review also highlights real-world applications of topic modeling and discusses the evaluation methods used to assess these models' performance. We conclude by outlining the substantial scope for future research on topic modeling in Bangla.

Journal of Computer Science
Volume 17 No. 1, 2021, 1-18

DOI: https://doi.org/10.3844/jcssp.2021.1.18

Submitted On: 21 September 2020 Published On: 8 January 2021

How to Cite: Ahmed, M. B. U., Podder, A. A., Chowdhury, M. S. & Al Mumin, M. A. (2021). A Systematic Literature Review on English and Bangla Topic Modeling. Journal of Computer Science, 17(1), 1-18. https://doi.org/10.3844/jcssp.2021.1.18

Keywords

  • English Bangla Comparison
  • Latent Dirichlet Allocation (LDA)
  • Systematic Literature Review (SLR)
  • Topic Modeling Bangla
  • Topic Modeling Methods
  • Topic Extraction