Research Article | Open Access

Topic-Transformer for Document-Level Language Understanding

Oumaima Hourrane¹ and El Habib Benlahmar¹
  • ¹ Hassan II University, Morocco

Abstract

Most natural language processing applications are framed as prediction problems over limited context, typically a single sentence or paragraph, which does not reflect how humans perceive natural language. When reading a text, humans are sensitive to much more context, such as the rest of the document or other relevant documents. This study focuses on simultaneously capturing syntax and global semantics from a text, thereby acquiring document-level understanding. Accordingly, we introduce a Topic-Transformer that combines the benefits of a neural topic model, which captures global semantic information, and a transformer-based language model, which captures the local structure of texts both semantically and syntactically. Experiments on various datasets confirm that our model achieves lower perplexity than a standard transformer architecture and recent topic-guided language models, and generates topics that are noticeably more coherent than those of the regular Latent Dirichlet Allocation (LDA) topic model.
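The abstract does not specify the architecture in detail, so the following is only a minimal PyTorch sketch of the general idea it describes: a VAE-style neural topic model infers a document-topic vector from a bag-of-words representation, and that vector adds a document-level bias to a causal transformer's next-token logits. The class names, layer sizes, and the logit-bias fusion scheme are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class NeuralTopicModel(nn.Module):
    """VAE-style neural topic model: encodes a document's bag-of-words
    vector into a document-topic proportion vector theta (assumed design)."""

    def __init__(self, vocab_size, num_topics, hidden_size=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden_size), nn.ReLU())
        self.mu = nn.Linear(hidden_size, num_topics)
        self.logvar = nn.Linear(hidden_size, num_topics)

    def forward(self, bow):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick; softmax maps the latent code to a
        # distribution over topics. mu/logvar feed a KL term at train time.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return torch.softmax(z, dim=-1), mu, logvar


class TopicTransformer(nn.Module):
    """Causal transformer LM whose next-token logits receive a global,
    document-level bias derived from the inferred topic vector."""

    def __init__(self, vocab_size, num_topics, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.topic_model = NeuralTopicModel(vocab_size, num_topics)
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)        # local syntax/semantics
        self.topic_head = nn.Linear(num_topics, vocab_size)  # global semantics

    def forward(self, tokens, bow):
        theta, mu, logvar = self.topic_model(bow)
        seq_len = tokens.size(1)
        # Causal mask so each position attends only to earlier tokens.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.transformer(self.embed(tokens), mask=mask)
        # Local logits plus a per-document topic bias, broadcast over positions.
        logits = self.lm_head(h) + self.topic_head(theta).unsqueeze(1)
        return logits, mu, logvar


# Toy usage: batch of 2 documents, 10-token windows, 2000-word vocabulary.
model = TopicTransformer(vocab_size=2000, num_topics=50)
tokens = torch.randint(0, 2000, (2, 10))
bow = torch.rand(2, 2000)
logits, mu, logvar = model(tokens, bow)  # logits: (2, 10, 2000)
```

In a full implementation, the language-model cross-entropy would typically be trained jointly with the topic model's KL regularizer, though the paper's actual objective and fusion mechanism may differ from this sketch.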

Journal of Computer Science
Volume 18 No. 1, 2022, 18-25

DOI: https://doi.org/10.3844/jcssp.2022.18.25

Submitted On: 21 September 2021 | Published On: 22 January 2022

How to Cite: Hourrane, O. & Benlahmar, E. H. (2022). Topic-Transformer for Document-Level Language Understanding. Journal of Computer Science, 18(1), 18-25. https://doi.org/10.3844/jcssp.2022.18.25


Keywords

  • Neural Topic Model
  • Neural Language Model
  • Topic-Guided Language Model
  • Document-Level Understanding
  • Long-Range Semantic Dependencies