Research Article Open Access

Cross-Language Semantic Similarity of Arabic-English Short Phrases and Sentences

Salha Alzahrani¹
  • ¹ Taif University, Saudi Arabia

Abstract

Measuring cross-language semantic similarity between short texts is a challenging task in terms of human understanding. This paper addresses the problem through a study of Arabic–English semantic similarity in short phrases and sentences. A human-rated benchmark dataset was carefully constructed for this research. Dictionary-based and machine translation techniques were employed to determine the relatedness between the cross-lingual texts from a monolingual perspective. Three algorithms were developed to rate semantic similarity and were applied to the human-rated benchmark. An averaged maximum-translation similarity algorithm was proposed that uses the term sets produced by the dictionary-based technique. Noun-verb and term vectors obtained by the Machine Translation (MT) technique were also proposed for computing the semantic similarity. The results were compared with the human ratings in our benchmark using the Pearson correlation coefficient, and these were triangulated with the best, worst and mean for all human participants. The MT-based term vector semantic similarity algorithm obtained the highest correlation (r = 0.8657), followed by the averaged maximum-translation similarity algorithm (r = 0.7206). Further statistical analysis showed no significant difference between these two algorithms and human judgement.
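As an illustration of the evaluation described in the abstract, the following minimal Python sketch shows how algorithm scores might be compared with human ratings using the Pearson correlation coefficient, together with one possible reading of an "averaged maximum" score over dictionary translations. The abstract does not define the algorithm, so the function names, the `translations` dictionary, the `word_sim` callback and the aggregation rule are all assumptions made for illustration and should not be taken as the paper's actual method.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def averaged_max_translation_similarity(
    source_terms: List[str],
    target_terms: List[str],
    translations: Dict[str, List[str]],
    word_sim: Callable[[str, str], float],
) -> float:
    """ASSUMED aggregation rule (not taken from the paper): for each source
    term, take the best word-level similarity between any of its dictionary
    translations and any target term, then average over the source terms."""
    if not source_terms or not target_terms:
        return 0.0
    best_scores = []
    for term in source_terms:
        candidates = translations.get(term, [])
        # A term with no dictionary entry contributes a score of 0.0.
        best = max(
            (word_sim(c, t) for c in candidates for t in target_terms),
            default=0.0,
        )
        best_scores.append(best)
    return sum(best_scores) / len(best_scores)


def exact_match(a: str, b: str) -> float:
    """Toy word similarity (hypothetical stand-in for a lexical measure)."""
    return 1.0 if a == b else 0.0
```

In this sketch, `exact_match` is only a placeholder; in practice a graded monolingual word-similarity measure would be passed as `word_sim`, and `pearson_r` would be applied to the resulting sentence-pair scores and the corresponding human ratings.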

Journal of Computer Science
Volume 12 No. 1, 2016, 1-18

DOI: https://doi.org/10.3844/jcssp.2016.1.18

Submitted On: 18 December 2015 Published On: 12 March 2016

How to Cite: Alzahrani, S. (2016). Cross-Language Semantic Similarity of Arabic-English Short Phrases and Sentences. Journal of Computer Science, 12(1), 1-18. https://doi.org/10.3844/jcssp.2016.1.18


Keywords

  • Semantic Similarity
  • Cross-Language
  • Machine Translation
  • Arabic
  • English