An Automatic Collocation Extraction from Arabic Corpus

Abdulgabbar Mohammad Saif; Mohd J.A. Aziz

doi:10.3844/jcssp.2011.6.11

Research Article Open Access

Abstract

Problem statement: The identification of collocations is an important part of natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. Because of the complexities of Arabic, collocations undergo morphological, graphical and syntactic variations, which make them difficult to identify. Approach: We used a hybrid method, based on linguistic information and association measures, to extract collocations from an Arabic corpus. Results: The method extracted bi-gram candidates of Arabic collocations from the corpus, and the association measures were evaluated using the n-best evaluation method. We reported the precision values for each association measure in each n-best list. Conclusion: The experimental results showed that the log-likelihood ratio was the best association measure, achieving the highest precision.
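As an illustration of the statistical side of such a pipeline, the following minimal sketch scores adjacent word pairs with the log-likelihood ratio (in its common 2x2 contingency-table form) and computes precision over an n-best list. It is not the authors' implementation: the function names, the toy token list and the gold set are assumptions made for illustration, and the linguistic filtering and variant handling described in the abstract are omitted.

```python
import math
from collections import Counter


def log_likelihood_ratio(o11, o12, o21, o22):
    """G^2 statistic for a 2x2 table of observed bigram counts.

    o11: count of (w1, w2); o12: w1 followed by another word;
    o21: another word followed by w2; o22: neither.
    """
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def score_bigrams(tokens):
    """Return (bigram, score) pairs sorted by descending score."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # counts of words in first position
    second = Counter(tokens[1:])   # counts of words in second position
    n = len(tokens) - 1            # total number of bigram positions
    scored = []
    for (w1, w2), c in bigrams.items():
        o12 = first[w1] - c
        o21 = second[w2] - c
        o22 = n - first[w1] - second[w2] + c
        scored.append(((w1, w2), log_likelihood_ratio(c, o12, o21, o22)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def n_best_precision(ranked_bigrams, gold, n):
    """Fraction of the top-n ranked candidates found in the gold set."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for bigram in ranked_bigrams[:n] if bigram in gold)
    return hits / n


# Toy usage with placeholder tokens (hypothetical data, not from the paper).
tokens = "a b a b c d".split()
ranked = score_bigrams(tokens)
gold = {("a", "b")}
precision_at_2 = n_best_precision([b for b, _ in ranked], gold, 2)
```

In an n-best evaluation, the same ranking would be cut at several values of n and the precision compared across association measures; the gold set here stands in for a manually validated list of true collocations.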

Journal of Computer Science

Volume 7 No. 1, 2011, 6-11

DOI: https://doi.org/10.3844/jcssp.2011.6.11

Submitted On: 14 October 2010 Published On: 16 December 2010

How to Cite: Saif, A. M. & Aziz, M. J. (2011). An Automatic Collocation Extraction from Arabic Corpus. Journal of Computer Science, 7(1), 6-11. https://doi.org/10.3844/jcssp.2011.6.11

Copyright: © 2011 Abdulgabbar Mohammad Saif and Mohd J.A. Aziz. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Collocation extraction
Hybrid methods
Collocation variations
Association measures
Morphosyntactic
Graphical variants
n-best evaluation