An Automatic Collocation Extraction from Arabic Corpus
Abstract
Problem statement: Identifying collocations is an important part of natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. Because of the complexities of Arabic, collocations undergo morphological, graphical and syntactic variation, which makes them difficult to identify. Approach: We used a hybrid method, based on linguistic information and association measures, to extract collocations from an Arabic corpus. Results: The method extracted bi-gram collocation candidates from the corpus, and the association measures were evaluated with the n-best evaluation method. We reported the precision of each association measure for each n-best list. Conclusion: The experimental results showed that the log-likelihood ratio was the best association measure, achieving the highest precision.
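To make the two quantitative ideas in the abstract concrete, the following is a minimal Python sketch of scoring bi-gram candidates with a log-likelihood ratio and computing precision over an n-best list. It is an illustration under stated assumptions, not the authors' implementation: the ratio follows the common 2x2 contingency-table formulation (Dunning's G²), the linguistic filtering step of the hybrid method is omitted, and the function names `log_likelihood_ratio`, `rank_bigrams` and `precision_at_n` as well as the placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def _term(observed, expected):
    # Contribution of one contingency cell; empty cells contribute nothing.
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def log_likelihood_ratio(c12, c1, c2, n):
    """G^2 score for a bi-gram (w1, w2), assuming a 2x2 contingency table.

    c12: count of the bi-gram, c1: count of w1 in first position,
    c2: count of w2 in second position, n: total number of bi-grams.
    """
    observed = ((c12, c1 - c12), (c2 - c12, n - c1 - c2 + c12))
    rows = (c1, n - c1)
    cols = (c2, n - c2)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            g2 += _term(observed[i][j], expected)
    return 2.0 * g2


def rank_bigrams(tokens):
    """Return (bigram, score) pairs sorted from highest to lowest score."""
    bigrams = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(bigrams)
    first_counts = Counter(w1 for w1, _ in bigrams)
    second_counts = Counter(w2 for _, w2 in bigrams)
    n = len(bigrams)
    scored = [
        (pair, log_likelihood_ratio(c, first_counts[pair[0]], second_counts[pair[1]], n))
        for pair, c in pair_counts.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def precision_at_n(ranked, gold, n):
    """Fraction of the top-n ranked bi-grams that appear in the gold set."""
    top = [pair for pair, _ in ranked[:n]]
    return sum(pair in gold for pair in top) / len(top)


# Placeholder tokens stand in for a tokenized corpus (hypothetical data).
tokens = ["a", "b", "a", "b", "c", "d", "a", "b"]
ranked = rank_bigrams(tokens)
gold = {("a", "b")}  # hypothetical manually validated collocations
print(ranked[0][0], precision_at_n(ranked, gold, 1))
```

In an n-best evaluation, `precision_at_n` would be computed for several list sizes and for each association measure, so that the measures can be compared at the same cut-off.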
DOI: https://doi.org/10.3844/jcssp.2011.6.11
Copyright: © 2011 Abdulgabbar Mohammad Saif and Mohd J.A. Aziz. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- collocation extraction
- hybrid methods
- collocation variations
- association measures
- morphosyntactic
- graphical variants
- n-best evaluation