Research Article Open Access

A COMPARATIVE STUDY OF COMBINED FEATURE SELECTION METHODS FOR ARABIC TEXT CLASSIFICATION

Aisha Adel¹, Nazlia Omar¹ and Adel Al-Shabi¹
  • ¹ University Kebangsaan Malaysia, Malaysia

Abstract

Text classification is an important task given the huge volume of electronic documents. One of its main problems is the high dimensionality of the feature space. Researchers have proposed many algorithms to select relevant features from text. These algorithms have been studied extensively for English text, while studies for Arabic remain limited. This study investigates the performance of five widely used feature selection methods, namely Chi-square, Correlation, GSS Coefficient, Information Gain and Relief F. It also introduces an approach that combines feature selection methods based on the average weight of the features. The experiments use Naïve Bayes and Support Vector Machine classifiers to classify a published Arabic corpus. The results show that, among the individual methods, Information Gain gives the best results. They also show that combining multiple feature selection methods outperforms the best results obtained by the individual methods.
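The idea of combining feature selection methods by averaging feature weights can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only two of the five methods (Chi-square and Information Gain), treats each document as a set of tokens, and assumes min-max normalization before averaging, a detail the abstract does not specify. All function names and the toy English data are hypothetical.

```python
import math

def contingency(docs, labels, term, cls):
    """Count documents: (term & cls, term & other, no term & cls, no term & other)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_cls = term in tokens, label == cls
        if has_term and in_cls:
            a += 1
        elif has_term:
            b += 1
        elif in_cls:
            c += 1
        else:
            d += 1
    return a, b, c, d

def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 term/class contingency table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def entropy(counts):
    """Shannon entropy (bits) of a list of counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k)

def information_gain(a, b, c, d):
    """Class entropy minus class entropy conditioned on term presence."""
    n = a + b + c + d
    h_class = entropy([a + c, b + d])
    h_cond = ((a + b) / n) * entropy([a, b]) + ((c + d) / n) * entropy([c, d])
    return h_class - h_cond

def min_max(scores):
    """Rescale scores to [0, 1] so different methods are comparable (assumed step)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}

def combined_ranking(docs, labels, cls):
    """Rank terms by the average of their normalized Chi-square and IG weights."""
    vocab = sorted({t for doc in docs for t in doc})
    tables = {t: contingency(docs, labels, t, cls) for t in vocab}
    chi = min_max({t: chi_square(*tables[t]) for t in vocab})
    ig = min_max({t: information_gain(*tables[t]) for t in vocab})
    avg = {t: (chi[t] + ig[t]) / 2 for t in vocab}
    # Highest average weight first; ties broken alphabetically.
    return sorted(vocab, key=lambda t: (-avg[t], t)), avg

# Toy corpus (hypothetical): each document is a set of tokens.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]
ranking, weights = combined_ranking(docs, labels, "sport")
print(ranking)
```

In this toy corpus, "goal" and "vote" separate the two classes perfectly and rank first, while "team" appears equally in both classes and ranks last. Keeping the top-ranked terms would then reduce the feature space before training a classifier.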

Journal of Computer Science
Volume 10 No. 11, 2014, 2232-2239

DOI: https://doi.org/10.3844/jcssp.2014.2232.2239

Submitted On: 9 April 2014 Published On: 20 December 2014

How to Cite: Adel, A., Omar, N. & Al-Shabi, A. (2014). A COMPARATIVE STUDY OF COMBINED FEATURE SELECTION METHODS FOR ARABIC TEXT CLASSIFICATION. Journal of Computer Science, 10(11), 2232-2239. https://doi.org/10.3844/jcssp.2014.2232.2239

Keywords

  • Feature Selection
  • Combination Method
  • Arabic Text Classification