Research Article Open Access

Feature Selection in Data-Mining for Genetics Using Genetic Algorithm

V. N. Rajavarman and S. P. Rajagopalan

Abstract

We discovered genetic features and environmental factors involved in multifactorial diseases. Exploiting the large volume of data obtained from experiments conducted at the General Hospital, Chennai, required data mining tools, and we proposed a two-phase approach built on a specific genetic algorithm. We chose this heuristic approach because the number of features to consider was large (up to 3654 for the biological data in our study). For pairs of affected individuals from the same family, the collected data indicated their similarity at given points (loci) on their chromosomes. The data were represented as a matrix in which each column corresponded to a locus and each row to a pair of individuals. The objective was first to isolate the most relevant associations of features and then to classify the individuals who had the disease under consideration according to these associations. For the first phase, the feature selection problem, we used a genetic algorithm (GA). To handle this very specific problem, we introduced several advanced mechanisms into the genetic algorithm, such as sharing, random immigrants and dedicated genetic operators, and we defined a particular distance operator. The second phase, a clustering based on the features selected during the first phase, uses the k-means clustering algorithm.
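
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fitness function, the dedicated operators or the distance operator, so one-point crossover, bit-flip mutation and Hamming distance are used here as stand-ins, and every name and parameter (select_features, shared_fitness, sigma, immigrants, the toy data matrix and its fitness) is hypothetical.

```python
import random
import statistics

def hamming(a, b):
    """Stand-in distance: number of positions where two feature masks differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(pop, raw, sigma):
    """Fitness sharing: divide each raw fitness by its niche count."""
    shared = []
    for ind, f in zip(pop, raw):
        niche = sum(max(0.0, 1.0 - hamming(ind, other) / sigma) for other in pop)
        shared.append(f / niche)  # niche >= 1 because ind itself is in pop
    return shared

def select_features(n_features, fitness, pop_size=30, generations=40,
                    immigrants=3, sigma=3.0, p_mut=0.05, rng=None):
    """Phase 1 sketch: GA over boolean masks (needs n_features >= 2 and fitness > 0)."""
    rng = rng or random.Random(0)

    def rand_ind():
        return [rng.random() < 0.2 for _ in range(n_features)]

    pop = [rand_ind() for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        raw = [fitness(ind) for ind in pop]
        weights = shared_fitness(pop, raw, sigma)
        children = []
        while len(children) < pop_size - immigrants:
            p1, p2 = rng.choices(pop, weights=weights, k=2)  # proportional selection
            cut = rng.randrange(1, n_features)               # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        # Random immigrants: fresh random individuals replace part of the population.
        pop = children + [rand_ind() for _ in range(immigrants)]
        best = max(pop + [best], key=fitness)
    return best

def kmeans(points, k, iters=20, rng=None):
    """Phase 2 sketch: plain k-means on lists of floats; returns one label per point."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy matrix (invented): rows are pairs of individuals, columns are loci.
data = [[1.0, 2.0, 1.0, 1.0, 0.0, 1.0]] * 4 + [[1.0, 0.0, 1.0, 1.0, 2.0, 1.0]] * 4
col_var = [statistics.pvariance(col) for col in zip(*data)]

def toy_fitness(mask):
    # Hypothetical fitness: reward high-variance loci, penalise mask size, stay positive.
    score = sum(v - 0.3 for v, keep in zip(col_var, mask) if keep)
    return max(0.01, 1.0 + score)

mask = select_features(len(col_var), toy_fitness)
reduced = [[v for v, keep in zip(row, mask) if keep] for row in data]
print(mask, kmeans(reduced, k=2))
```

In this sketch the random immigrants and the sharing term both serve to keep the population diverse, which matters when the search space has thousands of candidate loci; a real study would replace toy_fitness with a measure of how strongly a set of loci is associated with the disease.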

Journal of Computer Science
Volume 3 No. 9, 2007, 723-725

DOI: https://doi.org/10.3844/jcssp.2007.723.725

Submitted On: 6 September 2007 Published On: 30 September 2007

How to Cite: Rajavarman, V. N. & Rajagopalan, S. P. (2007). Feature Selection in Data-Mining for Genetics Using Genetic Algorithm. Journal of Computer Science, 3(9), 723-725. https://doi.org/10.3844/jcssp.2007.723.725

Keywords

  • crossover
  • mutation
  • selection
  • fitness function
  • random immigrant
  • k-means algorithm