Research Article Open Access

Predicting Missing Attribute Values Using k-Means Clustering

Nambiraj Suguna and Keppana Gowder Thanushkodi

Abstract

Problem statement: Predicting the value for missing attributes is an important data preprocessing problem in data mining and knowledge discovery tasks. Several methods have been proposed to treat missing data and the one used more frequently is deleting instances containing at least one missing value of a feature. When the dataset has minimum number of missing attribute values then we can neglect the instances. But if it is high, deleting those instances may neglect the essential information. Some methods, such as assigning an average value to the missing attribute, assigning the most common values make good use of all the available data. However the assigned value may not come from the information which the data originally derived from, thus noise is brought to the data. Approach: In this study, k-means clustering is proposed for predicting missing attribute values. The performance of the proposed approach is analyzed with nine different methods. The overall analysis shows that the k-means clustering can predict the missing attribute values better than other methods. After assigning the missing attributes, the feature selection is performed with Bees Colony Optimization (BCO) and the improved Genetic KNN is applied for finding the classification performance as discussed in our previous study. Results: The performance is analyzed with four different medical datasets; Dermatology, Cleveland Heart, Lung Cancer and Wisconsin. For all the datasets, the proposed k-means based missing attribute prediction achieves higher accuracy of 94.60 %, 90.45 %, 87.51 % and 95.70 % respectively. Conclusion: The greater classification accuracy shows the superior performance of the k-means based missing attribute value prediction.

Journal of Computer Science
Volume 7 No. 2, 2011, 216-224

DOI: https://doi.org/10.3844/jcssp.2011.216.224

Submitted On: 16 December 2010 Published On: 25 February 2011

How to Cite: Suguna, N. & Thanushkodi, K. G. (2011). Predicting Missing Attribute Values Using k-Means Clustering. Journal of Computer Science, 7(2), 216-224. https://doi.org/10.3844/jcssp.2011.216.224

  • 3,892 Views
  • 4,554 Downloads
  • 22 Citations

Download

Keywords

  • Bees Colony Optimization (BCO)
  • K-Nearest Neighbor (KNN)
  • missing attributes
  • Most Common Attribute Value (MCAV)
  • Event-Covering Method (EC)
  • genetic algorithm
  • k-means clustering
  • clustering algorithm
  • onlooker bee
  • Artificial Bee Colony (ABC)