Predicting Missing Attribute Values Using k-Means Clustering

Nambiraj Suguna; Keppana Gowder Thanushkodi

doi:10.3844/jcssp.2011.216.224

Research Article Open Access

Abstract

Problem statement: Predicting values for missing attributes is an important preprocessing problem in data mining and knowledge discovery. Several methods have been proposed for treating missing data; the most frequently used is to delete every instance that contains at least one missing feature value. When a dataset has only a few missing attribute values, those instances can be discarded. When it has many, deleting them may throw away essential information. Other methods, such as assigning the attribute's average value or its most common value, make use of all the available data. However, the assigned value may not reflect the source from which the data were originally derived, so it introduces noise.

Approach: In this study, k-means clustering is proposed for predicting missing attribute values. The proposed approach is compared against nine different methods, and the overall analysis shows that k-means clustering predicts missing attribute values better than the other methods. After the missing attributes are assigned, feature selection is performed with Bees Colony Optimization (BCO), and the improved Genetic KNN is applied to measure classification performance, as discussed in our previous study.

Results: Performance is analyzed on four medical datasets: Dermatology, Cleveland Heart, Lung Cancer and Wisconsin. On these datasets, the proposed k-means based missing attribute prediction achieves higher accuracy: 94.60%, 90.45%, 87.51% and 95.70% respectively.

Conclusion: The higher classification accuracy indicates the superior performance of k-means based missing attribute value prediction.
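The abstract does not spell out the imputation procedure, so the following is only a minimal sketch of one common way to fill missing values with k-means, not the authors' exact method. It assumes numeric attributes with missing entries encoded as NaN; the function name `kmeans_impute` and its parameters `k`, `n_iter` and `seed` are hypothetical. Missing entries start at the column mean and are then repeatedly replaced by the corresponding coordinate of the nearest cluster centroid. The BCO feature selection and Genetic KNN stages are not shown.

```python
import numpy as np

def kmeans_impute(X, k=2, n_iter=20, seed=0):
    """Fill NaN entries of X with the matching coordinate of the
    centroid of the cluster each row is assigned to (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: replace each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    # Pick k distinct rows as starting centroids (assumption: random init).
    rng = np.random.default_rng(seed)
    centroids = filled[rng.choice(len(filled), size=k, replace=False)]

    for _ in range(n_iter):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        dists = ((filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Recompute each centroid; an empty cluster keeps its old centroid.
        for j in range(k):
            members = filled[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)

        # Overwrite only the originally missing entries with centroid values.
        filled = np.where(missing, centroids[labels], X)

    return filled, labels
```

Observed values are never modified; only the entries that were missing in the input change between iterations. Categorical attributes, which medical datasets often contain, would need a different distance and a mode-based fill, which this sketch does not attempt.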

Journal of Computer Science

Volume 7 No. 2, 2011, 216-224

DOI: https://doi.org/10.3844/jcssp.2011.216.224

Submitted On: 16 December 2010 Published On: 25 February 2011

How to Cite: Suguna, N. & Thanushkodi, K. G. (2011). Predicting Missing Attribute Values Using k-Means Clustering. Journal of Computer Science, 7(2), 216-224. https://doi.org/10.3844/jcssp.2011.216.224

Copyright: © 2011 Nambiraj Suguna and Keppana Gowder Thanushkodi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Bees Colony Optimization (BCO)
K-Nearest Neighbor (KNN)
missing attributes
Most Common Attribute Value (MCAV)
Event-Covering Method (EC)
genetic algorithm
k-means clustering
clustering algorithm
onlooker bee
Artificial Bee Colony (ABC)