Missing Values Treatment and Feature Reduction Analysis to Enhance Classification
- SASTRA Deemed University, India
Abstract
Datasets may contain a large number of features, which makes classification difficult and time-consuming. They may also contain irrelevant and noisy features as well as missing values. Missing values should be treated properly so that classifier accuracy can be improved. There is also a need to reduce the feature set and keep only the features the classifier needs. Principal Component Analysis (PCA) is commonly used to reduce the number of features in a dataset, and the resulting components can be supplied as input to the classifiers. In this study, standard datasets are checked for missing values and classified using Support Vector Machines (SVM) and Naive Bayes, both with and without feature reduction by PCA. The proposed algorithm for missing value imputation is then applied to the datasets and the same analysis is repeated. Accuracy is evaluated using the confusion matrix. The results are discussed with an analysis of the nature of the features and missing values, and of how different datasets behave when used with machine learning algorithms.
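The workflow described in the abstract (impute, optionally reduce with PCA, classify with SVM or Naive Bayes, score with a confusion matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not describe the proposed imputation algorithm, so scikit-learn's mean imputation stands in for it. The Iris dataset, the 10% missing rate, the RBF kernel, the two retained components and the helper name `evaluate` are likewise illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate(X, y, classifier, n_components=None, seed=0):
    """Hypothetical helper: impute, optionally apply PCA, classify, and
    return (confusion matrix, accuracy) on a held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    # Assumption: mean imputation replaces the paper's proposed algorithm,
    # which the abstract does not specify.
    steps = [SimpleImputer(strategy="mean"), StandardScaler()]
    if n_components is not None:
        steps.append(PCA(n_components=n_components))
    steps.append(classifier)
    # Fitting inside a pipeline keeps imputation, scaling and PCA
    # statistics learned from the training split only.
    model = make_pipeline(*steps)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return confusion_matrix(y_test, y_pred), accuracy_score(y_test, y_pred)


# Illustrative data: Iris with roughly 10% of entries blanked at random.
X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

results = {}
classifiers = [("SVM", lambda: SVC(kernel="rbf")), ("Naive Bayes", GaussianNB)]
for name, make_clf in classifiers:
    for k in (None, 2):  # None = all features, 2 = two principal components
        cm, acc = evaluate(X_missing, y, make_clf(), n_components=k)
        results[(name, k)] = (cm, acc)
        label = "all features" if k is None else f"PCA({k})"
        print(f"{name:12s} {label:13s} accuracy={acc:.3f}")
```

Accuracy here is the trace of the confusion matrix divided by its total, so the matrix also shows which classes are confused with each other. Swapping a different imputer into the first pipeline step is how an alternative missing value treatment could be compared on the same splits.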
DOI: https://doi.org/10.3844/jcssp.2020.211.216
Copyright: © 2020 D. Muralidharan, K. Renuka, Mulagala Jaswant, J. Karthikeyan and G.R. Brindha. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- PCA
- SVM
- Naive Bayes
- Missing Value Treatment