Improving Accuracy and Coverage of Data Mining Systems that are Built from Noisy Datasets: A New Model
Abstract
Problem statement: Noise within datasets has to be dealt with under most circumstances. This noise includes misclassified data or information as well as missing data or information. Simple human error is considered as misclassification. These errors will decrease the accuracy of the data mining system so it will not be likely to be used. The objective was to propose an effective algorithm to deal with noise which is represented by missing data in datasets. Approach: A model for improving the accuracy and coverage of data mining systems was proposed and the algorithm of this model was constructed. The algorithm was dealing with missing values in datasets. It splits the original dataset into two new datasets; one contains tuples that have no missing values and the other one contains tuples that have missing values. The proposed algorithm was applied to each of the two new datasets. It finds the reduct of each of them and then it merges the new reducts into one new dataset which will be ready for training. Results: The results showed interesting as it increases the accuracy and coverage of the tested dataset compared to the traditional models. Conclusion: The proposed algorithm performs effectively and generates better results than the previous ones.
DOI: https://doi.org/10.3844/jcssp.2009.131.135
Copyright: © 2009 Luai A. Al Shalabi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- 4,078 Views
- 3,089 Downloads
- 1 Citations
Download
Keywords
- Data mining
- noise
- missing values
- rule generation
- knowledge discovery