Research Article Open Access

Improving Accuracy and Coverage of Data Mining Systems that are Built from Noisy Datasets: A New Model

Luai A. Al Shalabi

Abstract

Problem statement: Noise in datasets has to be dealt with under most circumstances. This noise includes misclassified data as well as missing data. Simple human error is treated as misclassification. These errors decrease the accuracy of a data mining system, making the system less likely to be used. The objective was to propose an effective algorithm for dealing with noise in the form of missing data. Approach: A model for improving the accuracy and coverage of data mining systems was proposed, and an algorithm implementing this model was constructed. The algorithm deals with missing values in datasets. It splits the original dataset into two new datasets: one contains the tuples that have no missing values and the other contains the tuples that have missing values. The algorithm is then applied to each of the two new datasets: it finds the reduct of each and merges the resulting reducts into a single new dataset that is ready for training. Results: The results were encouraging, showing increased accuracy and coverage on the tested dataset compared with the traditional models. Conclusion: The proposed algorithm performs effectively and produces better results than previous approaches.
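The split / reduce / merge pipeline described in the Approach can be sketched in Python. This is a minimal, hypothetical sketch, not the author's implementation: the abstract does not say how the reduct is computed or how the two reducts are merged, so the sketch assumes the rough-set sense of "reduct" and uses the standard greedy QuickReduct heuristic (which preserves the dependency degree but may not return a minimal reduct), treats a missing value (`None`) as an ordinary value when comparing tuples, and merges by projecting both parts onto the union of their reduct attributes and dropping duplicate rows. All function names and the toy data are illustrative assumptions.

```python
"""Hypothetical sketch of the split / reduce / merge pipeline (not the author's code)."""
from collections import defaultdict

MISSING = None  # assumption: a missing value is stored as None


def split_by_missing(rows, attrs):
    """Split rows into (complete, incomplete) by whether any condition attribute is missing."""
    complete = [r for r in rows if all(r[a] is not MISSING for a in attrs)]
    incomplete = [r for r in rows if any(r[a] is MISSING for a in attrs)]
    return complete, incomplete


def dependency(rows, attrs, decision):
    """Rough-set dependency degree: share of rows whose equivalence class
    (rows identical on attrs) has a single decision value."""
    if not rows:
        return 1.0
    classes = defaultdict(list)
    for r in rows:
        # assumption: MISSING is compared like any other value
        classes[tuple(r[a] for a in attrs)].append(r[decision])
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)


def quick_reduct(rows, attrs, decision):
    """Greedy QuickReduct: add the attribute that raises the dependency degree
    most until it equals that of the full attribute set. The result preserves
    the dependency degree but is not guaranteed to be minimal."""
    target = dependency(rows, attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


def merge_reducts(parts, decision):
    """Hypothetical merge: project every part onto the union of all reduct
    attributes, concatenate the parts and drop duplicate rows."""
    keep = []
    for _, reduct in parts:
        keep += [a for a in reduct if a not in keep]
    seen, merged = set(), []
    for rows, _ in parts:
        for r in rows:
            values = tuple(r[a] for a in keep) + (r[decision],)
            if values not in seen:
                seen.add(values)
                merged.append(dict(zip(keep + [decision], values)))
    return keep, merged


def build_training_set(rows, attrs, decision):
    """Split by missingness, reduce each part separately, then merge."""
    complete, incomplete = split_by_missing(rows, attrs)
    parts = [(p, quick_reduct(p, attrs, decision)) for p in (complete, incomplete)]
    return merge_reducts(parts, decision)


# Toy decision table (made up for illustration): a, b, c are condition attributes, d is the decision.
DATA = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": None, "b": 0, "c": 0, "d": "yes"},
    {"a": None, "b": 1, "c": 1, "d": "no"},
]

if __name__ == "__main__":
    keep, table = build_training_set(DATA, ["a", "b", "c"], "d")
    print(keep)  # ['a', 'b']
    for row in table:
        print(row)
```

On the toy table, the complete part reduces to `["a"]` and the incomplete part (where `a` is missing) reduces to `["b"]`, so the merged training set keeps both attributes. Treating `None` as an ordinary value is the simplest choice; other treatments, such as letting a missing value match any value, would change the equivalence classes and therefore the reducts.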

Journal of Computer Science
Volume 5 No. 2, 2009, 131-135

DOI: https://doi.org/10.3844/jcssp.2009.131.135

Submitted On: 13 December 2008 Published On: 28 February 2009

How to Cite: Al Shalabi, L. A. (2009). Improving Accuracy and Coverage of Data Mining Systems that are Built from Noisy Datasets: A New Model. Journal of Computer Science, 5(2), 131-135. https://doi.org/10.3844/jcssp.2009.131.135



Keywords

  • Data mining
  • noise
  • missing values
  • rule generation
  • knowledge discovery