An Efficient Implementation of Re-Sampling Technique for High Performance Multiple Classifier Systems
Abstract
When a database is very large, the entire training data set cannot be used to construct the classifiers. One popular solution is to separate the stream data into chunks, learn a base classifier from each chunk, and then integrate all base classifiers to form a multiple classifier system (MCS). Sometimes these chunks do not contain all the classes in the same proportions as the entire training data set. We therefore introduce a re-sampling method based on the statistical value of the class attribute. In the proposed method, the probability of occurrence of every class is estimated over the entire training data set. Based on these probabilities, a threshold is fixed for each class. When a sample is drawn at random from the data set, the class probabilities in the sample are checked against the thresholds. A sample that satisfies all the thresholds is used to construct the model. Otherwise, re-sampling is performed, and the process is repeated until the sample satisfies the thresholds for all classes. The proposed method yields higher accuracy than random sampling without class thresholds. We have also compared the accuracy of different classifiers. Experimental results and comparative studies demonstrate the efficiency and efficacy of our method.
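The re-sampling loop described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the thresholds are derived from the class probabilities, so this sketch assumes each threshold is a fixed fraction (the hypothetical parameter `ratio`) of the class's probability in the full training set. The function names, the `max_tries` limit, and the `seed` parameter are likewise illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter


def class_probabilities(labels):
    """Estimate the probability of occurrence of each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def satisfies_thresholds(sample_labels, thresholds):
    """Return True when every class meets its threshold in the sample."""
    probs = class_probabilities(sample_labels)
    # A class missing from the sample has probability 0.0.
    return all(probs.get(cls, 0.0) >= t for cls, t in thresholds.items())


def resample_until_valid(labels, sample_size, ratio=0.8, max_tries=1000, seed=0):
    """Draw random samples until one satisfies all class thresholds.

    Assumption: threshold for a class = ratio * its probability in the
    full training set. The source does not specify this rule.
    """
    rng = random.Random(seed)
    full_probs = class_probabilities(labels)
    thresholds = {cls: ratio * p for cls, p in full_probs.items()}
    indices = range(len(labels))
    for _ in range(max_tries):
        chosen = rng.sample(indices, sample_size)  # sampling without replacement
        if satisfies_thresholds([labels[i] for i in chosen], thresholds):
            return chosen, thresholds
    raise RuntimeError("no sample satisfied the class thresholds")
```

The returned indices identify the rows that would be used to train one base classifier. The `max_tries` guard is an addition for safety: with strict thresholds and small samples, the loop described in the abstract might otherwise run for a long time.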
DOI: https://doi.org/10.3844/jcssp.2007.195.198
Copyright: © 2007 S. Sathiyabama, K. Thyagarajah and D. Ayyamuthukumar. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Accuracy
- classifier
- Euclidean distance
- sampling
- threshold
- normalization