A Reformed K-Nearest Neighbors Algorithm for Big Data Sets
- 1 Nguyen Tat Thanh University, Vietnam
- 2 Vietnam National University, Vietnam
Abstract
Data mining offers many algorithms, among which the K-Nearest Neighbors algorithm (K-NN) is well known to researchers. K-NN is very effective on small data sets; however, it takes a long time to run on big data sets. Today, data sets often contain millions of records, which makes K-NN difficult to apply to big data. In this research, we propose an improvement to K-NN that processes big data sets in a shorter execution time. The Reformed K-Nearest Neighbors algorithm (R-K-NN) can be implemented on large data sets with millions or even billions of records. R-K-NN is tested on a data set with 500,000 records, and its execution time is much shorter than that of K-NN. In addition, R-K-NN is implemented in a parallel network system with Hadoop Map (M) and Hadoop Reduce (R).
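To illustrate why K-NN fits a map/reduce layout, the following is a minimal sketch of standard brute-force K-NN classification split into a map step and a reduce step. It is not the authors' R-K-NN, whose details the abstract does not give, and it runs in plain Python rather than on Hadoop. The function names (`map_partition`, `reduce_candidates`, `knn_classify`), the squared Euclidean distance, the majority vote, and the toy data are all assumptions made for illustration.

```python
import heapq
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partition(partition, query, k):
    """Map step (hypothetical): the k nearest (distance, label) pairs in one partition."""
    scored = ((squared_distance(point, query), label) for point, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_candidates(candidate_lists, k):
    """Reduce step (hypothetical): merge local candidates, then take a majority vote."""
    merged = heapq.nsmallest(k, (c for cands in candidate_lists for c in cands))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def knn_classify(partitions, query, k):
    """Classify `query` using records spread over several partitions."""
    return reduce_candidates((map_partition(p, query, k) for p in partitions), k)


# Toy data: two partitions of (point, label) records.
partitions = [
    [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b")],
    [((0.2, 0.1), "a"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b")],
]
print(knn_classify(partitions, (0.0, 0.1), k=3))
```

Each partition still scans all of its records for every query, which is the cost that makes plain K-NN slow on millions of records. The split works because the global k nearest neighbors are always contained in the union of each partition's local k nearest, so the reduce step only has to merge short candidate lists.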
DOI: https://doi.org/10.3844/jcssp.2018.1213.1225
Copyright: © 2018 Vo Ngoc Phu and Vo Thi Ngoc Tran. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- K-Nearest Neighbors Algorithm
- K-NN
- Parallel Network Environment
- Distributed System
- Data Mining
- Association Rules
- Cloudera
- Hadoop Map
- Hadoop Reduce