Highly Efficient Architecture for Scalable Focused Crawling Using Incremental Parallel Web Crawler

P. Jaganathan; T. Karthikeyan

doi:10.3844/jcssp.2015.120.126

Research Article Open Access


P. Jaganathan¹ and T. Karthikeyan²

¹ , India
² Bharathiar University, India

Abstract

Given its growing industrial impact in recent years, data mining has established itself as one of the most important disciplines in computer science. On the fast-growing Web, locating precise and relevant resources within an acceptable amount of time is a major challenge for general-purpose, single-process crawlers, which creates demand for improved and more effective algorithms. Large-scale search engines must update their indexes frequently, yet they are increasingly unable to present such information in a timely manner. In this study, a scalable focused crawling approach is proposed using an incremental parallel Web crawler that can concurrently crawl Web pages relevant to multiple predefined topics. Furthermore, to address the URL distribution problem, a compound decision model based on a multi-objective decision-making method is introduced; it jointly considers multiple factors such as load balance and relevance. The update frequency problem is addressed by a local repository decision. The results show that the proposed system efficiently achieves high quality, relevance and freshness with significantly low memory requirements.
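The abstract names a compound decision model that weighs load balance and relevance when distributing URLs among parallel crawler processes, but it does not spell out the model. The following is a minimal Python sketch under stated assumptions, not the authors' method: each hypothetical `CrawlerNode` owns one predefined topic, relevance is approximated by Jaccard similarity between terms associated with a URL and the node's topic terms, load balance is the node's free queue capacity, and the two objectives are combined by a weighted sum (a common multi-objective scalarisation technique) with illustrative weights `w_relevance=0.7` and `w_load=0.3`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CrawlerNode:
    """Hypothetical crawler process responsible for one predefined topic."""
    name: str
    topic_terms: set[str]
    capacity: int
    queue: list[str] = field(default_factory=list)

    def load_ratio(self) -> float:
        # Fraction of this node's URL queue capacity already in use.
        return len(self.queue) / self.capacity


def relevance(url_terms: set[str], topic_terms: set[str]) -> float:
    # Assumption: Jaccard similarity as a stand-in relevance measure.
    if not url_terms or not topic_terms:
        return 0.0
    return len(url_terms & topic_terms) / len(url_terms | topic_terms)


def compound_score(node: CrawlerNode, url_terms: set[str],
                   w_relevance: float = 0.7, w_load: float = 0.3) -> float:
    # Weighted-sum scalarisation of two objectives: maximise topical
    # relevance and maximise remaining queue capacity (load balance).
    balance = 1.0 - node.load_ratio()
    return w_relevance * relevance(url_terms, node.topic_terms) + w_load * balance


def assign_url(url: str, url_terms: set[str], nodes: list[CrawlerNode]) -> str | None:
    # Only nodes with free queue capacity are eligible to receive the URL.
    eligible = [n for n in nodes if len(n.queue) < n.capacity]
    if not eligible:
        return None  # every crawler is saturated; caller may defer the URL
    best = max(eligible, key=lambda n: compound_score(n, url_terms))
    best.queue.append(url)
    return best.name


if __name__ == "__main__":
    demo_nodes = [
        CrawlerNode("sports", {"football", "match", "league"}, capacity=2),
        CrawlerNode("finance", {"stock", "market", "bank"}, capacity=2),
    ]
    first = assign_url("http://example.com/a", {"football", "league"}, demo_nodes)
    second = assign_url("http://example.com/b", {"football", "match"}, demo_nodes)
    third = assign_url("http://example.com/c", {"football"}, demo_nodes)
    print(first, second, third)  # prints: sports sports finance
```

In this toy run the third sports-related URL goes to the finance node only because the sports node's queue is full, which shows the load constraint overriding relevance. A weighted sum keeps the rule simple and tunable: raising `w_load` spreads URLs more evenly at the cost of topical precision, while raising `w_relevance` does the opposite.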

Journal of Computer Science

Volume 11 No. 1, 2015, 120-126


Submitted On: 9 February 2014; Published On: 13 September 2014

How to Cite: Jaganathan, P. & Karthikeyan, T. (2015). Highly Efficient Architecture for Scalable Focused Crawling Using Incremental Parallel Web Crawler. Journal of Computer Science, 11(1), 120-126. https://doi.org/10.3844/jcssp.2015.120.126

Copyright: © 2015 P. Jaganathan and T. Karthikeyan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Keywords

Focused Crawler
Incremental Web Crawler
URL Distribution Issue
Load Balance
Relevance