Research Article Open Access

FROM DATA MINING AND KNOWLEDGE DISCOVERY TO BIG DATA ANALYTICS AND KNOWLEDGE EXTRACTION FOR APPLICATIONS IN SCIENCE

Subana Shanmuganathan¹
¹ Auckland University of Technology (AUT), New Zealand

Abstract

"Data mining" for "knowledge discovery in databases" and its associated computational operations were first introduced in the mid-1990s, and they can no longer cope with the analytical issues posed by so-called "big data". The recent buzzword "big data" refers to large volumes of diverse, dynamic, complex, longitudinal and/or distributed data, which may be noisy and structured or unstructured. These data come from instruments, sensors, Internet transactions, email, video, click streams and all other digital sources available today and in the future, and they arrive at speeds and on scales never before seen in human history. Big data is often described by three Vs: volume, variety and velocity. A fourth V for "veracity" has been added, and more recently a fifth V for "value". Uncovering hidden patterns, unknown correlations and other useful information from such massive volumes of complex raw data requires a set of new technologies. This information has lately been called "actionable knowledge" or "data products". The required technologies include high performance computing (i.e., exascale computing), distributed or grid architectures, algorithms (for example, for data clustering and for generating association rules), programming languages, and automated, scalable software tools. In view of these facts, the paper introduces the synergistic challenges of "data-intensive" science and "exascale" computing in resolving "big data analytics" and "data science" issues across four main disciplines, namely computer science, computational science, statistics and mathematics. It identifies vital foundational aspects of an effective cyber infrastructure and outlines the basic problems that must be adequately addressed in each discipline to realise them. Finally, the paper looks at five scientific research projects that urgently need high performance computing. This contrasts with earlier situations, in which private business enterprises drove the development of better, more modern and faster technologies.

Journal of Computer Science
Volume 10 No. 12, 2014, 2658-2665

DOI: https://doi.org/10.3844/jcssp.2014.2658.2665

Submitted: 13 October 2014
Published: 20 January 2015

How to Cite: Shanmuganathan, S. (2014). FROM DATA MINING AND KNOWLEDGE DISCOVERY TO BIG DATA ANALYTICS AND KNOWLEDGE EXTRACTION FOR APPLICATIONS IN SCIENCE. Journal of Computer Science, 10(12), 2658-2665. https://doi.org/10.3844/jcssp.2014.2658.2665

Keywords

  • Unstructured Data
  • High Performance Computing
  • Data Science