WHAT IS BIG DATA: A COMPLETE OVERVIEW

"BIG DATA IS THE TERM FOR A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT THEY BECOME DIFFICULT TO PROCESS USING ON-HAND DATABASE MANAGEMENT TOOLS OR TRADITIONAL DATA PROCESSING APPLICATIONS."


EVOLUTION OF DATA:

1⇒EVOLUTION OF TECHNOLOGY
2⇒IOT
3⇒SOCIAL MEDIA
4⇒OTHER FACTORS

THE 3 Vs OF BIG DATA:

💨VOLUME: BY 2020, THE ACCUMULATED DIGITAL UNIVERSE OF DATA WILL GROW FROM 4.4 ZETTABYTES TO 44 ZETTABYTES, OR 44 TRILLION GIGABYTES.
💨VARIETY: DIFFERENT KINDS OF DATA ARE BEING GENERATED FROM VARIOUS SOURCES.
💨VELOCITY: DATA IS BEING GENERATED AT AN ALARMING RATE.
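
The unit conversion behind the volume figure can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming decimal (SI) units; the function name is made up for illustration.

```python
# Decimal (SI) units: 1 gigabyte = 10**9 bytes, 1 zettabyte = 10**21 bytes.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zettabytes):
    """Convert whole zettabytes to gigabytes using exact integer arithmetic."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE


gigabytes = zettabytes_to_gigabytes(44)
trillions = gigabytes // 10**12  # 1 trillion = 10**12

print(f"44 ZB = {gigabytes:,} GB ({trillions} trillion GB)")
```

So 44 zettabytes and 44 trillion gigabytes are the same quantity, just written in different units.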

BIG DATA AS AN OPPORTUNITY:


Cost reduction: Cost-effective storage systems for huge data sets

Faster and better decision making: Provides ways to analyse information quickly and make decisions

Next-generation products: Automated cars, healthcare, etc.

Improved services or products: Evaluation of customer needs and satisfaction

And many more opportunities.


PROBLEMS WITH BIG DATA:

PROBLEM 1: STORING EXPONENTIALLY GROWING HUGE DATASETS.
PROBLEM 2: PROCESSING DATA HAVING COMPLEX STRUCTURE.
PROBLEM 3: PROCESSING DATA FASTER.

HADOOP AS A SOLUTION:

"HADOOP IS A FRAMEWORK THAT ALLOWS US TO STORE AND PROCESS DATA SETS IN A PARALLEL AND DISTRIBUTED FASHION."



HDFS (STORAGE)
ALLOWS US TO DUMP ANY KIND OF DATA ACROSS THE CLUSTER.

MAPREDUCE (PROCESSING)
ALLOWS PARALLEL PROCESSING OF THE DATA STORED IN HDFS.
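
To give a feel for the MapReduce idea, here is a minimal word-count sketch in plain Python. It runs in a single process and is not Hadoop's API; on a real cluster the map and reduce steps would run as tasks on many nodes. All function names here are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped independently, which is what makes
    # the map step easy to run in parallel.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


lines = ["big data is big", "hadoop stores big data"]
counts = word_count(lines)
print(counts)
```

The important point is that no map call needs to know about any other line, and no reduce call needs to know about any other key.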

HADOOP DISTRIBUTED FILE SYSTEM:


HDFS HAS TWO CORE COMPONENTS, THE NAMENODE AND THE DATANODES:
THE NAMENODE IS THE MAIN NODE; IT HOLDS THE METADATA ABOUT THE DATA STORED.
THE DATA ITSELF IS STORED ON THE DATANODES, WHICH ARE COMMODITY HARDWARE IN THE DISTRIBUTED ENVIRONMENT.
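
The split between metadata and data can be pictured with a toy model. This is a simplified sketch, not how HDFS is implemented: the class and function names are invented, block placement is plain round-robin, and replication is left out.

```python
class DataNode:
    """Toy DataNode: holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Toy NameNode: holds only metadata, never the file contents."""

    def __init__(self):
        self.metadata = {}  # filename -> list of (block_id, datanode name)


def write_file(namenode, datanodes, filename, data, block_size):
    """Cut data into blocks, store them on DataNodes, record where they went."""
    locations = []
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{filename}#{index}"
        node = datanodes[index % len(datanodes)]  # round-robin (assumption)
        node.blocks[block_id] = data[offset:offset + block_size]
        locations.append((block_id, node.name))
    namenode.metadata[filename] = locations


def read_file(namenode, datanodes, filename):
    """Ask the NameNode where the blocks are, then fetch them from DataNodes."""
    by_name = {node.name: node for node in datanodes}
    return b"".join(
        by_name[node_name].blocks[block_id]
        for block_id, node_name in namenode.metadata[filename]
    )


datanodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode()
write_file(namenode, datanodes, "log.txt", b"0123456789", block_size=4)

print(namenode.metadata["log.txt"])
print(read_file(namenode, datanodes, "log.txt"))
```

Notice that the toy NameNode never touches the bytes of the file; it only remembers which DataNode holds which block.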
STORING DATA (SOLUTION):
PROBLEM 1: STORING EXPONENTIALLY GROWING HUGE DATASETS
SOLUTION: HDFS
IT IS THE STORAGE UNIT OF HADOOP
IT IS A DISTRIBUTED FILE SYSTEM
IT DIVIDES FILES (INPUT DATA) INTO SMALLER CHUNKS AND STORES THEM ACROSS THE CLUSTER
IT IS SCALABLE AS PER REQUIREMENT
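
How a file gets divided into chunks is simple arithmetic. The sketch below assumes a block size of 128 MB, which is the default in Hadoop 2.x and later (configurable through `dfs.blocksize`); the 500 MB file and the function name are made-up examples.

```python
BLOCK_SIZE_MB = 128  # Hadoop 2.x+ default; configurable via dfs.blocksize


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block a file of the given size occupies."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only takes up as much space as the data left over.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(500)
print(blocks)
```

A 500 MB file therefore becomes three full blocks and one smaller final block, and each block can live on a different machine in the cluster.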

PROBLEM 2: STORING UNSTRUCTURED DATA
SOLUTION: HDFS
ALLOWS US TO STORE ANY KIND OF DATA, BE IT STRUCTURED, SEMI-STRUCTURED OR UNSTRUCTURED

HADOOP ECOSYSTEM:

HADOOP: HADOOP PROVIDES A SCALABLE SOLUTION TO STORE AND PROCESS HUGE DATA SETS IN A PARALLEL AND DISTRIBUTED FASHION.
APACHE HIVE: APACHE HIVE IS A DATA WAREHOUSING TOOL THAT ALLOWS US TO PERFORM BIG DATA ANALYTICS USING THE HIVE QUERY LANGUAGE (HIVEQL), WHICH IS VERY SIMILAR TO SQL.
APACHE PIG: APACHE PIG IS A PLATFORM USED TO ANALYSE LARGE DATASETS BY REPRESENTING THEM AS DATA FLOWS.
APACHE SPARK
APACHE HBASE

KEEP VISITING...
