WHAT IS BIG DATA: A COMPLETE OVERVIEW

"BIG DATA IS THE TERM FOR A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT THEY BECOME DIFFICULT TO PROCESS USING ON-HAND DATABASE MANAGEMENT TOOLS OR TRADITIONAL DATA PROCESSING APPLICATIONS."


EVOLUTION OF DATA:

1⇒EVOLUTION OF TECHNOLOGY
2⇒IOT
3⇒SOCIAL MEDIA
4⇒OTHER FACTORS

THE 3 VS OF BIG DATA:

💨VOLUME: BY 2020, THE ACCUMULATED DIGITAL UNIVERSE OF DATA IS PROJECTED TO GROW FROM 4.4 ZETTABYTES TO 44 ZETTABYTES, OR 44 TRILLION GIGABYTES.
💨VARIETY: DIFFERENT KINDS OF DATA ARE BEING GENERATED FROM VARIOUS SOURCES.
💨VELOCITY: DATA IS BEING GENERATED AT AN ALARMING RATE.
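
The volume figure above can be checked with a little arithmetic. The sketch below assumes decimal (SI) units, where 1 gigabyte is 10^9 bytes and 1 zettabyte is 10^21 bytes; the function name is made up for this illustration.

```python
from fractions import Fraction

# Decimal (SI) storage units, expressed in bytes (an assumption of this sketch).
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * (ZETTABYTE // GIGABYTE)


projected_gb = zettabytes_to_gigabytes(44)
print(f"{projected_gb:,} GB")  # 44 followed by twelve zeros: 44 trillion GB

# Growth factor from 4.4 ZB to 44 ZB, computed exactly with fractions.
growth_factor = Fraction("44") / Fraction("4.4")
print(growth_factor)
```

So 44 zettabytes is indeed 44 trillion gigabytes, and the projection amounts to a tenfold increase.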

BIG DATA AS AN OPPORTUNITY:


Cost reduction: Cost-effective storage systems for huge data sets

Faster and better decision making: Provides ways to analyse information quickly and make decisions

Next-generation products: Automated cars, healthcare, etc.

Improved services or products: Keeps pace with evolving customer needs and satisfaction

And many more opportunities.


PROBLEMS WITH BIG DATA:

PROBLEM 1: STORING EXPONENTIALLY GROWING HUGE DATA SETS.
PROBLEM 2: PROCESSING DATA HAVING COMPLEX STRUCTURE.
PROBLEM 3: PROCESSING DATA FASTER.

HADOOP AS A SOLUTION:

"HADOOP IS A FRAMEWORK THAT ALLOWS US TO STORE AND PROCESS DATA SETS IN A PARALLEL AND DISTRIBUTED FASHION."

HDFS (STORAGE)
ALLOWS US TO DUMP ANY KIND OF DATA ACROSS THE CLUSTER.

MAPREDUCE (PROCESSING)
ALLOWS PARALLEL PROCESSING OF THE DATA STORED IN HDFS.
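
To make the MapReduce idea concrete, here is a minimal word-count sketch in plain Python. It is not Hadoop code: all three phases run one after another in a single process, whereas Hadoop would run many map and reduce tasks in parallel on different nodes. The function names and sample lines are made up for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped independently, which is what makes
    # the map step easy to spread across many machines.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


sample_lines = ["big data is big", "data is everywhere"]
counts = word_count(sample_lines)
print(counts)
```

Because each line is mapped on its own and each word is reduced on its own, both steps can be split across machines without changing the result.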

HADOOP DISTRIBUTED FILE SYSTEM:

HDFS HAS TWO CORE COMPONENTS, THE NAMENODE AND THE DATANODES:
THE NAMENODE IS THE MAIN NODE; IT HOLDS THE METADATA ABOUT THE STORED DATA.
THE DATA ITSELF IS STORED ON THE DATANODES, WHICH ARE COMMODITY HARDWARE IN THE DISTRIBUTED ENVIRONMENT.
STORING DATA (SOLUTION):
PROBLEM 1: STORING EXPONENTIALLY GROWING HUGE DATA SETS
SOLUTION: HDFS
IT IS THE STORAGE UNIT OF HADOOP
IT IS A DISTRIBUTED FILE SYSTEM
IT DIVIDES FILES (INPUT DATA) INTO SMALLER CHUNKS AND STORES THEM ACROSS THE CLUSTER
IT SCALES AS REQUIREMENTS GROW
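
The chunking described above can be sketched as a toy simulation. It assumes the HDFS defaults of a 128 MB block size and a replication factor of 3 (both configurable in a real cluster). The round-robin placement and the DataNode names are hypothetical simplifications; real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE_MB = 128  # default HDFS block size in Hadoop 2 and later
REPLICATION = 3      # default HDFS replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block a file would be split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_blocks(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement (hypothetical, not the real HDFS policy).

    Returns a mapping of block index -> DataNodes holding a replica,
    which is the kind of metadata the NameNode keeps track of.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


datanodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical node names
blocks = split_into_blocks(500)           # a 500 MB input file
block_map = place_blocks(len(blocks), datanodes)

for index, size in enumerate(blocks):
    print(f"block {index}: {size} MB on {block_map[index]}")
```

Adding capacity in this picture just means adding more entries to the list of DataNodes, which is why HDFS can scale as the data grows.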

PROBLEM 2: STORING UNSTRUCTURED DATA
SOLUTION: HDFS
ALLOWS US TO STORE ANY KIND OF DATA, BE IT STRUCTURED, SEMI-STRUCTURED OR UNSTRUCTURED

HADOOP ECOSYSTEM:

HADOOP: HADOOP PROVIDES A SCALABLE SOLUTION TO STORE AND PROCESS HUGE DATA SETS IN A PARALLEL AND DISTRIBUTED FASHION.
APACHE HIVE: APACHE HIVE IS A DATA WAREHOUSING TOOL THAT ALLOWS US TO PERFORM BIG DATA ANALYTICS USING THE HIVE QUERY LANGUAGE, WHICH IS VERY SIMILAR TO SQL.
APACHE PIG: APACHE PIG IS A PLATFORM USED TO ANALYSE LARGE DATA SETS BY REPRESENTING THEM AS DATA FLOWS.
APACHE SPARK
APACHE HBASE

KEEP VISITING...
