big_data-intro

The document provides an overview of Big Data, defining it as a massive collection of complex data that traditional tools cannot efficiently process. It discusses the importance of Big Data in various fields, including business and healthcare, and highlights the benefits of Big Data analytics. Additionally, it covers the characteristics, types, and challenges associated with Big Data, along with potential solutions like Hadoop.

Introduction to Big Data

M. Atemkeng (Rhodes)
CONTENT
 What is Big Data
 What is an example of Big Data
 Why is Big Data Important?
 Big Data Analytics
 Benefits of Big Data Analytics
 Types of Big Data
 Characteristics of Big Data
 Primary Source of Big Data
 Big Data Tools and Software
 Big Data Mining
 Top Trends in Big Data
WHAT IS BIG DATA
 Big Data is a massive collection of data that grows exponentially over time

 Big Data is a data set so large and complex that traditional data management tools (including traditional machine learning) cannot store or process it efficiently

 Big Data is a type of data that is extremely large in size


WHAT IS AN EXAMPLE OF BIG DATA?
 Combating Cyber Threats Data

 Network Traffic Examination

 Enhancing Enterprise Protection Data

 Cloud Security Monitoring Data

 User Behaviour Data


WHY IS BIG DATA IMPORTANT?
 Companies use big data in their systems to improve operations, provide better customer service, create personalized marketing campaigns and take other actions that, ultimately, can increase revenue and profits

 Big data is also used by medical researchers to identify disease signs and risk factors and by doctors to help

 Application areas include astronomy, cybersecurity, medicine and ecology, among others


BIG DATA ANALYTICS
 Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights

 Big data analytics helps organizations harness their data and use it to identify new opportunities

 That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers
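
As a small-scale illustration of what "uncovering a correlation" means, the sketch below computes the Pearson correlation coefficient between two columns using only the Python standard library. The data (daily site visits and daily sales) and the function name pearson are made up for illustration and do not come from the slides. Real big data analytics would run this kind of computation on a distributed platform rather than on in-memory lists.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: daily site visits and daily sales.
visits = [120, 150, 170, 200, 260]
sales = [12, 15, 16, 21, 25]

r = pearson(visits, sales)
print(f"correlation between visits and sales: {r:.3f}")
```

A value of r close to 1 suggests the two quantities rise together, a value close to -1 suggests one falls as the other rises, and a value near 0 suggests no linear relationship.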


TYPES OF BIG DATA
 Structured

 Unstructured

 Semi-structured
STRUCTURED DATA
 Structured data refers to data that is already stored in databases in an ordered manner

 There are two sources of structured data: human-generated and machine-generated

 All data received from sensors, antennas, web logs and financial systems is classified as machine-generated data

 Human-generated structured data includes all the data that humans input into a computer
UNSTRUCTURED DATA
 Unstructured data is defined as any data with an unknown form or structure

 Aside from its massive size, unstructured data presents several challenges in terms of processing it and extracting value from it

 A heterogeneous data source containing a mix of simple text files, images, videos, and so on is an example of unstructured data


SEMI-STRUCTURED DATA
 Semi-structured data can contain both structured and unstructured information

 Semi-structured data appears to be structured, but it is not defined in the same way that a table definition in a relational database is

 A data representation in an XML file is an example of semi-structured data
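
To make the XML example concrete, the sketch below parses a small XML document with Python's standard xml.etree.ElementTree module. The element names (events, event, user, action, ip) and the helper parse_events are hypothetical and chosen only for illustration. The point is that every record is tagged, yet the two records do not share the same set of fields, which a fixed relational table definition would not allow.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: both records are <event> elements,
# but the second one carries an extra <ip> field.
XML_TEXT = """
<events>
  <event id="1">
    <user>alice</user>
    <action>login</action>
  </event>
  <event id="2">
    <user>bob</user>
    <action>upload</action>
    <ip>10.0.0.7</ip>
  </event>
</events>
"""

def parse_events(xml_text):
    """Turn each <event> element into a dict of its attributes and child fields."""
    root = ET.fromstring(xml_text.strip())
    parsed = []
    for event in root.findall("event"):
        record = dict(event.attrib)          # attributes such as id
        for child in event:
            record[child.tag] = child.text   # child elements such as user or ip
        parsed.append(record)
    return parsed

events = parse_events(XML_TEXT)
for record in events:
    print(record)
```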


CHARACTERISTICS OF BIG DATA
VOLUME
 The name Big Data itself refers to data of enormous size

 The size of the data plays a crucial role in determining the value that can be extracted from it. Whether a particular data set can be considered Big Data also depends on its volume

 Hence, volume is one characteristic that needs to be considered when dealing with Big Data solutions

 For example: Cybersecurity data


VELOCITY
 The term “VELOCITY” refers to the speed at which data is generated

 How fast the data is generated and processed to meet demand determines the real potential of the data

 Big Data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, mobile devices, antennas, etc.

 The flow of data is massive and continuous.
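
One simple way to quantify velocity is to count how many events arrive in each second. The sketch below does this for a short list of arrival times. The timestamps and the function names events_per_second and peak_rate are assumptions made for illustration; a production system would compute such rates over a continuous stream rather than a list held in memory.

```python
import math
from collections import Counter

def events_per_second(timestamps):
    """Count how many events fall into each whole-second bucket."""
    return Counter(math.floor(t) for t in timestamps)

def peak_rate(timestamps):
    """Largest number of events observed within a single second."""
    counts = events_per_second(timestamps)
    return max(counts.values()) if counts else 0

# Hypothetical arrival times, in seconds since the start of observation.
arrivals = [0.1, 0.4, 0.9, 1.2, 1.3, 1.5, 1.8, 2.6]

per_second = events_per_second(arrivals)
for second in sorted(per_second):
    print(f"second {second}: {per_second[second]} events")
print(f"peak rate: {peak_rate(arrivals)} events per second")
```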


VERACITY
 When dealing with a high volume, velocity and variety of data, not all of the data can be 100% correct; there will be dirty data

 The quality of the data being captured can vary greatly

 The accuracy of an analysis depends on the veracity of the source data
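
A minimal sketch of separating dirty data from clean data before analysis is shown below. The record layout (a source field and a bytes field) and the validity rules are assumptions chosen for illustration, not rules taken from the slides; what counts as "dirty" depends on the data set at hand.

```python
def is_clean(record):
    """A record is clean if it has a non-empty source and a non-negative integer byte count."""
    source = record.get("source")
    size = record.get("bytes")
    has_source = isinstance(source, str) and source.strip() != ""
    # bool is a subclass of int in Python, so exclude it explicitly.
    has_size = isinstance(size, int) and not isinstance(size, bool) and size >= 0
    return has_source and has_size

def split_by_quality(records):
    """Return (clean, dirty) lists, preserving the input order."""
    clean = [r for r in records if is_clean(r)]
    dirty = [r for r in records if not is_clean(r)]
    return clean, dirty

# Hypothetical network log records, some of them dirty.
raw_records = [
    {"source": "10.0.0.1", "bytes": 512},
    {"source": "", "bytes": 128},          # missing source
    {"source": "10.0.0.2", "bytes": -5},   # impossible size
    {"source": "10.0.0.3", "bytes": None}, # missing size
    {"source": "10.0.0.4", "bytes": 2048},
]

clean, dirty = split_by_quality(raw_records)
print(f"{len(clean)} clean, {len(dirty)} dirty "
      f"({len(dirty) / len(raw_records):.0%} rejected)")
```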
VALUE
 Value is the most important aspect of big data

 The potential value of big data is huge

 It is all well and good having access to big data, but unless we can turn it into value it is useless
VARIETY
 Big data is not always structured data, and it is not always easy to put big data into a relational database

 This means that the category to which the data belongs is also an essential fact that the data analyst needs to know

 Dealing with a variety of structured and unstructured data greatly increases the complexity of both storing and processing it

 90% of data generated is unstructured


A MORE COMPLETE DEFINITION
“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” -- Gartner


PRIMARY SOURCE OF BIG DATA
CHALLENGES OF BIG DATA
HADOOP AS A SOLUTION
Companies using Hadoop: https://bigdataanalyticsnews.com/top-12-hadoop-technology-companies/
HADOOP ECOSYSTEM
