Bda MST Merged

The document discusses big data analytics and related concepts. It covers topics like the characteristics of structured, semi-structured, and unstructured data; the evolution and definition of big data; big data analytics; NoSQL and NewSQL databases; Hadoop and its components like MapReduce and YARN; Hive and Pig for analytics on Hadoop; and Spark as a big data processing framework. The syllabus outlines key technologies and concepts for working with big data at scale.

Uploaded by

Deepti Agrawal

Big Data Analytics

Syllabus
1. Introduction to Big Data: Data, Characteristics of data and Types of digital data: Unstructured,
Semi-structured and Structured, Sources of data, Working with unstructured data, Evolution and Definition of
big data, Characteristics and Need of big data, Challenges of big data, Data environment versus big data
environment.

2. Big data analytics: Overview of business intelligence, Data science and Analytics, Meaning and
Characteristics of big data analytics, Need of big data analytics, Classification of analytics, Challenges to big
data analytics, Importance of big data analytics, Basic terminologies in big data environment.

3. Big data technologies and Databases: NoSQL, Uses, Features and Types, Need, Advantages,
Disadvantages and Application of NoSQL, Overview of NewSQL, Comparing SQL, NoSQL and NewSQL,
Introduction to MongoDB and its needs, Characteristics of MongoDB, Introduction to Apache Cassandra and
its needs, Characteristics of Cassandra.

4. Hadoop foundation for analytics: History, Needs, Features, Key advantage and Versions of Hadoop,
Essential of Hadoop ecosystems, RDBMS versus Hadoop, Key aspects and Components of Hadoop, Hadoop
architectures.

5. Hadoop MapReduce and YARN framework: Introduction to MapReduce, Processing data with Hadoop
using MapReduce, Introduction to YARN, Components, Need and Challenges of YARN, Dissecting YARN,
MapReduce application, Data serialization and Working with common serialization formats, Big data
serialization formats.

6. Big data with Hive and Pig: Overview of Hive and its architecture, Hive data types and File format, Hive
query language (HQL), Introduction to Pig, Pig Latin overview, Data types in Pig and Running Pig
What is Data?
The quantities, characters, or symbols on which operations are performed by a computer, which
may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical,
or mechanical recording media.
Types of Digital Data

● Structured Data
● Semi-Structured Data
● Unstructured Data
Distribution of Types of Digital Data

[Figure: percentage distribution of the three types of digital data]

What is Structured Data?
Where does Structured Data Come from?
Structured Data: Everything in its Place
Ease with Structured Data-Storage
Ease with Structured Data-Retrieval
Semi-structured Data
•Semi-structured data does not conform to a rigid data model: it cannot be stored in rows and columns as in a relational database,
and its meaning is hard to determine automatically. It does, however, carry tags and markers that group the data and describe
how it is stored. These provide some metadata, but not enough for full management and automation of the data.

•Similar entities in the data are grouped and organized in a hierarchy. The attributes or the properties within a group may or may
not be the same.

E.g. HTML, XML, JSON, etc.

For example, two addresses may or may not contain the same number of properties:

Address 1: <house number><street name><area name><city>

Address 2: <house number><street name><city>
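
The two addresses above can be sketched as JSON records whose properties differ. This is only an illustration: the field names, the sample values, and the helper `missing_fields` are assumptions made for this example and do not come from the slides.

```python
import json

# Hypothetical records mirroring Address 1 and Address 2; all values are made up.
addresses_json = """
[
  {"house_number": "12", "street_name": "Park Street", "area_name": "Sector 5", "city": "Pune"},
  {"house_number": "7", "street_name": "Lake Road", "city": "Indore"}
]
"""

addresses = json.loads(addresses_json)


def missing_fields(records):
    """For each record, list the properties that occur in the group but not in that record."""
    all_keys = set()
    for record in records:
        all_keys.update(record)
    return [sorted(all_keys - set(record)) for record in records]


for address, missing in zip(addresses, missing_fields(addresses)):
    print(sorted(address), "missing:", missing)
```

Both records parse without a predeclared schema, and the second one simply lacks `area_name`. A relational table would instead need a fixed column for that property, filled with NULL where it is absent.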

For example, an e-mail follows a standard format:

To: <Name>
From: <Name>
Subject: <Text>
CC: <Name>
Body: <Text, Graphics, Images etc.>

•The tags give us some metadata, but the body of the e-mail has no fixed format, nor does it convey the meaning of the
data it contains.

•There is a very fine line between unstructured and semi-structured data.
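
The e-mail case can be sketched with Python's standard `email` package. The message text and addresses below are invented for illustration. The point is that the header tags can be read by name, while the body comes back as one unstructured string.

```python
from email import message_from_string

# Hypothetical message; names, addresses, and text are made up for this sketch.
raw = """\
To: Alice <alice@example.com>
From: Bob <bob@example.com>
Subject: Quarterly report
Cc: Carol <carol@example.com>

Hi Alice, the numbers look good. Call me when you can.
"""

msg = message_from_string(raw)

# The tagged part: each header can be looked up by its name.
headers = {name: msg[name] for name in ("To", "From", "Subject", "Cc")}

# The untagged part: the body is a single free-form string.
body = msg.get_payload()

for name, value in headers.items():
    print(f"{name}: {value}")
print("Body length in characters:", len(body))
```

Extracting meaning from `body` (who should call whom, which numbers are meant) would need text analysis, which is why an e-mail sits on the border between semi-structured and unstructured data.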
What is Semi-structured Data?
Where does Semi-structured Data Come from?
Characteristics of Unstructured Data
Where does Unstructured Data Come from?

1. Anything in a non-database form is unstructured data.

2. It can be classified into two broad categories:

•Bitmap objects: for example, image, video, or audio files.

•Textual objects: for example, Microsoft Word documents, e-mails, or
Microsoft Excel spreadsheets.
How to Store Unstructured Data?
How to Extract Information from Unstructured Data?
Data Storage Units Chart:
Data Growth over the years
What is Big Data?

Big Data is a collection of data that is huge in volume and still growing exponentially with time. It is
data so large and complex that no traditional data management tool can store or process it
efficiently. Big data is still data, but of enormous size.

According to Gartner, the definition of Big Data is:

“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight and decision making.”
Examples Of Big Data

1. The New York Stock Exchange generates about one terabyte of new trade data
per day.
2. Statistics show that more than 500 terabytes of new data are ingested into the
databases of the social media site Facebook every day. This data is mainly
generated by photo and video uploads, message exchanges, comments,
etc.
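
To get a feel for these volumes, the daily figures can be converted into sustained per-second rates. This is a rough sketch that assumes decimal terabytes (10^12 bytes) and a perfectly even flow of data over 24 hours. The slides state neither, and real traffic is bursty.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TERABYTE = 10**12               # decimal terabyte in bytes (an assumption)


def bytes_per_second(terabytes_per_day):
    """Average ingest rate in bytes per second for a given daily volume."""
    return terabytes_per_day * TERABYTE / SECONDS_PER_DAY


nyse_rate = bytes_per_second(1)        # about one terabyte per day
facebook_rate = bytes_per_second(500)  # 500 terabytes per day (lower bound)

print(f"NYSE:     {nyse_rate / 1e6:.1f} MB/s")      # 11.6 MB/s
print(f"Facebook: {facebook_rate / 1e9:.2f} GB/s")  # 5.79 GB/s
```

Even averaged out, the second figure is several gigabytes arriving every second without pause, which illustrates why the ingest rate (Velocity) is a concern in its own right, alongside total size.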
Characteristics of Data (6 Types of V’s)
A Single View to the Customer
Volume
Variety
Velocity (Speed)
Real-Time/Fast Data
Real-Time Analytics/Decision Requirement
Harnessing Big Data
The Model Has Changed…
What’s driving Big Data
Evolution of Big Data
Challenges of Big Data
Big Data enabling technologies
Hadoop Stack for Big Data
Hadoop MapReduce
MapReduce Applications
MapReduce Examples
BDA Lec 10 Introduction to Spark
Lec-11 Spark Built-In Libraries
DESIGN OF KEY-VALUE STORES
