Bda MST Merged
Syllabus
1. Introduction to Big Data: Data, Characteristics of data and Types of digital data: Unstructured, Semi-structured and Structured, Sources of data, Working with unstructured data, Evolution and Definition of big data, Characteristics and Need of big data, Challenges of big data, Data environment versus big data environment.
2. Big data analytics: Overview of business intelligence, Data science and Analytics, Meaning and Characteristics of big data analytics, Need of big data analytics, Classification of analytics, Challenges to big data analytics, Importance of big data analytics, Basic terminology in the big data environment.
3. Big data technologies and Databases: NoSQL: Uses, Features and Types, Need, Advantages, Disadvantages and Applications of NoSQL, Overview of NewSQL, Comparing SQL, NoSQL and NewSQL, Introduction to MongoDB and its needs, Characteristics of MongoDB, Introduction to Apache Cassandra and its needs, Characteristics of Cassandra.
4. Hadoop foundation for analytics: History, Needs, Features, Key advantages and Versions of Hadoop, Essentials of the Hadoop ecosystem, RDBMS versus Hadoop, Key aspects and Components of Hadoop, Hadoop architecture.
5. Hadoop MapReduce and YARN framework: Introduction to MapReduce, Processing data with Hadoop using MapReduce, Introduction to YARN, Components, Need and Challenges of YARN, Dissecting YARN, MapReduce application, Data serialization and Working with common serialization formats, Big data serialization formats.
6. Big data with Hive and Pig: Overview of Hive and its architecture, Hive data types and File format, Hive Query Language (HQL), Introduction to Pig, Pig Latin overview, Data types in Pig and Running Pig
What is Data?
The quantities, characters, or symbols on which operations are performed by a computer, which
may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical,
or mechanical recording media.
Types of Digital Data
● Structured Data
● Semi-Structured Data
● Unstructured Data
Distribution of Types of Digital Data
• Similar entities in the data are grouped and organized in a hierarchy. The attributes or properties within a group may or may not be the same.
For example, two addresses may or may not contain the same number of properties, as in Address 1.
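To make this concrete, here is a minimal Python sketch, assuming JSON as the semi-structured format. The two address records and their field names are hypothetical and are not taken from the original Address 1 example; they only show records of the same kind that do not share the same set of properties.

```python
import json

# Two hypothetical address records: same entity type, different attributes.
raw = """
[
  {"name": "Address 1", "street": "12 Main St", "city": "Pune", "pin": "411001"},
  {"name": "Address 2", "city": "Delhi", "landmark": "Near City Mall"}
]
"""

addresses = json.loads(raw)

# There is no fixed schema, so collect the union of all keys seen.
all_keys = sorted({key for record in addresses for key in record})

# Report which properties each record lacks relative to that union.
for record in addresses:
    missing = [k for k in all_keys if k not in record]
    print(record["name"], "is missing:", missing)
```

A relational table would force both rows into the same columns (with NULLs); the JSON form simply lets each record carry the properties it has.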
To: <Name>
From: <Name>
Subject: <Text>
CC: <Name>
Body: <Text, Graphics, Images, etc.>
• The tags give us some metadata, but the body of the e-mail has no format, nor anything that conveys the meaning of the data it contains.
• There is a very fine line between unstructured and semi-structured data.
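The split between tagged headers and untagged body can be seen with Python's standard `email` package. This is a small sketch; the message contents are hypothetical and only follow the To/From/Subject/CC/Body layout above.

```python
from email import message_from_string

# A hypothetical e-mail following the To/From/Subject/CC/Body layout.
raw_mail = """To: Asha
From: Ravi
Subject: Quarterly report
CC: Meena

Please find the summary below. Sales rose in Q2, but returns also went up.
"""

msg = message_from_string(raw_mail)

# Header fields are tagged, so they can be looked up by name (metadata).
headers = {field: msg[field] for field in ("To", "From", "Subject", "CC")}

# The body is plain free text: nothing inside it is tagged or typed.
body = msg.get_payload()

print(headers)
print(repr(body))
```

A program can query `headers["Subject"]` directly, but extracting a fact such as "sales rose in Q2" from `body` would need text analysis, which is why the body behaves like unstructured data.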
What is Semi-structured Data?
Where does Semi-structured Data Come from?
Characteristics of Unstructured Data
Where does Unstructured Data Come from?
Big Data is a collection of data that is huge in volume and keeps growing exponentially with time. Its size and complexity are so large that traditional data management tools cannot store or process it efficiently. Big data is still data, only at a much larger scale.
“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
Examples Of Big Data
1. The New York Stock Exchange generates about one terabyte of new trade data per day.
2. Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc.
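A quick back-of-the-envelope calculation turns these daily volumes into rates, which shows why velocity matters as much as volume. The sketch assumes decimal units (1 TB = 10^12 bytes), takes the lower bound of 500 TB for Facebook, and assumes data arrives evenly over the day; real traffic is bursty, so peak rates would be higher.

```python
# Rates implied by the two examples, under the assumptions stated above.
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

nyse_bytes_per_day = 1 * TB
facebook_bytes_per_day = 500 * TB  # lower bound of "500+ terabytes"

# Average ingest rates if the data were spread evenly over 24 hours.
nyse_mb_per_s = nyse_bytes_per_day / SECONDS_PER_DAY / 10**6
facebook_gb_per_s = facebook_bytes_per_day / SECONDS_PER_DAY / 10**9

# Yearly accumulation for the NYSE at a constant 1 TB/day.
nyse_tb_per_year = nyse_bytes_per_day * 365 / TB

print(f"NYSE:     ~{nyse_mb_per_s:.1f} MB/s, ~{nyse_tb_per_year:.0f} TB/year")
print(f"Facebook: ~{facebook_gb_per_s:.1f} GB/s")
```

This prints roughly 11.6 MB/s and 365 TB/year for the NYSE and about 5.8 GB/s for Facebook.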
Characteristics of Data (the 6 V’s)
A Single View to the Customer
Volume
Variety
Velocity (Speed)
Real-Time/Fast Data
Real-Time Analytics/Decision Requirement
Harnessing Big Data
The Model Has Changed…
What’s driving Big Data
Evolution of Big Data
Challenges of Big Data
Big Data enabling technologies
Hadoop Stack for Big Data
Hadoop MapReduce
MapReduce Applications
MapReduce Examples
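Word count is the usual first MapReduce example. The sketch below simulates the map, shuffle and reduce phases in single-process Python; it does not use the Hadoop API, and the function names are hypothetical. In Hadoop, mappers and reducers run as distributed tasks and the framework itself performs the shuffle and sort between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all values that share the same key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


lines = ["big data needs big storage", "Hadoop stores big data"]

mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)
```

Because each mapper only sees its own line and each reducer only sees one key's values, both phases can be spread across many machines; only the shuffle needs to move data between them.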
BDA Lec 10 Introduction to Spark
Lec 11 Spark Built-In Libraries
Design of Key-Value Stores
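One common design choice in distributed key-value stores (for example, the token ring used by Cassandra) is to partition keys across nodes with consistent hashing. The following is a minimal single-process sketch of that idea; the class name, node names and the number of virtual nodes are hypothetical, and it has no replication, persistence or failure handling.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable hash; Python's built-in hash() of str is randomized per process.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashKV:
    """Toy key-value store that partitions keys over nodes with a hash ring."""

    def __init__(self, nodes, vnodes=50):
        self.ring = []  # sorted list of (token, node) pairs
        self.data = {}  # node -> dict holding that node's keys
        for node in nodes:
            self.data[node] = {}
            # Several virtual tokens per node spread keys more evenly.
            for i in range(vnodes):
                self.ring.append((_hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.tokens = [token for token, _ in self.ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first token >= hash(key), wrapping at the end.
        idx = bisect.bisect_left(self.tokens, _hash(key)) % len(self.tokens)
        return self.ring[idx][1]

    def put(self, key: str, value) -> None:
        self.data[self.node_for(key)][key] = value

    def get(self, key: str, default=None):
        return self.data[self.node_for(key)].get(key, default)


store = ConsistentHashKV(["node-a", "node-b", "node-c"])
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

print(store.get("user:42"))
print({node: len(keys) for node, keys in store.data.items()})
```

The advantage over a plain `hash(key) % number_of_nodes` scheme is that adding or removing a node only moves the keys whose ring positions fall next to that node's tokens, rather than reshuffling almost every key.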