BigDataSystems Regular HO

The document provides details about the Big Data Systems course offered at BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI. The course introduces concepts related to storing and processing big data using distributed computing models and platforms such as Hadoop, Spark, and Amazon's storage and database services. The course objectives are to understand the big data ecosystem, leverage infrastructure for big data, and develop skills in big data processing and stream processing.

Uploaded by

sameer_888


BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES

COURSE HANDOUT

Part A: Content Design

Course Title Big Data Systems

Course No(s) DSECL ZG522
Credit Units 5
Course Author Prof. Shan Balasubramaniam
Version No 2.0
Last Revised By Pravin Y Pawar
Date 1 January 2020

Course Description
The course introduces the students to the concepts of Systems for Analytics with particular emphasis on
processing Big Data. It introduces distributed computing models for storage and processing of Big Data with
specific coverage of block storage, file systems, and databases on the one hand and batch processing,
in-memory distributed processing, and stream processing on the other. Hadoop (along with associated
technologies such as Hive and Pig), Spark, and Amazon’s storage and database services are used as exemplar
platforms.

Course Objectives
CO1 Enable students to understand requirements for and constraints in storing and processing Big Data.

CO2 Enable students to leverage commodity infrastructure (such as scale-out clusters, distributed data
stores, and the cloud) and the appropriate platforms and services for storing and processing Big Data.

CO3 Enable students to implement solutions for big data processing.

CO4 Enable students to develop a working knowledge of stream processing.

Text Book(s)
T1 Seema Acharya and Subhashini Chellappan. Big Data and Analytics. Wiley India Pvt. Ltd., Second Edition.

Reference Book(s) & other resources

R1 DT Editorial Services. Big Data - Black Book. Dreamtech Press, 2016.
R2 Kai Hwang, Jack Dongarra, and Geoffrey C. Fox. Distributed and Cloud Computing: From
Parallel Processing to the Internet of Things. Morgan Kaufmann, 2011.
R3 Additional Reading (as assigned for specific topics)

Learning Outcomes:
No. Learning Outcome

LO1 A comprehensive understanding of the Big Data ecosystem along with the typical
technologies involved.

LO2 Apply concepts from distributed computing and use the Hadoop/Map-reduce framework for
solving typical big data problems.

LO3 Identify and use appropriate storage / database platforms for Big data storage along with
appropriate querying mechanisms / interfaces for retrieval.

LO4 Use in-memory processing and stream processing techniques for building Big Data systems.

Modular Structure

Module # | Name of Module | Contact Sessions
1 | Data Engineering | 1-2
2 | Big Data Analytics | 3-5
3 | Hadoop Ecosystem | 6-9
4 | Big Data Storages | 10-12
5 | Spark for Big Data Processing | 13-16

Part B: Contact Session Plan

Academic Term II Semester 2019-2020

Course Title Big Data Systems
Course No DSECL ZG522
Lead Instructor Pravin Y Pawar

Session # | Contact Hours (#) | List of Topic / Title | Text/Ref Book/external resource
1 | 1 | Different Types of Data and Storage for Data: Structured Data (Relational Databases), Semi-structured data (Object Stores), and Unstructured Data (File systems) | T1 Ch. 1
  |   | What is Big Data? Characteristics of Big Data. | T1 Ch. 2
  |   | Systems perspective - Processing: In-memory vs. (from) secondary storage vs. (over the) network | R2 Sec. 1.2.3
  | 2 | Locality of Reference: Principle, examples. Impact of Latency: Algorithms and data structures that leverage locality, data organization on disk for better locality | Class Slides
2 | 3 | Parallel and Distributed Processing: Motivation (Size of data and complexity of processing); Storing data in parallel and distributed systems: Shared Memory vs. Message Passing; Strategies for data access: Partition, Replication, and Messaging. | R2 Sec. 1.2, 1.3.4, and 1.4.1
  | 4 | Memory Hierarchy in Distributed Systems: In-node vs. over the network latencies, Locality, Communication Cost. | Class Slides
  |   | Distributed Systems: Motivation (size, scalability, cost-benefit), Client-Server vs. Peer-to-Peer models, Cluster Computing: Components and Architecture | R2 Sec. 2.1 to 2.3
3 | 5 | Big Data Analytics: Requirements, constraints, approaches, and technologies. | T1 Sec. 3.1 to 3.11; R1 Ch. 3 and Ch. 6
  | 6 | Big Data Systems – Characteristics: Failures; Reliability and Availability; Consistency – Notions of Consistency. | T1 Ch. 4; AR
4 | 7 | CAP Theorem and implications for Big Data Analytics | T1 Sec. 3.12 and 3.13; AR
  | 8 | Big Data Lifecycle: Data Acquisition, Data Extraction – Validation and Cleaning, Data Loading, Data Transformation, Data Analysis and Visualization. Case study – Big data application | T1 Sec. 2.9 to 2.12; R1 Ch. 6 and Ch. 7
5 | 9-10 | Distributed Computing - Design Strategy: Divide-and-conquer for Parallel / Distributed Systems - Basic scenarios and Implications. Programming Patterns: Data-parallel programs and map as a construct; Tree-parallelism, and reduce as a construct; Map-reduce model: Examples (of map, reduce, map-reduce combinations, and Iterative map-reduce) | AR
6 | 11-12 | Hadoop: Introduction, Architecture, and Map-reduce Programming on Hadoop | T1 Sec. 5.1 and 5.2, Sec. 5.7, Sec. 5.11, and Ch. 8; R1 Ch. 5 and Ch. 9; R2 Sec. 1.4.3 and 6.2.2; AR
7 | 13-14 | Hadoop: Hadoop Distributed File System (HDFS), Scheduling in Hadoop (using YARN). Example – Hadoop application. | T1 Sec. 5.10 and 5.12; R1 Ch. 4 (sections on HDFS and YARN) and Ch. 11; AR
8 | 15-16 | Hadoop Ecosystem: Databases and Querying (HBase, Pig, and Hive) | T1 Sec. 5.13; R1 Ch. 4 (sections on HBase, Hive, and Pig) and Ch. 5 (section on HBase)
9 | 17-18 | Hadoop Ecosystem: Integration and coordination (Sqoop, Flume, Zookeeper & Oozie) | T1 Sec. 5.13; R1 Ch. 4 (sections on Sqoop, Flume, Zookeeper & Oozie)
10 | 19-20 | NoSQL databases: Introduction, Architecture, Querying, Variants, Case Study. | T1 Sec. 4.1, Ch. 6, and Ch. 7
11 | 21 | Cloud Computing: A brief overview: Motivation, Structure and Components; Characteristics and advantages – Elasticity. Services on the cloud. | AR
  | 22 | Storage as a Service: Forms of storage on the cloud, databases on the cloud. | AR
12 | 23 | Amazon’s storage services: block storage, file system, and database; EBS, SimpleDB, S3 | AR (sourced from Amazon)
  | 24 | Case study – Amazon DynamoDB (Access/Querying model, Database architecture and applications on the cloud). |
13 | 25 | Spark: Introduction, Architecture and Features | AR
  | 26 | Programming on Spark: Resilient Distributed Datasets, Transformation, Examples | AR (Apache Spark docs.)
14 | 27-28 | Machine Learning (on Spark): Regression, Classification, Collaborative Filtering, and Clustering. | AR (Apache Spark docs.)
15 | 29-30 | Streaming: Stream Processing – Motivation, Examples, Constraints, and Approaches. | AR
16 | 31-32 | Streaming on Spark: Architecture of Spark Streaming, Stream Processing Model, Example. | AR (Apache Spark docs.)

Select Topics for experiential learning

Topic No. Select Topics in Syllabus for experiential learning

1 ● Exercises on Distributed Systems – Hadoop;

● Exercises using Map-reduce model: Map only and reduce only jobs, Standard patterns in
map reduce models.

2 ● Exercises on NoSQL;
● Exercises on NoSQL database – Simple CRUD operations and Failure / Consistency tests;
● Exercises to implement a Web based application that uses NoSQL databases

3 ● Exercises with Pig queries to perform Map-reduce job and understand how to build queries
and underlying principles;
● Exercises on creating Hive databases and operations on Hive, exploring built in functions,
partitioning, data analysis

4 ● Exercises on Spark to demonstrate RDD, and operations such as Map, FlatMap, Filter,
PairRDD;
● Typical Spark programming idioms such as: Selecting Top N, Sorting, and Joins;
● Exercises on Spark SQL and DataFrames

5 ● Exercises using Spark MLlib: Regression, Classification, Collaborative Filtering, Clustering

6 ● Exercises on Analytics on the Cloud – using AWS, AWS Map-Reduce, AWS data stores /
databases.

[Note: A few of these topics for experiential learning will be covered by video demonstrations and/or
participatory lab sessions operated remotely. The rest will be assigned as homework and may be included
for evaluation – see below. End of Note.]

Evaluation Scheme
Legend: EC = Evaluation Component
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Assignment I and Assignment II | Take-home, Programming and use of platforms | | (10+15) = 25% |
EC-1 | Quiz I | Online, at scheduled time | | 5% |
EC-2 | Mid-Semester Test | Closed Book | 1.5 hours | 30% | As per Programme Calendar
EC-3 | Comprehensive Exam | Open Book | 2.5 hours | 40% | As per Programme Calendar

Important Information
Syllabus for Mid-Semester Test (Closed Book): Topics in Weeks 1-8
Syllabus for Comprehensive Exam (Open Book): All topics given in plan of study

Evaluation Guidelines:
1. EC-1 consists of two Assignments and one quiz. Announcements regarding the same will be made in a
timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted. Laptops/Mobiles
of any kind are not allowed. Exchange of any material is not allowed.
3. For Open Book exams: Use of prescribed and reference textbooks, in original (not photocopies), is
permitted. Class notes/slides as reference material in filed or bound form are permitted. All other
additional reading materials in filed / bound form are also permitted. However, loose sheets of paper
will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not
allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should
follow the procedure to apply for the Make-Up Test/Exam. The genuineness of the reason for absence
in the Regular Exam shall be assessed prior to giving permission to appear for the Make-up Exam.
Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be announced
later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as
given in the course handout, attend the lectures, and take all the prescribed evaluation components such as
Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme.
