
GUJARAT TECHNOLOGICAL UNIVERSITY

Syllabus for Master of Computer Applications, 4th Semester
With effect from academic year 2018-19
Subject Name: Big Data Tools (BDT)
Subject Code: 4649306

1. Learning Objectives:
• To understand the basics of Big Data
• To understand various Big Data Tools

2. Prerequisites: Working knowledge of a Programming Language and Database Concepts

3. Contents:

Unit I: Introduction to Big Data (Weightage: 15%)

Types of Digital Data: Classification of Data (Structured, Semi-structured and Unstructured), Characteristics of Data, Evolution of Big Data, Definition of Big Data, Challenges of Big Data, Characteristics of Big Data (Volume, Velocity, Variety), Other characteristics of Big Data which are not Definitional Traits of Big Data, Why Big Data?, Are we Information Consumer or Producer?, Traditional BI vs Big Data, Typical Data Warehouse Environment, Typical Hadoop Environment, What is Changing in Realms of Big Data?, Terminologies used in Big Data Environments
Unit II: Introduction to NoSQL and Hadoop (Weightage: 25%)

NoSQL: Introduction: Where is it used?, What is it?, Types of NoSQL databases, Why NoSQL?, Advantages of NoSQL, Use of NoSQL in Industry, SQL vs NoSQL, NewSQL

Hadoop: Introduction, Distributed Computing Challenges, History of Hadoop, Overview of Hadoop and Hadoop Ecosystems, Features and key advantages of Hadoop, Versions of Hadoop, Hadoop distributions, RDBMS versus Hadoop, Hadoop vs SQL, Integrated Hadoop Systems offered by leading market vendors, Cloud-based Hadoop solutions, HDFS, Processing data with Hadoop, Managing Resources and Applications with Hadoop YARN, Interacting with the Hadoop Ecosystem
Unit III: Introduction to MongoDB and MapReduce (Weightage: 25%)

MongoDB: Introduction: What is MongoDB?, Why MongoDB? (using JSON, Creating or generating a unique key, Support for Dynamic Queries, Storing Binary Data, Replication, Sharding, Updating information in-place), Terms used in RDBMS and MongoDB, Data types in MongoDB, MongoDB Query Language

MapReduce: Data Flow, Map, Shuffle, Sort, Reduce, Hadoop Streaming, mrjob, Installation, Word count in mrjob, Executing mrjob


Unit IV: Introduction to HIVE and Pig (Weightage: 25%)

HIVE: Introduction: What is HIVE?, HIVE Architecture, HIVE Data Types, HIVE File Formats, HIVE Query Language, RCFile implementation, SerDe, User-Defined Functions (UDF)

Pig: Introduction: What is Pig?, The anatomy of Pig, Pig on Hadoop, Pig philosophy, Use Case for Pig: ETL Processing, Pig Latin overview, Data types in Pig, Running Pig, Execution modes of Pig, HDFS commands, Relational operators, Eval functions, Complex Data Types, Piggy Bank, User-Defined Functions, Parameter substitution, Diagnostic Operators, Word Count example using Pig, When to use and when not to use Pig?, Pig at Yahoo, Pig vs HIVE
Unit V: Overview of SPARK (Weightage: 10%)

Introduction to Data Analysis with Spark, Downloading Spark and Getting Started, Programming with RDDs

4. Text Book(s):

1) Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley India Pvt. Ltd., 2015
2) Matei Zaharia, Patrick Wendell, Andy Konwinski, Holden Karau, “Learning Spark”, O'Reilly Media, 2015
3) Zachary Radtka and Donald Miner, “Hadoop with Python”, O'Reilly Media, 2016
   (A free ebook is available at the following link, as on 12-10-2018:)
   https://www.oreilly.com/programming/free/hadoop-with-python.csp

5. Reference Books:

1) Shashank Tiwari, “Professional NoSQL”, Wiley India Pvt. Ltd., 2011
2) Kyle Banker, Peter Bakkum, Shaun Verch, Douglas Garrett, Tim Hawkins, “MongoDB in Action”, DreamTech Press, 2nd Edition, 2016
3) Chris Eaton, Paul Zikopoulos, Tom Deutsch, George Lapis, Dirk Deroos, “Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data”, McGraw Hill Education (India) Pvt. Ltd., 2012
4) Tom White, “Hadoop: The Definitive Guide”, O'Reilly Media, 4th Edition, 2015
5) Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing Ltd., 2013
6) DT Editorial Services, “Big Data - Black Book”, Dreamtech Press, 2016

Web Resources:

a) http://www.bigdatauniversity.com
b) http://www.mongodb.com
c) http://hadoop.apache.org/

6. Unit-wise coverage from Textbook(s):


Unit   Book #   Topics
I      1        Chapters 1, 2, 3.12
II     1        Chapters 4, 5
III    1, 3     Chapter 6 (Book 1), Chapter 2 (Book 3)
IV     1        Chapters 9, 10
V      2        Chapters 1, 2 and 3 (for Chapters 2 and 3, only Python; no Java, no Scala)

7. Accomplishment
Students will understand the fundamentals of Big Data, its tools and techniques.


Practical List

Part I: MongoDB

• Learn to use MongoDB Atlas (the cloud version of MongoDB)
• Install and configure MongoDB

MongoDB Shell Commands / Queries: View all databases, Create a new database, Drop an existing database, View the current database, Switch over to a given database, db.help(), Display statistics of a given database, Display the current version of the MongoDB server, Display the list of collections in the current database, Create Collection, Drop Collection, CRUD operations (Create, Read, Update, Delete), Insert, Update-else-insert (upsert), save, update, remove, Find, Dealing with NULL values, Count, Limit, Sort, Skip, Arrays and Array Operations, Aggregate
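
The shell operations above can also be exercised from Python. The following is a minimal pymongo sketch of a few of them; the connection URI, database name, collection name and sample documents are assumptions for illustration, not part of the syllabus.

    # Minimal pymongo sketch of a few of the shell operations listed above.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/")
    print(client.list_database_names())                  # view all databases

    db = client["demo_db"]                               # created lazily on first write
    coll = db["demo_collection"]

    coll.insert_one({"name": "Ajay", "grade": "VII"})                      # insert
    coll.update_one({"name": "Ajay"}, {"$set": {"grade": "VIII"}},
                    upsert=True)                                           # update-else-insert
    print(coll.count_documents({}))                                        # count
    for doc in coll.find().sort("name", 1).skip(0).limit(5):               # find, sort, skip, limit
        print(doc)
    coll.delete_one({"name": "Ajay"})                                      # remove
    print(db.list_collection_names())                                      # collections in current database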

1) Create a StudentMaster database with a collection called “Student” containing documents with some or all of the following fields: StudentRollNo, StudentName, Grade, Hobbies, and DOJ. Perform the following operations on the database (a hedged pymongo sketch of a few of these queries follows the list):
1. Insert 10 records into the database.
2. Find the document wherein “StudentName” has the value “Ajay Rathod”.
3. Find all documents in proper format (without the _id field).
4. Retrieve only StudentName and Grade.
5. Retrieve the StudentName and Grade of the student whose _id is 1.
6. Add a new field “Address” to the Student collection.
7. Find those documents where the Grade is set to ‘VII’.
8. Find those documents where the Grade is not set to ‘VII’.
9. Find those documents where Hobbies is set to either ‘Chess’ or ‘Dancing’.
10. Find those documents where Hobbies is set neither to ‘Chess’ nor to ‘Dancing’.
11. Find those documents where the student name begins with ‘M’.
12. Find those documents where the student name has an “e” in any position.
13. Find those documents where the student name ends in “a”.
14. Find the total number of documents.
15. Find the total number of documents where Grade is ‘VII’.
16. Sort the documents in ascending order of student name.
17. Display the last two records.
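
A hedged pymongo sketch of a few of the queries above (items 2, 4, 8, 9, 11, 15, 16 and 17); the connection details are assumed, and the exact shell equivalents may differ slightly.

    # Selected StudentMaster queries expressed with pymongo (illustrative only).
    from pymongo import MongoClient

    students = MongoClient("mongodb://localhost:27017/")["StudentMaster"]["Student"]

    students.find_one({"StudentName": "Ajay Rathod"})                      # 2) exact match
    list(students.find({}, {"_id": 0, "StudentName": 1, "Grade": 1}))      # 4) projection, no _id
    list(students.find({"Grade": {"$ne": "VII"}}))                         # 8) Grade not 'VII'
    list(students.find({"Hobbies": {"$in": ["Chess", "Dancing"]}}))        # 9) either hobby
    list(students.find({"StudentName": {"$regex": "^M"}}))                 # 11) names starting with 'M'
    students.count_documents({"Grade": "VII"})                             # 15) filtered count
    list(students.find().sort("StudentName", 1))                           # 16) ascending by name
    list(students.find().sort("_id", -1).limit(2))                         # 17) last two inserted records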

2) Create a MovieMaker database with a collection called “Movies” containing documents with some or all of the following fields: titles, directors, years, actors. Perform the following operations on the database, either in the console or using any programming language (a hedged pymongo sketch of a few of these queries follows the list):
1. Retrieve all documents
2. Retrieve all documents with Director set to "Quentin Tarantino"
3. Retrieve all documents where actors include "Brad Pitt".
4. Retrieve all movies released before the year 2000 or after 2010.


5. Add a synopsis to "The Hobbit: An Unexpected Journey": "A reluctant hobbit, Bilbo Baggins, sets out to the Lonely Mountain with a spirited group of dwarves to reclaim their mountain home - and the gold within it - from the dragon Smaug."
6. Add a synopsis to "The Hobbit: The Desolation of Smaug": "The dwarves, along with Bilbo Baggins and Gandalf the Grey, continue their quest to reclaim Erebor, their homeland, from Smaug. Bilbo Baggins is in possession of a mysterious and magical ring."
7. Add an actor named "Samuel L. Jackson" to the movie "Pulp Fiction"
8. Find all movies that have a synopsis that contains the word "Bilbo"
9. Find all movies that have a synopsis that contains the word "Gandalf"
10. Find all movies that have a synopsis that contains the word "Bilbo" and not the
word "Gandalf"
11. Find all movies that have a synopsis that contains the word "dwarves" or "hobbit"
12. Find all movies that have a synopsis that contains the word "gold" and "dragon".
13. Delete the movie "Pee Wee Herman's Big Adventure".
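
A hedged pymongo sketch of a few of the queries above; field names follow the practical (titles, directors, years, actors, plus the added synopsis field), and the connection details are assumed.

    # Selected MovieMaker queries expressed with pymongo (illustrative only).
    import re
    from pymongo import MongoClient

    movies = MongoClient("mongodb://localhost:27017/")["MovieMaker"]["Movies"]

    list(movies.find({"directors": "Quentin Tarantino"}))                             # 2
    list(movies.find({"actors": "Brad Pitt"}))                                        # 3) array contains value
    list(movies.find({"$or": [{"years": {"$lt": 2000}}, {"years": {"$gt": 2010}}]}))  # 4
    movies.update_one({"titles": "Pulp Fiction"},
                      {"$push": {"actors": "Samuel L. Jackson"}})                     # 7) append an actor
    list(movies.find({"synopsis": re.compile("Bilbo")}))                              # 8) regex search
    list(movies.find({"$and": [{"synopsis": re.compile("Bilbo")},
                               {"synopsis": {"$not": re.compile("Gandalf")}}]}))      # 10) one word, not the other
    movies.delete_one({"titles": "Pee Wee Herman's Big Adventure"})                   # 13) delete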

3) Create a database named “BookStore” in MongoDB with a collection called “Books” containing documents with some or all of the following fields: bookId, bookTitle, authors (containing field: authorName), publicationYear, publisher, Orders (containing fields: OrderedId, orderDate, customerName, price, quantityOrdered, discount).
Note that a book may have one or more authors and orders. Also, the same OrderedId can be present in one or more books. Perform the following operations on the database, either in the console or using any programming language (a hedged aggregation-based sketch for items 5 and 6 follows the list):
1. Insert records for 10 books from 5 authors, and at least 20 orders in total.
2. Update the title of a particular book.
3. Display all the books having fewer than 3 authors, sorted by book name.
4. Display the number of books from each publisher.
5. Use the MapReduce function to display the total quantity of books ordered for each date.
6. Use the MapReduce function to display the discount offered to a particular customer.
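
For items 5 and 6, the following is a hedged sketch that uses MongoDB's aggregation pipeline as an alternative to the MapReduce function named in the practical; the Orders array structure follows the field list above, and the customer name and connection details are assumed.

    # Aggregation-pipeline sketch for BookStore items 5 and 6 (illustrative only).
    from pymongo import MongoClient

    books = MongoClient("mongodb://localhost:27017/")["BookStore"]["Books"]

    # 5) total quantity of books ordered for each date
    per_date = books.aggregate([
        {"$unwind": "$Orders"},
        {"$group": {"_id": "$Orders.orderDate",
                    "totalQuantity": {"$sum": "$Orders.quantityOrdered"}}},
    ])

    # 6) total discount offered to a particular (assumed) customer
    per_customer = books.aggregate([
        {"$unwind": "$Orders"},
        {"$match": {"Orders.customerName": "Ajay Rathod"}},
        {"$group": {"_id": "$Orders.customerName",
                    "totalDiscount": {"$sum": "$Orders.discount"}}},
    ])

    for doc in per_date:
        print(doc)
    for doc in per_customer:
        print(doc)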
4) Create a database named “Store” in MongoDB with a collection called “Sales” containing documents with some or all of the following fields: customerId, customerName, gender, dateOfBirth, contactNumber, address (containing fields: houseNo, street, area, city, pincode), orders (containing fields: orderId, orderDate, items (containing fields: itemId, itemName, itemPrice, quantityOrdered, discount)).
Note that some customers may not provide their date of birth and/or contact number. Also, not all products would be sold at a discount. Perform the following operations on the database (either in the console or using any programming language):

1. Insert records for 3 customers and 5 items in at least 20 orders.
2. Update the contact number of a particular customer.
3. Display the customerId, customerName, gender and contactNumber of customers residing in “Ahmedabad”.
4. Display a city-wise count of customers.
5. Use the MapReduce function to display the number of times each item was sold.


5) Create a database “BookStore” with a collection called “Books” containing documents with some or all of the following fields: Category, BookName, Author, quantity, price, pages. Perform the following operations on the database (a hedged aggregation-based sketch follows the expected output below):
1. Insert records for 5 books.
2. Write Map and Reduce functions to split the books into the following two categories: Big books and Small books. (Books which have more than 300 pages should be in the Big books category; books which have fewer than 300 pages should be in the Small books category.)
3. Count the number of books in each category.
4. Store the output as documents in a new collection called “Book_Result”, as shown below:

Book Category    Count of the Books
Big books        2
Small books      3
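
A hedged sketch for this practical that categorises books by page count and writes the counts to the “Book_Result” collection; it uses the aggregation pipeline ($cond, $group, $out) as an alternative to hand-written Map and Reduce functions, with connection details assumed.

    # Aggregation-pipeline sketch for the Big books / Small books practical.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017/")["BookStore"]

    db["Books"].aggregate([
        # label each book; books with exactly 300 pages fall into "Small books" here,
        # mirroring the wording of the practical
        {"$project": {"category": {"$cond": [{"$gt": ["$pages", 300]},
                                             "Big books", "Small books"]}}},
        {"$group": {"_id": "$category", "countOfBooks": {"$sum": 1}}},
        {"$out": "Book_Result"},                     # write results to a new collection
    ])

    for doc in db["Book_Result"].find():
        print(doc)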

Part II: Hadoop HDFS

• Installation and configuration for: Apache Hadoop stand-alone mode and pseudo-distributed mode
• Installation and configuration for: an Apache Hadoop real cluster consisting of a single master and two slave nodes
• Test the above set-up with the sample examples bundled along with the downloaded package
• Develop and execute sample programs like word count, maximum temperature, etc. using Python with MapReduce in Hadoop
• HDFS commands: -ls, -ls -R, -mkdir, -put, -get, -copyFromLocal, -copyToLocal, -cat, -cp, -rm -r
1) Create a file “Sample” in the local file system and export it to the HDFS file system.
2) Write the HDFS command for copying the “Sample” file from HDFS to the local file system.
3) Write the HDFS commands for creating a “Test” directory in HDFS and then removing that directory.
4) Write the HDFS command to display the complete list of directories and files in HDFS.
5) Write the HDFS command for displaying the contents of the “Sample” text file in HDFS on screen.
6) Write the HDFS command for copying an existing “Sample” file in a “Test” HDFS directory to some other HDFS directory.

Part III: MapReduce


1) Prepare an “input” folder containing multiple text files. Create a program using MapReduce that accepts the path to the “input” folder and generates an “output” folder containing a text file with the total number of occurrences of each word present in the text documents (a hedged mrjob sketch follows the expected output). For example, if the text contained in the input files is as follows:
“We thank you for your visit to Ahmedabad. We hope that you would visit us again.”
The output should be as follows:


Word         Word Length   No. of Occurrences
We           2             2
To           2             1
Us           2             1
You          3             2
For          3             1
Your         4             1
That         4             1
Hope         4             1
Thank        5             1
Visit        5             2
Would        5             1
Again        5             1
Ahmedabad    9             1
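
A minimal mrjob sketch for this practical, in line with the “wordcount in mrjob” topic of Unit III; the file name and the exact tokenisation rule are assumptions.

    # wordcount_len.py -- count (word, word length) occurrences with mrjob.
    # Run locally:    python wordcount_len.py input/*.txt
    # Run on Hadoop:  python wordcount_len.py -r hadoop hdfs:///path/to/input
    import re
    from mrjob.job import MRJob

    WORD_RE = re.compile(r"[A-Za-z]+")

    class MRWordLengthCount(MRJob):
        def mapper(self, _, line):
            # emit ((word, length), 1) for every word in the line, case-folded
            for word in WORD_RE.findall(line):
                w = word.lower()
                yield (w, len(w)), 1

        def reducer(self, word_and_length, counts):
            # total number of occurrences of each (word, length) pair
            yield word_and_length, sum(counts)

    if __name__ == "__main__":
        MRWordLengthCount.run()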
2) Write a program for matrix-vector multiplication using MapReduce (a hedged mrjob sketch follows item 3 below).
3) Write a program to perform Union, Intersection and Difference operations using MapReduce on the following files.
Input files:
a) Content of file 1 (apple, orange, mango, apple, banana)
b) Content of file 2 (apple, apple, plum, kiwi, kiwi, mango, mango)
c) Content of file 3 (orange, orange, plum, grapes, kiwi, mango, apple)
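
For item 2, a hedged mrjob sketch of matrix-vector multiplication. It assumes the matrix arrives as input lines of the form "row col value", that the vector fits in a small local file vector.txt (one component per line) shipped to each task via mrjob's FILES mechanism, and that indices are 0-based; none of these conventions are prescribed by the syllabus.

    # matvec.py -- matrix-vector multiplication with mrjob (illustrative sketch).
    from mrjob.job import MRJob

    class MRMatrixVector(MRJob):
        FILES = ['vector.txt']              # copied into each task's working directory

        def mapper_init(self):
            # load the dense vector once per mapper
            with open('vector.txt') as f:
                self.vector = [float(line) for line in f if line.strip()]

        def mapper(self, _, line):
            # each input line: "i j a_ij"
            i, j, a_ij = line.split()
            yield int(i), float(a_ij) * self.vector[int(j)]

        def reducer(self, row, partial_products):
            # component i of the result = sum over j of a_ij * x_j
            yield row, sum(partial_products)

    if __name__ == '__main__':
        MRMatrixVector.run()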

Part IV: Pig

• Install and configure Apache Pig
• Test the Pig installation for local and MapReduce mode execution
• Test the Pig installation for interactive (Grunt shell) and batch mode (.pig file) execution
• Develop a UDF (User Defined Function) in Python for Pig

Working with Pig operators/functions: LOAD, DUMP, FOREACH, GROUP, DISTINCT, LIMIT, ORDER BY, JOIN, UNION, SPLIT, SAMPLE, AVG, MAX, COUNT, TUPLE, MAP, PIGGY BANK, PARAMETER SUBSTITUTION, DESCRIBE, and simple problems like word count using Pig

1) Write a Pig script to load and store “Student data” (the Student file contains RollNo, Name, Marks and GPA).
a) Filter all the students having GPA > 5.
b) Display the names of all students in uppercase.
c) Group the tuples of students based on their GPA.
d) Remove duplicate tuples from the Student relation.
e) Display the first three tuples from the “Student” relation.
f) Display the names of students in ascending order.
g) Join two relations, namely Student and Department (RNo, DeptNo, DeptName), based on the values contained in the roll number column.
h) Merge the contents of the two relations Student and Department.
i) Partition the relation based on the GPAs acquired by the students.


j) Calculate the average marks for each student.
k) Calculate the maximum marks of each student.
l) Count the number of tuples in a bag.

2) Load the file menu.csv (Category, Name, Price) and write one Pig script:
a) Which meals cost more than 30.00?
b) Which meals contain the word “Paneer”?
c) Which are the 10 most expensive meals?
d) For every day, what is the average price for a meal?
e) For every day, what is the most expensive meal?

3) Write a program to count words using Pig.


4) Write a Pig script to split customers for a reward program based on their lifetime values.
If the lifetime value is > 1000 and <= 2000, then Silver Program
If the lifetime value is > 20000, then Gold Program
Input:

Customers   Lifetime Value
Jack        25000
Smith       8000
David       12000
John        15000
Scott       12000
Lucy        28000
Ajay        12000
Vinay       30000
Joseph      21000
Joshi       25000

5) Create data files for the schemas below:
Order: CustomerId, ItemId, ItemName, OrderDate, DeliveryDate
Customer: CustomerId, CustomerName, Address, City, State, Country
a) Load the Order and Customer data.
b) Write a Pig Latin script to determine the number of items bought by each customer.
6) Do the following:
1. Create a file which contains a bag dataset as shown below.

User Id    From                   To
user1001   user1001@sample.com    {(user003@sample.com),(user004@sample.com),(user006@sample.com)}
user1002   user1002@sample.com    {(user005@sample.com),(user006@sample.com)}
user1003   user1003@sample.com    {(user001@sample.com),(user005@sample.com)}

2. Write a Pig Latin statement to display the names of all users who have sent emails, along with a list of the people they have sent the emails to.
3. Store the result in a file.


7) Create a UDF to convert names into uppercase.
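
A hedged sketch of such a UDF written in Python; it assumes Pig's CPython UDF support (streaming_python, available from Pig 0.12 onwards), and the file, alias and field names are illustrative only.

    # udf_upper.py -- Python UDF for Pig that uppercases a name (sketch).
    from pig_util import outputSchema      # helper shipped with Pig for streaming Python UDFs

    @outputSchema('upper_name:chararray')
    def to_upper(name):
        # return the given name in uppercase; pass nulls through unchanged
        if name is None:
            return None
        return name.upper()

    # Illustrative use from a Pig script (registration syntax may vary by version):
    #   REGISTER 'udf_upper.py' USING streaming_python AS myudfs;
    #   students = LOAD 'student.txt' USING PigStorage(',') AS (rollno:int, name:chararray);
    #   upper    = FOREACH students GENERATE myudfs.to_upper(name);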

Part V: Hive

• Install and configure Apache Hive
• SerDe and User-Defined Function creation in Hive using Java

Create a database, display the list of existing databases, describe a database, describe extended database, alter database properties, make a given database the current database, drop a database, create a managed table, create an external table, load data into a table, work with collection data types, query a table using SELECT, query collection data types, create a static partition and load data into it from the original table, static partition creation using ALTER, create a dynamic partition, load data into a dynamic partition, create a bucket, create a view, query a view, drop a view, sub-queries, joins, aggregation, GROUP BY and HAVING, RCFile implementation

1. Create data files for the schemas below:
Order: CustId, ItemId, ItemName, OrderDate, DeliveryDate
Customer: CustId, CustName, Address, City, State, Country

a) Create tables for the Order and Customer data.
b) Write a query in Hive Query Language to find the number of items bought by each customer.

2. Create a partitioned table for the Customer schema to reward customers based on their lifetime value.
Customer Id   Customers   Lifetime Value
1001          Jack        25000
1002          Smith       8000
1003          David       12000
1004          John        15000
1005          Scott       12000
1006          Lucy        28000
1007          Ajay        12000
1008          Vinay       30000
1009          Joseph      21000
1010          Joshi       25000

a) Create a partition table where the lifetime value is 12000.
b) Create partition tables for all lifetime values.

Note: Some of the practicals from the above practical list may have seemingly similar definitions. For better learning and good practice, it is advised that students do the maximum number of practicals. In the practical examination, the definitions asked need not have the same wording as given in the practical list; however, the definitions asked in the exams will be similar to the ones given in the practical list.

