
Big Data-Spark Lab Syllabus

The document outlines the syllabus for the CS606PC Big Data-Spark course at JNTU Hyderabad, focusing on processing Big Data using Spark and Hadoop. It includes course objectives, outcomes, a list of experiments, and recommended textbooks and web links. Key topics covered include MapReduce programming, Hive queries, Spark SQL, and PySpark operations.

R22 B.Tech. CSE Syllabus JNTU Hyderabad

CS606PC: BIG DATA-SPARK

B.Tech. III Year II Sem.    L T P C: 0 0 4 2
Course Objectives:
 The main objective of the course is to process Big Data with advanced architectures such as Spark, including the handling of streaming data in Spark.

Course Outcomes:
 Develop MapReduce programs to analyze large datasets using Hadoop and Spark
 Write Hive queries to analyze large datasets
 Outline the Spark ecosystem and its components
 Perform the filter, count, distinct, map and flatMap RDD operations in Spark
 Build queries using Spark SQL
 Apply Spark joins on sample datasets
 Use Sqoop to import and export data between Hadoop and a relational database

List of Experiments:
1. Study of Big Data analytics and Hadoop architecture
(i) Know the concept of Big Data architecture
(ii) Know the concept of Hadoop architecture

2. Loading a dataset into HDFS for Spark analysis: installation of Hadoop and cluster management (a sample sketch follows the list)
(i) Installing a Hadoop single-node cluster in an Ubuntu environment
(ii) Knowing the difference between single-node and multi-node clusters
(iii) Accessing the web UI and its port number
(iv) Installing and accessing environments such as Hive and Sqoop
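Below is a minimal sketch of the loading step, assuming a running single-node Hadoop cluster with the default filesystem URI; the local file name sales.csv and the /data directory are placeholders.

```python
# Copy a local CSV into HDFS with the hdfs CLI, then read it back with Spark.
import subprocess
from pyspark.sql import SparkSession

# Put the local file into HDFS (the hdfs command ships with Hadoop).
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/data"], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", "sales.csv", "/data/sales.csv"], check=True)

# Read the file from HDFS for Spark analysis.
spark = SparkSession.builder.appName("LoadDatasetIntoHDFS").getOrCreate()
df = spark.read.csv("hdfs:///data/sales.csv", header=True, inferSchema=True)
df.printSchema()
print(df.count())
spark.stop()
```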

3. File management tasks and basic Linux commands (a sample sketch follows the list)
(i) Creating a directory in HDFS
(ii) Moving back and forth between directories
(iii) Listing directory contents
(iv) Uploading and downloading a file in HDFS
(v) Checking the contents of a file
(vi) Copying and moving files
(vii) Copying and moving files between the local and HDFS environments
(viii) Removing files and paths
(ix) Displaying a few lines of a file
(x) Displaying the aggregate length of a file
(xi) Checking the permissions of a file
(xii) Zipping and unzipping files, with and without permissions, and pasting them to a location
(xiii) Copy and paste commands
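These commands are normally typed at the shell; the sketch below drives the same hdfs dfs subcommands from Python so they stay in one runnable file. The /user/student path and notes.txt are placeholders, and only a subset of the tasks above is covered.

```python
# Run a selection of the HDFS file-management tasks via the hdfs CLI.
import subprocess

def hdfs(*args):
    """Run an 'hdfs dfs' subcommand and print whatever it returns."""
    result = subprocess.run(["hdfs", "dfs", *args], capture_output=True, text=True)
    print(result.stdout or result.stderr)

hdfs("-mkdir", "-p", "/user/student/lab3")                       # (i) create a directory
hdfs("-ls", "/user/student")                                     # (iii) list directory contents
hdfs("-put", "-f", "notes.txt", "/user/student/lab3/")           # (iv) upload a local file
hdfs("-get", "/user/student/lab3/notes.txt", "notes_copy.txt")   # (iv) download it again
hdfs("-cat", "/user/student/lab3/notes.txt")                     # (v) check the file contents
hdfs("-cp", "/user/student/lab3/notes.txt", "/user/student/notes_cp.txt")  # (vi) copy
hdfs("-mv", "/user/student/notes_cp.txt", "/user/student/notes_mv.txt")    # (vi) move
hdfs("-tail", "/user/student/lab3/notes.txt")                    # (ix) last part of the file
hdfs("-du", "-s", "/user/student/lab3/notes.txt")                # (x) aggregate length
hdfs("-ls", "/user/student/lab3")                                # (xi) permissions appear in the listing
hdfs("-rm", "-r", "/user/student/lab3")                          # (viii) remove files and paths
```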

4. MapReduce (a sample sketch follows the list)
(i) Definition of MapReduce
(ii) Its stages and terminology
(iii) Word-count program to understand MapReduce (mapper phase, reducer phase, driver code)
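The word count is usually written as Java Mapper, Reducer and Driver classes; as a hedged alternative, the sketch below implements the same two phases in Python in the Hadoop Streaming style (the file name wc.py is a placeholder).

```python
# Word count in the Hadoop Streaming style: the mapper emits "word<TAB>1"
# pairs and the reducer sums the counts for each word from sorted input.
import sys

def mapper():
    # Mapper phase: one (word, 1) pair per word of every input line.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer():
    # Reducer phase: input arrives sorted by key, so counts for the same
    # word are adjacent and can be summed in a single pass.
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

if __name__ == "__main__":
    # Choose the phase on the command line: "python wc.py map" or "python wc.py reduce".
    mapper() if sys.argv[1] == "map" else reducer()
```

The script is submitted through the hadoop-streaming jar, with the mapper and reducer options pointing at the two phases; the driver role is played by the streaming jar itself.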
5. Implementing matrix multiplication with Hadoop MapReduce
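As a hedged illustration, the sketch below performs the same map, join-on-the-shared-dimension and reduce steps with PySpark RDDs instead of raw Hadoop MapReduce; the two 2x2 toy matrices are placeholders for data that would normally sit in HDFS.

```python
# MapReduce-style matrix multiplication: A is m x n, B is n x p,
# both stored as (row, col, value) triples.
from pyspark import SparkContext

sc = SparkContext(appName="MatrixMultiply")

A = sc.parallelize([(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)])  # 2 x 2
B = sc.parallelize([(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)])  # 2 x 2

# Map phase: key both matrices by the shared dimension k.
a_by_k = A.map(lambda t: (t[1], (t[0], t[2])))   # (k, (i, A[i][k]))
b_by_k = B.map(lambda t: (t[0], (t[1], t[2])))   # (k, (j, B[k][j]))

# Join on k, emit partial products keyed by the output cell (i, j), then sum.
product = (
    a_by_k.join(b_by_k)
          .map(lambda kv: ((kv[1][0][0], kv[1][1][0]), kv[1][0][1] * kv[1][1][1]))
          .reduceByKey(lambda x, y: x + y)
)

print(sorted(product.collect()))  # [((0, 0), 19.0), ((0, 1), 22.0), ((1, 0), 43.0), ((1, 1), 50.0)]
sc.stop()
```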

6. Compute the average salary and total salary by gender for an enterprise.
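A minimal DataFrame sketch for this experiment, assuming an employees.csv file in HDFS with gender and salary columns (both the path and the column names are placeholders):

```python
# Average and total salary by gender with Spark DataFrames.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("SalaryByGender").getOrCreate()
emp = spark.read.csv("hdfs:///data/employees.csv", header=True, inferSchema=True)

result = emp.groupBy("gender").agg(
    F.avg("salary").alias("avg_salary"),
    F.sum("salary").alias("total_salary"),
)
result.show()
spark.stop()
```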


7. (i) Creating Hive tables (external and internal)
(ii) Loading data into external Hive tables from SQL tables (or structured CSV files) using Sqoop
(iii) Performing operations such as filtering and updating
(iv) Performing joins (inner, outer, etc.)
(v) Writing user-defined functions on Hive tables (a sample sketch follows the list)
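The statements are normally run in the hive or beeline shell; as a hedged sketch, the same HiveQL can also be issued from PySpark with Hive support enabled. All table names, columns and the HDFS location below are placeholders, and the Sqoop import is assumed to have been done separately.

```python
# Hive DDL, a filter, a join and a simple UDF, driven through spark.sql().
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("HiveTables").enableHiveSupport().getOrCreate()

# (i) internal (managed) and external tables
spark.sql("CREATE TABLE IF NOT EXISTS emp_internal (id INT, name STRING, salary DOUBLE)")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS emp_external (id INT, name STRING, salary DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/emp_external'
""")

# (iii) filtering; (iv) a join against a hypothetical dept table
spark.sql("SELECT * FROM emp_external WHERE salary > 50000").show()
spark.sql("""
    SELECT e.name, d.dept_name
    FROM emp_external e JOIN dept d ON e.id = d.emp_id
""").show()

# (v) a user-defined function applied to a Hive table
spark.udf.register("to_upper", lambda s: s.upper() if s else None)
spark.sql("SELECT to_upper(name) AS upper_name FROM emp_external").show()
spark.stop()
```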

8. Create SQL tables for employees: an Employee table (id, designation) and a Salary table (salary, dept id). Create external tables in Hive with schemas similar to the above tables, move the data to Hive using Sqoop and load the contents into the tables, filter into a new table, and write a UDF to encrypt the table with the AES algorithm; decrypt it with the key to show the contents (a sketch of the encryption step follows).
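The sketch below covers only the encrypt/decrypt step and assumes Spark 3.3 or later, where the SQL functions aes_encrypt and aes_decrypt are available; the 16-character key, the emp_external table and the column names are placeholders, and the Sqoop import and Hive DDL are done as in the previous experiment.

```python
# Encrypt a column of a Hive table with AES, save it, then decrypt it again.
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.appName("AesEncryptDemo").enableHiveSupport().getOrCreate()
key = "0123456789abcdef"  # placeholder 16-byte AES key

emp = spark.table("emp_external")  # Hive table previously loaded via Sqoop
encrypted = emp.withColumn("name_enc", expr(f"aes_encrypt(name, '{key}')")).drop("name")
encrypted.write.mode("overwrite").saveAsTable("emp_encrypted")

# Decrypt with the same key to show the contents again.
decrypted = spark.table("emp_encrypted").withColumn(
    "name", expr(f"cast(aes_decrypt(name_enc, '{key}') as string)")
)
decrypted.show()
spark.stop()
```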

9. (i) PySpark definition (Apache PySpark) and the differences between PySpark, Scala and pandas
(ii) PySpark files and class methods
(iii) get(filename)
(iv) getRootDirectory() (a sample sketch follows the list)
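The two methods belong to pyspark.SparkFiles and resolve files shipped to the executors with SparkContext.addFile(); in the sketch below, config.txt is a placeholder local file.

```python
# SparkFiles.get() and SparkFiles.getRootDirectory() in action.
from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="SparkFilesDemo")
sc.addFile("config.txt")                 # ship a local file to every executor

print(SparkFiles.get("config.txt"))      # absolute path of the shipped copy
print(SparkFiles.getRootDirectory())     # root directory holding all added files

with open(SparkFiles.get("config.txt")) as f:
    print(f.read())
sc.stop()
```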

10. PySpark RDDs (a sample sketch follows the list)
(i) What are RDDs?
(ii) Ways to create RDDs
(iii) Parallelized collections
(iv) External datasets
(v) Existing RDDs
(vi) Spark RDD operations (count(), foreach(), collect(), join(), cache())
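A minimal sketch of the three ways of creating RDDs and the listed operations; the HDFS path is a placeholder.

```python
from pyspark import SparkContext

sc = SparkContext(appName="RddBasics")

# (iii) parallelized collection
nums = sc.parallelize([1, 2, 3, 4, 5])

# (iv) external dataset
lines = sc.textFile("hdfs:///data/sample.txt")

# (v) a new RDD derived from an existing one
squares = nums.map(lambda x: x * x)

# (vi) operations
print(nums.count())                      # count
nums.foreach(lambda x: None)             # foreach runs a function on every element (on the executors)
print(squares.collect())                 # collect
pairs_a = sc.parallelize([("a", 1), ("b", 2)])
pairs_b = sc.parallelize([("a", 10), ("b", 20)])
print(pairs_a.join(pairs_b).collect())   # join on the key
squares.cache()                          # cache keeps the RDD in memory for reuse
sc.stop()
```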

11. Perform PySpark transformations (a sample sketch follows the list)
(i) map and flatMap
(ii) Removing the words that are not necessary for analysing the text
(iii) groupBy
(iv) Calculating how many times each word occurs in the corpus
(v) Performing a task (say, counting the words 'spark' and 'apache' in rdd3) separately on each partition and collecting the output of the task performed on those partitions
(vi) Unions of RDDs
(vii) Joining two pair RDDs on their keys
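A minimal sketch of these transformations on a toy text RDD; the sentences, the stop-word list and the rdd3 analogue are placeholders.

```python
from pyspark import SparkContext

sc = SparkContext(appName="Transformations")
rdd = sc.parallelize(["apache spark is fast", "spark runs on hadoop", "apache hive"])

# (i) map keeps one output per line; flatMap flattens each line into words
words = rdd.flatMap(lambda line: line.split())

# (ii) remove words not needed for the analysis
stop_words = {"is", "on"}
filtered = words.filter(lambda w: w not in stop_words)

# (iii) groupBy the words by their first letter
print(sorted((k, sorted(v)) for k, v in filtered.groupBy(lambda w: w[0]).collect()))

# (iv) count how many times each word occurs in the corpus
counts = filtered.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
print(counts.collect())

# (v) run a task separately on each partition: count 'spark'/'apache' per partition
per_partition = words.mapPartitions(
    lambda it: [sum(1 for w in it if w in ("spark", "apache"))]
)
print(per_partition.collect())

# (vi) union of two RDDs; (vii) join two pair RDDs on their key
extra = sc.parallelize(["spark streaming"]).flatMap(lambda line: line.split())
print(words.union(extra).count())
print(counts.join(sc.parallelize([("spark", "engine"), ("hive", "warehouse")])).collect())
sc.stop()
```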

12. PySpark SparkConf: attributes and applications (a sample sketch follows the list)
(i) What is PySpark SparkConf()?
(ii) Using SparkConf, create a Spark session, read the details in a CSV file into a DataFrame, and later move that CSV to another location
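A minimal sketch of (ii), assuming placeholder HDFS paths for the source and destination CSV:

```python
# Build a SparkConf, start a SparkSession from it, read a CSV and write it elsewhere.
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("SparkConfDemo").setMaster("local[*]")
spark = SparkSession.builder.config(conf=conf).getOrCreate()

# A few of the attributes held by the active configuration.
for k, v in spark.sparkContext.getConf().getAll():
    print(k, "=", v)

df = spark.read.csv("hdfs:///data/details.csv", header=True, inferSchema=True)
df.show(5)

# "Move" the data by writing it out as CSV to another location.
df.write.mode("overwrite").option("header", True).csv("hdfs:///data/details_copy")
spark.stop()
```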

TEXT BOOKS:
1. Spark in Action, Marko Bonaci and Petar Zecevic, Manning.
2. PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes, Raju Kumar Mishra and
Sundar Rajan Raman, Apress Media.

WEB LINKS:
1. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01330150584451891225182_shared/overview
2. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01258388119638835242_shared/overview
3. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_0126052684230082561692_shared/overview

