Big Data-Spark Lab Syllabus
Course Outcomes:
Develop MapReduce programs to analyze large datasets using Hadoop and Spark
Write Hive queries to analyze large datasets
Outline the Spark ecosystem and its components
Perform the filter, count, distinct, map, and flatMap RDD operations in Spark
Build queries using Spark SQL
Apply Spark joins on sample datasets
Make use of Sqoop to import and export data from Hadoop to a database and vice versa
List of Experiments:
1. Study of Big Data Analytics and Hadoop Architecture
(i) Understand the concept of big data architecture
(ii) Understand the concept of Hadoop architecture
4. MapReduce
(i) Definition of MapReduce
(ii) Its stages and terminology
(iii) Word-count program to understand MapReduce (Mapper phase, Reducer phase, Driver code)
5. Implementing Matrix Multiplication with Hadoop MapReduce
8. (i) Create SQL tables for employees: an Employee table (id, designation) and a Salary table (salary, dept id)
(ii) Create external tables in Hive with the same schema as the tables above
(iii) Move the data to Hive using Sqoop and load the contents into the tables
(iv) Filter the data into a new table and write a UDF to encrypt the table with the AES algorithm, then decrypt it with the key to show the contents
9. (i) Definition of PySpark (Apache PySpark) and the differences between PySpark, Scala, and pandas
(ii) PySpark SparkFiles and its class methods
(iii) get(filename)
(iv) getRootDirectory()
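For Experiments 4(iii) and 5, the mapper, reducer, and driver roles can be sketched in plain Python before moving to a cluster. This is a minimal in-memory illustration, not Hadoop's API: the helper run_mapreduce, the function names, and the sample inputs are all assumptions made for this sketch.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical helper: one in-memory pass of map, shuffle, and reduce."""
    groups = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            groups[key].append(value)       # shuffle: group values by key
    # reduce phase: one reducer call per distinct key
    return {key: reducer(key, values) for key, values in groups.items()}


# Experiment 4(iii): word count
def wc_mapper(line):
    for word in line.lower().split():
        yield word, 1                       # emit (word, 1) for each word


def wc_reducer(word, counts):
    return sum(counts)                      # total occurrences of the word


# Experiment 5: C = A x B, with entries given as (matrix, row, col, value)
def make_matmul_mapper(n_rows_a, n_cols_b):
    def mapper(entry):
        name, row, col, value = entry
        if name == "A":                     # A[i][k] is needed by every C[i][j]
            for j in range(n_cols_b):
                yield (row, j), ("A", col, value)
        else:                               # B[k][j] is needed by every C[i][j]
            for i in range(n_rows_a):
                yield (i, col), ("B", row, value)
    return mapper


def matmul_reducer(cell, tagged):
    a = {k: v for name, k, v in tagged if name == "A"}
    b = {k: v for name, k, v in tagged if name == "B"}
    return sum(a[k] * b[k] for k in a if k in b)   # dot product over shared k


def driver():
    """Driver code: wires sample inputs (assumed data) to the two jobs."""
    lines = ["big data spark", "spark and hadoop", "big spark"]
    word_counts = run_mapreduce(lines, wc_mapper, wc_reducer)

    # A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]]
    entries = [("A", 0, 0, 1), ("A", 0, 1, 2), ("A", 1, 0, 3), ("A", 1, 1, 4),
               ("B", 0, 0, 5), ("B", 0, 1, 6), ("B", 1, 0, 7), ("B", 1, 1, 8)]
    product = run_mapreduce(entries, make_matmul_mapper(2, 2), matmul_reducer)
    return word_counts, product


if __name__ == "__main__":
    print(driver())
```

In a real Hadoop job the shuffle step is performed by the framework between the Mapper and Reducer classes; here it is the dictionary grouping inside run_mapreduce.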
TEXT BOOKS:
1. Spark in Action, Marko Bonaci and Petar Zecevic, Manning.
2. PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes, Raju Kumar Mishra and Sundar Rajan Raman, Apress Media.
WEB LINKS:
1. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01330150584451891225182_shared/overview
2. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01258388119638835242_shared/overview
3. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_0126052684230082561692_shared/overview