Big Data Hadoop & Spark: Certification Training
In Collaboration with IBM
About Intellipaat
Key Features
Career Support
Program Curriculum
Self-paced Courses
Certification
Contact Us
About Intellipaat
Intellipaat is one of the leading e-learning training providers with more than 600,000
learners across 55+ countries. We are on a mission to democratize education as we
believe that everyone has the right to quality education.
Our courses are delivered by subject matter experts from top MNCs, and our world-class
pedagogy enables learners to grasp difficult topics quickly. Our 24/7 technical
support and career services will help them jump-start their careers at their dream
companies.
MOCK INTERVIEWS
Mock interviews to prepare you for cracking interviews with top employers
RESUME PREPARATION
Get assistance in creating a world-class resume from our career services team
The number of jobs for US data professionals will increase to 2.7 million per
year – IBM
Big Data is the fastest-growing and most promising technology for handling large
volumes of data for Data Analytics. This Big Data Hadoop training will get you
up and running with the most in-demand professional skills. Almost all top MNCs are moving
into Big Data Hadoop; hence, there is a huge demand for certified Big Data
professionals. Our Big Data online training will help you learn Big Data and upgrade your
career in the domain.
Big Data Hadoop Developers eager to learn other verticals such as testing,
analytics, and administration
4. INTRODUCTION TO HIVE
4.1 Introducing Hadoop Hive
4.2 Detailed architecture of Hive
4.3 Comparing Hive with Pig and RDBMS
4.4 Working with Hive Query Language
4.5 Creation of a database, table, group by, and other clauses
4.6 Various types of Hive tables and HCatalog
4.7 Storing Hive results, Hive partitioning, and buckets
Hands-on Exercise: Database creation in Hive, dropping a database, Hive table creation,
switching databases, data loading, dropping and altering a table, pulling data by
writing Hive queries with filter conditions, table partitioning in Hive, and using the GROUP BY
clause
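To give a flavor of what this exercise looks like in practice, here is a minimal Scala sketch that issues the same kind of HiveQL through Spark's Hive support; the retail database, the sales table, and the file path are hypothetical stand-ins rather than part of the course material.

```scala
import org.apache.spark.sql.SparkSession

object HiveBasicsSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets spark.sql() run the same HiveQL covered in the exercise.
    val spark = SparkSession.builder()
      .appName("HiveBasicsSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Create and switch to a database (names here are hypothetical).
    spark.sql("CREATE DATABASE IF NOT EXISTS retail")
    spark.sql("USE retail")

    // A partitioned table: data is split into directories by country.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales (id INT, amount DOUBLE)
        |PARTITIONED BY (country STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','""".stripMargin)

    // Load a local CSV file into one partition, then query with GROUP BY.
    spark.sql(
      """LOAD DATA LOCAL INPATH '/tmp/sales_in.csv'
        |INTO TABLE sales PARTITION (country='IN')""".stripMargin)
    spark.sql("SELECT country, SUM(amount) FROM sales GROUP BY country").show()

    // Altering and dropping objects, as covered in the exercise.
    spark.sql("ALTER TABLE sales RENAME TO sales_archive")
    spark.sql("DROP TABLE IF EXISTS sales_archive")
    spark.sql("DROP DATABASE IF EXISTS retail CASCADE")
    spark.stop()
  }
}
```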
Hands-on Exercise: How to work with Hive queries, the process of joining a table and
writing indexes, external table and sequence table deployment, and data storage in a
different table
6. INTRODUCTION TO PIG
6.1 Apache Pig introduction and its various features
6.2 Various data types and schema in Pig
6.3 The available functions in Pig: bags, tuples, and fields
Hands-on Exercise: Working with Flume for generating a sequence number and
consuming it, using Flume Agent to consume Twitter data, using AVRO to create a Hive
table, AVRO with Pig, creating a table in HBase, and deploying Disable, Scan, and Enable
table functions
Hands-on Exercise: Writing a Spark application using Scala and understanding the
robustness of Scala for the Spark real-time analytics operation
Hands-on Exercise: The resilient distributed dataset (RDD) in Spark and how it helps
speed up Big Data processing
Hands-on Exercise: Deploying RDDs with HDFS, using the in-memory dataset, using
files for RDDs, defining the base RDD from an external file, deploying RDDs via
transformations, using the map and reduce functions, and working on word count and
count log severity
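As an illustration of the word-count and log-severity tasks above, here is a minimal Scala sketch of the RDD transformations involved; the HDFS path and the log-line format are hypothetical assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddWordCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddWordCountSketch"))

    // Base RDD defined from an external file on HDFS (path is hypothetical).
    val lines = sc.textFile("hdfs:///data/app.log")

    // Classic word count: transform with flatMap/map, then reduce by key.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.take(10).foreach(println)

    // Count log lines by severity, assuming the level is the first token per line.
    val severityCounts = lines
      .map(_.split("\\s+")(0))
      .map(level => (level, 1))
      .reduceByKey(_ + _)
    severityCounts.collect().foreach(println)

    // An in-memory dataset turned into an RDD with parallelize.
    val inMemory = sc.parallelize(Seq("INFO", "WARN", "ERROR", "INFO"))
    println(inMemory.countByValue())

    sc.stop()
  }
}
```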
Hands-on Exercise: Data querying and transformation using DataFrames and finding out
the benefits of DataFrames over Spark SQL and Spark RDDs
Hands-on Exercise: Twitter Sentiment Analysis, streaming using netcat server, Kafka–
Spark Streaming, and Spark–Flume Streaming
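For the netcat streaming portion of this exercise, a minimal Scala sketch is shown below; the host, port, and batch interval are hypothetical, and the Kafka and Flume integrations require additional connector dependencies that are not shown here.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetcatStreamingSketch {
  def main(args: Array[String]): Unit = {
    // At least two threads: one for the socket receiver, one for processing.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetcatStreamingSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Text sent with `nc -lk 9999` arrives as a DStream of lines.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```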
Hands-on Exercise: Ensuring MapReduce file system recovery for different scenarios,
JMX monitoring of the Hadoop cluster, using logs and stack traces for monitoring and
troubleshooting, using the job scheduler for scheduling jobs in the same cluster,
understanding the MapReduce job submission flow and the FIFO scheduler, and getting
to know the Fair Scheduler and its configuration
25. TEST PLAN STRATEGY AND WRITING TEST CASES FOR TESTING
HADOOP APPLICATIONS
25.1 Test, install, and configure test cases
In this project, you will import data into HDFS using Sqoop for data analysis,
transferring it from an RDBMS into Hadoop. You will code in the Hive Query Language
to query and analyze the data, and you will come away with a working understanding of
Hive and Sqoop on completing this project.
You will create the top-ten-movies list using the MovieLens data. For this project,
you will use the MapReduce program to work on the data file and Apache Pig to analyze the data.
Bring daily incremental data into the Hadoop Distributed File System. As part of
the project, you will use Sqoop commands to bring the data into HDFS, work
with the end-to-end flow of transaction data, and read the data back from HDFS. You will work
on a live Hadoop YARN cluster and on the YARN central resource manager.
In this project, you will learn how to improve query speed using Hive data
partitioning. You will get hands-on experience in partitioning Hive tables manually,
using dynamic partitioning with a single SQL statement, and bucketing data to
break it into manageable chunks.
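As an illustration of the dynamic partitioning this project covers, here is a minimal Scala sketch issued through Spark's Hive support; the orders_raw source table and its columns are hypothetical stand-ins.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Let Hive derive the partition values from the query results themselves.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Target table partitioned by year; bucketing (CLUSTERED BY ... INTO n BUCKETS)
    // can additionally be declared in Hive to break partitions into smaller chunks.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS orders_part (id INT, amount DOUBLE)
        |PARTITIONED BY (yr INT)""".stripMargin)

    // A single INSERT ... SELECT fans rows out to the right partitions dynamically.
    spark.sql(
      """INSERT OVERWRITE TABLE orders_part PARTITION (yr)
        |SELECT id, amount, year(order_date) AS yr FROM orders_raw""".stripMargin)

    // Partition pruning: only the yr=2023 directory is scanned for this query.
    spark.sql("SELECT COUNT(*) AS orders_2023 FROM orders_part WHERE yr = 2023").show()
    spark.stop()
  }
}
```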
You will deploy ETL for data analysis activities. In this project, you will challenge
your working knowledge of ETL and Business Intelligence. You will configure
Pentaho to work with the Hadoop distribution and extract, transform, and load data into
the Hadoop cluster.
You will set up a real-time Hadoop cluster on Amazon EC2. The project will involve
installing and configuring Hadoop. You will run a multi-node Hadoop setup using a
4-node cluster on Amazon EC2 and deploy a MapReduce job on the Hadoop
cluster. Java will need to be installed as a prerequisite for running Hadoop.
In this project, you will be required to test MapReduce applications. You will write
JUnit tests using MRUnit for MapReduce applications. You will also mock
static methods using PowerMock and Mockito and implement MapReduceDriver
for testing map and reduce pairs.
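For orientation, here is a minimal Scala sketch of an MRUnit map-side test; the WordMapper class is a hypothetical stand-in for the project's own mapper, and the PowerMock/Mockito static-method mocking is not shown.

```scala
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper
import org.apache.hadoop.mrunit.mapreduce.MapDriver
import org.junit.Test

// A tiny word-count mapper; the real project would test its own mapper class.
class WordMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    value.toString.split("\\s+").foreach(w => context.write(new Text(w), one))
  }
}

class WordMapperTest {
  @Test
  def emitsOneCountPerWord(): Unit = {
    // MRUnit runs the mapper in isolation and checks the expected key/value pairs.
    MapDriver.newMapDriver(new WordMapper)
      .withInput(new LongWritable(0L), new Text("big data"))
      .withOutput(new Text("big"), new IntWritable(1))
      .withOutput(new Text("data"), new IntWritable(1))
      .runTest()
  }
}
```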
Hadoop Maintenance
Through this project, you will learn how to administer, maintain, and manage a Hadoop
cluster. You will work with the NameNode directory
structure, audit logging, DataNode block scanner, balancer, failover, fencing,
DistCp, and Hadoop file formats.
In this project, you will gauge the public's reaction to India's demonetization move by
analyzing tweets. You will download the tweets, load them into Pig storage, split the
tweets into words to calculate sentiment, rate the words from +5 to −5 using the AFINN
dictionary, filter them, and then analyze the sentiment.
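The project itself is done in Pig; purely as an illustration of AFINN-style scoring, here is an analogous Scala/Spark sketch, in which the file paths and tweet layout are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TweetSentimentSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TweetSentimentSketch").getOrCreate()
    import spark.implicits._

    // AFINN dictionary: one "word<TAB>score" pair per line, scores from -5 to +5.
    val afinn = spark.read.option("sep", "\t").csv("hdfs:///data/AFINN-111.txt")
      .toDF("word", "score")
      .withColumn("score", $"score".cast("int"))

    // One tweet per line; split each tweet into words, keeping a tweet id.
    val tweets = spark.read.textFile("hdfs:///data/tweets.txt")
      .withColumn("id", monotonically_increasing_id())
      .withColumn("word", explode(split(lower($"value"), "\\s+")))

    // Join words against the dictionary and average the scores per tweet.
    val sentiment = tweets.join(afinn, Seq("word"))
      .groupBy("id")
      .agg(avg("score").as("sentiment"))
    sentiment.orderBy(desc("sentiment")).show(10)
    spark.stop()
  }
}
```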
This project will require you to analyze an entire cricket match and retrieve any detail of
it. You will need to load the IPL dataset into HDFS and then analyze the data
using Apache Pig or Hive. Based on user queries, the system will have to return
the right output.
Movie Recommendation
In this project, you will recommend the most appropriate movie to a user based
on their taste. This is a hands-on Apache Spark project, which includes
collaborative filtering, regression, clustering, and dimensionality reduction. You will
make use of the Apache Spark MLlib component and statistical analysis.
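As a sketch of the collaborative-filtering step, here is a minimal Scala example using Spark MLlib's ALS; the ratings file path and column names are hypothetical MovieLens-style assumptions.

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

object MovieRecommendationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MovieRecommendationSketch").getOrCreate()

    // Expected columns: userId, movieId, rating.
    val ratings = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/ratings.csv")

    // Alternating least squares learns latent factors for users and movies.
    val als = new ALS()
      .setUserCol("userId")
      .setItemCol("movieId")
      .setRatingCol("rating")
      .setRank(10)
      .setMaxIter(10)
      .setRegParam(0.1)
    val model = als.fit(ratings)

    // Ten recommendations per user, i.e., each user's "most appropriate" movies.
    model.recommendForAllUsers(10).show(truncate = false)
    spark.stop()
  }
}
```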
Here, you will analyze user sentiment based on tweets. In this Twitter analysis
project, you will integrate the Twitter API and use Python or PHP to develop the
essential server-side code. You will carry out filtering, parsing, and aggregation,
depending on the tweet analysis requirements.
In this project, you will make use of the Spark SQL tool to analyze the
Wikipedia dataset. You will integrate Spark SQL for batch analysis, work
with machine learning, visualize and process data, and run ETL processes, along
with real-time data analysis.
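To illustrate the batch-analysis side of this project, here is a minimal Scala sketch with Spark SQL; the input path and the page/view columns are hypothetical stand-ins for the actual Wikipedia dataset.

```scala
import org.apache.spark.sql.SparkSession

object WikipediaSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WikipediaSqlSketch").getOrCreate()

    // Load the dump (assumed here to be line-delimited JSON) and expose it to SQL.
    val pages = spark.read.json("hdfs:///data/wikipedia/pages.json")
    pages.createOrReplaceTempView("wiki")

    // Plain SQL over the temp view: top pages by view count per language.
    spark.sql(
      """SELECT language, title, SUM(views) AS total_views
        |FROM wiki
        |GROUP BY language, title
        |ORDER BY total_views DESC
        |LIMIT 20""".stripMargin).show(truncate = false)

    // The same result can be written back out as part of a batch ETL step.
    spark.sql("SELECT language, COUNT(*) AS pages FROM wiki GROUP BY language")
      .write.mode("overwrite").parquet("hdfs:///out/wiki_pages_by_language")

    spark.stop()
  }
}
```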
Joel Bassa
I am really thankful to Intellipaat for the Hadoop Architect course with Big Data certification.
First of all, the team supported me in finding the best Big Data online course based on my
experience and current assignment.
Kevin K Wada
Thank you very much for your top-class service. A special mention should
be made for your patience in listening to my queries and giving me a
solution, which was exactly what I was looking for. I am giving you a 10 on
10!
Sampson Basoah
The Intellipaat team helped me in selecting the perfect course that suits my
profile. The whole course was practically oriented, and the trainers are
always ready to answer any question. I found this course to be impactful.
Thank you.
Paschal Ositadima
This is to convey my deepest gratitude to Intellipaat. The quality and methodology
of this online Hadoop training were matchless. The self-study program for Big Data Hadoop
training, for which I had enrolled, ticked off all the right boxes. I had access to free tutorials
and videos to help me in my learning endeavor. A special mention must be made regarding the promptness
and enthusiasm that Intellipaat showed when it comes to query resolution and doubt clearance. Kudos!
Rich Baker
Intellipaat’s Hadoop tutorial delivered more than what they had promised to me. Since I
had undergone a previous Hadoop training course, I was quite familiar with Big Data
Hadoop concepts, but Intellipaat took it to a different level with their attention to detail and
Hadoop domain expertise. I recommend this training
to everybody. You will learn everything from basic Hadoop concepts to advanced Big Data technology
deployment. I am more than satisfied with this training. Thank you, Intellipaat!
Bangalore
AMR Tech Park 3, Ground Floor, Tower B,
Hongasandra Village, Bommanahalli,
Hosur Road, Bangalore – 560068
USA
1219 E. Hillsdale Blvd. Suite 205,
Foster City, CA 94404
If you have any further queries or just want to have a conversation with us, then do call us.