CCA-175 Docs and Projects
Each CCA question requires you to solve a particular scenario. In some cases, a tool such as Impala or Hive may be used. In other cases, coding is required. To reduce development time on Spark questions, a template is often provided that contains a skeleton of the solution, asking the candidate to fill in the missing lines with functional code. The template is written in either Scala or Python.
You are not required to use the template and may solve the scenario using a language you prefer. Be aware,
however, that coding every problem from scratch may take more time than is allocated for the exam.
Your exam is graded immediately upon submission and you are e-mailed a score report the same day as your
exam. Your score report displays the problem number for each problem you attempted and a grade on that
problem. If you fail a problem, the score report includes the criteria you failed (e.g., “Records contain
incorrect data” or “Incorrect file format”). We do not report more information in order to protect the exam
content.
If you pass the exam, you receive a second e-mail within a few days of your exam with your digital certificate as a PDF, your license number, a LinkedIn profile update, and a link to download your CCA logos for use in your personal business collateral and social media profiles.
Required Skills
Data Ingest
The skills required to transfer data between external systems and your cluster. This includes the following:
•Change the delimiter and file format of data during import using Sqoop
•Load data into and out of HDFS using the Hadoop File System commands (both skills are sketched below)
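A hedged sketch of both ingest skills; the connection string, database, table, and paths are placeholders, not exam values:

sqoop import \
  --connect jdbc:mysql://dbhost/retail_db \
  --username dbuser --password dbpass \
  --table orders \
  --fields-terminated-by '\t' \
  --as-avrodatafile \
  --target-dir /user/simplilearn/orders_avro

hdfs dfs -put localdata.csv /user/simplilearn/      # load a local file into HDFS
hdfs dfs -get /user/simplilearn/orders_avro .       # copy HDFS data back to local disk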
Transform, Stage, and Store
Convert a set of data values in a given format stored in HDFS into new data values or a new data format and
write them into HDFS.
•Write the results from an RDD back into HDFS using Spark
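A minimal sketch of this skill, assuming hypothetical input and output paths:

val lines = sc.textFile("/user/simplilearn/input")   // hypothetical input path
val upper = lines.map(_.toUpperCase)                 // any transformation
upper.saveAsTextFile("/user/simplilearn/output")     // writes part files back into HDFS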
Use Spark SQL to interact with the metastore programmatically in your applications. Generate reports by
using queries against loaded data.
•Use metastore tables as an input source or an output sink for Spark applications
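For example, a hedged Spark 2.x sketch, assuming a SparkSession named spark with Hive support enabled (the database and table names are hypothetical):

val df = spark.table("xyz.simplilearn3")                        // metastore table as an input source
val report = df.filter("salary > 10000")                        // generate a report from the loaded data
report.write.mode("overwrite").saveAsTable("xyz.high_salary")   // metastore table as an output sink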
This is a practical exam and the candidate should be familiar with all aspects of generating a result, not just
writing code.
•Supply command-line options to change your application configuration, such as increasing available
memory
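For instance, memory can be raised at submission time; the values and application file below are illustrative only:

spark-submit --master yarn \
  --driver-memory 2G \
  --executor-memory 4G \
  --num-executors 4 \
  myapp.py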
TEST YOURSELF:
1. Write the missing Spark code in the given program to sort by the Name column.
Output: Array((E04,Amer), (E05,Ankit), (E08,Deshdeep), (E02,Karthik), (E09,Kumar), (E03,Rakesh), (E06,Roopesh), (E01,Shivank), (E07,Tejas), (E10,Venkat))
Program
val emp = sc.textFile("/user/simplilearn/Employee")
val pairRDD = emp.map(x => (x.split(",")(0), x.split(",")(1)))
<Write your code>
Answer
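// Swap each pair to (name, id) so the name becomes the key, sort by that key, then swap back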
val swap1 = pairRDD.map(_.swap).sortByKey().map(_.swap)
2. Create a database named "XYZ" or, if it is already created, use the existing database. Write a Hive DDL script to create a table named "simplilearn3", load a dataset in the given format into the table, and complete the following requirement:
Write a Hive query for employees who have a salary of more than 10,000.
Format:
Sl. No, Name, Age, Salary
Emp001, John, 34, 20000
Paste the create table syntax in the given space.
Answer
select * from simplilearn3 where salary > 10000;
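The question also asks for the create-table syntax, which this answer omits; a minimal Hive DDL sketch matching the stated format (the column names, types, delimiter, and load path are assumptions):

create database if not exists XYZ;
use XYZ;
create table if not exists simplilearn3 (
  sl_no string,
  name string,
  age int,
  salary int
)
row format delimited fields terminated by ',';
load data inpath '/user/simplilearn/employee.csv' into table simplilearn3; -- hypothetical path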
4. Create a database named "XYZ" or, if it is already created, use the existing database. Write a Hive DDL script to create a table named "simplilearn1" and load some sample data into the table in the format given below:
Name Sex Age Father_Name
Example:
Anupam Male 45 Daulat
Paste the create table syntax in the given space.
Answer
create table if not exists simplilearn1(
  name string,
  business_places array<string>,
  sex_age struct<sex:string,age:int>,
  fathername_nuofchild map<string,int>
)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':';
6. Execute the following Python program, using four local threads to count the words.
Program:
# create a program wordcount.py
import sys
from operator import add
from pyspark import SparkContext
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print >> sys.stderr, "Usage: wordcount <file>"
        exit(-1)
    sc = SparkContext(appName="PythonWordCount")
    lines = sc.textFile(sys.argv[1], 1)
    <Write your code>
    <Write your code>
    <Write your code>
    output = counts.collect()
    for (word, count) in output:
        print "%s: %i" % (word, count)
    sc.stop()
Answer
counts = lines.flatMap(lambda x: x.split(' ')) \
              .map(lambda x: (x, 1)) \
              .reduceByKey(add)
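To meet the four-local-threads requirement, one way to launch the program is shown below (the input path is a placeholder):

spark-submit --master local[4] wordcount.py /user/simplilearn/input.txt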
7. Find the missing code in the Scala program to display the output in the following format.
Output: Array[(Int, String)] = Array((4,anar), (5,applelichi), (6,bananagrapes), (7,oranges))
Program
val a = sc.parallelize(List("apple","banana","oranges","grapes","lichi","anar"))
val b = a.map(x =>(x.length,x))
<Write your code>
Answer
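// foldByKey starts from the empty string and concatenates all names that share the same length key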
b.foldByKey("")(_+_).collect
8. Write the missing code in the given program to identify animals whose names have four letters, producing the expected output.
Output: Array((4,lion))
Program
val a = sc.parallelize(List("dog","tiger","lion","cat","spider","eagle"),2)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("ant","falcon","squid"),2)
val d = c.keyBy(_.length)
<Write your code>
Answer
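// subtractByKey drops pairs from b whose key (the name length) also appears in d, leaving only (4,lion)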
b.subtractByKey(d).collect
11. Write the missing code in the given Scala program to display the output in the format below.
Output: Map(5 -> 1, 1 -> 6, 6 -> 1, 2 -> 3, 7 -> 1, 3 -> 1, 8 -> 1, 4 -> 2)
Program
val b = sc.parallelize(List(1,2,3,4,5,6,7,8,2,4,2,1,1,1,1,1))
< Write your code>
Answer
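// countByValue returns a driver-side Map from each element to its number of occurrences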
b.countByValue()
Projects:
https://drive.google.com/drive/folders/0B9tN1aTNNV0RLWhLOTdWSHN5NXM?usp=sharing