
Databricks Exam Guide

Databricks Certified Data Engineer Associate
Provide Exam Guide Feedback

Purpose of this Exam Guide


The purpose of this exam guide is to give you an overview of the exam and what is covered on the
exam to help you determine your exam readiness. This document is updated whenever there are
changes to an exam (and notes when those changes take effect) so that you can be prepared. This
version covers the currently live exam as of April 1, 2024. Please check back two weeks before you
take your exam to make sure you have the most current version.

Audience Description
The Databricks Certified Data Engineer Associate certification exam assesses an individual’s ability
to use the Databricks Lakehouse Platform to complete introductory data engineering tasks. This
includes an understanding of the Lakehouse Platform and its workspace, its architecture, and its
capabilities. It also assesses the ability to perform multi-hop architecture ETL tasks using Apache
Spark SQL and Python in both batch and incrementally processed paradigms. Finally, the exam
assesses the tester’s ability to put basic ETL pipelines and Databricks SQL queries and dashboards
into production while maintaining entity permissions. Individuals who pass this certification exam
can be expected to complete basic data engineering tasks using Databricks and its associated
tools.

About the Exam


● Number of items: 45 multiple-choice questions
● Time limit: 90 minutes
● Registration fee: USD 200, plus applicable taxes as required per local law
● Delivery method: Online Proctored
● Test aids: none allowed.
● Prerequisite: None required; course attendance and six months of hands-on experience in Databricks are highly recommended.
● Validity: 2 years
● Recertification: Recertification is required every two years to maintain your certified status.
To recertify, you must take the full exam that is currently live. Please review the “Getting
Ready for the Exam” section on the exam webpage to prepare for taking the exam again.
● Unscored Content: Exams may include unscored items to gather statistical information for
future use. These items are not identified on the form and do not impact your score.
Additional time is factored in to account for this content.

Recommended Training
● Instructor-led: Data Engineering with Databricks
● Self-paced: Data Engineering with Databricks (available in Databricks Academy)

Exam outline

Section 1: Databricks Lakehouse Platform


● Describe the relationship between the data lakehouse and the data warehouse.
● Identify the improvement in data quality in the data lakehouse over the data lake.
● Compare and contrast silver and gold tables; identify which workloads will use a bronze table as a source and which workloads will use a gold table as a source.
● Identify elements of the Databricks Platform Architecture, such as what is located in the data plane versus the control plane and what resides in the customer’s cloud account.
● Differentiate between all-purpose clusters and jobs clusters.
● Identify how cluster software is versioned using the Databricks Runtime.
● Identify how clusters can be filtered to view those that are accessible by the user.
● Describe how clusters are terminated and the impact of terminating a cluster.
● Identify a scenario in which restarting the cluster will be useful.
● Describe how to use multiple languages within the same notebook.
● Identify how to run one notebook from within another notebook.
● Identify how notebooks can be shared with others.
● Describe how Databricks Repos enables CI/CD workflows in Databricks.
● Identify Git operations available via Databricks Repos.
● Identify limitations in Databricks Notebooks version control functionality relative to Repos.

Section 2: ELT with Apache Spark


● Extract data from a single file and from a directory of files.
● Identify the prefix included after the FROM keyword as the data type.
● Create a view, a temporary view, and a CTE as a reference to a file.
● Identify that tables from external sources are not Delta Lake tables.
● Create a table from a JDBC connection and from an external CSV file.
● Identify how the count_if function and counting where x is null can be used.
● Identify how count(col) skips NULL values.
● Deduplicate rows from an existing Delta Lake table.
● Create a new table from an existing table while removing duplicate rows.
● Deduplicate a row based on specific columns.
● Validate that the primary key is unique across all rows.
● Validate that a field is associated with just one unique value in another field.
● Validate that a value is not present in a specific field.
● Cast a column to a timestamp.
● Extract calendar data from a timestamp.
● Extract a specific pattern from an existing string column.
● Utilize the dot syntax to extract nested data fields.
● Identify the benefits of using array functions.
● Parse JSON strings into structs.
● Identify which result will be returned based on a join query.
● Identify a scenario to use the explode function versus the flatten function.
● Identify the PIVOT clause as a way to convert data from a long format to a wide format.
● Define a SQL UDF.
● Identify the location of a function.
● Describe the security model for sharing SQL UDFs.
● Use CASE/WHEN in SQL code.
● Leverage CASE/WHEN for custom control flow.
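
To make several of the objectives above concrete, the following Spark SQL sketch works against a hypothetical directory of JSON files at /mnt/raw/events/ with columns such as amount, coupon, event_ts, a struct column profile, an array column items, and a JSON string column payload. All of these names are illustrative assumptions, not anything taken from the exam.

-- The prefix after FROM declares the file format when querying files directly
SELECT * FROM json.`/mnt/raw/events/`;

-- A temporary view as a reference to the same files
CREATE OR REPLACE TEMPORARY VIEW events_raw AS
SELECT * FROM json.`/mnt/raw/events/`;

-- count_if, counting where a column is NULL, and count(col), which skips NULLs
SELECT
  count_if(coupon IS NULL)                   AS missing_coupons,
  count(CASE WHEN coupon IS NULL THEN 1 END) AS missing_coupons_alt,
  count(coupon)                              AS coupons_present
FROM events_raw;

-- Create a new, deduplicated Delta table from the view (CTAS)
CREATE OR REPLACE TABLE events_clean AS
SELECT DISTINCT * FROM events_raw;

-- Cast to timestamp, extract calendar fields, use dot syntax for nested fields,
-- extract a pattern, parse a JSON string into a struct, and explode an array
SELECT
  CAST(event_ts AS TIMESTAMP)                          AS event_time,
  year(CAST(event_ts AS TIMESTAMP))                    AS event_year,
  profile.address.city                                 AS city,
  regexp_extract(profile.email, '@(.+)$', 1)           AS email_domain,
  from_json(payload, 'order_id STRING, total DOUBLE')  AS parsed_payload,
  explode(items)                                       AS item
FROM events_clean;

-- A SQL UDF plus CASE/WHEN for custom control flow
CREATE OR REPLACE FUNCTION order_band(amount DOUBLE)
  RETURNS STRING
  RETURN CASE WHEN amount >= 100 THEN 'large' ELSE 'small' END;

SELECT order_band(amount) AS band FROM events_clean;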

Section 3: Incremental Data Processing


● Identify where Delta Lake provides ACID transactions.
● Identify the benefits of ACID transactions.
● Identify whether a transaction is ACID-compliant.
● Compare and contrast data and metadata.
● Compare and contrast managed and external tables.
● Identify a scenario to use an external table.
● Create a managed table.
● Identify the location of a table.
● Inspect the directory structure of Delta Lake files.
● Identify who has written previous versions of a table.
● Review a history of table transactions.
● Roll back a table to a previous version.
● Identify that a table can be rolled back to a previous version.
● Query a specific version of a table.
● Identify why Z-Ordering is beneficial to Delta Lake tables.
● Identify how VACUUM commits deletes.
● Identify the kind of files OPTIMIZE compacts.
● Identify CTAS as a solution.
● Create a generated column.
● Add a table comment.
● Use CREATE OR REPLACE TABLE and INSERT OVERWRITE.
● Compare and contrast CREATE OR REPLACE TABLE and INSERT OVERWRITE
● Identify a scenario in which MERGE should be used.
● Identify MERGE as a command to deduplicate data upon writing.
● Describe the benefits of the MERGE command.
● Identify why a COPY INTO statement is not duplicating data in the target table.
● Identify a scenario in which COPY INTO should be used.
● Use COPY INTO to insert data.
● Identify the components necessary to create a new DLT pipeline.
● Identify the purpose of the target and of the notebook libraries in creating a pipeline.
● Compare and contrast triggered and continuous pipelines in terms of cost and latency.
● Identify which source location is utilizing Auto Loader.
● Identify a scenario in which Auto Loader is beneficial.
● Identify why Auto Loader has inferred all data to be STRING from a JSON source.
● Identify the default behavior of a constraint violation.
● Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.
● Explain change data capture and the behavior of APPLY CHANGES INTO.
● Query the event log to get metrics, perform audit logging, and examine lineage.
● Troubleshoot DLT syntax: Identify which notebook in a DLT pipeline produced an error, identify the need for LIVE in a CREATE statement, and identify the need for STREAM in a FROM clause.
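
The sketch below walks through the main Delta Lake and Delta Live Tables commands listed above. The table names (orders, orders_updates), the landing path /mnt/landing/orders/, and the column names are stand-ins chosen for illustration, and the DLT statements only run as part of a DLT pipeline, not interactively.

-- Inspect and time-travel a Delta table
DESCRIBE HISTORY orders;                    -- who wrote each version, and when
SELECT * FROM orders VERSION AS OF 3;       -- query a specific version
RESTORE TABLE orders TO VERSION AS OF 3;    -- roll back to a previous version

-- Compact small files and co-locate data on a frequently filtered column
OPTIMIZE orders ZORDER BY (customer_id);

-- Permanently remove unreferenced data files older than the retention
-- threshold (7 days by default), which limits how far back time travel can go
VACUUM orders;

-- Upsert with MERGE to avoid writing duplicate rows
MERGE INTO orders AS t
USING orders_updates AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- COPY INTO is idempotent: files that were already loaded are skipped,
-- which is why rerunning it does not duplicate data in the target table
COPY INTO orders
FROM '/mnt/landing/orders/'
FILEFORMAT = JSON;

-- Delta Live Tables (SQL): an Auto Loader source plus an expectation.
-- Auto Loader infers all JSON columns as STRING unless types are provided.
CREATE OR REFRESH STREAMING LIVE TABLE orders_bronze
AS SELECT * FROM cloud_files('/mnt/landing/orders/', 'json');

CREATE OR REFRESH STREAMING LIVE TABLE orders_silver (
  CONSTRAINT valid_order EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.orders_bronze);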

Section 4: Production Pipelines


● Identify the benefits of using multiple tasks in Jobs.
● Set up a predecessor task in Jobs.
● Identify a scenario in which a predecessor task should be set up.
● Review a task's execution history.
● Identify CRON as a scheduling opportunity.
● Debug a failed task.
● Set up a retry policy in case of failure.
● Create an alert in the case of a failed task.
● Identify that an alert can be sent via email.

Section 5: Data Governance


● Identify one of the four areas of data governance.
● Compare and contrast metastores and catalogs.
● Identify Unity Catalog securables.
● Define a service principal.
● Identify the cluster security modes compatible with Unity Catalog.
● Create a UC-enabled all-purpose cluster.
● Create a DBSQL warehouse.
● Identify how to query a three-layer namespace.
● Implement data object access control.
● Identify colocating metastores with a workspace as best practice.
● Identify using service principals for connections as best practice.
● Identify the segregation of business units across catalogs as best practice.
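
A short Databricks SQL sketch of the three-level namespace and object-level grants follows; the catalog, schema, table, and principal names (main, retail, sales, `data-engineers`, `etl-service-principal`) are hypothetical.

-- Query a table through the three-level namespace: catalog.schema.table
SELECT * FROM main.retail.sales;

-- Grant access on Unity Catalog securables to a group
GRANT USE CATALOG ON CATALOG main TO `data-engineers`;
GRANT USE SCHEMA ON SCHEMA main.retail TO `data-engineers`;
GRANT SELECT ON TABLE main.retail.sales TO `data-engineers`;

-- Service principals are commonly used for automated connections
GRANT SELECT ON TABLE main.retail.sales TO `etl-service-principal`;

-- Review the grants on an object
SHOW GRANTS ON TABLE main.retail.sales;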

Sample Questions
These questions are retired from a previous version of the exam. The purpose is to show you how
objectives are stated in the exam guide and to give you a sample question that aligns with each
objective. The exam guide lists the objectives that could be covered on an exam. The best way to
prepare for a certification exam is to review the exam outline in the exam guide.

Question 1
Objective: Describe the benefits of a data lakehouse over a traditional data warehouse.

What is a benefit of a data lakehouse that is unavailable in a traditional data warehouse?

A. A data lakehouse provides a relational system of data management.
B. A data lakehouse captures snapshots of data for version control purposes.
C. A data lakehouse couples storage and compute for complete control.
D. A data lakehouse utilizes proprietary storage formats for data.
E. A data lakehouse enables both batch and streaming analytics.

Question 2
Objective: Identify query optimization techniques

A data engineering team needs to query a Delta table to extract rows that all meet the same
condition. However, the team has noticed that the query is running slowly. The team has already
tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting
the condition are sparsely located throughout each of the data files.

Which optimization techniques could speed up the query?

A. Data skipping
B. Z-Ordering
C. Bin-packing
D. Write as a Parquet file
E. Tuning the file size

Question 3
Objective: Identify data workloads that utilize a Silver table as their source.

Which data workload will utilize a Silver table as its source?

A. A job that enriches data by parsing its timestamps into a human-readable format
B. A job that queries aggregated data that already feeds into a dashboard
C. A job that ingests raw data from a streaming source into the Lakehouse
D. A job that aggregates cleaned data to create standard summary statistics
E. A job that cleans data by removing malformatted records

Question 4
Objective: Describe how to configure a refresh schedule

An engineering manager uses a Databricks SQL query to monitor their team’s progress on
fixes related to customer-reported bugs. The manager checks the results of the query every
day, but they are manually rerunning the query each day and waiting for the results.

How should the query be scheduled to ensure the results of the query are updated each day?

A. To refresh every 12 hours from the query’s page in Databricks SQL.
B. To refresh every 1 day from the query’s page in Databricks SQL.
C. To run every 12 hours from the Jobs UI.
D. To refresh every 1 day from the SQL warehouse page in Databricks SQL.
E. To refresh every 12 hours from the SQL warehouse page in Databricks SQL.

Question 5
Objective: Identify commands for granting appropriate permissions

A new data engineer has started at a company. The data engineer has recently been added to the
company’s Databricks workspace as new.engineer@company.com. The data engineer needs to be
able to query the table sales in the database retail. The new data engineer already has been
granted USAGE on the database retail.
Which command should be used to grant the appropriate permissions to the new data engineer?

A. GRANT USAGE ON TABLE sales TO new.engineer@company.com;
B. GRANT CREATE ON TABLE sales TO new.engineer@company.com;
C. GRANT SELECT ON TABLE sales TO new.engineer@company.com;
D. GRANT USAGE ON TABLE new.engineer@company.com TO sales;
E. GRANT SELECT ON TABLE new.engineer@company.com TO sales;

Answers
Question 1: E
Question 2: B
Question 3: D
Question 4: B
Question 5: C
