Databricks Certified Data Engineer Associate Exam Guide
Audience Description
The Databricks Certified Data Engineer Associate certification exam assesses an individual’s ability
to use the Databricks Lakehouse Platform to complete introductory data engineering tasks. This
includes an understanding of the Lakehouse Platform and its workspace, architecture, and
capabilities. It also assesses the ability to perform multi-hop (medallion) architecture ETL tasks
using Apache Spark SQL and Python, in both batch and incremental processing paradigms. Finally,
the exam assesses a candidate’s ability to put basic ETL pipelines, Databricks SQL queries, and
dashboards into production while maintaining entity permissions. Individuals who pass this
certification exam can be expected to complete basic data engineering tasks using Databricks and
its associated tools.
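
For orientation, here is a minimal sketch of the multi-hop (Bronze-to-Silver) pattern in PySpark, as it might appear in a Databricks notebook; all paths, table names, and columns are hypothetical, and an incremental variant would swap the batch reads for Structured Streaming reads:

    # Minimal multi-hop sketch (hypothetical paths, tables, and columns).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # predefined on Databricks

    # Bronze: ingest raw JSON as-is (batch; spark.readStream would be the
    # incremental equivalent).
    (spark.read.format("json").load("/data/raw_events")
        .write.format("delta").mode("append").saveAsTable("bronze_events"))

    # Silver: clean and conform the Bronze data.
    (spark.read.table("bronze_events")
        .where(F.col("event_ts").isNotNull())
        .withColumn("event_ts", F.to_timestamp("event_ts"))
        .write.format("delta").mode("overwrite").saveAsTable("silver_events"))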
Recommended Training
● Instructor-led: Data Engineering with Databricks
● Self-paced: Data Engineering with Databricks (available in Databricks Academy)
Exam Outline
Sample Questions
These questions are retired from a previous version of the exam. Their purpose is to show the
objectives as they are stated in the exam guide, each paired with a sample question that aligns to
the objective. The exam guide lists the objectives that could be covered on an exam, and the best
way to prepare for the certification exam is to review the exam outline in this guide.
Question 1
Objective: Describe the benefits of a data lakehouse over a traditional data warehouse.
Question 2
Objective: Identify query optimization techniques
A data engineering team needs to query a Delta table to extract rows that all meet the same
condition. However, the team has noticed that the query is running slowly. The team has already
tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting
the condition are sparsely located throughout each of the data files.

Which of the following optimization techniques should the team use to speed up the query?
A. Data skipping
B. Z-Ordering
C. Bin-packing
D. Writing as a Parquet file
E. Tuning the file size
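
For context on this objective: Z-Ordering rewrites a Delta table’s data files so that rows with similar values in the chosen columns are colocated, which allows data skipping to prune files when a query filters on those columns. A minimal sketch, assuming a hypothetical Delta table sales filtered on a column region, run in a Databricks notebook where spark is predefined:

    # Hypothetical table and column names; OPTIMIZE ... ZORDER BY colocates
    # related rows so per-file statistics can skip non-matching files.
    spark.sql("OPTIMIZE sales ZORDER BY (region)")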
Question 3
Objective: Identify data workloads that utilize a Silver table as their source.

Which of the following jobs will utilize a Silver table as its source?
A. A job that enriches data by parsing its timestamps into a human-readable format
B. A job that queries aggregated data that already feeds into a dashboard
C. A job that ingests raw data from a streaming source into the Lakehouse
D. A job that aggregates cleaned data to create standard summary statistics
E. A job that cleans data by removing malformatted records
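
As a concrete illustration of a workload that reads from a Silver table, here is a short sketch of a Gold-layer job that aggregates cleaned data into summary statistics; the table and column names (silver_orders, region, order_total) are hypothetical:

    # Hypothetical Gold-layer job: summarize cleaned (Silver) data.
    from pyspark.sql import functions as F

    (spark.read.table("silver_orders")
        .groupBy("region")
        .agg(F.count("*").alias("order_count"),
             F.avg("order_total").alias("avg_order_total"))
        .write.format("delta").mode("overwrite").saveAsTable("gold_order_summary"))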
Question 4
Objective: Describe how to configure a refresh schedule
An engineering manager uses a Databricks SQL query to monitor their team’s progress on
fixes related to customer-reported bugs. The manager checks the results of the query every
day, but they manually rerun the query each day and wait for its results.

How should the query be scheduled to ensure that its results are updated each day?
Question 5
Objective: Identify commands for granting appropriate permissions
A new data engineer has started at a company. The data engineer has recently been added to the
company’s Databricks workspace as new.engineer@company.com. The data engineer needs to be
able to query the table sales in the database retail. The new data engineer has already been
granted the USAGE privilege on the database retail.
Which command should be used to grant the appropriate permissions to the new data engineer?
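
For reference, read access on a table is granted with GRANT SELECT addressed to the user principal. A minimal sketch, run from a notebook where spark is predefined (executing it requires sufficient privileges on the table):

    # Grant read access on retail.sales to the new engineer; the principal
    # is backquoted because it contains special characters.
    spark.sql("GRANT SELECT ON TABLE retail.sales TO `new.engineer@company.com`")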