The document contains practice questions for the Databricks Certified Data Engineer Associate exam, covering topics such as Apache Spark, data governance, data ingestion, the Databricks Lakehouse platform, Delta Lake, and ETL pipelines. Each question is followed by multiple-choice answers, with the correct answer indicated. The questions focus on key concepts and functionalities relevant to using Databricks and Spark effectively.


Databricks Certified Data Engineer Associate - Practice Questions

Apache Spark & Notebooks

Q: What is a common use of markdown cells in notebooks?

A. C++

B. Returns all elements of the DataFrame as a list

C. Documentation

D. To run another notebook

Answer: C

Q: What is a benefit of using notebooks in Databricks?

A. Returns all elements of the DataFrame as a list

B. C++

C. Supports interactive development

D. Documentation

Answer: C

Q: Which language is NOT supported in Databricks notebooks?

A. To run another notebook

B. Supports interactive development

C. df.cache()

D. C++

Answer: D

Q: How do you cache a DataFrame in Spark?

A. Documentation

B. df.cache()

C. DataFrame

D. Supports interactive development

Answer: B
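
Example (a minimal sketch, assuming an active `spark` session in a Databricks notebook; the DataFrame is hypothetical):

    df = spark.range(1_000_000)   # hypothetical DataFrame
    df.cache()                    # mark the DataFrame for in-memory caching
    df.count()                    # the first action materializes the cache; later actions reuse it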

Q: How is SparkSession accessed in Databricks?

A. spark

B. C++

C. Documentation

D. To run another notebook

Answer: A

Q: How do you write comments in Python notebooks?

A. To run another notebook

B. # This is a comment

C. spark

D. C++

Answer: B

Q: What does `display(df)` do?

A. Supports interactive development

B. # This is a comment

C. Renders a DataFrame in a tabular format with visualization options

D. spark

Answer: C
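
Example (a minimal sketch; `display()` is available in Databricks notebooks, and the table name is hypothetical):

    df = spark.read.table("samples.trips")   # hypothetical table
    display(df)                               # renders an interactive table with built-in visualization options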

Q: What is `%run` used for in notebooks?

A. Supports interactive development

B. To run another notebook

C. spark

D. DataFrame

Answer: B
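
Example (a minimal sketch; the notebook path is hypothetical). `%run` must be in a cell by itself and makes the target notebook's functions and variables available in the calling notebook:

    %run ./setup_utils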

Q: What does the `.collect()` method do?

A. Renders a DataFrame in a tabular format with visualization options

B. DataFrame

C. df.cache()

D. Returns all elements of the DataFrame as a list

Answer: D
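
Example (a minimal sketch, assuming `df` is a small DataFrame; collecting a large DataFrame can exhaust driver memory):

    rows = df.collect()      # returns every row to the driver as a list of Row objects
    for row in rows:
        print(row)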

Q: What does `spark.read.csv()` return?

A. C++

B. df.cache()

C. Documentation

D. DataFrame

Answer: D
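
Example (a minimal sketch; the file path is hypothetical):

    df = spark.read.csv("/data/file.csv", header=True, inferSchema=True)
    print(type(df))   # <class 'pyspark.sql.dataframe.DataFrame'>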

Data Governance & Security

Q: Which layer defines table-level access?

A. Stores metadata about data assets

B. Catalog permissions

C. A shared environment for users

D. Data permissions and lineage

Answer: B

Q: Who defines data access policies in Unity Catalog?

A. Data permissions and lineage

B. Through access control lists (ACLs)

C. Stores metadata about data assets

D. Data stewards or admins

Answer: D

Q: What does Unity Catalog manage?

A. A shared environment for users

B. Data permissions and lineage

C. Through access control lists (ACLs)

D. Role-Based Access Control

Answer: B

Q: How are user permissions granted?

A. Role-Based Access Control

B. Stores metadata about data assets

C. Through access control lists (ACLs)

D. Assign roles to users

Answer: C
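
Example (a minimal sketch of a Unity Catalog grant; the catalog, schema, table, and user are hypothetical):

    spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analyst@example.com`")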

Q: What is a workspace in Databricks?

A. Tracks access logs and usage history

B. Role-Based Access Control

C. Assign roles to users

D. A shared environment for users

Answer: D

Q: What is one way to restrict data access?

A. Data permissions and lineage

B. Tracking data origin and transformations

C. Catalog permissions

D. Assign roles to users

Answer: D

Q: What is data lineage?

A. A shared environment for users

B. Catalog permissions

C. Data stewards or admins

D. Tracking data origin and transformations

Answer: D

Q: What is RBAC?

A. Assign roles to users

B. Tracks access logs and usage history

C. Role-Based Access Control

D. Data stewards or admins

Answer: C

Q: What is the role of a metastore?

A. Role-Based Access Control

B. Stores metadata about data assets

C. Tracks access logs and usage history

D. Data stewards or admins

Answer: B

Q: How does Unity Catalog improve auditing?

A. Assign roles to users

B. Tracks access logs and usage history

C. Data stewards or admins

D. Catalog permissions

Answer: B

Data Ingestion & Transformation

Q: Which tool helps with transformation jobs?

A. JSON

B. XLS

C. Databricks Workflows

D. df.write.format('delta').save('path')

Answer: C

Q: What is a common data ingestion format in Databricks?

A. XLS

B. Incrementally ingesting data from cloud storage

C. df.write.format('delta').save('path')

D. JSON

Answer: D

Q: Which function applies transformation to each row?

A. JSON

B. Structured Streaming

C. Databricks Workflows

D. map

Answer: D
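
Example (a minimal sketch; in PySpark, `map` is an RDD transformation, while DataFrames use expressions such as `select` or `withColumn`):

    rdd = spark.sparkContext.parallelize([1, 2, 3])
    doubled = rdd.map(lambda x: x * 2)   # applies the function to every element
    print(doubled.collect())             # [2, 4, 6]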

Q: Which format is NOT typically used in Databricks ingestion?

A. map

B. XLS

C. spark.read.csv('file.csv')

D. dropna

Answer: B

Q: How do you write a DataFrame as Delta?

A. Structured Streaming

B. Incrementally ingesting data from cloud storage

C. JSON

D. df.write.format('delta').save('path')

Answer: D
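
Example (a minimal sketch, assuming `df` is an existing DataFrame; the output path is hypothetical):

    df.write.format("delta").mode("append").save("/delta/events")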

Q: How do you read CSV data into a DataFrame?

A. JSON

B. df.write.format('delta').save('path')

C. Structured Streaming

D. spark.read.csv('file.csv')

Answer: D

Q: Which method is used for cleaning data?

A. Structured Streaming

B. JSON

C. df.write.format('delta').save('path')

D. dropna

Answer: D
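
Example (a minimal sketch, assuming `df` is an existing DataFrame; the column name is hypothetical):

    clean_df = df.dropna()                        # drop rows containing any null value
    clean_df = df.dropna(subset=["customer_id"])  # or drop only when specific columns are null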

Q: Which method ingests streaming data?

A. JSON

B. Structured Streaming

C. readStream

D. dropna

Answer: C
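
Example (a minimal sketch; the source path and schema are hypothetical):

    stream_df = (spark.readStream
                 .format("json")
                 .schema("id INT, ts TIMESTAMP")   # streaming file sources need an explicit schema
                 .load("/landing/events/"))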

Q: What is Auto Loader used for?

A. JSON

B. spark.read.csv('file.csv')

C. df.write.format('delta').save('path')

D. Incrementally ingesting data from cloud storage

Answer: D
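
Example (a minimal sketch of Auto Loader; the paths are hypothetical):

    df = (spark.readStream
          .format("cloudFiles")                              # Auto Loader source
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/chk/schema")
          .load("/landing/raw/"))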

Q: Which API supports streaming in Spark?

A. dropna

B. JSON

C. Structured Streaming

D. map

Answer: C

Databricks Lakehouse Platform

Q: Which storage format does Lakehouse architecture commonly use?

A. Unified BI and ML analytics

B. Lack of schema enforcement and consistency

C. Open formats and APIs

D. Delta Lake

Answer: D

Q: How does Lakehouse support ML workloads?

A. By enabling data scientists to access the same data used in analytics

B. ACID transactions

C. Unified BI and ML analytics

D. Open formats and APIs

Answer: A

Q: What is one way Lakehouse reduces data movement?

A. It combines the benefits of data lakes and data warehouses

B. Unified data platform

C. Unified BI and ML analytics

D. By enabling data scientists to access the same data used in analytics

Answer: B

Q: Which layer of Lakehouse handles governance and security?

A. Open formats and APIs

B. Metadata layer

C. By enabling data scientists to access the same data used in analytics

D. ACID transactions

Answer: B

Q: Which component enables data reliability in a Lakehouse?

A. Unified data platform

B. Lack of schema enforcement and consistency

C. It combines the benefits of data lakes and data warehouses

D. ACID transactions

Answer: D

Q: What is a common use case of a Lakehouse?

A. Unified BI and ML analytics

B. ACID transactions

C. Batch and streaming workloads

D. Unified data platform

Answer: A

Q: Why are traditional data lakes insufficient for BI workloads?

A. Batch and streaming workloads

B. Lack of schema enforcement and consistency

C. Metadata layer

D. Open formats and APIs

Answer: B

Q: Which feature allows multiple tools to access the same data in Lakehouse?

A. Open formats and APIs

B. Delta Lake

C. Metadata layer

D. It combines the benefits of data lakes and data warehouses

Answer: A

Q: What is the primary benefit of the Databricks Lakehouse Platform?

A. Open formats and APIs

B. By enabling data scientists to access the same data used in analytics

C. Batch and streaming workloads

D. It combines the benefits of data lakes and data warehouses

Answer: D

Q: What type of data workloads can be handled by a Lakehouse?

A. It combines the benefits of data lakes and data warehouses

B. Open formats and APIs

C. Delta Lake

D. Batch and streaming workloads

Answer: D

Delta Lake

Q: Which method updates a Delta table conditionally?

A. Parquet

B. MERGE INTO

C. Data reliability with ACID transactions

D. _delta_log

Answer: B
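
Example (a minimal sketch; the table and column names are hypothetical):

    spark.sql("""
        MERGE INTO customers AS t
        USING updates AS s
        ON t.customer_id = s.customer_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)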

Q: How can schema evolution be enabled in Delta?

A. RESTORE

B. A table stored in Delta format with transaction support

C. Transaction log

D. mergeSchema=True

Answer: D
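
Example (a minimal sketch, assuming `df` is an existing DataFrame; the path is hypothetical):

    (df.write.format("delta")
       .mode("append")
       .option("mergeSchema", "true")   # allow new columns in df to be added to the table schema
       .save("/delta/events"))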

Q: What is a Delta table?

A. Transaction log

B. Parquet

C. Data reliability with ACID transactions

D. A table stored in Delta format with transaction support

Answer: D

Q: How do you enable change data feed in Delta Lake?

A. VACUUM

B. Transaction log

C. Set 'delta.enableChangeDataFeed = true'

D. RESTORE

Answer: C
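
Example (a minimal sketch; the table name is hypothetical):

    spark.sql("ALTER TABLE events SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")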

Q: Which command is used to remove old files in Delta tables?

A. Parquet

B. RESTORE

C. A table stored in Delta format with transaction support

D. VACUUM

Answer: D
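
Example (a minimal sketch; the table name and retention window are hypothetical):

    spark.sql("VACUUM events RETAIN 168 HOURS")   # remove unreferenced files older than 7 days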

Q: What does Delta Lake use for ACID transactions?

A. VACUUM

B. Data reliability with ACID transactions

C. _delta_log

D. A table stored in Delta format with transaction support

Answer: C

Q: What operation allows restoring a table to a previous state?

A. Transaction log

B. RESTORE

C. mergeSchema=True

D. Set 'delta.enableChangeDataFeed = true'

Answer: B
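
Example (a minimal sketch; the table name and version are hypothetical):

    spark.sql("RESTORE TABLE events TO VERSION AS OF 5")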

Q: What is one benefit of Delta Lake?

A. Set 'delta.enableChangeDataFeed = true'

B. VACUUM

C. A table stored in Delta format with transaction support

D. Data reliability with ACID transactions

Answer: D

Q: Which file format is used by Delta Lake?

A. VACUUM

B. Set 'delta.enableChangeDataFeed = true'

C. Transaction log

D. Parquet

Answer: D

Q: What enables time travel in Delta Lake?

A. A table stored in Delta format with transaction support

B. Transaction log

C. VACUUM

D. RESTORE

Answer: B
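
Example (a minimal sketch of a time travel read; the table name, version, and timestamp are hypothetical):

    old_df = spark.sql("SELECT * FROM events VERSION AS OF 5")
    old_df = spark.sql("SELECT * FROM events TIMESTAMP AS OF '2024-01-01'")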

ETL Pipelines & Workflows

Q: What is a task in Databricks Jobs?

A. Via Widgets or Job Parameters

B. A unit of work like running a notebook or script

C. Single Node

D. Python task

Answer: B

Q: How are job parameters passed?

A. Governance on cluster configurations

B. Jobs UI

C. Via Widgets or Job Parameters

D. max_retries

Answer: C
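
Example (a minimal sketch; the parameter name and default are hypothetical, and `dbutils` is available in Databricks notebooks):

    dbutils.widgets.text("run_date", "2024-01-01")   # declares the parameter with a default value
    run_date = dbutils.widgets.get("run_date")       # a job parameter with the same name overrides the default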

Q: What is a multi-task job?

A. Workflow with multiple dependent tasks

B. Jobs UI

C. Single Node

D. Use the cron expression

Answer: A

Q: What parameter controls retry attempts?

A. max_retries

B. Via Widgets or Job Parameters

C. Use the cron expression

D. Job run history page

Answer: A

Q: How do you schedule a job weekly?

A. Workflow with multiple dependent tasks

B. Via Widgets or Job Parameters

C. Python task

D. Use the cron expression

Answer: D
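
Example (a minimal sketch; Databricks schedules use Quartz cron syntax, and the day and time here are hypothetical):

    weekly_schedule = "0 0 9 ? * MON"   # every Monday at 09:00 (sec, min, hour, day-of-month, month, day-of-week)
    # set this string in the job's Schedule settings or in the Jobs API schedule configuration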

Q: Which task type supports Python scripts?

A. Python task

B. A unit of work like running a notebook or script

C. Governance on cluster configurations

D. max_retries

Answer: A

Q: What is the default cluster mode in a job?

A. Single Node

B. max_retries

C. Jobs UI

D. Job run history page

Answer: A

Q: Where do you find job run logs?

A. Jobs UI

B. max_retries

C. Governance on cluster configurations

D. Job run history page

Answer: D

Q: What is a cluster policy?

A. Via Widgets or Job Parameters

B. Governance on cluster configurations

C. Use the cron expression

D. Single Node

Answer: B

Q: What UI is used to create workflows in Databricks?

A. Via Widgets or Job Parameters

B. Single Node

C. Jobs UI

D. Workflow with multiple dependent tasks

Answer: C
