DEA - JULY2024-No
In which scenario would a data engineer use the CREATE STREAMING LIVE TABLE (formerly
CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta
Live Tables (DLT) tables using SQL?
A. CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.
B. CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.
C. CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.
D. CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated
aggregations.
E. CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.
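Note: the DLT SQL keywords above only run inside a DLT pipeline, so as a hedged illustration here is the equivalent contrast in the DLT Python API (table, column, and path names are hypothetical):

    import dlt

    @dlt.table  # behaves like CREATE STREAMING LIVE TABLE: data is processed incrementally
    def raw_events():
        # spark.readStream makes this a streaming table; each pipeline
        # update picks up only files that have not been ingested yet.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/data/events/")
        )

    @dlt.table  # behaves like CREATE LIVE TABLE: fully recomputed on every update
    def event_counts():
        return dlt.read("raw_events").groupBy("event_type").count()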
What is an advantage of using Databricks Repos over the Databricks Notebooks versioning?
A. Databricks Repos allows users to revert to previous versions of a notebook
B. Databricks Repos is wholly housed within the Databricks Data Intelligence Platform
C. Databricks Repos provides the ability to comment on specific changes
D. Databricks Repos supports the use of multiple branches
In which location can the data engineer review their permissions on the table?
A. Jobs
B. Dashboards
C. Catalog Explorer
D. Repos
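Note: besides Catalog Explorer, grants on a table can also be inspected in SQL; a minimal sketch (table name hypothetical):

    # Lists each principal and the privileges it holds on the table.
    spark.sql("SHOW GRANTS ON TABLE sales").show()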
Which Git operation does the data engineer need to run to accomplish this task?
A. Clone
B. Pull
C. Merge
D. Push
Which SQL DDL code block creates an empty Delta table in the above format, regardless of
whether a table already exists with this name?
A. CREATE OR REPLACE TABLE table_name ( employeeId STRING, startDate DATE, avgRating FLOAT )
B. CREATE OR REPLACE TABLE table_name WITH COLUMNS ( employeeId STRING, startDate DATE, avgRating
FLOAT ) USING DELTA
C. CREATE TABLE IF NOT EXISTS table_name ( employeeId STRING, startDate DATE, avgRating FLOAT )
D. CREATE TABLE table_name AS SELECT employeeId STRING, startDate DATE, avgRating FLOAT
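Note: a minimal sketch of the idempotent variant (option A), run from a Databricks notebook where spark is predefined:

    # CREATE OR REPLACE always leaves an empty table with exactly this
    # schema, whether or not table_name already existed; IF NOT EXISTS
    # would silently keep an existing table and its old schema.
    spark.sql("""
        CREATE OR REPLACE TABLE table_name (
            employeeId STRING,
            startDate  DATE,
            avgRating  FLOAT
        )
    """)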
id STRING = 'a1'
rank INTEGER = 6
rating FLOAT = 9.4
Which SQL command can be used to append the new record to an existing Delta table my_table?
A. INSERT INTO my_table VALUES ('a1', 6, 9.4)
B. INSERT VALUES ('a1', 6, 9.4) INTO my_table
C. UPDATE my_table VALUES ('a1', 6, 9.4)
D. UPDATE VALUES ('a1', 6, 9.4) my_table
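Note: a quick check of the appending form (option A); the table name comes from the question:

    # INSERT INTO appends a row; UPDATE modifies existing rows and
    # requires a SET clause, so neither UPDATE variant can add a record.
    spark.sql("INSERT INTO my_table VALUES ('a1', 6, 9.4)")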
Which of the following data entities should the data engineer create?
A. Table
B. Function
C. View
D. Temporary view
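Note: the key distinction among these entities is persistence; a hedged sketch (all names hypothetical):

    df = spark.table("source_table")
    # Temporary view: session-scoped, gone when the session ends.
    df.createOrReplaceTempView("tmp_sales")
    # View: persisted in the metastore, no data copied.
    spark.sql("CREATE OR REPLACE VIEW sales_v AS SELECT * FROM source_table")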
Today, the data engineer runs the following command to complete this task:
After running the command today, the data engineer notices that the number of records in table
transactions has not changed.
What explains why the statement might not have copied any new records into the table?
A. The format of the files to be copied was not included with the FORMAT_OPTIONS keyword.
B. The COPY INTO statement requires the table to be refreshed to view the copied rows.
C. The previous day’s file has already been copied into the table.
D. The PARQUET file format does not support COPY INTO.
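Note: the question's COPY INTO statement is not reproduced above; a hedged sketch of the pattern (path hypothetical):

    # COPY INTO is idempotent: files that have already been loaded are
    # skipped on re-runs, so re-copying yesterday's file adds no rows.
    spark.sql("""
        COPY INTO transactions
        FROM '/data/raw/transactions/'
        FILEFORMAT = PARQUET
    """)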
Which command could the data engineering team use to access sales in PySpark?
A. SELECT * FROM sales
B. spark.table("sales")
C. spark.sql("sales")
D. spark.delta.table("sales")
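Note: a quick contrast of the two PySpark entry points that actually work here:

    df = spark.table("sales")               # returns the registered table as a DataFrame
    df2 = spark.sql("SELECT * FROM sales")  # spark.sql needs a full query, not just a table name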
After running this command, the engineer notices that the data files and metadata files have been
deleted from the file system.
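Note: this is the behavior of dropping a managed table; a hedged sketch, assuming the elided command was a DROP TABLE (table name hypothetical):

    # For a managed table, DROP TABLE removes the metastore entry AND
    # the underlying data files; for an external table, only the
    # metadata is removed and the files remain in place.
    spark.sql("DROP TABLE my_managed_table")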
Which of the following lines of code fills in the above blank to successfully complete the task?
A. FROM "path/to/csv"
B. USING CSV
C. FROM CSV
D. USING DELTA
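Note: a hedged reconstruction of the kind of statement the blank sits in (column names and options are assumptions):

    # USING CSV declares the table's data source format; FROM is not
    # valid in this position, and USING DELTA would not read CSV files.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS my_csv_table (id INT, name STRING)
        USING CSV
        OPTIONS (header = 'true')
        LOCATION 'path/to/csv'
    """)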
What can be used to fill in the blank to successfully complete the task?
A. spark.delta.sql
B. spark.sql
C. spark.table
D. dbutils.sql
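Note: spark.sql is the bridge that runs a SQL string from Python and returns a DataFrame; a one-line sketch (query hypothetical):

    # spark.delta.sql and dbutils.sql do not exist, and spark.table
    # takes a table name rather than a query.
    result_df = spark.sql("SELECT customer_id, spend FROM sales")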
sales:
customer_id  spend   units
a1           28.94   7
a3           874.12  23
a4           8.99    1

favorite_stores:
customer_id  store_id
a1           s1
a2           s1
a4           s2
The data engineer runs the following query to join these tables together:
SELECT
  sales.customer_id,
  sales.spend,
  favorite_stores.store_id
FROM sales
LEFT JOIN favorite_stores
ON sales.customer_id = favorite_stores.customer_id;
[Answer choices A-D were rendered as images of candidate result tables.]
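Note: since the answer-choice tables are not reproduced, here is a runnable sketch of the query with the sample rows inlined; the LEFT JOIN keeps all three sales rows (a3 gets a NULL store_id) and excludes a2, which appears only in favorite_stores:

    sales = spark.createDataFrame(
        [("a1", 28.94, 7), ("a3", 874.12, 23), ("a4", 8.99, 1)],
        ["customer_id", "spend", "units"],
    )
    favorite_stores = spark.createDataFrame(
        [("a1", "s1"), ("a2", "s1"), ("a4", "s2")],
        ["customer_id", "store_id"],
    )
    sales.createOrReplaceTempView("sales")
    favorite_stores.createOrReplaceTempView("favorite_stores")

    # LEFT JOIN keeps every sales row; a3 has no favorite store, so its
    # store_id comes back NULL, while right-only a2 is excluded.
    spark.sql("""
        SELECT sales.customer_id, sales.spend, favorite_stores.store_id
        FROM sales
        LEFT JOIN favorite_stores
        ON sales.customer_id = favorite_stores.customer_id
    """).show()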
Which code block can the data engineer use to complete this task?
A.
function add_integers (x, y):
return x + y
B.
def add_integers (x, y):
print(x + y)
C.
def add_integers (x, y):
x+y
D.
def add_integers (x, y):
return x + y
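Note: a quick check of why only the return-based definition works; a print-based body would evaluate to None when its result is used:

    def add_integers(x, y):
        return x + y

    # The call returns the sum; with print(x + y) instead of return,
    # this assertion would fail because the call would return None.
    assert add_integers(3, 4) == 7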
Which line of code should the data engineer use to fill in the blank if the data engineer only wants the
query to execute a micro-batch to process data every 5 seconds?
A. trigger("5 seconds")
B. trigger(continuous="5 seconds")
C. trigger(once="5 seconds")
D. trigger(processingTime="5 seconds")
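Note: a hedged sketch of where the trigger call sits in a streaming write (source table, sink table, and checkpoint path are hypothetical):

    # processingTime runs one micro-batch every 5 seconds; continuous=
    # selects a different (continuous processing) mode, and once= takes
    # True rather than an interval.
    query = (
        spark.readStream.table("events")
        .writeStream
        .trigger(processingTime="5 seconds")
        .option("checkpointLocation", "/tmp/checkpoints/events")
        .toTable("events_copy")
    )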
Which of the following tools can the data engineer use to solve this problem?
A. Auto Loader
B. Unity Catalog
C. Delta Lake
D. Delta Live Tables
Which approach can the data engineer take to identify the table that is dropping the records?
A. They can set up separate expectations for each table when developing their DLT pipeline.
B. They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.
C. They can set up DLT to notify them via email when records are dropped.
D. They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
The pipeline is configured to run in Production mode using the Continuous Pipeline Mode.
What is the expected outcome after clicking Start to update the pipeline assuming previously
unprocessed data exists and all definitions are valid?
A. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will
persist to allow for additional testing.
B. All datasets will be updated once and the pipeline will shut down. The compute resources will persist to
allow for additional testing.
C. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be
deployed for the update and terminated when the pipeline is stopped.
D. All datasets will be updated once and the pipeline will shut down. The compute resources will be
terminated.
Why has Auto Loader inferred all of the columns to be of the string type?
A. Auto Loader cannot infer the schema of ingested data
B. JSON data is a text-based format
C. Auto Loader only works with string data
D. All of the fields had at least one null value
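Note: string-only inference is Auto Loader's default for text-based formats such as JSON and CSV; a hedged sketch of opting in to typed inference (paths hypothetical):

    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/tmp/schemas/orders")
        # Infer numeric/boolean column types instead of all strings.
        .option("cloudFiles.inferColumnTypes", "true")
        .load("/data/orders/")
    )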
What is the expected behavior when a batch of data containing records that violate these
constraints is processed?
A. Records that violate the expectation cause the job to fail.
B. Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to
the target dataset.
C. Records that violate the expectation are dropped from the target dataset and recorded as invalid in the
event log.
D. Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.
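Note: which of these behaviors occurs depends on the ON VIOLATION clause of the constraint, which is not shown above; a hedged sketch of the three expectation modes in the DLT Python API (names hypothetical):

    import dlt

    @dlt.table
    @dlt.expect("non_negative_price", "price >= 0")         # keep the row; record the violation in the event log
    @dlt.expect_or_drop("valid_id", "id IS NOT NULL")       # drop the row; record it in the event log
    @dlt.expect_or_fail("has_timestamp", "ts IS NOT NULL")  # fail the update on any violation
    def validated_orders():
        return dlt.read("raw_orders")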
Which action can the data engineer perform to improve the start-up time for the clusters used for the
Job?
A. They can use endpoints available in Databricks SQL
B. They can use jobs clusters instead of all-purpose clusters
C. They can configure the clusters to autoscale for larger data sizes
D. They can use clusters that are from a cluster pool
Which approach can the data engineer use to set up the new task?
A. They can clone the existing task in the existing Job and update it to run the new notebook.
B. They can create a new task in the existing Job and then add it as a dependency of the original task.
C. They can create a new task in the existing Job and then add the original task as a dependency of the new
task.
D. They can create a new job from scratch and add both tasks to run concurrently.
Which approach can the tech lead use to identify why the notebook is running slowly as part of the
Job?
A. They can navigate to the Runs tab in the Jobs UI to immediately review the processing notebook.
B. They can navigate to the Tasks tab in the Jobs UI and click on the active run to review the processing
notebook.
C. They can navigate to the Runs tab in the Jobs UI and click on the active run to review the processing
notebook.
D. They can navigate to the Tasks tab in the Jobs UI to immediately review the processing notebook.
Which approach can the data engineering team use to improve the latency of the team’s queries?
A. They can increase the cluster size of the SQL endpoint.
B. They can increase the maximum bound of the SQL endpoint’s scaling range.
C. They can turn on the Auto Stop feature for the SQL endpoint.
D. They can turn on the Serverless feature for the SQL endpoint.
Which approach can the data engineer use to notify their entire team via a messaging webhook
whenever the number of NULL values reaches 100?
A. They can set up an Alert with a custom template.
B. They can set up an Alert with a new email alert destination.
C. They can set up an Alert with a new webhook alert destination.
D. They can set up an Alert with one-time notifications.
Which approach can the data engineer use to minimize the total running time of the SQL endpoint used
in the refresh schedule of their dashboard?
A. They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.
B. They can set up the dashboard’s SQL endpoint to be serverless.
C. They can turn on the Auto Stop feature for the SQL endpoint.
D. They can ensure the dashboard’s SQL endpoint is not one of the included queries’ SQL endpoints.
Which approach can the engineering team use to ensure the query does not cost the organization any
money beyond the first week of the project’s release?
A. They can set a limit to the number of DBUs that are consumed by the SQL Endpoint.
B. They can set the query’s refresh schedule to end after a certain number of refreshes.
C. They can set the query’s refresh schedule to end on a certain date in the query scheduler.
D. They can set a limit to the number of individuals that are able to manage the query’s refresh schedule.
Which command can be used to grant the necessary permission on the entire database to the new
team?
A. GRANT VIEW ON CATALOG customers TO team;
B. GRANT CREATE ON DATABASE customers TO team;
C. GRANT USAGE ON CATALOG team TO customers;
D. GRANT USAGE ON DATABASE customers TO team;
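Note: a minimal sketch of option D, plus the SELECT grant that typically accompanies it (the second grant is an assumption about what the team needs):

    # USAGE is required before any object in the database can be accessed.
    spark.sql("GRANT USAGE ON DATABASE customers TO team")
    spark.sql("GRANT SELECT ON DATABASE customers TO team")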
Which command can be used to grant full permissions on the table sales to the new data engineering
team?
A. GRANT ALL PRIVILEGES ON TABLE sales TO team;
B. GRANT SELECT CREATE MODIFY ON TABLE sales TO team;
C. GRANT SELECT ON TABLE sales TO team;
D. GRANT ALL PRIVILEGES ON TABLE team TO sales;
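Note: a one-line sketch of option A; ALL PRIVILEGES bundles the individual privileges (such as SELECT and MODIFY) that apply to the table:

    spark.sql("GRANT ALL PRIVILEGES ON TABLE sales TO team")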