DP-203 Topic 2
Question #1 Topic 2
HOTSPOT -
You plan to create a real-time monitoring app that alerts users when a device travels more than 200 meters away from a designated location.
You need to design an Azure Stream Analytics job to process the data for the planned app. The solution must minimize the amount of code developed.
What should you include in the Stream Analytics job? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
You can process real-time IoT data streams with Azure Stream Analytics.
Function: Geospatial -
With built-in geospatial functions, you can use Azure Stream Analytics to build applications for scenarios such as fleet management, ride sharing, connected cars, and asset tracking.
Note: In a real-world scenario, you could have hundreds of these sensors generating events as a stream. Ideally, a gateway device would run
code to push these events to Azure Event Hubs or Azure IoT Hubs.
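For illustration, a minimal query sketch using the built-in CreatePoint and ST_DISTANCE geospatial functions; the input, timestamp, and coordinate column names are assumptions, not the exam's exact values:
-- Sketch: flag devices more than 200 meters from their designated location.
-- Column names (Latitude, Longitude, DesignatedLat, DesignatedLong) are assumed.
SELECT DeviceId
FROM input TIMESTAMP BY EventTime
WHERE ST_DISTANCE(
          CreatePoint(Latitude, Longitude),
          CreatePoint(DesignatedLat, DesignatedLong)) > 200
ST_DISTANCE returns the distance between two points in meters, which is why no unit conversion is needed here.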
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-get-started-with-azure-stream-analytics-to-process-data-from-iot-devices
https://docs.microsoft.com/en-us/azure/stream-analytics/geospatial-scenarios
Question #2 Topic 2
A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses Azure Event Hubs to ingest data and an Azure Stream Analytics cloud job to analyze the data. The cloud job is configured to use 120 Streaming Units (SU).
You need to optimize performance for the Azure Stream Analytics job.
Which two actions should you perform? Each correct answer presents part of the solution.
F. Implement query parallelization by partitioning the data input. Most Voted
Correct Answer: DF
D: Scale out the query by allowing the system to process each input partition separately.
F: A Stream Analytics job definition includes inputs, a query, and output. Inputs are where the job reads the data stream from.
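As a sketch of input partitioning (input and column names assumed), the following mirrors the embarrassingly parallel pattern from the parallelization documentation; on compatibility level 1.2 and later, input partitioning is applied natively without the explicit PARTITION BY PartitionId clause:
-- Sketch: process each Event Hubs partition independently.
SELECT PartitionId, COUNT(*) AS EventCount
FROM input TIMESTAMP BY EventTime PARTITION BY PartitionId
GROUP BY PartitionId, TumblingWindow(minute, 1)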
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
Question #3 Topic 2
You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake Storage Gen2 container.
Which resource provider should you enable?
A. Microsoft.Sql
B. Microsoft.Automation
C. Microsoft.EventGrid
D. Microsoft.EventHub
Correct Answer: C
Event-driven architecture (EDA) is a common data integration pattern that involves production, detection, consumption, and reaction to events.
Data integration scenarios often require Data Factory customers to trigger pipelines based on events happening in a storage account, such as the arrival or deletion of a file in an Azure Blob Storage account. Data Factory natively integrates with Azure Event Grid, which lets you trigger pipelines on such events.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger
https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
Question #4 Topic 2
A. High Concurrency
B. automated
C. interactive
Correct Answer: B
Automated Databricks clusters are the best for jobs and automated batch processing.
Note: Azure Databricks has two types of clusters: interactive and automated. You use interactive clusters to analyze data collaboratively with
interactive notebooks. You use automated clusters to run fast and robust automated jobs.
This scenario involves running batch job JARs and notebooks on a regular cadence through the Databricks platform.
The suggested best practice is to launch a new cluster for each run of critical jobs. This helps avoid any issues (failures, missing SLA, and so on) caused by reusing an existing cluster.
Reference:
https://docs.microsoft.com/en-us/azure/databricks/clusters/create
https://docs.databricks.com/administration-guide/cloud-configurations/aws/cmbp.html#scenario-3-scheduled-batch-workloads-data-engineers-running-etl-jobs
Question #5 Topic 2
HOTSPOT -
You are processing streaming data from vehicles that pass through a toll booth.
You need to use Azure Stream Analytics to return the license plate, vehicle make, and hour the last vehicle passed during each 10-minute window.
How should you complete the query? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Box 1: MAX -
The first step of the query finds the maximum time stamp in each 10-minute window, that is, the time stamp of the last event in that window. The second step joins the results of the first query with the original stream to find the events that match the last time stamps in each window.
Query:
WITH LastInWindow AS
(
    SELECT
        MAX(Time) AS LastEventTime
    FROM
        Input TIMESTAMP BY Time
    GROUP BY
        TumblingWindow(minute, 10)
)
SELECT
    Input.License_plate,
    Input.Make,
    Input.Time
FROM
    Input TIMESTAMP BY Time
    INNER JOIN LastInWindow
        ON DATEDIFF(minute, Input, LastInWindow) BETWEEN 0 AND 10
        AND Input.Time = LastInWindow.LastEventTime
Box 2: TumblingWindow -
Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.
Box 3: DATEDIFF -
DATEDIFF is a date-specific function that compares and returns the time difference between two DateTime fields. For more information, refer to the date functions.
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
Question #6 Topic 2
You have an Azure Data Factory instance that contains two pipelines named Pipeline1 and Pipeline2.
Correct Answer: A
Activities are linked together via dependencies. A dependency has a condition of one of the following: Succeeded, Failed, Skipped, or
Completed.
Consider Pipeline1:
If we have a pipeline with two activities where Activity2 has a failure dependency on Activity1, the pipeline will not fail just because Activity1
failed. If Activity1 fails and Activity2 succeeds, the pipeline will succeed. This scenario is treated as a try-catch block by Data Factory.
Note: If we have a pipeline containing Activity1 and Activity2, and Activity2 has a success dependency on Activity1, it will execute only if Activity1 is successful.
Reference:
https://datasavvy.me/category/azure-data-factory/
Question #7 Topic 2
HOTSPOT -
A company plans to use Platform-as-a-Service (PaaS) to create the new data pipeline process. The process must meet the following requirements:
Ingest:
Which technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
In Azure, the following services and tools will meet the core requirements for pipeline orchestration, control flow, and data movement: Azure
Data Factory, Oozie on HDInsight, and SQL Server Integration Services (SSIS).
Note: Data at rest includes information that resides in persistent storage on physical media, in any digital format. Microsoft Azure offers a variety of data storage solutions to meet different needs, including file, disk, blob, and table storage. Microsoft also provides encryption to protect data at rest.
Azure Databricks provides enterprise-grade Azure security, including Azure Active Directory integration.
With Azure Databricks, you can set up your Apache Spark environment in minutes, autoscale and collaborate on shared projects in an
interactive workspace.
Azure Databricks supports Python, Scala, R, Java and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch and
scikit-learn.
Azure Synapse Analytics (formerly SQL Data Warehouse) stores data in relational tables with columnar storage.
The Azure SQL Data Warehouse connector offers efficient and scalable structured streaming write support for SQL Data Warehouse. Access SQL Data Warehouse from Azure Databricks by using the SQL Data Warehouse connector.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/architecture/data-guide/technology-choices/pipeline-orchestration-data-movement
https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks
Question #8 Topic 2
DRAG DROP -
You need to calculate the employee_type value based on the hire_date value.
How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. Each value may be used
once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
Correct Answer:
Box 1: CASE -
CASE evaluates a list of conditions and returns one of multiple possible result expressions.
CASE can be used in any statement or clause that allows a valid expression. For example, you can use CASE in statements such as SELECT, UPDATE, DELETE, and SET, and in clauses such as select_list, IN, WHERE, ORDER BY, and HAVING.
Simple CASE expression syntax:
CASE input_expression
    WHEN when_expression THEN result_expression [ ...n ]
    [ ELSE else_result_expression ]
END
Box 2: ELSE -
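A minimal sketch of the pattern this question targets; the table name, column names, and cutoff date are assumptions, not the exam's exact values:
-- Sketch: derive employee_type from hire_date (names and threshold assumed).
SELECT
    employee_id,
    hire_date,
    CASE
        WHEN hire_date >= '2019-01-01' THEN 'New'
        ELSE 'Standard'
    END AS employee_type
FROM dbo.employees;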
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/language-elements/case-transact-sql
Question #9 Topic 2
DRAG DROP -
You have an Azure Data Lake Storage Gen2 container that contains JSON-formatted files in the following format.
You need to use the serverless SQL pool in WS1 to read the files.
How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. Each value may be used
once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
Correct Answer:
Box 1: openrowset -
The easiest way to see the content of your CSV file is to provide the file URL to the OPENROWSET function and specify the CSV FORMAT.
Example:
SELECT *
FROM OPENROWSET(
        BULK 'csv/population/population.csv',
        DATA_SOURCE = 'SqlOnDemandDemo',
        FORMAT = 'CSV',
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n'
    ) AS [r]
Box 2: openjson -
You can then use the OPENJSON function to project attributes from the JSON documents, as shown in the sketch below.
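A minimal sketch of the documented serverless SQL pool pattern for JSON files (the storage URL and column sizes are assumptions): OPENROWSET reads each JSON document as a single text column, and OPENJSON then projects the attributes:
-- Sketch: read JSON from ADLS Gen2 with a serverless SQL pool.
-- The URL placeholders and VARCHAR sizes are assumed.
SELECT d.FirstName, d.LastName
FROM OPENROWSET(
        BULK 'https://<account>.dfs.core.windows.net/<container>/customers.json',
        FORMAT = 'CSV',
        FIELDTERMINATOR = '0x0b',
        FIELDQUOTE = '0x0b'
    ) WITH (doc NVARCHAR(MAX)) AS rows
CROSS APPLY OPENJSON(doc)
    WITH (FirstName VARCHAR(50), LastName VARCHAR(50)) AS d;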
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-single-csv-file
https://docs.microsoft.com/en-us/sql/relational-databases/json/import-json-documents-into-sql-server
DRAG DROP -
You have an Apache Spark DataFrame named temperatures. A sample of the data is shown in the following table.
You need to produce the following table by using a Spark SQL query.
How should you complete the query? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once,
or not at all.
You may need to drag the split bar between panes or scroll to view content.
Correct Answer:
Box 1: PIVOT -
PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output.
And PIVOT runs aggregations where they're required on any remaining column values that are wanted in the final output.
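For the temperatures DataFrame, a Spark SQL sketch in the shape this question expects; the month numbers and output aliases are assumptions:
-- Sketch: pivot monthly average temperatures into columns (months assumed).
SELECT * FROM (
    SELECT year(date) AS year, month(date) AS month, temp
    FROM temperatures
)
PIVOT (
    CAST(AVG(temp) AS DECIMAL(4, 1))
    FOR month IN (6 AS JUN, 7 AS JUL)
);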
Incorrect Answers:
UNPIVOT carries out the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.
Box 2: CAST -
If you want to convert an integer value to a DECIMAL data type in SQL Server use the CAST() function.
Example:
SELECT CAST(12 AS DECIMAL(7, 2)) AS decimal_value;
Result: 12.00
Reference:
https://learnsql.com/cookbook/how-to-convert-an-integer-to-a-decimal-in-sql-server/
https://docs.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot
You need to label each pipeline with its main purpose of either ingest, transform, or load. The labels must be available for grouping and filtering when monitoring the pipeline runs.
A. a resource tag
B. a correlation ID
C. a run group ID
D. an annotation
Correct Answer: D
Annotations are additional, informative tags that you can add to specific factory resources: pipelines, datasets, linked services, and triggers. By
adding annotations, you can easily filter and search for specific factory resources.
Reference:
https://www.cathrinewilhelmsen.net/annotations-user-properties-azure-data-factory/
HOTSPOT -
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
Hot Area:
Correct Answer:
Box 1: Yes -
A cluster mode of 'High Concurrency' is selected, unlike all the others which are 'Standard'. This results in a worker type of Standard_DS13_v2.
Box 2: No -
When you run a job on a new cluster, the job is treated as a data engineering (job) workload subject to the job workload pricing. When you run a
job on an existing cluster, the job is treated as a data analytics (all-purpose) workload subject to all-purpose workload pricing.
Box 3: Yes -
Delta Lake on Databricks allows you to configure Delta Lake based on your workload patterns.
Reference:
https://adatis.co.uk/databricks-cluster-sizing/
https://docs.microsoft.com/en-us/azure/databricks/jobs
https://docs.databricks.com/administration-guide/capacity-planning/cmbp.html
https://docs.databricks.com/delta/index.html
You are designing a statistical analysis solution that will use custom proprietary Python functions on near real-time data from Azure Event Hubs.
You need to recommend which Azure service to use to perform the statistical analysis. The solution must minimize latency.
Correct Answer: C
Reference:
https://docs.microsoft.com/en-us/azure/event-hubs/process-data-azure-stream-analytics
HOTSPOT -
You have an enterprise data warehouse in Azure Synapse Analytics that contains a table named FactOnlineSales. The table contains data from the start of 2009 to the end of 2012.
You need to improve the performance of queries against FactOnlineSales by using table partitions. The solution must meet the following
requirements:
✑ Create four partitions based on the order date.
✑ Ensure that each partition contains all the orders placed during a given calendar year.
Hot Area:
Correct Answer:
RANGE LEFT and RANGE RIGHT create similar partitions, but the boundary comparisons differ.
For example, with boundary values 20100101, 20110101, and 20120101:
With RANGE LEFT, the partitions are: datecol <= 20100101; datecol > 20100101 AND datecol <= 20110101; datecol > 20110101 AND datecol <= 20120101; datecol > 20120101.
With RANGE RIGHT, the partitions are: datecol < 20100101; datecol >= 20100101 AND datecol < 20110101; datecol >= 20110101 AND datecol < 20120101; datecol >= 20120101.
In this scenario, RANGE RIGHT is suitable for calendar-year comparisons (January 1st to December 31st).
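A sketch of how those boundaries would be declared on a dedicated SQL pool table; the table and column definitions are assumptions:
-- Sketch: date-partitioned fact table using RANGE RIGHT, so each boundary
-- value starts a new calendar year. Table and column names are assumed.
CREATE TABLE dbo.FactOnlineSales
(
    OnlineSalesKey INT NOT NULL,
    DateKey INT NOT NULL,
    SalesAmount MONEY NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(OnlineSalesKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (DateKey RANGE RIGHT FOR VALUES (20100101, 20110101, 20120101))
);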
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-partition-function-transact-sql?view=sql-server-ver15
You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an Azure Synapse Analytics dedicated SQL pool.
You have a table that was created by using the following Transact-SQL statement.
Which two columns should you add to the table? Each correct answer presents part of the solution.
Correct Answer: BE
A Type 3 SCD supports storing two versions of a dimension member as separate columns. The table includes a column for the current value of
a member plus either the original or previous value of the member. So Type 3 uses additional columns to track one key instance of history,
rather than storing additional rows to track each change like in a Type 2 SCD.
This type of tracking may be used for one or two columns in a dimension table. It is not common to use it for many members of the same table.
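As a sketch only (the real table and column names depend on the CREATE TABLE statement shown in the exhibit), a Type 3 change adds paired current/original columns rather than history rows:
-- Sketch: Type 3 SCD columns; all names here are assumed.
ALTER TABLE dbo.DimProductCategory
ADD CurrentProductCategory NVARCHAR(100) NULL,
    OriginalProductCategory NVARCHAR(100) NULL;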
Reference:
https://k21academy.com/microsoft-azure/azure-data-engineer-dp203-q-a-day-2-live-session-review/
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that
might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an Azure Stream Analytics solution that will analyze Twitter data.
You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.
Solution: You use a hopping window that uses a hop size of 10 seconds and a window size of 10 seconds.
A. Yes
B. No
Correct Answer: B
Instead use a tumbling window. Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that
might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an Azure Stream Analytics solution that will analyze Twitter data.
You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.
Solution: You use a hopping window that uses a hop size of 5 seconds and a window size 10 seconds.
A. Yes
B. No Most Voted
Correct Answer: B
Instead use a tumbling window. Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
HOTSPOT -
You are building an Azure Stream Analytics job to identify how much time a user spends interacting with a feature on a webpage.
The job receives events based on user actions on the webpage. Each row of data represents an event. Each event has a type of either 'start' or
'end'.
You need to calculate the duration between start and end events.
How should you complete the query? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Box 1: DATEDIFF -
DATEDIFF function returns the count (as a signed integer value) of the specified datepart boundaries crossed between the specified startdate
and enddate.
Box 2: LAST -
The LAST function can be used to retrieve the last event within a specific condition. In this example, the condition is an event of type Start,
partitioning the search by PARTITION BY user and feature. This way, every user and feature is treated independently when searching for the
Start event. LIMIT DURATION limits the search back in time to 1 hour between the End and Start events.
Example:
SELECT
    [user],
    feature,
    DATEDIFF(
        second,
        LAST(Time) OVER (PARTITION BY [user], feature LIMIT DURATION(hour, 1) WHEN Event = 'start'),
        Time) AS duration
FROM input TIMESTAMP BY Time
WHERE
    Event = 'end'
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-stream-analytics-query-patterns
You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics dedicated SQL pool. The CSV file contains three columns named username, comment, and date.
The data flow already contains the following:
✑ A source transformation.
✑ A Derived Column transformation to set the appropriate types of data.
✑ A sink transformation to land the data in the pool.
You need to ensure that the data flow meets the following requirements:
✑ All valid rows must be written to the destination table.
✑ Truncation errors in the comment column must be avoided proactively.
✑ Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob storage.
Which two actions should you perform? Each correct answer presents part of the solution.
A. To the data flow, add a sink transformation to write the rows to a file in blob storage. Most Voted
B. To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors. Most Voted
C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors.
D. Add a select transformation to select only the rows that will cause truncation errors.
Correct Answer: AB
B: Example:
1. This conditional split transformation defines the maximum length of "title" to be five. Any row that is less than or equal to five will go into the GoodRows stream. Any row that is larger than five will go into the BadRows stream.
A:
2. Now we need to log the rows that failed. Add a sink transformation to the BadRows stream for logging. Here, we'll "auto-map" all of the fields so that we have logging of the complete transaction record. This is a text-delimited CSV file output to a single file in Blob Storage. We'll call the log file "badrows".
3. The completed data flow is shown below. We are now able to split off error rows to avoid the SQL truncation errors and put those entries into a log file.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows
DRAG DROP -
You need to create an Azure Data Factory pipeline to process data for the following three departments at your company: Ecommerce, retail, and
wholesale. The solution must ensure that data can also be processed for the entire company.
How should you complete the Data Factory data flow script? To answer, drag the appropriate values to the correct targets. Each value may be used
once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
Correct Answer:
The conditional split transformation routes data rows to different streams based on matching conditions. The conditional split transformation is similar to a CASE decision structure in a programming language. The transformation evaluates expressions and, based on the results, directs the data row to the specified stream.
First we put the conditions in order. The order must match the stream labeling we define in Box 3.
Syntax:
<incomingStream>
    split(
        <conditionalExpression1>
        <conditionalExpression2>
        ...
        disjoint: {true | false}
    ) ~> <splitTx>@(stream1, stream2, ..., <defaultStream>)
disjoint is false because the data goes to the first matching condition. All remaining rows matching the third condition go to output stream all.
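A sketch of the completed data flow script, assuming the incoming stream exposes a dept column identifying the department; the final stream (all) then receives every remaining row for company-wide processing. The stream and column names are assumptions:
SourceStream split(
    dept == 'ecommerce',
    dept == 'retail',
    dept == 'wholesale',
    disjoint: false
) ~> SplitByDept@(ecommerce, retail, wholesale, all)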
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-conditional-split
DRAG DROP -
You have an Azure Data Lake Storage Gen2 account that contains a JSON file for customers. The file contains two attributes named FirstName
and LastName.
You need to copy the data from the JSON file to an Azure Synapse Analytics table by using Azure Databricks. A new column must be created that concatenates the FirstName and LastName values.
Which five actions should you perform in sequence? To answer, move the appropriate actions to the answer area and arrange them in the correct order.
Correct Answer:
Begin with creating a file system in the Azure Data Lake Storage Gen2 account.
You can load the json files as a data frame in Azure Databricks.
Specify a temporary folder to use while moving data between Azure Databricks and Azure Synapse.
You upload the transformed data frame into Azure Synapse. You use the Azure Synapse connector for Azure Databricks to directly upload a data frame as a table in Azure Synapse.
Reference:
https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
HOTSPOT -
You build an Azure Data Factory pipeline to move data from an Azure Data Lake Storage Gen2 container to a database in an Azure Synapse Analytics dedicated SQL pool.
Data in the container is stored in the following folder structure:
/in/{YYYY}/{MM}/{DD}/{HH}/{mm}
Hot Area:
Correct Answer:
Box 2: 2 minutes -
Delay: The amount of time to delay the start of data processing for the window. The pipeline run is started after the expected execution time plus the amount of delay.
The delay defines how long the trigger waits past the due time before triggering a new run. The delay doesn't alter the window startTime.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger
HOTSPOT -
You are designing a near real-time dashboard solution that will visualize streaming data from remote sensors that connect to the internet. The
streaming data must be aggregated to show the average value of each 10-second interval. The data will be discarded after being displayed in the
dashboard.
The solution will use Azure Stream Analytics and must meet the following requirements:
Hot Area:
Correct Answer:
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-power-bi-dashboard
DRAG DROP -
You have an Azure Stream Analytics job that is a Stream Analytics project solution in Microsoft Visual Studio. The job accepts data generated by IoT devices in the JSON format.
You need to modify the job to accept data generated by the IoT devices in the Protobuf format.
Which three actions should you perform from Visual Studio in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Step 1: Add an Azure Stream Analytics Custom Deserializer Project (.NET) project to the solution.
1. Open Visual Studio and select File > New > Project. Search for Stream Analytics and select Azure Stream Analytics Custom Deserializer Project (.NET).
2. In Solution Explorer, right-click your Protobuf Deserializer project and select Manage NuGet Packages from the menu. Then install the Microsoft.Azure.StreamAnalytics and Google.Protobuf NuGet packages.
3. Add the MessageBodyProto class and the MessageBodyDeserializer class to your project.
Step 2: Add .NET deserializer code for Protobuf to the custom deserializer project
Azure Stream Analytics has built-in support for three data formats: JSON, CSV, and Avro. With custom .NET deserializers, you can read data
from other formats such as Protocol Buffer, Bond and other user defined formats for both cloud and edge jobs.
Step 3: Add an Azure Stream Analytics Application project to the solution.
1. In Solution Explorer, right-click the Protobuf Deserializer solution and select Add > New Project. Under Azure Stream Analytics > Stream Analytics, choose Azure Stream Analytics Application.
2. Right-click References under the ProtobufCloudDeserializer Azure Stream Analytics project. Under Projects, add Protobuf Deserializer. It should be automatically populated for you.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/custom-deserializer
You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK South region.
You need to copy blob data from the storage account to the data warehouse by using Azure Data Factory. The solution must meet the following
requirements:
✑ Ensure that the data remains in the UK South region at all times.
✑ Minimize administrative effort.
Which type of integration runtime should you use?
Correct Answer: A
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
HOTSPOT -
You have an Azure SQL database named Database1 and two Azure event hubs named HubA and HubB. The data consumed from each source is shown in the following table.
You need to implement Azure Stream Analytics to calculate the average fare per mile by driver.
How should you configure the Stream Analytics input for each source? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
HubA: Stream -
HubB: Stream -
Database1: Reference -
Reference data (also known as a lookup table) is a finite data set that is static or slowly changing in nature, used to perform a lookup or to augment your data streams. For example, in an IoT scenario, you could store metadata about sensors (which don't change often) in reference data and join it with real-time IoT data streams. Azure Stream Analytics loads reference data in memory to achieve low-latency stream processing.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data
You have an Azure Stream Analytics job that receives clickstream data from an Azure event hub.
You need to define a query in the Stream Analytics job. The query must meet the following requirements:
✑ Count the number of clicks within each 10-second window based on the country of a visitor.
✑ Ensure that each click is NOT counted more than once.
How should you define the Query?
A. SELECT Country, Avg(*) AS Average FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, SlidingWindow(second, 10)
B. SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, TumblingWindow(second, 10) Most Voted
C. SELECT Country, Avg(*) AS Average FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, HoppingWindow(second, 10, 2)
D. SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, SessionWindow(second, 5, 10)
Correct Answer: B
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them, such as the
example below. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one
tumbling window.
Incorrect Answers:
A: Sliding windows, unlike Tumbling or Hopping windows, output events only for points in time when the content of the window actually changes. In other words, when an event enters or exits the window. Every window has at least one event and, as with Hopping windows, events can belong to more than one sliding window.
C: Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.
D: Session windows group events that arrive at similar times, filtering out periods of time where there is no data.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
HOTSPOT -
You are building an Azure Stream Analytics query that will receive input data from Azure IoT Hub and write the results to Azure Blob storage.
You need to calculate the difference in the number of readings per sensor per hour.
How should you complete the query? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Box 1: LAG -
The LAG analytic operator allows one to look up a "previous" event in an event stream, within certain constraints. It is very useful for computing the rate of growth of a variable, detecting when a variable crosses a threshold, or when a condition starts or stops being true.
Example:
SELECT sensorId,
       growth = reading - LAG(reading) OVER (PARTITION BY sensorId LIMIT DURATION(hour, 1))
FROM input
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/lag-azure-stream-analytics
You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure Data Lake Storage Gen2 container.
Which type of trigger should you use?
A. on-demand
B. tumbling window
C. schedule
D. event
Correct Answer: D
Event-driven architecture (EDA) is a common data integration pattern that involves production, detection, consumption, and reaction to events. Data integration scenarios often require Data Factory customers to trigger pipelines based on events happening in a storage account, such as the arrival of a file in an Azure Blob Storage account.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger
You have two Azure Data Factory instances named ADFdev and ADFprod. ADFdev connects to an Azure DevOps Git repository.
You publish changes from the main branch of the Git repository to ADFdev.
You need to deploy the artifacts from ADFdev to ADFprod.
What should you do first?
Correct Answer: C
In Azure Data Factory, continuous integration and delivery (CI/CD) means moving Data Factory pipelines from one environment (development, test, production) to another.
Note: The following is a guide for setting up an Azure Pipelines release that automates the deployment of a data factory to multiple
environments.
1. In Azure DevOps, open the project that's configured with your data factory.
2. On the left side of the page, select Pipelines, and then select Releases.
3. Select New pipeline, or, if you have existing pipelines, select New and then New release pipeline.
4. In the Stage name box, enter the name of your environment.
5. Select Add artifact, and then select the git repository configured with your development data factory. Select the publish branch of the repository for the Default branch. By default, this publish branch is adf_publish.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
You are developing a solution that will stream to Azure Stream Analytics. The solution will have both streaming data and reference data.
Which input type should you use for the reference data?
A. Azure Cosmos DB
B. Azure Blob storage
Correct Answer: B
Stream Analytics supports Azure Blob storage and Azure SQL Database as the storage layer for Reference Data.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data
You are designing an Azure Stream Analytics job to process incoming events from sensors in retail environments.
You need to process the events to produce a running average of shopper counts during the previous 15 minutes, calculated at five-minute intervals.
Which type of window should you use?
A. snapshot
B. tumbling
C. hopping
D. sliding
Correct Answer: C
Unlike tumbling windows, hopping windows model scheduled overlapping windows. A hopping window specification consists of three parameters: the timeunit, the windowsize (how long each window lasts), and the hopsize (by how much each window moves forward relative to the previous one).
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/hopping-window-azure-stream-analytics
HOTSPOT -
You are designing a monitoring solution for a fleet of 500 vehicles. Each vehicle has a GPS tracking device that sends data to an Azure event hub once per minute.
You have a CSV file in an Azure Data Lake Storage Gen2 container. The file maintains the expected geographical area in which each vehicle should
be.
You need to ensure that when a GPS position is outside the expected area, a message is added to another event hub for processing within 30 seconds. The solution must minimize cost.
What should you include in the solution? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Box 2: Hopping -
Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap and be
emitted more often than the window size. Events can belong to more than one Hopping window result set. To make a Hopping window the
same as a Tumbling window, specify the hop size to be the same as the window size.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day.
You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times.
Correct Answer: B
The Databricks ABS-AQS connector uses Azure Queue Storage (AQS) to provide an optimized file source that lets you find new files written to
an Azure Blob storage (ABS) container without repeatedly listing all of the files. This provides two major advantages:
✑ Lower latency: no need to list nested directory structures on ABS, which is slow and resource intensive.
✑ Lower costs: no more costly LIST API requests made to ABS.
Reference:
https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/aqs
HOTSPOT -
The current status of the integration runtime has the following configurations:
✑ Status: Running
✑ Type: Self-Hosted
✑ Version: 4.4.7292.1
✑ Running / Registered Node(s): 1/1
✑ High Availability Enabled: False
✑ Linked Count: 0
✑ Queue Length: 0
✑ Average Queue Duration: 0.00s
The integration runtime has the following node details:
✑ Name: X-M
✑ Status: Running
✑ Version: 4.4.7292.1
✑ Available Memory: 7697MB
✑ CPU Utilization: 6%
✑ Network (In/Out): 1.21KBps/0.83KBps
✑ Concurrent Jobs (Running/Limit): 2/14
✑ Role: Dispatcher/Worker
✑ Credential Status: In Sync
Use the drop-down menus to select the answer choice that completes each statement based on the information presented.
Hot Area:
Correct Answer:
Box 1: fail until the node comes back online -
We see: High Availability Enabled: False.
Note: Higher availability of the self-hosted integration runtime ensures that it's no longer the single point of failure in your big data solution or cloud data integration with Azure Data Factory.
Box 2: lowered -
We see:
CPU Utilization: 6%
Note: When the processor and available RAM aren't well utilized, but the execution of concurrent jobs reaches a node's limits, scale up by increasing the number of concurrent jobs that can run on a node.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier.
You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must meet the following requirements:
✑ Automatically scale down workers when the cluster is underutilized for three minutes.
✑ Minimize the time it takes to scale to the maximum number of workers.
✑ Minimize costs.
What should you do first?
Correct Answer: B
For clusters running Databricks Runtime 6.4 and above, optimized autoscaling is used by all-purpose clusters in the Premium plan.
Optimized autoscaling:
Can scale down even if the cluster is not idle by looking at shuffle file state.
On job clusters, scales down if the cluster is underutilized over the last 40 seconds.
On all-purpose clusters, scales down if the cluster is underutilized over the last 150 seconds.
The spark.databricks.aggressiveWindowDownS Spark configuration property specifies in seconds how often a cluster makes down-scaling
decisions. Increasing the value causes a cluster to scale down more slowly. The maximum value is 600.
Standard autoscaling:
Starts with adding 8 nodes. Thereafter, scales up exponentially, but can take many steps to reach the max. You can customize the first step by setting the spark.databricks.autoscaling.standardFirstStepUp Spark configuration property.
Scales down only when the cluster is completely idle and it has been underutilized for the last 10 minutes.
Reference:
https://docs.databricks.com/clusters/configure.html
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that
might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an Azure Stream Analytics solution that will analyze Twitter data.
You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.
Solution: You use a tumbling window, and you set the window size to 10 seconds.
A. Yes
B. No
Correct Answer: A
Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals. The following diagram illustrates a stream with a
series of events and how they are mapped into 10-second tumbling windows.
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that
might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an Azure Stream Analytics solution that will analyze Twitter data.
You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.
Solution: You use a session window that uses a timeout size of 10 seconds.
A. Yes
B. No Most Voted
Correct Answer: B
Instead use a tumbling window. Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
You use Azure Stream Analytics to receive data from Azure Event Hubs and to output the data to an Azure Blob Storage account.
You need to output the count of records received from the last five minutes every minute.
Which windowing function should you use?
A. Session
B. Tumbling
C. Sliding
D. Hopping
Correct Answer: D
Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap and be
emitted more often than the window size. Events can belong to more than one Hopping window result set. To make a Hopping window the
same as a Tumbling window, specify the hop size to be the same as the window size.
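A minimal sketch of this requirement (input and timestamp column names assumed); HoppingWindow takes the time unit, the window size, and the hop size:
-- Sketch: a 5-minute count emitted every minute. Names are assumed.
SELECT COUNT(*) AS RecordCount
FROM input TIMESTAMP BY EventTime
GROUP BY HoppingWindow(minute, 5, 1)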
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
HOTSPOT -
You configure version control for an Azure Data Factory instance as shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
Hot Area:
Correct Answer:
Box 1: adf_publish -
The Publish branch is the branch in your repository where publishing related ARM templates are stored and updated. By default, it's adf_publish.
Box 2: /dwh_batchetl/adf_publish/contososales -
Note: RepositoryName (here dwh_batchetl): Your Azure Repos code repository name. Azure Repos projects contain Git repositories to manage
your source code as your project grows. You can create a new repository or use an existing repository that's already in your project.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/source-control
HOTSPOT -
You are designing an Azure Stream Analytics solution that receives instant messaging data from an Azure Event Hub.
You need to ensure that the output from the Stream Analytics job counts the number of messages per time zone every 15 seconds.
How should you complete the Stream Analytics query? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Box 1: timestamp by -
Box 2: TUMBLINGWINDOW -
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them, such as the
example below. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one
tumbling window.
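A sketch of the completed query, assuming the payload carries a TimeZone column and a CreatedAt event time (both names are assumptions):
-- Sketch: message count per time zone every 15 seconds.
SELECT TimeZone, COUNT(*) AS MessageCount
FROM input TIMESTAMP BY CreatedAt
GROUP BY TimeZone, TumblingWindow(second, 15)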
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
HOTSPOT -
You have an Azure Data Factory instance named ADF1 and two Azure Synapse Analytics workspaces named WS1 and WS2.
✑ P1: Uses a copy activity to copy data from a nonpartitioned table in a dedicated SQL pool of WS1 to an Azure Data Lake Storage Gen2 account
✑ P2: Uses a copy activity to copy data from text-delimited files in an Azure Data Lake Storage Gen2 account to a nonpartitioned table in a
dedicated SQL pool of WS2
Which dataset settings should you configure for the copy activity of each pipeline? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
While SQL pool supports many loading methods, including non-PolyBase options such as BCP and the SQL BulkCopy API, the fastest and most scalable way to load data is through PolyBase. PolyBase is a technology that accesses external data stored in Azure Blob storage or Azure Data Lake Store by using the T-SQL language.
PolyBase is not possible for these text files; bulk insert has to be used instead.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview
HOTSPOT -
You have an Azure Storage account that generates 200,000 new files daily. The file names have a format of
{YYYY}/{MM}/{DD}/{HH}/{CustomerID}.csv.
You need to design an Azure Data Factory solution that will load new data from the storage account to an Azure Data Lake once hourly. The solution must minimize load times and costs.
How should you configure the solution? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals. The following diagram illustrates a stream with a
series of events and how they are mapped into 10-second tumbling windows.
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that
might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
✑ A workload for data engineers who will use Python and SQL.
✑ A workload for jobs that will run notebooks that use Python, Scala, and SQL.
✑ A workload that data scientists will use to perform ad hoc analysis in Scala and R.
The enterprise architecture team at your company identifies the following standards for Databricks environments:
✑ All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are
three data scientists.
Solution: You create a Standard cluster for each data scientist, a Standard cluster for the data engineers, and a High Concurrency cluster for the
jobs.
A. Yes
B. No Most Voted
Correct Answer: B
We need a High Concurrency cluster for the data engineers and the jobs.
Note: Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-
native fine-grained sharing for maximum resource utilization and minimum query latencies.
Reference:
https://docs.azuredatabricks.net/clusters/configure.html
Populate Dimensions must execute after Ingest Data from System1 and Ingest Data from System2. Populate Facts must execute after the Populate Dimensions pipeline. All the pipelines must execute every eight hours.
C. Create a parent pipeline that contains the four pipelines and use a schedule trigger. Most Voted
D. Create a parent pipeline that contains the four pipelines and use an event trigger.
Correct Answer: C
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
DRAG DROP -
You are responsible for providing access to an Azure Data Lake Storage Gen2 account.
Your user account has contributor access to the storage account, and you have the application ID and access key.
You plan to use PolyBase to load data into an enterprise data warehouse in Azure Synapse Analytics.
You need to configure PolyBase to connect the data warehouse to the storage account.
Which three components should you create in sequence? To answer, move the appropriate components from the list of components to the answer area and arrange them in the correct order.
Correct Answer:
A master key should be created only once in a database. The database master key is a symmetric key used to protect the private keys of certificates and asymmetric keys that are present in the database.
Create a Database Scoped Credential. A Database Scoped Credential is a record that contains the authentication information required to connect to an external resource. The master key needs to be created first, before creating the database scoped credential.
Create an External Data Source. External data sources are used to establish connectivity for data loading using Polybase.
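A sketch of the three components in order; all names, secrets, and URLs below are placeholders, not values from the exam:
-- Sketch: PolyBase setup sequence. Every identifier and secret is assumed.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL AdlsCredential
WITH IDENTITY = '<application-id>@<OAuth-2.0-token-endpoint>',
     SECRET = '<access-key>';

CREATE EXTERNAL DATA SOURCE AdlsDataSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://<container>@<account>.dfs.core.windows.net',
    CREDENTIAL = AdlsCredential
);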
Reference:
https://www.sqlservercentral.com/articles/access-external-data-from-azure-synapse-analytics-using-polybase
You are monitoring an Azure Stream Analytics job by using metrics in Azure.
You discover that during the last 12 hours, the average watermark delay is consistently greater than the configured late arrival tolerance.
What is a possible cause of this behavior?
A. Events whose application timestamp is earlier than their arrival time by more than five minutes arrive as inputs.
D. The job lacks the resources to process the volume of incoming data. Most Voted
Correct Answer: D
Watermark Delay indicates the delay of the streaming data processing job.
There are a number of resource constraints that can cause the streaming pipeline to slow down. The watermark delay metric can rise due to:
1. Not enough processing resources in Stream Analytics to handle the volume of input events. To scale up resources, see Understand and adjust Streaming Units.
2. Not enough throughput within the input event brokers, so they are throttled. For possible solutions, see Automatically scale up Azure Event Hubs throughput units.
3. Output sinks are not provisioned with enough capacity, so they are throttled. The possible solutions vary widely based on the flavor of output service being used.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling
HOTSPOT -
You are building an Azure Stream Analytics job to retrieve game data.
You need to ensure that the job returns the highest scoring record for each five-minute time interval of each game.
How should you complete the Stream Analytics query? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Box 1: TopOne -
TopOne returns the top-rank record, where rank defines the ranking position of the event in the window according to the specified ordering.
Box 2: Hopping(minute,5)
Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap and be
emitted more often than the window size. Events can belong to more than one Hopping window result set. To make a Hopping window the
same as a Tumbling window, specify the hop size to be the same as the window size.
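A sketch of the intended query shape (column names assumed); note that a hopping window whose hop size equals its window size behaves the same as a tumbling window over non-overlapping five-minute intervals:
-- Sketch: highest-scoring record per game per five-minute window.
-- Game, Score, and CreatedAt are assumed column names.
SELECT
    Game,
    TopOne() OVER (ORDER BY Score DESC) AS HighestScoringRecord
FROM input TIMESTAMP BY CreatedAt
GROUP BY Game, TumblingWindow(minute, 5)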
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/topone-azure-stream-analytics
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that
might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that copies the data to a staging table in the data warehouse, and then uses a stored procedure to execute the R script.
A. Yes
B. No Most Voted
Correct Answer: A
If you need to transform data in a way that is not supported by Data Factory, you can create a custom activity with your own data processing logic and use that activity in the pipeline.
Note: You can use data transformation activities in Azure Data Factory and Synapse pipelines to transform and process your raw data into predictions and insights.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/transform-data
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that
might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
✑ A workload for data engineers who will use Python and SQL.
✑ A workload for jobs that will run notebooks that use Python, Scala, and SQL.
✑ A workload that data scientists will use to perform ad hoc analysis in Scala and R.
The enterprise architecture team at your company identifies the following standards for Databricks environments:
✑ All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are
three data scientists.
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.
A. Yes
B. No Most Voted
Correct Answer: B
Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-
native fine-grained sharing for maximum resource utilization and minimum query latencies.
Reference:
https://docs.azuredatabricks.net/clusters/configure.html
You are designing an Azure Databricks cluster that runs user-defined local processes.
You need to recommend a cluster configuration that meets the following requirements:
Correct Answer: B
A High Concurrency cluster is a managed cloud resource. The key benefits of High Concurrency clusters are that they provide fine-grained
Databricks chooses the appropriate number of workers required to run your job. This is referred to as autoscaling. Autoscaling makes it easier
to achieve high cluster utilization, because you don't need to provision the cluster to match a workload.
Incorrect Answers:
C: The cluster configuration includes an auto terminate setting whose default value depends on cluster mode:
Standard and Single Node clusters terminate automatically after 120 minutes by default.
Reference:
https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
HOTSPOT -
You are building an Azure Data Factory solution to process data received from Azure Event Hubs and then ingest it into an Azure Data Lake Storage Gen2 account.
The data will be ingested every five minutes from devices into JSON files. The files have the following naming pattern.
/{deviceType}/in/{YYYY}/{MM}/{DD}/{HH}/{deviceID}_{YYYY}{MM}{DD}{HH}{mm}.json
You need to prepare the data for batch data processing so that there is one dataset per hour per deviceType. The solution must minimize read
times.
How should you configure the sink for the copy activity? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Box 1: @trigger().startTime -
startTime: A date-time value. For basic schedules, the value of the startTime property applies to the first occurrence. For complex schedules, the trigger starts no sooner than the specified startTime.
Box 2: /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json
- FlattenHierarchy: All files from the source folder are in the first level of the target folder. The target files have autogenerated names.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
https://docs.microsoft.com/en-us/azure/data-factory/connector-file-system
DRAG DROP -
You are designing an Azure Data Lake Storage Gen2 structure for telemetry data from 25 million devices distributed across seven key
geographical regions. Each minute, the devices will send a JSON payload of metrics to Azure Event Hubs.
You need to recommend a folder structure for the data. The solution must meet the following requirements:
✑ Data engineers from each region must be able to build their own pipelines for the data of their respective region only.
✑ The data must be processed at least once every 15 minutes for inclusion in Azure Synapse Analytics serverless SQL pools.
How should you recommend completing the structure? To answer, drag the appropriate values to the correct targets. Each value may be used
once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
Correct Answer:
Box 1: {raw/regionID}
Box 2: {YYYY}/{MM}/{DD}/{HH}/{mm}
Box 3: {deviceID}
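As a hedged illustration of how a serverless SQL pool could then read one region's telemetry for a single hour of that structure (the storage URL, container name, and JSON property names are assumptions, and the files are assumed to be line-delimited JSON):

SELECT
    JSON_VALUE(doc, '$.deviceId') AS deviceId,
    JSON_VALUE(doc, '$.metric')   AS metric
FROM OPENROWSET(
    BULK 'https://<account>.dfs.core.windows.net/<container>/raw/region1/2023/12/26/14/*/*',
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b',
    ROWTERMINATOR = '0x0a'
) WITH (doc NVARCHAR(MAX)) AS rows;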
Reference:
https://github.com/paolosalvatori/StreamAnalyticsAzureDataLakeStore/blob/master/README.md
HOTSPOT -
You are implementing an Azure Stream Analytics solution to process event data from devices.
The devices output events when there is a fault and emit a repeat of the event every five seconds until the fault is resolved. The devices output a
heartbeat event every five seconds after a previous event if there are no faults present.
How should you complete the Stream Analytics SQL query? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Box 2: ,TumblingWindow(Second, 5)
Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.
For example, a stream of events can be mapped into consecutive 10-second tumbling windows, and each event belongs to exactly one window.
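A generic sketch of a five-second tumbling window over this kind of stream (the input name, TIMESTAMP BY column, and field names are assumptions), counting events per device per window:

SELECT
    DeviceId,
    COUNT(*) AS EventsInWindow,
    System.Timestamp() AS WindowEnd
FROM DeviceEvents TIMESTAMP BY EventTime
GROUP BY DeviceId, TumblingWindow(second, 5)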
Incorrect Answers:
,SessionWindow.. : Session windows group events that arrive at similar times, filtering out periods of time where there is no data.
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/session-window-azure-stream-analytics
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
You are creating a new notebook in Azure Databricks that will support R as the primary language but will also support Scala and SQL.
Which syntax should you use to change the language of a cell?
A. %<language>
B. @<language>
C. \\[<language>]
D. \\(<language>)
Correct Answer: A
To change the language in Databricks' cells to either Scala, SQL, Python or R, prefix the cell with '%', followed by the language.
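For illustration (the table and column names are hypothetical), a cell in a notebook whose default language is R can run Spark SQL by starting with the %sql magic command; %scala and %python work the same way for those languages:

%sql
SELECT game_id, MAX(score) AS high_score
FROM game_scores
GROUP BY game_id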
Reference:
https://www.theta.co.nz/news-blogs/tech-blog/enhancing-digital-twins-part-3-predictive-maintenance-with-azure-databricks
You have an Azure Data Factory pipeline that performs an incremental load of source data to an Azure Data Lake Storage Gen2 account.
You need to ensure that the pipeline execution meets the following requirements:
✑ Automatically retries the execution when the pipeline run fails due to concurrency or throttling limits.
✑ Supports backfilling existing data in the table.
Which type of trigger should you use?
A. event
B. on-demand
C. schedule
D. tumbling window
Correct Answer: D
In case of pipeline failures, a tumbling window trigger can retry the execution of the referenced pipeline automatically, using the same input parameters, without user intervention. This can be specified by using the retryPolicy property in the trigger definition.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger
You are designing a solution that will copy Parquet files stored in an Azure Blob storage account to an Azure Data Lake Storage Gen2 account.
The data will be loaded daily to the data lake and will use a folder structure of {Year}/{Month}/{Day}/.
You need to design a daily Azure Data Factory data load to minimize the data transfer between the two accounts.
Which two configurations should you include in the design? Each correct answer presents part of the solution.
C. Filter by the last modified date of the source files. Most Voted
Correct Answer: AC
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage
You plan to build a structured streaming solution in Azure Databricks. The solution will count new events in five-minute intervals and report only
events that arrive during the interval. The output will be sent to a Delta Lake table.
Which output mode should you use?
A. update
B. complete
C. append
Correct Answer: C
Append Mode: Only the new rows appended to the result table since the last trigger are written to external storage. This is applicable only to queries where existing rows in the result table are not expected to change.
Incorrect Answers:
B: Complete Mode: The entire updated result table is written to external storage. It is up to the storage connector to decide how to handle the write of the entire table.
A: Update Mode: Only the rows that were updated in the result table since the last trigger are written to external storage. This differs from Complete Mode in that Update Mode outputs only the rows that have changed since the last trigger. If the query doesn't contain aggregations, it is equivalent to Append Mode.
Reference:
https://docs.databricks.com/getting-started/spark/streaming.html
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that
might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.
You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1.
You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the
serving layer of
Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.
Solution: In an Azure Synapse Analytics pipeline, you use a data flow that contains a Derived Column transformation.
A. Yes
B. No
Correct Answer: A
Use the derived column transformation to generate new columns in your data flow or to modify existing fields.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-derived-column
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that
might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.
You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1.
You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the
serving layer of
Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.
Solution: You use a dedicated SQL pool to create an external table that has an additional DateTime column.
A. Yes
B. No Most Voted
Correct Answer: B
Instead use the derived column transformation to generate new columns in your data flow or to modify existing fields.
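As a hedged sketch of why the external-table approach falls short (the table, column, data source, and file format names are all assumptions): an external table can only project columns that already exist in the source files, so there is nowhere to add a load DateTime value.

CREATE EXTERNAL TABLE dbo.Table1_Staging
(
    Id INT,
    Payload NVARCHAR(200)
    -- External tables do not support computed or default columns,
    -- so a LoadDateTime column cannot be derived here.
)
WITH (
    LOCATION = '/container1/',
    DATA_SOURCE = LakeSource,        -- assumed to exist
    FILE_FORMAT = ParquetFileFormat  -- assumed to exist
);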
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-derived-column