DP-100 Exam Valid Dumps
DP-100 dumps questions are the best material for you to test all the related Microsoft
exam topics. By using the DP-100 exam dumps questions and practicing your
skills, you can increase your confidence and chances of passing the DP-100
exam.
Instant Download
Free Update in 3 Months
Money back guarantee
PDF and Software
24/7 Customer Support
Besides, Dumpsinfo also provides unlimited access. You can get all
Dumpsinfo files at the lowest price.
1.HOTSPOT
You have an Azure Machine Learning workspace.
You run the following code in a Python environment in which the configuration file for your workspace
has been downloaded.
Instructions: For each of the following statements, select Yes if the statement is true. Otherwise,
select No. NOTE: Each correct selection is worth one point.
Answer:
2.You plan to build a team data science environment. Data for training models in machine learning
pipelines will be over 20 GB in size.
You have the following requirements:
• Models must be built using Caffe2 or Chainer frameworks.
• Data scientists must be able to use a data science environment to build the machine learning pipelines and train models on their personal devices in both connected and disconnected network environments.
• Personal devices must support updating machine learning pipelines when connected to a network.
You need to select a data science environment.
Which environment should you use?
A. Azure Machine Learning Service
B. Azure Machine Learning Studio
C. Azure Databricks
D. Azure Kubernetes Service (AKS)
Answer: A
Explanation:
The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft’s Azure cloud
built specifically for doing data science. Caffe2 and Chainer are supported by DSVM.
DSVM integrates with Azure Machine Learning.
Incorrect Answers:
B: Use Machine Learning Studio when you want to experiment with machine learning models quickly
and easily, and the built-in machine learning algorithms are sufficient for your solutions.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview
3.HOTSPOT
You write code to retrieve an experiment run from your Azure Machine Learning workspace.
The run used the model interpretation support in Azure Machine Learning to generate and upload a
model explanation.
Business managers in your organization want to see the importance of the features in the model.
You need to print out the model features and their relative importance in an output that looks similar
to the following.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: from_run_id
from_run_id(workspace, experiment_name, run_id)
Create the client with factory method given a run ID.
Returns an instance of the ExplanationClient.
Parameters:
workspace (Workspace): An object that represents a workspace.
experiment_name (str): The name of an experiment.
run_id (str): A GUID that represents a run.
Box 2: list_model_explanations
list_model_explanations returns a dictionary of metadata for all model explanations available.
Returns
A dictionary of explanation metadata such as id, data type, explanation method, model type, and
upload time, sorted by upload time
Box 3: explanation
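Putting the three boxes together, a minimal sketch of the pattern (assuming ws, experiment_name, and run_id are already defined in the notebook):
from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient
# Create the client from the run that uploaded the explanation.
client = ExplanationClient.from_run_id(ws, experiment_name, run_id)
# List metadata for every explanation available for the run.
for meta in client.list_model_explanations():
    print(meta)
# Download the explanation and print each feature with its relative importance.
explanation = client.download_model_explanation()
for feature, importance in explanation.get_feature_importance_dict().items():
    print(feature, importance)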
Reference: https://docs.microsoft.com/en-us/python/api/azureml-contrib-interpret/azureml.contrib.interpret.explanation.explanation_client.explanationclient?view=azure-ml-py
4.Note: This question is part of a series of questions that present the same scenario. Each question in
the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are using Azure Machine Learning to run an experiment that trains a classification model.
You want to use Hyperdrive to find parameters that optimize the AUC metric for the model.
You configure a HyperDriveConfig for the experiment by running the following code:
The label values for the test data are stored in a variable named y_test, and the predicted probabilities from the model are stored in a variable named y_predicted. You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the AUC metric.
Solution: Run the following code:
5.HOTSPOT
You have a Python data frame named salesData in the following format:
Answer:
Explanation:
Box 1: dataFrame
Syntax: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Where frame is a DataFrame.
Box 2: shop
Parameter id_vars: tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
Box 3: ['2017','2018']
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
Example:
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
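Applying the three boxes to the question's data frame would look like this sketch (the sample values are illustrative):
import pandas as pd
salesData = pd.DataFrame({'shop': ['A', 'B'], '2017': [100, 150], '2018': [110, 160]})
# Unpivot the year columns into shop/variable/value rows.
print(pd.melt(salesData, id_vars='shop', value_vars=['2017', '2018']))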
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html
6.You plan to use the Hyperdrive feature of Azure Machine Learning to determine the optimal
hyperparameter values when training a model.
You must use Hyperdrive to try combinations of the following hyperparameter values. You must not
apply an early termination policy.
• learning_rate: any value between 0.001 and 0.1
• batch_size: 16, 32, or 64
You need to configure the sampling method for the Hyperdrive experiment.
Which two sampling methods can you use? Each correct answer is a complete solution. NOTE: Each
correct selection is worth one point.
A. Grid sampling
B. No sampling
C. Bayesian sampling
D. Random sampling
Answer: CD
Explanation:
C: Bayesian sampling is based on the Bayesian optimization algorithm and makes intelligent choices
on the hyperparameter values to sample next. It picks the sample based on how the previous
samples performed, such that the new sample improves the reported primary metric.
Bayesian sampling does not support any early termination policy.
Example:
from azureml.train.hyperdrive import BayesianParameterSampling
from azureml.train.hyperdrive import uniform, choice
param_sampling = BayesianParameterSampling( {
"learning_rate": uniform(0.05, 0.1),
"batch_size": choice(16, 32, 64, 128)
}
)
D: In random sampling, hyperparameter values are randomly selected from the defined search space.
Random sampling allows the search space to include both discrete and continuous hyperparameters.
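For comparison, a random sampling configuration matching the question's search space might look like this sketch:
from azureml.train.hyperdrive import RandomParameterSampling
from azureml.train.hyperdrive import uniform, choice
param_sampling = RandomParameterSampling( {
"learning_rate": uniform(0.001, 0.1),
"batch_size": choice(16, 32, 64)
}
)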
Incorrect Answers:
A: Grid sampling supports only discrete hyperparameters, so it cannot cover the continuous learning_rate range in this question. In general, grid sampling can be used if your hyperparameter space can be defined as a choice among discrete values and if you have sufficient budget to exhaustively search over all values in the defined search space. Additionally, one can use automated early termination of poorly performing runs, which reduces wastage of resources.
Example, the following space has a total of six samples:
from azureml.train.hyperdrive import GridParameterSampling
from azureml.train.hyperdrive import choice
param_sampling = GridParameterSampling( {
"num_hidden_layers": choice(1, 2, 3),
"batch_size": choice(16, 32)
}
)
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
7.You plan to run a script as an experiment using a Script Run Configuration. The script uses
modules from the scipy library as well as several Python packages that are not typically installed in a
default conda environment.
You plan to run the experiment on your local workstation for small datasets and scale out the
experiment by running it on more powerful remote compute clusters for larger datasets.
You need to ensure that the experiment runs successfully on local and remote compute with the least
administrative effort.
What should you do?
A. Create and register an Environment that includes the required packages. Use this Environment for
all experiment runs.
B. Always run the experiment with an Estimator by using the default packages.
C. Do not specify an environment in the run configuration for the experiment. Run the experiment by
using the default environment.
D. Create a config.yaml file defining the conda packages that are required and save the file in the
experiment folder.
E. Create a virtual machine (VM) with the required Python configuration and attach the VM as a
compute target. Use this compute target for all experiment runs.
Answer: A
Explanation:
If you have an existing Conda environment on your local computer, then you can use the service to
create an environment object. By using this strategy, you can reuse your local interactive environment
on remote runs.
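For example, a minimal sketch (file and environment names are illustrative) that creates and registers a reusable environment from a conda specification:
from azureml.core import Environment
# Build the environment from a conda specification listing scipy and the other packages.
env = Environment.from_conda_specification(name="experiment-env", file_path="conda_dependencies.yml")
# Register it so both local and remote runs can reuse the same environment.
env.register(workspace=ws)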
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments
8. Find the Evaluate Model module and drag it onto the canvas.
9.Note: This question is part of a series of questions that present the same scenario. Each question in
the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Machine Learning workspace. You connect to a terminal session from the
Notebooks page in Azure Machine Learning studio.
You plan to add a new Jupyter kernel that will be accessible from the same terminal session.
You need to perform the task that must be completed before you can add the new kernel.
Solution: Delete the Python 3.8 - AzureML kernel.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
10.HOTSPOT
You train classification and regression models by using automated machine learning.
You must evaluate automated machine learning experiment results. The results include how a
classification model is making systematic errors in its predictions and the relationship between the
target feature and the regression model's predictions. You must use charts generated by automated
machine learning.
You need to choose a chart type for each model type.
Which chart types should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
11.You manage an Azure Machine Learning workspace. You plan to import data from Azure Data
Lake Storage Gen2. You need to build a URI that represents the storage location.
Which protocol should you use?
A. abfss
B. https
C. adl
D. wasbs
Answer: A
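Explanation:
The abfss (Azure Blob File System Secure) protocol is the URI scheme for Azure Data Lake Storage Gen2. A sketch of the general form (all values are placeholders):
abfss://<filesystem>@<account_name>.dfs.core.windows.net/<folder>/<file>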
12.You use Azure Machine Learning studio to analyze an mltable data asset containing a decimal
column named column1. You need to verify that the column1 values are normally distributed.
Which statistic should you use?
A. Max
B. Type
C. Profile
D. Mean
Answer: C
13.HOTSPOT
You manage an Azure Machine Learning workspace named Workspace1 and an Azure Blob Storage
accessed by using the URL https://storage1.blob.core.windows.net/data1.
You plan to create an Azure Blob datastore in Workspace1. The datastore must target the Blob
Storage by using Azure Machine Learning Python SDK v2. Access authorization to the datastore
must be limited to a specific amount of time.
You need to select the parameters of the Azure Blob Datastore class that will point to the target
datastore and authorize access to it.
Which parameters should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
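A minimal SDK v2 sketch consistent with the requirements (datastore, account, and container names and the SAS token are illustrative; ml_client is assumed to be an authenticated MLClient): account_name and container_name point to the target storage, and a SAS token provides time-limited authorization.
from azure.ai.ml.entities import AzureBlobDatastore, SasTokenConfiguration
blob_datastore = AzureBlobDatastore(
    name="blob_datastore1",
    account_name="storage1",
    container_name="data1",
    # SAS tokens expire, so access authorization is limited to a specific amount of time.
    credentials=SasTokenConfiguration(sas_token="<sas-token>"),
)
ml_client.create_or_update(blob_datastore)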
15.Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Scale and Reduce sampling mode.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
Explanation:
Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for
machine learning. SMOTE is a better way of increasing the number of rare cases than simply
duplicating existing cases.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
16.You have the following code. The code prepares an experiment to run a script:
The experiment must be run on the local computer using the default environment.
You need to add code to start the experiment and run the script.
Which code segment should you use?
A. run = script_experiment.start_logging()
B. run = Run(experiment=script_experiment)
C. ws.get_run(run_id=experiment.id)
D. run = script_experiment.submit(config=script_config)
Answer: D
Explanation:
The Experiment class submit method submits an experiment and returns the active created run.
Syntax: submit(config, tags=None, **kwargs)
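For example, assuming the experiment and run configuration from the question's code:
run = script_experiment.submit(config=script_config)
# Block until the local run finishes, streaming the log output.
run.wait_for_completion(show_output=True)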
Reference: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment.experiment
17.You plan to use automated machine learning by using Azure Machine Learning Python SDK v2 to
train a regression model. You have data that has features with missing values, and categorical
features with few distinct values.
You need to control whether automated machine learning automatically imputes missing values and encodes categorical features as part of the training task.
Which enum of the automl package should you use?
A. ForecastHorizonMode
B. RegressionPrimaryMetrics
C. RegressionModels
D. FeaturizationMode
Answer: D
18.HOTSPOT
You have a dataset created for multiclass classification tasks that contains a normalized numerical
feature set with 10,000 data points and 150 features.
You use 75 percent of the data points for training and 25 percent for testing. You are using the scikit-learn machine learning library in Python. You use X to denote the feature set and Y to denote class labels.
You create the following Python data frames:
You need to apply the Principal Component Analysis (PCA) method to reduce the dimensionality of
the feature set to 10 features in both training and testing sets.
How should you complete the code segment? To answer, select the appropriate options in the
answer area. NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: PCA (n_components = 10)
Need to reduce the dimensionality of the feature set to 10 features in both training and testing sets.
Example:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)  # 2 dimensions
principalComponents = pca.fit_transform(x)
Box 2: pca
fit_transform(X[, y]) fits the model with X and applies the dimensionality reduction on X.
Box 3: transform(x_test)
transform(X) applies dimensionality reduction to X.
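Combining the three boxes, the completed segment would behave like this sketch (x_train and x_test are assumed from the question's data frames):
from sklearn.decomposition import PCA
pca = PCA(n_components=10)
x_train_pca = pca.fit_transform(x_train)  # fit on the training features, then reduce them
x_test_pca = pca.transform(x_test)        # reuse the fitted components on the test set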
Reference: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
21.DRAG DROP
You manage an Azure Machine Learning workspace named workspace1 with a compute instance named compute1. You connect to compute1 by using a terminal window from workspace1. You create a file named requirements.txt containing Python dependencies, including Jupyter.
You need to add a new Jupyter kernel to compute1.
Which four commands should you use? To answer, move the appropriate actions from the list of
actions to the answer area and arrange them in the correct order.
Answer:
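One plausible command sequence (environment and kernel names are illustrative):
conda create --name newenv
conda activate newenv
pip install -r requirements.txt
python -m ipykernel install --user --name newenv --display-name "Python (newenv)"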
22.You use Azure Machine Learning Designer to load the following datasets into an experiment:
Data set 1
Dataset 2
You need to create a dataset that has the same columns and header row as the input datasets and
contains all rows from both input datasets.
Solution: Use the Apply Transformation component.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
23.You write five Python scripts that must be processed in the order specified in Exhibit A, which allows the same modules to run in parallel but waits for modules with dependencies.
You must create an Azure Machine Learning pipeline using the Python SDK, because you want the script that creates the pipeline to be tracked in your version control system. You have created five PythonScriptSteps and have named the variables to match the module names.
You need to create the pipeline shown. Assume all relevant imports have been done.
Which Python code segment should you use?
A)
B)
C)
D)
A. Option A
B. Option B
C. Option C
D. Option D
Answer: A
Explanation:
The steps parameter is an array of steps. To build pipelines that have multiple steps, place the steps
in order in this array.
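A sketch of the pattern (ws and the step variables are assumed from the question):
from azureml.pipeline.core import Pipeline
# Steps with no mutual dependencies run in parallel; dependencies are expressed
# through each step's inputs or its run_after settings.
pipeline = Pipeline(workspace=ws, steps=[step_1, step_2, step_3, step_4, step_5])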
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-parallel-run-step
24.DRAG DROP
You are analyzing a raw dataset that requires cleaning.
You must perform transformations and manipulations by using Azure Machine Learning Studio.
You need to identify the correct modules to perform the transformations.
Which modules should you choose? To answer, drag the appropriate modules to the correct
scenarios. Each module may be used once, more than once, or not at all.
You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct
selection is worth one point.
Answer:
Explanation:
Box 1: Clean Missing Data
Box 2: SMOTE
Use the SMOTE module in Azure Machine Learning Studio to increase the number of
underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing
the number of rare cases than simply duplicating existing cases.
Box 3: Convert to Indicator Values
Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of this
module is to convert columns that contain categorical values into a series of binary indicator columns
that can more easily be used as features in a machine learning model.
Box 4: Remove Duplicate Rows
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-indicator-values
25.You are designing a training job in an Azure Machine Learning workspace by using Automated ML.
During training, the compute resource must scale up to handle larger datasets. You need to select the compute resource that has a multi-node cluster that automatically scales.
Which Azure Machine Learning compute target should you use?
A. Compute instance
B. Endpoints
C. Serverless compute
D. Kubernetes cluster
Answer: C
26.Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You create an Azure Machine Learning service datastore in a workspace.
The datastore contains the following files:
• /data/2018/Q1.csv
• /data/2018/Q2.csv
• /data/2018/Q3.csv
• /data/2018/Q4.csv
• /data/2019/Q1.csv
All files store data in the following format:
id,f1,f2,l
1,1,2,0
2,1,1,1
27.You have an Azure Machine Learning workspace. You plan to tune model hyperparameters by
using a sweep job.
You need to find a sampling method that supports early termination of low-performing jobs and continuous hyperparameters.
Solution: Use the Bayesian sampling method over the hyperparameter space.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
Explanation:
Bayesian sampling does not support early termination policies; use the random sampling method instead.
28.HOTSPOT
For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE:
Each correct selection is worth one point.
Answer:
Explanation:
Yes, yes, no
29.DRAG DROP
You use a training pipeline in the Azure Machine Learning designer. You register a datastore named
ds1. The datastore contains multiple training data files. You use the Import Data module with the
configured datastore.
You need to retrain a model on a different set of data files.
Which four actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.
Answer:
30.DRAG DROP
You are creating a machine learning model that can predict the species of a penguin from its measurements. You have a file that contains measurements for three species of penguin in comma-delimited format.
The model must be optimized for the area under the receiver operating characteristic curve performance metric, averaged for each class.
You need to use the Automated Machine Learning user interface in Azure Machine Learning studio to
run an experiment and find the best performing model.
Which five actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.
Answer:
31.HOTSPOT
You have a multi-class image classification deep learning model that uses a set of labeled
photographs.
You create the following code to select hyperparameter values when training the model.
For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE:
Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Yes
Hyperparameters are adjustable parameters you choose to train a model that govern the training
process itself. Azure Machine Learning allows you to automate hyperparameter exploration in an
efficient manner, saving you significant time and resources. You specify the range of hyperparameter
values and a maximum number of training runs. The system then automatically launches multiple
simultaneous runs with different parameter configurations and finds the configuration that results in
the best performance, measured by the metric you choose. Poorly performing training runs are
automatically early terminated, reducing wastage of compute resources. These resources are instead
used to explore other hyperparameter configurations.
Box 2: Yes
uniform (low, high) - Returns a value uniformly distributed between low and high
Box 3: No
Bayesian sampling does not currently support any early termination policy.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
32.You need to implement a new cost factor scenario for the ad response models as illustrated in the
performance curve exhibit.
Which technique should you use?
A. Set the threshold to 0.5 and retrain if weighted Kappa deviates +/- 5% from 0.45.
B. Set the threshold to 0.05 and retrain if weighted Kappa deviates +/- 5% from 0.5.
C. Set the threshold to 0.2 and retrain if weighted Kappa deviates +/- 5% from 0.6.
D. Set the threshold to 0.75 and retrain if weighted Kappa deviates +/- 5% from 0.15.
Answer: A
Explanation:
Scenario:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if weighted Kappa deviates from 0.1 +/- 5%.
33.HOTSPOT
You are retrieving data from a large datastore by using Azure Machine Learning Studio.
You must create a subset of the data for testing purposes using a random sampling seed based on
the system clock.
You add the Partition and Sample module to your experiment.
You need to select the properties for the module.
Which values should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Sampling
Create a sample of data
This option supports simple random sampling or stratified random sampling. This is useful if you want
to create a smaller representative sample dataset for testing.
34.You create a binary classification model. The model is registered in an Azure Machine Learning
workspace. You use the Azure Machine Learning Fairness SDK to assess the model fairness.
You develop a training script for the model on a local machine.
You need to load the model fairness metrics into Azure Machine Learning studio.
What should you do?
A. Implement the download_dashboard_by_upload_id function
B. Implement the create_group_metric_set function
C. Implement the upload_dashboard_dictionary function
D. Upload the training script
Answer: C
Explanation:
Import the azureml.contrib.fairness package to perform the upload:
from azureml.contrib.fairness import upload_dashboard_dictionary,
download_dashboard_by_upload_id
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-fairness-aml
35.HOTSPOT
You create an Azure Machine Learning workspace and install the MLflow library.
You need to log different types of data by using the MLflow library.
Which method should you use? To answer, select the appropriate options in the answer area. NOTE:
Each correct selection is worth one point.
Answer:
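Typical MLflow methods for the different data types (values and file names are illustrative):
import mlflow
with mlflow.start_run():
    mlflow.log_metric("accuracy", 0.91)          # numeric values
    mlflow.log_param("learning_rate", 0.01)      # input parameters
    mlflow.log_artifact("confusion_matrix.png")  # files such as images (the file must exist)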
36.Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You use Azure Machine Learning designer to load the following datasets into an experiment:
You need to create a dataset that has the same columns and header row as the input datasets and
contains all rows from both input datasets.
Solution: Use the Add Rows module.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
37.HOTSPOT
You manage an Azure Machine Learning workspace by using the Python SDK v2.
You must create a compute cluster in the workspace. The compute cluster must run workloads and
properly handle interruptions. You start by calculating the maximum amount of compute resources
required by the workloads and size the cluster to match the calculations.
The cluster definition includes the following properties and values:
• name="mlcluster1"
• size="STANDARD_DS3_v2"
• min_instances=1
• max_instances=4
• tier="dedicated"
The cost of the compute resources must be minimized when a workload is active or idle. Cluster
property changes must not affect the maximum amount of compute resources available to the
workloads run on the cluster.
You need to modify the cluster properties to minimize the cost of compute resources.
Which properties should you modify? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
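One plausible modification, sketched with SDK v2 (ml_client is assumed to be an authenticated MLClient): setting min_instances=0 lets the cluster scale to zero when idle, while leaving max_instances=4 keeps the maximum available compute unchanged.
from azure.ai.ml.entities import AmlCompute
cluster = AmlCompute(
    name="mlcluster1",
    size="STANDARD_DS3_v2",
    min_instances=0,   # scale to zero when idle to minimize cost
    max_instances=4,   # unchanged, so the maximum available compute is unaffected
    tier="dedicated",  # dedicated VMs are not preempted, so interruptions are handled properly
)
ml_client.begin_create_or_update(cluster)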
38.You use the Azure Machine Learning designer to create and run a training pipeline.
The pipeline must be run every night to generate predictions from a large volume of files. The folder where the files will be stored is defined as a dataset.
You need to publish the pipeline as a REST service that can be used for the nightly inferencing run.
What should you do?
A. Create a batch inference pipeline
B. Set the compute target for the pipeline to an inference cluster
C. Create a real-time inference pipeline
D. Clone the pipeline
Answer: A
Explanation:
Azure Machine Learning Batch Inference targets large inference jobs that are not time-sensitive.
Batch Inference provides cost-effective inference compute scaling, with unparalleled throughput for
asynchronous applications. It is optimized for high-throughput, fire-and-forget inference over large
collections of data.
You can submit a batch inference job by pipeline_run, or through REST calls with a published
pipeline.
Reference: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/parallel-run/README.md
39.HOTSPOT
You create an Azure Databricks workspace and a linked Azure Machine Learning workspace.
You have the following Python code segment in the Azure Machine Learning workspace:
import mlflow
import mlflow.azureml
import azureml.mlflow
import azureml.core
from azureml.core import Workspace
subscription_id = 'subscription_id'
resource_group = 'resource_group_name'
workspace_name = 'workspace_name'
ws = Workspace.get(name=workspace_name,
subscription_id=subscription_id,
resource_group=resource_group)
experimentName = "/Users/{user_name}/{experiment_folder}/{experiment_name}"
mlflow.set_experiment(experimentName)
uri = ws.get_mlflow_tracking_uri()
mlflow.set_tracking_uri(uri)
Instructions: For each of the following statements, select Yes if the statement is true. Otherwise,
select No. NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: No
The Workspace.get method loads an existing workspace without using configuration files.
ws = Workspace.get(name="myworkspace",
subscription_id='<azure-subscription-id>',
resource_group='myresourcegroup')
Box 2: Yes
MLflow Tracking with Azure Machine Learning lets you store the logged metrics and artifacts from
your local runs into your Azure Machine Learning workspace.
The get_mlflow_tracking_uri() method assigns a unique tracking URI address to the workspace, ws,
and set_tracking_uri() points the MLflow tracking URI to that address.
Box 3: Yes
Note: In Deep Learning, epoch means the total dataset is passed forward and backward in a neural
network once.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow
40.DRAG DROP
An organization uses Azure Machine Learning service and wants to expand their use of machine
learning.
You have the following compute environments.
The organization does not want to create another compute environment.
You need to determine which compute environment to use for the following scenarios.
Which compute types should you use? To answer, drag the appropriate compute environments to the
correct scenarios. Each compute environment may be used once, more than once, or not at all. You
may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection
is worth one point.
Answer:
Explanation:
Box 1: nb_server
Box 2: mlc_cluster
With Azure Machine Learning, you can train your model on a variety of resources or environments,
collectively referred to as compute targets. A compute target can be a local machine or a cloud
resource, such as an Azure Machine Learning Compute, Azure HDInsight or a remote virtual
machine.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets
41.You use the Azure Machine Learning service to create a tabular dataset named training_data. You
plan to use this dataset in a training script.
You create a variable that references the dataset using the following code:
training_ds = workspace.datasets.get("training_data")
You define an estimator to run the script.
You need to set the correct property of the estimator to ensure that your script can access the training_data dataset.
Which property should you set?
A)
B)
C)
D)
A. Option A
B. Option B
C. Option C
D. Option D
Answer: A
Explanation:
Example:
# Get the training dataset
diabetes_ds = ws.datasets.get("Diabetes Dataset")
# Create an estimator that uses the remote compute
hyper_estimator = SKLearn(source_directory=experiment_folder,
inputs=[diabetes_ds.as_named_input('diabetes')], # Pass the dataset as an input
compute_target = cpu_cluster,
conda_packages=['pandas','ipykernel','matplotlib'],
pip_packages=['azureml-sdk','argparse','pyarrow'],
entry_script='diabetes_training.py')
Reference: https://notebooks.azure.com/GraemeMalcolm/projects/azureml-primers/html/04 - Optimizing Model Training.ipynb
42.HOTSPOT
You have a dataset that includes home sales data for a city.
The dataset includes the following columns.
Answer:
Explanation:
Box 1: Regression
Regression is a supervised machine learning technique used to predict numeric values.
Box 2: Price
Reference: https://docs.microsoft.com/en-us/learn/modules/create-regression-model-azure-machine-
learning-designer
43.HOTSPOT
You manage an Azure Machine Learning workspace. You submit a training job with the Azure
Machine Learning Python SDK v2. You must use MLflow to log metrics, model parameters, and model artifacts automatically when training a model.
You start by writing the following code segment:
For each of the following statements, select Yes If the statement is true. Otherwise, select No.
Answer:
B)
C)
D)
A. Option A
B. Option B
C. Option C
D. Option D
Answer: A
Explanation:
The following custom role can do everything in the workspace except for the following actions:
• It can't create or update a compute resource.
• It can't delete a compute resource.
• It can't add, delete, or alter role assignments.
• It can't delete the workspace.
To create a custom role, first construct a role definition JSON file that specifies the permission and
scope for the role.
The following example defines a custom role named "Data Scientist Custom" scoped at a specific
workspace level:
data_scientist_custom_role.json :
{
"Name": "Data Scientist Custom",
"IsCustom": true,
"Description": "Can run experiment but can't create or delete compute.",
"Actions": ["*"],
"NotActions": [
"Microsoft.MachineLearningServices/workspaces/*/delete",
"Microsoft.MachineLearningServices/workspaces/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/delete",
"Microsoft.Authorization/*/write"
],
"AssignableScopes": [
"/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Micr
osoft.MachineLearningServices/workspaces/<workspace_name>" ]
}
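The role is then created from this JSON definition, for example with the Azure CLI:
az role definition create --role-definition data_scientist_custom_role.json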
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-assign-roles
45.You are developing a hands-on workshop to introduce Docker for Windows to attendees.
You need to ensure that workshop attendees can install Docker on their devices.
Which two prerequisite components should attendees install on the devices? Each correct answer
presents part of the solution. NOTE: Each correct selection is worth one point.
A. Microsoft Hardware-Assisted Virtualization Detection Tool
B. Kitematic
C. BIOS-enabled virtualization
D. VirtualBox
E. Windows 10 64-bit Professional
Answer: CE
Explanation:
C: Make sure your Windows system supports Hardware Virtualization Technology and that
virtualization is enabled.
Ensure that hardware virtualization support is turned on in the BIOS settings. For example:
E: To run Docker, your machine must have a 64-bit operating system running Windows 7 or higher.
Reference:
https://docs.docker.com/toolbox/toolbox_install_windows/
https://blogs.technet.microsoft.com/canitpro/2015/09/08/step-by-step-enabling-hyper-v-for-use-on-windows-10/
46.HOTSPOT
You are analyzing the asymmetry in a statistical distribution.
The following image contains two density curves that show the probability distribution of two datasets.
Use the drop-down menus to select the answer choice that answers each question based on the
information presented in the graphic. NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Positive skew
Positive skew values means the distribution is skewed to the right.
Box 2: Negative skew
Negative skewness values mean the distribution is skewed to the left.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-elementary-statistics
50.You are creating a machine learning model. You have a dataset that contains null rows.
You need to use the Clean Missing Data module in Azure Machine Learning Studio to identify and
resolve the null and missing data in the dataset.
Which parameter should you use?
A. Replace with mean
B. Remove entire column
C. Remove entire row
D. Hot Deck
Answer: C
Explanation:
Remove entire row: Completely removes any row in the dataset that has one or more missing values.
This is useful if the missing value can be considered randomly missing.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data
51.You are building a regression model for estimating the number of calls during an event.
You need to determine whether the feature values achieve the conditions to build a Poisson
regression model.
Which two conditions must the feature set contain? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. The label data must be a negative value.
B. The label data can be positive or negative.
C. The label data must be a positive value.
D. The label data must be non-discrete.
E. The data must be whole numbers.
Answer: CE
Explanation:
Poisson regression is intended for use in regression models that are used to predict numeric values,
typically counts. Therefore, you should use this module to create your regression model only if the
values you are trying to predict fit the following conditions:
The response variable has a Poisson distribution.
Counts cannot be negative. The method will fail outright if you attempt to use it with negative labels.
A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this method with
non-whole numbers.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/poisson-regression
52.You are conducting feature engineering to prepare data for further analysis.
The data includes seasonal patterns on inventory requirements.
You need to select the appropriate method to conduct feature engineering on the data.
Which method should you use?
A. Exponential Smoothing (ETS) function.
B. One Class Support Vector Machine module
C. Time Series Anomaly Detection module
D. Finite Impulse Response (FIR) Filter module.
Answer: D
53.HOTSPOT
You monitor an Azure Machine Learning classification training experiment named train-classification
on Azure Notebooks.
You must store a table named table as an artifact in Azure Machine Learning Studio during model
training.
You need to collect and list the metrics by using MLflow.
How should you complete the code segment? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
Answer:
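A sketch of the two pieces (the run ID and table contents are illustrative): mlflow.log_table stores a table as a run artifact, and MlflowClient lists the metrics recorded for a run.
import mlflow
from mlflow.tracking import MlflowClient
# Store a table as an artifact of the active run.
with mlflow.start_run():
    mlflow.log_table(data={"feature": ["f1", "f2"], "importance": [0.7, 0.3]},
                     artifact_file="table.json")
# Collect and list the metrics recorded for a run.
client = MlflowClient()
run = client.get_run("<run-id>")
print(run.data.metrics)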
54.You are moving a large dataset from Azure Machine Learning Studio to a Weka environment.
You need to format the data for the Weka environment.
Which module should you use?
A. Convert to CSV
B. Convert to Dataset
C. Convert to ARFF
D. Convert to SVMLight
Answer: C
Explanation:
Use the Convert to ARFF module in Azure Machine Learning Studio, to convert datasets and results
in Azure Machine Learning to the attribute-relation file format used by the Weka toolset. This format is
known as ARFF.
The ARFF data specification for Weka supports multiple machine learning tasks, including data
preprocessing, classification, and feature selection. In this format, data is organized by entities and their attributes, and is contained in a single text file.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-arff
55.HOTSPOT
You are developing a deep learning model by using TensorFlow. You plan to run the model training
workload on an Azure Machine Learning Compute Instance.
You must use CUDA-based model training.
You need to provision the Compute Instance.
Which two virtual machines sizes can you use? To answer, select the appropriate virtual machine
sizes in the answer area. NOTE: Each correct selection is worth one point.
Answer:
Explanation:
CUDA is a parallel computing platform and programming model developed by Nvidia for general
computing on its own GPUs (graphics processing units). CUDA enables developers to speed up
compute-intensive applications by harnessing the power of GPUs for the parallelizable part of the
computation.
Reference: https://www.infoworld.com/article/3299703/what-is-cuda-parallel-programming-for-gpus.html
56.You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset
that has missing values in many columns. The data does not require the application of predictors for
each column. You plan to use the Clean Missing Data module to handle the missing data.
You need to select a data cleaning method.
Which method should you use?
A. Synthetic Minority
B. Replace using Probabilistic PCA
C. Replace using MICE
D. Normalization
Answer: B
57.HOTSPOT
You manage an Azure Machine Learning workspace named workspace1 by using the Python SDK
v2. The default datastore of workspace1 contains a folder named sample_data.
The folder structure contains the following content:
You write Python SDK v2 code to materialize the data from the files in the sample_data folder into a Pandas data frame. You need to complete the Python SDK v2 code to use the MLTable file as the materialization blueprint.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
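A sketch of the materialization step (the datastore path is illustrative), assuming the MLTable file lives in the sample_data folder on the default datastore:
import mltable
# Load the MLTable blueprint, then materialize the referenced files.
tbl = mltable.load("azureml://datastores/workspaceblobstore/paths/sample_data/")
df = tbl.to_pandas_dataframe()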
58.HOTSPOT
You manage an Azure Machine Learning workspace.
You train a model interactively with a Jupyter Notebook in the workspace. During training, a dataset is created with accuracy and loss metrics for each epoch.
You need to configure model tracking with MLflow to log the dataset created during the training.
How should you complete the code segment? To answer, select the appropriate options in the
answer area. NOTE: Each correct selection is worth one point.
Answer:
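A sketch using the MLflow dataset-tracking API (assumes MLflow 2.4 or later; the data frame contents are illustrative):
import mlflow
import pandas as pd
# df holds the per-epoch accuracy and loss values produced during training.
df = pd.DataFrame({"epoch": [1, 2], "accuracy": [0.80, 0.91], "loss": [0.52, 0.31]})
dataset = mlflow.data.from_pandas(df, name="epoch_metrics")
with mlflow.start_run():
    mlflow.log_input(dataset, context="training")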
Case study
Overview
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into Europe
and has asked you to investigate prices for private residences in major European cities. You use
Azure Machine Learning Studio to measure the median value of properties. You produce a regression
model to predict property prices by using the Linear Regression and Bayesian Linear Regression
modules.
Datasets
There are two datasets in CSV format that contain property details for two cities, London and Paris,
with the following columns:
The two datasets have been added to Azure Machine Learning Studio as separate datasets and
included as the starting point of the experiment.
Dataset issues
The AccessibilityToHighway column in both datasets contains missing values. The missing data must
be replaced with new data so that it is modeled conditionally using the other variables in the data
before filling in the missing values.
Columns in each dataset contain missing and null values. The dataset also contains many outliers.
The Age column has a high proportion of outliers. You need to remove the rows that have outliers in
the Age column. The MedianValue and AvgRoomsinHouse columns both hold data in numeric format.
You need to select a feature selection algorithm to analyze the relationship between the two columns
in more detail.
Model fit
The model shows signs of overfitting. You need to produce a more refined regression model that
reduces the overfitting.
Experiment requirements
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance.
In each case, the predictor of the dataset is the column named MedianValue. An initial investigation
showed that the datasets are identical in structure apart from the MedianValue column. The smaller
Paris dataset contains the MedianValue in text format, whereas the larger London dataset contains
the MedianValue in numerical format. You must ensure that the datatype of the MedianValue column
of the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-parametric statistics to measure the relationships.
You must use a feature selection algorithm to analyze the relationship between the MedianValue and
AvgRoomsinHouse columns.
Model training
Given a trained model and a test dataset, you need to compute the permutation feature importance
scores of feature variables. You need to set up the Permutation Feature Importance module to select
the correct metric to investigate the model’s accuracy and replicate the findings.
You want to configure hyperparameters in the model learning process to speed the learning phase by
using hyperparameters. In addition, this configuration should cancel the lowest performing runs at
each evaluation interval, thereby directing effort and resources towards models that are more likely to
be successful.
You are concerned that the model might not efficiently use compute resources in hyperparameter
tuning. You also are concerned that the model might prevent an increase in the overall tuning time.
Therefore, you need to implement an early stopping criterion on models that provides savings without
terminating promising jobs.
Testing
You must produce multiple partitions of a dataset based on sampling using the Partition and Sample
module in Azure Machine Learning Studio. You must create three equal partitions for cross-validation.
You must also configure the cross-validation process so that the rows in the test and training datasets
are divided evenly by properties that are near each city’s main river. The data that identifies that a
property is near a river is held in the column named NextToRiver. You want to complete this task
before the data goes through the sampling process.
When you train a Linear Regression module using a property dataset that shows data for property
prices for a large city, you need to determine the best features to use in a model. You can choose
standard metrics provided to measure performance before and after the feature importance process
completes. You must ensure that the distribution of the features across multiple training models is
consistent.
Data visualization
You need to provide the test results to the Fabrikam Residences team. You create data visualizations
to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a
diagnostic test evaluation of the model. You need to select appropriate methods for producing the
ROC curve in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the
Two-Class Decision Jungle modules with one another.
DRAG DROP
You need to implement early stopping criteria as stated in the model training requirements.
Which three code segments should you use to develop the solution? To answer, move the
appropriate code segments from the list of code segments to the answer area and arrange them in
the correct order. NOTE: More than one order of answer choices is correct. You will receive credit for
any of the correct orders you select.
Answer:
Explanation:
You need to implement an early stopping criterion on models that provides savings without
terminating promising jobs.
Truncation selection cancels a given percentage of lowest performing runs at each evaluation
interval. Runs are compared based on their performance on the primary metric and the lowest X% are
terminated.
Example:
from azureml.train.hyperdrive import TruncationSelectionPolicy
early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1,
truncation_percentage=20, delay_evaluation=5)
Incorrect Answers:
Bandit is a termination policy based on slack factor/slack amount and evaluation interval. The policy
early terminates any runs where the primary metric is not within the specified slack factor / slack
amount with respect to the best performing training run.
Example:
from azureml.train.hyperdrive import BanditPolicy
early_termination_policy = BanditPolicy(slack_factor = 0.1, evaluation_interval=1, delay_evaluation=5)
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters
60.HOTSPOT
Your Azure Machine Learning workspace has a dataset named real_estate_data.
A sample of the data in the dataset follows.
You want to use automated machine learning to find the best regression model for predicting the price
column.
You need to configure an automated machine learning experiment using the Azure Machine Learning
SDK.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: training_data
The training data to be used within the experiment. It should contain both training features and a label
column (optionally a sample weights column). If training_data is specified, then the
label_column_name parameter must also be specified.
Box 2: validation_data
Provide validation data: In this case, you can either start with a single data file and split it into training
and validation sets or you can provide a separate data file for the validation set. Either way, the
validation_data parameter in your AutoMLConfig object assigns which data to use as your validation
set.
For example, the following code explicitly defines which portion of the provided data in dataset to use for training and validation.
dataset = Dataset.Tabular.from_delimited_files(data)
training_data, validation_data = dataset.random_split(percentage=0.8, seed=1)
automl_config = AutoMLConfig(compute_target = aml_remote_compute,
task = 'classification',
primary_metric = 'AUC_weighted',
training_data = training_data,
validation_data = validation_data,
label_column_name = 'Class'
)
Box 3: label_column_name
label_column_name:
The name of the label column. If the input data is from a pandas.DataFrame which doesn't have
column names, column indices can be used instead, expressed as integers.
This parameter is applicable to training_data and validation_data parameters.
Incorrect Answers:
X: The training features to use when fitting pipelines during an experiment. This setting is being
deprecated. Please use training_data and label_column_name instead.
Y: The training labels to use when fitting pipelines during an experiment. This is the value your model
will predict. This setting is being deprecated. Please use training_data and label_column_name
instead.
X_valid: Validation features to use when fitting pipelines during an experiment.
If specified, then y_valid or sample_weight_valid must also be specified.
Y_valid: Validation labels to use when fitting pipelines during an experiment.
Both X_valid and y_valid must be specified together.
exclude_nan_labels: Whether to exclude rows with NaN values in the label. The default is True.
y_max: y_max (float)
Maximum value of y for a regression experiment. The combination of y_min and y_max are used to
normalize test set metrics based on the input data range. If not specified, the maximum value is
inferred from the data.
Reference: https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py
61.You define a datastore named ml-data for an Azure Storage blob container. In the container, you
have a folder named train that contains a file named data.csv. You plan to use the file to train a model
by using the Azure Machine Learning SDK.
You plan to train the model by using the Azure Machine Learning SDK to run an experiment on local
compute.
You define a DataReference object by running the following code:
B)
C)
D)
E)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: E
Explanation:
Example:
data_folder = args.data_folder
# Load Train and Test data
train_data = pd.read_csv(os.path.join(data_folder, 'data.csv'))
Reference: https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai
62.You register a model that you plan to use in a batch inference pipeline.
The batch inference pipeline must use a ParallelRunStep step to process files in a file dataset. The script that the ParallelRunStep step runs must process six input files each time the inferencing function is called.
You need to configure the pipeline.
Which configuration setting should you specify in the ParallelRunConfig object for the ParallelRunStep step?
A. process_count_per_node= "6"
B. node_count= "6"
C. mini_batch_size= "6"
D. error_threshold= "6"
Answer: C
Explanation:
For a FileDataset input, mini_batch_size is the number of files the user script can process in one run() call, so mini_batch_size= "6" causes six input files to be passed to each call of the inferencing function.
Reference: https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-steps/azureml.contrib.pipeline.steps.parallelrunconfig?view=azure-ml-py
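A sketch of the relevant configuration (the other parameter values are illustrative; batch_env and compute_target are assumed to exist):
from azureml.pipeline.steps import ParallelRunConfig
parallel_run_config = ParallelRunConfig(
    source_directory="scripts",
    entry_script="batch_inference.py",
    mini_batch_size="6",          # six files per call to the inferencing function
    error_threshold=10,
    output_action="append_row",
    environment=batch_env,
    compute_target=compute_target,
    node_count=2,
)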
63. Connect the left output of the left Execute R Script module to the right input port of the Train
Model module (in this tutorial you used the data coming from the left side of the Split Data module for
training).
This portion of the experiment now looks something like this:
Step 2: Score Model
Score and evaluate the models
You use the testing data that was separated out by the Split Data module to score our trained models.
You can then compare the results of the two models to see which generated better results.
Add the Score Model modules
65.You run an automated machine learning experiment in an Azure Machine Learning workspace.
Information about the run is listed in the table below:
You need to write a script that uses the Azure Machine Learning SDK to retrieve the best iteration of
the experiment run.
Which Python code segment should you use?
A)
B)
C)
D)
A. Option A
B. Option B
C. Option C
D. Option D
Answer: A
Explanation:
The get_output method on automl_classifier returns the best run and the fitted model for the last
invocation. Overloads on get_output allow you to retrieve the best run and fitted model for any logged
metric or for a particular iteration.
In [ ]:
best_run, fitted_model = local_run.get_output()
Reference: https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb
66.You train a model and register it in your Azure Machine Learning workspace. You are ready to
deploy the model as a real-time web service.
You deploy the model to an Azure Kubernetes Service (AKS) inference cluster, but the deployment
fails because an error occurs when the service runs the entry script that is associated with the model
deployment.
You need to debug the error by iteratively modifying the code and reloading the service, without
requiring a re-deployment of the service for each code update.
What should you do?
A. Register a new version of the model and update the entry script to load the new version of the
model from its registered path.
B. Modify the AKS service deployment configuration to enable application insights and re-deploy to
AKS.
C. Create an Azure Container Instances (ACI) web service deployment configuration and deploy the
model on ACI.
D. Add a breakpoint to the first line of the entry script and redeploy the service to AKS.
E. Create a local web service deployment configuration and deploy the model to a local Docker
container.
Answer: E
Explanation:
Deploying the model as a local web service in a Docker container lets you debug the entry script iteratively: after you modify the code, you can reload the service to pick up the changes without re-deploying it. Re-deploying to AKS or ACI for every code change would be far slower and does not meet the requirement.
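A sketch of the local-debugging loop with SDK v1 (names are illustrative; model and inference_config are assumed to exist):
from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice
deployment_config = LocalWebservice.deploy_configuration(port=8890)
service = Model.deploy(ws, "local-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
# After editing the entry script, pick up the changes without re-deploying:
service.reload()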
67.You have an Azure Machine Learning workspace named Workspace1. Workspace1 has a registered MLflow model named model1 with the PyFunc flavor.
You plan to deploy model1 to an online endpoint named endpoint1 without egress connectivity by using the Azure Machine Learning Python SDK v2.
You have the following code:
You need to add a parameter to the ManagedOnlineDeployment object to ensure the model deploys successfully.
Solution: Add the environment parameter.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
68.HOTSPOT
You are using hyperparameter tuning in Azure Machine Learning Python SDK v2 to train a model.
You configure the hyperparameter tuning experiment by running the following code:
For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point.
Answer:
69.HOTSPOT
69.You create an Azure Machine Learning workspace and load a Python training script named train.py in the src subfolder. The dataset used to train your model is available locally.
You run the following script to train the model:
Instructions: For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point.
Answer:
70.Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Stratified split for the sampling mode.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
Explanation:
Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for
machine learning. SMOTE is a better way of increasing the number of rare cases than simply
duplicating existing cases.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
71.DRAG DROP
You create machine learning models by using Azure Machine Learning.
You plan to train and score models by using a variety of compute contexts. You also plan to create a
new compute resource in Azure Machine Learning studio.
You need to select the appropriate compute types.
Which compute types should you select? To answer, drag the appropriate compute types to the
correct requirements. Each compute type may be used once, more than once, or not at all. You may
need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is
worth one point.
Answer:
Explanation:
Box 1: Attached compute
72.HOTSPOT
You create an Azure Machine Learning workspace. You use the Azure Machine Learning SDK for
Python.
You must create a dataset from remote paths. The dataset must be reusable within the workspace.
You need to create the dataset.
How should you complete the following code segment? To answer, select the appropriate options in
the answer area. NOTE: Each correct selection is worth one point.
Answer:
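A sketch (paths and names are illustrative; ws is assumed to be the workspace) that creates a tabular dataset from remote paths and registers it for reuse:
from azureml.core import Dataset
datastore = ws.get_default_datastore()
# Create the dataset from remote paths on the datastore.
dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, 'data/*.csv')])
# Register it so it can be reused within the workspace.
dataset = dataset.register(workspace=ws, name='remote_data', create_new_version=True)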
73.Your team is building a data engineering and data science development environment.
The environment must support the following requirements:
• Support Python and Scala.
• Compose data storage, movement, and processing services into automated data pipelines.
• Use the same tool for the orchestration of both data engineering and data science.
• Support workload isolation and interactive workloads.
• Enable scaling across a cluster of machines.
You need to create the environment.
What should you do?
A. Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.
B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
C. Build the environment in Apache Spark for HDInsight and use Azure Container Instances for
orchestration.
D. Build the environment in Azure Databricks and use Azure Container Instances for orchestration.
Answer: B
Explanation:
In Azure Databricks, you can create two different types of clusters:
• Standard: the default cluster type; can be used with Python, R, Scala, and SQL.
• High-concurrency: designed for shared use by multiple concurrent users.
Azure Databricks is fully integrated with Azure Data Factory.
Incorrect Answers:
D: Azure Container Instances is good for development or testing. Not suitable for production
workloads.
Reference: https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/data-science-and-machinelearning
74.You use the Azure Machine Learning SDK to run a training experiment that trains a classification
model and calculates its accuracy metric.
The model will be retrained each month as new data is available.
You must register the model for use in a batch inference pipeline.
You need to register the model and ensure that the models created by subsequent retraining
experiments are registered only if their accuracy is higher than the currently registered model.
What are two possible ways to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Specify a different name for the model each time you register it.
B. Register the model with the same name each time regardless of accuracy, and always use the
latest
version of the model in the batch inferencing pipeline.
C. Specify the model framework version when registering the model, and only register subsequent
models if this value is higher.
D. Specify a property named accuracy with the accuracy metric as a value when registering the
model, and only register subsequent models if their accuracy is higher than the accuracy property
value of the
currently registered model.
E. Specify a tag named accuracy with the accuracy metric as a value when registering the model, and
only register subsequent models if their accuracy is higher than the accuracy tag value of the
currently
registered model.
Answer: DE
Explanation:
D: Model properties are immutable key-value pairs, so recording the accuracy metric as a property lets you compare each new model's accuracy against the value stored on the currently registered model before registering it.
E: Using tags, you can track useful information such as the name and version of the machine learning library used to train the model, and tag values can be compared in the same way. Note that tags must be alphanumeric.
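For example, a sketch of the property-based approach from option D (names, paths, and new_accuracy are illustrative):
from azureml.core import Model
# Look up the currently registered model and its stored accuracy property.
current = Model(ws, name="classifier")
current_accuracy = float(current.properties.get("accuracy", 0))
# Register the retrained model only if its accuracy is higher.
if new_accuracy > current_accuracy:
    Model.register(workspace=ws,
                   model_path="outputs/model.pkl",
                   model_name="classifier",
                   properties={"accuracy": str(new_accuracy)})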
Reference: https://notebooks.azure.com/xavierheriat/projects/azureml-getting-started/html/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb
75.You must store data in Azure Blob Storage to support Azure Machine Learning.
You need to transfer the data into Azure Blob Storage.
What are three possible ways to achieve the goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Bulk Insert SQL Query
B. AzCopy
C. Python script
D. Azure Storage Explorer
E. Bulk Copy Program (BCP)
Answer: BCD
Explanation:
You can move data to and from Azure Blob storage using different technologies:
• Azure Storage Explorer
• AzCopy
• Python
• SSIS
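C: For example, a Python sketch using the azure-storage-blob package (the connection string, container, and paths are illustrative):
from azure.storage.blob import BlobServiceClient
service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="data", blob="train/data.csv")
with open("data.csv", "rb") as f:
    blob.upload_blob(f, overwrite=True)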
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/move-azure-blob