AI 900T00A ENU TrainerHandbook
Official
Course
AI-900T00
Microsoft Azure AI
Fundamentals
AI-900T00
Microsoft Azure AI
Fundamentals
Disclaimer
Information in this document, including URL and other Internet Web site references, is subject to change
without notice. Unless otherwise noted, the example companies, organizations, products, domain names,
e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with
any real company, organization, product, domain name, e-mail address, logo, person, place or event is
intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the
user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in
or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
The names of manufacturers, products, or URLs are provided for informational purposes only and
Microsoft makes no representations or warranties, either expressed, implied, or statutory, regarding
these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a
manufacturer or product does not imply endorsement by Microsoft of the manufacturer or product. Links
may be provided to third party sites. Such sites are not under the control of Microsoft and Microsoft is
not responsible for the contents of any linked site or any link contained in a linked site, or any changes or
updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission
received from any linked site. Microsoft is providing these links to you only as a convenience, and the
inclusion of any link does not imply endorsement by Microsoft of the site or the products contained
therein.
© 2019 Microsoft Corporation. All rights reserved.
Microsoft and the trademarks listed at http://www.microsoft.com/trademarks1 are trademarks of the
Microsoft group of companies. All other trademarks are property of their respective owners.
1 http://www.microsoft.com/trademarks
EULA
13. “Personal Device” means one (1) personal computer, device, workstation or other digital electronic
device that you personally own or control that meets or exceeds the hardware level specified for
the particular Microsoft Instructor-Led Courseware.
14. “Private Training Session” means the instructor-led training classes provided by MPN Members for
corporate customers to teach a predefined learning objective using Microsoft Instructor-Led
Courseware. These classes are not advertised or promoted to the general public and class attend-
ance is restricted to individuals employed by or contracted by the corporate customer.
15. “Trainer” means (i) an academically accredited educator engaged by a Microsoft Imagine Academy
Program Member to teach an Authorized Training Session, (ii) an academically accredited educator
validated as a Microsoft Learn for Educators – Validated Educator, and/or (iii) an MCT.
16. “Trainer Content” means the trainer version of the Microsoft Instructor-Led Courseware and
additional supplemental content designated solely for Trainers’ use to teach a training session
using the Microsoft Instructor-Led Courseware. Trainer Content may include Microsoft PowerPoint
presentations, trainer preparation guide, train the trainer materials, Microsoft One Note packs,
classroom setup guide and Pre-release course feedback form. To clarify, Trainer Content does not
include any software, virtual hard disks or virtual machines.
2. USE RIGHTS. The Licensed Content is licensed, not sold. The Licensed Content is licensed on a one
copy per user basis, such that you must acquire a license for each individual that accesses or uses the
Licensed Content.
●● 2.1 Below are five separate sets of use rights. Only one set of rights applies to you.
1. If you are a Microsoft Imagine Academy (MSIA) Program Member:
1. Each license acquired on behalf of yourself may only be used to review one (1) copy of the
Microsoft Instructor-Led Courseware in the form provided to you. If the Microsoft Instruc-
tor-Led Courseware is in digital format, you may install one (1) copy on up to three (3)
Personal Devices. You may not install the Microsoft Instructor-Led Courseware on a device
you do not own or control.
2. For each license you acquire on behalf of an End User or Trainer, you may either:
1. distribute one (1) hard copy version of the Microsoft Instructor-Led Courseware to one
(1) End User who is enrolled in the Authorized Training Session, and only immediately
prior to the commencement of the Authorized Training Session that is the subject matter
of the Microsoft Instructor-Led Courseware being provided, or
2. provide one (1) End User with the unique redemption code and instructions on how they
can access one (1) digital version of the Microsoft Instructor-Led Courseware, or
3. provide one (1) Trainer with the unique redemption code and instructions on how they
can access one (1) Trainer Content.
3. For each license you acquire, you must comply with the following:
1. you will only provide access to the Licensed Content to those individuals who have
acquired a valid license to the Licensed Content,
2. you will ensure each End User attending an Authorized Training Session has their own
valid licensed copy of the Microsoft Instructor-Led Courseware that is the subject of the
Authorized Training Session,
3. you will ensure that each End User provided with the hard-copy version of the Microsoft
Instructor-Led Courseware will be presented with a copy of this agreement and each End
User will agree that their use of the Microsoft Instructor-Led Courseware will be subject
to the terms in this agreement prior to providing them with the Microsoft Instructor-Led
Courseware. Each individual will be required to denote their acceptance of this agree-
ment in a manner that is enforceable under local law prior to their accessing the Micro-
soft Instructor-Led Courseware,
4. you will ensure that each Trainer teaching an Authorized Training Session has their own
valid licensed copy of the Trainer Content that is the subject of the Authorized Training
Session,
5. you will only use qualified Trainers who have in-depth knowledge of and experience with
the Microsoft technology that is the subject of the Microsoft Instructor-Led Courseware
being taught for all your Authorized Training Sessions,
6. you will only deliver a maximum of 15 hours of training per week for each Authorized
Training Session that uses a MOC title, and
7. you acknowledge that Trainers that are not MCTs will not have access to all of the trainer
resources for the Microsoft Instructor-Led Courseware.
2. If you are a Microsoft Learning Competency Member:
1. Each license acquired may only be used to review one (1) copy of the Microsoft Instructor-Led Courseware in the form provided to you. If the Microsoft Instructor-Led Courseware is in digital format, you may install one (1) copy on up to three (3) Personal Devices. You may not install the Microsoft Instructor-Led Courseware on a device you do not own or control.
2. For each license you acquire on behalf of an End User or MCT, you may either:
1. distribute one (1) hard copy version of the Microsoft Instructor-Led Courseware to one
(1) End User attending the Authorized Training Session and only immediately prior to
the commencement of the Authorized Training Session that is the subject matter of the
Microsoft Instructor-Led Courseware provided, or
2. provide one (1) End User attending the Authorized Training Session with the unique
redemption code and instructions on how they can access one (1) digital version of the
Microsoft Instructor-Led Courseware, or
3. provide one (1) MCT with the unique redemption code and instructions on how they can access one (1) Trainer Content.
3. For each license you acquire, you must comply with the following:
1. you will only provide access to the Licensed Content to those individuals who have
acquired a valid license to the Licensed Content,
2. you will ensure that each End User attending an Authorized Training Session has their
own valid licensed copy of the Microsoft Instructor-Led Courseware that is the subject of
the Authorized Training Session,
3. you will ensure that each End User provided with a hard-copy version of the Microsoft
Instructor-Led Courseware will be presented with a copy of this agreement and each End
User will agree that their use of the Microsoft Instructor-Led Courseware will be subject
to the terms in this agreement prior to providing them with the Microsoft Instructor-Led
Courseware. Each individual will be required to denote their acceptance of this agree-
ment in a manner that is enforceable under local law prior to their accessing the Micro-
soft Instructor-Led Courseware,
4. you will ensure that each MCT teaching an Authorized Training Session has their own
valid licensed copy of the Trainer Content that is the subject of the Authorized Training
Session,
5. you will only use qualified MCTs who also hold the applicable Microsoft Certification
credential that is the subject of the MOC title being taught for all your Authorized
Training Sessions using MOC,
6. you will only provide access to the Microsoft Instructor-Led Courseware to End Users,
and
7. you will only provide access to the Trainer Content to MCTs.
3. If you are a MPN Member:
1. Each license acquired on behalf of yourself may only be used to review one (1) copy of the
Microsoft Instructor-Led Courseware in the form provided to you. If the Microsoft Instruc-
tor-Led Courseware is in digital format, you may install one (1) copy on up to three (3)
Personal Devices. You may not install the Microsoft Instructor-Led Courseware on a device
you do not own or control.
2. For each license you acquire on behalf of an End User or Trainer, you may either:
1. distribute one (1) hard copy version of the Microsoft Instructor-Led Courseware to one
(1) End User attending the Private Training Session, and only immediately prior to the
commencement of the Private Training Session that is the subject matter of the Micro-
soft Instructor-Led Courseware being provided, or
2. provide one (1) End User who is attending the Private Training Session with the unique
redemption code and instructions on how they can access one (1) digital version of the
Microsoft Instructor-Led Courseware, or
3. provide one (1) Trainer who is teaching the Private Training Session with the unique redemption code and instructions on how they can access one (1) Trainer Content.
3. For each license you acquire, you must comply with the following:
1. you will only provide access to the Licensed Content to those individuals who have
acquired a valid license to the Licensed Content,
2. you will ensure that each End User attending a Private Training Session has their own
valid licensed copy of the Microsoft Instructor-Led Courseware that is the subject of the
Private Training Session,
3. you will ensure that each End User provided with a hard copy version of the Microsoft
Instructor-Led Courseware will be presented with a copy of this agreement and each End
User will agree that their use of the Microsoft Instructor-Led Courseware will be subject
to the terms in this agreement prior to providing them with the Microsoft Instructor-Led
Courseware. Each individual will be required to denote their acceptance of this agree-
ment in a manner that is enforceable under local law prior to their accessing the Micro-
soft Instructor-Led Courseware,
4. you will ensure that each Trainer teaching a Private Training Session has their own valid
licensed copy of the Trainer Content that is the subject of the Private Training Session,
5. you will only use qualified Trainers who hold the applicable Microsoft Certification
credential that is the subject of the Microsoft Instructor-Led Courseware being taught
for all your Private Training Sessions,
6. you will only use qualified MCTs who hold the applicable Microsoft Certification creden-
tial that is the subject of the MOC title being taught for all your Private Training Sessions
using MOC,
7. you will only provide access to the Microsoft Instructor-Led Courseware to End Users,
and
8. you will only provide access to the Trainer Content to Trainers.
4. If you are an End User:
For each license you acquire, you may use the Microsoft Instructor-Led Courseware solely for
your personal training use. If the Microsoft Instructor-Led Courseware is in digital format, you
may access the Microsoft Instructor-Led Courseware online using the unique redemption code
provided to you by the training provider and install and use one (1) copy of the Microsoft
Instructor-Led Courseware on up to three (3) Personal Devices. You may also print one (1) copy
of the Microsoft Instructor-Led Courseware. You may not install the Microsoft Instructor-Led
Courseware on a device you do not own or control.
5. If you are a Trainer:
1. For each license you acquire, you may install and use one (1) copy of the Trainer Content in
the form provided to you on one (1) Personal Device solely to prepare and deliver an
Authorized Training Session or Private Training Session, and install one (1) additional copy
on another Personal Device as a backup copy, which may be used only to reinstall the
Trainer Content. You may not install or use a copy of the Trainer Content on a device you do
not own or control. You may also print one (1) copy of the Trainer Content solely to prepare
for and deliver an Authorized Training Session or Private Training Session.
2. If you are an MCT, you may customize the written portions of the Trainer Content that are
logically associated with instruction of a training session in accordance with the most recent
version of the MCT agreement.
3. If you elect to exercise the foregoing rights, you agree to comply with the following: (i)
customizations may only be used for teaching Authorized Training Sessions and Private
Training Sessions, and (ii) all customizations will comply with this agreement. For clarity, any use of “customize” refers only to changing the order of slides and content, and/or not using all the slides or content; it does not mean changing or modifying any slide or content.
●● 2.2 Separation of Components. The Licensed Content is licensed as a single unit and you may not separate its components and install them on different devices.
●● 2.3 Redistribution of Licensed Content. Except as expressly provided in the use rights
above, you may not distribute any Licensed Content or any portion thereof (including any permit-
ted modifications) to any third parties without the express written permission of Microsoft.
●● 2.4 Third Party Notices. The Licensed Content may include third party code that Micro-
soft, not the third party, licenses to you under this agreement. Notices, if any, for the third party
code are included for your information only.
●● 2.5 Additional Terms. Some Licensed Content may contain components with additional terms, conditions, and licenses regarding its use. Any non-conflicting terms in those conditions and licenses also apply to your use of that respective component and supplement the terms described in this agreement.
laws and treaties. Microsoft or its suppliers own the title, copyright, and other intellectual property
rights in the Licensed Content.
6. EXPORT RESTRICTIONS. The Licensed Content is subject to United States export laws and regula-
tions. You must comply with all domestic and international export laws and regulations that apply to
the Licensed Content. These laws include restrictions on destinations, end users and end use. For
additional information, see www.microsoft.com/exporting.
7. SUPPORT SERVICES. Because the Licensed Content is provided “as is”, we are not obligated to
provide support services for it.
8. TERMINATION. Without prejudice to any other rights, Microsoft may terminate this agreement if you
fail to comply with the terms and conditions of this agreement. Upon termination of this agreement
for any reason, you will immediately stop all use of and delete and destroy all copies of the Licensed
Content in your possession or under your control.
9. LINKS TO THIRD PARTY SITES. You may link to third party sites through the use of the Licensed
Content. The third party sites are not under the control of Microsoft, and Microsoft is not responsible
for the contents of any third party sites, any links contained in third party sites, or any changes or
updates to third party sites. Microsoft is not responsible for webcasting or any other form of trans-
mission received from any third party sites. Microsoft is providing these links to third party sites to
you only as a convenience, and the inclusion of any link does not imply an endorsement by Microsoft
of the third party site.
10. ENTIRE AGREEMENT. This agreement, and any additional terms for the Trainer Content, updates and
supplements are the entire agreement for the Licensed Content, updates and supplements.
11. APPLICABLE LAW.
1. United States. If you acquired the Licensed Content in the United States, Washington state law
governs the interpretation of this agreement and applies to claims for breach of it, regardless of
conflict of laws principles. The laws of the state where you live govern all other claims, including
claims under state consumer protection laws, unfair competition laws, and in tort.
2. Outside the United States. If you acquired the Licensed Content in any other country, the laws of
that country apply.
12. LEGAL EFFECT. This agreement describes certain legal rights. You may have other rights under the
laws of your country. You may also have rights with respect to the party from whom you acquired the
Licensed Content. This agreement does not change your rights under the laws of your country if the
laws of your country do not permit it to do so.
13. DISCLAIMER OF WARRANTY. THE LICENSED CONTENT IS LICENSED "AS-IS" AND "AS AVAILABLE." YOU BEAR THE RISK OF USING IT. MICROSOFT AND ITS RESPECTIVE AFFILIATES GIVE NO EXPRESS WARRANTIES, GUARANTEES, OR CONDITIONS. YOU MAY HAVE ADDITIONAL CONSUMER RIGHTS UNDER YOUR LOCAL LAWS WHICH THIS AGREEMENT CANNOT CHANGE. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAWS, MICROSOFT AND ITS RESPECTIVE AFFILIATES EXCLUDE ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
14. LIMITATION ON AND EXCLUSION OF REMEDIES AND DAMAGES. YOU CAN RECOVER FROM
MICROSOFT, ITS RESPECTIVE AFFILIATES AND ITS SUPPLIERS ONLY DIRECT DAMAGES UP TO
US$5.00. YOU CANNOT RECOVER ANY OTHER DAMAGES, INCLUDING CONSEQUENTIAL, LOST
PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES.
■■ Module 0 Welcome
Welcome to the course
■■ Module 1 Introduction to AI
Artificial Intelligence in Azure
Responsible AI
■■ Module 2 Machine Learning
Introduction to Machine Learning
Azure Machine Learning
■■ Module 3 Computer Vision
Computer Vision Concepts
Computer Vision in Azure
■■ Module 4 Natural Language Processing (NLP)
Introduction to Natural Language Processing
Building Natural Language Processing Solutions in Azure
Module 0 Welcome
Learning objectives
After completing this course, you will be able to:
●● Describe Artificial Intelligence workloads and considerations.
●● Describe fundamental principles of machine learning on Azure.
●● Describe features of computer vision workloads on Azure.
●● Describe features of Natural Language Processing (NLP) workloads on Azure.
Course Agenda
This course includes the following modules:
Explore Fundamentals of Artificial Intelligence
- Introduction to Artificial Intelligence
- Artificial Intelligence in Microsoft Azure

Explore Fundamentals of Machine Learning
- Introduction to Machine Learning
- Azure Machine Learning

Explore Fundamentals of Computer Vision
- Computer Vision Concepts
- Creating Computer Vision solutions in Azure

Explore Fundamentals of Natural Language Processing
- Introduction to Natural Language Processing
- Building Natural Language Solutions in Azure

1 https://docs.microsoft.com/learn/certifications/azure-ai-fundamentals
Lab environment
Labs in this course are based on exercises in Microsoft Learn. You will be provided with an Azure subscription for use in this class. Your instructor will provide details.
Module 1 Introduction to AI
Principles of Responsible AI
At Microsoft, AI software development is guided by a set of six principles, designed to ensure that AI applications provide amazing solutions to difficult problems without any unintended negative consequences.
Fairness
AI systems should treat all people fairly. For example, suppose you create a machine learning model to
support a loan approval application for a bank. The model should make predictions of whether or not the
loan should be approved without incorporating any bias based on gender, ethnicity, or other factors that
might result in an unfair advantage or disadvantage to specific groups of applicants.
Azure Machine Learning includes the capability to interpret models and quantify the extent to which each
feature of the data influences the model's prediction. This capability helps data scientists and developers
identify and mitigate bias in the model.
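To illustrate the idea, one very simple bias check is to compare outcome rates across groups of applicants. This is a generic sketch in plain Python, not the Azure Machine Learning interpretability API, and the loan decisions below are made up:

```python
# Hypothetical loan decisions: (group, approved) pairs.
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

def approval_rate(decisions, group):
    """Fraction of applicants in `group` whose loan was approved."""
    outcomes = [approved for g, approved in decisions if g == group]
    return sum(outcomes) / len(outcomes)

rate_a = approval_rate(decisions, "group_a")  # 3 of 4 approved = 0.75
rate_b = approval_rate(decisions, "group_b")  # 1 of 4 approved = 0.25

# A large gap between group rates is a signal to investigate the model for bias.
disparity = abs(rate_a - rate_b)
print(disparity)
```

A check like this only flags a disparity; understanding *why* it occurs is where model interpretability tools come in.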
Inclusiveness
AI systems should empower everyone and engage people. AI should bring benefits to all parts of society,
regardless of physical ability, gender, sexual orientation, ethnicity, or other factors.
Transparency
AI systems should be understandable. Users should be made fully aware of the purpose of the system,
how it works, and what limitations may be expected.
Accountability
People should be accountable for AI systems. Designers and developers of AI-based solutions should work
within a framework of governance and organizational principles that ensure the solution meets ethical
and legal standards that are clearly defined.
Note: For more information about Microsoft's principles for responsible AI, visit the Microsoft responsible AI site1.
1 https://microsoft.com/ai/responsible-ai
Azure basics
Microsoft Azure provides a scalable, reliable cloud platform for AI, including:
●● Data storage: Azure Storage offers highly available, scalable, and secure storage for a variety of data
objects in the cloud.
●● Compute: Azure cloud compute provides the infrastructure to run applications and scale capacity on
demand. A compute target is a designated compute resource or environment.
●● Services: Azure services are delivered over the internet in a pay-as-you-go model. Services include
servers, storage, databases, networking, software, analytics, and intelligence. You can learn more
about Azure services2 here.
●● Azure Machine Learning: A platform for training, deploying, and managing machine learning models.
●● Cognitive Services: A suite of services with four main pillars: Vision, Speech, Language, and Decision.
●● Azure Bot Service: A cloud-based platform for developing and managing bots.
●● Azure Cognitive Search: Data extraction, enrichment, and indexing for intelligent search and knowledge mining.
Cognitive Services
In this lab, you will explore the Anomaly Detector cognitive service, which analyzes data over time to
detect any unusual values.
1. Start the virtual machine for this lab or go to the exercise page at https://aka.ms/ai900-module-01.
2. Follow the instructions to complete the exercise on Microsoft Learn.
2 https://docs.microsoft.com/learn/modules/intro-to-azure-fundamentals/tour-of-azure-services
3 https://aka.ms/learn-artificial-intelligence
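To build intuition for what the lab demonstrates, here is a minimal statistical sketch of anomaly detection: flag values that deviate strongly from the rest of the series. This is plain Python, not the Anomaly Detector API, and the sensor readings are invented:

```python
import statistics

def find_anomalies(series, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(series)
    stdev = statistics.pstdev(series)
    return [i for i, v in enumerate(series)
            if stdev > 0 and abs(v - mean) / stdev > threshold]

readings = [20, 21, 19, 20, 22, 21, 48, 20, 19, 21]  # hourly sensor readings
print(find_anomalies(readings))  # the spike at index 6 stands out
```

The real service is considerably more sophisticated (it adapts to trends and seasonality in the data), but the core question is the same: which values are unusual given the rest?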
Module 2 Machine Learning
●● Automated machine learning: Enables non-experts to quickly create an effective machine learning model from data.
●● Azure Machine Learning designer: A graphical interface enabling no-code development of machine learning solutions.
●● Data and compute management: Cloud-based data storage and compute resources that professional data scientists can use to run data experiment code at scale.
●● Pipelines: Data scientists, software engineers, and IT operations professionals can define pipelines to orchestrate model training, deployment, and management tasks.
In Azure Machine Learning, multi-step workflows to prepare data, train models, and perform model
management tasks are called pipelines. The designer tool in Azure Machine Learning studio enables you
to create and run pipelines by using a drag & drop visual interface to connect modules that define the
steps and data flow for the pipeline.
1 https://aka.ms/no-code-ml
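The idea of a pipeline, an ordered sequence of steps in which each step's output feeds the next, can be sketched in plain Python. This is only an illustration of the concept, not the Azure Machine Learning SDK, and the step names and toy "model" are invented:

```python
def prepare_data(raw):
    """Step 1: clean the raw data (here, drop non-numeric records)."""
    return [x for x in raw if isinstance(x, (int, float))]

def train_model(data):
    """Step 2: 'train' a trivial model (here, just the mean of the data)."""
    return sum(data) / len(data)

def register_model(model):
    """Step 3: package the trained model for deployment."""
    return {"name": "demo-model", "value": model}

# A pipeline chains the steps so each step's output becomes the next step's input.
pipeline = [prepare_data, train_model, register_model]

result = [1, 2, "bad", 3]
for step in pipeline:
    result = step(result)

print(result)  # {'name': 'demo-model', 'value': 2.0}
```

In Azure Machine Learning, the designer lets you express exactly this kind of chain visually, with modules in place of functions.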
Module 3 Computer Vision
Of course, computers don't have biological eyes that work the way ours do, but they are capable of processing images, either from a live camera feed or from digital photographs or videos. This ability to process images is the key to creating software that can emulate human visual perception.
To an AI application, an image is just an array of pixel values. These numeric values can be used as
features to train machine learning models that make predictions about the image and its contents.
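For instance, a tiny grayscale image is just a grid of intensity values, and flattening that grid yields the feature vector a model consumes. The 3x3 example below is purely illustrative:

```python
# A 3x3 grayscale "image": each value is a pixel intensity from 0 (black) to 255 (white).
image = [
    [  0, 255,   0],
    [255, 255, 255],
    [  0, 255,   0],
]

# Flattening the grid yields a feature vector a machine learning model can consume.
features = [pixel for row in image for pixel in row]
print(len(features))   # 9 features for a 3x3 image
print(features[:3])    # [0, 255, 0]
```

A real photograph works the same way, just with far more pixels, and typically three values (red, green, blue) per pixel rather than one.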
●● Image classification
●● Semantic segmentation
●● Face detection, analysis, and recognition
●● Computer Vision: Image analysis (automated captioning and tagging), common object detection, face detection, smart cropping, and optical character recognition.
●● Custom Vision: Custom image classification and custom object detection.
●● Face: Face detection and analysis; facial identification and recognition.
●● Form Recognizer: Data extraction from forms, invoices, and other documents.
Describing an image
Computer Vision has the ability to analyze an image, evaluate the objects that are detected, and generate
a human-readable phrase or sentence that can describe what was detected in the image. Depending on
the image contents, the service may return multiple results, or phrases. Each returned phrase will have an
associated confidence score, indicating how confident the algorithm is in the supplied description. The
highest confidence phrases will be listed first.
To help you understand this concept, consider the following image of the Empire State building in New
York. The returned phrases are listed below the image in the order of confidence.
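A sketch of how a client application might handle such a response follows; the phrases and confidence scores here are made up for illustration, not actual service output:

```python
# Hypothetical captioning results: (phrase, confidence score) pairs.
captions = [
    ("a city at night", 0.41),
    ("a tall building in a city", 0.87),
    ("a skyscraper under a cloudy sky", 0.63),
]

# List the highest-confidence description first, as the service does.
ranked = sorted(captions, key=lambda c: c[1], reverse=True)
best_phrase, best_score = ranked[0]
print(f"{best_phrase} ({best_score:.0%} confidence)")
```

An application would typically show only the top phrase, keeping the alternatives for cases where the leading score is low.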
Detecting objects
The object detection capability is similar to tagging, in that the service can identify common objects; but rather than only providing tags for the recognized objects, this service can also return bounding box coordinates. Not only will you get the type of object, but you will also receive a set of coordinates that indicate the top, left, width, and height of the detected object, which you can use to identify the location of the object in the image, like this:
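A bounding box described by top, left, width, and height translates into pixel coordinates as shown below; the detection values are hypothetical:

```python
# Hypothetical detection result: object type plus a bounding box.
detection = {"object": "person", "box": {"top": 45, "left": 120, "width": 60, "height": 150}}

box = detection["box"]
# The four corners of the rectangle around the detected object.
x1, y1 = box["left"], box["top"]                                  # top-left corner
x2, y2 = box["left"] + box["width"], box["top"] + box["height"]   # bottom-right corner
print(f"{detection['object']} at ({x1}, {y1})-({x2}, {y2})")
```

With those corners, a client can draw a rectangle over the original image or crop the detected object out of it.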
Detecting brands
This feature provides the ability to identify commercial brands. The service has an existing database of
thousands of globally recognized logos from commercial brands of products.
When you call the service and pass it an image, it performs a detection task and determines if any of the identified objects in the image are recognized brands. The service compares the brands against its
database of popular brands spanning clothing, consumer electronics, and many more categories. If a
known brand is detected, the service returns a response that contains the brand name, a confidence
score (from 0 to 1 indicating how positive the identification is), and a bounding box (coordinates) for
where in the image the detected brand was found.
For example, in the following image, a laptop has a Microsoft logo on its lid, which is identified and
located by the Computer Vision service.
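A client processing such a response might keep only the detections the service is positive about. The entries below are made-up stand-ins for the response format, not real service output:

```python
# Hypothetical brand-detection response entries.
brands = [
    {"name": "Microsoft", "confidence": 0.92, "rectangle": {"x": 310, "y": 120, "w": 80, "h": 40}},
    {"name": "Contoso",   "confidence": 0.35, "rectangle": {"x": 10,  "y": 500, "w": 30, "h": 15}},
]

# Keep only detections above a chosen confidence threshold.
confident = [b["name"] for b in brands if b["confidence"] >= 0.5]
print(confident)  # ['Microsoft']
```

The threshold is an application decision: a marketing analytics tool might accept lower-confidence matches than a tool making automated decisions.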
Detecting faces
The Computer Vision service can detect and analyze human faces in an image, including the ability to
determine age and a bounding box rectangle for the location of the face(s). The facial analysis capabilities
of the Computer Vision service are a subset of those provided by the dedicated Face Service1. If you
need basic face detection and analysis, combined with general image analysis capabilities, you can use
the Computer Vision service; but for more comprehensive facial analysis and facial recognition functionality, use the Face service.
The following example shows an image of a person with their face detected and approximate age
estimated.
Categorizing an image
Computer Vision can categorize images based on their contents. The service uses a parent/child hierarchy with a “current” limited set of categories. When analyzing an image, detected objects are compared
1 https://docs.microsoft.com/azure/cognitive-services/face/
to the existing categories to determine the best way to provide the categorization. As an example, one of
the parent categories is people_. This image of a person on a roof is assigned a category of people_.
A slightly different categorization is returned for the following image, which is assigned to the category
people_group because there are multiple people in the image:
2 https://docs.microsoft.com/azure/cognitive-services/computer-vision/category-taxonomy
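The parent/child idea behind the people_ and people_group categories can be sketched as a simple rule. This is illustrative only; the real taxonomy is much richer and the selection logic is internal to the service:

```python
def categorize(people_count):
    """Pick a category from the 'people_' branch based on how many people are detected."""
    if people_count == 0:
        return None                 # no person-related category applies
    if people_count == 1:
        return "people_"            # parent category: a single person
    return "people_group"           # child category: multiple people

print(categorize(1))   # people_
print(categorize(4))   # people_group
```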
Image Classification
Image classification is a machine learning technique in which the object being classified is an image, such
as a photograph.
As with any form of classification, creating an image classification solution involves training a model using a set of existing data for which the class is already known. In this case, the existing data consists of a set of categorized images, which you must upload to the Custom Vision service and tag with appropriate class labels. After training the model, you can publish it as a service for applications to use.
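The train-then-predict idea can be sketched with a toy nearest-centroid classifier over flattened pixel vectors. This is a conceptual stand-in for the Custom Vision training process, not its API, and the tiny "images" are invented:

```python
# Tiny labeled "images", already flattened to pixel-intensity feature vectors.
training_data = {
    "dark":  [[10, 20, 15], [5, 12, 8]],
    "light": [[240, 250, 245], [230, 255, 235]],
}

# "Training": compute the average feature vector (centroid) per class.
centroids = {
    label: [sum(vals) / len(vals) for vals in zip(*images)]
    for label, images in training_data.items()
}

def predict(image):
    """Assign the class whose centroid is closest to the image's features."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], image))

print(predict([8, 14, 10]))      # dark
print(predict([235, 248, 240]))  # light
```

Real image classifiers learn far richer features than per-pixel averages, but the workflow is the same: labeled examples in, a model that maps new images to class labels out.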
Object Detection
Object detection is a form of machine learning-based computer vision in which a model is trained to recognize individual types of objects in an image, and to identify their location in the image.
Creating an object detection solution with Custom Vision consists of three main tasks. First you must upload and tag images, then you can train the model, and finally you must publish the model so that client applications can use it to locate objects in images.
Model training
To train a classification model, you must upload images to your training resource and label them with the
appropriate class labels. Then, you must train the model and evaluate the training results.
You can perform these tasks in the Custom Vision portal, or if you have the necessary coding experience
you can use one of the Custom Vision service programming language-specific software development kits
(SDKs).
One of the key considerations when using images for classification is to ensure that you have sufficient images of the objects in question, and that those images show the object from many different angles.
Model evaluation
Model training is an iterative process in which the Custom Vision service repeatedly trains the model using some of the data, but holds some back to evaluate the model. At the end of the training process, the performance of the trained model is indicated by the following evaluation metrics:
●● Precision: What percentage of the class predictions made by the model were correct? For example, if
the model predicted that 10 images are oranges, of which eight were actually oranges, then the
precision is 0.8 (80%).
●● Recall: What percentage of the actual class instances did the model correctly identify? For example, if there are 10 images of apples, and the model found 7 of them, then the recall is 0.7 (70%).
●● Average Precision (AP): An overall metric that takes into account both precision and recall.
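The two worked examples above can be checked directly. The counts here (8 true positives out of 10 "orange" predictions; 7 of 10 actual apples found) come from the examples in the metric definitions:

```python
# Precision and recall computed from confusion-matrix counts.

def precision(true_positives, false_positives):
    # Of all positive predictions, how many were correct?
    return true_positives / (true_positives + false_positives)

def recall(true_positives, false_negatives):
    # Of all actual positives, how many did the model find?
    return true_positives / (true_positives + false_negatives)

# 10 "orange" predictions, of which 8 were actually oranges -> precision 0.8
print(precision(true_positives=8, false_positives=2))  # 0.8

# 10 actual apple images, of which the model found 7 -> recall 0.7
print(recall(true_positives=7, false_negatives=3))     # 0.7
```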
Face detection
Face detection involves identifying regions of an image that contain a human face, typically by returning
bounding box coordinates that form a rectangle around the face, like this:
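One way to represent such a result is a rectangle defined by its top-left corner plus width and height in pixels. The field names below mirror that idea but are illustrative, not the exact shape of any service's response:

```python
from dataclasses import dataclass

# Illustrative representation of a detected face's bounding box.

@dataclass
class FaceRectangle:
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        """True if pixel (x, y) falls inside the rectangle."""
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)

face = FaceRectangle(left=120, top=60, width=80, height=80)
print(face.contains(150, 100))  # True
print(face.contains(10, 10))    # False
```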
Facial analysis
Moving beyond simple face detection, some algorithms can also return other information, such as facial
landmarks (nose, eyes, eyebrows, lips, and others).
These facial landmarks can be used as features with which to train a machine learning model from which you can infer information about a person, such as their age or perceived emotional state, like this:
Facial recognition
A further application of facial analysis is to train a machine learning model to identify known individuals
from their facial features. This usage is more generally known as facial recognition, and involves using
multiple images of each person you want to recognize to train a model so that it can detect those
individuals in new images on which it wasn't trained.
Face
Face currently supports the following functionality:
●● Face Detection
●● Face Verification
The basic foundation of processing printed text is optical character recognition (OCR), in which a model
can be trained to recognize individual shapes as letters, numerals, punctuation, or other elements of text.
Much of the early work on implementing this kind of capability was performed by postal services to
support automatic sorting of mail based on postal codes. Since then, the state-of-the-art for reading text
has moved on, and it's now possible to build models that can detect printed or handwritten text in an
image and read it line-by-line or even word-by-word.
Uses of OCR
The ability to recognize printed and handwritten text in images is beneficial in many scenarios, such as:
●● note taking
●● digitizing forms, such as medical records or historical documents
●● scanning printed or handwritten checks for bank deposits
When you use the OCR API to process an image, it returns a hierarchy of information that consists of:
●● Regions in the image that contain text
●● Lines of text in each region
●● Words in each line of text
For each of these elements, the OCR API also returns bounding box coordinates that define a rectangle to
indicate the location in the image where the region, line, or word appears.
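The regions → lines → words hierarchy can be modelled as nested dictionaries, with a helper that flattens it back into plain text. The structure and field names below are illustrative, not the exact shape of the OCR API response:

```python
# Sample OCR result: regions contain lines, lines contain words,
# and every level carries a bounding box (here: [left, top, width, height]).
ocr_result = {
    "regions": [
        {
            "boundingBox": [10, 10, 200, 60],
            "lines": [
                {"boundingBox": [10, 10, 200, 24],
                 "words": [{"text": "Hello", "boundingBox": [10, 10, 60, 24]},
                           {"text": "world", "boundingBox": [80, 10, 60, 24]}]},
                {"boundingBox": [10, 40, 200, 24],
                 "words": [{"text": "OCR", "boundingBox": [10, 40, 50, 24]}]},
            ],
        }
    ]
}

def extract_text(result):
    """Walk regions, lines, and words to rebuild the recognized text."""
    lines = []
    for region in result["regions"]:
        for line in region["lines"]:
            lines.append(" ".join(word["text"] for word in line["words"]))
    return "\n".join(lines)

print(extract_text(ocr_result))
```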
A typical receipt contains information that might be required for an expense claim, including:
●● The name, address, and telephone number of the merchant.
●● The date and time of the purchase.
●● The quantity and price of each item purchased.
●● The subtotal, tax, and total amounts.
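A naive way to pull some of the fields above out of raw receipt text is with regular expressions. A dedicated service such as Form Recognizer does this far more robustly; the sample receipt and patterns here are invented for illustration:

```python
import re

# Invented receipt text, as OCR might return it.
receipt_text = """Northwind Traders
123 Main Street, Seattle  (555) 123-4567
Date: 2023-04-17 14:32
Subtotal: 22.50
Tax: 2.25
Total: 24.75"""

def extract_amount(label, text):
    """Find 'Label: 99.99' in the text and return the amount as a float."""
    match = re.search(rf"{label}:\s*([0-9]+\.[0-9]{{2}})", text)
    return float(match.group(1)) if match else None

subtotal = extract_amount("Subtotal", receipt_text)
tax = extract_amount("Tax", receipt_text)
total = extract_amount("Total", receipt_text)
print(subtotal, tax, total)  # 22.5 2.25 24.75
```

The fragility of this approach (every merchant formats receipts differently) is exactly why intelligent interpretation of the extracted text matters.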
Increasingly, organizations with large volumes of receipts and invoices to process are looking for artificial
intelligence (AI) solutions that can not only extract the text data from receipts, but also intelligently
interpret the information they contain.
After the resource has been created, you can create client applications that use its key and endpoint to connect and submit forms for analysis.
3 https://aka.ms/explore-computer-vision
Module 4 Natural Language Processing (NLP)
Machine translation – International and cross-cultural collaboration is often a key to success, and this
requires the ability to eliminate language barriers. AI can be used to automate translation of written and
spoken language. For example, an inbox add-in might be used to automatically translate incoming or
outgoing emails, or a conference call presentation system might provide a simultaneous transcript of the
speaker's words in multiple languages.
Semantic language modeling – Language can be complex and nuanced, so that multiple phrases might
be used to mean the same thing. For example, a driver might ask "Where can I get gas near here?",
"What's the location of the closest gas station?", or “Give me directions to a gas station.” All of these
mean essentially the same thing, so a semantic understanding of the language being used is required to
discern what the driver needs. An automobile manufacturer could train a language model to understand
phrases like these and respond by displaying appropriate satellite navigation directions.
Service | Capabilities
Language | Language detection, key phrase extraction, entity detection, sentiment analysis, question answering, conversational language understanding
Speech | Text to speech, speech to text, speech translation
Translator | Text translation
Azure Bot Service | Platform for conversational AI
●● Identify and categorize entities in the text. Entities can be people, places, organizations, or even
everyday items such as dates, times, quantities, and so on.
Speech recognition
Speech recognition is concerned with taking the spoken word and converting it into data that can be
processed - often by transcribing it into a text representation. The spoken words can be in the form of a
recorded voice in an audio file, or live audio from a microphone. Speech patterns are analyzed in the
audio to determine recognizable patterns that are mapped to words. To accomplish this feat, the software
typically uses multiple types of models, including:
●● An acoustic model that converts the audio signal into phonemes (representations of specific sounds).
●● A language model that maps phonemes to words, usually using a statistical algorithm that predicts
the most probable sequence of words based on the phonemes.
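The two-model pipeline can be sketched with a toy decoder: the "acoustic model" output is faked as phoneme groups, and a stand-in "language model" picks the most probable word for each group. All phonemes and probabilities here are invented:

```python
# Candidate words for each phoneme sequence, with made-up prior probabilities
# playing the role of the statistical language model.
lexicon = {
    "HH AH L OW": [("hello", 0.9), ("hollow", 0.1)],
    "W ER L D":   [("world", 0.95), ("whirled", 0.05)],
}

def decode(phoneme_groups):
    """Pick the most probable word for each phoneme group."""
    words = []
    for group in phoneme_groups:
        candidates = lexicon.get(group, [("<unk>", 0.0)])
        words.append(max(candidates, key=lambda c: c[1])[0])
    return " ".join(words)

print(decode(["HH AH L OW", "W ER L D"]))  # hello world
```

Real systems score whole word sequences jointly rather than one word at a time, but the division of labor between the two models is the same.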
The recognized words are typically converted to text, which you can use for various purposes, such as:
●● Providing closed captions for recorded or live videos
●● Creating a transcript of a phone call or meeting
●● Automated note dictation
●● Determining intended user input for further processing
Speech synthesis
Speech synthesis is in many respects the reverse of speech recognition. It is concerned with vocalizing
data, usually by converting text to speech. A speech synthesis solution typically requires the following
information:
●● The text to be spoken.
●● The voice to be used to vocalize the speech.
To synthesize speech, the system typically tokenizes the text to break it down into individual words, and
assigns phonetic sounds to each word. It then breaks the phonetic transcription into prosodic units (such
as phrases, clauses, or sentences) to create phonemes that will be converted to audio format. These
phonemes are then synthesized as audio by applying a voice, which will determine parameters such as
pitch and timbre; and generating an audio wave form that can be output to a speaker or written to a file.
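The first steps of that pipeline can be sketched in a highly simplified form: tokenize the text, look up a phonetic transcription per word, and group the result into prosodic units. The phoneme dictionary below is invented; real systems use large pronunciation lexicons and trained models, and the final audio-generation step is omitted entirely:

```python
# Tiny invented pronunciation lexicon (word -> phoneme string).
phoneme_dict = {"turn": "T ER N", "right": "R AY T", "ahead": "AH HH EH D"}

def to_phonemes(text):
    """Tokenize, transcribe each word, and return one prosodic unit per clause."""
    units = []
    for clause in text.lower().split(","):
        words = clause.split()
        units.append([phoneme_dict.get(w, w) for w in words])
    return units

print(to_phonemes("Turn right, ahead"))
# [['T ER N', 'R AY T'], ['AH HH EH D']]
```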
You can use the output of speech synthesis for many purposes, including:
●● Generating spoken responses to user input.
●● Creating voice menus for telephone systems.
●● Reading email or text messages aloud in hands-free scenarios.
●● Broadcasting announcements in public locations, such as railway stations or airports.
Translation
As organizations and individuals increasingly need to collaborate with people in other cultures and
geographic locations, the removal of language barriers has become a significant problem.
One solution is to find bilingual, or even multilingual, people to translate between languages. However, the scarcity of such skills and the number of possible language combinations can make this approach difficult to scale. Increasingly, automated translation, sometimes known as machine translation, is being employed to solve this problem.
Utterances
An utterance is an example of something a user might say, and which your application must interpret. For
example, when using a home automation system, a user might use the following utterances:
“Switch the fan on.”
“Turn on the light.”
Entities
An entity is an item to which an utterance refers. For example, fan and light in the following utterances:
“Switch the fan on.”
“Turn on the light.”
You can think of the fan and light entities as being specific instances of a general device entity.
Intents
An intent represents the purpose, or goal, expressed in a user's utterance. For example, for both of the
previously considered utterances, the intent is to turn a device on; so in your Language Understanding
application, you might define a TurnOn intent that is related to these utterances.
A Language Understanding application defines a model consisting of intents and entities. Utterances are
used to train the model to identify the most likely intent and the entities to which it should be applied
based on a given input. The home assistant application we've been considering might include multiple intents, such as TurnOn and TurnOff.
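The relationship between utterances, entities, and intents can be sketched with a hand-written rule-based predictor. A real Language Understanding model is trained from example utterances rather than coded like this, and the TurnOff intent here is an assumed companion to TurnOn:

```python
# Devices the home automation examples refer to.
DEVICE_ENTITIES = ["fan", "light"]

def predict(utterance):
    """Return (intent, entities) for an utterance, using naive keyword rules."""
    text = utterance.lower()
    entities = [device for device in DEVICE_ENTITIES if device in text]
    if "off" in text:
        intent = "TurnOff"
    elif "on" in text:
        intent = "TurnOn"
    else:
        intent = "None"
    return intent, entities

print(predict("Switch the fan on."))  # ('TurnOn', ['fan'])
print(predict("Turn on the light."))  # ('TurnOn', ['light'])
```

Note that both example utterances map to the same TurnOn intent even though they are phrased differently, which is the whole point of training an intent model.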
Creating intents
Define intents based on actions a user would want to perform with your application. For each intent, you
should include a variety of utterances that provide examples of how a user might express the intent.
If an intent can be applied to multiple entities, be sure to include sample utterances for each potential
entity; and ensure that each entity is identified in the utterance.
Predicting
When you are satisfied with the results from the training and testing, you can publish your Language
Understanding application to a prediction resource for consumption.
Client applications can use the model by connecting to the endpoint for the prediction resource, specifying the appropriate authentication key, and submitting user input to get predicted intents and entities. The predictions are returned to the client application, which can then take appropriate action based on the predicted intent.
You can anticipate the different ways a question could be asked by adding alternative phrasings, such as:
●● Where is your head office located?
Connect channels
When your bot is ready to be delivered to users, you can connect it to multiple channels, making it possible for users to interact with it through web chat, email, Microsoft Teams, and other common communication media.
Users can submit questions to the bot through any of its channels, and receive an appropriate answer
from the knowledge base on which the bot is based.
1 https://aka.ms/explore-nlp