DP 900
FOR USE ONLY AS PART OF MICROSOFT VIRTUAL TRAINING DAYS PROGRAM. THESE MATERIALS ARE NOT AUTHORIZED
FOR DISTRIBUTION, REPRODUCTION OR OTHER USE BY NON-MICROSOFT PARTIES.
Customer
ID  FirstName  LastName  Email                Address
1   Joe        Jones     joe@litware.com      1 Main St.
2   Samir      Nadoy     samir@northwind.com  123 Elm Pl.

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25

Customer documents (JSON)
{
  "firstName": "Joe",
  "lastName": "Jones",
  "address":
  {
    "streetAddress": "1 Main St.",
    "city": "New York",
    "state": "NY",
    "postalCode": "10099"
  },
  "contact":
  [
    {
      "type": "home",
      "number": "555 123-1234"
    },
    {
      "type": "email",
      "address": "joe@litware.com"
    }
  ]
}

{
  "firstName": "Samir",
  "lastName": "Nadoy",
  "address":
  {
    "streetAddress": "123 Elm Pl.",
    "unit": "500",
    "city": "Seattle",
    "state": "WA",
    "postalCode": "98999"
  },
  "contact":
  [
    {
      "type": "email",
      "address": "samir@northwind.com"
    }
  ]
}
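The documents above are semi-structured: each one carries its own field names, and they need not match (only Samir's address has a "unit"). A minimal Python sketch, assuming the two documents are stored together as a JSON array, parses them and flattens them into fixed columns like the Customer table. The helper name email_of is hypothetical.

```python
import json

# Both customer documents from the slide, stored as one JSON array.
CUSTOMERS_JSON = """
[
  {"firstName": "Joe", "lastName": "Jones",
   "address": {"streetAddress": "1 Main St.", "city": "New York",
               "state": "NY", "postalCode": "10099"},
   "contact": [{"type": "home", "number": "555 123-1234"},
               {"type": "email", "address": "joe@litware.com"}]},
  {"firstName": "Samir", "lastName": "Nadoy",
   "address": {"streetAddress": "123 Elm Pl.", "unit": "500", "city": "Seattle",
               "state": "WA", "postalCode": "98999"},
   "contact": [{"type": "email", "address": "samir@northwind.com"}]}
]
"""


def email_of(doc):
    """Hypothetical helper: return the first email contact in a document, or None."""
    for contact in doc.get("contact", []):
        if contact.get("type") == "email":
            return contact.get("address")
    return None


customers = json.loads(CUSTOMERS_JSON)

# Flatten each document into a fixed set of columns, as in a relational table.
# A field that a document lacks (Joe has no "unit") comes back as None.
rows = [
    (c["firstName"], c["lastName"], email_of(c), c["address"].get("unit"))
    for c in customers
]
print(rows)
```

Flattening works here because every document happens to have an address; a document without one would need the same .get() treatment at that level too.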
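How is data stored?
Files
• Delimited text, for example:
  FirstName,LastName,Email
• Optimized formats: Avro, ORC, Parquet
Databases
• Relational: tables such as Customer and Product (ID, Name, Price)
• Key-value: for example, key 123 holds the value "Hammer ($2.99)"
• Document: for example, key 2 holds the document { "Name": "Samir Nadoy" }
• Column family: for example, row 1000 holds Customer columns (Joe Jones, 1 Main St.) and Product columns (Hammer, 2.99)
• Graph: for example, nodes such as Ben and Hardware connected by relationships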
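Delimited text stores one record per line, with a header row naming the fields. As a small sketch, Python's standard csv module can read such a file into records and write them back out unchanged. The two data rows are illustrative, reusing customer values from earlier in this module.

```python
import csv
import io

# Delimited text: a header row naming the fields, then one comma-separated row per record.
DELIMITED = """FirstName,LastName,Email
Joe,Jones,joe@litware.com
Samir,Nadoy,samir@northwind.com
"""

# csv.DictReader uses the header row to turn each line into a dict.
records = list(csv.DictReader(io.StringIO(DELIMITED)))
print(records[1]["Email"])

# Writing the records back out with the same field order reproduces the text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["FirstName", "LastName", "Email"], lineterminator="\n")
writer.writeheader()
writer.writerows(records)
print(out.getvalue() == DELIMITED)
```

Delimited text is simple and portable, but every value comes back as a string. Optimized formats such as Parquet also store column types and compress data by column.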
Operational data workloads
Data is stored in a database that is optimized for online transactional processing (OLTP) operations that support applications
• A mix of read and write activity
• For example:
  – Read the Product table to display a catalog
  – Write to the Order table to record a purchase
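The read and write pattern above can be sketched with Python's built-in sqlite3 module standing in for an operational database. The Order columns used here (OrderNo, ProductID, Quantity) are a simplified, hypothetical schema, and record_purchase is a hypothetical helper. The write runs as a transaction, so it is either committed completely or rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, Price REAL);
CREATE TABLE "Order" (OrderNo INTEGER PRIMARY KEY, ProductID INTEGER, Quantity INTEGER);
INSERT INTO Product VALUES (123, 'Hammer', 2.99), (162, 'Screwdriver', 3.49), (201, 'Wrench', 4.25);
""")

# Read: query the Product table to display a catalog.
catalog = conn.execute("SELECT Name, Price FROM Product ORDER BY ID").fetchall()


def record_purchase(order_no, product_id, quantity):
    """Write: record a purchase. 'with conn' commits on success and rolls back on error."""
    with conn:
        conn.execute(
            'INSERT INTO "Order" (OrderNo, ProductID, Quantity) VALUES (?, ?, ?)',
            (order_no, product_id, quantity),
        )


record_purchase(1000, 123, 1)
print(catalog)
print(conn.execute('SELECT * FROM "Order"').fetchall())
```

"Order" is quoted because ORDER is a reserved word in SQL.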
Analytical data workloads
• Operational data is extracted, transformed, and loaded (ETL) into a data lake for analysis
• Data is loaded into a schema of tables, typically in a Spark-based data lakehouse with tabular abstractions over files in the data lake, or in a data warehouse with a fully relational SQL engine
• Data in tables may be aggregated and loaded into an online analytical processing (OLAP) model, or cube
• The files in the data lake, the relational tables, and the analytical model can be queried to produce reports and dashboards
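The extract, transform, load sequence above can be sketched at toy scale with two in-memory SQLite databases: one plays the operational store and the other the analytical store. The product and line-item values come from this module's sample tables. The ProductSales summary table is a hypothetical name, and real pipelines would use tools such as Spark or Data Factory instead.

```python
import sqlite3

# Operational (OLTP) source.
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE LineItem (OrderNo INTEGER, ItemNo INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO Product VALUES (123, 'Hammer', 2.99), (162, 'Screwdriver', 3.49), (201, 'Wrench', 4.25);
INSERT INTO LineItem VALUES (1000, 1, 123, 1), (1000, 2, 201, 2), (1001, 1, 123, 2);
""")

# Extract: read order lines joined to product prices.
extracted = oltp.execute("""
    SELECT p.Name, li.Quantity, p.Price
    FROM LineItem AS li JOIN Product AS p ON li.ProductID = p.ID
""").fetchall()

# Transform: aggregate quantity and revenue per product.
totals = {}
for name, qty, price in extracted:
    q, rev = totals.get(name, (0, 0.0))
    totals[name] = (q + qty, rev + qty * price)

# Load: write the aggregated rows into a separate analytical store.
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE ProductSales (Product TEXT PRIMARY KEY, Quantity INTEGER, Revenue REAL)")
with olap:
    olap.executemany(
        "INSERT INTO ProductSales VALUES (?, ?, ?)",
        [(name, q, round(rev, 2)) for name, (q, rev) in totals.items()],
    )

print(olap.execute("SELECT * FROM ProductSales ORDER BY Product").fetchall())
```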
Learning Objective: Data roles and services
Data professional roles
• All rows have the same columns
• Each column is assigned a data type

Customer
ID  FirstName  MiddleName  LastName  Email                Address      City
1   Joe        David       Jones     joe@litware.com      1 Main St.   Seattle
2   Samir                  Nadoy     samir@northwind.com  123 Elm Pl.  New York
…   …          …           …         …                    …            …
Customer
ID  FirstName  LastName  Address      City
1   Joe        Jones     1 Main St.   Seattle
2   Samir      Nadoy     123 Elm Pl.  New York

Order
OrderNo  OrderDate  Customer
1000     1/1/2022   1
1001     1/1/2022   2

LineItem
OrderNo  ItemNo  ProductID  Quantity
1000     1       123        1
1000     2       201        2
1001     1       123        2

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25
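In the tables above, each fact is stored once. Order refers to Customer by ID, and LineItem refers to Order and Product by key. As a sketch using SQLite, and assuming these column names and sample rows, joins can reassemble a complete view of every order line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Address TEXT, City TEXT);
CREATE TABLE "Order" (OrderNo INTEGER PRIMARY KEY, OrderDate TEXT, Customer INTEGER REFERENCES Customer(ID));
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE LineItem (OrderNo INTEGER REFERENCES "Order"(OrderNo), ItemNo INTEGER,
                       ProductID INTEGER REFERENCES Product(ID), Quantity INTEGER,
                       PRIMARY KEY (OrderNo, ItemNo));
INSERT INTO Customer VALUES (1, 'Joe', 'Jones', '1 Main St.', 'Seattle'),
                            (2, 'Samir', 'Nadoy', '123 Elm Pl.', 'New York');
INSERT INTO "Order" VALUES (1000, '1/1/2022', 1), (1001, '1/1/2022', 2);
INSERT INTO Product VALUES (123, 'Hammer', 2.99), (162, 'Screwdriver', 3.49), (201, 'Wrench', 4.25);
INSERT INTO LineItem VALUES (1000, 1, 123, 1), (1000, 2, 201, 2), (1001, 1, 123, 2);
""")

# Joins follow the key references to rebuild each order line with its customer and product.
rows = conn.execute("""
    SELECT o.OrderNo, c.FirstName, p.Name, li.Quantity
    FROM LineItem AS li
    JOIN "Order" AS o ON li.OrderNo = o.OrderNo
    JOIN Customer AS c ON o.Customer = c.ID
    JOIN Product AS p ON li.ProductID = p.ID
    ORDER BY o.OrderNo, li.ItemNo
""").fetchall()
for row in rows:
    print(row)
```

Because a price or an address lives in exactly one row, changing it requires a single update rather than edits to every order that mentions it.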
Structured Query Language (SQL)
SQL is a standard language for use with relational databases
Standards are maintained by ANSI and ISO
Most relational database management systems (RDBMSs) support proprietary extensions of standard SQL
Data Definition Language (DDL): CREATE, ALTER, DROP, RENAME
CREATE TABLE Product
(
    ID INT PRIMARY KEY,
    Name VARCHAR(20) NOT NULL,
    Price DECIMAL(10,2) NULL
);

Product
ID  Name  Price

Data Control Language (DCL): GRANT, DENY, REVOKE
GRANT SELECT, INSERT, UPDATE
ON Product
TO user1;

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25

Data Manipulation Language (DML): INSERT, UPDATE, DELETE, SELECT
SELECT Name, Price
FROM Product
WHERE Price > 2.50
ORDER BY Price;

Results
Name         Price
Hammer       2.99
Screwdriver  3.49
Wrench       4.25
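The DDL and DML statements on this slide can be tried in Python's built-in SQLite. This sketch names the key column ID to match the Product tables shown here and declares DECIMAL with an explicit precision and scale. SQLite has no users or permissions, so the GRANT (DCL) step is skipped. SQLite also accepts but does not enforce VARCHAR lengths or DECIMAL precision, so results on SQL Server or another RDBMS may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the table. Columns are nullable by default in SQLite, so NULL is omitted.
conn.execute("""
CREATE TABLE Product
(
    ID INT PRIMARY KEY,
    Name VARCHAR(20) NOT NULL,
    Price DECIMAL(10,2)
)
""")

# DCL (GRANT) is skipped: SQLite has no users or permissions.

# DML: insert rows with a parameterized statement, then query them.
conn.executemany(
    "INSERT INTO Product (ID, Name, Price) VALUES (?, ?, ?)",
    [(123, "Hammer", 2.99), (162, "Screwdriver", 3.49), (201, "Wrench", 4.25)],
)
results = conn.execute("""
SELECT Name, Price
FROM Product
WHERE Price > 2.50
ORDER BY Price
""").fetchall()
print(results)
```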
Other common database objects

Views
Pre-defined SQL queries that behave as virtual tables
CREATE VIEW Deliveries
AS
SELECT o.OrderNo, o.OrderDate,
       c.Address, c.City
FROM [Order] AS o JOIN Customer AS c
ON o.Customer = c.ID;

Deliveries
OrderNo  OrderDate  Address      City
1000     1/1/2022   1 Main St.   Seattle
1001     1/1/2022   123 Elm Pl.  New York

Stored Procedures
Pre-defined SQL statements that can include parameters
CREATE PROCEDURE RenameProduct
    @ProductID INT,
    @NewName VARCHAR(20)
AS
UPDATE Product
SET Name = @NewName
WHERE ID = @ProductID;

EXEC RenameProduct 201, 'Spanner';

Product
ID   Name                  Price
123  Hammer                2.99
162  Screwdriver           3.49
201  Spanner (was Wrench)  4.25

Indexes
Tree-based structures that improve query performance
CREATE INDEX idx_ProductName
ON Product(Name);

(Figure: a tree index on Product.Name with A-L and M-Z branches pointing to rows in the Product table)
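Views and indexes work much the same way in SQLite, so this module's examples can be reproduced in Python. SQLite has no stored procedures, so the sketch below substitutes a hypothetical Python function, rename_product, that runs the same parameterized UPDATE that RenameProduct would. The table contents are the sample rows shown in this module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Address TEXT, City TEXT);
CREATE TABLE "Order" (OrderNo INTEGER PRIMARY KEY, OrderDate TEXT, Customer INTEGER);
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, Price REAL);
INSERT INTO Customer VALUES (1, '1 Main St.', 'Seattle'), (2, '123 Elm Pl.', 'New York');
INSERT INTO "Order" VALUES (1000, '1/1/2022', 1), (1001, '1/1/2022', 2);
INSERT INTO Product VALUES (123, 'Hammer', 2.99), (162, 'Screwdriver', 3.49), (201, 'Wrench', 4.25);

-- View: a stored query that can be selected from like a table.
CREATE VIEW Deliveries AS
SELECT o.OrderNo, o.OrderDate, c.Address, c.City
FROM "Order" AS o JOIN Customer AS c ON o.Customer = c.ID;

-- Index on Product.Name so lookups by name need not scan every row.
CREATE INDEX idx_ProductName ON Product(Name);
""")


def rename_product(product_id, new_name):
    """Hypothetical stand-in for the RenameProduct stored procedure."""
    with conn:
        conn.execute("UPDATE Product SET Name = ? WHERE ID = ?", (new_name, product_id))


rename_product(201, "Spanner")
print(conn.execute("SELECT * FROM Deliveries ORDER BY OrderNo").fetchall())
print(conn.execute("SELECT * FROM Product WHERE Name = 'Spanner'").fetchall())
```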
Learning Objective: Explore Azure services for relational data
Azure SQL
Family of SQL Server-based cloud database services

SQL Server on Azure VMs (IaaS)
• Guaranteed compatibility with SQL Server on premises
• Customer manages everything: OS upgrades, software upgrades, backups, replication
• Pay for the server VM running costs and software licensing, not per database
• Great for hybrid cloud or migrating complex on-premises database configurations

Azure SQL Managed Instance (PaaS)
• Near 100% compatibility with SQL Server on-premises
• Automatic backups, software patching, database monitoring, and other maintenance tasks
• Use a single instance with multiple databases, or multiple instances in a pool with shared resources
• Great for migrating SQL Server databases to the cloud

Azure SQL Database (PaaS)
• Compatible with core SQL Server database functionality
• Automatic backups, software patching, database monitoring, and other maintenance tasks
• Single database or elastic pool to dynamically share resources across multiple databases
• Great for new, cloud-based applications
Azure Database services for open-source databases
Azure managed (PaaS) solutions for common open-source RDBMSs
Demo • Lab: Provision Azure relational database services
Explore fundamentals of non-relational data in Azure
Learning Objectives
• Fundamentals of Azure Storage
• Fundamentals of Azure Cosmos DB
Learning Objective: Fundamentals of Azure Storage
Azure Blob Storage
Storage for data as binary large objects (BLOBs)
Block blobs
• Large, discrete, binary objects that change infrequently
• Blobs can be up to 4.7 TB, composed of blocks of up to 100 MB
  – A blob can contain up to 50,000 blocks
(Figure: blobs stored in an Azure Storage Account)
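The maximum blob size follows from the two block limits: number of blocks × block size. This quick check assumes "MB" and "TB" are read as binary units (MiB, TiB). Under that assumption, 50,000 blocks of 100 MiB come to roughly 4.77 TiB, consistent with the figure of about 4.7 TB quoted above.

```python
# Maximum block blob size = maximum number of blocks x maximum block size.
MAX_BLOCKS = 50_000
MAX_BLOCK_MIB = 100  # assumption: "100 MB" means 100 MiB (2**20 bytes each)

max_bytes = MAX_BLOCKS * MAX_BLOCK_MIB * 2**20
max_tib = max_bytes / 2**40
print(f"{max_bytes:,} bytes = {max_tib:.2f} TiB")
```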
Azure Table Storage
• Property columns are assigned a data type, and can contain any value of that type
• Rows do not need to include the same property columns

PartitionKey  RowKey  Timestamp   (other properties)
1             123     2022-01-01  A value, Another value
1             124     2022-01-01  This value
2             125     2022-01-01  That value
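Rows in the table above are addressed by PartitionKey and RowKey, and each row carries only the properties it needs. As a rough in-memory model (not the Azure Table Storage SDK), a dict keyed by (PartitionKey, RowKey) captures both ideas. The property names PropertyA and PropertyB are hypothetical because the slide does not show them. The helper functions get_entity and query_partition are also hypothetical.

```python
# Entities keyed by (PartitionKey, RowKey); each entity holds its own set of properties.
# "PropertyA"/"PropertyB" are hypothetical names; the slide does not show them.
table = {
    ("1", "123"): {"Timestamp": "2022-01-01", "PropertyA": "A value", "PropertyB": "Another value"},
    ("1", "124"): {"Timestamp": "2022-01-01", "PropertyA": "This value"},
    ("2", "125"): {"Timestamp": "2022-01-01", "PropertyA": "That value"},
}


def get_entity(partition_key, row_key):
    """Point lookup: the two keys together identify at most one entity."""
    return table.get((partition_key, row_key))


def query_partition(partition_key):
    """Return (RowKey, properties) for every entity in one partition, ordered by RowKey."""
    return [
        (row_key, props)
        for (pk, row_key), props in sorted(table.items(), key=lambda item: item[0])
        if pk == partition_key
    ]


print(get_entity("2", "125"))
print([row_key for row_key, _ in query_partition("1")])
```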
Demo • Lab: Explore Azure Storage
Learning Objective: Fundamentals of Azure Cosmos DB
What is Azure Cosmos DB?
Azure Cosmos DB for Table
• Key-value storage API
• Compatible with Azure Table Storage
PartitionKey  RowKey  Name
1             123     Joe Jones
1             124     Samir Nadoy

Azure Cosmos DB for Apache Cassandra
• Compatibility with Apache Cassandra
id  name       dept      manager
1   Sue Smith  Hardware
2   Ben Chan   Hardware  Sue Smith

Azure Cosmos DB for Apache Gremlin
• Used to work with graph data
• Vertices are connected via relationships (edges)
(Figure: graph with vertices (1) Sue, (2) Ben, and (h) Hardware connected by edges)
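The Cassandra rows and the Gremlin graph describe the same facts: Sue and Ben work in Hardware, and Sue is Ben's manager. This plain-Python sketch (not the Gremlin query language) turns the employee rows into vertices and labeled edges and then follows those edges. The edge labels worksIn and manages, and the helper names neighbors and members_of, are hypothetical.

```python
from collections import defaultdict

# Employee rows as in the Cassandra example: (id, name, dept, manager).
employees = [
    (1, "Sue Smith", "Hardware", None),
    (2, "Ben Chan", "Hardware", "Sue Smith"),
]

# Directed graph: vertex -> list of (edge_label, target_vertex).
edges = defaultdict(list)
for _id, name, dept, manager in employees:
    edges[name].append(("worksIn", dept))
    if manager:
        edges[manager].append(("manages", name))


def neighbors(vertex, label):
    """Follow outgoing edges with the given label from a vertex."""
    return [target for lbl, target in edges.get(vertex, []) if lbl == label]


def members_of(dept):
    """Vertices that have a worksIn edge to the department."""
    return sorted(v for v, out in edges.items() if ("worksIn", dept) in out)


print(neighbors("Sue Smith", "manages"))
print(members_of("Hardware"))
```

In a graph database, the relationships are stored as edges, so a question like "who does Sue manage?" follows an edge instead of matching a manager column.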
Demo • Lab: Explore Azure Cosmos DB
Explore large-scale data analytics
Learning Objectives
• Large-scale data analytics
Learning Objective: Large-scale data analytics
Elements of a large-scale data analytics solution
Data ingestion and processing
• Extract, Transform, and Load (ETL) or Extract, Load, and Transform (ELT) orchestration to move data
• Database mirroring to replicate operational data for analytics
• Distributed processing to cleanse and restructure data at scale
• Batch and real-time data processing
Analytical data store
• Flexible, scalable file storage in a data lake
• Relational tables in a data lakehouse or data warehouse
Analytical data model
• Semantic models for analytical entities
• Often in the form of aggregated cubes that summarize numeric values across one or more dimensions
Data visualization
• Reports
• Charts
• Dashboards
Data processing in large-scale analytics
Relational databases (SQL)
• Well-established model for relational data storage and processing
• Comprehensive SQL language support for querying and data manipulation
Apache Spark
• Open-source platform for scalable, distributed data processing
• Multi-language data processing code (Python, Scala, Java, SQL, …)
Analytical data store architectures
Data warehouse
• Data is stored in a relational database and queried using a SQL query engine
• Tables are denormalized for query optimization
• Typically organized as a star or snowflake schema of numeric facts that can be aggregated by dimensions
Data lakehouse
• Data files are stored in a distributed file system (a data lake) and typically processed using Apache Spark
• Metadata is used to define tables that provide a relational SQL interface to the file data
• Commonly, the Delta Lake format (based on Parquet files) is used to provide transactional database functionality
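A star schema puts numeric facts in a central table that references dimension tables by key. This SQLite sketch uses the Customer, Product, and Sales sample rows from the analytical data modeling example later in this module and renames the Key columns (CustomerKey, ProductKey, SalesKey). It then joins the fact table to its dimensions and aggregates the facts by dimension attributes. The Time dimension is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT, Address TEXT, City TEXT);
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, Name TEXT, Category TEXT);
CREATE TABLE FactSales (SalesKey INTEGER PRIMARY KEY, TimeKey TEXT, ProductKey INTEGER,
                        CustomerKey INTEGER, Quantity INTEGER, Revenue REAL);
INSERT INTO DimCustomer VALUES (1, 'Joe', '1 Main St.', 'Seattle'), (2, 'Samir', '123 Elm Pl.', 'New York');
INSERT INTO DimProduct VALUES (1, 'Hammer', 'Tools'), (2, 'Screwdriver', 'Tools');
INSERT INTO FactSales VALUES (1, '01012022', 1, 1, 1, 2.99),
                             (2, '01012022', 2, 1, 2, 6.98),
                             (3, '02012022', 1, 2, 2, 5.98);
""")

# Join the fact table to its dimensions, then aggregate the numeric facts
# (Quantity, Revenue) by dimension attributes (product and customer name).
rows = conn.execute("""
    SELECT p.Name AS Product, c.Name AS Customer,
           SUM(f.Quantity) AS Quantity, ROUND(SUM(f.Revenue), 2) AS Revenue
    FROM FactSales AS f
    JOIN DimProduct AS p ON f.ProductKey = p.ProductKey
    JOIN DimCustomer AS c ON f.CustomerKey = c.CustomerKey
    GROUP BY p.Name, c.Name
    ORDER BY p.Name, c.Name
""").fetchall()
for row in rows:
    print(row)
```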
PaaS data analytics with Azure Databricks
Azure Databricks
Microsoft Fabric
• Data Factory
• Data Engineering
• Data Warehouse
• Data Science
• Real-Time Intelligence
• Power BI
• Partner & Industry workloads
• Copilot in Fabric
• OneLake
• Microsoft Purview
Demo • Lab: Explore data analytics in Microsoft Fabric
Explore streaming and real-time analytics
Learning Objectives
• Explore streaming and real-time analytics
Learning Objective: Explore streaming and real-time analytics
Batch vs stream processing
Batch processing: data is collected and processed at regular intervals
Stream processing: data is processed in (near) real-time as it arrives
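The difference can be shown with a running total. A batch job produces one result after the interval's data has been collected. A stream processor updates its result as each event arrives. The amounts below are hypothetical purchase values in cents, and integers are used to avoid floating-point rounding noise.

```python
from typing import Iterable, Iterator

# Hypothetical purchase amounts, in cents.
amounts = [299, 698, 598]


def batch_total(collected: Iterable[int]) -> int:
    """Batch: the data for the interval is collected first, then processed in one pass."""
    return sum(collected)


def stream_totals(events: Iterable[int]) -> Iterator[int]:
    """Stream: the result is updated as each event arrives."""
    running = 0
    for amount in events:
        running += amount
        yield running


print(batch_total(amounts))          # one result, available only after collection
print(list(stream_totals(amounts)))  # an up-to-date result after every event
```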
Common elements of stream processing
1. An event generates some data.
2. The generated data is captured in a streaming source for processing.
3. The event data is processed.
4. The results of the stream processing operation are written to an output (or sink).
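The four steps above can be sketched in Python. A generator emits events, a deque stands in for the streaming source (in Azure this might be a Fabric eventstream or Azure Event Hubs), a processor computes an average per tumbling window, and a list acts as the sink. The sensor readings, the 60-second window, and the class name TumblingAverage are all hypothetical. The sketch also assumes events arrive in timestamp order.

```python
from collections import deque

WINDOW_SECONDS = 60  # assumed tumbling-window length


def generate_events():
    """1. Events generate data: hypothetical (timestamp_seconds, reading) pairs, in time order."""
    yield from [(5, 20.1), (42, 20.4), (61, 21.0), (95, 21.3), (130, 22.0)]


class TumblingAverage:
    """3. Processes events one at a time, emitting the average reading per tumbling window."""

    def __init__(self, window_seconds, sink):
        self.window = window_seconds
        self.sink = sink      # 4. results are written to this output
        self.start = None     # start time of the window currently being filled
        self.values = []

    def on_event(self, ts, reading):
        window_start = ts - ts % self.window
        if self.start is not None and window_start != self.start:
            self._emit()      # a new window has begun, so the previous one is complete
        self.start = window_start
        self.values.append(reading)

    def close(self):
        if self.values:
            self._emit()

    def _emit(self):
        self.sink.append((self.start, sum(self.values) / len(self.values)))
        self.values = []


source = deque()  # 2. stand-in for the streaming source
sink = []
processor = TumblingAverage(WINDOW_SECONDS, sink)

for event in generate_events():
    source.append(event)              # event captured in the streaming source
    while source:                     # process whatever has arrived so far
        processor.on_event(*source.popleft())
processor.close()                     # flush the final, still-open window
print(sink)
```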
Real-time analytics in Microsoft Fabric
• Support for continuous data ingestion from multiple sources
• Capture streaming data in an eventstream
• Write real-time data to a table in a Lakehouse or a KQL database
• Query real-time data using SQL or KQL
• Build real-time visualizations
(Figure: an eventstream writing to a Lakehouse table and a KQL Database table)
Data analytics with Apache Spark
Apache Spark is a distributed processing framework for large-scale data analytics. You can use Spark on Microsoft Azure in the following services:
• Microsoft Fabric
• Azure Databricks
Delta Lake
Delta Lake can be used in Spark to define relational tables for both batch and stream processing.
Demo • Lab: Explore real-time analytics in Microsoft Fabric
Explore fundamentals of data visualization
Learning Objectives
• Explore fundamentals of data visualization
Learning Objective: Explore fundamentals of data visualization
Introduction to data visualization with Power BI
Start with Power BI Desktop
• Import data from one or more sources
• Define a data model
• Create visualizations in a report
Publish to Power BI Service
• Schedule data refresh
• Create dashboards and apps
• Share with other users
Interact with published reports
• Web browser
• Power BI phone app
Analytical data modeling
Customer (dimension)
Key  Name   Address      City
1    Joe    1 Main St.   Seattle
2    Samir  123 Elm Pl.  New York

Product (dimension)
Key  Name         Category
1    Hammer       Tools
2    Screwdriver  Tools

Time (dimension)
Key       Year  Month  Day  WeekDay
01012022  2022  Jan    1    Sat
02012022  2022  Jan    2    Sun

Sales (fact)
Key  TimeKey   ProductKey  CustomerKey  Quantity  Revenue
1    01012022  1           1            1         2.99
2    01012022  2           1            2         6.98
3    02012022  1           2            2         5.98

(Figure: a cube with Product, Customer (Joe, Samir, Alice), and Time (Jan, Feb, Mar, Apr, May, …) dimensions; one cell (∑) holds the total revenue for wrenches sold to Samir in January)

Measures
The model aggregates measures at each hierarchy level (Year > Month > Day):
Year  Month  Day  Revenue
2022              8221.48
      Jan         574.86
             1    9.97
             2    5.98
…     …      …    …
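Aggregating a measure at each hierarchy level means adding every fact row into its day, its month, and its year. This sketch applies that rollup to the three Sales rows shown above. The day totals (9.97 and 5.98) match the slide. The slide's 2022 and Jan totals also include rows elided with "…", so here those levels only reach 15.95. The function name rollup is hypothetical.

```python
from collections import defaultdict

# Time dimension from the slide: TimeKey -> (Year, Month, Day).
time_dim = {"01012022": (2022, "Jan", 1), "02012022": (2022, "Jan", 2)}

# Sales fact rows from the slide: (TimeKey, ProductKey, CustomerKey, Quantity, Revenue).
sales = [
    ("01012022", 1, 1, 1, 2.99),
    ("01012022", 2, 1, 2, 6.98),
    ("02012022", 1, 2, 2, 5.98),
]


def rollup(sales, time_dim):
    """Sum Revenue at every level of the Year > Month > Day hierarchy."""
    totals = defaultdict(float)
    for time_key, _product, _customer, _qty, revenue in sales:
        year, month, day = time_dim[time_key]
        for level in [(year,), (year, month), (year, month, day)]:
            totals[level] += revenue
    return {level: round(total, 2) for level, total in totals.items()}


totals = rollup(sales, time_dim)
print(totals[(2022, "Jan", 1)], totals[(2022, "Jan", 2)], totals[(2022, "Jan")])
```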
Common data visualizations in reports
• Tables and text
• Bar or column chart
• Line chart