DP 900

The document provides an overview of Microsoft Azure's data fundamentals, covering core data concepts, data storage methods, and roles related to data management. It explains structured, semi-structured, and unstructured data, as well as operational and analytical data workloads, including the use of SQL and Azure services for relational and non-relational data. Additionally, it outlines various Azure database services and their functionalities for managing data in the cloud.

Uploaded by

krishnaswamy1022
© Copyright Microsoft Corporation. All rights reserved.

FOR USE ONLY AS PART OF MICROSOFT VIRTUAL TRAINING DAYS PROGRAM. THESE MATERIALS ARE NOT AUTHORIZED
FOR DISTRIBUTION, REPRODUCTION OR OTHER USE BY NON-MICROSOFT PARTIES.

Classified as Microsoft Confidential


Microsoft Azure Virtual Training Day: Data Fundamentals
Explore fundamentals of data
Learning Objectives
• Core data concepts
• Data roles and services
Learning Objective: Core data concepts
What is data?
Values used to record information – often representing entities that have one or more attributes
Structured, semi-structured, unstructured

Structured example:

Customer
ID  FirstName  LastName  Email                Address
1   Joe        Jones     joe@litware.com      1 Main St.
2   Samir      Nadoy     samir@northwind.com  123 Elm Pl.

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25

Semi-structured example:

{
  "firstName": "Joe",
  "lastName": "Jones",
  "address":
  {
    "streetAddress": "1 Main St.",
    "city": "New York",
    "state": "NY",
    "postalCode": "10099"
  },
  "contact":
  [
    {
      "type": "home",
      "number": "555 123-1234"
    },
    {
      "type": "email",
      "address": "joe@litware.com"
    }
  ]
}

{
  "firstName": "Samir",
  "lastName": "Nadoy",
  "address":
  {
    "streetAddress": "123 Elm Pl.",
    "unit": "500",
    "city": "Seattle",
    "state": "WA",
    "postalCode": "98999"
  },
  "contact":
  [
    {
      "type": "email",
      "address": "samir@northwind.com"
    }
  ]
}
How is data stored?
Files

Delimited Text
FirstName,LastName,Email
Joe,Jones,joe@litware.com
Samir,Nadoy,samir@northwind.com

JavaScript Object Notation (JSON)
{
  "customers":
  [
    { "firstName": "Joe", "lastName": "Jones"},
    { "firstName": "Samir", "lastName": "Nadoy"}
  ]
}

Extensible Markup Language (XML)
<Customer firstName="Joe" lastName="Jones"/>

Binary Large Object (BLOB)
10110101101010110010...

Optimized formats: Avro, ORC, Parquet

Databases

Relational

Customer
ID  Email                Address
1   joe@litware.com      1 Main St.
2   samir@northwind.com  123 Elm Pl.

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25

Order
OrderNo  OrderDate  Customer
1000     1/1/2022   1
1001     1/1/2022   2

LineItem
OrderNo  ItemNo  ProductID  Quantity
1000     1       123        1
1000     2       201        2
1001     1       123        2

Non-relational

Key-value
Key  Value
123  "Hammer ($2.99)"
162  "Screwdriver ($3.49)"
201  "Wrench ($4.25)"

Document
Key  Document
1    { "Name": "Joe Jones" }
2    { "Name": "Samir Nadoy" }

Column Family (Orders)
Key   Customer: Name  Customer: Address  Product: Name  Product: Price
1000  Joe Jones       1 Main St.         Hammer         2.99
1001  Samir Nadoy     123 Elm Pl.        Wrench         4.25

Graph
Nodes Sue, Ben, and Hardware, linked by "reports to" and "works in" relationships
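As a rough illustration of how the same records move between two of the file formats above, the following sketch parses the delimited text sample and emits the JSON shape shown on the slide. The function name `delimited_to_json` is made up for this example, and the data is held in a string rather than a real file.

```python
import csv
import io
import json

# The delimited text sample from the slide, held in memory instead of a file.
DELIMITED_TEXT = """FirstName,LastName,Email
Joe,Jones,joe@litware.com
Samir,Nadoy,samir@northwind.com
"""


def delimited_to_json(text: str) -> str:
    """Parse comma-delimited text and return a JSON document of customers."""
    rows = csv.DictReader(io.StringIO(text))  # first line supplies the field names
    customers = [
        {"firstName": row["FirstName"], "lastName": row["LastName"]}
        for row in rows
    ]
    return json.dumps({"customers": customers}, indent=2)


customers_json = delimited_to_json(DELIMITED_TEXT)
print(customers_json)
```

The delimited file carries its schema only in the header row, while the JSON document names every field on every record, which is what lets semi-structured records differ from one another.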
Operational data workloads
Data is stored in a database that is optimized for online transactional processing
(OLTP) operations that support applications
A mix of read and write activity
For example:
• Read the Product table to display a catalog
• Write to the Order table to record a purchase

Data is stored using transactions

Transactions are "ACID" based:


Atomicity – each transaction is treated as a single unit of work, which succeeds completely or fails completely
Consistency – transactions can only take the data in the database from one valid state to another
Isolation – concurrent transactions cannot interfere with one another
Durability – when a transaction has succeeded, the data changes are persisted in the database
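A minimal sketch of atomicity, using Python's built-in sqlite3 module as a stand-in for an OLTP database. The `Stock` column, its `CHECK` constraint, and the `record_purchase` helper are assumptions added for illustration; they do not appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stock and its CHECK constraint are hypothetical, added so a write can fail.
conn.execute(
    "CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, "
    "Stock INTEGER NOT NULL CHECK (Stock >= 0))"
)
conn.execute(
    'CREATE TABLE "Order" (OrderNo INTEGER PRIMARY KEY, '
    "ProductID INTEGER NOT NULL, Quantity INTEGER NOT NULL)"
)
conn.execute("INSERT INTO Product VALUES (123, 'Hammer', 1)")
conn.commit()


def record_purchase(order_no: int, product_id: int, quantity: int) -> bool:
    """Insert the order and reduce stock as a single unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                'INSERT INTO "Order" VALUES (?, ?, ?)',
                (order_no, product_id, quantity),
            )
            conn.execute(
                "UPDATE Product SET Stock = Stock - ? WHERE ID = ?",
                (quantity, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = record_purchase(1000, 123, 1)   # succeeds, stock drops to 0
second = record_purchase(1001, 123, 1)  # stock would go negative, so it fails
order_count = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
print(first, second, order_count)
```

The second purchase fails on the stock update, and the order row inserted just before it is rolled back with it, so the database never holds an order without the matching stock change.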
Analytical data workloads
1. Operational data is extracted, transformed, and loaded (ETL) into a data lake for analysis
2. Data is loaded into a schema of tables, typically in a Spark-based data lakehouse with tabular abstractions over files in the data lake, or a data warehouse with a fully relational SQL engine
3. Data in tables may be aggregated and loaded into an online analytical processing (OLAP) model, or cube
4. The files in the data lake, relational tables, and analytical model can be queried to produce reports and dashboards
Learning Objective: Data roles and services
Data professional roles

Database Administrator
• Database provisioning, configuration and management
• Database security and user access
• Database backups and resiliency
• Database performance monitoring and optimization

Data Engineer
• Data integration pipelines and ETL processes
• Data cleansing and transformation
• Analytical data store schemas and data loads

Data Analyst
• Analytical modeling
• Data reporting and summarization
• Data visualization
Microsoft cloud services for data
Operational Data Workloads

Azure SQL
• Family of SQL Server based relational database services

Open-source databases in Azure
• MariaDB, MySQL, PostgreSQL

Azure Cosmos DB
• Highly scalable non-relational and vector database

Azure Storage
• File, blob, and table storage
• Hierarchical namespace for data lake storage

Analytical Data Workloads

Software-as-a-Service (SaaS): Microsoft Fabric
Integrated, end-to-end analytics:
• Data ingestion and ETL
• Data lakehouse
• Data warehouse
• Data science and ML
• Real-time analytics
• Data visualization
• Data governance and management

Platform-as-a-Service (PaaS): Azure Databricks
• Apache Spark lakehouse analytics and data processing
• others…

Microsoft Purview
Solution for enterprise-wide data governance and discoverability:
• Create a map of your data and track data lineage across multiple data sources.
• Enforce data governance across the enterprise and ensure the integrity of data.
Explore fundamentals of relational data in Azure
Learning Objectives
• Explore relational data concepts
• Explore Azure services for relational data
Learning Objective: Explore relational data concepts
Relational tables

• Data is stored in tables
• Tables consist of rows and columns
• All rows have the same columns
• Each column is assigned a datatype

Customer
ID  FirstName  Middle  LastName  Email                Address      City
1   Joe        David   Jones     joe@litware.com      1 Main St.   Seattle
2   Samir              Nadoy     samir@northwind.com  123 Elm Pl.  New York

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25

Order
OrderNo  OrderDate  Customer
1000     1/1/2022   1
1001     1/1/2022   2

LineItem
OrderNo  ItemNo  ProductID  Quantity
1000     1       123        1
1000     2       201        2
1001     1       123        2



Normalization

• Separate each entity into its own table
• Separate each discrete attribute into its own column
• Uniquely identify each entity instance (row) using a primary key
• Use foreign key columns to link related entities

Sales Data (before normalization)
OrderNo  OrderDate  Customer                            Product              Quantity
1000     1/1/2022   Joe Jones, 1 Main St, Seattle       Hammer ($2.99)       1
1000     1/1/2022   Joe Jones, 1 Main St, Seattle       Screwdriver ($3.49)  2
1001     1/1/2022   Samir Nadoy, 123 Elm Pl, New York   Hammer ($2.99)       2
…        …          …                                   …                    …

Customer
ID  FirstName  LastName  Address      City
1   Joe        Jones     1 Main St.   Seattle
2   Samir      Nadoy     123 Elm Pl.  New York

Order
OrderNo  OrderDate  Customer
1000     1/1/2022   1
1001     1/1/2022   2

LineItem
OrderNo  ItemNo  ProductID  Quantity
1000     1       123        1
1000     2       201        2
1001     1       123        2

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25
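The normalized tables above can be sketched in SQLite (via Python's sqlite3 module) to show primary and foreign keys at work. This is an illustration under assumptions: the column types and the ISO date strings are choices made for the example, and SQLite stands in for whichever relational engine is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Customer (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
                       Address TEXT, City TEXT);
CREATE TABLE Product  (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE "Order"  (OrderNo INTEGER PRIMARY KEY, OrderDate TEXT,
                       Customer INTEGER REFERENCES Customer(ID));
CREATE TABLE LineItem (OrderNo INTEGER REFERENCES "Order"(OrderNo),
                       ItemNo INTEGER,
                       ProductID INTEGER REFERENCES Product(ID),
                       Quantity INTEGER,
                       PRIMARY KEY (OrderNo, ItemNo));

INSERT INTO Customer VALUES (1, 'Joe', 'Jones', '1 Main St.', 'Seattle'),
                            (2, 'Samir', 'Nadoy', '123 Elm Pl.', 'New York');
INSERT INTO Product  VALUES (123, 'Hammer', 2.99), (162, 'Screwdriver', 3.49),
                            (201, 'Wrench', 4.25);
INSERT INTO "Order"  VALUES (1000, '2022-01-01', 1), (1001, '2022-01-01', 2);
INSERT INTO LineItem VALUES (1000, 1, 123, 1), (1000, 2, 201, 2), (1001, 1, 123, 2);
""")

# Follow the foreign keys to rebuild a readable sales listing.
sales = conn.execute("""
    SELECT o.OrderNo, c.FirstName || ' ' || c.LastName, p.Name, li.Quantity
    FROM LineItem AS li
    JOIN "Order"  AS o ON li.OrderNo = o.OrderNo
    JOIN Customer AS c ON o.Customer = c.ID
    JOIN Product  AS p ON li.ProductID = p.ID
    ORDER BY li.OrderNo, li.ItemNo
""").fetchall()

# A line item that points at a missing product is rejected by the foreign key.
try:
    conn.execute("INSERT INTO LineItem VALUES (1001, 2, 999, 1)")
    fk_rejected = False
except sqlite3.IntegrityError:
    conn.rollback()
    fk_rejected = True

print(sales, fk_rejected)
```

Each customer and product is stored once and referenced by key, so a change of address or price is made in a single row.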
Structured Query Language (SQL)
• SQL is a standard language for use with relational databases
• Standards are maintained by ANSI and ISO
• Most RDBMS systems support proprietary extensions of standard SQL

Data Definition Language (DDL)
CREATE, ALTER, DROP, RENAME

CREATE TABLE Product
(
    ProductID INT PRIMARY KEY,
    Name VARCHAR(20) NOT NULL,
    Price DECIMAL NULL
);

Product (empty after creation)
ID  Name  Price

Data Control Language (DCL)
GRANT, DENY, REVOKE

GRANT SELECT, INSERT, UPDATE
ON Product
TO user1;

Data Manipulation Language (DML)
INSERT, UPDATE, DELETE, SELECT

SELECT Name, Price
FROM Product
WHERE Price > 2.50
ORDER BY Price;

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25

Results
Name         Price
Hammer       2.99
Screwdriver  3.49
Wrench       4.25
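To try the DDL and DML statements above without provisioning a server, one option is Python's built-in sqlite3 module. This is only a sketch: SQLite has no DCL, so the GRANT statement is left out, and the `Nail` row is a made-up addition so that the WHERE clause has something to filter out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the CREATE TABLE statement from the slide.
conn.execute("""
    CREATE TABLE Product
    (
        ProductID INT PRIMARY KEY,
        Name VARCHAR(20) NOT NULL,
        Price DECIMAL NULL
    )
""")

# DML: insert the slide's rows, plus one hypothetical cheap item.
conn.executemany(
    "INSERT INTO Product (ProductID, Name, Price) VALUES (?, ?, ?)",
    [
        (123, "Hammer", 2.99),
        (162, "Screwdriver", 3.49),
        (201, "Wrench", 4.25),
        (300, "Nail", 0.05),  # hypothetical row, not in the slides
    ],
)

# DML: the SELECT statement from the slide.
results = conn.execute(
    "SELECT Name, Price FROM Product WHERE Price > 2.50 ORDER BY Price"
).fetchall()

for name, price in results:
    print(name, price)
```

The three products from the slide come back in price order, and the hypothetical item priced below 2.50 is excluded.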
Other common database objects

Views
Pre-defined SQL queries that behave as virtual tables

CREATE VIEW Deliveries
AS
SELECT o.OrderNo, o.OrderDate,
       c.Address, c.City
FROM [Order] AS o JOIN Customer AS c  -- Order is a reserved word, so it is delimited
ON o.Customer = c.ID;

Deliveries
OrderNo  OrderDate  Address      City
1000     1/1/2022   1 Main St.   Seattle
1001     1/1/2022   123 Elm Pl.  New York

Stored Procedures
Pre-defined SQL statements that can include parameters

CREATE PROCEDURE RenameProduct
    @ProductID INT,
    @NewName VARCHAR(20)
AS
UPDATE Product
SET Name = @NewName
WHERE ID = @ProductID;
...
EXEC RenameProduct 201, 'Spanner';

Product (after the procedure runs)
ID   Name                         Price
123  Hammer                       2.99
162  Screwdriver                  3.49
201  Spanner (previously Wrench)  4.25

Indexes
Tree-based structures that improve query performance

CREATE INDEX idx_ProductName
ON Product(Name);

Figure: an index tree with A-L and M-Z branches pointing to rows in the Product table.
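A sketch of the view and index above, again using SQLite through Python's sqlite3 module as a stand-in engine. SQLite does not support stored procedures, so `rename_product` below is an ordinary Python function playing that role; it and the column types are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Address TEXT, City TEXT);
CREATE TABLE "Order"  (OrderNo INTEGER PRIMARY KEY, OrderDate TEXT, Customer INTEGER);
CREATE TABLE Product  (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);

INSERT INTO Customer VALUES (1, '1 Main St.', 'Seattle'), (2, '123 Elm Pl.', 'New York');
INSERT INTO "Order"  VALUES (1000, '1/1/2022', 1), (1001, '1/1/2022', 2);
INSERT INTO Product  VALUES (123, 'Hammer', 2.99), (162, 'Screwdriver', 3.49),
                            (201, 'Wrench', 4.25);

-- View: a stored query that can be selected from like a table.
CREATE VIEW Deliveries AS
SELECT o.OrderNo, o.OrderDate, c.Address, c.City
FROM "Order" AS o JOIN Customer AS c ON o.Customer = c.ID;

-- Index: lets lookups by Name avoid scanning the whole table.
CREATE INDEX idx_ProductName ON Product(Name);
""")


def rename_product(product_id: int, new_name: str) -> None:
    """Python stand-in for the RenameProduct stored procedure."""
    with conn:
        conn.execute("UPDATE Product SET Name = ? WHERE ID = ?", (new_name, product_id))


deliveries = conn.execute("SELECT * FROM Deliveries ORDER BY OrderNo").fetchall()

# Ask SQLite how it would run a lookup by Name; the last field describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ID FROM Product WHERE Name = 'Wrench'"
).fetchall()
uses_index = any("idx_ProductName" in row[-1] for row in plan)

rename_product(201, "Spanner")
new_name = conn.execute("SELECT Name FROM Product WHERE ID = 201").fetchone()[0]
print(deliveries, uses_index, new_name)
```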
Learning Objective: Explore Azure services for relational data
Azure SQL
Family of SQL Server based cloud database services

SQL Server on Azure VMs (IaaS)
• Guaranteed compatibility with SQL Server on-premises
• Customer manages everything: OS upgrades, software upgrades, backups, replication
• Pay for the server VM running costs and software licensing, not per database
• Great for hybrid cloud or migrating complex on-premises database configurations

Azure SQL Managed Instance (PaaS)
• Near 100% compatibility with SQL Server on-premises
• Automatic backups, software patching, database monitoring, and other maintenance tasks
• Use a single instance with multiple databases, or multiple instances in a pool with shared resources
• Great for migrating SQL Server databases to the cloud

Azure SQL Database (PaaS)
• Core database functionality compatibility with SQL Server
• Automatic backups, software patching, database monitoring, and other maintenance tasks
• Single database or elastic pool to dynamically share resources across multiple databases
• Great for new, cloud-based applications
Azure Database services for open-source
Azure managed solutions (PaaS) for common open-source RDBMSs

Azure Database for PostgreSQL
• Database service in the Microsoft cloud based on the PostgreSQL Community Edition database engine
• Hybrid relational and object storage

Azure Database for MySQL
• PaaS implementation of MySQL in the Azure cloud, based on the MySQL Community Edition
• Commonly used in Linux, Apache, MySQL, PHP (LAMP) application architectures

Azure Database for MariaDB
• An implementation of the MariaDB Community Edition database management system adapted to run in Azure
Demo • Lab: Provision Azure relational database services
Explore fundamentals of non-relational data in Azure
Learning Objectives
• Fundamentals of Azure Storage
• Fundamentals of Azure Cosmos DB
Learning Objective: Fundamentals of Azure Storage
Azure Blob Storage
Storage for data as binary large objects (BLOBs)

Block blobs
• Large, discrete, binary objects that change infrequently
• Blobs can be up to 4.7 TB, composed of blocks of up to 100 MB
  – A blob can contain up to 50,000 blocks

Page blobs
• Used as virtual disk storage for VMs
• Blobs can be up to 8 TB, composed of fixed-size 512-byte pages

Append blobs
• Block blobs that are used to optimize append operations
• Maximum size just over 195 GB; each block can be up to 4 MB

Per-blob storage tiers
• Hot: Highest cost, lowest latency
• Cool: Lower cost, higher latency
• Archive: Lowest cost, highest latency

Figure: an Azure Storage Account containing a Blob Container, which holds blob1 and folder1/blob2.

Blobs can be organized in virtual directories, but each path is considered a single blob in a flat namespace, so folder-level operations are not supported
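The size limits quoted above follow from block count multiplied by block size. The sketch below works through that arithmetic, assuming the figures are in binary units (MiB, GiB, TiB) and that append blobs share the 50,000-block limit stated for block blobs; both are assumptions made to reproduce the numbers in the slide.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000  # block limit stated in the slide for block blobs


def max_blob_bytes(block_size_mib: int, max_blocks: int = MAX_BLOCKS) -> int:
    """Largest blob that can be built from max_blocks blocks of the given size."""
    return block_size_mib * MIB * max_blocks


# Block blob: 50,000 blocks of 100 MiB each.
block_blob_tib = max_blob_bytes(100) / TIB

# Append blob: 50,000 blocks of 4 MiB each (block limit assumed to be the same).
append_blob_gib = max_blob_bytes(4) / GIB

print(f"Block blob maximum:  {block_blob_tib:.2f} TiB")
print(f"Append blob maximum: {append_blob_gib:.2f} GiB")
```

Under these assumptions the block blob limit works out to about 4.77 TiB and the append blob limit to about 195.31 GiB, matching the "4.7 TB" and "just over 195 GB" figures.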
Azure Data Lake Storage Gen2

Distributed file system built on Blob Storage
• Combines Azure Data Lake Store Gen 1 with Azure Blob Storage for large-scale file storage and analytics
• Enables file and directory level access control and management
• Compatible with common large scale analytical systems

Enabled in an Azure Storage account through the Hierarchical Namespace option
• Set during account creation
• Upgrade existing storage account
  – One-way upgrade process

Figure: an Azure Storage Account containing a Blob Container with Hierarchical Namespace enabled, which holds a Directory with File1 and File2.

File system includes directories and files, and is compatible with large scale data analytics systems like Databricks
Azure Files

File shares in the cloud that can be accessed from anywhere with an internet connection
• Support for common file sharing protocols:
  – Server Message Block (SMB)
  – Network File System (NFS), which requires the premium tier
• Data is replicated for redundancy and encrypted at rest

Figure: an Azure Storage Account containing an Azure Files share.
Azure Table Storage

Key-Value storage for application data
• Tables consist of key and value columns
  – Partition and row keys
  – Custom property columns for data values
• A Timestamp column is added automatically to log data changes
• Rows are grouped into partitions to improve performance
• Property columns are assigned a data type, and can contain any value of that type
• Rows do not need to include the same property columns

Figure: an Azure Storage Account containing Tables.

PartitionKey  RowKey  Timestamp   Property1   Property2
1             123     2022-01-01  A value     Another value
1             124     2022-01-01  This value
2             125     2022-01-01  That value
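The key-value model above can be sketched with plain Python dictionaries. This is a toy in-memory model, not the Azure SDK: the class `ToyTable` and its methods are hypothetical, and which property column holds "This value" and "That value" is an assumption, since the slide layout does not make it clear.

```python
from collections import defaultdict
from datetime import datetime, timezone


class ToyTable:
    """In-memory stand-in for a table keyed by PartitionKey and RowKey."""

    def __init__(self) -> None:
        # partition key -> {row key -> entity}
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key: str, row_key: str, **properties) -> dict:
        """Insert or replace an entity; a Timestamp is added automatically."""
        entity = {
            "PartitionKey": partition_key,
            "RowKey": row_key,
            "Timestamp": datetime.now(timezone.utc),
            **properties,
        }
        self._partitions[partition_key][row_key] = entity
        return entity

    def get(self, partition_key: str, row_key: str) -> dict:
        """Point lookup using both keys."""
        return self._partitions[partition_key][row_key]

    def query_partition(self, partition_key: str) -> list:
        """Return every entity in one partition, ordered by row key."""
        rows = self._partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)]


table = ToyTable()
table.upsert("1", "123", Property1="A value", Property2="Another value")
table.upsert("1", "124", Property1="This value")  # no Property2 on this row
table.upsert("2", "125", Property2="That value")  # no Property1 on this row

partition_one_keys = [e["RowKey"] for e in table.query_partition("1")]
print(partition_one_keys)
```

Grouping entities by partition key keeps related rows together, and each entity carries only the properties it needs.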
Demo • Lab: Explore Azure Storage
Learning Objective: Fundamentals of Azure Cosmos DB
What is Azure Cosmos DB?

A fully managed, NoSQL and vector database for modern applications
• Support for multiple APIs for application development
• Real time access with fast read and write performance
• Enable multi-region writes to replicate data globally, enabling users in specified regions to work with a local replica

Figure: supported data models are Documents, Graphs, Vectors, Key-Value Tables, and Column Family Stores.
Azure Cosmos DB APIs

Azure Cosmos DB for NoSQL
• Native API for Cosmos DB

SELECT *
FROM customers c
WHERE c.id = "joe@litware.com"

{
  "id": "joe@litware.com",
  "name": "Joe Jones",
  "address": {
    "street": "1 Main St.",
    "city": "Seattle"
  }
}

Azure Cosmos DB for MongoDB
• Compatibility with MongoDB

db.products.find({ id: 123})

{
  "id": 123,
  "name": "Hammer",
  "price": 2.99
}

Azure Cosmos DB for PostgreSQL
• Compatibility with PostgreSQL

id  name       dept      manager
1   Sue Smith  Hardware  Joe Jones
2   Ben Chan   Hardware  Sue Smith

Azure Cosmos DB for Table
• Key-value storage API
• Compatible with Azure Table Storage

PartitionKey  RowKey  Name
1             123     Joe Jones
1             124     Samir Nadoy

Azure Cosmos DB for Apache Cassandra
• Compatibility with Apache Cassandra

id  name       dept      manager
1   Sue Smith  Hardware
2   Ben Chan   Hardware  Sue Smith

Azure Cosmos DB for Apache Gremlin
• Used to work with graph data
• Vertices are connected via relationships (edges)

Figure: a graph with vertices (1) Sue, (2) Ben, and (h) Hardware.
Demo • Lab: Explore Azure Cosmos DB
Explore large-scale data analytics
Learning Objectives • Large-scale data analytics
Learning Objective: Large-scale data analytics
Elements of a large-scale data analytics solution

Data ingestion and processing
• Extract, Transform, and Load (ETL) or (ELT) orchestration to move data
• Database mirroring to replicate operational data for analytics
• Distributed processing to cleanse and restructure data at scale
• Batch and real-time data processing

Analytical data store
• Flexible, scalable file storage in a data lake
• Relational tables in a data lakehouse or data warehouse

Analytical data model
• Semantic models for analytical entities
• Often in the form of aggregated cubes that summarize numeric values across one or more dimensions

Data visualization
• Reports
• Charts
• Dashboards
Data processing in large-scale analytics

Relational Database
• Well established model for relational data storage and processing
• Comprehensive SQL language support for querying and data manipulation

Apache Spark
• Open-Source platform for scalable, distributed data processing
• Multi-language data processing code (Python, Scala, Java, SQL, …)
Analytical data store architectures

Data Warehouse
• Data is stored in a relational database and queried using a SQL query engine
• Tables are denormalized for query optimization
• Typically as a star or snowflake schema of numeric facts that can be aggregated by dimensions

Data Lakehouse
• Data files are stored in a distributed file system (a data lake) and typically processed using Apache Spark
• Metadata is used to define tables that provide a relational SQL interface to the file data
• Commonly, a Delta Parquet format is used to provide transactional database functionality
PaaS data analytics with Azure Databricks

Azure Databricks
• Azure-based implementation of Databricks cloud analytics platform
• Scalable Spark and SQL querying for data lake analytics
• Interactive experience in Azure Databricks workspace
• Use Azure Data Factory to implement data ingestion and processing pipelines

Use to leverage Databricks skills and for cloud portability

SaaS data analytics with Microsoft Fabric

Microsoft Fabric
• Workloads: Data Factory, Data Engineering, Data Warehouse, Data Science, Real-Time Intelligence, Power BI, Partner & Industry workloads
• Copilot in Fabric
• OneLake
• Microsoft Purview
Demo • Lab: Explore data analytics in Microsoft Fabric
Explore streaming and real-time analytics
Learning Objectives
• Explore streaming and real-time analytics
Learning Objective: Explore streaming and real-time analytics
Batch vs stream processing

Batch processing
• Data is collected and processed at regular intervals

Stream processing
• Data is processed in (near) real-time as it arrives
Common elements of stream processing
1. An event generates some data.
2. The generated data is captured in a streaming source for processing.
3. The event data is processed.
4. The results of the stream processing operation are written to an output (or sink).
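The four elements above can be sketched as a tiny in-process pipeline. Everything here is illustrative: the sensor readings, the 30-degree threshold, and the function names are made up, and a `deque` and a list stand in for a real streaming source and sink. A real stream is unbounded, whereas this simulation ends after four events.

```python
from collections import deque


def generate_events():
    """Step 1: events generate some data (hypothetical sensor readings)."""
    for reading in [21.5, 22.0, 35.2, 22.4]:
        yield {"sensor": "s1", "temperature": reading}


def capture(events, source: deque) -> None:
    """Step 2: the generated data is captured in a streaming source."""
    for event in events:
        source.append(event)


def process(source: deque, threshold: float = 30.0):
    """Step 3: each event is processed as it is read from the source."""
    while source:
        event = source.popleft()
        yield {**event, "alert": event["temperature"] > threshold}


def write(results, sink: list) -> None:
    """Step 4: the results are written to an output (sink)."""
    sink.extend(results)


source = deque()
sink = []
capture(generate_events(), source)
write(process(source), sink)

alerts = [result for result in sink if result["alert"]]
print(len(sink), alerts)
```

Because `process` is a generator, each event is handled one at a time as it is pulled from the source, rather than after the whole set has been collected.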
Real-time analytics in Microsoft Fabric
• Support for continuous data ingestion from multiple sources
• Capture streaming data in an eventstream
• Write real-time data to a table in a Lakehouse or a KQL database
• Query real-time data using SQL or KQL
• Build real-time visualizations

Figure: an Eventstream writing to a table in a Lakehouse and to a table in a KQL Database.
Data analytics with Apache Spark
Apache Spark is a distributed processing framework for large scale data analytics. You can use
Spark on Microsoft Azure in the following services:
• Microsoft Fabric
• Azure Databricks

Spark Structured Streaming
The Spark Structured Streaming library provides an application programming interface (API) for ingesting, processing, and outputting results from perpetual streams of data.

Delta Lake
Delta Lake can be used in Spark to define relational tables for both batch and stream processing.
Demo • Lab: Explore real-time analytics in Microsoft Fabric
Explore fundamentals of data visualization
Learning Objectives
• Explore fundamentals of data visualization
Learning Objective: Explore fundamentals of data visualization
Introduction to data visualization with Power BI

Start with Power BI Desktop
• Import data from one or more sources
• Define a data model
• Create visualizations in a report

Publish to Power BI Service
• Schedule data refresh
• Create dashboards and apps
• Share with other users

Interact with published reports
• Web browser
• Power BI phone app
Analytical data modeling
Customer (dimension)
Key  Name   Address      City
1    Joe    1 Main St.   Seattle
2    Samir  123 Elm Pl.  New York
3    Alice  2 High St.   Seattle

Product (dimension)
Key  Name         Category
1    Hammer       Tools
2    Screwdriver  Tools
3    Wrench       Tools
4    Bolts        Hardware

Time (dimension)
Key       Year  Month  Day  WeekDay
01012022  2022  Jan    1    Sat
02012022  2022  Jan    2    Sun

Sales (fact)
Key  TimeKey   ProductKey  CustomerKey  Quantity  Revenue
1    01012022  1           1            1         2.99
2    01012022  2           1            2         6.98
3    02012022  1           2            2         5.98

Figure: a cube with axes Product (Hammer, Wrench, Screwdriver), Customer (Joe, Samir, Alice), and Time (Jan, Feb, Mar, Apr, May, …). One highlighted cell (∑) holds the total revenue for wrenches sold to Samir in January.

Measures and hierarchy
Model aggregates measures at each hierarchy level

Year  Month  Day  Revenue
2022              8221.48
      Jan         574.86
             1    9.97
             2    5.98
             …    …
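To show how a model aggregates a measure at each hierarchy level, the sketch below sums Revenue from the three Sales fact rows by looking up each row's TimeKey in the Time dimension. The function `revenue_by` is a hypothetical helper; only the rows visible in the slide are used, so the daily totals match the slide (9.97 and 5.98) while the month and year totals in the slide evidently include fact rows that are not shown.

```python
from collections import defaultdict
from decimal import Decimal

# Time dimension rows from the slide, keyed by TimeKey.
time_dim = {
    "01012022": {"Year": 2022, "Month": "Jan", "Day": 1},
    "02012022": {"Year": 2022, "Month": "Jan", "Day": 2},
}

# Sales fact rows from the slide (Decimal avoids float rounding in money sums).
sales_fact = [
    {"TimeKey": "01012022", "ProductKey": 1, "CustomerKey": 1,
     "Quantity": 1, "Revenue": Decimal("2.99")},
    {"TimeKey": "01012022", "ProductKey": 2, "CustomerKey": 1,
     "Quantity": 2, "Revenue": Decimal("6.98")},
    {"TimeKey": "02012022", "ProductKey": 1, "CustomerKey": 2,
     "Quantity": 2, "Revenue": Decimal("5.98")},
]


def revenue_by(levels: tuple) -> dict:
    """Sum Revenue grouped by the given levels of the time hierarchy."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for sale in sales_fact:
        time_row = time_dim[sale["TimeKey"]]
        group = tuple(time_row[level] for level in levels)
        totals[group] += sale["Revenue"]
    return dict(totals)


by_day = revenue_by(("Year", "Month", "Day"))
by_month = revenue_by(("Year", "Month"))

for group, total in by_day.items():
    print(group, total)
```

The same function answers questions at any level of the hierarchy by changing which dimension attributes are used for grouping.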
Common data visualizations in reports
• Tables and text
• Bar or column chart
• Line chart
• Pie chart
• Scatter plot
• Map
Demo • Lab: Visualize data with Power BI
