
Microsoft AZURE

Data Fundamentals
DP-900
Wajih Khelifi
DDS- BSC - MSc
Microsoft Azure Data Fundamentals
DP-900
Module 1: Explore core data concepts
• Explore core data concepts
• Explore data roles and services
Module 2: Explore relational data
• Explore fundamental relational data concepts
• Explore relational database services in Azure
Module 3: Explore non-relational data
• Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Module 4: Explore data analytics in Azure
• Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
• Explore fundamentals of data visualization
Module 1: Explore core data concepts
1- Explore Core Data Concepts
• Introduction
• Identify data formats
• Explore file storage
• Explore databases
• Explore transactional data processing
• Explore analytical data processing
• Knowledge check
• Summary
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Introduction
The amount of data generated by systems, applications, and devices has grown significantly, with data
now available in various structures and formats.
Collecting and storing data has become easier and cheaper, making it a valuable asset for businesses to
gain insights and drive critical decisions.
In this module we will learn how to:
• Identify common data formats
• Describe options for storing data in files
• Describe options for storing data in databases
• Describe characteristics of transactional data processing solutions
• Describe characteristics of analytical data processing solutions
Module 1: Explore core data concepts
1- Explore Core Data Concepts
• Introduction
• Identify data formats
• Explore file storage
• Explore databases
• Explore transactional data processing
• Explore analytical data processing
• Knowledge check
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Identify data formats

Data is classified as
• structured
• semi-structured
• unstructured
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Identify data formats
Structured Data:
Data that adheres to a fixed schema, so all of the data has the same fields or properties.

The schema for structured data entities is tabular:
• Rows represent each instance of a data entity
• Columns represent attributes of the entity

Structured data is often stored in a relational database in which multiple tables can reference one another by using key values.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Identify data formats
Semi-structured Data:
Information that has some structure, but which allows for some variation between entity instances. The representation of semi-structured data is flexible.

Common formats for semi-structured data:
• JavaScript Object Notation (JSON)
• XML
• CSV files
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Identify data formats
Unstructured Data:
information that does not have a predefined format or organization, such as
text, images, videos, and social media posts
Module 1: Explore core data concepts
1- Explore Core Data Concepts

How do we store Data?


There are two broad categories of data store in common use:
• File stores

• Databases
Module 1: Explore core data concepts
1- Explore Core Data Concepts
• Introduction
• Identify data formats
• Explore file storage
• Explore databases
• Explore transactional data processing
• Explore analytical data processing
• Knowledge check
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore File Storage
The ability to store data in files is essential for any computing system.
Files can be stored:
• Locally (e.g., on hard disks or USB drives)
• Centrally, often in cloud-based storage for cost-effective, secure, and reliable solutions.

The choice of file format depends on:
1. Data type (structured, semi-structured, or unstructured)
2. Applications/services that will read or process the data
3. The need for human readability or optimized storage/processing

Several common file formats are used depending on these requirements.


Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore File Storage
Delimited Text Files:
Data is stored in plain text format with specific field delimiters and row terminators.

Comma-separated values (CSV):
• Fields are separated by commas
• Rows are terminated by a carriage return / new line

Other common formats include:
• Tab-separated values (TSV)
• Space-delimited
• Fixed-width data, in which each field is allocated a fixed number of characters

Delimited text is a good choice for structured data that needs to be accessed by a wide range of applications and services in a human-readable format.
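An illustrative CSV sample (hypothetical values): commas delimit the fields, and each new line terminates a row:

    ProductID,Name,Price
    123,Hiking Boots,99.99
    124,Trail Socks,9.50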
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore File Storage
JavaScript Object Notation (JSON)

JSON is a hierarchical document format that represents data as objects with multiple attributes.

It is flexible and works well for both structured and semi-structured data.

JSON objects use {} (braces), collections use [] (square brackets), and attributes are represented as name:value pairs separated by commas.
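An illustrative JSON sample (hypothetical customer data), showing objects in braces, a collection in square brackets, and comma-separated name:value pairs:

    {
      "customer": {
        "id": 101,
        "name": "Joe Jones",
        "contacts": [
          { "type": "email", "value": "joe@example.com" },
          { "type": "phone", "value": "555-0100" }
        ]
      }
    }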
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore File Storage

Extensible Markup Language (XML)

XML is a human-readable data format that was popular in the 1990s and 2000s.

XML uses tags enclosed in angle-brackets (<../>) to define elements and attributes.
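An illustrative XML fragment (hypothetical values), with elements defined by angle-bracket tags and an id attribute:

    <Customer id="101">
      <Name>Joe Jones</Name>
      <Contact type="email">joe@example.com</Contact>
    </Customer>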
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore File Storage

Binary Large Object (BLOB)

For unstructured data, store the data as raw binary (0's and 1's) that must be interpreted by applications and rendered.
Common types of data stored as binary include images, video, audio, and application-specific documents.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore File Storage

Optimized File Formats


While human-readable formats for structured and semi-structured data can be
useful, they're typically not optimized for storage space or processing.

Over time, some specialized file formats that enable compression, indexing, and
efficient storage and processing have been developed.

Some common optimized file formats you might see include Avro, ORC, and Parquet
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore File Storage Optimized File Formats

Avro is a row-based format. It was created by Apache.
Each record contains:
• A header that describes the structure of the data in the record. This header is stored as JSON.
• The data itself, stored as binary information.

An application uses the information in the header to parse the binary data and extract the fields it contains.
Avro is a good format for compressing data and minimizing storage and network bandwidth requirements.
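As a sketch, the JSON header describes the record structure with a schema like the following (hypothetical fields); the matching field values are then stored as binary:

    {
      "type": "record",
      "name": "Customer",
      "fields": [
        { "name": "id", "type": "int" },
        { "name": "name", "type": "string" }
      ]
    }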
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore File Storage Optimized File Formats

ORC (Optimized Row Columnar format)

ORC organizes data into columns rather than rows. It was developed by HortonWorks for optimizing read and write operations in Apache Hive.

An ORC file contains stripes of data. Each stripe holds the data for a column or set of columns.
A stripe contains an index into the rows in the stripe, the data for each row, and a footer that holds statistical information (count, sum, max, min, and so on) for each column.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore File Storage Optimized File Formats
Parquet
Parquet is a columnar storage format designed for big data applications, offering efficient storage, fast read and write operations, and strong support for complex data types.

• A Parquet file contains row groups.
• Data for each column is stored together in the same row group.
• Each row group contains one or more chunks of data.
• A Parquet file includes metadata that describes the set of rows found in each chunk.
• An application can use this metadata to quickly locate the correct chunk for a given set of rows, and retrieve the data in the specified columns for these rows.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
• Introduction
• Identify data formats
• Explore file storage
• Explore databases
• Explore transactional data processing
• Explore analytical data processing
• Knowledge check
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore databases

A database is used to define a central system in which data can be stored and queried.
In a simplistic sense, the file system on which files are stored is a kind of database; but when we use the term in a professional data context, we usually mean a dedicated system for managing data records rather than files.
Module 1: Explore core data concepts
1- Explore Core Data Concepts Relational Databases
Explore databases
• Used to store and query structured data.
• The data is stored in tables.
• Each instance of an entity is assigned a primary key that uniquely identifies it.
• These keys are used to reference the entity instance in other tables.

Example
Module 1: Explore core data concepts
1- Explore Core Data Concepts Non-Relational Databases
Explore databases
Non-relational databases are data management systems that don't apply a relational schema to the data.

Non-relational databases are often referred to as NoSQL databases, even though some support a variant of the SQL language.

There are four common types of non-relational database in use.
Module 1: Explore core data concepts
1- Explore Core Data Concepts Non-Relational Databases
Explore databases
Key-value Databases
Each record consists of a unique key and an associated value, which can be in any format.
Module 1: Explore core data concepts
1- Explore Core Data Concepts Non-Relational Databases
Explore databases
Document Databases
A specific form of key-value database in which the value is a JSON document (which the system is optimized to parse and query).
Module 1: Explore core data concepts
1- Explore Core Data Concepts Non-Relational Databases
Explore databases
Column Family Databases
Store tabular data comprising rows and columns.
You can divide the columns into groups known as column families.
Each column family holds a set of columns that are logically related together.
Module 1: Explore core data concepts
1- Explore Core Data Concepts Non-Relational Databases
Explore databases
Graph Databases
Store entities as nodes with links to define relationships between them.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
• Introduction
• Identify data formats
• Explore file storage
• Explore databases
• Explore transactional data processing
• Explore analytical data processing
• Knowledge check
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore transactional data processing

A transactional system records transactions that encapsulate specific events that the organization wants to track.

A transaction could be financial, such as the movement of money between accounts in a banking system, or a retail system tracking payments for goods and services from customers.

The work performed by transactional systems is often referred to as Online Transactional Processing (OLTP).
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore transactional data processing

OLTP solutions rely on a database system in which data storage is optimized for both read and write operations.

CRUD operations:
Data records are created, retrieved, updated, and deleted.

To ensure the integrity of the data stored in the database, OLTP systems enforce the use of transactions that support so-called ACID semantics.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore transactional data processing

ACID
• Atomicity: each transaction is treated as a single unit, which succeeds completely or fails completely.
• Consistency: transactions can only take the data in the database from one valid state to another.
• Isolation: concurrent transactions cannot interfere with one another, and must result in a consistent database state.
• Durability: when a transaction has been committed, it will remain committed.
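A minimal SQL sketch (hypothetical Accounts table) illustrating atomicity: the two updates succeed or fail as a single unit.

    BEGIN TRANSACTION;
        UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;  -- debit one account
        UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;  -- credit the other
    COMMIT TRANSACTION;  -- or ROLLBACK on error, so a half-finished transfer is never stored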
Module 1: Explore core data concepts
1- Explore Core Data Concepts
• Introduction
• Identify data formats
• Explore file storage
• Explore databases
• Explore transactional data processing
• Explore analytical data processing
• Knowledge check
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore analytical data processing
Analytical data processing typically uses read-only systems that store
vast volumes of historical data or business metrics.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore analytical data processing
The most common architecture for enterprise-scale analytics looks like
this:

1- Operational data is extracted, transformed, and loaded (ETL) into a data lake for analysis.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore analytical data processing
The most common architecture for enterprise-scale analytics looks like
this:

2- Data is loaded into a schema of tables - typically in a Spark-based data lakehouse with tabular abstractions over files in the data lake, or a data warehouse with a fully relational SQL engine.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore analytical data processing
The most common architecture for enterprise-scale analytics looks like
this:

3- Data in the data warehouse may be aggregated and loaded into an online analytical processing (OLAP) model, or cube.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore analytical data processing
The most common architecture for enterprise-scale analytics looks like
this:

4- The data in the data lake, data warehouse, and analytical model can be
queried to produce reports, visualizations, and dashboards.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore analytical data processing

Data warehouses are an established way to store data in a relational schema that is optimized for read operations, primarily queries to support reporting and data visualization.

Data lakes are common in large-scale data analytical processing scenarios, where a large volume of file-based data must be collected and analyzed.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore analytical data processing

Data lakehouses are a more recent innovation that combine the flexible and scalable storage of a data lake with the relational querying semantics of a data warehouse.

The table schema may require some denormalization of data in an OLTP data source by introducing some duplication to make queries perform faster.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Explore analytical data processing

Different types of user might perform data analytical work at different stages
of the overall architecture. For example:
•Data scientists might work directly with data files in a data lake to explore
and model data.
•Data Analysts might query tables directly in the data warehouse to produce
complex reports and visualizations.
•Business users might consume pre-aggregated data in an analytical model
in the form of reports or dashboards.
Module 1: Explore core data concepts
1- Explore Core Data Concepts
• Introduction
• Identify data formats
• Explore file storage
• Explore databases
• Explore transactional data processing
• Explore analytical data processing
• Knowledge check
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Knowledge check
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Knowledge check
Microsoft Azure Data Fundamentals
DP-900
Module 1: Explore core data concepts
• Explore core data concepts
• Explore data roles and services
Module 2: Explore relational data
• Explore fundamental relational data concepts
• Explore relational database services in Azure
Module 3: Explore non-relational data
• Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Module 4: Explore data analytics in Azure
• Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
• Explore fundamentals of data visualization
Module 1: Explore core data concepts
2- Explore Data Roles And Services
• Introduction
• Explore job roles in the world of data
• Identify data services
• Knowledge check
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Introduction
The rapid growth of data over the past decade has led to the emergence of new roles and technologies in
the data field, impacting how data is managed and utilized.
Managing and working with data is a specialist skill that requires knowledge of multiple technologies.
Most organizations define job roles for the various tasks involved in managing data.
In this module we will learn how to:
• Identify common data professional roles
• Identify common cloud services used by data professionals
Module 1: Explore core data concepts
2- Explore Data Roles And Services
• Introduction
• Explore job roles in the world of data
• Identify data services
• Knowledge check
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Explore job roles in the world of data
The three key job roles that deal with data in most organizations are:
• Database Administrators
manage databases, assigning permissions to users, storing backup copies of data, and restoring data in the
event of a failure.
• Data Engineers
manage infrastructure and processes for data integration across the organization, applying data cleaning
routines, identifying data governance rules, and implementing pipelines to transfer and transform data
between systems.
• Data Analysts
explore and analyze data to create visualizations and charts that enable organizations to make informed
decisions.
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Explore job roles in the world of data
The three key job roles that deal with data in most organizations are:
Database Administrators
A database administrator is responsible for the design, implementation, maintenance, and operational
aspects of on-premises and cloud-based database systems.

They're responsible for the overall availability and consistent performance and optimizations of
databases.

They work with stakeholders to implement policies, tools, and processes for backup and recovery plans
to recover following a natural disaster or human-made error.

The database administrator is also responsible for managing the security of the data in the database,
granting privileges over the data, granting or denying access to users as appropriate.
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Explore job roles in the world of data
The three key job roles that deal with data in most organizations are:
Data Engineer
A data engineer collaborates with stakeholders to design and implement data-related workloads,
including data ingestion pipelines, cleansing and transformation activities, and data stores for
analytical workloads.

They use a wide range of data platform technologies, including relational and non-relational databases,
file stores, and data streams.

They're also responsible for ensuring that the privacy of data is maintained within the cloud and spanning
from on-premises to the cloud data stores.

They own the management and monitoring of data pipelines to ensure that data loads perform as
expected.
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Explore job roles in the world of data
The three key job roles that deal with data in most organizations are:
Data Analyst
A data analyst enables businesses to maximize the value of their data assets.

They're responsible for exploring data to identify trends and relationships, designing and building
analytical models, and enabling advanced analytics capabilities through reports and visualizations.

A data analyst processes raw data into relevant insights based on identified business requirements.

There are additional data-related roles, such as data scientist and data architect; and there are other technical
professionals that work with data, including application developers and software engineers.
Module 1: Explore core data concepts
2- Explore Data Roles And Services
• Introduction
• Explore job roles in the world of data
• Identify data services
• Knowledge check
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Identify data services

Microsoft Azure is a cloud platform that powers the applications and IT infrastructure for
some of the world's largest organizations.

It includes many services to support cloud solutions, including transactional and analytical
data workloads.

Some of the most commonly used cloud services for data are described below.
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Identify data services
Module 1: Explore core data concepts
2- Explore Data Roles And Services
• Introduction
• Explore job roles in the world of data
• Identify data services
• Knowledge check
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Knowledge check
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Knowledge check
Microsoft Azure Data Fundamentals
DP-900
Module 1: Explore core data concepts
• Explore core data concepts
• Explore data roles and services
Module 2: Explore relational data
• Explore fundamental relational data concepts
• Explore relational database services in Azure
Module 3: Explore non-relational data
• Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Module 4: Explore data analytics in Azure
• Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
• Explore fundamentals of data visualization
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
• Introduction
• Understand relational data
• Understand normalization
• Explore SQL
• Describe database objects
• Knowledge check
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Introduction

In the early years of computing, different applications used unique data structures, which were
inefficient and hard to maintain, leading to the development of the relational database model.

This model uses tables to store and query data in a standardized, efficient way, and is widely used
across organizations to manage structured, related information.

In this module we will learn how to:


• Identify characteristics of relational data
• Define normalization
• Identify types of SQL statement
• Identify common relational database objects
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
• Introduction
• Understand relational data
• Understand normalization
• Explore SQL
• Describe database objects
• Knowledge check
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Understand Relational Data

In a relational database, you model collections of entities from the real world as tables.

An entity can be anything for which you want to record information; typically objects and events.

A table contains rows, and each row represents a single instance of an entity.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Understand Relational Data

Example
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Understand Relational Data

Each column stores data of a specific datatype:
• Text (string/char)
• Decimal numeric (float)
• Integer numeric
• Date/Time values
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
• Introduction
• Understand relational data
• Understand normalization
• Explore SQL
• Describe database objects
• Knowledge check
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Understand Normalization
Normalization is a design process that minimizes data duplication and
enforces data integrity.

The simple definition for practical normalization is:

1. Separate each entity into its own table.


2. Separate each discrete attribute into its own column.
3. Uniquely identify each entity instance (row) using a primary key.
4. Use foreign key columns to link related entities.
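A minimal SQL sketch of these rules (hypothetical Customer and SalesOrder tables): each entity has its own table, each attribute its own column, a primary key per table, and a foreign key linking the related entities.

    CREATE TABLE Customer
    (
        CustomerID INT PRIMARY KEY,      -- uniquely identifies each customer row
        Name       VARCHAR(50),
        Address    VARCHAR(100)
    );

    CREATE TABLE SalesOrder
    (
        OrderID    INT PRIMARY KEY,
        OrderDate  DATE,
        CustomerID INT REFERENCES Customer(CustomerID)  -- foreign key linking the order to its customer
    );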
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Understand Normalization

Notice that the customer and product details are duplicated for each individual item sold; and that the customer
name and postal address, and the product name and price are combined in the same spreadsheet cells.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Understand Normalization

Normalization changes
the way the data is stored
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
• Introduction
• Understand relational data
• Understand normalization
• Explore SQL
• Describe database objects
• Knowledge check
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Explore SQL
SQL stands for Structured Query Language, and is used to communicate with a relational database.

It's the standard language for relational database management systems.

Some common relational database management systems that use SQL include Microsoft SQL Server, MySQL, PostgreSQL, MariaDB, and Oracle.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Explore SQL
SQL statement types
SQL statements are grouped into three main logical groups:
•Data Definition Language (DDL)

•Data Control Language (DCL)

•Data Manipulation Language (DML)


Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Explore SQL
SQL statement types

Data Definition Language (DDL) statements:
We use DDL statements to create, modify, and remove tables and other objects in a database (tables, stored procedures, views, and so on).
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
Data Definition Language (DDL) statements:
The most common DDL statements are:
Statement Description
CREATE Create a new object in the database, such as a table or a view.
ALTER Modify the structure of an object. For instance, altering a table to add a new column.
DROP Remove an object from the database.
RENAME Rename an existing object.

The DROP statement is very powerful. When you drop a table, all the
rows in that table are lost.
Unless you have a backup, you won't be able to retrieve this data.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
Data Definition Language (DDL) statements:
Example: CREATE
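An illustrative sketch (hypothetical Product table and columns):

    CREATE TABLE Product
    (
        ID    INT PRIMARY KEY,      -- uniquely identifies each product
        Name  VARCHAR(50) NOT NULL,
        Price DECIMAL(10,2)
    );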
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Control Language (DCL) statements:
Database administrators generally use DCL statements to manage access to objects in a database by granting, denying, or revoking permissions to specific users or groups.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Control Language (DCL) statements:
The three main DCL statements are:

Statement Description
GRANT Grant permission to perform specific actions
DENY Deny permission to perform specific actions
REVOKE Remove a previously granted permission
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Control Language (DCL) statements:
Example: GRANT
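An illustrative sketch (hypothetical user and table):

    GRANT SELECT, INSERT, UPDATE
    ON Product
    TO SalesUser;   -- SalesUser is a hypothetical database user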
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Manipulation Language (DML) statements:
We use DML statements to manipulate the rows in tables.
These statements enable you to retrieve (query) data, insert new rows, or modify existing rows. You can also delete rows if you don't need them anymore.
The four main DML statements are:
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Manipulation Language (DML) statements:
The four main DML statements are:
Statement Description
SELECT Read rows from a table (applied to every row in a table)
INSERT Insert new rows into a table (one row at a time)
UPDATE Modify data in existing rows (applied to every row in a table)
DELETE Delete existing rows (applied to every row in a table)
You usually apply a WHERE clause with these statements to specify criteria; only rows that match these criteria will be selected, updated, or deleted.
Be careful when using DELETE or UPDATE without a WHERE clause, because you can lose or modify a lot of data.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Manipulation Language (DML) statements:

Example 1: SELECT all columns from the table
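An illustrative sketch (hypothetical Product table); the * wildcard returns every column:

    SELECT * FROM Product;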


Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Manipulation Language (DML) statements:

Example 2: SELECT specific columns


Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Manipulation Language (DML) statements:

Example 3: SELECT with Sorting the results “ORDER BY”
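An illustrative sketch (hypothetical columns):

    SELECT ID, Name, Price
    FROM Product
    ORDER BY Price DESC;   -- sort the results by price, highest first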


Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Manipulation Language (DML) statements:

Example 4: SELECT with the JOIN condition
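An illustrative sketch (hypothetical Customer and SalesOrder tables), matching rows on a shared key:

    SELECT o.OrderID, o.OrderDate, c.Name
    FROM SalesOrder AS o
    JOIN Customer AS c
        ON o.CustomerID = c.CustomerID;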


Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Manipulation Language (DML) statements:
Example: UPDATE
If you forget the WHERE clause, an UPDATE statement will modify every row in the table.
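An illustrative sketch (hypothetical values); the WHERE clause limits the change to a single row:

    UPDATE Product
    SET Name = 'Trail Shoes'
    WHERE ID = 123;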
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Manipulation Language (DML) statements:
Example: DELETE
If you forget the WHERE clause, a DELETE statement will remove every row from the table.
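An illustrative sketch (hypothetical criterion):

    DELETE FROM Product
    WHERE ID = 124;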
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types
•Data Manipulation Language (DML) statements:

Example : INSERT INTO
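An illustrative sketch (hypothetical values):

    INSERT INTO Product (ID, Name, Price)
    VALUES (125, 'Water Bottle', 14.99);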


Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Explore SQL
SQL statement types

DROP vs DELETE

DROP:
• DDL statement
• Removes an entire database object

DELETE:
• DML statement
• Removes rows (data) from a database object while keeping the table structure intact

If you want to learn more about querying data with SQL


Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
• Introduction
• Understand relational data
• Understand normalization
• Explore SQL
• Describe database objects
• Knowledge check
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Describe database objects

In addition to tables, a relational database can contain:
• Views
• Stored Procedures
• Indexes
which help optimize data organization, encapsulate programmatic actions, and improve query performance.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Describe database objects


• Views:
A view is like a virtual table created from the results of a SELECT query. It lets you see specific data from one or more tables as a single object.
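An illustrative sketch (hypothetical tables), defining a view over a join:

    CREATE VIEW CustomerOrders AS
    SELECT c.Name, o.OrderID, o.OrderDate
    FROM Customer AS c
    JOIN SalesOrder AS o
        ON o.CustomerID = c.CustomerID;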
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Describe database objects


• Views:
You can query a view and filter the data in much the same way as you would with a table.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Describe database objects


• Stored Procedures:
A stored procedure is a set of SQL statements that can be executed whenever needed. It is used to encapsulate logic in a database for tasks that applications frequently perform when working with data.
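An illustrative sketch (hypothetical procedure that renames a product):

    CREATE PROCEDURE RenameProduct
        @ProductID INT,
        @NewName   VARCHAR(50)
    AS
    BEGIN
        UPDATE Product
        SET Name = @NewName
        WHERE ID = @ProductID;
    END;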
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Describe database objects
• Stored Procedures:

We can execute the stored procedure, passing the ID of the product and the new name to be assigned
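An illustrative sketch of such a call (hypothetical values):

    EXEC RenameProduct @ProductID = 123, @NewName = 'Trail Shoes';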
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Describe database objects


• Indexes:
An index in a database is a data structure that improves the speed of data retrieval operations by providing a quick way to look up values in a table, similar to an index in a book.

It stores a sorted copy of specific column values along with pointers to the corresponding rows, making it faster to find data without scanning the entire table.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Describe database objects
• Indexes:
For tables with few rows, indexes are not efficient! Only use them with tables containing many rows.
The index creates a tree-based structure that the database system's query optimizer can use to quickly find rows in the Product table based on a specified Name.
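An illustrative sketch (hypothetical index name):

    CREATE INDEX IX_Product_Name
    ON Product (Name);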
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
• Introduction
• Understand relational data
• Understand normalization
• Explore SQL
• Describe database objects
• Knowledge check
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Knowledge check
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts

Knowledge check
Microsoft Azure Data Fundamentals
DP-900
Module 1: Explore core data concepts
• Explore core data concepts
• Explore data roles and services
Module 2: Explore relational data
• Explore fundamental relational data concepts
• Explore relational database services in Azure
Module 3: Explore non-relational data
• Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Module 4: Explore data analytics in Azure
• Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
• Explore fundamentals of data visualization
Module 2: Explore Relational Data
2- Explore relational database services in Azure
• Introduction
• Describe Azure SQL services and capabilities
• Describe Azure services for open-source databases
• Exercise: Explore Azure relational database services
• Knowledge check
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Introduction
Azure supports a range of database services that you can use to support new cloud applications or migrate
existing applications to the cloud.

In this module, we will learn how to:


•Identify options for Azure SQL services
•Identify options for open-source databases in Azure
•Provision a database service on Azure
Module 2: Explore Relational Data
2- Explore relational database services in Azure
• Introduction
• Describe Azure SQL services and capabilities
• Describe Azure services for open-source databases
• Exercise: Explore Azure relational database services
• Knowledge check
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Describe Azure SQL Services And Capabilities


Azure SQL is a collection of Microsoft SQL Services in Azure.

Azure SQL services include:

• SQL Server on Azure Virtual Machines (VMs)


• Azure SQL Managed Instance
• Azure SQL Database
• Azure SQL Edge: A SQL engine that is optimized for Internet-of-things (IoT) scenarios
that need to work with streaming time-series data.
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Describe Azure SQL Services And Capabilities


• SQL Server on Azure Virtual Machines (VMs):
➢ Type of cloud service : IaaS
➢ SQL Server compatibility: Fully compatible on-premises physical and virtualized installations
➢ Architecture : SQL Server instances are installed in a virtual machine
➢ Availability: 99.99% !
➢ Management: You must manage all aspects of the server, including operating system and SQL Server
updates, configuration, backups, and other maintenance tasks. Why ? => IaaS

Use this option when you need to migrate or extend an on-premises SQL Server solution and retain full
control over all aspects of server and database configuration.
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Describe Azure SQL Services And Capabilities


• Azure SQL Managed Instance:
➢ Type of cloud service : PaaS
➢ SQL Server compatibility: Near 100% compatible with SQL Server; migrate on-premises databases using the Azure Database Migration service
➢ Architecture : Each managed instance can support multiple databases.
➢ Availability: 99.99% !
➢ Management: Fully automated updates, backups, and recovery. Why ? => PaaS

Use this option for most cloud migration scenarios, particularly when you need minimal changes to existing
applications. When you want to lift-and-shift an on-premises SQL Server instance and all its databases to the
cloud
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Describe Azure SQL Services And Capabilities


• Azure SQL Database:
➢ Type of cloud service : PaaS
➢ SQL Server compatibility: Supports most core database-level capabilities of SQL Server. Some features
depended on by an on-premises application may not be available.
➢ Architecture :
▪ provision a single database in a dedicated, managed (logical) server
▪ you can use an elastic pool to share resources across multiple databases and take advantage of on-demand scalability.
➢ Availability: 99.995% !
➢ Management: Fully automated updates, backups, and recovery. Why ? => PaaS

Use this option for new cloud solutions, or to migrate applications that have minimal instance-level
dependencies.
Module 2: Explore Relational Data
2- Explore relational database services in Azure
• Introduction
• Describe Azure SQL services and capabilities
• Describe Azure services for open-source databases
• Exercise: Explore Azure relational database services
• Knowledge check
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Describe Azure services for open-source databases


Azure data services are available for other popular relational database systems,
such as:

• MySQL: Azure Database for MySQL


• MariaDB : Azure Database for MariaDB
• PostgreSQL: Azure Database for PostgreSQL

MySQL, MariaDB, and PostgreSQL are relational database management systems


that are tailored for different specializations
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Describe Azure services for open-source databases


• MySQL: Azure Database for MySQL

MySQL is the leading open-source relational database for Linux, Apache, MySQL, and PHP (LAMP)
stack apps.
Azure Database for MySQL is a PaaS implementation of MySQL in the Azure cloud, based on the
MySQL Community Edition
Benefits:
• High availability features built-in.
• Predictable performance.
• Easy scaling that responds quickly to demand.
• Secure data, both at rest and in motion.
• Automatic backups and point-in-time restore for the last 35 days.
• Enterprise-level security and compliance with legislation.
• Monitoring functionality to add alerts, and to view metrics and logs.
The system uses pay-as-you-go pricing so you only pay for what you use.
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Describe Azure services for open-source databases


• MariaDB : Azure Database for MariaDB
MariaDB is a newer database management system, created by the original developers of MySQL. The database engine has since been rewritten and optimized to improve performance and to support temporal data (a table can hold several versions of data).
Azure Database for MariaDB is an implementation of the MariaDB database management system adapted to run in Azure. It's based on the MariaDB Community Edition.
Benefits:
• Built-in high availability with no additional cost.
• Predictable performance, using inclusive pay-as-you-go pricing.
• Scaling as needed within seconds.
• Secured protection of sensitive data at rest and in motion.
• Automatic backups and point-in-time restore for up to 35 days.
• Enterprise-grade security and compliance.


Module 2: Explore Relational Data
2- Explore relational database services in Azure

Describe Azure services for open-source databases


• Azure Database for PostgreSQL

PostgreSQL is a hybrid relational-object database. You can store data in relational tables, but a
PostgreSQL database also enables you to store custom data types, with their own non-relational
properties.
Azure Database for PostgreSQL is a PaaS implementation of PostgreSQL in the Azure cloud.
However, some features of on-premises PostgreSQL databases aren't available in Azure Database for PostgreSQL.
Benefits:

This service provides the same availability, performance, scaling, security, and administrative
benefits as the MySQL service.
It contains built-in failure detection and failover mechanisms => highly available service.
Module 2: Explore Relational Data
2- Explore relational database services in Azure
• Introduction
• Describe Azure SQL services and capabilities
• Describe Azure services for open-source databases
• Exercise: Explore Azure relational database services
• Knowledge check
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Exercise: Explore Azure relational database services

You will need an Azure subscription in which you have administrative access.
Module 2: Explore Relational Data
2- Explore relational database services in Azure
• Introduction
• Describe Azure SQL services and capabilities
• Describe Azure services for open-source databases
• Exercise: Explore Azure relational database services
• Knowledge check
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Knowledge check
Module 2: Explore Relational Data
2- Explore relational database services in Azure

Knowledge check
Microsoft Azure Data Fundamentals
DP-900
Module 1: Explore core data concepts
• Explore core data concepts
• Explore data roles and services
Module 2: Explore relational data
• Explore fundamental relational data concepts
• Explore relational database services in Azure
Module 3: Explore non-relational data
• Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Module 4: Explore data analytics in Azure
• Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
• Explore fundamentals of data visualization
Module 3: Explore Non-Relational Data
1- Explore Azure Storage for non-relational data
• Introduction
• Explore Azure blob storage
• Explore Azure Data Lake Storage Gen2
• Explore Microsoft OneLake in Fabric
• Explore Azure Files
• Explore Azure Tables
• Exercise: Explore Azure Storage
• Knowledge check
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Introduction
Many applications today don't need the rigid structure of a relational database and rely on non-relational
storage.

Azure Storage is a key service in Microsoft Azure, and enables a wide range of data storage scenarios and
solutions.

In this module, we will learn how to:


•Describe features and capabilities of Azure blob storage
•Describe features and capabilities of Azure Data Lake Gen2
•Describe features and capabilities of Microsoft OneLake
•Describe features and capabilities of Azure file storage
•Describe features and capabilities of Azure table storage
•Provision and use an Azure Storage account
Module 3: Explore Non-Relational Data
1- Explore Azure Storage for non-relational data
• Introduction
• Explore Azure blob storage
• Explore Azure Data Lake Storage Gen2
• Explore Microsoft OneLake in Fabric
• Explore Azure Files
• Explore Azure Tables
• Exercise: Explore Azure Storage
• Knowledge check
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure blob storage
A blob (binary large object) is used to store massive amounts of unstructured data in cloud-based storage.

Azure Blob Storage is a service that enables you to do so. Blobs are stored in containers.

Azure Blob Storage supports three different types of blob:
• Block blobs
• Page blobs
• Append blobs
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure blob storage
• Block blobs: a set of blocks.
➢ One block size: up to 4000 MiB.
➢ A block blob: up to 190.7 TiB (4000 MiB × 50,000 blocks).

The block is the smallest amount of data that can be read or written as an individual unit.
Block blobs are best used to store discrete, large, binary objects that change infrequently.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure blob storage
• Page blobs: a collection of fixed-size 512-byte pages.

A page blob is optimized to support random read and write operations.
A page blob can hold up to 8 TB of data.
Azure uses page blobs to implement virtual disk storage for virtual machines.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure blob storage
• Append blobs: a block blob optimized to support append operations.

You can only add blocks to the end of an append blob.
Updating or deleting existing blocks isn't supported.
Each block can vary in size, up to 4 MB.
The maximum size of an append blob is just over 195 GB.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure blob storage
Blob Storage Access Tiers:
The Hot tier: the default, for frequently accessed blobs. The blob data is stored on high-performance media.

The Cool tier: for infrequently accessed data. It has lower performance and lower cost compared to the Hot tier.
You can migrate a blob from the Cool tier to the Hot tier or vice versa, depending on how frequently the blob is accessed.

The Archive tier: provides the lowest storage cost but with the most latency (hours). It's used for historical data that mustn't be lost, but is required only rarely. Blobs in the Archive tier are effectively stored in an offline state. A blob in the Archive tier must be rehydrated to the Hot or Cool tier, and you can read the blob only when the rehydration process is complete.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure blob storage
Blob Storage lifecycle management policies :
A lifecycle management policy can automatically move a blob from Hot to Cool, and then to the
Archive tier, as it ages and is used less frequently
The policy is based on the number of days since modification.
A lifecycle management policy can also arrange to delete outdated blobs.
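As a sketch, such a policy is defined as a set of JSON rules; the day thresholds below are hypothetical:

    {
      "rules": [
        {
          "name": "age-out-blobs",
          "enabled": true,
          "type": "Lifecycle",
          "definition": {
            "filters": { "blobTypes": [ "blockBlob" ] },
            "actions": {
              "baseBlob": {
                "tierToCool":    { "daysAfterModificationGreaterThan": 30 },
                "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
                "delete":        { "daysAfterModificationGreaterThan": 365 }
              }
            }
          }
        }
      ]
    }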
Module 3: Explore Non-Relational Data
1- Explore Azure Storage for non-relational data
• Introduction
• Explore Azure blob storage
• Explore Azure Data Lake Storage Gen2
• Explore Microsoft OneLake in Fabric
• Explore Azure Files
• Explore Azure Tables
• Exercise: Explore Azure Storage
• Knowledge check
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 is the newer version of the Azure Data Lake Store service for hierarchical data storage for analytical data lakes, and is integrated into Azure Storage.
Advantages:
▪ The scalability of blob storage
▪ The cost-control of storage tiers
▪ The hierarchical file system capabilities
▪ The compatibility with major analytics systems of Azure Data Lake Store
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure Data Lake Storage Gen2
To create an Azure Data Lake Storage Gen2 file system:

1- Create a storage account.
2- Enable the Hierarchical Namespace option of the Azure Storage account.

Alternatively, you can upgrade an existing Azure Storage account to support Data Lake Gen2.

This upgrade is a one-way process:
after upgrading a storage account to support a hierarchical namespace for blob storage, you can't revert it to a flat namespace.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage for non-relational data
• Introduction
• Explore Azure blob storage
• Explore Azure Data Lake Storage Gen2
• Explore Microsoft OneLake in Fabric
• Explore Azure Files
• Explore Azure Tables
• Exercise: Explore Azure Storage
• Knowledge check
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Microsoft OneLake in Fabric
Microsoft Fabric automatically provisions OneLake, built upon Azure Data Lake Gen 2.

OneLake is a single, unified, logical data lake designed for your entire organization.

OneLake comes automatically with every Microsoft Fabric tenant and serves as the central repository for all your analytics data.

OneLake:
➢ Supports any type of file and data (structured or unstructured)
➢ Allows you to use the same data across multiple analytical engines without data movement or duplication.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Microsoft OneLake in Fabric
Microsoft Fabric automatically provisions OneLake, built upon Azure Data Lake Gen 2

Key Benefits of OneLake


➢ Organization-wide data lake
your entire organization shares a single data lake.
➢ Distributed ownership and collaboration
Create workspaces to enable different parts of your organization to manage their data items.
This distributed ownership promotes collaboration and maintains governance boundaries.
➢ Open and Compatible Built on Azure Data Lake Storage (ADLS) Gen2:
Stores data in Delta Parquet format. It supports existing ADLS Gen2 APIs and SDKs, making it
compatible with your current applications.
➢ Easy to navigate
It's straightforward to navigate OneLake data from Windows using OneLake file explorer
Module 3: Explore Non-Relational Data
1- Explore Azure Storage for non-relational data
• Introduction
• Explore Azure blob storage
• Explore Azure Data Lake Storage Gen2
• Explore Microsoft OneLake in Fabric
• Explore Azure Files
• Explore Azure Tables
• Exercise: Explore Azure Storage
• Knowledge check
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure Files
Azure Files provides cloud-based network file shares for documents and other files, making them available to multiple users.

Benefits of file shares in Azure:
▪ Eliminate hardware costs and maintenance overhead
▪ High availability and scalable cloud storage for files

Azure Files enables you to share up to 100 TB of data in a single storage account.
The maximum size of a single file is 1 TB.
Azure File Storage supports up to 2000 concurrent connections per shared file.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure Files
How to upload files on Azure Files?

1- Azure portal: is a web-based interface provided by Microsoft Azure that allows users to
manage and interact with Azure resources.
You can manually upload, download, and manage files in Azure File Storage

2- AzCopy Utility: is a command-line tool designed for high-speed data transfer to and from
Azure Storage.
Useful for bulk uploads and downloads of files to/from Azure File Storage. It supports scripting and
automation, making it ideal for large-scale operations or repetitive tasks.

3- Azure File Sync: is a service that enables synchronization between on-premises file servers
and Azure File Storage.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure Files
Azure File Storage offers two Performance Tiers:

The Standard tier uses hard disk-based hardware in a datacenter

The Premium tier uses solid-state disks.


The Premium tier offers greater throughput, but is charged at a higher rate.

Azure Files supports two Network File Sharing Protocols:


•Server Message Block (SMB) file sharing : multiple operating systems (Windows, Linux, macOS).
•Network File System (NFS) only by some Linux and macOS versions. Only for a premium tier
storage account.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage for non-relational data
• Introduction
• Explore Azure blob storage
• Explore Azure Data Lake Storage Gen2
• Explore Microsoft OneLake in Fabric
• Explore Azure Files
• Explore Azure Tables
• Exercise: Explore Azure Storage
• Knowledge check
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure Tables

Azure Table Storage is a NoSQL storage solution.

It makes use of tables containing key/value data items.

An Azure table is not like a table in a relational database.

Azure Table Storage enables you to store semi-structured data.

All rows in a table must have a unique key (composed of a partition key and a row key).

A Timestamp column records the date and time of a modification.
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure Tables
Azure Table Storage Partitioning:

Partitioning is a mechanism for grouping related rows based on a common property, the partition key.
Rows that share the same partition key are stored together.

Partitioning helps to ensure fast access, organize data, and improve scalability and performance.

Partitions are independent from each other, and can grow or shrink as rows are added to,
or removed from, a partition. A table can contain any number of partitions.

When the partition key is used in the search criteria, it narrows down the search and
improves performance (see the sketch below).

Azure Table Storage tables have no concept of foreign keys, relationships, stored
procedures, views, or other objects you might find in a relational database.
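To make the key structure concrete, here is a minimal, hedged sketch using the Python azure-data-tables SDK; the connection string, table name, and entity values are assumptions for illustration only.

# Minimal sketch: insert an entity and retrieve it by its unique key (PartitionKey + RowKey).
# Assumes: pip install azure-data-tables, and that the "Customers" table already exists.
from azure.data.tables import TableClient

conn_str = "<your-storage-account-connection-string>"   # placeholder
table = TableClient.from_connection_string(conn_str, table_name="Customers")

entity = {
    "PartitionKey": "1",            # groups related rows together
    "RowKey": "124",                # must be unique within the partition
    "Name": "Samir Nadoy",
    "Email": "samir@northwind.com",
}
table.upsert_entity(entity)

# Point lookup: supplying both keys narrows the search to a single row.
customer = table.get_entity(partition_key="1", row_key="124")
print(customer["Email"])

Because the query specifies the partition key, the service only has to search one partition, which is why such lookups are fast.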
Module 3: Explore Non-Relational Data
1- Explore Azure Storage for non-relational data
• Introduction
• Explore Azure blob storage
• Explore Azure Data Lake Storage Gen2
• Explore Microsoft OneLake in Fabric
• Explore Azure Files
• Explore Azure Tables
• Exercise: Explore Azure Storage
• Knowledge check
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Exercise: Explore Azure Storage
Module 3: Explore Non-Relational Data
1- Explore Azure Storage for non-relational data
• Introduction
• Explore Azure blob storage
• Explore Azure Data Lake Storage Gen2
• Explore Microsoft OneLake in Fabric
• Explore Azure Files
• Explore Azure Tables
• Exercise: Explore Azure Storage
• Knowledge check
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Knowledge check
Microsoft Azure Data Fundamentals
DP-900
Module 1 • Explore core data concepts
• Explore data roles and services
Explore core data concepts
Module 2 • Explore fundamental relational data concepts
• Explore relational database services in Azure
Explore relational data
Module 3 • Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Explore non-relational data
Module 4 • Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
Explore data analytics in Azure • Explore fundamentals of data visualization
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
• Introduction
• Describe Azure Cosmos DB
• Identify Azure Cosmos DB APIs
• Exercise: Explore Azure Cosmos DB
• Knowledge check
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Introduction
NoSQL traditionally stands for "Not Only SQL".
NoSQL databases extend the capabilities of relational (SQL) databases by storing data in flexible
structures, such as documents, graphs, key-value stores, and column-family stores.
Azure Cosmos DB provides a global-scale database solution for non-relational
data.
In this module, we will learn how to:
• Describe key features and capabilities of Azure Cosmos DB
• Identify Azure Cosmos DB APIs
• Provision and use an Azure Cosmos DB instance
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
• Introduction
• Describe Azure Cosmos DB
• Identify Azure Cosmos DB APIs
• Exercise: Explore Azure Cosmos DB
• Knowledge check
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Describe Azure Cosmos DB
Azure Cosmos DB is a globally distributed, highly scalable, and fully managed NoSQL
database management system provided by Microsoft Azure.

Azure Cosmos DB supports multiple application programming interfaces (APIs),
enabling developers to store and query data using the API they already know, without
the need to learn new programming semantics.

Cosmos DB uses indexes and partitioning to provide fast read and write performance
and can scale to massive volumes of data.
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Describe Azure Cosmos DB
Cosmos DB automatically allocates space in a container for your partitions up to 10 GB in size.
Indexes are created and maintained automatically.
Use cases of Azure Cosmos DB:
Cosmos DB has been used by many of Microsoft's products including Skype, Xbox, Microsoft 365,
Azure, and many others.
Cosmos DB is highly suitable for:
• IoT and telematics: ingested device data can feed Azure Machine Learning, Microsoft Fabric,
and Power BI, or be processed in real time using Azure Functions that are triggered as data
arrives in the database.
• Retail and marketing: for example, Microsoft's own Windows Store and Xbox Live.
• Gaming
• Web and mobile applications: the Cosmos DB SDKs can be used to build rich iOS
and Android applications.
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
• Introduction
• Describe Azure Cosmos DB
• Identify Azure Cosmos DB APIs
• Exercise: Explore Azure Cosmos DB
• Knowledge check
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Identify Azure Cosmos DB APIs
Developers can build and migrate applications fast using their preferred open-source database engines,
including PostgreSQL, MongoDB, and Apache Cassandra …
• Azure Cosmos DB for NoSQL
• Azure Cosmos DB for MongoDB
• Azure Cosmos DB for PostgreSQL
• Azure Cosmos DB for Table
• Azure Cosmos DB for Apache Cassandra
• Azure Cosmos DB for Apache Gremlin
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Identify Azure Cosmos DB APIs
• Azure Cosmos DB for NoSQL
Azure Cosmos DB for NoSQL is a native non-relational service for working
with the document data model.
It manages data in JSON document format, and despite being a NoSQL data
storage solution, it uses SQL syntax to work with the data.
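As an illustration of that SQL syntax, here is a minimal, hedged sketch using the Python azure-cosmos SDK; the endpoint, key, database, and container names are placeholders.

# Minimal sketch: query JSON documents in Cosmos DB for NoSQL using SQL syntax.
# Assumes: pip install azure-cosmos, plus an existing database and container.
from azure.cosmos import CosmosClient

client = CosmosClient(url="<account-endpoint>", credential="<account-key>")   # placeholders
container = client.get_database_client("StoreDB").get_container_client("Products")

query = "SELECT c.id, c.name, c.price FROM c WHERE c.category = 'Tools'"
for item in container.query_items(query=query, enable_cross_partition_query=True):
    print(item["name"], item["price"])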
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Identify Azure Cosmos DB APIs
• Azure Cosmos DB for MongoDB
MongoDB is a popular open-source database in which data is stored in Binary JSON (BSON)
format.
Azure Cosmos DB for MongoDB enables developers to use MongoDB client libraries and code
to work with data in Azure Cosmos DB.
MongoDB Query Language (MQL) uses a compact, object-oriented syntax in which
developers use objects to call methods. The results of a query consist of JSON documents,
as in the sketch below.
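A minimal, hedged sketch of that MQL style, using the standard pymongo client pointed at a Cosmos DB for MongoDB connection string; the database, collection, and document values are invented for the example.

# Minimal sketch: MongoDB-style (MQL) access via pymongo. Assumes: pip install pymongo.
from pymongo import MongoClient

client = MongoClient("<cosmos-db-for-mongodb-connection-string>")   # placeholder
products = client["storedb"]["products"]    # hypothetical database and collection

products.insert_one({"name": "Hammer", "price": 2.99, "category": "Tools"})

# Objects call methods; query results are returned as documents (Python dicts).
for doc in products.find({"category": "Tools"}):
    print(doc["name"], doc["price"])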
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Identify Azure Cosmos DB APIs
• Azure Cosmos DB for PostgreSQL
Azure Cosmos DB for PostgreSQL is a native PostgreSQL, globally distributed relational
database that automatically shards data to help you build highly scalable apps.

PostgreSQL is a Relational Database Management System (RDBMS) in which you define
relational tables of data, for example:

ProductID   ProductName   Price
123         Hammer        2.99
162         Screwdriver   3.49

A SQL query that filters this table (for example, on ProductID 123) returns just the matching
row: 123, Hammer, 2.99.
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Identify Azure Cosmos DB APIs
• Azure Cosmos DB for Table
Azure Cosmos DB for Table is used to work with data in key-value tables,
similar to Azure Table Storage.
It offers greater scalability and performance than Azure Table Storage.
We use the Table API through one of the language-specific SDKs to make calls to a
service endpoint and retrieve data from the table. For example:

PartitionKey   RowKey   Name          Email
1              123      Joe Jones     joe@litware.com
1              124      Samir Nadoy   samir@northwind.com

A request such as https://endpoint/Customers(PartitionKey='1',RowKey='124') returns the
single entity: 1, 124, Samir Nadoy, samir@northwind.com.
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Identify Azure Cosmos DB APIs
• Azure Cosmos DB for Apache Cassandra
Apache Cassandra is a popular open-source database that uses a column-family storage
structure.
Column families are tables, similar to those in a relational database, with the exception that it's
not mandatory for every row to have the same columns.
Cassandra supports a syntax based on SQL
ProductID   ProductName   Price
123         Hammer        2.99
162         Screwdriver   3.49

A query filtering on ProductID 123 (using Cassandra's SQL-based syntax) returns just the
matching row: 123, Hammer, 2.99.
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Identify Azure Cosmos DB APIs
• Azure Cosmos DB for Apache Gremlin
Apache Gremlin is a graph traversal language and query framework used to interact
with graph databases.
Azure Cosmos DB for Apache Gremlin is used with data in a graph structure, in which
entities are defined as vertices that form nodes in a connected graph.
Nodes are connected by edges that represent relationships.

Gremlin syntax includes functions to operate on vertices and edges, enabling you to
insert, update, delete, and query data in the graph (see the sketch below).
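A minimal, hedged sketch of what Gremlin traversals look like when submitted from Python with the gremlinpython driver; the endpoint, credentials, partition key property ('pk'), and graph data are all assumptions for illustration.

# Minimal sketch: submit Gremlin traversals to a graph. Assumes: pip install gremlinpython.
from gremlin_python.driver import client, serializer

gremlin = client.Client(
    "wss://<account>.gremlin.cosmos.azure.com:443/", "g",    # placeholder endpoint
    username="/dbs/<database>/colls/<graph>",                # placeholder resource path
    password="<account-key>",                                # placeholder key
    message_serializer=serializer.GraphSONSerializersV2d0(),
)

# Add two vertices (nodes) and an edge (relationship), then traverse the graph.
# 'pk' is assumed to be the graph's partition key property.
gremlin.submit("g.addV('person').property('id','1').property('pk','1').property('name','Joe')").all().result()
gremlin.submit("g.addV('product').property('id','p1').property('pk','p1').property('name','Hammer')").all().result()
gremlin.submit("g.V('1').addE('bought').to(g.V('p1'))").all().result()
print(gremlin.submit("g.V('1').out('bought').values('name')").all().result())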
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
• Introduction
• Describe Azure Cosmos DB
• Identify Azure Cosmos DB APIs
• Exercise: Explore Azure Cosmos DB
• Knowledge check
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Exercise: Explore Azure Cosmos DB
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
• Introduction
• Describe Azure Cosmos DB
• Identify Azure Cosmos DB APIs
• Exercise: Explore Azure Cosmos DB
• Knowledge check
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Knowledge check
Module 3: Explore Non-Relational Data
2- Explore fundamentals of Azure Cosmos DB
Knowledge check
Microsoft Azure Data Fundamentals
DP-900
Module 1 • Explore core data concepts
• Explore data roles and services
Explore core data concepts
Module 2 • Explore fundamental relational data concepts
• Explore relational database services in Azure
Explore relational data
Module 3 • Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Explore non-relational data
Module 4 • Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
Explore data analytics in Azure • Explore fundamentals of data visualization
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
• Introduction
• Describe data warehousing architecture
• Explore data ingestion pipelines
• Explore analytical data stores
• Exercise: Explore data analytics in Microsoft Fabric
• Knowledge check
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Introduction
Large-scale data analytics combines two approaches:

• Traditional data warehousing: data is copied from transactional systems into a database
designed for easy reporting and analysis.
• Big data analytics: handles massive amounts of data of different types, either in batches
or in real time, stored in a data lake and processed with tools like Apache Spark.

By combining the strengths of both, we get a data lakehouse: a system that stores
all kinds of data (like a data lake) but also supports easy analysis (like a data
warehouse).
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Introduction
In this module, we will learn how to:
• Identify common elements of a large-scale data warehousing solution
• Describe key features for data ingestion pipelines
• Identify common types of analytical data store and related Azure services
• Provision Microsoft Fabric and use it to ingest, process, and query data
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
• Introduction
• Describe data warehousing architecture
• Explore data ingestion pipelines
• Explore analytical data stores
• Exercise: Explore data analytics in Microsoft Fabric
• Knowledge check
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Describe Data Warehousing Architecture
Almost all large-scale data analytics architectures include the following steps:
1- Data Ingestion And Preprocessing
2- Store Data For Analysis
3- Analytical Data Model
4- Data Visualization
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Describe Data Warehousing Architecture
1- Data Ingestion And Preprocessing:

• Data is collected / ingested from different sources: transactional systems, files, or
real-time streams.
• The data is then loaded into a data lake or relational data warehouse for analysis. This
process typically involves one of two methods:
   • ETL (Extract, Transform, Load): traditional data warehouses / low data volumes.
   • ELT (Extract, Load, Transform): modern cloud warehouses or data lakes handling large
     volumes of data or real-time / streaming sources.

• ETL (Extract, Transform, Load): data is transformed (cleaned, filtered, and restructured) before being loaded
into the analytical store.
• ELT (Extract, Load, Transform): data is loaded into the store first and then transformed within it.

➔ In both cases, the goal is to optimize the data structure for analytical queries (see the sketch below).
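A rough, hedged sketch of the two patterns in PySpark (for example in a Fabric or Azure Databricks notebook); the file path, column names, and table names are invented for the example, and the ELT step assumes a Delta-capable catalog.

# Minimal sketch contrasting ETL and ELT with PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# ETL: transform first, then load only the cleaned result into the analytical store.
raw = spark.read.csv("Files/raw/sales.csv", header=True, inferSchema=True)   # placeholder path
cleaned = raw.dropna(subset=["OrderID"]).withColumnRenamed("Qty", "Quantity")
cleaned.write.mode("overwrite").saveAsTable("sales")

# ELT: load the raw data as-is first, then transform it inside the store with SQL.
raw.write.mode("overwrite").saveAsTable("sales_raw")
spark.sql("""
    CREATE OR REPLACE TABLE sales_clean AS
    SELECT OrderID, Qty AS Quantity
    FROM sales_raw
    WHERE OrderID IS NOT NULL
""")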
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Describe Data Warehousing Architecture
1- Data Ingestion And Preprocessing:

Data ingestion can be done in two ways:

• Batch Processing: for static data processed in chunks.
  Example: generating monthly financial reports.

• Real-Time Processing: for continuous streams of incoming data.
  Example: real-time data ingestion for fraud detection.

This combination ensures efficient handling and analysis of both historical and live data.
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Describe Data Warehousing Architecture
2- Analytical Data Store:
An Analytical Data Store is a place where data is stored to be analyzed and to find useful insights.
There are three main types:
1. Data Warehouses: These store data in Tables (like a spreadsheet) for Structured Data, such as sales
reports or customer records. Example: A company using a relational database to track yearly sales.
2. Data Lakes: These store All Types Of Data: Structured, Semi-structured, and Unstructured, such
as files, videos, or raw logs. Example: A social media platform saving raw images, text posts, and videos
in a data lake for later analysis.
3. Data Lakehouses: These combine the best of both: a data lake’s ability to store all kinds of data and
a data warehouse’s ability to analyze it efficiently. Example: An e-commerce site using a lakehouse to
manage both raw user activity logs and structured sales data.
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Describe Data Warehousing Architecture
3- Analytical Data Model:

An analytical data model is a way to organize data to make it easier for analysts to create reports,
dashboards, or visualizations. Instead of working directly with raw data, a model pre-summarizes the
data to save time and simplify analysis.

Often these data models are described as cubes, in which numeric data values are aggregated across
one or more dimensions (product, region, time).

Example: imagine you want to find total sales by product and region. Instead of calculating it from
raw data every time, the model stores this information ready-made (see the sketch below).

Drill-Down/Drill-Up: you can zoom in for more detail (e.g., sales for a specific region) or zoom out for
a broader view (e.g., total sales across all regions).

These models make exploring and understanding data faster and more interactive!
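As a tiny, tool-agnostic illustration of the pre-aggregation idea, the following pandas sketch builds a small cube-like summary of sales by product and region; the data is invented for the example.

# Minimal sketch: aggregating a measure (Revenue) across two dimensions (Product, Region).
import pandas as pd

sales = pd.DataFrame({
    "Product": ["Hammer", "Hammer", "Screwdriver", "Screwdriver"],
    "Region":  ["North",  "South",  "North",       "South"],
    "Revenue": [120.0,     80.0,     45.0,          60.0],
})

cube = sales.pivot_table(values="Revenue", index="Product", columns="Region", aggfunc="sum")
print(cube)               # drill-down view: revenue per product per region
print(cube.sum(axis=1))   # drill-up view: total revenue per product across all regions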
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Describe Data Warehousing Architecture
4- Data Visualization:
Data visualization is the process of turning data into VISUAL FORMATS like charts,
graphs, or dashboards to make it easier to understand and analyze.
•Who Uses It?
•Data analysts: Use data from analytical models or stores to create visual reports and dashboards.
•Non-technical users: Can perform self-service analysis to create their own reports using simple
tools.
•What Does It Show?
•Trends: E.g., how sales are increasing or decreasing over time.
•Comparisons: E.g., comparing performance across products or regions.
•Key Performance Indicators (KPIs): E.g., metrics like total revenue, customer retention rate, or
profit margins.
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Describe Data Warehousing Architecture
4- Data Visualization:

Data visualization is the process of turning data into VISUAL FORMATS like charts,
graphs, or dashboards to make it easier to understand and analyze.

Formats:
• Printed reports or charts in documents.
• Slides in PowerPoint presentations.
• Interactive dashboards on the web where users can explore the data visually.

Data visualization makes it easier for everyone to understand complex information
and make better decisions!
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
• Introduction
• Describe data warehousing architecture
• Explore data ingestion pipelines
• Explore analytical data stores
• Exercise: Explore data analytics in Microsoft Fabric
• Knowledge check
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Explore Data Ingestion Pipelines
A data ingestion pipeline is a series of steps / activities that move and transform data from
one or more sources into an analytical data store.
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Explore Data Ingestion Pipelines
Building Pipelines on Azure

On Azure, data ingestion pipelines are typically built using:

• Azure Data Factory for orchestrating ETL workflows
or
• Microsoft Fabric for managing all components of the data solution in a unified workspace.
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Explore Data Ingestion Pipelines
Building Pipelines on Azure
Pipelines consist of linked services, which connect each step of the process to
different technologies.
Example:
• Azure Blob Storage: can be used to load raw data.
• Azure SQL Database: can run stored procedures to process the data.
• Azure Databricks: can handle distributed data processing or apply custom transformations.
• Azure Functions: can be used to execute custom logic.

You can save the output dataset in a linked service such as Microsoft Fabric.
Pipelines can also include some built-in activities, which don't require a linked service.
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
• Introduction
• Describe data warehousing architecture
• Explore data ingestion pipelines
• Explore analytical data stores
• Exercise: Explore data analytics in Microsoft Fabric
• Knowledge check
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Explore Analytical Data Stores
There are two common types of analytical data store:
Data Warehouses:
• Store data in a structured, relational format optimized for analytics.
• Use a star schema with fact tables (numeric measures) and dimension tables (customer, product, time).
• Suited for structured transactional data and use SQL for querying.
• This approach allows for complex aggregations and analysis, making it ideal for business intelligence and reporting.

Data Lakes:
• Store large volumes of raw, unstructured, semi-structured, or structured data in a distributed file system.
• Use a schema-on-read approach, where the structure is applied when the data is read, rather than when it's stored.
• Ideal for handling diverse data types and supporting advanced analytics or machine learning without requiring predefined schemas.
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Explore Analytical Data Stores
Hybrid approaches: Data Lakehouses

A Data Lakehouse combines the features of both data lakes and data warehouses.

Raw data is stored as files in a data lake, and SQL analytics endpoints, such as those in Microsoft Fabric, expose the
data as tables, allowing you to query it with SQL (see the sketch below).

This hybrid model adds relational storage capabilities to Spark-based systems, enabling schema enforcement,
transactional consistency, and support for both batch and streaming data sources by providing a SQL API for
querying.

This approach provides the flexibility of a data lake with the structured querying power of a data warehouse.
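A minimal, hedged sketch of the lakehouse pattern with PySpark (for example in Microsoft Fabric or Azure Databricks); the paths and table names are placeholders.

# Minimal sketch: files in the data lake exposed as a SQL-queryable table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Raw data arrives as files in the data lake.
orders = spark.read.json("Files/landing/orders/")        # placeholder folder

# 2. Persist it as a managed (Delta) table so it gains a relational schema.
orders.write.mode("append").saveAsTable("orders")

# 3. Analysts can then query it with ordinary SQL.
spark.sql("SELECT CustomerId, SUM(Total) AS Revenue FROM orders GROUP BY CustomerId").show()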
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Explore Analytical Data Stores
Azure services for analytical stores

Azure offers several services to implement a large-scale analytical store, including:

• Microsoft Fabric
• Azure Databricks
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Explore Analytical Data Stores
Azure services for analytical stores
Azure offers several services to implement a large-scale analytical store including:
• Microsoft Fabric
Microsoft Fabric is a unified, end-to-end analytics platform for large-scale data analytics.

• It combines the reliability of a scalable SQL Server-based data warehouse with the flexibility of a
data lake and Apache Spark.
• It supports real-time log and telemetry analytics with Microsoft Fabric Real-Time Intelligence.
• It includes built-in data pipelines for data ingestion and transformation.

Each product experience within Microsoft Fabric, like the Data Factory Home, provides a central
location for managing and accessing items across multiple workspaces, making it an ideal choice for
creating a comprehensive analytics solution.
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Explore Analytical Data Stores
Azure services for analytical stores
Azure offers several services to implement a large-scale analytical store including:
• Azure Databricks
Azure Databricks is a cloud-based implementation of the popular Databricks platform.

• It's built on Apache Spark, offering powerful data analytics and data science capabilities.
• It provides native SQL support and optimized Spark clusters for efficient processing.
• It has an interactive user interface and notebooks for data exploration.

Azure Databricks is ideal for those with existing expertise in Databricks or those needing a
multi-cloud or cloud-portable solution.
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
• Introduction
• Describe data warehousing architecture
• Explore data ingestion pipelines
• Explore analytical data stores
• Exercise: Explore data analytics in Microsoft Fabric
• Knowledge check
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Exercise: Explore data analytics in Microsoft Fabric
https://microsoftlearning.github.io/DP-900T00A-Azure-Data-Fundamentals/Instructions/Labs/dp900-04b-fabric-lake-lab.html
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
• Introduction
• Describe data warehousing architecture
• Explore data ingestion pipelines
• Explore analytical data stores
• Exercise: Explore data analytics in Microsoft Fabric
• Knowledge check
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Knowledge check
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
Knowledge check
Microsoft Azure Data Fundamentals
DP-900
Module 1 • Explore core data concepts
• Explore data roles and services
Explore core data concepts
Module 2 • Explore fundamental relational data concepts
• Explore relational database services in Azure
Explore relational data
Module 3 • Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Explore non-relational data
Module 4 • Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
Explore data analytics in Azure • Explore fundamentals of data visualization
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
• Introduction
• Understand batch and stream processing
• Explore common elements of stream processing architecture
• Explore Microsoft Fabric Real-Time Intelligence
• Explore Apache Spark structured streaming
• Exercise: Explore Microsoft Fabric Real-Time Intelligence
• Knowledge check
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Introduction
Real-time processing in Azure enables the continuous ingestion, processing, and
analysis of streaming data as it arrives.

This approach helps organizations gain instant insights, detect trends, and
respond to events in real time.

Azure offers a range of services, such as Azure Stream Analytics, Azure Event
Hubs, and Azure Functions, to build scalable, low-latency solutions for real-time
analytics, empowering businesses to make faster, data-driven decisions.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Introduction
In this module, we will learn how to:

• Compare batch and stream processing
• Describe common elements of streaming data solutions
• Describe features and capabilities of Azure Stream Analytics
• Describe features and capabilities of Microsoft Fabric Real-Time Intelligence
• Describe features and capabilities of Spark Structured Streaming on Azure
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
• Introduction
• Understand batch and stream processing
• Explore common elements of stream processing architecture
• Explore Microsoft Fabric Real-Time Intelligence
• Explore Apache Spark structured streaming
• Exercise: Explore Microsoft Fabric Real-Time Intelligence
• Knowledge check
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Understand Batch And Stream Processing
Data processing is simply the conversion of raw data to meaningful information
through a process.

There are two general ways to process data:

• Batch processing: multiple data records are collected and stored
before being processed together in a single operation.

• Stream processing: a source of data is constantly monitored and
processed in real time as new data events occur.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Understand Batch And Stream Processing
• Batch Processing

1- Data is collected and stored.
2- The whole group is processed together as a batch.

Advantages:
• Large volumes of data can be processed at a convenient time.
• It can be scheduled to run at a time when computers or systems might otherwise
be idle, such as overnight or during off-peak hours.

Disadvantages:
• There is a time delay between ingesting the data and getting the results.
• All of a batch job's input data must be ready before the batch can be processed.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Understand Batch And Stream Processing
• Stream Processing
Data is processed immediately while being generated or ingested
Stream processing is ideal for time-critical operations that require an instant
real-time response
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Understand Batch And Stream Processing
Batch Processing vs Stream Processing
Data Scope:
• Batch: processes all the data in the dataset.
• Stream: works on the most recent data or within a time window (e.g., the last 30 seconds).

Data Size:
• Batch: handles large datasets efficiently.
• Stream: processes individual records or small micro-batches of a few records.

Performance:
• Batch: latency is high, typically in the order of hours.
• Stream: low latency; processes data in seconds or milliseconds.

Analysis:
• Batch: used for complex analytics and in-depth processing.
• Stream: suitable for simple responses, aggregates, or calculations like rolling averages.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Understand Batch And Stream Processing
Combine Batch & Stream Processing
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Understand Batch And Stream Processing
Combine Batch & Stream Processing
Many large-scale analytics solutions integrate batch processing and stream processing to
support both real-time and historical data analysis:
•Real-Time Dashboards: Stream processing captures and filters real-time data for immediate aggregation
and visualization (e.g., showing the total cars passing a road within the current hour).
•Historical Analysis: Processed stream data is persisted in a data store alongside batch-processed data for
in-depth analysis over time (e.g., analyzing traffic patterns over a year).
•Stream as Input for Batch: Even without real-time analysis, streaming can be used to collect data in real-
time and store it for later batch processing (e.g., capturing all cars in a parking lot for later counting).
This combination provides a flexible architecture for handling both immediate insights and
long-term trends in a unified solution.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
• Introduction
• Understand batch and stream processing
• Explore common elements of stream processing architecture
• Explore Microsoft Fabric Real-Time Intelligence
• Explore Apache Spark structured streaming
• Exercise: Explore Microsoft Fabric Real-Time Intelligence
• Knowledge check
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Common Elements Of Stream Processing Architecture
The simplest high-level architecture for stream processing looks like this:
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Common Elements Of Stream Processing Architecture
The simplest high-level architecture for stream processing looks like this:

First, an event generates some data:
• A signal from a sensor
• A social media post
• A new log file entry
• Or any other occurrence that results in some digital data.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Common Elements Of Stream Processing Architecture
The simplest high-level architecture for stream processing looks like this:

Then the generated data is captured in a streaming source for processing.

• In simple cases, the source may be a folder in a cloud data store or a table in a database.

• In more robust streaming solutions, the source may be a "queue" that encapsulates logic to
ensure that event data is processed in order and that each event is processed only once.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Common Elements Of Stream Processing Architecture
The simplest high-level architecture for stream processing looks like this:

Later, the event data is processed, often by a continuous query that operates on the event
data to:
• select data for specific types of events
• project data values
• aggregate data values over temporal (time-based) periods
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Common Elements Of Stream Processing Architecture
Azure Real-time Analytics Services
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Common Elements Of Stream Processing Architecture
Azure Sources For Stream Processing
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Common Elements Of Stream Processing Architecture
Azure Sinks (outputs) For Stream Processing
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
• Introduction
• Understand batch and stream processing
• Explore common elements of stream processing architecture
• Explore Microsoft Fabric Real-Time Intelligence
• Explore Apache Spark structured streaming
• Exercise: Explore Microsoft Fabric Real-Time Intelligence
• Knowledge check
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Microsoft Fabric Real-Time Intelligence
Microsoft Fabric Real-Time Intelligence empowers organizations to extract insights
and visualize data in motion.
Capabilities
• Provides an end-to-end solution for:
• Event-driven scenarios.
• Streaming data.
• Data logs.
• Offers no-code connectors for seamless integration of time-based data from
various sources.
• Supports Scalability, accommodating data sizes from gigabytes to petabytes.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Microsoft Fabric Real-Time Intelligence
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Microsoft Fabric Real-Time Intelligence
Real-time hub
The Microsoft Fabric Real-Time Hub is a centralized catalog of services for managing real-time data within an organization.

It provides:
• access, addition, exploration, and sharing of data.
• insights and visual clarity across domains by broadening data sources.
• data availability and accessibility.
• swift decision-making and informed actions.
• sharing of streaming data from diverse sources.
• comprehensive business intelligence across your organization.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Microsoft Fabric Real-Time Intelligence
Exploring data with Real-Time Intelligence

Steps to explore data with Real-Time Intelligence:

1- Choose a data stream from your organization or from connected external/internal sources.
2- Use Real-Time Intelligence's tools for data exploration.
3- Visualize data patterns, anomalies, and forecast quantities.

Real-Time Dashboards simplify data comprehension and are accessible to all via visual
tools, natural language, and Copilot.

You can then turn the insights into actions by setting up Reflex alerts to react
in real time.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
• Introduction
• Understand batch and stream processing
• Explore common elements of stream processing architecture
• Explore Microsoft Fabric Real-Time Intelligence
• Explore Apache Spark structured streaming
• Exercise: Explore Microsoft Fabric Real-Time Intelligence
• Knowledge check
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Apache Spark Structured Streaming
Apache Spark is a distributed framework for large-scale data analytics.

It supports parallel processing across cluster nodes, which run code (usually
written in Python, Scala, or Java), enabling efficient batch and stream data
processing.

On Microsoft Azure, Spark is available in Microsoft Fabric and Azure Databricks.

Spark Structured Streaming is a great choice for real-time analytics when you need to
incorporate streaming data into a Spark-based data lake or analytical data store.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Apache Spark Structured Streaming
The Spark Structured Streaming library provides an application programming interface (API) for ingesting,
processing, and outputting results from perpetual streams of data.

You use the Spark Structured Streaming API to:

1- Read data from a real-time data source, such as a Kafka hub, a file store, or a network port.
2- Encapsulate the stream in a boundless dataframe that is continually populated with new data from the stream.
3- Define a query on the dataframe that selects, projects, or aggregates the data.
4- Generate the results of the query as another dataframe, which can be persisted for analysis or further processing.

A minimal sketch of these steps follows below.
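The following is a minimal, hedged sketch of those four steps with PySpark Structured Streaming, using a simple file-based source; the folder path and schema are placeholders, and a production solution might read from Azure Event Hubs or Kafka instead.

# Minimal sketch of the four Structured Streaming steps (file source -> query -> sink).
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.getOrCreate()

# 1 + 2. Read a perpetual stream into a boundless dataframe (new JSON files keep arriving).
events = (spark.readStream
          .schema("deviceId STRING, reading DOUBLE, eventTime TIMESTAMP")
          .json("Files/streaming/incoming/"))            # placeholder folder

# 3. Define a query: average reading per device over one-minute windows.
averages = (events
            .groupBy(col("deviceId"), window(col("eventTime"), "1 minute"))
            .avg("reading"))

# 4. Write the continuously updated results to a sink for analysis.
query = (averages.writeStream
         .outputMode("complete")
         .format("memory")               # in-memory sink, handy for demos
         .queryName("device_averages")
         .start())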
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Explore Apache Spark Structured Streaming
The Spark runtimes in Microsoft Fabric and Azure Databricks include support for Delta Lake.

Delta Lake is an open-source storage layer that adds support for transactional consistency, schema enforcement,
and other common data warehousing features to data lake storage.

Capabilities:
• Unifies storage for streaming and batch data.
• Used in Spark to define relational tables for both batch and stream processing.
• Used as a streaming source for queries against real-time data.
• Used as a sink to which a stream of data is written (see the sketch below).

Delta Lake combined with Spark Structured Streaming is a good solution when
you need to abstract batch and stream processed data in a data lake behind a
relational schema for SQL-based querying and analysis.
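For instance, Delta tables can serve as both the source and the sink of a structured stream; a minimal, hedged sketch is below (a Fabric or Azure Databricks runtime is assumed, and the table paths are placeholders).

# Minimal sketch: Delta Lake as a streaming source and sink.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a Delta table as a streaming source and write the stream into another Delta table (the sink).
incoming = spark.readStream.format("delta").load("Tables/raw_events")            # placeholder path
stream = (incoming.writeStream
          .format("delta")
          .option("checkpointLocation", "Files/checkpoints/raw_to_curated")      # placeholder path
          .start("Tables/curated_events"))

# The same Delta table also serves ordinary batch queries behind a relational schema.
spark.read.format("delta").load("Tables/curated_events").show()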
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
• Introduction
• Understand batch and stream processing
• Explore common elements of stream processing architecture
• Explore Microsoft Fabric Real-Time Intelligence
• Explore Apache Spark structured streaming
• Exercise: Explore Microsoft Fabric Real-Time Intelligence
• Knowledge check
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Exercise: Explore Microsoft Fabric Real-Time Intelligence
https://microsoftlearning.github.io/DP-900T00A-Azure-Data-Fundamentals/Instructions/Labs/dp900-05c-fabric-realtime-lab.html
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
• Introduction
• Understand batch and stream processing
• Explore common elements of stream processing architecture
• Explore Microsoft Fabric Real-Time Intelligence
• Explore Apache Spark structured streaming
• Exercise: Explore Microsoft Fabric Real-Time Intelligence
• Knowledge check
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Knowledge check
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Knowledge check
Microsoft Azure Data Fundamentals
DP-900
Module 1 • Explore core data concepts
• Explore data roles and services
Explore core data concepts
Module 2 • Explore fundamental relational data concepts
• Explore relational database services in Azure
Explore relational data
Module 3 • Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Explore non-relational data
Module 4 • Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
Explore data analytics in Azure • Explore fundamentals of data visualization
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
• Introduction
• Describe Power BI tools and workflow
• Describe core concepts of data modeling
• Describe considerations for data visualization
• Exercise – Explore fundamentals of data visualization with Power BI
• Knowledge check
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Introduction
Data modeling and visualization are key to business intelligence, enabling effective reporting and
decision-making for organizational success.

This module introduces the core principles of analytical data modeling and visualization,
demonstrating their application using Microsoft Power BI.

In this module, we will learn how to:

• Describe a high-level process for creating reporting solutions with Microsoft Power BI
• Describe core principles of analytical data modeling
• Identify common types of data visualization and their uses
• Create an interactive report with Power BI Desktop
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
• Introduction
• Describe Power BI tools and workflow
• Describe core concepts of data modeling
• Describe considerations for data visualization
• Exercise – Explore fundamentals of data visualization with Power BI
• Knowledge check
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Power BI Tools And Workflow
Microsoft Power BI is a suite of tools and services within Microsoft Fabric that data analysts can
use to build interactive data visualizations for business users to consume.
Power BI Service
A cloud service in which reports can be published and interacted with by business users.

Using a web browser, you can do some basic data modeling and report editing directly in
the service, but the functionality is limited compared to the Power BI Desktop tool.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Power BI Tools And Workflow
Microsoft Power BI is a suite of tools and services within Microsoft Fabric that data analysts can
use to build interactive data visualizations for business users to consume.
Power BI Desktop
a Microsoft Windows application in
which you can import data from a
wide range of data sources, combine
and organize the data from these
sources in an analytics data model,
and create reports that contain
interactive visualizations of the data.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Power BI Tools And Workflow
Microsoft Power BI is a suite of tools and services within Microsoft Fabric that data analysts can
use to build interactive data visualizations for business users to consume.
Power BI phone app

Enables users to consume reports, dashboards, and apps in the Power BI service.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
• Introduction
• Describe Power BI tools and workflow
• Describe core concepts of data modeling
• Describe considerations for data visualization
• Exercise – Explore fundamentals of data visualization with Power BI
• Knowledge check
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Core Concepts Of Data Modeling
Analytical models organize data into structured formats for analysis.
A model uses related tables to define:

• Measures: numeric values to analyze, like sales revenue.
• Dimensions: entities by which measures are aggregated, like products, customers, or time.

Example:
Create a model to analyze the total revenue by customer or items sold by
product per month.

Conceptually, this creates a multidimensional structure, or "cube," where
intersections of dimensions represent aggregated measures for specific
combinations.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Core Concepts Of Data Modeling
Tables and schema
Dimension tables represent the entities by which you want to aggregate numeric measures.
Example: product or customer.
Each entity is represented by a row with a unique key value.

The columns represent attributes of an entity.
Example: products have names and categories, and customers have addresses and cities.

It's common in most analytical models to include a Time dimension so that you can aggregate
numeric measures associated with events over time.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Core Concepts Of Data Modeling
Tables and schema

The numeric measures that will be aggregated by the various dimensions in the model
are stored in fact tables.
Each row in a fact table represents a recorded event that has numeric measures
associated with it.
Example: the Sales table in this schema represents sales transactions for individual items, and includes numeric
values for quantity sold and revenue.

This is a star schema: the fact table is related to one or more dimension tables.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Core Concepts Of Data Modeling
Attribute hierarchies
Attribute hierarchies are useful when you want to quickly drill up or drill down to find aggregated
values at different levels in a hierarchical dimension.

The model can be built with pre-aggregated values for each level of a hierarchy, enabling you to
quickly change the scope of your analysis.

Example: view the total sales by year, and then drill down to see a more detailed
breakdown of total sales by month.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Core Concepts Of Data Modeling
Analytical modeling in Microsoft Power BI
Power BI offers tools to:

• Define an analytical model from tables of data.
• Use the Model tab of Power BI Desktop to define your analytical model:
  ▪ create relationships between fact and dimension tables,
  ▪ define hierarchies,
  ▪ set data types and display formats for fields in the tables,
  ▪ manage other properties of your data that help define a rich model for analysis.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
• Introduction
• Describe Power BI tools and workflow
• Describe core concepts of data modeling
• Describe considerations for data visualization
• Exercise – Explore fundamentals of data visualization with Power BI
• Knowledge check
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Considerations For Data Visualization
Power BI includes an extensive set of built-in visualizations that can be included in a report.

1- Tables and text

Tables and text are often the simplest way to communicate data.

Tables are useful when numerous related values must be displayed, and individual text
values in cards can be a useful way to show important figures or metrics.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Considerations For Data Visualization
Power BI includes an extensive set of built-in visualizations that can be included in a report.

2- Bar and column charts

Bar and column charts are a good way to visually compare numeric values for discrete categories.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Considerations For Data Visualization
Power BI includes an extensive set of built-in visualizations that can be included in a report.

3- Line charts

Line charts can also be used to compare categorized values and are useful when you need to
examine trends, often over time.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Considerations For Data Visualization
Power BI includes an extensive set of built-in visualizations that can be included in a report.

4- Pie charts

Pie charts are often used in business reports to visually compare categorized values as
proportions of a total.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Considerations For Data Visualization
Power BI includes an extensive set of built-in visualizations that can be included in a report.

5- Scatter plots

Scatter plots are useful when you want to compare two numeric measures and identify a
relationship or correlation between them.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Considerations For Data Visualization
Power BI includes an extensive set of built-in visualizations that can be included in a report.

6- Maps

Maps are a great way to visually compare values for different geographic areas or locations.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Describe Considerations For Data Visualization
Interactive Reports In Power BI

In Power BI, the visual elements for related data in a report are automatically linked to one
another and provide interactivity.

Example:
Selecting an individual category in one visualization will automatically filter and highlight
that category in other related visualizations in the report.

In the example report, the city Seattle has been selected in the Sales by City and Category
column chart, and the other visualizations are filtered to reflect values for Seattle only.
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
• Introduction
• Describe Power BI tools and workflow
• Describe core concepts of data modeling
• Describe considerations for data visualization
• Exercise – Explore fundamentals of data visualization with Power BI
• Knowledge check
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Exercise – Explore fundamentals of data visualization with Power BI
https://microsoftlearning.github.io/DP-900T00A-Azure-Data-Fundamentals/Instructions/Labs/dp900-pbi-06-lab.html
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
• Introduction
• Describe Power BI tools and workflow
• Describe core concepts of data modeling
• Describe considerations for data visualization
• Exercise – Explore fundamentals of data visualization with Power BI
• Knowledge check
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Knowledge check
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
Knowledge check
Microsoft Azure Data Fundamentals
DP-900
Module 1 • Explore core data concepts
• Explore data roles and services
Explore core data concepts
Module 2 • Explore fundamental relational data concepts
• Explore relational database services in Azure
Explore relational data
Module 3 • Explore Azure Storage for non-relational data
• Explore fundamentals of Azure Cosmos DB
Explore non-relational data
Module 4 • Explore fundamentals of large-scale analytics
• Explore fundamentals of real-time analytics
Explore data analytics in Azure • Explore fundamentals of data visualization