Microsoft Azure Data Fundamentals (DP-900)
Wajih Khelifi
DDS - BSc - MSc
Module 1 • Explore core data concepts
• Explore data roles and services
Explore core data concepts
Data is classified as
• structured
• semi-structured
• unstructured
Module 1: Explore core data concepts
1- Explore Core Data Concepts
Identify data formats
Structured Data:
data that adheres to a fixed schema, so all of the data has the same fields or
properties.
Structured data is often stored in a relational database in which multiple tables can reference one another by using key values.
Semi-structured Data:
Information that has some structure, but which allows for some variation between entity instances. The representation of semi-structured data is flexible.
Common formats for semi-structured data include JSON and XML.
• Introduction
• Identify data formats
• Explore file storage
• Explore databases
• Explore transactional data processing
• Explore analytical data processing
• Knowledge check
Explore File Storage
The ability to store data in files is essential for any computing system.
Files can be stored on a local file system, on a network file share, or in cloud storage.
Over time, some specialized file formats that enable compression, indexing, and
efficient storage and processing have been developed.
Some common optimized file formats you might see include Avro, ORC, and Parquet
Explore File Storage: Optimized File Formats
Avro is a row-based format created by Apache. Each record contains a header that describes the structure of the data in the record; the header is stored as JSON, and the data is stored as binary information. An application uses the information in the header to parse the binary data and extract the fields it contains.
Avro is a good format for compressing data and minimizing storage and network bandwidth requirements.
ORC (Optimized Row Columnar format) organizes data into columns rather than rows. An ORC file contains stripes of data. Each stripe holds the data for a column or set of columns.
A stripe contains an index into the rows in the stripe, the data for each row, and a footer that holds
statistical information (count, sum, max, min, and so on) for each column.
Parquet is a columnar storage format designed for efficient data storage and processing in big data applications, offering efficient storage, fast read and write operations, and strong support for complex data types.
Example
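As a toy illustration (plain Python only, not actual Parquet I/O, which would use a library such as pyarrow), the difference between a row-oriented and a column-oriented layout looks like this:

```python
# Row-oriented vs column-oriented layout: a toy illustration of why
# columnar formats such as Parquet suit analytical queries.
rows = [
    {"product": "A", "region": "East", "sales": 100},
    {"product": "B", "region": "West", "sales": 150},
    {"product": "A", "region": "West", "sales": 200},
]

# Columnar layout: one contiguous list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate over one column touches only that column's values,
# instead of scanning every field of every row.
total_sales = sum(columns["sales"])
print(total_sales)  # 450
```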
Explore databases
Non-relational databases are data management systems that don’t
apply a relational schema to the data.
CRUD operations:
Data records are created, retrieved, updated, and deleted
To ensure the integrity of the data stored in the database, OLTP systems enforce transactions that support so-called ACID semantics.
Explore transactional data processing
ACID semantics:
• Atomicity: each transaction is treated as a single unit, which succeeds completely or fails completely.
• Consistency: transactions can only take the data in the database from one valid state to another.
• Isolation: concurrent transactions cannot interfere with one another, and must result in a consistent database state.
• Durability: when a transaction has been committed, it will remain committed.
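The atomicity property can be sketched with Python's built-in sqlite3 module; the accounts table, the amounts, and the simulated failure are illustrative:

```python
import sqlite3

# Atomicity sketch: a transfer either succeeds completely or fails
# completely. Here a failure between the debit and the credit causes
# the whole transaction to roll back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 50.0)])
conn.commit()

try:
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 70 "
                     "WHERE name = 'alice'")
        # simulate a crash before the matching credit runs
        raise RuntimeError("simulated failure before the credit step")
except RuntimeError:
    pass

# The debit was rolled back, so the database is still in a valid state.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100.0, 'bob': 50.0}
```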
Explore analytical data processing
Analytical data processing typically uses read-only systems that store
vast volumes of historical data or business metrics.
The most common architecture for enterprise-scale analytics looks like this:
1- Data files may be stored in a central data lake for analysis.
2- An ETL process copies data from files and OLTP databases into a data warehouse that is optimized for read activity.
3- Data in the data warehouse may be aggregated and loaded into an online analytical processing (OLAP) model.
4- The data in the data lake, data warehouse, and analytical model can be queried to produce reports, visualizations, and dashboards.
Data warehouses are an established way to store data in a relational schema that is optimized for read operations, primarily queries to support reporting and data visualization.
Data lakes are common in large-scale data analytical processing scenarios, where a large volume of file-based data must be collected and analyzed.
Different types of user might perform data analytical work at different stages
of the overall architecture. For example:
•Data scientists might work directly with data files in a data lake to explore
and model data.
•Data Analysts might query tables directly in the data warehouse to produce
complex reports and visualizations.
•Business users might consume pre-aggregated data in an analytical model
in the form of reports or dashboards.
Knowledge check
Database Administrator
They're responsible for the overall availability and consistent performance and optimization of
databases.
They work with stakeholders to implement policies, tools, and processes for backup and recovery plans
to recover following a natural disaster or human-made error.
The database administrator is also responsible for managing the security of the data in the database,
granting privileges over the data, granting or denying access to users as appropriate.
Module 1: Explore core data concepts
2- Explore Data Roles And Services
Explore job roles in the world of data
The three key job roles that deal with data in most organizations are:
Data Engineer
A data engineer collaborates with stakeholders to design and implement data-related workloads,
including data ingestion pipelines, cleansing and transformation activities, and data stores for
analytical workloads.
They use a wide range of data platform technologies, including relational and non-relational databases,
file stores, and data streams.
They're also responsible for ensuring that the privacy of data is maintained, spanning from on-premises to cloud data stores.
They own the management and monitoring of data pipelines to ensure that data loads perform as
expected.
Explore job roles in the world of data
The three key job roles that deal with data in most organizations are:
Data Analyst
A data analyst enables businesses to maximize the value of their data assets.
They're responsible for exploring data to identify trends and relationships, designing and building
analytical models, and enabling advanced analytics capabilities through reports and visualizations.
A data analyst processes raw data into meaningful insights based on identified business requirements.
There are additional data-related roles, such as data scientist and data architect; and there are other technical
professionals that work with data, including application developers and software engineers.
• Introduction
• Explore job roles in the world of data
• Identify data services
• Knowledge check
Identify data services
Microsoft Azure is a cloud platform that powers the applications and IT infrastructure for
some of the world's largest organizations.
It includes many services to support cloud solutions, including transactional and analytical
data workloads.
Some of the most commonly used cloud services for data are described below.
Knowledge check
Introduction
In the early years of computing, different applications used unique data structures, which were
inefficient and hard to maintain, leading to the development of the relational database model.
This model uses tables to store and query data in a standardized, efficient way, and is widely used
across organizations to manage structured, related information.
In a relational database, you model collections of entities from the real world as tables.
An entity can be anything about which you want to record information; typically, objects and events.
A table contains rows, and each row represents a single instance of an entity.
Module 2: Explore Relational Data
1- Explore Fundamental Relational Data Concepts
Example
Each column in a table stores data of a specific datatype, for example:
• Text (string/char)
• Decimal numeric (float)
• Integer numeric
• Date/Time values
• Introduction
• Understand relational data
• Understand normalization
• Explore SQL
• Describe database objects
• Knowledge check
Understand Normalization
Normalization is a design process that minimizes data duplication and
enforces data integrity.
Notice that the customer and product details are duplicated for each individual item sold; and that the customer
name and postal address, and the product name and price are combined in the same spreadsheet cells.
Normalization changes the way the data is stored.
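As a minimal sketch (field names and values are illustrative), normalizing the "one big sheet" of order lines into separate customer, product, and order tables looks like this:

```python
# Normalization sketch: split denormalized order lines into separate
# tables so each customer and product fact is stored exactly once.
order_lines = [
    {"order": 1, "customer": "Contoso", "city": "Seattle",
     "product": "Widget", "price": 2.5},
    {"order": 2, "customer": "Contoso", "city": "Seattle",
     "product": "Gadget", "price": 4.0},
    {"order": 3, "customer": "Fabrikam", "city": "Denver",
     "product": "Widget", "price": 2.5},
]

# One row per customer and per product, keyed by name.
customers = {line["customer"]: {"city": line["city"]} for line in order_lines}
products = {line["product"]: {"price": line["price"]} for line in order_lines}

# Orders now reference customers and products by key instead of
# repeating their details.
orders = [{"order": line["order"], "customer": line["customer"],
           "product": line["product"]} for line in order_lines]

print(len(customers), len(products), len(orders))  # 2 2 3
```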
Explore SQL
SQL stands for Structured Query Language, and is used to
communicate with a relational database.
SQL statement types
SQL statements are grouped into three main logical groups:
• Data Definition Language (DDL)
• Data Control Language (DCL)
• Data Manipulation Language (DML)
The DROP statement is very powerful. When you drop a table, all the
rows in that table are lost.
Unless you have a backup, you won't be able to retrieve this data.
Data Definition Language (DDL) statements:
Example: CREATE
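A minimal CREATE example, sketched here with Python's built-in sqlite3 module (the table and column names are illustrative, and syntax details vary by database engine):

```python
import sqlite3

# DDL sketch: CREATE defines a new object in the database schema,
# here a table with three columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        ID    INTEGER PRIMARY KEY,
        Name  TEXT NOT NULL,
        Price REAL
    )
""")

# The table now exists in the schema catalog, with no rows yet.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['Product']
```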
•Data Control Language (DCL) statements:
• GRANT: grant permission to perform specific actions.
• DENY: deny permission to perform specific actions.
• REVOKE: remove a previously granted permission.
Example: GRANT
•Data Manipulation Language (DML) statements:
We use DML statements to manipulate the rows in tables.
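A minimal sketch of the core DML statements, again using Python's built-in sqlite3 module (the table and values are illustrative):

```python
import sqlite3

# DML sketch: INSERT, UPDATE, and DELETE manipulate rows; SELECT reads them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT)")

conn.execute("INSERT INTO Product (ID, Name) VALUES (1, 'Widget')")
conn.execute("INSERT INTO Product (ID, Name) VALUES (2, 'Gadget')")
conn.execute("UPDATE Product SET Name = 'Super Widget' WHERE ID = 1")
conn.execute("DELETE FROM Product WHERE ID = 2")

names = [row[0] for row in conn.execute("SELECT Name FROM Product")]
print(names)  # ['Super Widget']
```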
DROP vs DELETE: DROP removes an entire table, including its definition and all its rows, whereas DELETE removes rows from a table but leaves the table definition in place.
Other database objects include:
• Views
• Stored Procedures
• Indexes
We can execute the stored procedure, passing the ID of the product and the new name to be assigned
Knowledge check
Introduction
Azure supports a range of database services that you can use to support new cloud applications or migrate
existing applications to the cloud.
SQL Server on Azure Virtual Machines:
Use this option when you need to migrate or extend an on-premises SQL Server solution and retain full control over all aspects of server and database configuration.
Module 2: Explore Relational Data
2- Explore relational database services in Azure
Azure SQL Managed Instance:
Use this option for most cloud migration scenarios, particularly when you need minimal changes to existing applications, or when you want to lift-and-shift an on-premises SQL Server instance and all its databases to the cloud.
Azure SQL Database:
Use this option for new cloud solutions, or to migrate applications that have minimal instance-level dependencies.
• Introduction
• Describe Azure SQL services and capabilities
• Describe Azure services for open-source databases
• Exercise: Explore Azure relational database services
• Knowledge check
MySQL is the leading open-source relational database for Linux, Apache, MySQL, and PHP (LAMP)
stack apps.
Azure Database for MySQL is a PaaS implementation of MySQL in the Azure cloud, based on the
MySQL Community Edition
Benefits:
• High availability features built-in.
• Predictable performance.
• Easy scaling that responds quickly to demand.
• Secure data, both at rest and in motion.
• Automatic backups and point-in-time restore for the last 35 days.
• Enterprise-level security and compliance with legislation.
• Monitoring functionality to add alerts, and to view metrics and logs.
The system uses pay-as-you-go pricing so you only pay for what you use.
• Built-in high availability with no additional cost.
• Predictable performance, using inclusive pay-as-you-go pricing.
• Scaling as needed within seconds.
• Secured protection of sensitive data at rest and in motion.
• Automatic backups and point-in-time restore for up to 35 days.
PostgreSQL is a hybrid relational-object database. You can store data in relational tables, but a
PostgreSQL database also enables you to store custom data types, with their own non-relational
properties.
Azure Database for PostgreSQL is a PaaS implementation of PostgreSQL in the Azure Cloud.
However, some features of on-premises PostgreSQL databases aren't available in Azure Database for PostgreSQL.
Benefits:
This service provides the same availability, performance, scaling, security, and administrative
benefits as the MySQL service.
It contains built-in failure detection and failover mechanisms, making it a highly available service.
Knowledge check
Azure Storage is a key service in Microsoft Azure, and enables a wide range of data storage scenarios and
solutions.
Azure Blob Storage is a service that enables you to store massive amounts of unstructured data as binary large objects (blobs). Blobs are stored in containers.
Azure supports three types of blob:
• Block blobs
• Page blobs
• Append blobs
Module 3: Explore Non-Relational Data
1- Explore Azure Storage For Non-Relational Data
Explore Azure blob storage
• Block blobs: handled as a set of blocks.
➢ Each block can be up to 4000 MiB in size.
➢ A block blob can contain up to 50,000 blocks, giving a maximum size of about 190.7 TiB (4000 MiB × 50,000 blocks).
The block is the smallest amount of data that can be read or written as an
individual unit.
Block blobs are best used to store discrete, large, binary objects that change
infrequently.
• Page blobs: a collection of fixed size 512-byte pages.
Azure uses page blobs to implement virtual disk storage for virtual machines
• Append blobs: a block blob optimized to support append operations.
The Hot tier: the default tier, used for blobs that are accessed frequently.
The Cool tier: for infrequently accessed data; it has lower performance and lower cost compared to the Hot tier.
You can migrate a blob from the Cool tier to the Hot tier, or vice-versa, depending on how frequently the blob is accessed.
The Archive tier: provides the lowest storage cost, but with the highest latency (hours). It's used for historical data that mustn't be lost but is required only rarely. Blobs in the Archive tier are effectively stored in an offline state; a blob in the Archive tier must first be rehydrated to the Hot or Cool tier, and you can read it only when the rehydration process is complete.
Blob Storage lifecycle management policies:
A lifecycle management policy can automatically move a blob from Hot to Cool, and then to the
Archive tier, as it ages and is used less frequently
The policy is based on the number of days since modification.
A lifecycle management policy can also arrange to delete outdated blobs.
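Such a policy is defined as JSON. The rule below is a sketch (the rule name, prefix filter, and day thresholds are illustrative): block blobs under logs/ move to Cool after 30 days without modification, to Archive after 90, and are deleted after 365.

```json
{
  "rules": [
    {
      "enabled": true,
      "name": "age-out-logs",
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["logs/"]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": {"daysAfterModificationGreaterThan": 30},
            "tierToArchive": {"daysAfterModificationGreaterThan": 90},
            "delete": {"daysAfterModificationGreaterThan": 365}
          }
        }
      }
    }
  ]
}
```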
• Introduction
• Explore Azure blob storage
• Explore Azure Data Lake Storage Gen2
• Explore Microsoft OneLake in Fabric
• Explore Azure Files
• Explore Azure Tables
• Exercise: Explore Azure Storage
• Knowledge check
Explore Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 is the newer version of the Azure Data Lake Storage service, providing hierarchical data storage for analytical data lakes, and is integrated into Azure Storage.
Advantages:
▪ The scalability of blob storage
▪ The cost-control of storage tiers
▪ The hierarchical file system capabilities
▪ The compatibility with major analytics systems of Azure Data Lake Store
To create an Azure Data Lake Storage Gen2 file system, you enable the Hierarchical Namespace option of an Azure Storage account when you create it, or you can upgrade an existing Azure Storage account to support Data Lake Storage Gen2.
OneLake is a single, unified, logical data lake designed for your entire organization.
OneLake:
➢ Supports any type of file and data (structured
or unstructured)
➢ Allows you to use the same data across
multiple analytical engines without data
movement or duplication.
Explore Microsoft OneLake in Fabric
Microsoft Fabric automatically provisions OneLake, built on Azure Data Lake Storage Gen2.
Azure Files enables you to share up to 100 TB of data in a single storage account.
The maximum size of a single file is 1 TB.
Azure File Storage supports up to 2000 concurrent connections per shared file
Explore Azure Files
How to upload files on Azure Files?
1- Azure portal: is a web-based interface provided by Microsoft Azure that allows users to
manage and interact with Azure resources.
You can manually upload, download, and manage files in Azure File Storage
2- AzCopy Utility: is a command-line tool designed for high-speed data transfer to and from
Azure Storage.
Useful for bulk uploads and downloads of files to/from Azure File Storage. It supports scripting and
automation, making it ideal for large-scale operations or repetitive tasks.
3- Azure File Sync: is a service that enables synchronization between on-premises file servers
and Azure File Storage.
Azure File Storage offers two performance tiers: Standard (uses hard disk-based storage) and Premium (uses solid-state disks for greater throughput, at higher cost).
Explore Azure Tables
Azure Table Storage stores data as entities in partitions. Partitioning helps to ensure fast access, organize data, and improve scalability and performance.
Partitions are independent from each other, and can grow or shrink as rows are added to,
or removed from, a partition. A table can contain any number of partitions.
When the partition key is used in the search criteria, this helps to narrow down the search
process and improves performance.
Azure Table Storage tables have no concept of foreign keys, relationships, stored
procedures, views, or other objects you might find in a relational database.
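The effect of a partition key can be sketched in plain Python (this models the idea only, not the Azure Tables API; the keys and entities are illustrative):

```python
# Partition-key lookup sketch: entities are grouped by partition key,
# so a query that supplies the partition key only scans one partition,
# and a point query (partition key + row key) goes straight to one entity.
partitions = {}

def insert(entity):
    partitions.setdefault(entity["PartitionKey"], {})[entity["RowKey"]] = entity

insert({"PartitionKey": "Seattle", "RowKey": "1", "name": "Alice"})
insert({"PartitionKey": "Seattle", "RowKey": "2", "name": "Bob"})
insert({"PartitionKey": "Denver",  "RowKey": "1", "name": "Carol"})

entity = partitions["Seattle"]["2"]  # point query: one partition, one row
print(entity["name"])  # Bob
```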
Exercise: Explore Azure Storage
Knowledge check
In this module, you will learn how to:
• Identify common types of analytical data store and related Azure services
•Provision Microsoft Fabric and use it to ingest, process, and query data
Module 4: Explore data analytics in Azure
1- Explore fundamentals of large-scale analytics
• Introduction
• Describe data warehousing architecture
• Explore data ingestion pipelines
• Explore analytical data stores
• Exercise: Explore data analytics in Microsoft Fabric
• Knowledge check
Describe Data Warehousing Architecture
Almost all large-scale data analytics architectures include the following steps:
1- Data Ingestion And Preprocessing
2- Store Data For Analysis
3- Analytical Data Model
4- Data Visualization
1- Data ingestion and preprocessing:
• ETL (Extract, Transform, Load): Data is transformed (cleaned, filtered, and restructured) before being loaded
into the analytical store.
• ELT (Extract, Load, Transform): Data is loaded into the store first and then transformed within it.
➔ In both cases, the goal is to optimize the data structure for analytical queries.
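The ETL-versus-ELT distinction can be sketched in plain Python, with lists standing in for the source system and the analytical store (the records and cleansing rule are illustrative):

```python
# ETL vs ELT sketch: the same transform runs either before loading
# (ETL) or inside the store after loading (ELT).
source = [{"amount": "100"}, {"amount": "-5"}, {"amount": "250"}]

def transform(records):
    # cleanse (drop invalid rows) and restructure (cast types)
    return [{"amount": int(r["amount"])} for r in records
            if int(r["amount"]) > 0]

# ETL: transform first, then load the cleaned data into the store.
etl_store = transform(source)

# ELT: load the raw data first, then transform it within the store.
elt_store = list(source)
elt_store = transform(elt_store)

print(etl_store == elt_store)  # True: same result, different place of work
```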
1- Data Ingestion And Preprocessing:
This combination ensures efficient handling and analysis of both historical and live data.
2- Analytical Data Store:
An Analytical Data Store is a place where data is stored to be analyzed and to find useful insights.
There are three main types:
1. Data Warehouses: These store data in Tables (like a spreadsheet) for Structured Data, such as sales
reports or customer records. Example: A company using a relational database to track yearly sales.
2. Data Lakes: These store All Types Of Data: Structured, Semi-structured, and Unstructured, such
as files, videos, or raw logs. Example: A social media platform saving raw images, text posts, and videos
in a data lake for later analysis.
3. Data Lakehouses: These combine the best of both: a data lake’s ability to store all kinds of data and
a data warehouse’s ability to analyze it efficiently. Example: An e-commerce site using a lakehouse to
manage both raw user activity logs and structured sales data.
3- Analytical Data Model:
An analytical data model is a way to organize data to make it easier for analysts to create reports,
dashboards, or visualizations. Instead of working directly with raw data, a model pre-summarizes the
data to save time and simplify analysis.
Often these data models are described as cubes, in which numeric data values are aggregated across
one or more dimensions (product, region, time).
Example: Imagine you want to find total sales by product and region. Instead of calculating it from
raw data every time, the model stores this information ready-made.
Drill-Down/Drill-Up: You can zoom in for more detail (e.g., sales for a specific region) or zoom out for
a broader view (e.g., total sales across all regions).
These models make exploring and understanding data faster and more interactive!
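A minimal sketch of pre-aggregation and drill-up across dimensions (the facts and dimension values are illustrative):

```python
from collections import defaultdict

# Analytical-model sketch: aggregate a numeric measure (sales) across
# dimensions (product, region), then "drill up" by collapsing a dimension.
facts = [
    ("Widget", "East", 100), ("Widget", "West", 200),
    ("Gadget", "East", 50),  ("Gadget", "West", 150),
]

cube = defaultdict(int)
for product, region, sales in facts:
    cube[(product, region)] += sales          # detail level, ready-made

by_product = defaultdict(int)
for (product, _region), sales in cube.items():
    by_product[product] += sales              # drilled up over region

print(by_product["Widget"], sum(by_product.values()))  # 300 500
```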
4- Data Visualization:
Data visualization is the process of turning data into VISUAL FORMATS like charts,
graphs, or dashboards to make it easier to understand and analyze.
•Who Uses It?
•Data analysts: Use data from analytical models or stores to create visual reports and dashboards.
•Non-technical users: Can perform self-service analysis to create their own reports using simple
tools.
•What Does It Show?
•Trends: E.g., how sales are increasing or decreasing over time.
•Comparisons: E.g., comparing performance across products or regions.
•Key Performance Indicators (KPIs): E.g., metrics like total revenue, customer retention rate, or
profit margins.
Formats:
•Printed reports or charts in documents.
•Slides in PowerPoint presentations.
•Interactive dashboards on the web where users can explore the data visually.
Data warehouses:
• Store data in a structured, relational format optimized for analytics.
• Use a star schema with fact tables and dimension tables (customer, product, time).
• Suited for structured transactional data; use SQL for querying.
• This approach allows for complex aggregations and analysis, making it ideal for business intelligence and reporting.
Data lakes:
• Store large volumes of raw, unstructured, semi-structured, or structured data in a distributed file system.
• Use a schema-on-read approach, where the structure is applied when the data is read, rather than when it's stored.
• Ideal for handling diverse data types and supporting advanced analytics or machine learning without requiring predefined schemas.
Explore Analytical Data Stores
Hybrid approaches: Data Lakehouses
A Data Lakehouse combines the features of both data lakes and data warehouses.
Raw data is stored as files in a data lake, and SQL analytics endpoints, such as those in Microsoft Fabric, expose the
data as tables, allowing you to query it with SQL.
This hybrid model adds relational storage capabilities to Spark-based systems, enabling schema enforcement,
transactional consistency, and support for both batch and streaming data sources by providing a SQL API for
querying.
This approach provides the flexibility of a data lake with the structured querying power of a data warehouse.
Azure services for analytical stores
• Microsoft Fabric
• Azure Databricks
Azure offers several services to implement a large-scale analytical store including:
• Microsoft Fabric
Microsoft Fabric is a unified, end-to-end analytics platform for large-scale data analytics.
• It combines the reliability of a scalable SQL Server-based data warehouse with the flexibility of a
data lake and Apache Spark.
• It supports real-time log and telemetry analytics with Microsoft Fabric Real-Time Intelligence.
• It includes built-in data pipelines for data ingestion and transformation.
Each product experience within Microsoft Fabric, like the Data Factory Home, provides a central
location for managing and accessing items across multiple workspaces, making it an ideal choice for
creating a comprehensive analytics solution.
• Azure Databricks
Azure Databricks is a cloud-based implementation of the popular Databricks platform.
• It’s built on Apache Spark, offering powerful data analytics and data science capabilities.
• It provides native SQL support and optimized Spark clusters for efficient processing.
• It has an interactive user interface and notebooks for data exploration.
Azure Databricks is ideal for those with existing expertise in Databricks or those needing a
multi-cloud or cloud-portable solution.
Exercise: Explore data analytics in Microsoft Fabric
https://microsoftlearning.github.io/DP-900T00A-Azure-Data-Fundamentals/Instructions/Labs/dp900-04b-fabric-lake-lab.html
Knowledge check
Real-time analytics helps organizations gain instant insights, detect trends, and
respond to events as they happen.
Azure offers a range of services, such as Azure Stream Analytics, Azure Event
Hubs, and Azure Functions, to build scalable, low-latency solutions for real-time
analytics, empowering businesses to make faster, data-driven decisions.
Module 4: Explore data analytics in Azure
2- Explore fundamentals of real-time analytics
Introduction
In this module, we will learn about:
• Batch processing, in which multiple data records are collected and stored
before being processed together in a single operation.
• Stream processing, in which data records are processed continuously as they arrive.
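The batch pattern can be sketched in plain Python (a minimal illustration, not an Azure API; all names here are hypothetical): records are collected into fixed-size groups first, then each group is processed together in a single operation.

```python
from typing import Callable, List

def run_batch(records: List[dict], batch_size: int,
              process: Callable[[List[dict]], float]) -> List[float]:
    """Collect records into fixed-size batches, then process each batch
    in a single operation -- the essence of batch processing."""
    results = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]  # records stored first...
        results.append(process(batch))             # ...then processed together
    return results

# Example: total sales per batch of 3 records
sales = [{"amount": a} for a in (10, 20, 30, 40, 50, 60)]
totals = run_batch(sales, 3, lambda b: sum(r["amount"] for r in b))
print(totals)  # [60, 150]
```

The trade-off shown here is the same one the slide describes: results are only available once a whole batch has been collected, whereas stream processing produces results continuously as each record arrives.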
Microsoft Fabric Real-Time Intelligence ensures:
• data access, addition, exploration, and sharing
• insights and visual clarity across domains by broadening data sources
• data availability and accessibility
• swift decision-making and informed actions
• sharing of streaming data from diverse sources
• comprehensive business intelligence across your organization
Explore Microsoft Fabric Real-Time Intelligence
Exploring data with real-time intelligence
You can then turn these insights into actions by setting up Reflex alerts that react
in real time.
• Introduction
• Understand batch and stream processing
• Explore common elements of stream processing architecture
• Explore Microsoft Fabric Real-Time Intelligence
• Explore Apache Spark structured streaming
• Exercise: Explore Microsoft Fabric Real-Time Intelligence
• Knowledge check
Explore Apache Spark Structured Streaming
Spark Structured Streaming is a great choice for real-time analytics when you need to
incorporate streaming data into a Spark-based data lake or analytical data store.
The Spark Structured Streaming library provides an application programming interface (API) for ingesting,
processing, and outputting results from perpetual streams of data.
Delta Lake capabilities:
• Unifies storage for streaming and batch data
• Can be used in Spark to define relational tables for both batch and stream processing
• Can be used as a streaming source for queries against real-time data
• Can be used as a sink to which a stream of data is written
Delta Lake combined with Spark Structured Streaming is a good solution when
you need to abstract batch and stream processed data in a data lake behind a
relational schema for SQL-based querying and analysis.
Exercise: Explore Microsoft Fabric Real-Time Intelligence
https://microsoftlearning.github.io/DP-900T00A-Azure-Data-Fundamentals/Instructions/Labs/dp900-05c-fabric-realtime-lab.html
Knowledge check
This module introduces the core principles of analytical data modeling and visualization,
demonstrating their application using Microsoft Power BI.
• Describe a high-level process for creating reporting solutions with Microsoft Power BI
• Describe core principles of analytical data modeling
• Identify common types of data visualization and their uses
• Create an interactive report with Power BI Desktop
Module 4: Explore data analytics in Azure
3- Explore Fundamentals Of Data Visualization
• Introduction
• Describe Power BI tools and workflow
• Describe core concepts of data modeling
• Describe considerations for data visualization
• Exercise – Explore fundamentals of data visualization with Power BI
• Knowledge check
Describe Power BI Tools And Workflow
Microsoft Power BI is a suite of tools and services within Microsoft Fabric that data analysts can
use to build interactive data visualizations for business users to consume.
Power BI Service
A cloud service in which reports can be published and interacted with by
business users.
Accessed through a web browser, the service offers some basic data modeling
and report editing directly in the browser, but this functionality is limited
compared to the Power BI Desktop tool.
Power BI Desktop
A Microsoft Windows application in which you can import data from a wide
range of data sources, combine and organize the data from those sources in
an analytical data model, and create reports containing interactive
visualizations of the data.
It’s common in most analytical models to include a Time dimension so that you can aggregate
numeric measures associated with events over time.
Describe Core Concepts Of Data Modeling
Tables and schema
This is a star schema: the fact table is related to one or more dimension
tables.
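A star schema can be sketched with a tiny SQL example using Python's built-in sqlite3 (table and column names are hypothetical): a FactSales table holds numeric measures plus key columns referencing DimProduct and DimDate dimension tables, and a star-join aggregates the measure by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes keyed by surrogate keys
    CREATE TABLE DimProduct (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE DimDate    (date_key INTEGER PRIMARY KEY, year INTEGER);
    -- Fact table: numeric measures plus foreign keys to each dimension
    CREATE TABLE FactSales  (product_key INTEGER, date_key INTEGER, amount REAL);

    INSERT INTO DimProduct VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO DimDate    VALUES (10, 2023), (11, 2024);
    INSERT INTO FactSales  VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 11, 30.0);
""")

# Star-join: aggregate the measure by attributes of two dimensions
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS total
    FROM FactSales f
    JOIN DimProduct p ON f.product_key = p.product_key
    JOIN DimDate    d ON f.date_key    = d.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(rows)  # [('Bikes', 2023, 100.0), ('Bikes', 2024, 150.0), ('Helmets', 2024, 30.0)]
```

Note that DimDate here is the Time dimension mentioned earlier: because every fact row carries a date key, numeric measures can be aggregated over time at whatever granularity the dimension records.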
Attribute hierarchies
Attribute hierarchies are useful when you want to quickly drill up or drill down to find aggregated
values at different levels of a hierarchical dimension.
The model can be built with pre-aggregated values for each level of a hierarchy, enabling you to
quickly change the scope of your analysis.
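The idea can be sketched in plain Python (the data and function names are hypothetical): pre-computing totals at each level of a Year > Month > Day hierarchy lets you switch between drill-up and drill-down views without rescanning detail rows.

```python
from collections import defaultdict

# Hypothetical sales events keyed by a date hierarchy: (year, month, day)
events = [((2024, 1, 5), 10), ((2024, 1, 9), 20), ((2024, 2, 1), 30),
          ((2025, 1, 3), 40)]

def aggregate(level: int) -> dict:
    """Pre-aggregate the measure at one level of the Year > Month > Day
    hierarchy: level 1 = Year, 2 = Year/Month, 3 = full date."""
    totals = defaultdict(int)
    for key, amount in events:
        totals[key[:level]] += amount  # truncate the key to the chosen level
    return dict(totals)

# Drill-up: totals by Year
print(aggregate(1))  # {(2024,): 60, (2025,): 40}
# Drill-down: totals by Year/Month
print(aggregate(2))  # {(2024, 1): 30, (2024, 2): 30, (2025, 1): 40}
```

A model would typically build all of these level aggregates once, so changing the scope of analysis is just a lookup at a different level rather than a recomputation.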
3- Line charts
4- Pie charts
5- Scatter plots
6- Maps
Example:
selecting an individual category in one visualization will
automatically filter and highlight that category in other
related visualizations in the report.
https://microsoftlearning.github.io/DP-900T00A-Azure-Data-Fundamentals/Instructions/Labs/dp900-pbi-06-lab.html
Knowledge check