
How to test Azure Data Pipeline?
Shivanand Veerbhadrannavar
Innovative Technical Manager | Data Quality & Analysis Expert | ML and Strategic Data Management Leader
Published Oct 15, 2023
In today’s data-driven world, organisations often need to extract, transform, and load (ETL) data from various sources to support their business processes. Azure Data Factory (ADF) is a powerful cloud-based service from Microsoft that enables you to build and manage ETL pipelines at scale. In this blog, we will walk you through the process of using Azure Data Factory to validate an ETL pipeline.

Prerequisites

To follow along with this blog, you will need:

1. An Azure subscription: sign up for a free Azure account if you don’t have one
2. Azure Data Factory: create an Azure Data Factory instance in your Azure portal
3. Working knowledge of Azure cloud services and their uses, such as Storage accounts, SQL Server, and SQL Database

Steps to create a data pipeline in ADF

Step 1: Set up Azure Data Factory

1. Go to the Azure portal and create a new Azure Data Factory instance
2. Provide the necessary details such as name, subscription, resource group, and location
3. Once the Data Factory instance is created, navigate to it and click on the “Author & Monitor” button to open the Azure Data Factory user interface
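
The portal steps above can also be scripted. Below is a minimal, hedged sketch using the Azure SDK for Python (azure-identity and azure-mgmt-datafactory); the subscription ID, resource group, factory name, and region are placeholders, not values from this article.

```python
# pip install azure-identity azure-mgmt-datafactory
# Minimal sketch of Step 1 with the Azure SDK for Python.
# Subscription ID, resource group, factory name and region are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"
rg_name = "etl-test-rg"      # existing resource group (assumed)
df_name = "etl-test-adf"     # data factory name (assumed)

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, subscription_id)

# Create (or update) the Data Factory instance in the chosen region.
df = adf_client.factories.create_or_update(rg_name, df_name, Factory(location="eastus"))
print(f"Data Factory '{df.name}' provisioned, state: {df.provisioning_state}")
```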

Step 2: Create Linked Services

1. Click on the “Author” button in the Azure Data Factory user interface to start building your ETL pipeline
2. Begin by creating linked services, which represent the connection information to your data sources and destinations
3. Click on the “Manage” tab, select the desired type of data source or destination, and provide the required connection details. For instance, you can create linked services for Azure SQL Database, Azure Blob Storage, or an on-premises SQL Server
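
As a complement to the UI steps, here is a hedged sketch of registering the two linked services used later in this blog (Azure Blob Storage and Azure SQL Database) via the Python SDK; the connection strings and linked service names are assumptions.

```python
# Sketch of Step 2: register Blob Storage and Azure SQL linked services.
# Connection strings and names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobStorageLinkedService,
    AzureSqlDatabaseLinkedService, SecureString)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name = "etl-test-rg", "etl-test-adf"   # assumed names

# Linked service for the source Blob Storage account.
blob_ls = LinkedServiceResource(properties=AzureBlobStorageLinkedService(
    connection_string=SecureString(value="<blob-storage-connection-string>")))
adf_client.linked_services.create_or_update(rg_name, df_name, "BlobStorageLS", blob_ls)

# Linked service for the destination Azure SQL Database.
sql_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
    connection_string=SecureString(value="<azure-sql-connection-string>")))
adf_client.linked_services.create_or_update(rg_name, df_name, "AzureSqlLS", sql_ls)
```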
Step 3: Create Datasets

1. After setting up linked services, proceed to create datasets. Datasets define the structure and location of your source and destination data
2. Click on the “Author” tab and select the “Datasets” tab
3. Create a dataset for each data source and destination, specifying the format, location, and linked service information. For instance, you can create a dataset for a CSV file in Azure Blob Storage or a table in an Azure SQL Database
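
The same step can be scripted. The sketch below, with assumed container, file, and table names, defines a delimited-text (CSV) dataset in Blob Storage and an Azure SQL table dataset, referencing the linked services from the previous sketch.

```python
# Sketch of Step 3: one CSV source dataset and one SQL destination dataset.
# Container, file and table names are assumptions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DatasetResource, DelimitedTextDataset, AzureBlobStorageLocation,
    AzureSqlTableDataset, LinkedServiceReference)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name = "etl-test-rg", "etl-test-adf"

blob_ls_ref = LinkedServiceReference(type="LinkedServiceReference", reference_name="BlobStorageLS")
sql_ls_ref = LinkedServiceReference(type="LinkedServiceReference", reference_name="AzureSqlLS")

# Source: a delimited text file in an assumed container.
csv_ds = DatasetResource(properties=DelimitedTextDataset(
    linked_service_name=blob_ls_ref,
    location=AzureBlobStorageLocation(container="etltestsourcefiles", file_name="DimAccount.txt"),
    column_delimiter=",", first_row_as_header=True))
adf_client.datasets.create_or_update(rg_name, df_name, "SourceCsvDataset", csv_ds)

# Destination: an Azure SQL Database table (name assumed).
sql_ds = DatasetResource(properties=AzureSqlTableDataset(
    linked_service_name=sql_ls_ref, table_name="dbo.DimAccount"))
adf_client.datasets.create_or_update(rg_name, df_name, "TargetSqlDataset", sql_ds)
```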

Step 4: Build Pipelines

1. With the linked services and datasets in place, it’s time to create pipelines that define the ETL workflow
2. Click on the “Author” tab, select the “Pipelines” tab, and click on the “New pipeline” button
3. Drag and drop activities onto the pipeline canvas to define the ETL steps
4. Configure each activity based on the data movement or transformation required. For instance, you can use the “Copy data” activity to move data from a source dataset to a destination dataset, or the “Data Flow” activity to perform complex transformations using Azure Data Factory Data Flows
5. For reference, predefined Azure Data Factory pipeline templates allow you to get started quickly. Templates are especially useful when you’re new to Data Factory; they reduce the development time for building data integration projects, thereby improving developer productivity
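
For illustration, here is a hedged sketch of a pipeline with a single “Copy data” activity wired to the datasets from the previous sketch; the pre-copy truncate script and all names are assumptions that mirror Use Case 1 below.

```python
# Sketch of Step 4: a pipeline with one Copy activity (CSV -> Azure SQL).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, DelimitedTextSource, AzureSqlSink)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name = "etl-test-rg", "etl-test-adf"

copy_activity = CopyActivity(
    name="CopyCsvToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceCsvDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="TargetSqlDataset")],
    source=DelimitedTextSource(),
    # Truncate previously loaded data before each load (mirrors Use Case 1 below).
    sink=AzureSqlSink(pre_copy_script="TRUNCATE TABLE dbo.DimAccount"))

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyCsvToSqlPipeline", pipeline)
```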

Step 5: Monitor and Manage ETL Pipeline

1. After building the ETL pipeline, it’s crucial to monitor its execution and manage its performance
2. In the Azure Data Factory user interface, click on the “Monitor” button to access the monitoring dashboard
3. Monitor the pipeline runs, track data movement, and troubleshoot any issues that arise during execution
4. Utilise the integration with Azure Monitor and Azure Log Analytics for more advanced monitoring and analytics capabilities
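
Beyond the Monitor dashboard, runs can be inspected programmatically. The sketch below, with assumed resource names, queries the pipeline runs of the last 24 hours and drills into their activity runs and errors.

```python
# Sketch of Step 5: list recent pipeline runs and their activity runs.
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name = "etl-test-rg", "etl-test-adf"

# Look at everything that ran in the last 24 hours.
window = RunFilterParameters(
    last_updated_after=datetime.now(timezone.utc) - timedelta(days=1),
    last_updated_before=datetime.now(timezone.utc) + timedelta(hours=1))

runs = adf_client.pipeline_runs.query_by_factory(rg_name, df_name, window)
for run in runs.value:
    print(run.pipeline_name, run.run_id, run.status, run.message)
    # Drill into the individual activities of each run.
    activities = adf_client.activity_runs.query_by_pipeline_run(rg_name, df_name, run.run_id, window)
    for act in activities.value:
        print("  ", act.activity_name, act.status, act.error)
```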

Step 6: Test and Validate the Pipeline

1. Test the ETL pipeline by running it manually or scheduling it to run at specific intervals
2. Monitor the execution of the pipeline and verify that each activity completes successfully
3. Validate the data movement, transformation, and loading processes by checking the output in the destination system or running queries against the data
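
A hedged sketch of a manual test run follows: it triggers the pipeline, polls until the run finishes, and fails if the status is not “Succeeded”. Names match the earlier sketches and are assumptions.

```python
# Sketch of Step 6: trigger the pipeline, wait for completion, check status.
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name, pipeline_name = "etl-test-rg", "etl-test-adf", "CopyCsvToSqlPipeline"

run = adf_client.pipelines.create_run(rg_name, df_name, pipeline_name, parameters={})
status = "InProgress"
while status in ("Queued", "InProgress"):
    time.sleep(15)
    status = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id).status

print(f"Pipeline run {run.run_id} finished with status: {status}")
assert status == "Succeeded", "ETL pipeline run did not complete successfully"
```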

By following these steps, you can create an ETL pipeline in Azure Data Factory for testing practice. Remember to iterate and refine your pipeline based on feedback and continuously improve the testing process.

Validation scenarios

As a tester for an Azure Data Factory pipeline, your role is to ensure that the ETL process runs smoothly and that the data transformation is accurate and reliable. Before starting, it is good to have an understanding of the data model, the source-to-target mapping, the data dictionary, the metadata, and the readiness of the environment. Validations to consider when testing an Azure Data Factory pipeline range from connectivity and configuration checks to data completeness, integrity, and transformation accuracy; the sample use cases below illustrate several of them.
Remember to document your test cases, including inputs, expected outputs, and actual results, to facilitate tracking and issue resolution. Regular regression testing should also be conducted when changes or updates are made to the pipeline to ensure ongoing reliability and accuracy of the ETL process.
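
As one concrete example of such a validation, the sketch below performs a completeness (row count) check, comparing the source CSV in Blob Storage with the loaded SQL table; the container, file, table names, and connection strings are assumptions.

```python
# Sketch of a completeness check: source file row count vs. target table row count.
import csv, io
import pyodbc
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string("<blob-storage-connection-string>")
blob = blob_service.get_blob_client(container="etltestsourcefiles", blob="DimAccount.txt")
text = blob.download_blob().readall().decode("utf-8")
source_rows = sum(1 for _ in csv.reader(io.StringIO(text))) - 1  # minus header row

conn = pyodbc.connect("<azure-sql-odbc-connection-string>")
target_rows = conn.cursor().execute("SELECT COUNT(*) FROM dbo.DimAccount").fetchone()[0]

print(f"source rows: {source_rows}, target rows: {target_rows}")
assert source_rows == target_rows, "Row count mismatch between source file and target table"
```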

Sample Use Cases

Use Case 1: Copying Data from Azure Blob Storage to Azure SQL Database

Let’s consider an example where we want to copy data from a CSV file stored in Azure Blob Storage to an Azure SQL Database table.

1. Create linked services for Azure Blob Storage and Azure SQL Database, providing the necessary connection information
2. Create datasets for the source CSV file and the destination SQL Database table, specifying the respective linked services and file formats
3. Build a pipeline

The pipeline first truncates the table containing the previously loaded data and then loads the newly received CSV file defined by the input dataset. In this case, both activities complete successfully.
You may also consider some negative scenarios, such as running the pipeline with a missing source file or a missing target table. Appropriate error messages should be generated, with meaningful information for troubleshooting.
Missing target table:

Missing input file:
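
Such negative scenarios can be automated. Below is a hedged pytest-style sketch that runs the copy pipeline while the source file is absent and asserts that the run fails with error details; the precondition setup is not shown and all names are assumptions.

```python
# Sketch of a negative test: the run should fail clearly when the source file is missing.
import time
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

def test_pipeline_fails_clearly_when_source_file_is_missing():
    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
    rg, df = "etl-test-rg", "etl-test-adf"

    # Precondition (not shown): DimAccount.txt has been removed from the source container.
    run = adf_client.pipelines.create_run(rg, df, "CopyCsvToSqlPipeline", parameters={})
    status = "InProgress"
    while status in ("Queued", "InProgress"):
        time.sleep(15)
        status = adf_client.pipeline_runs.get(rg, df, run.run_id).status

    assert status == "Failed"

    # The failing activity should report an error that points at the missing file.
    window = RunFilterParameters(
        last_updated_after=datetime.now(timezone.utc) - timedelta(hours=1),
        last_updated_before=datetime.now(timezone.utc) + timedelta(hours=1))
    activities = adf_client.activity_runs.query_by_pipeline_run(rg, df, run.run_id, window)
    errors = [a.error for a in activities.value if a.status == "Failed"]
    assert errors, "Expected at least one failed activity with error details"
```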


Suggestions:
All defined columns will have a fixed mapping by default. A fixed mapping takes a defined, incoming column and maps it to an exact name.
If necessary, use schema drift throughout your flow to protect against schema changes from the sources.
When schema drift is enabled, all incoming fields are read from your source during execution and passed through the entire flow to the sink. By default, all newly detected columns, known as drifted columns, arrive as a string data type. If you want your data flow to automatically infer the data types of drifted columns, select “Infer drifted column types” in your source settings.
Use Case 2: Delta data loading from SQL DB

Delta load refers to the process of identifying and loading only the changed or new data since the last execution of the pipeline. It involves comparing the source data with the previously processed data and selecting the records that meet specific criteria, such as updated timestamps or new identifiers.
The pipeline below is built to load delta records; in this run, three rows are identified as the delta.
By implementing delta load in your ADF pipeline, you can efficiently process incremental data updates, reduce processing time and resources, and keep your destination system synchronised with the changes happening in the source system.
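
The watermark pattern commonly used for such delta loads can be sketched as follows; the control table (dbo.WatermarkTable), source table (dbo.SourceOrders), and ModifiedDate column are hypothetical names, not from this article.

```python
# Sketch of the watermark pattern behind a delta load.
import pyodbc

conn = pyodbc.connect("<azure-sql-odbc-connection-string>")
cur = conn.cursor()

# Read the high-water mark recorded after the previous run.
last_watermark = cur.execute(
    "SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'dbo.SourceOrders'"
).fetchone()[0]

# Only the rows changed since the previous run form the delta.
delta_rows = cur.execute(
    "SELECT * FROM dbo.SourceOrders WHERE ModifiedDate > ?", last_watermark
).fetchall()
print(f"{len(delta_rows)} delta rows identified since {last_watermark}")

# After the pipeline has loaded the delta into the destination, advance the watermark.
cur.execute(
    "UPDATE dbo.WatermarkTable SET WatermarkValue = "
    "(SELECT MAX(ModifiedDate) FROM dbo.SourceOrders) "
    "WHERE TableName = 'dbo.SourceOrders'")
conn.commit()
```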
Use Case 3: File archive

File archiving involves moving files from the primary storage location to a separate storage container or archive. This process is typically performed to free up space in the primary storage and ensure long-term retention of files while still allowing access if needed.
Check that the DimAccount.txt file was deleted from the blob storage container ‘etltestsourcefiles’ and moved to the archive folder ‘archivedfiles’.
By incorporating file archiving in Azure Data Pipeline creation, organisations can effectively manage data retention, improve system performance, ensure compliance, and enable efficient data backup and recovery.
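
This outcome can be verified with a short script. The sketch below, which assumes ‘archivedfiles’ is a separate container and uses a placeholder connection string, checks that the file is gone from the source container and present in the archive.

```python
# Sketch of verifying the archive step after the pipeline run.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<blob-storage-connection-string>")
source_blob = service.get_blob_client(container="etltestsourcefiles", blob="DimAccount.txt")
archive_blob = service.get_blob_client(container="archivedfiles", blob="DimAccount.txt")

assert not source_blob.exists(), "File is still present in the source container"
assert archive_blob.exists(), "File was not found in the archive container"
print("Archive verified: DimAccount.txt moved from 'etltestsourcefiles' to 'archivedfiles'")
```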

Challenges and Solutions

As the tester responsible for testing the Azure data pipeline, I wanted to communicate the potential challenges that we may face during the testing process and how we plan to overcome them.
Conclusion

Azure Data Factory provides a robust platform for building data pipelines, allowing testers to define validation checkpoints at different stages of the pipeline. From connectivity and configuration checks to data completeness, integrity, and transformation accuracy, testers can design comprehensive test cases to verify the data movement, transformations, and loading processes.
Effective data validation in Azure Data Pipelines requires meticulous planning, well-designed test cases, and continuous monitoring. Testers should validate the data against expected results, check for any discrepancies or errors, and ensure compliance with industry regulations and security standards.
In conclusion, Azure Data Pipeline provides a powerful platform for testers to perform data validations, ensuring the accuracy, integrity, and quality of the data being processed. With the right approach, thorough testing, and adherence to best practices, testers can contribute to the success of Azure Data Factory projects and enable organisations to make informed decisions based on trustworthy data.
