
How to test Azure Data Pipeline?
Shivanand Veerbhadrannavar
Innovative Technical Manager | Data Quality & Analysis Expert | ML and Strategic Data Management Leader
Published Oct 15, 2023
In today’s data-driven world, organisations often need to extract, transform, and load (ETL) data from various sources to support their business processes. Azure Data Factory (ADF) is a powerful cloud-based service from Microsoft that enables you to build and manage ETL pipelines at scale. In this blog, we will walk you through the process of using Azure Data Factory to validate an ETL pipeline.

Prerequisites

To follow along with this blog, you will need:

1. An Azure subscription: sign up for a free Azure account if you don’t have one
2. Azure Data Factory: create an Azure Data Factory instance in your Azure portal
3. Working knowledge of Azure cloud services and their uses, such as Storage accounts, SQL Server, and SQL Database

Steps to create a data pipeline in ADF

Step 1: Set up Azure Data Factory

1. Go to the Azure portal and create a new Azure Data Factory instance
2. Provide the necessary details such as name, subscription, resource group, and location
3. Once the Data Factory instance is created, navigate to it and click on the “Author & Monitor” button to open the Azure Data Factory user interface
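
The portal steps above can also be scripted. Below is a minimal, hedged sketch using the Azure SDK for Python (azure-identity and azure-mgmt-datafactory); the subscription ID, resource group, factory name, and region are placeholders, not values from this article.

```python
# pip install azure-identity azure-mgmt-datafactory
# Minimal sketch of Step 1 with the Azure SDK for Python.
# Subscription ID, resource group, factory name and region are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"
rg_name = "etl-test-rg"      # existing resource group (assumed)
df_name = "etl-test-adf"     # data factory name (assumed)

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, subscription_id)

# Create (or update) the Data Factory instance in the chosen region.
df = adf_client.factories.create_or_update(rg_name, df_name, Factory(location="eastus"))
print(f"Data Factory '{df.name}' provisioned, state: {df.provisioning_state}")
```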

Step 2: Create Linked Services

1. Click on the “Author” button in the Azure Data Factory user interface to start building your ETL pipeline
2. Begin by creating linked services, which represent the connection information to your data sources and destinations
3. Click on the “Manage” tab, select the desired type of data source or destination, and provide the required connection details. For instance, you can create linked services for Azure SQL Database, Azure Blob Storage, or an on-premises SQL Server
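
As a complement to the UI steps, here is a hedged sketch of registering the two linked services used later in this blog (Azure Blob Storage and Azure SQL Database) via the Python SDK; the connection strings and linked service names are assumptions.

```python
# Sketch of Step 2: register Blob Storage and Azure SQL linked services.
# Connection strings and names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobStorageLinkedService,
    AzureSqlDatabaseLinkedService, SecureString)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name = "etl-test-rg", "etl-test-adf"   # assumed names

# Linked service for the source Blob Storage account.
blob_ls = LinkedServiceResource(properties=AzureBlobStorageLinkedService(
    connection_string=SecureString(value="<blob-storage-connection-string>")))
adf_client.linked_services.create_or_update(rg_name, df_name, "BlobStorageLS", blob_ls)

# Linked service for the destination Azure SQL Database.
sql_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
    connection_string=SecureString(value="<azure-sql-connection-string>")))
adf_client.linked_services.create_or_update(rg_name, df_name, "AzureSqlLS", sql_ls)
```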
Step 3: Create Datasets

1. After setting up linked services, proceed to create datasets. Datasets define the structure and location of your source and destination data
2. Click on the “Author” tab and select the “Datasets” tab
3. Create a dataset for each data source and destination, specifying the format, location, and linked service information. For instance, you can create a dataset for a CSV file in Azure Blob Storage or a table in an Azure SQL Database
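
The same step can be scripted. The sketch below, with assumed container, file, and table names, defines a delimited-text (CSV) dataset in Blob Storage and an Azure SQL table dataset, referencing the linked services from the previous sketch.

```python
# Sketch of Step 3: one CSV source dataset and one SQL destination dataset.
# Container, file and table names are assumptions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DatasetResource, DelimitedTextDataset, AzureBlobStorageLocation,
    AzureSqlTableDataset, LinkedServiceReference)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name = "etl-test-rg", "etl-test-adf"

blob_ls_ref = LinkedServiceReference(type="LinkedServiceReference", reference_name="BlobStorageLS")
sql_ls_ref = LinkedServiceReference(type="LinkedServiceReference", reference_name="AzureSqlLS")

# Source: a delimited text file in an assumed container.
csv_ds = DatasetResource(properties=DelimitedTextDataset(
    linked_service_name=blob_ls_ref,
    location=AzureBlobStorageLocation(container="etltestsourcefiles", file_name="DimAccount.txt"),
    column_delimiter=",", first_row_as_header=True))
adf_client.datasets.create_or_update(rg_name, df_name, "SourceCsvDataset", csv_ds)

# Destination: an Azure SQL Database table (name assumed).
sql_ds = DatasetResource(properties=AzureSqlTableDataset(
    linked_service_name=sql_ls_ref, table_name="dbo.DimAccount"))
adf_client.datasets.create_or_update(rg_name, df_name, "TargetSqlDataset", sql_ds)
```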

Step 4: Build Pipelines

1. With the linked services and datasets in place, it’s time to create pipelines that define the ETL workflow
2. Click on the “Author” tab, select the “Pipelines” tab, and click on the “New pipeline” button
3. Drag and drop activities onto the pipeline canvas to define the ETL steps
4. Configure each activity based on the data movement or transformation required. For instance, you can use the “Copy data” activity to move data from a source dataset to a destination dataset, or the “Data Flow” activity to perform complex transformations using Azure Data Factory Data Flows
5. For reference, predefined Azure Data Factory pipeline templates allow you to get started quickly. Templates are especially useful when you’re new to Data Factory; they reduce the development time for building data integration projects, thereby improving developer productivity
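
For illustration, here is a hedged sketch of a pipeline with a single “Copy data” activity wired to the datasets from the previous sketch; the pre-copy truncate script and all names are assumptions that mirror Use Case 1 below.

```python
# Sketch of Step 4: a pipeline with one Copy activity (CSV -> Azure SQL).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, DelimitedTextSource, AzureSqlSink)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name = "etl-test-rg", "etl-test-adf"

copy_activity = CopyActivity(
    name="CopyCsvToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceCsvDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="TargetSqlDataset")],
    source=DelimitedTextSource(),
    # Truncate previously loaded data before each load (mirrors Use Case 1 below).
    sink=AzureSqlSink(pre_copy_script="TRUNCATE TABLE dbo.DimAccount"))

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyCsvToSqlPipeline", pipeline)
```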

Step 5: Monitor and Manage ETL Pipeline

1. After building the ETL pipeline, it’s crucial to monitor its execution and manage its performance
2. In the Azure Data Factory user interface, click on the “Monitor” button to access the monitoring dashboard
3. Monitor the pipeline runs, track data movement, and troubleshoot any issues that arise during execution
4. Utilise the integration with Azure Monitor and Azure Log Analytics for more advanced monitoring and analytics capabilities
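
Beyond the Monitor dashboard, runs can be inspected programmatically. The sketch below, with assumed resource names, queries the pipeline runs of the last 24 hours and drills into their activity runs and errors.

```python
# Sketch of Step 5: list recent pipeline runs and their activity runs.
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name = "etl-test-rg", "etl-test-adf"

# Look at everything that ran in the last 24 hours.
window = RunFilterParameters(
    last_updated_after=datetime.now(timezone.utc) - timedelta(days=1),
    last_updated_before=datetime.now(timezone.utc) + timedelta(hours=1))

runs = adf_client.pipeline_runs.query_by_factory(rg_name, df_name, window)
for run in runs.value:
    print(run.pipeline_name, run.run_id, run.status, run.message)
    # Drill into the individual activities of each run.
    activities = adf_client.activity_runs.query_by_pipeline_run(rg_name, df_name, run.run_id, window)
    for act in activities.value:
        print("  ", act.activity_name, act.status, act.error)
```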

Step 6: Test and Validate the Pipeline

1. Test the ETL pipeline by running it manually or scheduling it to run at specific intervals
2. Monitor the execution of the pipeline and verify that each activity completes successfully
3. Validate the data movement, transformation, and loading processes by checking the output in the destination system or running queries against the data
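
A hedged sketch of a manual test run follows: it triggers the pipeline, polls until the run finishes, and fails if the status is not “Succeeded”. Names match the earlier sketches and are assumptions.

```python
# Sketch of Step 6: trigger the pipeline, wait for completion, check status.
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
rg_name, df_name, pipeline_name = "etl-test-rg", "etl-test-adf", "CopyCsvToSqlPipeline"

run = adf_client.pipelines.create_run(rg_name, df_name, pipeline_name, parameters={})
status = "InProgress"
while status in ("Queued", "InProgress"):
    time.sleep(15)
    status = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id).status

print(f"Pipeline run {run.run_id} finished with status: {status}")
assert status == "Succeeded", "ETL pipeline run did not complete successfully"
```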

By following these steps, you can create an ETL pipeline in Azure Data Factory for testing practice. Remember to iterate and refine your pipeline based on feedback and continuously improve the testing process.

Validation scenarios

As a tester for an Azure Data Factory pipeline, your role is to ensure that the ETL process runs smoothly and that the data transformation is accurate and reliable. Before starting, it is good to have an understanding of the data model, the source-to-target mapping, the data dictionary, the metadata, and the readiness of the environment. Validations to consider when testing an Azure Data Factory pipeline range from connectivity and configuration checks to data completeness, integrity, and transformation accuracy; the sample use cases below illustrate several of them.
Remember to document your test cases, including inputs, expected outputs, and actual results, to facilitate tracking and issue resolution. Regular regression testing should also be conducted when changes or updates are made to the pipeline to ensure ongoing reliability and accuracy of the ETL process.
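
As one concrete example of such a validation, the sketch below performs a completeness (row count) check, comparing the source CSV in Blob Storage with the loaded SQL table; the container, file, table names, and connection strings are assumptions.

```python
# Sketch of a completeness check: source file row count vs. target table row count.
import csv, io
import pyodbc
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string("<blob-storage-connection-string>")
blob = blob_service.get_blob_client(container="etltestsourcefiles", blob="DimAccount.txt")
text = blob.download_blob().readall().decode("utf-8")
source_rows = sum(1 for _ in csv.reader(io.StringIO(text))) - 1  # minus header row

conn = pyodbc.connect("<azure-sql-odbc-connection-string>")
target_rows = conn.cursor().execute("SELECT COUNT(*) FROM dbo.DimAccount").fetchone()[0]

print(f"source rows: {source_rows}, target rows: {target_rows}")
assert source_rows == target_rows, "Row count mismatch between source file and target table"
```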

Sample Use Cases

Use Case 1: Copying Data from Azure Blob Storage to Azure SQL Database

Let’s consider an example where we want to copy data from a CSV file stored in Azure Blob Storage to an Azure SQL Database table.

1. Create linked services for Azure Blob Storage and Azure SQL Database, providing the necessary connection information
2. Create datasets for the source CSV file and the destination SQL Database table, specifying the respective linked services and file formats
3. Build a pipeline

The pipeline first truncates the table containing the previously loaded data and then loads the newly received CSV file defined by the input dataset. In this case, both activities complete successfully.
You may also consider some negative scenarios, such as running the pipeline with a missing source file or a missing target table. Appropriate error messages should be generated, with meaningful information for troubleshooting.
Missing target table:

Missing input file:
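
Such negative scenarios can be automated. Below is a hedged pytest-style sketch that runs the copy pipeline while the source file is absent and asserts that the run fails with error details; the precondition setup is not shown and all names are assumptions.

```python
# Sketch of a negative test: the run should fail clearly when the source file is missing.
import time
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

def test_pipeline_fails_clearly_when_source_file_is_missing():
    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
    rg, df = "etl-test-rg", "etl-test-adf"

    # Precondition (not shown): DimAccount.txt has been removed from the source container.
    run = adf_client.pipelines.create_run(rg, df, "CopyCsvToSqlPipeline", parameters={})
    status = "InProgress"
    while status in ("Queued", "InProgress"):
        time.sleep(15)
        status = adf_client.pipeline_runs.get(rg, df, run.run_id).status

    assert status == "Failed"

    # The failing activity should report an error that points at the missing file.
    window = RunFilterParameters(
        last_updated_after=datetime.now(timezone.utc) - timedelta(hours=1),
        last_updated_before=datetime.now(timezone.utc) + timedelta(hours=1))
    activities = adf_client.activity_runs.query_by_pipeline_run(rg, df, run.run_id, window)
    errors = [a.error for a in activities.value if a.status == "Failed"]
    assert errors, "Expected at least one failed activity with error details"
```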


Suggestions:
All defined columns will have a fixed mapping by default. A fixed mapping takes a defined, incoming column and maps it to an exact name.
If necessary, use schema drift throughout your flow to protect against schema changes from the sources.
When schema drift is enabled, all incoming fields are read from your source during execution and passed through the entire flow to the sink. By default, all newly detected columns, known as drifted columns, arrive as a string data type. If you want your data flow to automatically infer the data types of drifted columns, select “Infer drifted column types” in your source settings.
Use Case 2: Delta data loading from SQL DB

Delta load refers to the process of identifying and loading only the changed or new data since the last execution of the pipeline. It involves comparing the source data with the previously processed data and selecting the records that meet specific criteria, such as updated timestamps or new identifiers.
The pipeline below is built to load delta records; in this run, three rows are identified as the delta.
By implementing delta load in your ADF pipeline, you can efficiently process incremental data updates, reduce processing time and resources, and keep your destination system synchronised with the changes happening in the source system.
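
The watermark pattern commonly used for such delta loads can be sketched as follows; the control table (dbo.WatermarkTable), source table (dbo.SourceOrders), and ModifiedDate column are hypothetical names, not from this article.

```python
# Sketch of the watermark pattern behind a delta load.
import pyodbc

conn = pyodbc.connect("<azure-sql-odbc-connection-string>")
cur = conn.cursor()

# Read the high-water mark recorded after the previous run.
last_watermark = cur.execute(
    "SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'dbo.SourceOrders'"
).fetchone()[0]

# Only the rows changed since the previous run form the delta.
delta_rows = cur.execute(
    "SELECT * FROM dbo.SourceOrders WHERE ModifiedDate > ?", last_watermark
).fetchall()
print(f"{len(delta_rows)} delta rows identified since {last_watermark}")

# After the pipeline has loaded the delta into the destination, advance the watermark.
cur.execute(
    "UPDATE dbo.WatermarkTable SET WatermarkValue = "
    "(SELECT MAX(ModifiedDate) FROM dbo.SourceOrders) "
    "WHERE TableName = 'dbo.SourceOrders'")
conn.commit()
```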
Use Case 3: File archive

File archiving involves moving files from the primary storage location to a separate storage container or archive. This process is typically performed to free up space in the primary storage and ensure long-term retention of files while still allowing access if needed.
Check that the DimAccount.txt file was deleted from the blob storage container ‘etltestsourcefiles’ and moved to the archive folder ‘archivedfiles’.
By incorporating file archiving in Azure Data Pipeline creation, organisations can effectively manage data retention, improve system performance, ensure compliance, and enable efficient data backup and recovery.
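
This outcome can be verified with a short script. The sketch below, which assumes ‘archivedfiles’ is a separate container and uses a placeholder connection string, checks that the file is gone from the source container and present in the archive.

```python
# Sketch of verifying the archive step after the pipeline run.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<blob-storage-connection-string>")
source_blob = service.get_blob_client(container="etltestsourcefiles", blob="DimAccount.txt")
archive_blob = service.get_blob_client(container="archivedfiles", blob="DimAccount.txt")

assert not source_blob.exists(), "File is still present in the source container"
assert archive_blob.exists(), "File was not found in the archive container"
print("Archive verified: DimAccount.txt moved from 'etltestsourcefiles' to 'archivedfiles'")
```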

Challenges and Solutions

As the tester responsible for testing the Azure data pipeline, I wanted to communicate the potential challenges that we may face during the testing process and how we plan to overcome them.
Conclusion

Azure Data Factory provides a robust platform for building data pipelines, allowing testers to define validation checkpoints at different stages of the pipeline. From connectivity and configuration checks to data completeness, integrity, and transformation accuracy, testers can design comprehensive test cases to verify the data movement, transformations, and loading processes.
Effective data validation in Azure Data Pipelines requires meticulous planning, well-designed test cases, and continuous monitoring. Testers should validate the data against expected results, check for any discrepancies or errors, and ensure compliance with industry regulations and security standards.
In conclusion, Azure Data Pipeline provides a powerful platform for testers to perform data validations, ensuring the accuracy, integrity, and quality of the data being processed. With the right approach, thorough testing, and adherence to best practices, testers can contribute to the success of Azure Data Factory projects and enable organisations to make informed decisions based on trustworthy data.
