Lab4 Data Quality

This document provides instructions for using SQL Server Data Quality Services (DQS) to profile and cleanse dirty customer data. The steps include: 1. Creating SSIS packages to profile data using the Data Profiling task and output results to an XML file. 2. Analyzing the profiling results to identify data quality issues. 3. Preparing clean and dirty customer data tables for cleansing. 4. Adding a DQS cleansing transformation to an SSIS data flow to cleanse dirty customer address data by mapping it to a reference data source.

Uploaded by

Mariem El Mechry

Report LAB4 SQL Server Data Quality Services

Prepared by:

Bouhtil Rania

El Maksour Imane

Academic year: 2022-2023

In this exercise, we will use the Data Profiling task to find inaccurate data in the
CustomersDirty view we created in the previous lab, within the
DQS_STAGING_DATA database.
1. Open Visual Studio (or SQL Server Data Tools (SSDT) for older versions). Create
a new SSIS project and solution.
2. Drag the Data Profiling task from the SSIS Toolbox (it should be in the Common
Tasks group) to the control flow working area. Right-click it and select Edit.

3. On the General tab, use the Destination Property drop-down list to select New File
Connection.

4. In the File Connection Manager Editor window, change the usage type to Create
File. In the File text box, type the file name ProfilingCustomers.xml.

5. When you are back in the Data Profiling Task Editor, on the General tab, change
the OverwriteDestination property to True to make it possible to re-execute the
package multiple times (otherwise you will get an error saying that the destination file
already exists when the package next executes).
6. In the lower-right corner of the Data Profiling Task Editor, on the General tab, click
the Quick Profile button.

7. In the Simple Table Quick Profiling Form dialog box, click the New button to create
a new ADO.NET connection. The Data Profiling task accepts only ADO.NET
connections.
8. Connect to your SQL Server instance by using Windows authentication, and select
the DQS_STAGING_DATA database. Click OK to return to the Simple Table Quick
Profiling Form dialog box.

9. Select the CustomersDirty view in the Table Or View drop-down list. Leave the
first four check boxes selected, as they are by default. Clear the Candidate Key
Profile check box, and select the Column Pattern Profile check box.

10. In the Data Profiling Task Editor window, in the Profile Type list on the right,
select different profiles and check their settings. Change the Column property for the
Column Value Distribution Profile Request from (*) to Occupation (you are going to
profile this column only). Change the ValueDistributionOption property for this
request to All-Values. In addition, change the value for the Column property of the
Column Pattern Profile Request from (*) to EmailAddress. Click OK.
11. Save the project. Execute the package.

12. When the execution finishes, check whether the XML file appears in the folder
you chose in step 4.
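
The check in step 12 can also be scripted. The following is a minimal sketch, not part of the lab itself: it confirms that the output file is well-formed XML and lists which element names it contains. It makes no assumption about the layout the Data Profiling task writes; the function name `count_elements` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xml_path):
    """Return a Counter of element names (namespace stripped) found in an XML file."""
    counts = Counter()
    # iterparse raises ET.ParseError if the file is not well-formed XML.
    for _event, element in ET.iterparse(xml_path):
        local_name = element.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix, if any
        counts[local_name] += 1
    return counts
```

Calling `count_elements("ProfilingCustomers.xml")` from the folder chosen in step 4 should return a non-empty counter if the package wrote the file correctly.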

1. Open the Data Profile Viewer, navigate to the ProfilingCustomers.xml file, and open
it. You can now start reviewing the results.
2. On the left, in the Profiles pane, select, for example, the Column Value Distribution
Profiles. In the upper-right pane, select the Occupation column. In the middle-right
window, you should see the distribution for the Occupation attribute. Click the value
that has very low frequency (the Profesional value). Find the drilldown button in the
upper-right corner of the middle-right window. Click it, and in the lower-right pane,
check the row with this suspicious value.

3. Check the Column Pattern Profiles. Note that for the EmailAddress column, the
Data Profiling task shows you the regular expression patterns for this column. Note
that these two regular expressions are the regular expressions you used when you
prepared a DQS knowledge base in the previous labs.

4. Also check the other profiles. When you are done checking, close the Data Profile
Viewer.
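
The two checks made in the viewer (a value with very low frequency in Occupation, and addresses that break the EmailAddress pattern) can be illustrated outside SSIS. This is a rough sketch under stated assumptions: the sample rows, the 1 percent threshold, and the regular expression are hypothetical and are not the patterns reported by the Data Profiling task.

```python
import re
from collections import Counter

# Hypothetical pattern: local part, "@", domain, dot, 2-3 letter suffix.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,3}")


def rare_values(values, max_share=0.01):
    """Return {value: count} for values whose share of all rows is at most max_share."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: n for v, n in counts.items() if n / total <= max_share}


def invalid_emails(addresses):
    """Return the addresses that do not fully match EMAIL_PATTERN."""
    return [a for a in addresses if not EMAIL_PATTERN.fullmatch(a)]


# Hypothetical sample data: one misspelled occupation, one malformed address.
occupations = ["Professional"] * 150 + ["Manual"] * 49 + ["Profesional"]
emails = [
    "jon24@adventure-works.com",
    "eugene10#adventure-works.com",
    "ruben35@adventure-works.ca",
]

print(rare_values(occupations))  # {'Profesional': 1}
print(invalid_emails(emails))    # ['eugene10#adventure-works.com']
```

A frequency threshold only flags candidates; as in the viewer, each suspicious value still needs a drill-down to the underlying rows before it is treated as an error.
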
Data Cleansing with SSIS
1. Open SSMS, connect to your SQL Server instance, open a new query window,
and change the context to the DQS_STAGING_DATA database.
2. Create a table for clean customer data. Name it CustomersCleanT. Include only
columns for the customer key, full name, and street address. Use the following code.

3. Populate the table with every tenth customer from the DimCustomer table from the
AdventureWorksDW database by using the following query.

4. Create a table with a structure similar to the one for CustomersCleanT and call it
CustomersDirtyT. Add two integer columns to this table called Updated and
CleanCustomerKey. The first one will be used by the query that makes the data dirty,
and the second one will hold the customer key from the clean table after identity
mapping (the process of linking records from an input data source to corresponding
records in a reference data source).
To create the dirty data, execute the queries in the createDiryData.sql file.

5. Check the dirty data after changes. A little bit more than 40 percent of data should
be updated. Because there is randomness in updates, you get a different number of
rows and different rows updated every time you run the code. You can check the
changes with the following query.
6. Finally, update the row for the customer with a key equal to -11010. Set the
FullName to jacquelyn suarez and StreetAddress to 7800 corrinne ct. This gives you
a row that can be corrected with the DQS Cleansing transformation in the next
practice. Use the following code.

7. Create a new table in the DQS_STAGING_DATA database in the dbo schema
and name it CustomersDirtyMatchT. Use the following code.

8. Add another new table in the dbo schema and name it CustomersDirtyNoMatchT.
Use the same schema as for the previous table.
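
The logic of the table-preparation steps above can be sketched end to end. This is not the lab's T-SQL: it is a small illustration that runs against an in-memory SQLite database through Python's `sqlite3` module. The stand-in DimCustomer table, its simplified columns and sample rows, the negated dirty keys, and the four fixed updates are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for AdventureWorksDW's DimCustomer, with simplified hypothetical columns.
conn.execute(
    "CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, FullName TEXT, StreetAddress TEXT)"
)
conn.executemany(
    "INSERT INTO DimCustomer VALUES (?, ?, ?)",
    [(key, f"Customer {key}", f"{key} Main St") for key in range(11000, 11100)],
)

# Steps 2-3: clean table holding every tenth customer (keys divisible by 10).
conn.execute(
    "CREATE TABLE CustomersCleanT (CustomerKey INTEGER PRIMARY KEY, FullName TEXT, StreetAddress TEXT)"
)
conn.execute(
    "INSERT INTO CustomersCleanT "
    "SELECT CustomerKey, FullName, StreetAddress FROM DimCustomer WHERE CustomerKey % 10 = 0"
)

# Step 4: dirty table with the two extra integer columns.
# Assumption: dirty keys are the clean keys negated, as the key -11010 in step 6 suggests.
conn.execute(
    "CREATE TABLE CustomersDirtyT (CustomerKey INTEGER PRIMARY KEY, FullName TEXT, "
    "StreetAddress TEXT, Updated INTEGER, CleanCustomerKey INTEGER)"
)
conn.execute(
    "INSERT INTO CustomersDirtyT "
    "SELECT -CustomerKey, FullName, StreetAddress, 0, NULL FROM CustomersCleanT"
)

# Stand-in for the random changes: lowercase the name in four fixed rows and flag them.
conn.execute(
    "UPDATE CustomersDirtyT SET FullName = LOWER(FullName), Updated = Updated + 1 "
    "WHERE CustomerKey IN (-11010, -11030, -11050, -11070)"
)

# Step 5: share of rows touched by the changes.
updated, total = conn.execute(
    "SELECT SUM(CASE WHEN Updated > 0 THEN 1 ELSE 0 END), COUNT(*) FROM CustomersDirtyT"
).fetchone()
print(f"{updated} of {total} rows updated ({100.0 * updated / total:.0f}%)")
# prints: 4 of 10 rows updated (40%)
```

In the lab the changes are random, so the share differs on every run; the fixed updates here only keep the example reproducible.
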
Now that our data and tables are prepared, we will create an SSIS flow to clean the
dirty data.
1. Create a new package in your integration project from the first exercise. You can
name the package DQSCleansing.

2. Drag a data flow task to the control flow working area. Click the Data Flow tab to
open the data flow working area.

3. Right-click the Connection Managers folder in Solution Explorer and select New
Connection Manager.

4. Select the OLEDB connection manager type and click Add. In the Configure OLE
DB Connection Manager window, click New.
5. Select Native OLE DB\SQL Server Native Client 11.0 Provider. Provide the name
of your SQL Server instance and authentication information, and select the
DQS_STAGING_DATA database. Click OK. When you are back in the Configure OLE DB
Connection Manager window, click OK.

6. Add an OLE DB source to your data flow. Rename it to CustomersDirty. Double-click
it to open the editor. Select the CustomersDirtyT table. Click the Columns
tab in the left pane to get the column mappings. Check the mappings and click OK.
7. In the SSIS Toolbox, expand Other Transforms. Drag the DQS Cleansing
transformation to the data flow. Connect it to the CustomersDirty data source with
the normal data flow (gray arrow). Rename the transformation
CleanseStreetAddress. Double-click it to open the editor.
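
To give an idea of what cleansing against reference data means, the sketch below corrects a street address by looking up its closest entry in a list of known-good values. It is only an analogy: DQS works from a knowledge base, not from the plain string similarity used here, and the function name, status labels, cutoff, and reference addresses are all hypothetical.

```python
from difflib import get_close_matches


def cleanse_value(value, reference_values, cutoff=0.8):
    """Return (output_value, status) after comparing value with the reference list."""
    lookup = {ref.lower(): ref for ref in reference_values}
    key = value.lower()
    if key in lookup:
        ref = lookup[key]
        # Same text, possibly different casing.
        return ref, ("unchanged" if ref == value else "corrected")
    matches = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    if matches:
        return lookup[matches[0]], "corrected"
    # Nothing similar enough: pass the input through untouched.
    return value, "unmatched"


# Hypothetical reference addresses.
reference = ["7800 Corrinne Court", "7156 Rose Dr."]
print(cleanse_value("7800 corrinne ct", reference))
# ('7800 Corrinne Court', 'corrected')
```

The cutoff trades missed corrections against wrong ones; rows that stay unmatched are the ones a person would need to review.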
