Validating Direct Mapping with SQL

The document outlines a SQL-based approach for validating direct mapping between two tables, Table 1 and Table 2, focusing on counting records, identifying mapped columns, comparing data, and checking for missing records. It provides specific SQL queries for each step, including data type and format validation. An example scenario illustrates the mapping and SQL application for practical understanding.

Uploaded by mmyybabybaby
Copyright © All Rights Reserved

You're dealing with a common ETL scenario: Table 1 (10 columns) and Table 2 (8 columns). You want to validate the "direct mapping," meaning that specific columns from Table 1 are copied into Table 2 without transformation (though possibly under different column names, as in the example scenario). Here's a SQL-based approach:

Assumptions
● You have read access to both tables.
● You know which columns in Table 1 should map to which columns in Table 2.
● Both tables are in the same database or you can use fully qualified names (e.g.,
database1.schema1.table1).
● A key column (or combination of columns) uniquely identifies rows in Table 1 and in Table 2. This is crucial for comparing data. Let's call the key columns Table1_Key and Table2_Key, respectively. They may or may not have the same name.
Steps
1. Count the Records
○ This is a basic sanity check. It doesn't guarantee data accuracy, but it can
quickly reveal if a large number of records are missing.
-- Count records in Table 1
SELECT COUNT(*) AS SourceRecordCount FROM Table1;

-- Count records in Table 2
SELECT COUNT(*) AS TargetRecordCount FROM Table2;

○ Compare the counts. For a straight one-to-one load, TargetRecordCount should equal SourceRecordCount; it can be lower only if the load is supposed to filter out some rows. An unexplained shortfall means records were dropped. A higher target count usually indicates duplicates, which is also a problem.
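Both counts can be fetched in a single query, and a GROUP BY ... HAVING query lists the duplicate keys that typically inflate the target count. The sketch below runs this through Python's built-in sqlite3 module purely so it can be tried anywhere; the rows are made up, and the SQL should need little or no change on other databases.

```python
import sqlite3

# Hypothetical sample data: Table1 is the source, Table2 the target.
# Key 2 was loaded twice into Table2, and key 3 was never loaded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Table1_Key INTEGER, ColumnA TEXT);
    CREATE TABLE Table2 (Table2_Key INTEGER, ColumnA TEXT);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (1, 'a'), (2, 'b'), (2, 'b');
""")

# Both counts in one row, so they can be compared side by side.
source_count, target_count = conn.execute("""
    SELECT (SELECT COUNT(*) FROM Table1) AS SourceRecordCount,
           (SELECT COUNT(*) FROM Table2) AS TargetRecordCount
""").fetchone()

# Keys that appear more than once in the target.
duplicate_keys = conn.execute("""
    SELECT Table2_Key, COUNT(*) AS Occurrences
    FROM Table2
    GROUP BY Table2_Key
    HAVING COUNT(*) > 1
""").fetchall()

print(source_count, target_count)  # 3 3
print(duplicate_keys)              # [(2, 2)]
```

Here the two counts are equal even though key 3 is missing and key 2 is duplicated, which is why the count check alone cannot confirm that the load is correct.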
2. Identify Mapped Columns
○ This is the most crucial part. Let's say your mapping is as follows:
■ Table1.ColumnA -> Table2.ColumnA
■ Table1.ColumnB -> Table2.ColumnB
■ Table1.ColumnC -> Table2.ColumnC
■ Table1.ColumnD -> Table2.ColumnD
■ Table1.Table1_Key -> Table2.Table2_Key
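Once the mapping is written down as data, the comparison query can be generated from it rather than typed by hand, which helps when many columns are mapped. The sketch below is one way to do that; build_compare_sql and MAPPING are hypothetical names, the sample rows are invented, and SQLite (via Python's sqlite3) stands in for your database.

```python
import sqlite3

# Hypothetical mapping: (Table1 column, Table2 column) pairs.
MAPPING = [
    ("ColumnA", "ColumnA"),
    ("ColumnB", "ColumnB"),
    ("ColumnC", "ColumnC"),
    ("ColumnD", "ColumnD"),
]


def build_compare_sql(mapping, src_key="Table1_Key", tgt_key="Table2_Key"):
    """Build a LEFT JOIN query that lists rows whose mapped columns differ.

    Column names are pasted into the SQL text, so the mapping must come
    from a trusted source, never from user input.
    """
    select_cols = [f"T1.{src_key}"]
    # A source row with no partner in the target is always reported.
    conditions = [f"T2.{tgt_key} IS NULL"]
    for src, tgt in mapping:
        select_cols += [f"T1.{src} AS T1_{src}", f"T2.{tgt} AS T2_{tgt}"]
        # Different values, or a NULL on exactly one side.
        conditions += [
            f"T1.{src} <> T2.{tgt}",
            f"(T1.{src} IS NULL AND T2.{tgt} IS NOT NULL)",
            f"(T1.{src} IS NOT NULL AND T2.{tgt} IS NULL)",
        ]
    return (
        "SELECT " + ", ".join(select_cols)
        + " FROM Table1 AS T1"
        + f" LEFT JOIN Table2 AS T2 ON T1.{src_key} = T2.{tgt_key}"
        + " WHERE " + " OR ".join(conditions)
        + f" ORDER BY T1.{src_key}"
    )


# Made-up rows: key 2 lost its ColumnC value, key 3 is missing entirely.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Table1_Key INTEGER, ColumnA, ColumnB, ColumnC, ColumnD);
    CREATE TABLE Table2 (Table2_Key INTEGER, ColumnA, ColumnB, ColumnC, ColumnD);
    INSERT INTO Table1 VALUES (1, 'a', 1, 'x', 'd'), (2, 'b', 2, 'y', 'd'),
                              (3, 'c', 3, 'z', 'd');
    INSERT INTO Table2 VALUES (1, 'a', 1, 'x', 'd'), (2, 'b', 2, NULL, 'd');
""")
mismatched_keys = [row[0] for row in conn.execute(build_compare_sql(MAPPING))]
print(mismatched_keys)  # [2, 3]
```

Keeping the mapping in one list also means the same definition can drive other checks (for example, type checks per target column) without the column names drifting apart.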
3. Compare Data Using a JOIN
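○ Use a LEFT JOIN from Table 1 to Table 2 to compare the data in the mapped columns. An INNER JOIN would silently drop source rows that have no match in Table 2, so it is only suitable once missing rows have been checked separately.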
-- Compare mapped columns
SELECT
T1.Table1_Key,
T1.ColumnA AS T1_ColumnA,
T2.ColumnA AS T2_ColumnA,
T1.ColumnB AS T1_ColumnB,
T2.ColumnB AS T2_ColumnB,
T1.ColumnC AS T1_ColumnC,
T2.ColumnC AS T2_ColumnC,
T1.ColumnD AS T1_ColumnD,
T2.ColumnD AS T2_ColumnD
FROM
Table1 AS T1
LEFT JOIN Table2 AS T2 ON T1.Table1_Key = T2.Table2_Key
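WHERE
-- A plain <> is UNKNOWN when either side is NULL, so a NULL on
-- only one side has to be tested for explicitly.
T2.Table2_Key IS NULL
OR T1.ColumnA <> T2.ColumnA
OR (T1.ColumnA IS NULL AND T2.ColumnA IS NOT NULL)
OR (T1.ColumnA IS NOT NULL AND T2.ColumnA IS NULL)
OR T1.ColumnB <> T2.ColumnB
OR (T1.ColumnB IS NULL AND T2.ColumnB IS NOT NULL)
OR (T1.ColumnB IS NOT NULL AND T2.ColumnB IS NULL)
OR T1.ColumnC <> T2.ColumnC
OR (T1.ColumnC IS NULL AND T2.ColumnC IS NOT NULL)
OR (T1.ColumnC IS NOT NULL AND T2.ColumnC IS NULL)
OR T1.ColumnD <> T2.ColumnD
OR (T1.ColumnD IS NULL AND T2.ColumnD IS NOT NULL)
OR (T1.ColumnD IS NOT NULL AND T2.ColumnD IS NULL);

○ Explanation:
■ The FROM and LEFT JOIN clauses join the two tables on their key
columns. A LEFT JOIN is used to include all rows from Table 1, even if
there's no matching row in Table 2.
■ The SELECT clause retrieves the Table 1 key and the mapped data
columns from both tables, aliasing them (e.g., T1_ColumnA, T2_ColumnA)
to distinguish between them.
■ The WHERE clause keeps only rows where a key from Table 1 is not found in Table 2 (a missing row in the target) or where at least one mapped column differs.
■ Any comparison with NULL evaluates to UNKNOWN, so T1.ColumnA <> T2.ColumnA alone would not flag a value that is NULL in one table but not in the other. The two IS NULL / IS NOT NULL tests per column catch those cases, while rows where both sides are NULL still count as matching.
■ If the query returns any rows, it indicates a data discrepancy.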
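To see why a plain inequality is not enough when columns can be NULL, the sketch below runs a plain <> filter and a NULL-safe filter against the same made-up rows, using SQLite through Python's sqlite3 module as a stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Table1_Key INTEGER, ColumnA TEXT);
    CREATE TABLE Table2 (Table2_Key INTEGER, ColumnA TEXT);
    -- Key 1 matches; key 2 lost its value in the target; key 3 gained one.
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, NULL);
    INSERT INTO Table2 VALUES (1, 'a'), (2, NULL), (3, 'c');
""")

JOIN = "FROM Table1 AS T1 LEFT JOIN Table2 AS T2 ON T1.Table1_Key = T2.Table2_Key"

# Plain inequality: 'b' <> NULL is UNKNOWN, so WHERE discards the row.
naive = conn.execute(
    f"SELECT T1.Table1_Key {JOIN} WHERE T1.ColumnA <> T2.ColumnA"
).fetchall()

# NULL-safe version: also flag rows where exactly one side is NULL.
null_safe = conn.execute(f"""
    SELECT T1.Table1_Key {JOIN}
    WHERE T1.ColumnA <> T2.ColumnA
       OR (T1.ColumnA IS NULL AND T2.ColumnA IS NOT NULL)
       OR (T1.ColumnA IS NOT NULL AND T2.ColumnA IS NULL)
    ORDER BY T1.Table1_Key
""").fetchall()

print(naive)      # []
print(null_safe)  # [(2,), (3,)]
```

Some databases offer a shorter NULL-safe operator: the standard IS DISTINCT FROM (PostgreSQL, SQL Server 2022 and later, recent SQLite) or SQLite's IS NOT. The longhand form above works everywhere.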
4. Check for Missing Target Records
○ The LEFT JOIN in the previous query also helps identify missing records in
Table 2. The T2.Table2_Key IS NULL condition in the WHERE clause will find
these. You can also write a separate query:
-- Find missing records in Table 2
SELECT
T1.Table1_Key
FROM
Table1 AS T1
LEFT JOIN Table2 AS T2 ON T1.Table1_Key = T2.Table2_Key
WHERE
T2.Table2_Key IS NULL;
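The same anti-join can be run in the opposite direction to find target rows that have no source row, another possible reason for a target count that is higher than expected. The sketch below runs both directions on invented keys, with SQLite via Python's sqlite3 standing in for your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Table1_Key INTEGER);
    CREATE TABLE Table2 (Table2_Key INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES (1), (2), (99);
""")

# Source keys that never reached the target.
missing_in_target = conn.execute("""
    SELECT T1.Table1_Key
    FROM Table1 AS T1
    LEFT JOIN Table2 AS T2 ON T1.Table1_Key = T2.Table2_Key
    WHERE T2.Table2_Key IS NULL
""").fetchall()

# Reverse direction: target keys with no corresponding source row.
unexpected_in_target = conn.execute("""
    SELECT T2.Table2_Key
    FROM Table2 AS T2
    LEFT JOIN Table1 AS T1 ON T2.Table2_Key = T1.Table1_Key
    WHERE T1.Table1_Key IS NULL
""").fetchall()

print(missing_in_target)     # [(3,)]
print(unexpected_in_target)  # [(99,)]
```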

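5. Data Type and Format Validation
○ SQL can also help with basic data type and format validation. For example (SQL Server syntax):
-- Check for non-integer values in a column that should hold integers (e.g., age)
SELECT Table2_Key FROM Table2 WHERE TRY_CAST(ColumnB AS INT) IS NULL AND ColumnB IS NOT NULL;

-- Check for values that cannot be interpreted as dates
SELECT Table2_Key FROM Table2 WHERE ISDATE(ColumnC) = 0 AND ColumnC IS NOT NULL;

-- Check for NULLs in columns that must not be NULL
SELECT Table2_Key FROM Table2 WHERE ColumnD IS NULL;

○ These queries use the SQL Server functions TRY_CAST and ISDATE to check whether the values in ColumnB and ColumnC can be converted to the expected type: TRY_CAST returns NULL when a conversion fails, and ISDATE returns 0 for a value it cannot read as a date. Both also report a NULL input as a failure, so the IS NOT NULL condition keeps NULLs out of these results; NULLs in required columns are a separate check, as in the third query.
○ These checks are only meaningful when the target column is stored as a character type (for example, a VARCHAR staging column). If ColumnB is already an INT or ColumnC a date column, the database enforces the type on load, and ISDATE rejects most date/time types other than datetime and smalldatetime as input. Note also that ISDATE does not check one specific format; its result depends on session settings such as SET DATEFORMAT and SET LANGUAGE.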
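SQLite has no TRY_CAST or ISDATE, so for experimenting without SQL Server, the sketch below performs rough Python equivalents of the three checks on rows fetched from a made-up Table2. It assumes dates should be stored as 'YYYY-MM-DD' strings; unlike ISDATE it enforces that one format, and Python's int() does not accept exactly the same strings as TRY_CAST(... AS INT) (it allows underscores such as '1_000', for instance). The helper names is_int and is_iso_date are hypothetical.

```python
import sqlite3
from datetime import datetime


def is_int(value):
    """Rough analogue of TRY_CAST(value AS INT) IS NOT NULL."""
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        return False


def is_iso_date(value):
    """True if value is a valid 'YYYY-MM-DD' date string (assumed format)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (Table2_Key INTEGER, ColumnB TEXT, ColumnC TEXT, ColumnD TEXT);
    INSERT INTO Table2 VALUES
        (1, '42',  '2024-01-31', 'x'),
        (2, 'abc', '2024-02-30', 'y'),
        (3, NULL,  '31/01/2024', NULL);
""")
rows = conn.execute(
    "SELECT Table2_Key, ColumnB, ColumnC, ColumnD FROM Table2 ORDER BY Table2_Key"
).fetchall()

# The type checks skip NULLs, mirroring the IS NOT NULL conditions.
bad_int = [k for k, b, _, _ in rows if b is not None and not is_int(b)]
bad_date = [k for k, _, c, _ in rows if c is not None and not is_iso_date(c)]
null_d_keys = [k for k, _, _, d in rows if d is None]

print(bad_int, bad_date, null_d_keys)  # [2] [2, 3] [3]
```

Row 2's '2024-02-30' is rejected because February has no 30th day, and row 3's '31/01/2024' because it is not in the assumed format.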
Example Scenario

Let's say:
● Table 1: SourceData (SourceDataID, Name, Age, City, ProductID, OrderDate,
Email, Phone, Address, Status)
● Table 2: TargetData (TargetDataID, CustomerName, CustomerAge, CustomerCity,
ProductID, OrderDate, EmailAddress, Status)
● Mapping:
○ SourceData.Name -> TargetData.CustomerName
○ SourceData.Age -> TargetData.CustomerAge
○ SourceData.City -> TargetData.CustomerCity
○ SourceData.ProductID -> TargetData.ProductID
○ SourceData.OrderDate -> TargetData.OrderDate
○ SourceData.Email -> TargetData.EmailAddress
○ SourceData.Status -> TargetData.Status
○ SourceData.SourceDataID -> TargetData.TargetDataID
Here's how you'd apply the SQL:

-- 1. Count Records
SELECT COUNT(*) AS SourceCount FROM SourceData;
SELECT COUNT(*) AS TargetCount FROM TargetData;

-- 2. Compare Data
SELECT
SD.SourceDataID,
SD.Name AS SD_Name,
TD.CustomerName AS TD_CustomerName,
SD.Age AS SD_Age,
TD.CustomerAge AS TD_CustomerAge,
SD.City AS SD_City,
TD.CustomerCity AS TD_CustomerCity,
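SD.ProductID AS SD_ProductID,
TD.ProductID AS TD_ProductID,
SD.OrderDate AS SD_OrderDate,
TD.OrderDate AS TD_OrderDate,
SD.Email AS SD_Email,
TD.EmailAddress AS TD_EmailAddress,
SD.Status AS SD_Status,
TD.Status AS TD_Status
FROM
SourceData AS SD
LEFT JOIN TargetData AS TD ON SD.SourceDataID = TD.TargetDataID
WHERE
-- <> is UNKNOWN when either side is NULL, so NULL vs. non-NULL is tested explicitly
TD.TargetDataID IS NULL
OR SD.Name <> TD.CustomerName
OR (SD.Name IS NULL AND TD.CustomerName IS NOT NULL)
OR (SD.Name IS NOT NULL AND TD.CustomerName IS NULL)
OR SD.Age <> TD.CustomerAge
OR (SD.Age IS NULL AND TD.CustomerAge IS NOT NULL)
OR (SD.Age IS NOT NULL AND TD.CustomerAge IS NULL)
OR SD.City <> TD.CustomerCity
OR (SD.City IS NULL AND TD.CustomerCity IS NOT NULL)
OR (SD.City IS NOT NULL AND TD.CustomerCity IS NULL)
OR SD.ProductID <> TD.ProductID
OR (SD.ProductID IS NULL AND TD.ProductID IS NOT NULL)
OR (SD.ProductID IS NOT NULL AND TD.ProductID IS NULL)
OR SD.OrderDate <> TD.OrderDate
OR (SD.OrderDate IS NULL AND TD.OrderDate IS NOT NULL)
OR (SD.OrderDate IS NOT NULL AND TD.OrderDate IS NULL)
OR SD.Email <> TD.EmailAddress
OR (SD.Email IS NULL AND TD.EmailAddress IS NOT NULL)
OR (SD.Email IS NOT NULL AND TD.EmailAddress IS NULL)
OR SD.Status <> TD.Status
OR (SD.Status IS NULL AND TD.Status IS NOT NULL)
OR (SD.Status IS NOT NULL AND TD.Status IS NULL);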
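When the comparison query returns many rows, a per-column tally shows which mapped columns are actually responsible. The sketch below does this for three of the example's mapped columns on invented rows, using SQLite via Python's sqlite3. It relies on SQLite's IS NOT operator, which treats two NULLs as equal and a NULL versus a value as different (the standard spelling is IS DISTINCT FROM). It uses an INNER JOIN because missing rows are covered by the missing-records check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceData (SourceDataID INTEGER, Name TEXT, Age INTEGER, Email TEXT);
    CREATE TABLE TargetData (TargetDataID INTEGER, CustomerName TEXT,
                             CustomerAge INTEGER, EmailAddress TEXT);
    INSERT INTO SourceData VALUES
        (1, 'Ann', 30, 'ann@example.com'),
        (2, 'Bob', 41, 'bob@example.com'),
        (3, 'Cy',  25, NULL);
    INSERT INTO TargetData VALUES
        (1, 'Ann', 30, 'ann@example.com'),
        (2, 'Bob', 14, 'bob@example.com'),
        (3, 'Cy',  52, 'cy@example.com');
""")

# One row of counters: how many matched rows disagree on each mapped column.
name_diffs, age_diffs, email_diffs = conn.execute("""
    SELECT
        SUM(CASE WHEN SD.Name  IS NOT TD.CustomerName THEN 1 ELSE 0 END),
        SUM(CASE WHEN SD.Age   IS NOT TD.CustomerAge  THEN 1 ELSE 0 END),
        SUM(CASE WHEN SD.Email IS NOT TD.EmailAddress THEN 1 ELSE 0 END)
    FROM SourceData AS SD
    INNER JOIN TargetData AS TD ON SD.SourceDataID = TD.TargetDataID
""").fetchone()

print(name_diffs, age_diffs, email_diffs)  # 0 2 1
```

A tally like this points straight at the transformation step to inspect; here CustomerAge is wrong in two rows and EmailAddress in one.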

-- 3. Check for Missing Target Records
SELECT SD.SourceDataID
FROM SourceData AS SD
WHERE NOT EXISTS (
SELECT 1 FROM TargetData AS TD WHERE TD.TargetDataID = SD.SourceDataID
);
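A NOT IN subquery is a common way to write this check, but it has a well-known pitfall: if the subquery returns even one NULL, NOT IN evaluates to UNKNOWN for every unmatched key and the query returns nothing. The sketch below reproduces this on invented rows with SQLite via Python's sqlite3 and contrasts it with NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceData (SourceDataID INTEGER);
    CREATE TABLE TargetData (TargetDataID INTEGER);
    INSERT INTO SourceData VALUES (1), (2), (3);
    -- Key 3 is missing, and one target row has a NULL key.
    INSERT INTO TargetData VALUES (1), (2), (NULL);
""")

# 3 NOT IN (1, 2, NULL) is UNKNOWN, so the missing key is not reported.
not_in = conn.execute("""
    SELECT SourceDataID FROM SourceData
    WHERE SourceDataID NOT IN (SELECT TargetDataID FROM TargetData)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs do no harm.
not_exists = conn.execute("""
    SELECT SD.SourceDataID FROM SourceData AS SD
    WHERE NOT EXISTS (
        SELECT 1 FROM TargetData AS TD WHERE TD.TargetDataID = SD.SourceDataID
    )
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(3,)]
```

If the target key column is declared NOT NULL the two forms agree, but NOT EXISTS (or the LEFT JOIN ... IS NULL pattern) stays correct even when that assumption is broken, which matters in a validation query.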

-- 4. Data Type/Format Validation
SELECT TargetDataID FROM TargetData WHERE TRY_CAST(CustomerAge AS INT) IS NULL AND CustomerAge IS NOT NULL;
SELECT TargetDataID FROM TargetData WHERE ISDATE(OrderDate) = 0 AND OrderDate IS NOT NULL;
SELECT TargetDataID FROM TargetData WHERE EmailAddress IS NULL;

Together, these queries cover the core checks for a direct mapping from Table 1 to Table 2: record counts, column-by-column value comparison, missing records, and basic type and format validation. Adapt the table and column names to your specific scenario.
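To rerun these checks after every load, one option is a small script in which each query returns zero rows when its check passes. The sketch below does this with Python's sqlite3 on made-up data; run_checks and the check names are hypothetical, and IS NOT is SQLite's NULL-safe inequality (other databases would use their own equivalent).

```python
import sqlite3


def run_checks(conn, checks):
    """Run each named query; return the names of checks that found rows.

    Every query in `checks` is written so that zero rows means "passed".
    """
    failed = []
    for name, sql in checks.items():
        if conn.execute(sql).fetchone() is not None:
            failed.append(name)
    return failed


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceData (SourceDataID INTEGER, Status TEXT);
    CREATE TABLE TargetData (TargetDataID INTEGER, Status TEXT, EmailAddress TEXT);
    INSERT INTO SourceData VALUES (1, 'open'), (2, 'closed');
    INSERT INTO TargetData VALUES (1, 'open', 'a@example.com'), (2, 'closed', NULL);
""")

CHECKS = {
    "missing_target_rows": """
        SELECT SD.SourceDataID FROM SourceData AS SD
        WHERE NOT EXISTS (SELECT 1 FROM TargetData AS TD
                          WHERE TD.TargetDataID = SD.SourceDataID)""",
    "status_mismatch": """
        SELECT SD.SourceDataID FROM SourceData AS SD
        JOIN TargetData AS TD ON SD.SourceDataID = TD.TargetDataID
        WHERE SD.Status IS NOT TD.Status""",
    "null_email": "SELECT TargetDataID FROM TargetData WHERE EmailAddress IS NULL",
}

failed = run_checks(conn, CHECKS)
print(failed)  # ['null_email']
```

The "zero rows means pass" convention keeps every check the same shape, so new checks can be added as plain SQL strings; the count comparison from step 1 does not fit this shape and would be checked separately.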
