Techniques Used To Transform Data, Part 2

Uploaded by Constant HOUEHA


So far, you’ve learned that data transformations enable you to make changes to your data so it
is usable for analysis and visualization. You also learned about useful transformation
techniques in SQL for handling aggregation, deduplication, derivation, and filtering. In this
reading, you’ll learn more techniques for transforming data.

Data transformation types

You won’t always receive data in a ready-to-use format, so you may have to transform it
depending on your purpose. Many of the SQL methods you learned previously also apply to the
techniques in this reading. You can use the following techniques on large datasets:

● Data integration with unions
● Data joining with joins
● Data splitting
● Formatting data with concatenation

Data integration

Data integration is the combination of rows from two or more tables to create a single dataset.
Integrating data is useful when related data is spread across multiple tables or databases and
you need a unified view.

For example, an analyst needs to work with sales data for two different years. The sales data
for one year is in one table, and the second year is in another. The analyst needs a combined
view to perform analysis to show how sales have changed over time.

The analyst writes a SELECT statement that takes the product name and sales amount from
the first year’s table, then uses the UNION operator to append the product name and sales
amount from the second year’s table.

SELECT product_name, sales_amount FROM sales_2021
UNION
SELECT product_name, sales_amount FROM sales_2022;

If the sales_2021 and sales_2022 tables contain these rows of data:

sales_2021:
| product_name | sales_amount |
|--------------|--------------|
| Product A    | 1500         |
| Product B    | 1000         |
| Product C    | 500          |
| Product D    | 800          |

sales_2022:
| product_name | sales_amount |
|--------------|--------------|
| Product A    | 1800         |
| Product B    | 1100         |
| Product C    | 700          |
| Product E    | 1200         |

Then the query will return this result:

| product_name | sales_amount |
|--------------|--------------|
| Product A    | 1500         |
| Product B    | 1000         |
| Product C    | 500          |
| Product D    | 800          |
| Product A    | 1800         |
| Product B    | 1100         |
| Product C    | 700          |
| Product E    | 1200         |

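The SQL UNION operator integrates data from multiple tables by combining the results of two
or more SELECT statements. Each SELECT statement in a UNION must return the same number
of columns, in the same order, with compatible data types. By default, UNION returns only
distinct rows: if a row with the same product_name and sales_amount appeared in both tables,
it would appear only once in the result. In this example, no row is repeated exactly (Product A,
for instance, has a different sales amount each year), so all eight rows are kept. To keep
duplicate rows, use UNION ALL instead. Also note that UNION doesn’t guarantee the order of
the result rows unless you add an ORDER BY clause.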
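As a sketch of this behavior, the following Python script loads the two example tables into an in-memory SQLite database with the standard-library sqlite3 module and runs the query above. Using Python and SQLite is an assumption for illustration; the SQL matches the reading. The script also inserts one hypothetical row that exactly duplicates a 2021 row, to show how UNION and UNION ALL differ.

```python
import sqlite3

# Load the two example tables into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_2021 (product_name TEXT, sales_amount INTEGER);
    CREATE TABLE sales_2022 (product_name TEXT, sales_amount INTEGER);
    INSERT INTO sales_2021 VALUES
        ('Product A', 1500), ('Product B', 1000), ('Product C', 500), ('Product D', 800);
    INSERT INTO sales_2022 VALUES
        ('Product A', 1800), ('Product B', 1100), ('Product C', 700), ('Product E', 1200);
""")


def combine(operator: str):
    """Run the reading's query with the given set operator (UNION or UNION ALL)."""
    query = (
        "SELECT product_name, sales_amount FROM sales_2021 "
        f"{operator} "
        "SELECT product_name, sales_amount FROM sales_2022"
    )
    return conn.execute(query).fetchall()


union_rows = combine("UNION")
print(len(union_rows))  # 8: no row appears identically in both tables

# Hypothetical extra row that exactly duplicates a row in sales_2021.
conn.execute("INSERT INTO sales_2022 VALUES ('Product D', 800)")
union_dedup = combine("UNION")
union_all = combine("UNION ALL")
print(len(union_dedup), len(union_all))  # 8 9
```

The first UNION returns eight rows, as in the result above. After the duplicate row is added, UNION still returns eight rows, while UNION ALL returns nine.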

Data joining

Joins in SQL combine rows from two or more tables based on related columns between them. If
several tables contain information you need, use a JOIN clause to combine the data into one
result set. The tables you’re joining need a related column. The columns don’t need to have the
same name, but they do need to contain matching values of compatible data types.

For example, an analyst wants to determine if a customer is more likely to purchase products if
they’re on the company email list. Order information, like order IDs and order dates, is in one
table, and customer information, like names and email addresses, is in another table. Both
tables contain a customer_id column.

The analyst writes a SELECT statement that takes the order ID and order date from the orders
table and uses JOIN to add the customer name from the customers table, matching rows on the
customer_id column.

SELECT orders.orders_id, customers.customer_name, orders.order_date
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id;

If the orders and customers tables contain these rows of data:

orders:
| orders_id | customer_id | order_date |
|-----------|-------------|------------|
| 1         | 101         | 2023-01-01 |
| 2         | 102         | 2023-02-01 |
| 3         | 103         | 2023-03-01 |
| 4         | 101         | 2023-04-01 |

customers:
| customer_id | customer_name |
|-------------|---------------|
| 101         | John Doe      |
| 102         | Jane Doe      |
| 103         | Jim Bean      |

Then the query will return this result:

| orders_id | customer_name | order_date |
|-----------|---------------|------------|
| 1 | John Doe | 2023-01-01 |
| 2 | Jane Doe | 2023-02-01 |
| 3 | Jim Bean | 2023-03-01 |
| 4 | John Doe | 2023-04-01 |
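Here is a minimal sketch of the same join, again assuming Python’s sqlite3 module and an in-memory SQLite database. The ORDER BY clause is added only so the output order is predictable. The extra order with customer_id 104 is hypothetical data, added to show what happens when a row has no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orders_id INTEGER, customer_id INTEGER, order_date TEXT);
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    INSERT INTO orders VALUES
        (1, 101, '2023-01-01'), (2, 102, '2023-02-01'),
        (3, 103, '2023-03-01'), (4, 101, '2023-04-01');
    INSERT INTO customers VALUES
        (101, 'John Doe'), (102, 'Jane Doe'), (103, 'Jim Bean');
""")

# Each order is paired with the customer whose customer_id matches.
rows = conn.execute("""
    SELECT orders.orders_id, customers.customer_name, orders.order_date
    FROM orders
    JOIN customers ON orders.customer_id = customers.customer_id
    ORDER BY orders.orders_id
""").fetchall()
for row in rows:
    print(row)  # (1, 'John Doe', '2023-01-01') ... (4, 'John Doe', '2023-04-01')

# Hypothetical order from a customer_id that is not in the customers table.
conn.execute("INSERT INTO orders VALUES (5, 104, '2023-05-01')")
joined_count = conn.execute("""
    SELECT COUNT(*) FROM orders
    JOIN customers ON orders.customer_id = customers.customer_id
""").fetchone()[0]
print(joined_count)  # 4: order 5 has no matching customer
```

A plain JOIN is an inner join, so order 5, whose customer_id doesn’t appear in customers, is left out of the result. To keep unmatched orders, you would use a LEFT JOIN instead.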

Data splitting

Data splitting is when you divide data within a column to create two or more columns.
Sometimes, data arrives in a combined format, but needs to be stored separately for better
analysis or clarity. Data splitting is a useful technique for extracting important information from
a column. Analysts often use this approach for extracting product codes from descriptions.

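For example, the product_code SKU092023 encodes the month and year of sale after the SKU
prefix. SUBSTRING counts positions from 1, so SUBSTRING(product_code, 4, 2) takes 2 characters
starting at position 4 and returns '09', and SUBSTRING(product_code, 6, 4) takes 4 characters
starting at position 6 and returns '2023'.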

SELECT SUBSTRING(product_code, 4, 2) AS product_month,
       SUBSTRING(product_code, 6, 4) AS product_year
FROM product_table;

If the product_table contains these rows of data:

| product_id | product_code | product_name |
|------------|--------------|--------------|
| 1 | SKU092023 | Product A |
| 2 | SKU122022 | Product B |
| 3 | SKU082021 | Product C |
| 4 | SKU072020 | Product D |
| 5 | SKU042019 | Product E |

Then the query will return this result:

| product_month | product_year |
|---------------|--------------|
| 09 | 2023 |
| 12 | 2022 |
| 08 | 2021 |
| 07 | 2020 |
| 04 | 2019 |
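The sketch below reproduces this split with Python’s sqlite3 module, assuming product_code is stored as text. SQLite spells the function substr(), but it takes the same 1-based start position and length. A hypothetical plain-Python helper is included for comparison; Python slices are 0-based, so position 4 becomes index 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_table (product_id INTEGER, product_code TEXT, product_name TEXT);
    INSERT INTO product_table VALUES
        (1, 'SKU092023', 'Product A'), (2, 'SKU122022', 'Product B'),
        (3, 'SKU082021', 'Product C'), (4, 'SKU072020', 'Product D'),
        (5, 'SKU042019', 'Product E');
""")

# substr(text, start, length): start is 1-based, like SUBSTRING in the reading.
rows = conn.execute("""
    SELECT substr(product_code, 4, 2) AS product_month,
           substr(product_code, 6, 4) AS product_year
    FROM product_table
    ORDER BY product_id
""").fetchall()
print(rows)  # [('09', '2023'), ('12', '2022'), ('08', '2021'), ('07', '2020'), ('04', '2019')]


def split_product_code(code: str) -> tuple[str, str]:
    """Hypothetical helper: the same split using 0-based Python slices."""
    # Position 4, length 2 -> indices 3..4; position 6, length 4 -> indices 5..8.
    return code[3:5], code[5:9]


print(split_product_code("SKU092023"))  # ('09', '2023')
```

Because the extracted values are text, the leading zero in months like '09' is preserved. Converting them to numbers would drop it.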

Formatting data

Formatting data involves changing the presentation of data, like modifying text case or
merging columns. Formatting data creates uniformity and contributes to better reporting
because the data is standardized. Imagine a dataset that contains text in lowercase, uppercase,
and a mix of both. If you use this data to create a dashboard, these inconsistencies will carry
into the visualization and confuse your audience.

Consider a table called donor_table containing donor names and contribution amounts for a
charity event. If you created a bar graph with one bar per donor_name, the labels would be
inconsistently capitalized and hard to read:

| donor_name    | contribution_amt_USD |
|---------------|----------------------|
| JOHN DOE      | 1000                 |
| jane doe      | 500                  |
| MiKe Black    | 750                  |
| SARAH WHITE   | 1200                 |
| daniel GREEN  | 600                  |
| AMY o’connell | 300                  |
| RACHEL Brown  | 450                  |
| aLan smith    | 900                  |
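The reading doesn’t show a fix for these names, so here is one possible approach, sketched with Python’s sqlite3 module: apply UPPER() so every label uses the same case. Choosing uppercase is an assumption. LOWER() works the same way, and some databases, such as PostgreSQL and Oracle, also provide INITCAP() for capitalizing the first letter of each word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donor_table (donor_name TEXT, contribution_amt_USD INTEGER)")
conn.executemany(
    "INSERT INTO donor_table VALUES (?, ?)",
    [
        ("JOHN DOE", 1000), ("jane doe", 500), ("MiKe Black", 750),
        ("SARAH WHITE", 1200), ("daniel GREEN", 600), ("AMY o’connell", 300),
        ("RACHEL Brown", 450), ("aLan smith", 900),
    ],
)

# UPPER() gives every name the same case, so chart labels are consistent.
rows = conn.execute(
    "SELECT UPPER(donor_name) AS donor_name, contribution_amt_USD FROM donor_table"
).fetchall()
for name, amount in rows:
    print(f"{name:<15} {amount}")
```

SQLite’s built-in UPPER() converts only ASCII letters, which is enough for these names; the curly apostrophe in AMY O’CONNELL is left unchanged.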

The CONCAT function is useful for merging columns whose data is better suited to be combined.
For example, if you have columns with first name, last name, and birthdate, you can combine all
three to create an identifier for each person. The identifier is unique only as long as no two
people share the same name and birthdate, but it makes duplicates easier to identify later, and
it reduces the number of columns you’ll see in the final view.

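SELECT CONCAT(first_name, last_name, REPLACE(birthdate, '-', '')) AS unique_id
FROM employees;

This query uses REPLACE to remove the hyphens from birthdate so that the output matches the
result below. It assumes birthdate is stored as text in YYYY-MM-DD form; if it’s a DATE column,
your database may require you to convert it to text first.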

If the employees table contains these rows of data:

| id | first_name | last_name | birthdate  |
|----|------------|-----------|------------|
| 1  | John       | Doe       | 1990-01-01 |
| 2  | Jane       | Smith     | 1985-06-15 |
| 3  | Alan       | Johnson   | 1978-12-12 |
| 4  | Mary       | Lee       | 1992-04-03 |
| 5  | Jack       | White     | 1982-09-10 |

Then the query will return this result:

| unique_id           |
|---------------------|
| JohnDoe19900101     |
| JaneSmith19850615   |
| AlanJohnson19781212 |
| MaryLee19920403     |
| JackWhite19820910   |
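Here is a sketch of the same combination in SQLite via Python’s sqlite3 module. CONCAT() was only recently added to SQLite, so this version uses the standard || concatenation operator instead, and it assumes birthdate is stored as YYYY-MM-DD text. A hypothetical sixth employee who duplicates John Doe is added to show how the combined value helps find duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER, first_name TEXT, last_name TEXT, birthdate TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Doe", "1990-01-01"),
        (2, "Jane", "Smith", "1985-06-15"),
        (3, "Alan", "Johnson", "1978-12-12"),
        (4, "Mary", "Lee", "1992-04-03"),
        (5, "Jack", "White", "1982-09-10"),
    ],
)

# || concatenates strings; replace() drops the hyphens, so 1990-01-01 -> 19900101.
ID_EXPR = "first_name || last_name || replace(birthdate, '-', '')"

ids = [row[0] for row in conn.execute(
    f"SELECT {ID_EXPR} AS unique_id FROM employees ORDER BY id"
)]
print(ids)  # ['JohnDoe19900101', 'JaneSmith19850615', ..., 'JackWhite19820910']

# Hypothetical duplicate record, then group on the combined value to find repeats.
conn.execute("INSERT INTO employees VALUES (6, 'John', 'Doe', '1990-01-01')")
dupes = conn.execute(
    f"SELECT {ID_EXPR} AS unique_id, COUNT(*) FROM employees "
    "GROUP BY unique_id HAVING COUNT(*) > 1"
).fetchall()
print(dupes)  # [('JohnDoe19900101', 2)]
```

If any of the three columns is NULL, || returns NULL for that row in SQLite, so you may want to wrap columns in COALESCE() when values can be missing.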

Key takeaways
Unions, joins, data splitting, and data formatting play a vital role in preparing and transforming
data for analysis.

These techniques help put data into a usable format by combining related tables, extracting the
values you need, and standardizing how data is presented, so that analysts can provide
meaningful insights and impactful visualizations.

Resources for more information

The following resource further explores how you can collect data effectively:
● Solutions marketer and MBA Firoj Alam’s perspective on the importance of proper
formatting, and how it impacts data collection: Importance of Proper Formatting in
Data Collection
