
DATA WRANGLING New

This document summarizes the process of data wrangling presented by team members Dineshkumar, Gokilavani, and Sneha, and their mentor Eyamini. It outlines the key steps in data wrangling including discovery of the raw data, structuring and formatting the data, cleaning it by removing errors, enriching it with external sources if needed, validating the quality, and publishing the final cleaned data for analysis. The overall goal of data wrangling is to transform raw data into a clean and usable format to inform accurate analysis and decision making.

Uploaded by SNEHA .M

DATA WRANGLING

TEAM MEMBERS:

M. DINESHKUMAR [711620243002] III - AI-DS
S. GOKILAVANI [711620243302] III - AI-DS
M. SNEHA [711620243307] III - AI-DS

MENTOR:
Ms. C. EYAMINI, AP, AI-DS
KATHIR COLLEGE OF ENGINEERING
OUTLINE OF PRESENTATION

• What is data wrangling?
• Process of data wrangling
• Conclusion
WHAT IS DATA WRANGLING

• Data wrangling (also called data cleaning, data remediation, or data munging) refers to a variety of processes designed to transform raw data into more readily usable formats.
• The exact methods differ from project to project, depending on the data you are leveraging and the goal you are trying to achieve.
EXAMPLES

• Merging multiple data sources into a single dataset for analysis
• Identifying gaps in data (for example, empty cells in a spreadsheet) and either filling or deleting them
• Deleting data that's either unnecessary or irrelevant to the project you are working on
• Identifying extreme outliers in data and either explaining the discrepancies or removing them so that analysis can take place
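
The first two examples can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed method: the two sources, the `customer_id` key, the field names, and the sample values are all hypothetical.

```python
# Hypothetical data: two sources that share a customer_id key.
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": None},   # gap: missing amount
    {"customer_id": 3, "amount": 75.5},
]
customers = {
    1: {"name": "Asha", "city": "Coimbatore"},
    2: {"name": "Ravi", "city": "Chennai"},
    # customer 3 is absent from this source
}

def merge_sources(orders, customers):
    """Merge both sources into one dataset; unmatched fields become None."""
    merged = []
    for order in orders:
        profile = customers.get(order["customer_id"], {})
        merged.append({
            "customer_id": order["customer_id"],
            "amount": order["amount"],
            "name": profile.get("name"),
            "city": profile.get("city"),
        })
    return merged

def find_gaps(rows):
    """Return (row_index, field) pairs for every empty cell."""
    return [(i, key) for i, row in enumerate(rows)
            for key, value in row.items() if value is None]

def fill_gaps(rows, field, default):
    """Fill empty cells in one field with a default value."""
    return [{**row, field: default if row[field] is None else row[field]}
            for row in rows]

def drop_gaps(rows, field):
    """Delete rows where the given field is empty."""
    return [row for row in rows if row[field] is not None]

merged = merge_sources(orders, customers)
gaps = find_gaps(merged)                    # where the empty cells are
filled = fill_gaps(merged, "amount", 0.0)   # option 1: fill
complete = drop_gaps(merged, "name")        # option 2: delete
```

Whether a gap should be filled or deleted depends on the project; the default of `0.0` here is only a placeholder choice.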
DATA WRANGLING STEPS

• Each data project requires a unique approach to ensure its final dataset is reliable and accessible.
• That being said, several processes typically inform the approach.
• These are commonly referred to as data wrangling steps or activities.
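FLOW DIAGRAM OF DATA WRANGLING

[Flow diagram not reproduced: Discovery -> Structuring -> Cleaning -> Enriching -> Validating -> Publishing]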
DISCOVERY

• Discovery refers to the process of familiarizing yourself with data so you can conceptualize how you might use it.
• You can liken it to looking in your refrigerator before cooking a meal to see what ingredients you have at your disposal.
• During discovery, you may identify trends or patterns in the data, along with obvious issues, such as missing or incomplete values, that need to be addressed.
• This is an important step, as it will inform every activity that comes afterwards.
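
As a rough illustration of discovery, the sketch below profiles each column of a small dataset to surface missing values and suspicious variety. The records and the `profile` helper are hypothetical, and the code uses only the Python standard library.

```python
from collections import Counter

# Hypothetical raw records, as they might arrive from a CSV export.
raw_rows = [
    {"id": "1", "age": "34", "city": "Chennai"},
    {"id": "2", "age": "",   "city": "chennai"},
    {"id": "3", "age": "29", "city": ""},
    {"id": "4", "age": "41", "city": "Salem"},
]

def profile(rows):
    """Summarize each column: missing count and number of distinct values."""
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [v for v in values if v.strip() != ""]
        summary[column] = {
            "missing": len(values) - len(present),
            "distinct": len(Counter(present)),
        }
    return summary

report = profile(raw_rows)
```

In this sample, `city` has three distinct values for what are probably two cities ("Chennai" and "chennai"), which is the kind of obvious issue discovery is meant to flag for the later steps.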
STRUCTURING

• Raw data is typically unusable in its raw state because it is either incomplete or misformatted for its intended application.
• Data structuring is the process of taking raw data and transforming it to be more readily leveraged.
• The form your data takes will depend on the analytical model you use to interpret it.
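
One possible form of structuring is turning free-form text lines into typed records. The sketch below assumes a hypothetical raw format of `<date> | <product> | <quantity>`; the format and field names are invented for illustration.

```python
from datetime import date

# Hypothetical raw lines: "<date> | <product> | <quantity>"
raw_lines = [
    "2023-01-05 | pen | 12",
    "2023-01-06 | notebook | 3",
]

def structure(line):
    """Split one raw line into a typed record (a dict)."""
    day, product, quantity = [part.strip() for part in line.split("|")]
    return {
        "date": date.fromisoformat(day),   # text -> date object
        "product": product,
        "quantity": int(quantity),         # text -> integer
    }

records = [structure(line) for line in raw_lines]
```

A different analytical model might call for a different target shape (for example, one row per product per day), so the dict layout here is only one option.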
CLEANING

• Data cleaning is the process of removing inherent errors in data that might distort your analysis or render it less valuable.
• Cleaning can come in different forms, including deleting empty cells or rows, removing outliers, and standardizing inputs.
• The goal of data cleaning is to ensure there are no errors (or as few as possible) that could influence your final analysis.
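
The three forms of cleaning named above can be sketched as small functions. The slides do not specify how outliers are detected; the interquartile-range (IQR) rule used here is an assumption chosen for illustration, and the rows and field names are hypothetical.

```python
import statistics

# Hypothetical rows; None marks an empty cell.
rows = [
    {"city": " Chennai ", "salary": 30000},
    {"city": "chennai",   "salary": 32000},
    {"city": None,        "salary": None},      # empty row
    {"city": "SALEM",     "salary": 31000},
    {"city": "Salem",     "salary": 29000},
    {"city": "Erode",     "salary": 900000},    # extreme outlier
]

def drop_empty_rows(rows):
    """Delete rows in which every cell is empty."""
    return [r for r in rows if any(v is not None for v in r.values())]

def standardize_city(rows):
    """Standardize inputs: trim whitespace and use title case."""
    return [{**r, "city": r["city"].strip().title()} for r in rows]

def remove_outliers(rows, field, k=1.5):
    """Remove rows outside [Q1 - k*IQR, Q3 + k*IQR] (assumed IQR rule)."""
    values = [r[field] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[field] <= high]

cleaned = remove_outliers(standardize_city(drop_empty_rows(rows)), "salary")
```

The order matters in this sketch: empty rows are dropped first so that the later steps never see `None` values.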
ENRICHING

• Once you understand your existing data and have transformed it into a more usable state, you must determine whether you have all of the data necessary for the project at hand.
• If not, you may choose to enrich or augment your data by incorporating values from other datasets.
• For this reason, it's important to understand what other data is available for use.
• If you decide that enrichment is necessary, you need to repeat the steps above for any new data.
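
A minimal sketch of enrichment, assuming the other dataset can be expressed as a lookup table keyed on a shared field. The `sales` rows, the `city_region` table, and the region codes are hypothetical.

```python
# Hypothetical existing dataset and a hypothetical external lookup table.
sales = [
    {"city": "Chennai", "units": 40},
    {"city": "Salem", "units": 15},
    {"city": "Erode", "units": 22},
]
city_region = {"Chennai": "R1", "Salem": "R2"}  # illustrative codes only

def enrich(rows, lookup, key, new_field):
    """Add new_field from an external lookup; None when no match exists."""
    return [{**row, new_field: lookup.get(row[key])} for row in rows]

enriched = enrich(sales, city_region, "city", "region")

# Rows the external source could not cover.
unmatched = [row["city"] for row in enriched if row["region"] is None]
```

The `unmatched` list shows why the earlier steps have to be repeated for new data: the added field can bring gaps of its own.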
VALIDATING

• Data validation refers to the process of verifying that your data is both consistent and of a high enough quality.
• During validation, you may discover issues you need to resolve, or conclude that your data is ready to be analyzed.
• Validation is typically achieved through various automated processes and requires programming.
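
An automated validation pass might look like the sketch below. The rules (unique `id`, integer `age` between 0 and 120) are hypothetical; a real project would define its own.

```python
def validate(rows):
    """Return a list of problems; an empty list means the data passed."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # Consistency check: ids must be unique.
        if row["id"] in seen_ids:
            problems.append(f"row {index}: duplicate id {row['id']}")
        seen_ids.add(row["id"])
        # Quality checks: age must be an integer in a plausible range.
        if not isinstance(row["age"], int):
            problems.append(f"row {index}: age is not an integer")
        elif not 0 <= row["age"] <= 120:
            problems.append(f"row {index}: age {row['age']} out of range")
    return problems

good = [{"id": 1, "age": 34}, {"id": 2, "age": 29}]
bad = [{"id": 1, "age": 34}, {"id": 1, "age": 250}, {"id": 3, "age": "n/a"}]

good_report = validate(good)   # no problems: ready to be analyzed
bad_report = validate(bad)     # issues to resolve first
```

Returning a list of problems, rather than stopping at the first one, lets all issues be resolved in a single pass.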
PUBLISHING

• Once your data has been validated, you can publish it.
• This involves making it available to others within your organization for analysis.
• The format you use to share the information (such as a written report or electronic file) will depend on your data and your organization's goals.
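
For the electronic-file case, one common choice is CSV. The sketch below writes validated rows with the standard `csv` module; the rows and the `publish_csv` helper are hypothetical, and an in-memory buffer stands in for a shared file.

```python
import csv
import io

def publish_csv(rows, fieldnames, stream):
    """Write validated rows to a CSV stream with a header line."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical validated data.
validated = [
    {"id": 1, "city": "Chennai", "units": 40},
    {"id": 2, "city": "Salem", "units": 15},
]

buffer = io.StringIO()   # stands in for a file shared with colleagues
publish_csv(validated, ["id", "city", "units"], buffer)
published_text = buffer.getvalue()
```

To write to disk instead, the same function can be given a file opened with `open(path, "w", newline="")`, where `path` is whatever location the organization uses.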
THE IMPORTANCE OF DATA WRANGLING

• Any analyses a business performs will ultimately be constrained by the data that informs them. If data is incomplete
