Machine: Learning

This document discusses machine learning and data preprocessing. It provides an overview of machine learning concepts like supervised learning, deep learning, and the machine learning workflow. It then focuses on key steps in data preprocessing, including data quality assessment, cleaning, transformation, and reduction. Specific techniques covered are handling missing values, outliers, and approaches like binning, winsorizing, and imputation.

Uploaded by

ARCHANA R

Data Science
MACHINE LEARNING
Group 1
1. Farhan Imam Naufal
2. Annisa Dwi Ari
3. Salwa Ahla Amania
4. Mikhael Aditha Sembiring Meliala
5. Djeremiah Christofel
What is MACHINE LEARNING?
Machine learning is a method of data analysis that automates analytical model building.

Learn from data
Learn from experience
ARTIFICIAL INTELLIGENCE
The ability of a machine to imitate intelligent human behavior.

MACHINE LEARNING
An application of AI that allows a system to automatically learn and improve from experience.

DEEP LEARNING
An application of machine learning that uses complex algorithms and deep neural nets to train a model.
TYPES OF MACHINE LEARNING

SUPERVISED LEARNING: Task-Driven (Classification / Regression)
UNSUPERVISED LEARNING: Data-Driven (Clustering)
REINFORCEMENT LEARNING: Learn From Errors (Playing Games)
MACHINE LEARNING workflow
PREPROCESSING
Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first, and a crucial, step when creating a machine learning model.

DATA QUALITY ASSESSMENT: Information, DType Check, Null & Outlier, Discrepancy, etc.
DATA CLEANING: Change DType, Handling Null, Handling Outlier, etc.
DATA TRANSFORMATION: Normalization, Generalization, etc.
DATA REDUCTION: Dimensionality Reduction, Numerosity Reduction, etc.
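
A data quality assessment can start with a simple per-column report of value types and null counts. The sketch below is only an illustration under assumptions: the `rows` sample and the `assess_quality` helper are hypothetical and use the Python standard library rather than any particular data-frame tool.

```python
from collections import Counter

# Hypothetical sample rows; None stands for a null value.
rows = [
    {"age": 25, "city": "Jakarta", "income": 5000.0},
    {"age": None, "city": "Bandung", "income": 5200.0},
    {"age": "31", "city": None, "income": 4800.0},  # age stored as text: a dtype discrepancy
]

def assess_quality(records):
    """Return {column: {"types": Counter of type names, "nulls": int}}."""
    report = {}
    for record in records:
        for column, value in record.items():
            entry = report.setdefault(column, {"types": Counter(), "nulls": 0})
            if value is None:
                entry["nulls"] += 1
            else:
                entry["types"][type(value).__name__] += 1
    return report

report = assess_quality(rows)
for column, entry in report.items():
    print(column, dict(entry["types"]), "nulls:", entry["nulls"])
```

A column that reports more than one type, such as `age` here, is a candidate for the "Change DType" step of data cleaning.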
Data CLEANING
Data cleaning is the process of preparing data for analysis by removing or modifying
data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted.

Data cleaning is a lot of muscle work. There’s a reason data cleaning is the most
important step if you want to create a data-culture, let alone make airtight
predictions. It involves:
Fixing spelling and syntax errors
Standardizing data sets
Correcting mistakes such as empty fields
Identifying duplicate data points
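
As a rough illustration of these steps, the sketch below standardizes text fields, turns empty fields into explicit nulls, and drops duplicates. The `raw` records and the `clean_records` helper are hypothetical, and title-casing is just one possible standardization rule.

```python
# Hypothetical raw records with inconsistent spacing, casing, and an empty field.
raw = [
    {"name": "  Alice ", "city": "jakarta"},
    {"name": "alice", "city": "Jakarta "},
    {"name": "Bob", "city": ""},
]

def clean_records(records):
    """Standardize string fields and drop records that become duplicates."""
    cleaned, seen = [], set()
    for record in records:
        # Trim whitespace and use title case; an empty field becomes None.
        row = {key: (value.strip().title() or None) for key, value in record.items()}
        fingerprint = tuple(row[key] for key in sorted(row))
        if fingerprint not in seen:  # keep only the first copy of each record
            seen.add(fingerprint)
            cleaned.append(row)
    return cleaned

cleaned = clean_records(raw)
print(cleaned)
```

Standardizing before comparing matters: the first two records only count as duplicates once spacing and casing are made consistent.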
MISSING VALUES
Missing data means the absence of observations in columns. It appears as values such as “0”, “NA”, “NaN”, “NULL”, “Not Applicable”, “None”.

Why does a dataset have missing values?
The cause can be data corruption, failure to record data, lack of information, incomplete results, a person intentionally not providing the data, some system or equipment failure, etc. There could be any number of reasons for missing values in your dataset.

Why handle missing values?
One of the biggest impacts of missing data is that it can bias the results of machine learning models or reduce the accuracy of the model. So it is very important to handle missing values.
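
Because missing data can hide behind several different markers, a first step is often to recognize them all consistently. The sketch below is a minimal example under assumptions: the `MISSING_MARKERS` set and the `is_missing` helper are hypothetical, and "0" is treated as missing only when the caller asks for it, since zero is frequently a legitimate value.

```python
import math

# Text markers treated as missing (compared case-insensitively).
MISSING_MARKERS = {"na", "nan", "null", "not applicable", "none", ""}

def is_missing(value, zero_is_missing=False):
    """Return True if the value should be counted as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return True
    # Zero is ambiguous, so it only counts as missing when requested.
    if zero_is_missing and value in (0, "0"):
        return True
    return False

column = [12, "NA", None, float("nan"), "Not Applicable", 0, 7]
missing_default = sum(is_missing(v) for v in column)
missing_with_zero = sum(is_missing(v, zero_is_missing=True) for v in column)
print(missing_default, missing_with_zero)
```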
MISSING VALUES TYPES

MCAR (Missing Completely At Random)
In MCAR, the probability of data being missing is the same for all the observations. In this case, there is no relationship between the missing data and any other values, observed or unobserved (the data which is not recorded), within the given dataset.

MAR (Missing At Random)
In MAR, the reason for the missing values can be explained by variables on which you have complete information, as there is some relationship between the missing data and other values/data. In this case, the data is not missing for all the observations. It is missing only within sub-samples of the data, and there is some pattern in the missing values.

MNAR (Missing Not At Random)
Missing values depend on the unobserved data. If there is some structure/pattern in the missing data and other observed data cannot explain it, then it is Missing Not At Random (MNAR). If the missing data does not fall under MCAR or MAR, then it can be categorized as MNAR.
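
The difference between the three mechanisms can be made concrete with a small simulation. This is only a sketch under assumptions: the `people` records, the 20% and 50% drop rates, and the age and income thresholds are all hypothetical values chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical records: age is always observed, income may go missing.
people = [
    {"age": random.randint(20, 60), "income": random.randint(2000, 9000)}
    for _ in range(200)
]

def mask_income(records, should_drop):
    """Return the income column with None wherever should_drop(record) is true."""
    return [None if should_drop(r) else r["income"] for r in records]

# MCAR: every record has the same 20% chance of losing its income value.
mcar = mask_income(people, lambda r: random.random() < 0.2)

# MAR: missingness depends on an observed column (age), not on income itself.
mar = mask_income(people, lambda r: r["age"] > 50 and random.random() < 0.5)

# MNAR: missingness depends on the unobserved value itself (high incomes are hidden).
mnar = mask_income(people, lambda r: r["income"] > 7000 and random.random() < 0.5)

for name, column in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(name, "missing:", sum(v is None for v in column))
```

In the MAR case the pattern can be recovered from the observed `age` column, while in the MNAR case nothing observed explains which incomes were dropped.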

Handling MISSING VALUES
OUTLIERS
An outlier is an object that deviates significantly from the rest of the objects. Outliers can be caused by measurement or execution errors. The analysis of outlier data is referred to as outlier analysis or outlier mining.
Detecting OUTLIERS
Detection methods shown: boxplot, scatterplot, Z-score, IQR.
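
The two numeric methods, Z-score and IQR, can be sketched with the Python standard library. The thresholds used here (3 standard deviations, 1.5 times the IQR) are common conventions rather than values stated in the slides, and the sample `values` are hypothetical.

```python
import statistics

values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]

def zscore_outliers(data, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    if std == 0:
        return []
    return [x for x in data if abs(x - mean) / std > threshold]

def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in data if x < low or x > high]

print(zscore_outliers(values))  # [95]
print(iqr_outliers(values))     # [95]
```

The Z-score relies on the mean and standard deviation, which the outlier itself inflates, so on small samples the IQR rule is often the more sensitive of the two.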
Handling OUTLIERS
Just like not detecting outliers at all, handling outliers carries the risk of having a substantial impact on the outcome of an analysis or machine learning model. From a mathematical point of view, there is no right or wrong answer on how to treat outlying observations. Alongside the mathematics, the qualitative information you have available should play an important role in decisions about outliers.

If we can't rectify the outliers, we may consider some of the following methods to handle them.
Doing nothing
Deleting/Trimming: After deleting the outliers, we should be careful not to run the outlier detection test again. Because the IQR and standard deviation change after the removal of outliers, a second run may wrongly flag some new values as outliers.
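
The caveat about re-running detection after trimming can be seen on a small, hypothetical sample. In this sketch the bounds come from the 1.5 times IQR rule (an assumed convention); `iqr_bounds` and `trim` are illustrative helpers.

```python
import statistics

def iqr_bounds(data, k=1.5):
    """Return the (low, high) fence computed from the quartiles of data."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def trim(data, bounds):
    """Keep only the values inside the given bounds."""
    low, high = bounds
    return [x for x in data if low <= x <= high]

values = [10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 15, 95]

first_bounds = iqr_bounds(values)        # computed once, on the original data
trimmed = trim(values, first_bounds)     # removes 95

# Re-running detection on the trimmed data tightens the bounds...
second_bounds = iqr_bounds(trimmed)
# ...so a value that was acceptable before is now flagged.
newly_flagged = [x for x in trimmed if not second_bounds[0] <= x <= second_bounds[1]]

print(first_bounds, second_bounds, newly_flagged)
```

Here 15 sits inside the first fence but outside the second, which is exactly the kind of false detection the trimming note warns about.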

Winsorizing: Unlike trimming, here we replace the outliers with other values. A common approach is to replace outliers on the upper side with the 95th percentile value and outliers on the lower side with the 5th percentile value.
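
A minimal sketch of winsorizing at the 5th and 95th percentiles follows. It assumes the "inclusive" percentile definition from Python's `statistics.quantiles`; other tools may interpolate percentiles differently, and the `values` sample is hypothetical.

```python
import statistics

def winsorize(data, lower_pct=5, upper_pct=95):
    """Clip values below the lower percentile or above the upper percentile."""
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(x, low), high) for x in data]

# Hypothetical sample: 2..20 with one extreme value at each end.
values = [-50] + list(range(2, 21)) + [500]
winsorized = winsorize(values)

print(min(winsorized), max(winsorized))
```

The number of observations stays the same, which is the main difference from trimming: extreme values are pulled in to the percentile limits instead of being dropped.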

Transformation: Use a transformation such as the log transformation in the case of a right-tailed distribution.
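
As a quick illustration, the sketch below applies `log(1 + x)` to a hypothetical right-tailed sample. Using `math.log1p` instead of a plain logarithm is an assumption made here so that zero values remain valid input.

```python
import math

# Hypothetical right-tailed data: most values are small, one is very large.
values = [1, 2, 3, 5, 8, 1000]

# log1p(x) computes log(1 + x), which is defined for x = 0 as well.
transformed = [math.log1p(x) for x in values]

print([round(t, 2) for t in transformed])
```

The ordering of the values is preserved, but the gap between the largest value and the rest shrinks considerably, which reduces the influence of the long right tail.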

Binning: Binning, or discretization of continuous data into groups such as low, medium and high, converts the outlier values into count values.
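
Binning into low, medium and high groups might look like the sketch below. The cut points in `EDGES` are hypothetical; in practice they would come from domain knowledge or from quantiles of the data.

```python
from bisect import bisect_right
from collections import Counter

LABELS = ["low", "medium", "high"]
EDGES = [30, 70]  # hypothetical cut points: < 30 low, 30 to 69 medium, >= 70 high

def to_bin(x):
    """Map a continuous value to its group label."""
    return LABELS[bisect_right(EDGES, x)]

values = [5, 12, 33, 47, 68, 71, 90, 4000]
binned = [to_bin(v) for v in values]
counts = Counter(binned)

print(binned)
print(dict(counts))
```

The extreme value 4000 simply becomes one more member of the "high" group, so it contributes to a count rather than distorting a mean or a model coefficient.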

Use robust estimators: Robust estimators, such as the median for measuring central tendency and decision trees for classification tasks, can handle outliers better.
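
The robustness of the median can be checked on a small, hypothetical sample in which a single value is replaced by an outlier:

```python
import statistics

clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 1000]  # the last value is replaced by an outlier

mean_clean = statistics.fmean(clean)
mean_outlier = statistics.fmean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

print("mean:  ", mean_clean, "->", mean_outlier)
print("median:", median_clean, "->", median_outlier)
```

One extreme value moves the mean from 12 to about 209, while the median stays at 12.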

Imputing: Another method is to treat the outliers as missing values and then impute them using methods similar to those we saw while handling missing values.
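
A minimal sketch of this idea: mark values outside some acceptable range as missing, then fill them with the median of the remaining values. The bounds of 0 and 20 and the choice of median imputation are assumptions made for illustration; any detection rule and any imputation method could be substituted.

```python
import statistics

def mark_outliers_missing(data, low, high):
    """Replace values outside [low, high] with None."""
    return [x if low <= x <= high else None for x in data]

def impute_median(data):
    """Fill None entries with the median of the observed values."""
    observed = [x for x in data if x is not None]
    fill = statistics.median(observed)
    return [fill if x is None else x for x in data]

values = [10, 12, 11, 13, 12, 95]
marked = mark_outliers_missing(values, low=0, high=20)  # hypothetical bounds
imputed = impute_median(marked)

print(marked)
print(imputed)
```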

Thank You
