Encoding Notes
Categorical variables are usually represented as strings or categories and take values from a finite set of possible values. Many machine learning algorithms cannot work with categorical data directly, so the categories must be converted into numbers. This is required for both input and output variables that are categorical.
Further, categorical variables can be divided into two types: Nominal (no particular order) and Ordinal (ordered).
1. Ordinal Data: The categories have an inherent order. While encoding ordinal data, one should retain the information about the order in which the categories appear. For example, a person's qualification (schooling, graduate, post graduate, etc.) may decide whether the person is suitable for a post, and these qualifications are ordered, from schooling as the lowest to post graduation as the highest.
2. Nominal Data: The categories do not have an inherent order. While encoding nominal data, we only have to consider the presence or absence of a feature; no notion of order is present. For example, consider the city a person lives in. It is important to retain where the person lives, but there is no order or sequence: living in Delhi is neither more nor less than living in Bangalore.
Ordinal Encoding
We do ordinal encoding to ensure the encoding of a variable retains its ordinal nature. If we consider a temperature scale as the order, the ordinal values should run from Cold to Very Hot: ordinal encoding assigns values as Cold(0) < Warm(1) < Hot(2) < Very Hot(3). Usually, ordinal encoding starts from 0. By default, however, Scikit-learn's ordinal encoding sorts the categories alphabetically and assigns Cold(0), Hot(1), Very Hot(2), and Warm(3).
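As a minimal sketch (the "Temperature" column name is an illustrative assumption, not from these notes), Scikit-learn's OrdinalEncoder can be given the category order explicitly to avoid the alphabetical default:

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical example data; the column name is an assumption.
df = pd.DataFrame({"Temperature": ["Cold", "Warm", "Hot", "Very Hot"]})

# Default behaviour: categories sorted alphabetically ->
# Cold(0), Hot(1), Very Hot(2), Warm(3)
alphabetical = OrdinalEncoder().fit_transform(df[["Temperature"]])

# Passing the order explicitly preserves the ordinal relationship:
# Cold(0) < Warm(1) < Hot(2) < Very Hot(3)
ordered = OrdinalEncoder(
    categories=[["Cold", "Warm", "Hot", "Very Hot"]]
).fit_transform(df[["Temperature"]])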
One Hot Encoding
This method produces many columns, which can slow down learning significantly if the number of categories for the feature is very high. Pandas has the get_dummies function, which is quite easy to use. Scikit-learn has OneHotEncoder for this purpose, but it does not directly create named feature columns (additional code is needed).
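A short sketch comparing the two (the "City" column and its values are illustrative assumptions; the sparse_output parameter assumes scikit-learn 1.2 or newer):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical example data.
df = pd.DataFrame({"City": ["Delhi", "Bangalore", "Mumbai", "Delhi"]})

# pandas: returns a DataFrame with one named column per category.
dummies = pd.get_dummies(df["City"], prefix="City")

# scikit-learn: returns a plain array; the column names must be
# recovered separately with get_feature_names_out.
enc = OneHotEncoder(sparse_output=False)  # sparse_output needs sklearn >= 1.2
arr = enc.fit_transform(df[["City"]])
onehot = pd.DataFrame(arr, columns=enc.get_feature_names_out(["City"]))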
One hot encoding is very popular. We can represent all categories with N-1 columns (N = number of categories), since the all-zeros pattern is sufficient to encode the one category that is not included. Usually, for regression, we use N-1 columns (dropping the first or last column of the one-hot-encoded features). To explain: if the model includes an intercept and contains all N dummy variables, the dummy columns add up (row-wise) to the intercept column, and this linear combination prevents the matrix inverse from being computed (the design matrix is singular).
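The singularity is easy to verify numerically. In this minimal sketch (with an assumed "City" column), an intercept column plus all N dummies is rank-deficient:

import numpy as np
import pandas as pd

df = pd.DataFrame({"City": ["Delhi", "Bangalore", "Mumbai", "Delhi"]})
X = pd.get_dummies(df["City"]).astype(float)
X.insert(0, "intercept", 1.0)

# The N dummy columns sum row-wise to the intercept column, so the
# matrix has linearly dependent columns: rank 3 instead of 4 here.
print(np.linalg.matrix_rank(X.to_numpy()))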
Still, for classification with tree-based algorithms, the recommendation is to use all N columns without dropping any, as most tree-based algorithms build a tree by evaluating the available variables one split at a time. One hot encoding with N-1 binary variables should be used in linear regression to ensure the correct number of degrees of freedom (N-1). Linear regression has access to all of the features as it is being trained and therefore examines the whole set of dummy variables together. This means that N-1 binary variables give complete information about (completely represent) the original categorical variable to the linear regression. This approach can be adopted for any machine learning algorithm that looks at ALL the features simultaneously during training, for example, support vector machines and neural networks, as well as clustering algorithms.
If we drop one of the binary variables, a tree-based method can never consider that dropped category when splitting. Thus, if we use categorical variables in a tree-based learning algorithm, it is good practice to encode them into N binary variables and not drop any.
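The two recommendations side by side, as a hedged sketch (the "City" and "Rent" data are invented for illustration):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

df = pd.DataFrame({
    "City": ["Delhi", "Bangalore", "Mumbai", "Delhi"],
    "Rent": [30, 25, 35, 28],  # invented target values
})

# Linear model: N-1 dummies (drop_first=True) avoid the dummy-variable trap.
X_linear = pd.get_dummies(df["City"], drop_first=True)
LinearRegression().fit(X_linear, df["Rent"])

# Tree model: keep all N dummies so every category can be split on directly.
X_tree = pd.get_dummies(df["City"])
DecisionTreeRegressor().fit(X_tree, df["Rent"])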
Label Encoding
Labels can be words or numbers. Usually, the training data is labeled with words to make it readable.
Label encoding converts word labels into numbers to let algorithms work on them.
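A minimal sketch with scikit-learn's LabelEncoder (the labels below are illustrative assumptions); note that it, too, assigns numbers in alphabetical order:

from sklearn.preprocessing import LabelEncoder

y = ["cat", "dog", "bird", "dog"]  # hypothetical word labels

le = LabelEncoder()
encoded = le.fit_transform(y)            # array([1, 2, 0, 2]): alphabetical
decoded = le.inverse_transform(encoded)  # back to the original words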