
AUTOMATED MACHINE LEARNING: THE NEW WAVE OF MACHINE LEARNING


SEMINAR REPORT

(Submitted in Partial Fulfillment of the Requirement for the B.Tech Degree Course in Electronics
and Communication Engineering of APJ Abdul Kalam Technological University)

Submitted by
SUHANA SAINAB. S (AME19EC013)

Under the guidance of


Ms. ASHA ARVIND
Assistant Professor
Department of Electronics and Communication Engineering


Rajadhani Institute of Science and Technology,
Mankara Post, Palakkad - 678613
RAJADHANI INSTITUTE OF SCIENCE AND TECHNOLOGY
PALAKKAD, KERALA-678613

Department of Electronics and Communication Engineering

CERTIFICATE
This is to certify that the seminar report entitled "AUTOMATED MACHINE LEARNING: THE
NEW WAVE OF MACHINE LEARNING" is a bonafide record of the seminar work done
by SUHANA SAINAB. S (AME19EC013) at Rajadhani Institute of Science and Technology,
in partial fulfillment of the requirements of the B.Tech Degree course in Electronics and
Communication Engineering of APJ Abdul Kalam Technological University, 2019-2023
batch.

Ms. SITHARA KRISHNAN                                Ms. ASHA ARVIND


Co-ordinator                                        Guide

Ms. SITHARA KRISHNAN
Head of Department

Internal Examiner                                   External Examiner


ACKNOWLEDGEMENT

It is with great enthusiasm and a learning spirit that I bring out this seminar report.
Here I would like to mark my token of gratitude to all those who influenced me during
the period of my work. I would like to express my sincere thanks to the Management
of Rajadhani Institute of Science and Technology, Palakkad, and to Dr. RAMANI. K,
Principal, Rajadhani Institute of Science and Technology, for the facilities provided here.
I express my heartfelt gratitude to the Head of the Department, Ms. SITHARA KRISHNAN,
Assistant Professor, Department of Electronics and Communication Engineering, for allowing
me to take up this work.
With immense pleasure and gratitude, I express sincere thanks to my guide Ms. ASHA
ARVIND and Co-ordinator Ms. SITHARA KRISHNAN, Assistant Professor, for their
committed guidance, valuable suggestions and constructive criticism. Their stimulating
suggestions and encouragement helped me throughout this work. I extend my gratitude to all
the teachers in the Department of Electronics and Communication Engineering, Rajadhani
Institute of Science and Technology, Palakkad, for their support and inspiration.
Above all, I praise and thank Almighty God, who showered abundant grace on me to
make this work a success. I also express my special thanks and gratitude to my family and
all my friends for their support and encouragement.
ABSTRACT

With the explosion in the use of machine learning in various domains, the need for an
efficient pipeline for the development of machine learning models has never been more
critical. However, the task of building and training models largely remains traditional, with
a dependency on domain experts and time-consuming data manipulation operations, which
impedes the development of machine learning models in both academia and industry. This
demand motivates a new research area concerned with fitting machine learning models fully
automatically: AutoML. Automated Machine Learning (AutoML) is an end-to-end process
that aims at automating this model development pipeline without any external assistance.
First, we provide an overview of AutoML. Second, we delve into the individual segments of
the AutoML pipeline and cover their approaches in brief. We also provide a case study on the
industrial use and impact of AutoML with a focus on practical applicability in a business
context. Finally, we conclude with the open research issues and future research directions.
Index Terms: Automated Machine Learning, Artificial Intelligence, Meta-Learning,
Hyperparameter Optimization.

Contents

1 INTRODUCTION

2 WHAT IS MACHINE LEARNING

3 AUTOMATED MACHINE LEARNING
3.1 AutoML Platforms
3.2 AUTOML

4 DATA PREPROCESSING
4.1 Data Imputation
4.2 Data Balancing
4.3 Data Encoding

5 FEATURE ENGINEERING
5.1 Feature Mining
5.2 Feature Generation
5.3 Feature Selection

6 Model Selection and Hyperparameter Optimization
6.1 Grid Search
6.2 Random Search
6.3 Sequential Model-Based Optimization
6.4 Evolutionary Optimization

7 DISCUSSION

8 CONCLUSION

BIBLIOGRAPHY


List of Figures
2.1 Machine learning

3.1 Auto Machine learning pipeline

4.1 Data preprocessing
4.2 Data imputation
4.3 Data balancing
4.4 Data Encoding

5.1 An example of the feature tree generated
5.2 Difference between feature selection and feature extraction

6.1 Hyperparameter optimization by trial and error
6.2 Grid Search
6.3 Random Search
6.4 Sequential Model-Based Optimization
6.5 Evolutionary Optimization


Chapter 1

INTRODUCTION
Data analysis is a powerful tool for gaining insights into how to improve decision
making, business models and even products. It involves the construction and training of a
machine learning model, which faces several challenges due to the lack of expert knowledge.
These challenges can be overcome by the field of automated machine learning (AutoML).
AutoML refers to the process of studying a traditional machine learning model development
pipeline, segmenting it into modules and automating each of those modules to accelerate the
workflow. With the advent of deeper models, such as the ones used in image processing,
natural language processing, etc., there is an increasing need for tailored models that can be
crafted for specific workloads. However, such specific models require immense resources
such as high-capacity memory, strong GPUs, domain experts to help during development and
long wait times during training.
The task gets critical as there is not much work done on creating a formal framework for
deciding model parameters without the need for trial and error. These nuances emphasized
the need for AutoML, where automation can reduce turnaround times and also increase the
accuracy of the derived models by removing human errors. In recent years, several tools and
models have been proposed in the domain of AutoML. Some of these focus on particular
segments of AutoML such as feature engineering or model selection, whereas some models
attempt to optimize the complete pipeline. These tools have matured enough to compete with
human experts in Kaggle competitions, and at times have beaten them as well, demonstrating
their capability. There is a wide variety of applications based on AutoML, such as autonomic
cloud computing, intelligent vehicular networks, blockchain and software-defined networking,
among others.
This paper aims at providing an overview of the advances seen in the realm of AutoML in
recent years. We focus on individual aspects of AutoML and summarize the improvements
achieved in recent years. The motivation for this paper stems from the unavailability of a
compact study of the current state of AutoML. While we acknowledge the existence of other
surveys, their motive is either to provide an in-depth understanding of a particular segment of
AutoML, to provide just an experimental comparison of the various tools used, or they are
fixated on deep learning models.


Chapter 2

WHAT IS MACHINE LEARNING


Machine learning is an application of AI that enables systems to learn and improve
from experience without being explicitly programmed. Machine learning focuses on
developing computer programs that can access data and use it to learn for themselves.
Similar to how the human brain gains knowledge and understanding, machine learning
relies on input, such as training data or knowledge graphs, to understand entities, domains
and the connections between them. With entities defined, deep learning can begin.

Figure 2.1: Machine learning

The machine learning process begins with observations or data, such as examples, direct
experience or instruction. It looks for patterns in data so it can later make inferences based on
the examples provided. The primary aim of ML is to allow computers to learn autonomously,
without human intervention or assistance, and adjust their actions accordingly. Machine
learning as a concept has been around for quite some time. The term "machine learning" was
coined by Arthur Samuel, a computer scientist at IBM and a pioneer in AI and computer
gaming. Samuel designed a computer program for playing checkers. The more the program
played, the more it learned from experience, using algorithms to make predictions.


ML has proven valuable because it can solve problems at a speed and scale that cannot
be duplicated by the human mind alone. With massive amounts of computational ability
behind a single task or multiple specific tasks, machines can be trained to identify patterns in
and relationships between input data and automate routine processes.
Supervised Learning: More Control, Less Bias. Supervised machine learning algorithms
apply what has been learned in the past to new data, using labeled examples to predict future
events. By analyzing a known training dataset, the learning algorithm produces an inferred
function to predict output values. The system can provide targets for any new input after
sufficient training. It can also compare its output with the correct, intended output to find
errors and modify the model accordingly.
Unsupervised Learning: Speed and Scale. Unsupervised machine learning algorithms are
used when the information used to train is neither classified nor labeled. Unsupervised
learning studies how systems can infer a function to describe a hidden structure from
unlabeled data. At no point does the system know the correct output with certainty. Instead,
it draws inferences from datasets as to what the output should be.
Reinforcement Learning: Reinforcement learning is a feedback-based learning method,
in which a learning agent gets a reward for each right action and a penalty for each wrong
action. The agent learns automatically from this feedback and improves its performance. In
reinforcement learning, the agent interacts with the environment and explores it. The goal of
the agent is to collect the most reward points, and hence it improves its performance. A
robotic dog that automatically learns the movement of its limbs is an example of
reinforcement learning.


Chapter 3

AUTOMATED MACHINE LEARNING


Automated Machine Learning, or AutoML, is a way to automate the time-consuming and
iterative tasks involved in the machine learning model development process. It provides
various methods to make machine learning available to people with limited knowledge of
machine learning. It aims to reduce the need for skilled people to build the ML model. It also
helps to improve efficiency and to accelerate research on machine learning.

3.1 AutoML Platforms


AutoML has been evolving for many years, but in the last few years it has gained
popularity. Several platforms and frameworks have emerged. These platforms enable the
user to train models using drag-and-drop design tools.

1. Google Cloud AutoML

Google has launched several AutoML products for building custom machine learning
models as per business needs, and it also allows us to integrate these models into our
applications or websites. Google has created the following products: AutoML Natural
Language, AutoML Tables, AutoML Translation, AutoML Video Intelligence and AutoML
Vision. These products provide various tools to train models for specific use cases with
limited machine learning expertise. With Cloud AutoML, we do not need knowledge of
transfer learning or of how to create a neural network, as it provides deep learning models
out of the box.

2. Microsoft Azure AutoML

Microsoft Azure AutoML was released in 2018. It offers a transparent model selection
process that lets non-ML experts build ML models.

3. H2O.ai

H2O is an open-source platform that enables the user to create ML models. It can be
used to automate the machine learning workflow, such as automatic training and tuning
of many models within a user-specified time limit. Although H2O AutoML can make the
development of ML models easy for non-experts, a good knowledge of data science is still
required to build high-performing ML models.

4. TPOT

TPOT (Tree-based Pipeline Optimization Tool) can be considered a data science assistant
for developers. It is a Python-packaged automated machine learning tool which uses genetic
programming to optimize machine learning pipelines. It is built on top of scikit-learn.

5. DataRobot

DataRobot is one of the best AutoML tool platforms. It provides complete automation
of the ML pipeline and supports all the steps required for preparing, building, deploying,
monitoring and maintaining powerful AI applications.

6. Auto-Sklearn

Auto-Sklearn is an open-source library built on top of scikit-learn. It automatically
performs algorithm selection and parameter tuning for a machine learning model. It provides
out-of-the-box features for supervised learning.

7. MLBox

MLBox is another powerful Python library for automated machine learning.
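As a brief illustration of how such platforms are typically driven, the sketch below uses
TPOT's scikit-learn-style API on a toy dataset; the dataset and parameter values are
illustrative choices, not recommendations from any of the tools above.

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Evolve pipelines for 5 generations with a population of 20 candidates
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")  # write the winning pipeline out as Python code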

3.2 AUTOML
AutoML is the process of automating the end-to-end process of applying machine
learning to real-world problems. The problem of AutoML is a combinatorial one, where any
proposed algorithm is required to find a suitable combination of operations for each segment
of the ML pipeline so as to minimize the errors.
The standard data pre-processing operations are well defined and discussed in Chapter 4.
While a completely raw data collection cannot be processed with these standard operations,
datasets are usually refined to some extent and can work well with such operations. The
automation in data pre-processing is defined as a series of actions that are selected from the
standard pre-defined operation set and performed on the dataset. Feature engineering is
performed by selecting relevant features from the dataset, finding dependent pairs and using
them to generate new features. Model selection and hyperparameter optimization work on
finding the optimal parametric configuration from an infinite search space, or on learning

it (via reinforcement learning) from previous models designed for various tasks; probabilistic
reinforcement learning has been used in recent years to constrain the configuration space.
The explosion of the solution space, which grows exponentially and factorially with the
number of pipeline choices, is the core issue of AutoML. This explosion incurs a high
computational expense and voids any accuracy advantage over humans. To address this
problem, various proposed research works allow a parameter configuration to granularly
adjust the volume of the search space explored by any algorithm. Some works have removed
combination configurations deemed ineffective based on previous experience.

Figure 3.1: Auto Machine learning pipeline


Chapter 4
DATA PREPROCESSING
• Data Imputation
• Data Balancing
• Data Encoding
This section describes the various segments of AutoML as per the taxonomy shown.
We present the most notable contributions seen in the domain of AutoML and compare the
various approaches adopted for each individual segment of AutoML.
Data pre-processing guarantees the delivery of quality data derived from the original
dataset. It is an important step due to the unavailability of quality data, as a large portion of
the information generated and stored is usually semi-structured or even unstructured in form.
However, even though it is a crucial part of any machine learning pipeline, it is reported
to be the least enjoyable part, with authors stating that 60-80 percent of data scientists
find it to be the most mundane and tedious job. In AutoML, certain data pre-processing
operations are hard-coded and then applied to a given dataset in certain combinations such
that the overall clarity and usability of the data increase. We have largely classified these
operations into the following categories based on our survey of recent papers.

Figure 4.1: Data preprocessing

4.1 Data Imputation


Often datasets in reality may contain missing values for different reasons (human
error or unavailability of the data). There are two fundamental kinds of missing data
described in the literature: missing at random (MAR) data and missing completely at random

(MCAR) data. The randomness of MCAR data is high enough that there is no overall bias. In
data imputation, we deal with inconsistencies such as NaNs, spaces, null values, incorrect
data types, etc. This is addressed by replacing these values using multiple methods, such as
default value selection, in which every problematic value is removed and a pre-selected value
takes its place. Another approach is to use the mean or median of the dataset column to
replace any missing value. Some approaches, such as regression imputation, have used the
standard deviation and variance to compute the replacement value for a given data column.
Data imputation techniques with lighter time constraints use the successive halving approach
in Auto-WEKA. The XGBoost algorithm is also widely used in the TPOT tool and
Auto-WEKA for data imputation.

Figure 4.2: Data imputation
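As a concrete illustration, here is a minimal sketch of the mean/median strategies described
above using scikit-learn's SimpleImputer; the tools named above implement their own
variants internally, and the toy matrix below is purely illustrative.

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])

# Replace each NaN with the mean of its column
imputer = SimpleImputer(strategy="mean")  # "median", "most_frequent", "constant" also work
print(imputer.fit_transform(X))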

4.2 Data Balancing


Data imbalance is a condition where one or more classes in a categorical dataset have
more observations than the rest of the classes. Feeding in such imbalanced data leads to the
majority class having an unjustified bias. The sample handling approach to data balancing
preprocesses the training set to minimize class differences, so this issue can be resolved. Two
techniques ordinarily utilized to tackle data imbalance are under-sampling and over-sampling,
in which data is removed or replicated to maintain class balance across the dataset.
Another approach to boost classifier performance in case of an imbalance is ensemble
learning, derived from Breiman's work, in which the existing dataset is augmented to generate
more data points in an attempt to increase the training data. Yet another approach to tackling
data imbalance is to use cost-sensitive learning instead of changing the dataset. Cost-sensitive

learning uses a variable cost of misclassification to balance the bias of an imbalanced class.
It is suitable for a highly skewed dataset where certain classes are minorities. In the case
of AutoML, tools such as TPOT provide an implementation in their API to adjust
class-specific sensitivity to account for skewed classes.

Challenges faced with imbalanced data

One of the main challenges faced by the utility industry today is electricity theft. Electricity
theft is the third largest form of theft worldwide. Utility companies are increasingly turning
towards advanced analytics and machine learning algorithms to identify consumption
patterns that indicate theft.
However, one of the biggest stumbling blocks is the humongous data and its distribution.
Fraudulent transactions are significantly rarer than normal healthy transactions, accounting
for around 1-2 percent of the total number of observations. The task is to improve
identification of the rare minority class as opposed to achieving higher overall accuracy.
Machine learning algorithms tend to produce unsatisfactory classifiers when faced with
imbalanced datasets. For any imbalanced dataset, if the event to be predicted belongs to the
minority class and the event rate is less than 5 percent, it is usually referred to as a rare
event.
Approach to handling imbalanced data:

Resampling Techniques
Dealing with imbalanced datasets entails strategies such as improving classification
algorithms or balancing classes in the training data (data preprocessing) before providing the
data as input to the machine learning algorithm. The latter technique is preferred as it has
wider application.
The main objective of balancing classes is either to increase the frequency of the minority
class or to decrease the frequency of the majority class. This is done in order to obtain
approximately the same number of instances for both classes. Let us look at a few
resampling techniques.

Random Under-Sampling
Random under-sampling aims to balance the class distribution by randomly eliminating
majority class examples until the majority and minority class instances are balanced out.
Total Observations = 1000
Fraudulent Observations = 20
Non-Fraudulent Observations = 980
Event Rate = 2 percent


In this case we take a 10 percent sample of the non-fraudulent observations and combine it
with the fraudulent observations.

Non-Fraudulent Observations after random under-sampling = 10 percent of 980 = 98
Total Observations after combining them with the Fraudulent Observations = 20 + 98 = 118
Event Rate for the new dataset after under-sampling = 20/118 = 17 percent
Advantages
It can help improve run time and storage problems by reducing the number of training data
samples when the training dataset is huge.
Disadvantages
It can discard potentially useful information which could be important for building rule
classifiers. The sample chosen by random under-sampling may be a biased sample, and so
not an accurate representative of the population, thereby producing inaccurate results on the
actual test dataset.

Random Over-Sampling
Over-sampling increases the number of instances in the minority class by randomly
replicating them in order to give the minority class a higher representation in the sample.
Total Observations = 1000
Fraudulent Observations = 20
Non-Fraudulent Observations = 980
Event Rate = 2 percent
In this case we replicate the 20 fraudulent observations 20 times.
Non-Fraudulent Observations = 980
Fraudulent Observations after replicating the minority class observations = 400
Total Observations in the new dataset after over-sampling = 1380
Event Rate for the new dataset after over-sampling = 400/1380 = 29 percent

Advantages
Unlike under-sampling, this method leads to no information loss, and it outperforms
under-sampling.
Disadvantages
It increases the likelihood of overfitting, since it replicates the minority class events.

Cluster-Based Over-Sampling

In this case, the K-means clustering algorithm is independently applied to the minority and
majority class instances to identify clusters in the dataset. Subsequently, each cluster is
oversampled such that all clusters of the same class have an equal number of instances and
all classes have the same size.
Total Observations = 1000
Fraudulent Observations = 20
Non-Fraudulent Observations = 980
Event Rate = 2 percent


Majority Class Clusters
Cluster 1: 150 observations
Cluster 2: 120 observations
Cluster 3: 230 observations
Cluster 4: 200 observations
Cluster 5: 150 observations
Cluster 6: 130 observations
Minority Class Clusters
Cluster 1: 8 observations
Cluster 2: 12 observations
After oversampling each cluster, all clusters of the same class contain the same number of
observations:

Majority Class Clusters
Cluster 1: 170 observations
Cluster 2: 170 observations
Cluster 3: 170 observations
Cluster 4: 170 observations
Cluster 5: 170 observations
Cluster 6: 170 observations
Minority Class Clusters
Cluster 1: 250 observations
Cluster 2: 250 observations
Event Rate post cluster-based oversampling = 500/(1020 + 500) = 33 percent

Advantages
This clustering technique helps overcome the challenge of between-class imbalance, where
the number of examples representing the positive class differs from the number representing
the negative class. It also helps overcome within-class imbalance, where a class is composed
of different sub-clusters and each sub-cluster does not contain the same number of examples.
Disadvantages
The main drawback of this algorithm, as with most oversampling techniques, is the
possibility of over-fitting the training data.

Informed Over-Sampling: Synthetic Minority Over-sampling Technique (SMOTE)

This technique is followed to avoid the overfitting which occurs when exact replicas of
minority instances are added to the main dataset. A subset of data is taken from the minority
class as an example, and new synthetic similar instances are then created. These synthetic
instances are added to the original dataset, and the new dataset is used as a sample to train
the classification models.
Total Observations = 1000
Fraudulent Observations = 20
Non-Fraudulent Observations = 980
Event Rate = 2 percent

A sample of 15 instances is taken from the minority class and similar synthetic instances
are generated 20 times.
Post generation of synthetic instances, the following dataset is created:
Minority Class (Fraudulent Observations) = 300
Majority Class (Non-Fraudulent Observations) = 980
Event Rate = 300/1280 = 23.4 percent

Advantages
It mitigates the problem of overfitting caused by random oversampling, as synthetic
examples are generated rather than replicas of instances, and there is no loss of useful
information.

Disadvantages
While generating synthetic examples, SMOTE does not take into consideration neighboring
examples from other classes. This can increase the overlapping of classes and can introduce
additional noise. SMOTE is also not very effective for high-dimensional data.

Figure 4.3: Data balancing
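Here is a minimal sketch of SMOTE using the third-party imbalanced-learn package (an
assumption: imbalanced-learn is installed); the synthetic dataset below merely mimics the
roughly 2 percent fraud rate of the worked example above.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Build a toy dataset with roughly a 2 percent minority class
X, y = make_classification(n_samples=1000, weights=[0.98], random_state=42)
print("Before:", Counter(y))

# Generate synthetic minority samples until the classes are balanced
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("After: ", Counter(y_res))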


4.3 Data Encoding


To make the data human-readable, the training data is often labelled in words. Data
encoding refers to converting the provided feature labels into numerical form so that
computing machines can interpret them. Some of the common forms of data encoding are
ordinal, one-hot, binary, hashing, target encoding, etc. Target encoding is the process of
replacing a categorical value with the mean/median/mode of the target variable. While other
label encodings assign incremental values or binary columns to every label, the values
assigned are not representative of any property of the given data. Target encoding assigns a
meaningful label number which represents a certain property of the data, such as the fraction
of true values in the target variable. H2O.ai is an AutoML tool that makes use of target
encoding in its API. Auto-Sklearn uses one-hot encoding for data encoding.

DIFFERENT TYPES OF ENCODING

Encoding is a technique for converting categorical variables into numerical values so that
they can easily be fitted to a machine learning model.
Before getting into the details, let us understand the different types of categorical
variables.
NOMINAL CATEGORICAL VARIABLE: Nominal categorical variables are those
for which we do not have to worry about the arrangement of the categories.
Example,
i. Suppose we have a gender column with categories Male and Female.
ii. We can also have a state column with different states like NY, FL, NV, TX.
So here we do not have to worry about the arrangement of the categories.
ORDINAL CATEGORICAL VARIABLE: Ordinal categories are those in which we
have to worry about the rank. These categories can be rearranged based on ranks.
Example,
i. Suppose a dataset has an education column which we will use to predict the salary
of a person. The education column has categories like 'bachelors', 'masters' and 'PHD'.
Based on these categories we can rearrange the column and assign ranks to each category;
based on the education level, 'PHD' gets the highest rank (PHD - 1, masters - 2,
bachelors - 3).


Now that we have discussed the types of categorical variables, let us see the two
corresponding types of encoding: nominal encoding and ordinal encoding.

Figure 4.4: Data Encoding
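Here is a minimal sketch contrasting the two encodings with scikit-learn; the column names
and category ranks mirror the examples above and are purely illustrative.

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({
    "state": ["NY", "FL", "NV", "TX"],                          # nominal: no order
    "education": ["bachelors", "PHD", "masters", "bachelors"],  # ordinal: ranked
})

# Nominal: one binary column per category, no implied order
onehot = OneHotEncoder()
print(onehot.fit_transform(df[["state"]]).toarray())

# Ordinal: integer codes that respect the stated rank order
ordinal = OrdinalEncoder(categories=[["bachelors", "masters", "PHD"]])
print(ordinal.fit_transform(df[["education"]]))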


Chapter 5

FEATURE ENGINEERING
• Feature Mining
• Feature Generation
• Feature Selection

5.1 Feature Mining


The dataset generated after pre-processing contains features, some of which are useful
for model training, whereas others contribute minimally to the training phase of the machine
learning model. Feature mining is responsible for picking out the impactful features from a
given dataset. This is done by computing the relevant feature pairs. The measure of relevance
is usually defined as the information gain or by measuring the relationship between any
feature pair. AutoLearn uses a cosine transform and measures the Euclidean distance on
these transforms to determine the feature pairs that correlate.
Feature selection is critical to building a good model for several reasons. One is that
feature selection implies some degree of cardinality reduction, imposing a cutoff on the
number of attributes that can be considered when building a model. Data almost always
contains more information than is needed to build the model, or the wrong kind of
information. For example, you might have a dataset with 500 columns that describe the
characteristics of customers; however, if the data in some of the columns is very sparse, you
would gain very little benefit from adding them to the model, and if some of the columns
duplicate each other, using both columns could affect the model.
Not only does feature selection improve the quality of the model, it also makes the process
of modeling more efficient. If you use unneeded columns while building a model, more CPU
and memory are required during the training process, and more storage space is required for
the completed model. Even if resources were not an issue, you would still want to perform
feature selection and identify the best columns, because unneeded columns can degrade the
quality of the model in several ways:
• Noisy or redundant data makes it more difficult to discover meaningful patterns.
• If the dataset is high-dimensional, most data mining algorithms require a much larger
training dataset.

During the process of feature selection, either the analyst or the modeling tool or
algorithm actively selects or discards attributes based on their usefulness for analysis. The
analyst might perform feature engineering to add features and remove or modify existing
data, while the machine learning algorithm typically scores columns and validates their
usefulness in the model.
In short, feature selection helps solve two problems: having too much data that is of little
value, or having too little data that is of high value. Your goal in feature selection should
be to identify the minimum number of columns from the data source that are significant in
building a model.

5.2 Feature Generation


Feature generation is the process of combining pre-existing features to generate new
features. AutoLearn achieves this by performing ridge regression over feature pairs to map
the relationships, and it considers each newly generated mapping as a relationship.
COGNITO uses a series of standard operations over a feature tree to generate new features.
LFE (Learning Feature Engineering) improves on COGNITO by learning, over previous
datasets, the relationship between features and the transforms which generated better
accuracy outcomes. These transforms are evaluated for a given dataset by LFE to determine
the operations that will lead to an increase in the performance of the machine learning model.
Reinforcement learning has also been used by Khurana et al. for generating features. They
leveraged the feature tree structure used in COGNITO and incorporated a traversal policy to
optimize transform exploration. Reinforcement learning encourages the exploration of
transforms that are beneficial to the overall model and also applies a budget constraint. This
constraint is needed to prevent the algorithm from performing an exhaustive search over the
feature graph.


Figure 5.1: An example of the feature tree generated
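As a small illustration of transform-based feature generation, the sketch below derives
pairwise product features with scikit-learn; this is in the spirit of the operations applied over
a feature tree, not the actual algorithm of COGNITO or LFE.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Generate products of feature pairs (degree-2 interaction terms only)
gen = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
print(gen.fit_transform(X))  # columns: x1, x2, x1*x2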

5.3 Feature Selection


Feature generation is an iterative process that leads to an explosion in the total number of
features. This is controlled in the selection phase based on the impact of a particular feature
on the overall accuracy of the model, usually measured either by a rank function or by
measuring the loss of the model when a particular feature is excluded and included.
ExploreKit uses a novel ranking function based on a meta-feature classifier to determine
which generated features are important. OneBM utilizes the chi-square hypothesis test to
determine the features most relevant to the performance of the machine learning model.
Information gain has also been used as a parameter for feature selection. The above-described
methods set a threshold to select only the most relevant features generated in an AutoML
pipeline.
Before we get into the details, let us review what a feature is. A feature (or column)
represents a measurable piece of data like name, age or gender. It is the basic building block
of a dataset. The quality of a feature can vary significantly and has an immense effect on
model performance. We can improve the quality of a dataset's features in the pre-processing
stage using processes like feature generation and feature selection.
Feature generation (also known as feature construction, feature extraction or feature
engineering) is the process of transforming features into new features that better relate to the
target. This can involve mapping a feature into a new feature using a function like log, or
creating a new feature from one or multiple features using multiplication or addition.


Feature generation can improve model performance when there is a feature interaction.
Two or more features interact if their combined effect is greater or less than the sum of their
individual effects. It is possible to make interactions with three or more features, but this
tends to result in diminishing returns.
Feature generation is often overlooked, as it is assumed that the model will learn any
relevant relationships between features to predict the target variable. However, the generation
of new flexible features is important, as it allows us to use less complex models that are
faster to run and easier to understand and maintain.
Feature Selection
In fact, not all generated features are relevant. Moreover, too many features may adversely
affect model performance. This is because, as the number of features increases, it becomes
more difficult for the model to learn mappings between features and target (this is known as
the curse of dimensionality).

Figure 5.2: Difference between feature selection and feature extraction
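Here is a minimal sketch of filter-based feature selection using a chi-square score, similar in
spirit to the OneBM approach mentioned above; the dataset and the value of k are
illustrative.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Keep only the 2 features most related to the target by the chi-square test
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)
print(X_new.shape)  # (150, 2)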


Chapter 6

Model Selection and Hyperparameter Optimization

• Grid Search
• Random Search
• Sequential Model-Based Optimization
• Evolutionary Optimization
The core of any machine learning pipeline is the model used to perform the prediction task.
However, a single problem may have multiple model configurations, with varying accuracy,
to tackle it. Hence, it is crucial to determine the most appropriate model keeping in mind the
accuracy-execution time tradeoff. Conventionally, domain experts with previous experience
approximate a model to be used. This manual task follows an iterative trial-and-error
approach for determining the model to be used, as shown in Fig. 6.1. Once a model is
finalized, the hyperparameter optimization is again performed manually to generate the final
model. AutoML automates these steps to reduce human dependency.

Figure 6.1: Hyperparameter optimization by trial and error

Most AutoML tools and methods combine the problems of model selection and
hyperparameter optimization into a single problem called the CASH (Combined Algorithm
Selection and Hyperparameter optimization) problem. The CASH problem considers model
selection and


hyperparameter optimization as a single hierarchical hyperparameter optimization problem.
At the root level resides a hyperparameter which selects between different learning
algorithms or models. At the next level, model-specific hyperparameters are placed, which
are optimized to generate the final model. Auto-WEKA and SmartML are some of the tools
which consider model selection and hyperparameter optimization as a single problem. The
following approaches address the CASH problem.
Starting from a given dataset, training a machine learning model implies the computation
of a set of model parameters that minimizes or maximizes a given metric or optimization
function. The optimum point is generally found using a gradient descent-based method.
However, most models are defined with an extra layer of parameters, called
hyperparameters. Their values affect the parameters computed during model training, and
they cannot be estimated directly from the data. Examples are the regularization parameter
for ridge regression models, the number of trees for random forest models, the number of
layers in a neural network and many others.
The recommended method for hyperparameter tuning depends on the type of model you
are training and the number of hyperparameters you consider. We focus first on grid search,
a simple hyperparameter tuning method and one of the first taught to data science students.
It requires the definition of a list of potential values for each parameter, training the model
for each combination and selecting the values that produce the best results according to a
given criterion.

6.1 Grid Search


A model hyperparameter is a characteristic of a model that is external to the model and
whose value cannot be estimated from data. The value of the hyperparameter has to be set
before the learning process begins; for example, C in Support Vector Machines, k in
k-Nearest Neighbors and the number of hidden layers in Neural Networks.
In contrast, a parameter is an internal characteristic of the model, and its value can be
estimated from data; for example, the beta coefficients of linear/logistic regression or the
support vectors in Support Vector Machines.
Grid search is used to find the optimal hyperparameters of a model, which result in the
most 'accurate' predictions. Grid search was proposed as a traditional approach for the
systematic exploration of the hyperparameter configuration space. It is simply a brute-force
algorithm that searches through a pre-specified subset of the hyperparameter space of the
specific learning algorithm. The algorithm can be parallelized across multiple models with
different configurations to accelerate the search. However, due to its brute-force
characteristics, it is a very costly approach: for N different hyperparameters, each having just
two possible values, we have a total of 2^N possible configurations.


Figure 6.2: Grid Search

Grid search is the simplest algorithm for hyperparameter tuning. Basically, we divide the
domain of the hyperparameters into a discrete grid. Then we try every combination of values
in this grid, calculating some performance metric using cross-validation. The point of the
grid that maximizes the average value in cross-validation is the optimal combination of
values for the hyperparameters.
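Here is a minimal sketch of a grid search over an SVM's C and gamma using scikit-learn's
GridSearchCV; the grid values are illustrative only.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 3 x 3 = 9 combinations, each scored by 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)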

6.2 Random Search


Random search is similar to grid search, but instead of using all the points in the grid, it
tests only a randomly selected subset of these points. The smaller this subset, the faster but
less accurate the optimization; the larger this subset, the more accurate the optimization but
the closer it comes to a full grid search.
Random search is a very useful option when you have several hyperparameters with a
fine-grained grid of values. Using a subset made of 5-100 randomly selected points, we are
able to get a reasonably good set of values for the hyperparameters. It will not likely be the
best point, but it can still be a good set of values that gives us a good model.


Figure 6.3: Random Search

To alleviate the exhaustive enumeration of combinations in a grid search, random search
chooses random values from the hyperparameter subset independently. By navigating the
grid of hyperparameters randomly, one can obtain performance similar to a full grid search,
yet this approach is surprisingly easy and effective. It is also well suited for gradient-free
functions with many local minima. Random search can outperform grid search, especially
when only a small number of hyperparameters affects the final performance of the machine
learning algorithm; in this case, the optimization problem is said to have a low intrinsic
dimensionality. Random search replaces the exhaustive enumeration of all combinations by
selecting them randomly. This applies simply to the discrete setting described above, but
also generalizes to continuous and mixed spaces. Random search is also embarrassingly
parallel and additionally allows the inclusion of prior knowledge by specifying the
distribution from which to sample.
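Here is a minimal sketch with scikit-learn's RandomizedSearchCV; the log-uniform
distributions below are one illustrative way of encoding the prior knowledge just mentioned.

from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Sample 20 random points from the given distributions instead of a full grid
param_dist = {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)}
search = RandomizedSearchCV(SVC(), param_dist, n_iter=20, cv=5, random_state=42)
search.fit(X, y)
print(search.best_params_, search.best_score_)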

6.3 Sequential Model-Based Optimization


State-of-the-art algorithms for hard computational problems often expose many parameters
that can be modified to improve empirical performance. However, manually exploring the
resulting combinatorial space of parameter settings is tedious and tends to lead to
unsatisfactory outcomes. Automated approaches for solving this algorithm configuration
problem have led to substantial improvements in the state of the art for various problems.
One promising approach constructs explicit regression models to describe the dependence of
target algorithm performance on parameter settings; for a long time, however, this

approach was limited to the optimization of a few numerical algorithm parameters on single
instances. It has since been extended to general algorithm configuration problems, allowing
many categorical parameters and optimization over sets of instances. This was experimentally
validated by optimizing a local search and a tree search solver for the propositional
satisfiability problem (SAT), as well as the commercial mixed integer programming (MIP)
solver CPLEX; the procedure yielded state-of-the-art performance and in many cases
outperformed the previous best configuration approaches.
Random search and grid search perform each hyperparameter check independently of the
others and often end up performing repeated and wasteful computations. To improve on their
shortcomings, Sequential Model-Based Optimization (SMBO) was proposed, which uses a
combination of regression and Bayesian optimization to select hyperparameters. It applies
the hyperparameters sequentially and adjusts their values based on a Bayesian probabilistic
model of the objective. The probabilistic approach of SMBO resolves the scalability issues
that were rampant in grid search and random search.

Figure 6.4: Sequential Model-Based Optimization

Figure 6.4 gives a comparative example of the hyperparameter selection behaviour of
various strategies; notice how, in the case of SMBO, the selections cluster near the
high-scoring regions.
To further improve this Bayesian optimization approach, later work introduced a deep
neural network for the global optimization of hyperparameters. For neural networks,
Mendoza et al. introduced Auto-Net, an AutoML tool based on Bayesian optimization for
tuning neural networks. The tool uses Stochastic Gradient Descent (SGD) as its optimizer for
hyperparameter optimization. The authors have also demonstrated a combined approach of


using Auto-Net and Auto-Sklearn to outperform human adversaries by a significant margin.
Further work improves upon the previous approaches so that they generalize across datasets
using a transfer learning strategy, achieved by constructing a common hyperparameter
surface from the previous hyperparameter selection plane and the target model's
hyperparameters. SmartML is a meta-learning based framework for automated
hyperparameter tuning and selection of ML models. It continuously learns from each given
dataset and stores information about the meta-features of all previously processed datasets to
increase performance.

6.4 Evolutionary Optimization


Evolutionary optimization is inspired by biological evolution, which follows survival of the
fittest. Such algorithms work by generating random agents which perform a particular task
and are scored on their performance. The agents are evaluated, and a breeding algorithm
generates new child agents derived from the best agents of the former generation. This new
generation again performs the same given task, and the cycle continues. For AutoML,
evolutionary algorithms are used to tackle hyperparameter optimization by searching the
configuration space of a given model.

Figure 6.5: Evolutionary Optimization

The Tree-based Pipeline Optimization Tool (TPOT), an AutoML tool, makes use of
evolutionary algorithms (genetic programming in particular) and has demonstrated its
effectiveness on simulated and real-world genetic datasets. Autostacker is another AutoML
architecture that uses evolutionary models to optimize the hyperparameter search. It produces
candidate pipelines in each generation and evolves itself.
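Here is a minimal, purely illustrative sketch of the generate-score-breed loop described
above, tuning a single hypothetical hyperparameter; real tools like TPOT evolve whole
pipelines with genetic programming rather than one number.

import random

def fitness(c):
    # Stand-in for a cross-validation score; pretend it peaks at C = 3.7
    return -(c - 3.7) ** 2

# Generation 0: random agents
population = [random.uniform(0.1, 10.0) for _ in range(10)]

for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                                # survival of the fittest
    children = [p + random.gauss(0, 0.5) for p in parents]  # breed by mutation
    population = parents + children

print(max(population, key=fitness))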


Chapter 7

DISCUSSION
Even though data pre-processing consumes a large chunk of time in an ML pipeline, it is
astonishing to see the inadequate amount of work done to automate it. For data
pre-processing, it can be noted that while the existing approaches are adequate for structured
and semi-structured data, work still needs to be done to assimilate unstructured data. We
suggest the incorporation of data-mining methods, as they can deal with such unformed data.
This could allow AutoML pipelines to create models capable of learning from Internet
sources. In feature engineering, it should be noted that most methods used until now adhere
to supervised learning. However, dataset specificity is high, and therefore AutoML pipelines
should be as generic as possible to accommodate diverse datasets. A gradual paradigm shift
towards unsupervised learning is therefore required to increase the ability of AutoML. To
replace domain experts, feature generation should be able to work flexibly with the original
feature sets (such as by introducing non-standard transforms). Reinforcement learning is a
step in the right direction and needs to be integrated further with feature engineering.
Hyperparameter optimization has seen large improvements over the years, especially with
the introduction of Bayesian optimization strategies such as SMBO. However, the use of a
continuously integrating meta-learning framework needs to be researched, as its performance
gain is high. Transfer learning has also been successfully used in the context of AutoML and
shows promising results. With the increase in the availability of task-specific pre-trained
models, an increase in the usage of transfer learning should be expected.


Chapter 8

CONCLUSION
In this paper, we provide insights to the readers about the various segments of AutoML
from a conceptual perspective. Each of these segments has various approaches, which have
been briefly explained to provide a concise overview. We also discuss the various trends seen
in recent years, including suggestions of underexplored research areas which need attention.
We also put forward some future directions that can be explored to extend the research in the
domain of AutoML. We suggest that research can be explored in the direction of a
generalized AutoML pipeline, which can accept datasets of a wide range, and that a central
meta-learning framework be established that acts as a central brain for approximating the
pipelines for all future problem statements.
We almost forget it in these times of strong focus on technological innovation, but in the
end, technology is only there to support your business. In other words, our data analysis is
not our core business; it must only support our core business. As an entrepreneur, it is
therefore beneficial to choose technology that works as efficiently as possible. Then our
analysts can put their cognitive energy back into thinking about business problems instead of
doing endless repetitive work before the technology can do its job.


Bibliography
[1] Lukas Tuggener, Mohammadreza Amirian, Katharina Rombach, Stefan Lörwald,
Anastasia Varlet, Christian Westermann, and Thilo Stadelmann. Automated machine
learning in practice: state of the art and recent results. In 2019 6th Swiss Conference
on Data Science (SDS), pages 31-36. IEEE, 2019.

[2] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for
large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT:
Pre-training of deep bidirectional transformers for language understanding. arXiv preprint
arXiv:1810.04805, 2018.

[4] Avatar Jaykrushna, Pathik Patel, Harshal Trivedi, and Jitendra Bhatia. Linear
regression assisted prediction based load balancer for cloud computing. In 2018 IEEE
Punecon, pages 1-3. IEEE, 2018.

[5] Jitendra Bhatia, Ruchi Mehta, and Madhuri Bhavsar. Variants of software defined
network (SDN) based load balancing in cloud computing: a quick review. In International
Conference on Future Internet Technologies and Trends, pages 164-173. Springer, 2017.

[6] Ishan Mistry, Sudeep Tanwar, Sudhanshu Tyagi, and Neeraj Kumar. Blockchain for
5G-enabled IoT for industrial automation: a systematic review, solutions, and challenges.
Mechanical Systems and Signal Processing, 135:106382, 2020.
