
UNIT 5

Machine Learning Algorithm Analytics and Deep Learning

Evaluating Machine Learning Algorithms


Evaluating a machine learning algorithm is an essential part of any project. Most of the time we use classification accuracy to measure the performance of a model; however, accuracy alone is not enough to truly judge the model.
The main types of evaluation metrics are:
1. Classification Accuracy
2. Logarithmic Loss
3. Confusion Matrix
4. F1 Score
5. Mean Absolute Error
6. Mean Squared Error

1. Classification Accuracy

Classification accuracy is what we usually mean when we use the term accuracy. It is the ratio of the number of correct predictions to the total number of input samples:

Accuracy = (Number of correct predictions) / (Total number of input samples)

It works well only when there are roughly equal numbers of samples in each class; with imbalanced classes, a high accuracy can be misleading.
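A minimal sketch of computing accuracy, assuming scikit-learn is available (the labels below are made up for illustration):

from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1]   # classes predicted by the model

# 5 of the 6 predictions are correct, so accuracy = 5/6
print(accuracy_score(y_true, y_pred))   # 0.833...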

2. Logarithmic Loss

Logarithmic loss, or log loss, works by penalising false classifications. It works well for multi-class classification. When working with log loss, the classifier must assign a probability to each class for every sample. Suppose there are N samples belonging to M classes; then the log loss is calculated as:

Log Loss = -(1/N) * Σ(i=1..N) Σ(j=1..M) y_ij * log(p_ij)

where
y_ij indicates whether sample i belongs to class j or not, and
p_ij is the predicted probability of sample i belonging to class j.
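A minimal sketch using scikit-learn's log_loss; the four samples, three classes, and probabilities below are made up for illustration:

from sklearn.metrics import log_loss

y_true = [0, 2, 1, 2]                  # true class of each of 4 samples
y_prob = [[0.8, 0.1, 0.1],             # predicted probability of each of 3 classes
          [0.1, 0.2, 0.7],
          [0.2, 0.6, 0.2],
          [0.1, 0.1, 0.8]]

# Confident, correct predictions give a small penalty; confident but wrong
# predictions would give a large one.
print(log_loss(y_true, y_prob))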

3. Confusion Matrix
A confusion matrix, as the name suggests, gives a matrix as output and describes the complete performance of the model.
Let's assume we have a binary classification problem with samples belonging to two classes, YES and NO, and a classifier that predicts a class for any given input sample. Testing the model on 165 samples gives a 2x2 confusion matrix of the following form:

                  Predicted: NO      Predicted: YES
Actual: NO        True Negatives     False Positives
Actual: YES       False Negatives    True Positives

There are 4 important terms:
• True Positives : The cases in which we predicted YES and the actual output was
also YES.
• True Negatives : The cases in which we predicted NO and the actual output was
NO.
• False Positives : The cases in which we predicted YES and the actual output was
NO.
• False Negatives : The cases in which we predicted NO and the actual output was
YES.
The accuracy for the matrix is calculated by summing the values on the "main diagonal" (true positives and true negatives) and dividing by the total number of samples, i.e.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The confusion matrix forms the basis for the other types of metrics.
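A minimal sketch with scikit-learn (the eight labels are made up; 1 stands for YES and 0 for NO):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))   # [[3 1]
                                          #  [1 3]]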

4. F1 Score
The F1 score is used to measure a test's accuracy.
The F1 score is the harmonic mean of precision and recall, and its range is [0, 1]. It tells you how precise your classifier is (how many of its positive predictions are actually correct) as well as how robust it is (whether it misses a significant number of positive instances).
High precision with low recall gives an extremely accurate classifier that nonetheless misses a large number of instances that are difficult to classify. The greater the F1 score, the better the performance of the model. Mathematically:

F1 = 2 * (Precision * Recall) / (Precision + Recall)
F1 Score tries to find the balance between precision and recall.
• Precision: the number of correct positive results divided by the number of positive results predicted by the classifier.

  Precision = TP / (TP + FP)

• Recall: the number of correct positive results divided by the number of all relevant samples (all samples that should have been identified as positive).

  Recall = TP / (TP + FN)
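A minimal sketch with scikit-learn, reusing the made-up labels from the confusion matrix example above (TP = 3, FP = 1, FN = 1):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 3/4 = 0.75
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 3/4 = 0.75
print(f1_score(y_true, y_pred))          # harmonic mean of the two = 0.75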

5. Mean Absolute Error


Mean absolute error (MAE) is the average of the absolute differences between the original values and the predicted values. It gives a measure of how far the predictions are from the actual output. Mathematically:

MAE = (1/N) * Σ(i=1..N) |y_i - ŷ_i|
6. Mean Squared Error


Mean squared error (MSE) is quite similar to mean absolute error; the only difference is that MSE takes the average of the squares of the differences between the original values and the predicted values.
The advantage of MSE is that its gradient is easier to compute, whereas mean absolute error requires more complicated (linear programming) tools to compute the gradient. Because the errors are squared, larger errors become more pronounced than smaller ones, so the model can focus more on reducing the larger errors.

MSE = (1/N) * Σ(i=1..N) (y_i - ŷ_i)²
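A minimal NumPy sketch of both metrics (the four target values and predictions are made up):

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mae = np.mean(np.abs(y_true - y_pred))   # (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5
mse = np.mean((y_true - y_pred) ** 2)    # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
print(mae, mse)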

Model Selection
Model selection is an essential phase in the development of powerful and precise predictive
models in the field of machine learning. Model selection is the process of deciding which
algorithm and model architecture is best suited for a particular task or dataset. It entails
contrasting various models, assessing their efficacy, and choosing the one that most effectively
addresses the issue at hand.
The choice of an appropriate machine learning model is crucial, since models differ in their complexity, underlying assumptions, and capabilities. A model that performs effectively on one dataset or problem may not generalize well to new, unseen data. Finding the right balance between model complexity and generalization is therefore key to model selection.
Choosing a model usually involves a number of steps. The first step is to define a suitable evaluation metric that matches the objectives of the problem. Depending on the nature of the problem, this metric may be precision, recall, accuracy, F1-score, or any other relevant measure.
Several candidate models are then selected according to the problem at hand and the data that are available. These models might be as straightforward as decision trees or linear regression, or as sophisticated as deep neural networks, random forests, or support vector machines. During selection, it is important to take into account the assumptions, constraints, and hyperparameters that are specific to each model.
The candidate models are then trained and evaluated using a suitable methodology, such as cross-validation. The available data is divided into training and validation sets; each model is fitted on the training set and then evaluated on the validation set. The models are compared using their performance metrics, and the model with the best performance is chosen.
Model selection is a continuous process, though. Making a sound choice frequently requires an iterative process of testing several models and hyperparameters. This iteration both improves the models and helps identify the best combination of algorithm and hyperparameters.
Model Selection
In machine learning, the process of selecting the top model or algorithm from a list of potential
models to address a certain issue is referred to as model selection. It entails assessing and
contrasting various models according to how well they function and choosing the one that
reaches the highest level of accuracy or prediction power.
Because different models have varied levels of complexity, underlying assumptions, and
capabilities, model selection is a crucial stage in the machine-learning pipeline. Finding a
model that fits the training set of data well and generalizes well to new data is the objective.
While a model that is too complex may overfit the data and be unable to generalize, a model
that is too simple could underfit the data and do poorly in terms of prediction.
The following steps are frequently included in the model selection process:
• Problem formulation: Clearly express the issue at hand, including the kind of
predictions or task that you'd like the model to carry out (for example, classification,
regression, or clustering).
• Candidate model selection: Pick a group of models that are appropriate for the issue
at hand. These models can include straightforward methods like decision trees or linear
regression as well as more sophisticated ones like deep neural networks, random
forests, or support vector machines.
• Performance evaluation: Establish metrics for measuring how well each model
performs. Common metrics include accuracy, precision, recall, F1-score, and mean
squared error. The type of problem and its particular requirements determine which
metrics are used.
• Training and evaluation: Each candidate model should be trained using a subset of
the available data (the training set), and its performance should be assessed using a
different subset (the validation set or via cross-validation). The established evaluation
measures are used to gauge the model's effectiveness.
• Model comparison: Evaluate the performance of various models and determine which
one performs best on the validation set. Take into account elements like data handling
capabilities, interpretability, computational difficulty, and accuracy.
• Hyperparameter tuning: Many models require certain hyperparameters to be
configured before training, such as the learning rate, the regularization strength, or the
number of hidden layers in a neural network. Use methods like grid search, random
search, or Bayesian optimization to identify good values for these hyperparameters
(see the sketch after this list).
• Final model selection: After the models have been analyzed and fine-tuned, pick the
model that performs the best. Then, this model can be used to make predictions based
on fresh, unforeseen data.
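A minimal sketch of the training/evaluation and hyperparameter-tuning steps above, assuming scikit-learn; the Iris dataset, the two candidate models, and the parameter grid are illustrative choices, not part of these notes:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Compare candidate models with 5-fold cross-validation on the training set
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    print(name, scores.mean())

# Tune the hyperparameters of the more promising candidate with a grid search
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 3, 5]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))   # final check on held-out data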
There are several important considerations to bear in mind while selecting a model for machine learning. These factors help ensure that the chosen model effectively addresses the underlying problem and has a good chance of performing well. Here are some crucial things to remember:
• The complexity of the issue
• Data Availability & Quality
• Model Assumptions
• Scalability and Efficiency
• Domain Expertise
• Resource Constraints
• Evaluation and Experimentation

Ensemble methods
Ensemble methods are machine learning techniques that combine several base models in order to produce one optimal predictive model. To better understand this definition, let's step back to the ultimate goal of machine learning and model building.
A decision tree determines the predicted value based on a series of questions and conditions. For instance, a simple decision tree might decide whether an individual should play outside or not: the tree takes several weather factors into account and, for each factor, either makes a decision or asks another question.

Types of Ensemble Methods

1. BAGGing, or Bootstrap AGGregating. BAGGing gets its name because it combines Bootstrapping and Aggregation to form one ensemble model. Given a sample of data, multiple bootstrapped subsamples are drawn, and a decision tree is formed on each of them. After each subsample's decision tree has been formed, an algorithm is used to aggregate over the decision trees to form the most efficient predictor. In other words: given a dataset, bootstrapped subsamples are drawn, a decision tree is formed on each bootstrapped sample, and the results of the trees are aggregated to yield the strongest, most accurate predictor.
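A minimal bagging sketch with scikit-learn (the dataset and number of trees are illustrative; note that older scikit-learn versions name the first parameter base_estimator instead of estimator):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 50 decision trees, each fit on a bootstrapped subsample of the data;
# their predictions are aggregated by majority vote.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,
    random_state=0,
)
print(cross_val_score(bagging, X, y, cv=5).mean())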
2. Random Forest Models. Random forest models can be thought of as BAGGing with a slight tweak. When deciding where to split and how to make decisions, BAGGed decision trees have the full set of features at their disposal. Therefore, although the bootstrapped samples may be slightly different, the data will largely split on the same features throughout each model. In contrast, random forest models decide where to split based on a random selection of features. Rather than splitting on similar features at each node, random forest models introduce a level of differentiation because each tree splits based on different features. This differentiation provides a greater ensemble to aggregate over, and therefore a more accurate predictor.
As with BAGGing, bootstrapped subsamples are drawn from the larger dataset and a decision tree is formed on each subsample; however, each decision tree considers a different random subset of the features when splitting.
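The same idea in scikit-learn is RandomForestClassifier, where max_features controls how many randomly chosen features each split may consider (the dataset and settings below are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Each tree sees a bootstrapped sample, and each split considers only a
# random subset of the features ("sqrt" = square root of the feature count).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())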
3. Boosting
Boosting is a sequential process, where each subsequent model attempts to correct the
errors of the previous model. The succeeding models are dependent on the previous model.
Let’s understand the way boosting works in the below steps.
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset.

5. Errors are calculated using the actual values and the predicted values.
6. The observations that are incorrectly predicted are given higher weights.
7. Another model is created and predictions are made on the dataset.
(This model tries to correct the errors of the previous model.)

8. Similarly, multiple models are created, each correcting the errors of the previous
model.
9. The final model (strong learner) is the weighted mean of all the models (weak
learners).

Thus, the boosting algorithm combines a number of weak learners to form a strong learner.
The individual models would not perform well on the entire dataset, but they work well
for some part of the dataset. Thus, each model actually boosts the performance of the
ensemble.
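A minimal boosting sketch using AdaBoost from scikit-learn, one of several boosting algorithms (the dataset, the depth-1 weak learner, and the other settings are illustrative; older scikit-learn versions name the first parameter base_estimator):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Each new shallow tree (weak learner) gives more weight to the samples the
# previous trees misclassified; the final prediction is a weighted vote.
boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=0.5,
    random_state=0,
)
print(cross_val_score(boost, X, y, cv=5).mean())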
Deep generative models
A generative model is a powerful way of learning any kind of data distribution using unsupervised learning, and generative models have achieved tremendous success in just a few years.

Deep generative models are a class of artificial intelligence algorithms used in machine
learning and specifically in the field of generative modeling. These models aim to learn
the underlying distribution of a dataset in order to generate new data samples that resemble
the original dataset. Deep generative models leverage deep learning techniques, typically
using neural networks with multiple layers to capture complex patterns and relationships
within the data.
Some popular types of deep generative models include:
1. Variational Autoencoders (VAEs)
2. Generative Adversarial Networks (GANs)
3. Autoregressive Models
4. Flow-Based Models
Deep generative models have a wide range of applications, including image generation,
text generation, data augmentation, and anomaly detection.
A Boltzmann machine is an unsupervised deep learning model in which every node is
connected to every other node. It is a type of recurrent neural network, and the nodes make
binary decisions with some level of bias.
These machines are not deterministic deep learning models; they are stochastic, or generative, deep learning models. They can be viewed as representations of a system.
A Boltzmann machine has two kinds of nodes
• Visible nodes:
These are nodes that can be measured and are measured.
• Hidden nodes:
These are nodes that cannot be measured or are not measured.

There are three types of Boltzmann machines. These are:


• Restricted Boltzmann Machines (RBMs)
• Deep Belief Networks (DBNs)
• Deep Boltzmann Machines (DBMs)
1. Restricted Boltzmann Machines (RBMs)
While in a full Boltzmann machine all the nodes are connected to each other and the number of connections grows rapidly with the number of nodes, an RBM places restrictions on the node connections.
In a Restricted Boltzmann Machine there are no connections within a layer: hidden nodes are not connected to each other, and visible nodes are not connected to each other. Connections run only between the visible layer and the hidden layer.
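A minimal sketch of training a binary RBM with one step of contrastive divergence (CD-1) in NumPy; the toy data, layer sizes, learning rate, and number of epochs are arbitrary choices for illustration:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary data: 4 samples with 6 visible units each
data = np.array([[1, 1, 1, 0, 0, 0],
                 [1, 0, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1],
                 [0, 0, 1, 1, 1, 0]], dtype=float)

n_visible, n_hidden, lr = 6, 3, 0.1
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

for epoch in range(1000):
    # Positive phase: hidden activations driven by the data
    h_prob = sigmoid(data @ W + b_h)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase (one Gibbs step): reconstruct visibles, then hiddens
    v_prob = sigmoid(h_sample @ W.T + b_v)
    h_prob_recon = sigmoid(v_prob @ W + b_h)

    # CD-1 update: data-driven statistics minus model-driven statistics
    W += lr * (data.T @ h_prob - v_prob.T @ h_prob_recon) / len(data)
    b_v += lr * (data - v_prob).mean(axis=0)
    b_h += lr * (h_prob - h_prob_recon).mean(axis=0)

# Reconstructions should now resemble the training patterns
print(np.round(sigmoid(sigmoid(data @ W + b_h) @ W.T + b_v), 2))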
2. Deep Belief Networks (DBNs)
In a Deep Belief Network, multiple Restricted Boltzmann Machines are stacked, such that the outputs (hidden units) of one RBM serve as the inputs of the next RBM. There are no connections within a layer. The connections between the top two layers are undirected, while the connections between the lower layers are directed.
A deep belief network can either be trained using a Greedy Layer-wise Training Algorithm or
a Wake-Sleep Algorithm.
3. Deep Boltzmann Machines (DBMs)
Deep Boltzmann Machines are very similar to Deep Belief Networks. The difference between the two is that while most of the connections between layers in a DBN are directed, in a DBM all of the connections between layers are undirected.

Deep Boltzmann Machine


A Deep Boltzmann Machine (DBM) is a multi-layer generative model. It is similar to a Deep Belief Network, but it allows bidirectional (undirected) connections in the lower layers as well.
DBMs consist of multiple layers of hidden units, which are like the neurons in our brains.
These units work together to capture the probabilities of various patterns within the data.
Unlike some other neural networks, all units in a DBM are connected across layers, but not
within the same layer, which allows them to create a web of relationships between different
features in the data. This structure helps DBMs to be good at understanding complex data
like images, text, or sound.
The ‘deep’ in the Deep Boltzmann Machine refers to the multiple layers in the network,
which allow it to build a deep understanding of the data. Each layer captures increasingly
abstract representations of the data. The first layer might detect edges in an image, the second
layer might detect shapes, and the third layer might detect whole objects like cars or trees.
How Do Deep Boltzmann Machines Work?
Deep Boltzmann Machines work by first learning about the data in an unsupervised way,
which means they look for patterns without being told what to look for. They do this using a
process that involves adjusting the connections between units based on the data they see. This
process is similar to tuning a radio to get a clear signal; the DBM ‘tunes’ itself to resonate
with the structure of the data.
When a DBM is given a set of data, it uses a stochastic, or random, process to decide whether
a hidden unit should be turned on or off. This decision is based on the input data and the
current state of other units in the network. By doing this repeatedly, the DBM learns the
probability distribution of the data—basically, it gets an understanding of which patterns are
likely and which are not.
After the learning phase, you can use a DBM to generate new data. When generating new
data, the DBM starts with a random pattern and refines it step by step, each time updating
the pattern to be more like the patterns it learned during training.
Concepts Related to Deep Boltzmann Machines (DBMs)
Several key concepts underpin Deep Boltzmann Machines:
• Energy-Based Models: DBMs are energy-based models, which means they assign an
‘energy’ level to each possible state of the network. States that are more likely have lower
energy. The network learns by finding states that minimize this energy.
• Stochastic Neurons: Neurons in a DBM are stochastic. Unlike in other types of neural
networks, where neurons output a deterministic value based on their input, DBM neurons
make random decisions about whether to activate.
• Unsupervised Learning: DBMs learn without labels. They look at the data and try to
understand the underlying structure without any guidance on what features are important.
• Pre-training: DBMs often go through a pre-training phase where they learn one layer at a
time. This step-by-step learning helps in stabilizing the learning process before fine-tuning
the entire network together.
• Fine-Tuning: After pre-training, DBMs are fine-tuned, which means they adjust all their
parameters at once to better model the data.

Time Series Data


A time series is a series of data points indexed (or listed or graphed) in time order. Most
commonly, a time series is a sequence taken at successive equally spaced points in time.

Time series data refers to sequential data points collected over time, where each data point is
associated with a specific timestamp. Deep learning techniques have been increasingly applied
to analyze and model time series data due to their ability to capture complex temporal
dependencies and patterns.

Here are several common approaches to using deep learning for time series data:

Recurrent Neural Networks (RNNs): RNNs are a type of neural network architecture
specifically designed to handle sequential data. They have recurrent connections that allow
information to persist over time. Long Short-Term Memory (LSTM) networks and Gated
Recurrent Units (GRUs) are popular variants of RNNs that are capable of capturing long-range
dependencies and mitigating the vanishing gradient problem.
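A minimal sketch of an LSTM forecaster in PyTorch, assuming PyTorch is installed; the noisy sine-wave data, window length, hidden size, and training schedule are arbitrary illustrations:

import torch
import torch.nn as nn

# Toy series: a noisy sine wave, turned into (window -> next value) pairs
t = torch.linspace(0, 20, 500)
series = torch.sin(t) + 0.1 * torch.randn(500)
window = 20
X = torch.stack([series[i:i + window] for i in range(len(series) - window)]).unsqueeze(-1)
y = series[window:].unsqueeze(-1)

class LSTMForecaster(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)           # out: (batch, time, hidden)
        return self.head(out[:, -1])    # predict the next value from the last step

model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print(loss.item())   # final training loss on the next-step prediction task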

Convolutional Neural Networks (CNNs): While CNNs are traditionally used for image data,
they can also be applied to time series data by treating the temporal dimension as a spatial
dimension. This is achieved by using one-dimensional convolutions over the time axis. CNNs
can capture local patterns and are particularly effective when there are spatial-temporal patterns
in the data.

Temporal Convolutional Networks (TCNs): TCNs are a variant of CNNs specifically designed for processing time series data. They use causal convolutions, meaning that the output at each time step depends only on past and current inputs, which makes them suitable for online prediction tasks. TCNs have shown promising results in various time series forecasting and sequence modeling tasks.

Transformer-based Models: Transformer models, originally proposed for natural language processing tasks, have also been adapted for time series data. These models use self-attention mechanisms to capture global dependencies between different time steps in the sequence. Transformer-based models have achieved state-of-the-art performance in various time series forecasting and anomaly detection tasks.

Autoencoder-based Approaches: Autoencoder architectures, such as Variational Autoencoders (VAEs) and sequence-to-sequence autoencoders, can be used for dimensionality reduction, feature learning, and anomaly detection in time series data. These models learn to reconstruct the input data while capturing its underlying structure in a latent space.

Hybrid Models: Some approaches combine multiple architectures, such as combining CNNs
and RNNs or CNNs and Transformers, to leverage the strengths of each model type for
different aspects of the time series data.

Deep learning models for time series data have been applied in various domains, including
finance (for stock price prediction and algorithmic trading), healthcare (for patient monitoring
and disease diagnosis), energy (for demand forecasting and anomaly detection), and many
others. However, it's important to carefully design the architecture, preprocess the data
appropriately, and tune the hyperparameters to achieve optimal performance for a specific task.

Autoencoders - Machine Learning


✓ Autoencoders are a specialized class of algorithms that can learn efficient
representations of input data with no need for labels.
✓ An autoencoder is an artificial neural network designed for unsupervised learning.
✓ Learning to compress and effectively represent input data without specific labels is
the essential principle of an autoencoder.
✓ This is accomplished using a two-fold structure that consists of an encoder and a
decoder.

The encoder transforms the input data into a reduced-dimensional representation, which is often referred to as the "latent space" or "encoding". From that representation, the decoder rebuilds the initial input. This process of encoding and decoding forces the network to identify the essential features and learn meaningful patterns in the data.
Architecture of Autoencoder in Deep Learning
The general architecture of an autoencoder includes an encoder, decoder, and bottleneck
layer.
1. Encoder
• The input layer takes the raw input data.
• The hidden layers progressively reduce the dimensionality of the input,
capturing important features and patterns. These layers compose the encoder.
• The bottleneck layer (latent space) is the final hidden layer, where the
dimensionality is significantly reduced. This layer represents the compressed
encoding of the input data.
2. Decoder
• The decoder takes the encoded representation from the bottleneck layer and
expands it back toward the dimensionality of the original input.
• Its hidden layers progressively increase the dimensionality and aim to
reconstruct the original input.
• The output layer produces the reconstructed output, which ideally should be
as close as possible to the input data.
3. The loss function used during training is typically a reconstruction loss,
measuring the difference between the input and the reconstructed output.
Common choices include mean squared error (MSE) for continuous data or binary
cross-entropy for binary data.
4. During training, the autoencoder learns to minimize the reconstruction loss,
forcing the network to capture the most important features of the input data in the
bottleneck layer.
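A minimal sketch of this encoder/bottleneck/decoder structure in PyTorch, assuming PyTorch is available; the 784-dimensional random inputs (standing in for flattened 28x28 images), layer sizes, and training loop are illustrative:

import torch
import torch.nn as nn

# Toy data standing in for flattened inputs (e.g. 28x28 images -> 784 features)
X = torch.rand(256, 784)

class Autoencoder(nn.Module):
    def __init__(self, bottleneck=32):
        super().__init__()
        # Encoder: progressively reduce dimensionality down to the bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(784, 128), nn.ReLU(),
            nn.Linear(128, bottleneck),
        )
        # Decoder: expand the encoding back to the original dimensionality
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 128), nn.ReLU(),
            nn.Linear(128, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                       # reconstruction loss

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), X)              # reconstruct the input itself
    loss.backward()
    opt.step()

encoding = model.encoder(X)                  # after training, keep just the encoder
print(encoding.shape)                        # torch.Size([256, 32])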
After the training process, only the encoder part of the autoencoder is retained, to encode new data of the same type used during training. The different ways to constrain the network are:
• Keep small Hidden Layers: If the size of each hidden layer is kept as small as
possible, then the network will be forced to pick up only the representative
features of the data thus encoding the data.
• Regularization: In this method, a loss term is added to the cost function which
encourages the network to train in ways other than copying the input.
• Denoising: Another way of constraining the network is to add noise to the input
and teach the network how to remove the noise from the data.
• Tuning the Activation Functions: This method involves changing the
activation functions of various nodes so that a majority of the nodes are dormant
thus, effectively reducing the size of the hidden layers.
Applications of Deep Networks
Some of the use cases where deep generative models and deep networks are applied in the real world today:
• Autonomous vehicle systems use inputs from visual and Lidar sensors fed to a neural
network that predicts future behavior to make proactive course corrections thousands of times
a second.
• Fraud detection compares historical behavior to current transactions to detect anomalies and
act accordingly.
• Virtual assistants learn a person's taste in music, their schedule, their purchasing history, and any other information they have access to in order to make recommendations. For example, an assistant can provide travel times to home or to work.
• Entertainment systems can recommend movies based on past viewing of similar content.
• A smartwatch can warn of potential medical conditions, over-exertion, and lack of sleep to
oversee the owner’s well-being.
• Images taken with a digital camera or scanned images can be enhanced by increasing
sharpness, balancing colors, and suggesting crops.
• Captions can be auto-generated for movies or meeting videos to enhance playback.
• Handwriting style can be learned, and new text can be generated in the same style.
• Captioned videos can have captions generated in multiple languages.
• Photo libraries can be tagged with descriptions to make finding similar ones or duplicates
easier.
