SC Exp1
: 16010122812
Experiment No. 01
Grade: AA / AB / BB / BC / CC / CD / DD
___________________________________________________________________________________
CO1 : Identify and describe soft computing techniques and their roles
____________________________________________________________________________________
Books/ Journals/ Websites referred:
J. S. R. Jang, C. T. Sun and E. Mizutani, “Neuro-Fuzzy and Soft Computing”, PHI / Pearson Education, 2004.
David E. Goldberg, “Genetic Algorithms in Search, Optimization and Machine Learning”, Addison-Wesley, N.Y., 1989.
____________________________________________________________________________________
Pre Lab/ Prior Concepts:
Neural networks, sometimes referred to as connectionist models, are parallel-distributed models that have several distinguishing features (a minimal single-unit sketch in code follows the list):
1) A set of processing units;
2) An activation state for each unit, which is equivalent to the output of the unit;
3) Connections between the units. Generally, each connection is defined by a weight w_jk that determines the effect that the signal of unit j has on unit k;
4) A propagation rule, which determines the effective input of the unit from its external inputs;
5) An activation function, which determines the new level of activation based on the effective
input and the current activation;
6) An external input (bias, offset) for each unit;
7) A method for information gathering (learning rule);
8) An environment within which the system can operate, provide input signals and, if necessary,
error signals.
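The single-unit sketch referred to above is given below. It is illustrative only: a sigmoid is assumed as the activation function, and the weights, bias and input values are made-up numbers chosen for the example. It shows how items 3 to 6 fit together in code: -
import numpy as np

def unit_output(inputs, weights, bias):
    # Propagation rule (item 4): weighted sum of the inputs plus the external input/bias (item 6)
    net_input = np.dot(weights, inputs) + bias
    # Activation function (item 5): here an assumed sigmoid squashes the net input into (0, 1)
    return 1 / (1 + np.exp(-net_input))

print(unit_output(np.array([0.5, -1.0, 2.0]), np.array([0.4, 0.3, -0.2]), 0.1))   # approx 0.40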
____________________________________________________________________________________
Implementation Details:
Most units in a neural network transform their net inputs by using a scalar-to-scalar function called an
activation function, yielding a value called the unit's activation. Except possibly for output units, the
activation value is fed to one or more other units. Activation functions with a bounded range are
often called squashing functions. Some of the most commonly used activation functions are: -
1) Identity function: -
In soft computing, the identity function is a fundamental concept used in various areas such as
artificial intelligence, machine learning, and neural networks. The identity function is a simple
mathematical function that takes an input and returns the same value as output. In other words, the
function maps each input element to itself without any alteration.
The identity function is commonly denoted as "f(x) = x" or "y = x," where 'x' represents the input
value, and 'y' represents the output value. Regardless of the value of 'x', the output 'y' will always be
the same as 'x' in the identity function.
f(x) = x
The identity function is particularly useful in various soft computing algorithms and techniques,
including neural networks. In neural networks, the identity function is often used as the activation
function in some layers to maintain the linearity of the data transformation. For instance, in a multi-
layer perceptron (MLP) neural network, the identity function can be used in the output layer to
perform regression tasks, where the model is required to predict continuous values without any
specific transformation.
In summary, the identity function in soft computing is a simple and essential concept that helps
maintain the original values of inputs, playing a significant role in various applications within the
field of artificial intelligence and machine learning.
2) Binary step function: -
The binary step (threshold) function is defined as:
f(x) = 0, if x < 0
f(x) = 1, if x >= 0
In other words, when the input 'x' is less than zero, the function returns a value of 0, and when the
input is greater than or equal to zero, the function returns a value of 1. This function creates a hard
threshold, where any input below zero is mapped to 0, and any input equal to or above zero is
mapped to 1.
The binary step function is commonly used in binary classification problems, where the goal is to
categorize inputs into two classes (e.g., positive and negative, yes and no, etc.). The function acts as
a simple decision maker, assigning one of the two classes to the input based on the threshold at zero.
However, one limitation of the binary step function is that it is not differentiable at the threshold
point (x = 0). This lack of differentiability makes it unsuitable for some optimization algorithms used
in training neural networks, such as gradient descent. As a result, the binary step function is not
commonly used in modern neural networks for training purposes.
Instead, other activation functions like the sigmoid function, ReLU (Rectified Linear Unit), or the
softmax function are preferred in most cases because they are differentiable and can better facilitate
the learning process during training. Nevertheless, the binary step function remains an essential
concept in the study of activation functions and their role in soft computing and artificial neural
networks.
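As a small illustration of the "decision maker" role described above, the following sketch applies the binary step to a weighted sum; the weights, bias and sample point are arbitrary values chosen for the example, not part of the prescribed experiment: -
import numpy as np

def binary_step(x):
    # hard threshold at zero: 1 for x >= 0, otherwise 0
    return np.where(x >= 0, 1, 0)

weights = np.array([0.7, -0.4])
bias = -0.1
sample = np.array([1.0, 0.5])
print(binary_step(np.dot(weights, sample) + bias))   # 0.7*1.0 - 0.4*0.5 - 0.1 = 0.4 >= 0, so class 1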
3) Sigmoid function: -
The sigmoid function is a commonly used activation function in soft computing, especially in
artificial neural networks and logistic regression. It is valued for its smooth and continuously
differentiable nature, which allows for efficient optimization during training.
The standard sigmoid function is also known as the logistic function and is defined as follows:
f(x) = 1 / (1 + exp(-x))
In this equation, 'x' is the input value, 'exp' represents the exponential function, and 'f(x)' is the output
value of the sigmoid function.
1. Range: The output of the sigmoid function lies between 0 and 1, which is useful for problems
where we want to interpret the output as a probability. The function maps any real-valued input to a
probability value in the range [0, 1].
2. S-shaped curve: The sigmoid function has an S-shaped curve, which means that the output
increases monotonically as the input increases. This property helps in introducing non-linearity to the
neural network, allowing it to learn complex relationships in the data.
3. Differentiability: The sigmoid function is differentiable for all input values, making it suitable for
optimization algorithms like gradient descent during the training process. This differentiability
allows the model to update its weights and learn from the training data effectively.
4. Saturated behavior: The sigmoid function can suffer from vanishing gradients when the input
values become very large or very small. This can lead to slow or stalled learning during training in
deep neural networks, known as the vanishing gradient problem.
While the sigmoid function was commonly used as the default activation function in the early days
of neural networks, it has some limitations, particularly the vanishing gradient problem. As a result,
other activation functions like ReLU (Rectified Linear Unit) and its variants have gained popularity
in modern deep learning architectures. These alternative functions help mitigate the vanishing
gradient problem and facilitate faster convergence during training.
However, the sigmoid function is still valuable in certain scenarios, such as binary classification
problems or as an output activation for models where the outputs are interpreted as probabilities.
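The saturated behavior noted in point 4 can be seen from the derivative of the sigmoid, which follows directly from the definition above:
f'(x) = f(x) * (1 - f(x))
The derivative peaks at 0.25 when x = 0 and approaches 0 as the input becomes very large or very small, which is exactly the vanishing-gradient effect. A quick numeric check (illustrative input values only): -
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.0, 5.0, 10.0])
print(sigmoid(x) * (1 - sigmoid(x)))   # approx [0.25, 0.0066, 0.000045]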
4) Bipolar sigmoid function: -
The bipolar sigmoid function is a variant of the standard sigmoid whose output is centered at zero. It is defined as:
f(x) = (2 / (1 + exp(-2x))) - 1
In this equation, 'x' represents the input value, 'exp' denotes the exponential function, and 'f(x)' is the
output value of the bipolar sigmoid function.
1. Range: The output of the bipolar sigmoid function lies between -1 and 1, which makes it suitable
for problems where the input data is centered around zero. This range provides a more balanced
output for both positive and negative input values.
2. S-shaped curve: Similar to the standard sigmoid function, the bipolar sigmoid function also has
an S-shaped curve, introducing non-linearity to neural networks and allowing them to learn complex
patterns in the data.
3. Differentiability: The bipolar sigmoid function is differentiable for all input values, facilitating
efficient optimization during the training process using methods like gradient descent.
4. Centered at zero: One of the main advantages of the bipolar sigmoid function over the standard
sigmoid function is that it is centered at zero. This means that when the input is zero, the output of
the function is also zero, making it more suitable for data with mean-centered distributions.
The bipolar sigmoid function is commonly used as an activation function in neural networks,
particularly in hidden layers. It helps to introduce non-linearity and allows the network to learn and
approximate complex functions effectively.
While the bipolar sigmoid function has some benefits, it still suffers from the vanishing gradient
problem when input values are very large or very small. In deep neural networks, this issue can
hinder learning and slow down training.
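It is also worth noting (easy to verify algebraically, though not stated above) that the bipolar sigmoid is identical to the hyperbolic tangent, tanh(x). A quick numeric check with arbitrary sample points: -
import numpy as np

x = np.linspace(-3, 3, 13)
bipolar = (2 / (1 + np.exp(-2 * x))) - 1
print(np.allclose(bipolar, np.tanh(x)))   # True: the bipolar sigmoid and tanh coincide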
5) ReLU (Rectified Linear Unit) function: -
The ReLU function is defined as:
f(x) = max(0, x)
In this equation, 'x' represents the input value, 'max' denotes the maximum function, and 'f(x)' is the
output value of the ReLU function.
1. Non-linearity: The ReLU function is a non-linear activation function. It returns the input value 'x'
if 'x' is non-negative (greater than or equal to zero) and returns zero if 'x' is negative. This non-linearity
enables neural networks to approximate more complex functions and better capture intricate
patterns in the data.
2. Sparsity: One interesting property of the ReLU function is that it introduces sparsity in the
network. When the input is negative, the output is zero, effectively "deactivating" the neuron and
making it less computationally expensive during forward and backward propagation. This sparsity
can lead to more efficient and faster training of deep neural networks.
3. Avoiding vanishing gradient: The ReLU function helps mitigate the vanishing gradient problem,
which can occur when using sigmoid or tanh functions. This issue happens because these traditional
sigmoidal functions saturate for large positive and negative inputs, resulting in very small gradients.
The ReLU function, on the other hand, avoids this saturation for positive inputs, leading to more
stable and faster convergence during training.
However, the ReLU function also has a potential drawback known as the "dying ReLU" problem.
This occurs when a neuron's weighted input stays negative (for example, after a large weight update), so the neuron always outputs zero. If a neuron gets stuck in this state, it becomes inactive and does not contribute to the learning process. To address this
problem, variants of the ReLU function have been proposed, such as the Leaky ReLU and
Parametric ReLU, which allow small negative values to pass through to prevent neurons from
dying.
In summary, the ReLU function is a widely used activation function in deep learning due to its
simplicity, non-linearity, and effectiveness in mitigating the vanishing gradient problem. While it
may have some limitations, various modifications and variants have been introduced to address
those issues and make the ReLU family of functions an essential component of modern neural
network architectures.
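A minimal sketch of the Leaky ReLU variant mentioned above is given below; the slope 0.01 for negative inputs is a commonly used but arbitrary choice: -
import numpy as np

def leaky_relu(x, alpha=0.01):
    # let a small fraction of negative inputs through instead of zeroing them
    return np.where(x >= 0, x, alpha * x)

print(leaky_relu(np.array([-3.0, -1.0, 0.0, 2.0])))   # [-0.03 -0.01  0.    2.  ]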
1. Identity function: -
Code: -
import numpy as np
import matplotlib.pyplot as plt

# Input values and identity activation: f(x) = x returns the input unchanged
input_values = np.linspace(-5, 5, 100)
output_values = input_values

# Plotting
plt.figure(figsize=(8, 6))
plt.plot(input_values, output_values, label='Identity Function', color='blue')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Identity Activation Function')
plt.legend()
plt.grid(True)
plt.show()
Outputs: -
2. Binary step function: -
Code: -
# Binary step function (np and plt are imported in the identity example above)
# Input values
input_values = np.array([-3, -1, 0, 2, 4])
output_values = np.where(input_values >= 0, 1, 0)   # 1 if x >= 0, else 0

plt.step(input_values, output_values, where='post', label='Binary Step Function', color='blue')
plt.legend()
plt.grid(True)
plt.show()
Outputs: -
3. Sigmoid function: -
Code: -
import numpy as np
import matplotlib.pyplot as plt

# Sigmoid activation: f(x) = 1 / (1 + exp(-x))
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

input_values = np.linspace(-10, 10, 200)
output_values = sigmoid(input_values)

plt.plot(input_values, output_values, label='Sigmoid Function', color='blue')
plt.legend()
plt.grid(True)
plt.show()
Outputs: -
4. Bipolar sigmoid function: -
Code: -
# Bipolar sigmoid (reuses np, plt and input_values from the sigmoid example above)
def bipolar_sigmoid(x):
    return (2 / (1 + np.exp(-2 * x))) - 1

output_values = bipolar_sigmoid(input_values)
plt.plot(input_values, output_values, label='Bipolar Sigmoid Function', color='blue')
plt.legend()
plt.grid(True)
plt.show()
Outputs: -
5. ReLU function: -
Code: -
import numpy as np
import matplotlib.pyplot as plt

# ReLU activation: f(x) = max(0, x)
input_values = np.linspace(-5, 5, 100)
output_values = np.maximum(0, input_values)

plt.plot(input_values, output_values, label='ReLU Function', color='blue')
plt.legend()
plt.grid(True)
plt.show()
Outputs: -
1. Introducing non-linearity: Without activation functions, the neural network would be limited to
performing only linear transformations of the input data. A combination of linear operations would
still result in a linear operation. However, real-world data is often non-linear, and complex patterns
cannot be represented by a series of linear transformations alone. Activation functions introduce non-
linearity to the network, enabling it to learn and represent non-linear relationships between the input and output (a short numeric check of this claim appears after the list).
3. Decision-making and output mapping: In many tasks, neural networks are used for decision-
making, such as classifying objects in an image or predicting a target value. Activation functions play
a crucial role in these tasks by mapping the output of the neurons to specific ranges or binary values,
making them suitable for classification or regression problems.
4. Gradient propagation during backpropagation: During the training process, the neural network
adjusts its weights and biases to minimize the error between predicted and actual output. Activation
functions also influence how gradients are propagated backward through the network during the
backpropagation algorithm. Different activation functions have different gradient properties, which
can impact the speed and stability of learning. Activation functions like ReLU help alleviate the
vanishing gradient problem, which can occur in deeper networks with traditional sigmoidal
activation functions.
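The numeric check referred to in point 1 can be sketched as follows; the matrix sizes and sample vector are arbitrary, chosen only for illustration: -
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))   # two "layers" with no activation
x = rng.standard_normal(3)

# Applying the layers in sequence is the same as applying the single matrix W2 @ W1,
# so without a non-linear activation the stack is still just one linear transformation.
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))   # True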
In summary, activation functions are an essential part of neural networks because they introduce non-
linearity, allowing the network to learn complex patterns and make decisions. They play a vital role
in the learning process, ensuring that the network can effectively learn from data, generalize to new
examples, and perform well on various tasks, including classification, regression, and other machine
learning problems.
2. Range of output: The range of the activation function's output refers to the values the function can
take as an output. Ideally, an activation function should map the input to a specific range that is
suitable for the task at hand. For example, the sigmoid and softmax functions map the input to the
range (0, 1), making them suitable for classification tasks where outputs represent probabilities. On
the other hand, ReLU and its variants map the input to the range [0, ∞), which is useful for most
hidden layers of a neural network.
3. Continuity and Differentiability: Activation functions should be continuous and differentiable (or
at least piecewise differentiable) for efficient gradient-based optimization algorithms like
backpropagation to work effectively. The continuity ensures that small changes in input produce
small changes in output, while differentiability is essential for calculating gradients during the
training process. Functions like ReLU are continuous but not differentiable at zero; in practice this single point is handled by assigning a fixed subgradient (typically 0 or 1), so gradient-based training still works.
4. Sparsity: Some activation functions, like ReLU, introduce sparsity in the network. When the input
is negative, the output is zero, effectively "deactivating" the neuron. This sparsity can lead to more
efficient and faster training of deep neural networks, as fewer neurons need to be updated during
backpropagation.
5. Vanishing and Exploding Gradients: The problem of vanishing and exploding gradients refers to
the issue of gradients becoming extremely small or extremely large during the training process.
Activation functions can influence the occurrence of these problems. For example, the sigmoid and
tanh functions can suffer from the vanishing gradient problem when inputs are very large or very
small. ReLU helps mitigate the vanishing gradient problem, but it can still lead to the exploding gradient problem for very high learning rates (a small sketch after this list illustrates both the sparsity and the gradient behavior).
7. Saturation and Zero-Centering: Activation functions like sigmoid and tanh can suffer from
saturation when inputs are very large or very small, leading to vanishing gradients. Additionally, they
are not zero-centered, which may cause optimization challenges in certain scenarios. ReLU and its variants avoid saturation for positive inputs, although plain ReLU outputs are not zero-centered either; variants such as Leaky ReLU help reduce these issues.
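The sketch referred to in point 5 is given below with illustrative values only (the random pre-activations are arbitrary); it shows the sparsity introduced by ReLU (point 4) and how sigmoid gradients shrink where the ReLU gradient does not (point 5): -
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(1000)        # pre-activations of a hypothetical layer, roughly half negative

# Point 4: ReLU zeroes the negative half, so roughly half the units are inactive (sparse)
relu_out = np.maximum(0, z)
print(np.mean(relu_out == 0))        # approx 0.5

# Point 5: sigmoid gradients vanish for large inputs, while the ReLU gradient stays at 1 for x > 0
x = np.array([0.0, 5.0, 10.0])
sig = 1 / (1 + np.exp(-x))
print(sig * (1 - sig))               # sigmoid gradient: approx [0.25, 0.0066, 0.000045]
print(np.where(x > 0, 1.0, 0.0))     # ReLU gradient: [0. 1. 1.] (taken as 0 at x = 0)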
In summary, understanding the properties of activation functions is crucial for selecting appropriate
activation functions for specific tasks and designing effective neural network architectures. Different
activation functions possess unique characteristics that can impact the performance, stability, and
convergence of the neural network during training and inference.