
Activation Functions

An activation function is a mathematical function used in neural networks to determine the output of a neuron. It introduces non-linearity into the model, enabling it to learn complex patterns in the data.

Importance of Activation Functions


• Adds Non-linearity: Real-world data is typically non-linear, and activation functions enable the network to learn such patterns.

• Allows Deep Networks to Work: Without non-linear activation functions, a neural network would act as a linear model regardless of its depth (see the sketch after this list).

• Bounds Output: Some activation functions restrict the output to a specific range, which helps stabilize training.
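As a quick illustration of the second point, here is a minimal NumPy sketch (the weights, shapes, and random values are made up for illustration) showing that stacking linear layers without an activation collapses to a single linear map, while inserting a ReLU breaks that equivalence:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # example input vector (values are arbitrary)
W1 = rng.normal(size=(4, 3))      # weights of a first "layer" (made up for illustration)
W2 = rng.normal(size=(2, 4))      # weights of a second "layer"

# Two stacked linear layers with no activation in between ...
deep = W2 @ (W1 @ x)
# ... are exactly one linear layer with combined weights W2 @ W1.
shallow = (W2 @ W1) @ x
print(np.allclose(deep, shallow))        # True: the extra depth adds nothing

# Inserting a non-linearity (here ReLU) between the layers breaks the equivalence.
nonlinear = W2 @ np.maximum(0.0, W1 @ x)
print(np.allclose(nonlinear, shallow))   # generally False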

Types of Activation Functions


Activation functions can be broadly divided into:

• Linear Activation Functions

• Non-linear Activation Functions

1. Linear Activation Function


The linear activation function is defined as:

f(x) = x

Key Features
• The output is directly proportional to the input.

• No transformation or non-linearity is introduced.

Advantages
• Simple and computationally efficient.

• Useful in specific cases such as regression tasks.

Disadvantages
• Does not introduce non-linearity, making it unsuitable for learning complex patterns.

• Causes the network to collapse into a single linear transformation.

Example with Data


Consider:

x = [1, 2, 3, 4].

Applying the linear activation function:

f(x) = x = [1, 2, 3, 4].
The output is identical to the input, with no transformation.
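A minimal NumPy sketch of this worked example (the helper name linear is my own):

import numpy as np

def linear(x):
    # Identity activation: the output equals the input.
    return x

x = np.array([1.0, 2.0, 3.0, 4.0])
print(linear(x))  # [1. 2. 3. 4.] -- unchanged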


2. Non-linear Activation Functions


Non-linear activation functions enable the network to approximate complex patterns by
introducing non-linearity.

Common Types of Non-linear Activation Functions


• Sigmoid Function

• Tanh Function

• Rectified Linear Unit (ReLU)

• Leaky ReLU

• Softmax Function

2.1 Sigmoid Function


The sigmoid function is defined as:
f(x) = 1 / (1 + e^{-x})

Output Range
(0, 1)

Key Features
• Squashes input into a range between 0 and 1.

• Commonly used in the output layer for binary classification tasks.

Advantages
• Probabilistic interpretation of output.

Disadvantages
• Vanishing Gradient Problem: For large positive or negative inputs, the gradient
becomes very small.

• Saturation: Outputs close to 0 or 1 lead to minimal weight updates.

Example with Data


Consider:

x = [−1, 0, 1, 2].

Applying the sigmoid function:

f(x) = 1 / (1 + e^{-x}) = [0.2689, 0.5, 0.7311, 0.8808].
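A minimal NumPy sketch of the sigmoid example above (the helper name and rounding are my own):

import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x}), applied element-wise.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-1.0, 0.0, 1.0, 2.0])
print(np.round(sigmoid(x), 4))  # [0.2689 0.5    0.7311 0.8808]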

2.2 Hyperbolic Tangent (Tanh) Function


The tanh function is defined as:
f(x) = (e^x − e^{-x}) / (e^x + e^{-x})

Output Range
(−1, 1)

Key Features
• Similar to sigmoid but centered at 0.

• Captures negative values effectively.

Advantages
• Zero-centered output leads to balanced gradients.

Disadvantages
• Suffers from the vanishing gradient problem for large inputs.

Example with Data


Consider:

x = [−1, 0, 1, 2].

Applying the tanh function:

f(x) = tanh(x) = [−0.7616, 0, 0.7616, 0.9640].
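A minimal sketch of the same example using NumPy's built-in tanh:

import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0])
# f(x) = (e^x - e^{-x}) / (e^x + e^{-x}), applied element-wise.
print(np.round(np.tanh(x), 4))  # [-0.7616  0.      0.7616  0.964 ]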

2.3 Rectified Linear Unit (ReLU)


The ReLU function is defined as:

f(x) = max(0, x)

Output Range
[0, ∞)

Key Features
• Outputs zero for all negative inputs.

• Outputs the input directly for positive inputs.

Advantages
• Avoids vanishing gradients for positive values.

• Simple and computationally efficient.

Disadvantages
• Dead Neurons Problem: Neurons whose inputs stay negative always output zero, receive zero gradient, and stop updating.

Example with Data
Consider:

x = [−1, 0, 1, 2].

Applying the ReLU function:

f(x) = max(0, x) = [0, 0, 1, 2].
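A minimal NumPy sketch of the ReLU example (the helper name relu is my own):

import numpy as np

def relu(x):
    # f(x) = max(0, x), applied element-wise.
    return np.maximum(0.0, x)

x = np.array([-1.0, 0.0, 1.0, 2.0])
print(relu(x))  # [0. 0. 1. 2.]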

2.4 Leaky ReLU


Leaky ReLU is defined as:

f(x) = x,  if x > 0,
f(x) = αx, if x ≤ 0,

where α is a small constant (e.g., α = 0.01).

Output Range
(−∞, ∞)

Key Features
• Allows small negative values instead of zero for negative inputs.

Advantages
• Solves the dead neurons problem.

Disadvantages
• Slightly more complex than standard ReLU.

Example with Data


Consider:

x = [−1, 0, 1, 2].

Applying Leaky ReLU (α = 0.01):

f(x) = [−0.01, 0, 1, 2].


2.5 Softmax Function


The softmax function is defined as:
f(x_i) = e^{x_i} / Σ_j e^{x_j}

Output Range
(0, 1), with Σ_i f(x_i) = 1.

Key Features
• Converts raw scores into probabilities.

• Commonly used in the output layer for multi-class classification.

Advantages
• Outputs are interpretable as probabilities.

Disadvantages
• Computationally more expensive than element-wise activations, because it exponentiates and normalizes over all classes.

Example with Data


Consider:

x = [1, 2, 3].

Applying the softmax function:

f(x) = [e^1, e^2, e^3] / (e^1 + e^2 + e^3) = [0.0900, 0.2447, 0.6652].
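A minimal NumPy sketch of the softmax example; subtracting the maximum before exponentiating is a common numerical-stability trick, not something required by the definition above:

import numpy as np

def softmax(x):
    # Shift by the maximum to avoid overflow; the shift cancels in the ratio,
    # so the probabilities are unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
p = softmax(x)
print(np.round(p, 4))  # [0.09   0.2447 0.6652]
print(p.sum())         # ~1.0 (up to floating-point rounding)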

Conclusion
Activation functions are essential for neural networks to learn complex patterns in data.
Choosing the right activation function depends on the problem:
• Sigmoid: Binary classification tasks.

• Tanh: Regression or tasks involving negative values.

• ReLU: Default choice for hidden layers.

• Leaky ReLU: When ReLU encounters dead neurons.

• Softmax: Multi-class classification tasks.
