Activation Functions
Among the useful properties of activation functions:
• Bounds Output: Some activation functions restrict the output to a specific range, which helps stabilize training.
Linear Activation Function
f(x) = x
Output Range
(−∞, ∞)
Key Features
• The output is directly proportional to the input.
Advantages
• Simple and computationally efficient.
Disadvantages
• Does not introduce non-linearity, making it unsuitable for learning complex patterns.
Example with Data
Consider x = [1, 2, 3, 4].
Applying the linear activation function:
f(x) = x = [1, 2, 3, 4].
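As a quick sanity check, the identity mapping is easy to reproduce in code. The snippet below is only an illustrative sketch (NumPy is assumed; it is not part of the original notes).

    import numpy as np

    def linear(x):
        # Identity activation: the output is exactly the input.
        return x

    x = np.array([1.0, 2.0, 3.0, 4.0])
    print(linear(x))  # [1. 2. 3. 4.]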
The remaining activation functions are covered in turn below:
• Sigmoid Function
• Tanh Function
• ReLU Function
• Leaky ReLU
• Softmax Function

Sigmoid Function
f(x) = 1 / (1 + e^(−x))
Output Range
(0, 1)
Key Features
• Squashes input into a range between 0 and 1.
Advantages
• Probabilistic interpretation of output.
Disadvantages
• Vanishing Gradient Problem: For large positive or negative inputs, the gradient
becomes very small.
Example with Data
Consider x = [−1, 0, 1, 2].
Applying the sigmoid function:
f(x) = 1 / (1 + e^(−x)) = [0.2689, 0.5, 0.7311, 0.8808].
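These values can be reproduced with a short NumPy sketch (illustrative only; not part of the original notes).

    import numpy as np

    def sigmoid(x):
        # Squashes each input into the range (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([-1.0, 0.0, 1.0, 2.0])
    print(np.round(sigmoid(x), 4))  # 0.2689, 0.5, 0.7311, 0.8808 (matches the worked example)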
—
Tanh Function
f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Output Range
(−1, 1)
Key Features
• Similar to sigmoid but centered at 0.
Advantages
• Zero-centered output leads to balanced gradients.
Disadvantages
• Suffers from the vanishing gradient problem for large inputs.
Example with Data
Consider x = [−1, 0, 1, 2].
Applying the tanh function:
f(x) = tanh(x) = [−0.7616, 0, 0.7616, 0.9640].
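Again as an illustrative sketch (assuming NumPy), the built-in np.tanh reproduces these values.

    import numpy as np

    x = np.array([-1.0, 0.0, 1.0, 2.0])
    # tanh is applied elementwise and gives zero-centered outputs in (-1, 1).
    print(np.round(np.tanh(x), 4))  # -0.7616, 0.0, 0.7616, 0.964 (matches the worked example)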
—
ReLU Function
f(x) = max(0, x)
Output Range
[0, ∞)
Key Features
• Outputs zero for all negative inputs.
Advantages
• Avoids vanishing gradients for positive values.
Disadvantages
• Dead Neurons Problem: Neurons can become inactive if they output zero repeatedly.
Example with Data
Consider x = [−1, 0, 1, 2].
Applying the ReLU function:
f(x) = max(0, x) = [0, 0, 1, 2].
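An illustrative NumPy sketch of the same computation (not part of the original notes):

    import numpy as np

    def relu(x):
        # Elementwise max(0, x): negative inputs are clipped to zero.
        return np.maximum(0, x)

    x = np.array([-1.0, 0.0, 1.0, 2.0])
    print(relu(x))  # [0. 0. 1. 2.]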
—
Leaky ReLU
f(x) = x if x > 0, and f(x) = αx otherwise, where α is a small positive slope
Output Range
(−∞, ∞)
Key Features
• Allows small negative values instead of zero for negative inputs.
Advantages
• Solves the dead neurons problem.
Disadvantages
• Slightly more complex than standard ReLU.
Example with Data
Consider x = [−1, 0, 1, 2].
Applying Leaky ReLU (α = 0.01):
f(x) = [−0.01, 0, 1, 2].
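An illustrative NumPy sketch of the piecewise definition with α = 0.01 (not part of the original notes):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # Keep positive values unchanged, scale negative values by a small slope alpha.
        return np.where(x > 0, x, alpha * x)

    x = np.array([-1.0, 0.0, 1.0, 2.0])
    print(leaky_relu(x))  # [-0.01  0.  1.  2.]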
—
Softmax Function
f(x_i) = e^(x_i) / Σ_j e^(x_j)
Output Range
(0, 1), and Σ_i f(x_i) = 1.
Key Features
• Converts raw scores into probabilities.
Advantages
• Outputs are interpretable as probabilities.
Disadvantages
• Computationally more expensive than simpler activations, especially when the number of classes is large.
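The notes give no worked example for softmax, so the sketch below is purely illustrative (NumPy assumed, reusing the x = [−1, 0, 1, 2] vector from the earlier examples). It shows that the outputs form a probability distribution.

    import numpy as np

    def softmax(x):
        # Subtracting the maximum is a common trick for numerical stability;
        # it does not change the resulting probabilities.
        e = np.exp(x - np.max(x))
        return e / e.sum()

    x = np.array([-1.0, 0.0, 1.0, 2.0])
    p = softmax(x)
    print(np.round(p, 4))  # approximately 0.0321, 0.0871, 0.2369, 0.6439
    print(p.sum())         # 1.0 (up to floating-point rounding)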
Conclusion
Activation functions are essential for neural networks to learn complex patterns in data.
Choosing the right activation function depends on the problem:
• Sigmoid: Binary classification tasks.
• Tanh: When zero-centered hidden activations are preferred.
• ReLU: A common default for hidden layers.
• Leaky ReLU: When ReLU encounters dead neurons.
• Softmax: Multi-class classification output layers.
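As a purely illustrative sketch of this guidance (PyTorch is an assumption here; the notes do not name any framework), a small binary classifier might combine these activations as follows.

    import torch
    import torch.nn as nn

    # Hidden layer uses Leaky ReLU (a ReLU variant that avoids dead neurons);
    # the output layer uses Sigmoid, suitable for binary classification.
    binary_classifier = nn.Sequential(
        nn.Linear(10, 16),
        nn.LeakyReLU(negative_slope=0.01),
        nn.Linear(16, 1),
        nn.Sigmoid(),
    )

    # For multi-class problems the output layer would typically use Softmax
    # (often folded into the loss, e.g. nn.CrossEntropyLoss, during training).
    x = torch.randn(4, 10)             # a batch of 4 random input vectors
    print(binary_classifier(x).shape)  # torch.Size([4, 1])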