Capsule Networks
Kumar Shaswat
E18SOE802
BENNETT UNIVERSITY
What We Will Cover
CNNs Over-Simplified
1. Early layers of a CNN learn to detect edges and colour gradients.
2. Deeper layers learn to detect more complex features.
3. Finally, dense layers combine very high-level features and give classification predictions.
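As a rough sketch of this pipeline (PyTorch assumed; the layer sizes here are illustrative, not from the slides):

import torch
import torch.nn as nn

# Minimal CNN: early convs pick up edges and gradients, deeper convs
# pick up more complex features, and a dense layer combines them
# into class predictions.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # low-level features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # classification predictions
)

logits = model(torch.randn(1, 1, 28, 28))  # an MNIST-sized input
print(logits.shape)                        # torch.Size([1, 10])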
CNN's Drawback
The internal data representation of a convolutional neural network does not take into account important spatial hierarchies between simple and complex objects.
CNN's Band Aid
CNN approach to solve this issue is to Hinton: “The pooling operation used in
use max pooling or successive convolutional neural networks is a big
convolutional layers that reduce mistake and the fact that it works so
spacial size of the data flowing through well is a disaster.”
the network and therefore increase
the “field of view” of higher layer’s
neurons, thus allowing them to detect
higher order features in a larger region
of the input image.
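To make the trade-off concrete, a small sketch (PyTorch assumed): max pooling halves the spatial size, so neurons in the next layer see a wider region of the input, but the exact positions within each pooled window are thrown away.

import torch
import torch.nn as nn

x = torch.randn(1, 256, 20, 20)     # feature map from a conv layer
pool = nn.MaxPool2d(kernel_size=2)  # keep only the max of each 2x2 window
y = pool(x)
print(y.shape)  # torch.Size([1, 256, 10, 10]): spatial size halved, and the
                # precise location of each feature inside its window is lost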
Our Brain Accepts All of These as the Same
Hinton's Idea
Hinton argues that the brain in fact does the opposite of rendering, i.e. inverse graphics: from the visual information received by the eyes, it deconstructs a hierarchical representation of the world around us and tries to match it with already learned patterns and relationships stored in the brain.
Visual Representation of Requirement
CapsNet (Capsule Network)
Capsule: a capsule is a group of neurons whose outputs represent different properties of the same entity.
Capsule Network: the idea is to add capsules to a conventional neural network and to reuse the outputs of several of those capsules.
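As a hedged illustration of what a capsule's output vector means (following Sabour et al., 2017): the vector's length encodes the probability that the entity exists, and its orientation encodes the entity's properties.

import torch

# Hypothetical 8-dimensional capsule output (the values are made up).
capsule_out = torch.tensor([0.1, -0.3, 0.5, 0.0, 0.2, -0.1, 0.4, 0.3])

existence_prob = capsule_out.norm()        # length ~ probability the entity exists
properties = capsule_out / existence_prob  # direction ~ the entity's properties
print(existence_prob)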
CapsNet Architecture

Encoder (Convolution Layer → PrimaryCaps Layer → DigitCaps Layer):
◦ The encoder part of the network takes as input a 28 by 28 MNIST digit image and learns to encode it into 16-dimensional vectors of instantiation parameters.
◦ The output of the network during prediction is a 10-dimensional vector of the lengths of the DigitCaps’ outputs.

Decoder (Fully Connected #1 → Fully Connected #2 → Fully Connected #3):
◦ The decoder takes the 16-dimensional vector from the correct DigitCap and learns to decode it into an image of a digit.
◦ The decoder forces capsules to learn features that are useful for reconstructing the original image.
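A minimal, hedged trace of the encoder's shapes (PyTorch assumed; the routing step between PrimaryCaps and DigitCaps is covered on a later slide):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 256, kernel_size=9)                   # Convolution layer
primary = nn.Conv2d(256, 32 * 8, kernel_size=9, stride=2)  # PrimaryCaps as one conv

x = torch.randn(1, 1, 28, 28)  # MNIST digit
x = torch.relu(conv1(x))       # -> (1, 256, 20, 20)
x = primary(x)                 # -> (1, 256, 6, 6)
u = x.view(1, 32 * 6 * 6, 8)   # -> 1152 capsule vectors of 8 dimensions each
print(u.shape)                 # DigitCaps then maps these to 10 vectors of 16 dims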
Convolution Layer
Input: 28x28 image (one color channel).
Kernels: 256, size 9x9x1, stride 1.
Output: 20x20x256 tensor.
Number of parameters: (9*9 + 1)*256 = 20992.
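These numbers can be checked directly (PyTorch assumed): each 9x9x1 kernel has 81 weights plus one bias, times 256 kernels.

import torch.nn as nn

conv1 = nn.Conv2d(in_channels=1, out_channels=256, kernel_size=9, stride=1)
print(sum(p.numel() for p in conv1.parameters()))  # 20992 = (9*9*1 + 1) * 256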
PrimaryCaps Layer
Input: 20x20x256 tensor.
Number of capsules: 32.
Each capsule applies 8 convolutional kernels of size 9x9x256 with stride 2.
Output: 6x6x8x32 tensor.
Number of parameters: 32*8*(9*9*256 + 1) = 5308672.
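A quick check of the parameter count (PyTorch assumed; PrimaryCaps is commonly implemented as one convolution producing 32 * 8 = 256 output channels):

import torch.nn as nn

primary = nn.Conv2d(in_channels=256, out_channels=32 * 8, kernel_size=9, stride=2)
print(sum(p.numel() for p in primary.parameters()))  # 5308672 = 32*8*(9*9*256 + 1)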
How a Capsule Works
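The slide's figure is not reproduced here; as a hedged sketch of the computation inside a capsule (following Sabour et al., 2017): each lower capsule output u_i is multiplied by a weight matrix to give a prediction vector, the predictions are combined with coupling coefficients c, and the sum is “squashed” so its length lies between 0 and 1.

import torch

def squash(s, dim=-1, eps=1e-8):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|): shrinks short vectors toward 0
    # and long vectors toward unit length, preserving orientation.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

u = torch.randn(1152, 8)      # outputs of the lower-level capsules
W = torch.randn(1152, 8, 16)  # one 8x16 weight matrix per lower capsule
u_hat = torch.einsum('ik,ikm->im', u, W)  # prediction vectors, (1152, 16)

c = torch.full((1152, 1), 1 / 1152)       # coupling coefficients (uniform here)
s = (c * u_hat).sum(dim=0)                # weighted sum over lower capsules
v = squash(s)                             # one digit capsule's output, (16,)
print(v.norm())                           # length < 1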
DigitCaps Layer
Input: 6x6x8x32 tensor = 1152 8-dimensional capsule vectors.
10 digit capsules, one for each digit.
Weight matrix for each input vector: 8x16.
1152 c coefficients and 1152 b coefficients per digit capsule.
Output: 16x10 matrix.
Number of parameters: (1152*8*16 + 1152 + 1152)*10 = 1497600.
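The slide's arithmetic can be reproduced directly (a minimal sketch; note that the b and c coefficients are routing state rather than trained weights, though the slide counts them among the parameters):

n_lower, d_in, d_out, n_digits = 1152, 8, 16, 10
weights = n_lower * d_in * d_out       # one 8x16 matrix per lower capsule
b_and_c = n_lower + n_lower            # routing logits b and couplings c
print((weights + b_and_c) * n_digits)  # 1497600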
CapsNet Loss Function
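The figure for this slide is not reproduced; for reference, the margin loss from Sabour et al. (2017) can be sketched as below (m+ = 0.9, m- = 0.1, lambda = 0.5; T_k = 1 when digit k is present). The paper also adds a reconstruction loss from the decoder, scaled down by 0.0005.

import torch

def margin_loss(v_lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    # v_lengths: (batch, 10) lengths of the DigitCaps output vectors
    # targets:   (batch, 10) one-hot labels (T_k in the paper)
    present = targets * torch.clamp(m_pos - v_lengths, min=0) ** 2
    absent = lam * (1 - targets) * torch.clamp(v_lengths - m_neg, min=0) ** 2
    return (present + absent).sum(dim=1).mean()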
Fully Connected Layers
◦ Fully Connected Layer #1
◦ Fully Connected Layer #2
◦ Fully Connected Layer #3
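A hedged sketch of the decoder (the layer widths 512, 1024 and 784 = 28*28 come from Sabour et al., 2017, not from this slide):

import torch.nn as nn

decoder = nn.Sequential(
    nn.Linear(16, 512), nn.ReLU(),       # Fully Connected Layer #1
    nn.Linear(512, 1024), nn.ReLU(),     # Fully Connected Layer #2
    nn.Linear(1024, 784), nn.Sigmoid(),  # Fully Connected Layer #3 -> 28x28 image
)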
Steps in CapsNet
◦ Squashing
◦ Routing by Agreement
◦ DigitCaps
Routing by Agreement
A lower-level capsule will send its output to the higher-level capsule that “agrees” with that output.
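A hedged sketch of the dynamic routing loop from Sabour et al. (2017): the coupling coefficients c are a softmax over logits b, and b grows for the higher-level capsules whose output agrees (large dot product) with a lower capsule's prediction. The squash function is the one defined earlier.

import torch

def squash(s, dim=-1, eps=1e-8):
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def routing(u_hat, n_iters=3):
    # u_hat: (1152, 10, 16) prediction vectors from lower to digit capsules
    b = torch.zeros(u_hat.shape[:2])                  # routing logits, (1152, 10)
    for _ in range(n_iters):
        c = b.softmax(dim=1)                          # couplings sum to 1 per lower capsule
        s = (c.unsqueeze(-1) * u_hat).sum(dim=0)      # weighted sums, (10, 16)
        v = squash(s)                                 # digit capsule outputs
        b = b + (u_hat * v.unsqueeze(0)).sum(dim=-1)  # agreement = dot product
    return v

v = routing(torch.randn(1152, 10, 16))
print(v.shape)  # torch.Size([10, 16])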
References
1. Sabour, S., Frosst, N. and Hinton, G.E., 2017. Dynamic routing between capsules. In Advances in Neural Information Processing Systems (pp. 3856-3866).
2. Hinton, G.E., Krizhevsky, A. and Wang, S.D., 2011. Transforming auto-encoders. In International Conference on Artificial Neural Networks (pp. 44-51). Springer, Berlin, Heidelberg.
3. Pechyonkin, M. Understanding Hinton’s Capsule Networks.
4. Bourdakos, N. Understanding Capsule Networks — AI’s Alluring New Architecture.