
Deep Learning

Dr. Irfan Yousuf


Institute of Data Science, UET, Lahore
(Week 8; March 09, 2025)
Outline
• Generative Adversarial Networks (GANs)
Generative Models
• A generative model is a type of machine learning model that aims to learn the underlying patterns or distributions of data in order to generate new, similar data.

• In essence, it's like teaching a computer to dream up its own data based on what it has seen before.

• The significance of this model lies in its ability to create, which has vast implications in various fields, from art to science.
Generative vs. Discriminative Models
• Generative models: These models focus on understanding how the data is generated.

• They aim to learn the distribution of the data itself.

• For instance, if we're looking at pictures of cats and dogs, a generative model would try to understand what makes a cat look like a cat and a dog look like a dog. It would then be able to generate new images that resemble either cats or dogs.
Generative vs. Discriminative Models
• Discriminative models: These models, on the other hand, focus on distinguishing between different types of data.

• They don't necessarily learn or understand how the data is generated; instead, they learn the boundaries that separate one class of data from another.

• Using the same example of cats and dogs, a discriminative model would learn to tell the difference between the two, but it wouldn't necessarily be able to generate a new image of a cat or dog on its own.
Generative Adversarial Networks
GANs Architecture
• A GAN consists of two neural networks: a Generator G and a Discriminator D.
• The two networks play an adversarial game.
• The generator's aim is to fool the discriminator by producing data similar to the data in the training set.
• The discriminator tries not to be fooled, learning to tell fake data apart from real data.
• The two networks are trained simultaneously, which allows GANs to model complex data such as audio, video, or images.
What is a Generator?
• A Generator in a GAN is a neural network that creates fake data on which the discriminator is trained.
• It learns to generate plausible data.
• The generated examples/instances become negative training examples for the discriminator.
• The generator takes a fixed-length random vector carrying noise as input and generates a sample from it.
What is a Generator?
• The main aim of the Generator is to make the discriminator
classify its output as real. The part of the GAN that trains the
Generator includes:
1. noisy input vector
2. generator network, which transforms the random input into
a data instance
3. discriminator network, which classifies the generated data
4. generator loss, which penalizes the Generator for failing to
fool the discriminator.
5. The backpropagation method is used to adjust each weight
in the right direction by calculating the weight's impact on
the output. It is also used to obtain gradients and these
gradients can help change the generator weights.
How the Generator Works
• Input: Noise Vector (Latent Space)
o The generator takes a random noise vector, typically of a fixed size (for example, a 100-dimensional vector).
o This noise is drawn from a simple distribution, such as a Gaussian (normal) or uniform distribution.
o This vector doesn't contain any meaningful information at first, but the generator will learn to map it to a complex image space.
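A minimal sketch of this sampling step, assuming PyTorch and an illustrative batch size and 100-dimensional latent vector:

import torch

# Sketch: sample a batch of latent noise vectors from a standard normal
# distribution. Batch size and latent dimension are illustrative assumptions.
batch_size, latent_dim = 16, 100
z = torch.randn(batch_size, latent_dim)   # unstructured noise; the generator learns to map it to images
print(z.shape)                            # torch.Size([16, 100])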
How the Generator Works
• Fully Connected Layers (or Convolutional Layers)
o The noise vector is passed through a series of fully connected layers or, in some cases, convolutional layers (especially in the case of generating images).
o The purpose of these layers is to progressively shape and expand the latent vector into a more structured representation, eventually leading to an image.
How the Generator Works
• Upsampling (Deconvolution)
o As the generator processes the noise vector, it starts with a very small representation and gradually upscales the tensor (the multi-dimensional data) into a larger, more complex one.
o This step usually involves transposed convolutions (also known as deconvolutions or upsampling layers), which help to increase the spatial dimensions of the feature maps.
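As an illustration (layer sizes are assumptions, not the lecture's exact network), a stride-2 transposed convolution in PyTorch doubles the spatial size of a feature map:

import torch
import torch.nn as nn

# Sketch: a transposed convolution with stride 2 upsamples an 8x8 feature map to 16x16
# while reducing the number of channels. All sizes here are illustrative assumptions.
x = torch.randn(1, 128, 8, 8)
upsample = nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=4, stride=2, padding=1)
print(upsample(x).shape)   # torch.Size([1, 64, 16, 16])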
How the Generator Works
• Activation Functions
• After each layer (or block of layers), non-linear activation functions like ReLU (Rectified Linear Unit) or Leaky ReLU are applied.
• These activations introduce non-linearity, enabling the network to learn more complex patterns and representations.
• The final layer typically uses a tanh or sigmoid activation function, depending on the desired output image range.
How the Generator Works
• Output Image
• Eventually, after passing through all these layers, the generator produces an image-like tensor.
• This tensor has the same shape as the target image dimensions (e.g., 64x64x3 for a 64x64 RGB image).
• The generator's goal is to make this output indistinguishable from real images when evaluated by the discriminator.
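To tie these steps together, here is a toy sketch (assumed layer sizes, fully connected rather than convolutional) of a generator that maps a 100-dimensional noise vector to a 3x64x64 tensor with a final tanh:

import torch
import torch.nn as nn

# Toy generator sketch: fully connected layers expand the latent vector,
# and the final tanh squashes outputs to [-1, 1]. Sizes are assumptions.
toy_generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 1024), nn.ReLU(),
    nn.Linear(1024, 3 * 64 * 64), nn.Tanh(),
)

z = torch.randn(16, 100)                       # batch of 16 noise vectors
fake = toy_generator(z).view(16, 3, 64, 64)    # reshape to image dimensions (3x64x64)
print(fake.shape)                              # torch.Size([16, 3, 64, 64])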
How the Generator Works
• Training through Adversarial Feedback
• The generator is trained to fool the discriminator into
thinking that the generated image is real.
• The discriminator receives both real and generated images
and attempts to classify them as real or fake.
• The generator's weights are updated through
backpropagation, using feedback from the discriminator's
classification errors.
• Over time, the generator learns to produce more realistic
images that become harder for the discriminator to
distinguish from real images.
What is a Discriminator?
• The Discriminator is a neural network that distinguishes real data from the fake data created by the Generator. The discriminator's training data comes from two different sources:

• The real data instances, such as real pictures of birds, humans, currency notes, etc., are used by the Discriminator as positive examples during training.
• The fake data instances created by the Generator are used as negative examples during the training process.
How the Discriminator Works
• The discriminator in a Generative Adversarial Network (GAN) is typically a binary classifier designed to distinguish between real and fake data.

• The structure of the discriminator can vary depending on the type of data (e.g., images, text, audio), but for the most common use case of GANs, generating images, discriminators are usually built using convolutional neural networks (CNNs) due to their effectiveness in handling image data.
How the Discriminator Works
• Input Layer:
o The input image is fed into the discriminator. It has a shape of H x W x C (height, width, and channels).
• Convolutional Layers:
o The image is passed through a series of convolutional layers that apply filters to extract high-level features.
o In each convolutional layer, the number of filters (or kernels) increases, and the spatial dimensions (height and width) of the feature maps typically decrease.
o This helps the network learn hierarchical features (e.g., edges, textures, complex patterns).
How the Discriminator Works
• Activation Function
• After each convolutional layer, a Leaky ReLU activation function is typically used.
• Leaky ReLU is preferred over regular ReLU because it allows small negative values to pass through, which helps avoid the "dying ReLU" problem (where neurons stop learning because they always output zero).
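A quick illustration of the difference (input values chosen arbitrarily):

import torch
import torch.nn as nn

# ReLU zeroes out all negative inputs; Leaky ReLU keeps a small fraction of them.
x = torch.tensor([-2.0, -0.5, 0.0, 1.0])
print(nn.ReLU()(x))           # tensor([0.0000, 0.0000, 0.0000, 1.0000])
print(nn.LeakyReLU(0.2)(x))   # tensor([-0.4000, -0.1000, 0.0000, 1.0000])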
How the Discriminator Works
• Output Layer
o The final layer is a single neuron (or unit) with a sigmoid activation function.
o The output of this neuron is a probability between 0 and 1:
o 1 indicates the input is real (from the true data distribution).
o 0 indicates the input is fake (generated by the generator).
o The output is typically a scalar value, representing the probability that the input image is real.
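Putting the pieces together, a minimal sketch of such a discriminator in PyTorch (channel counts and depth are illustrative assumptions, not the lecture's exact network):

import torch
import torch.nn as nn

# Sketch of a CNN discriminator for 64x64 RGB inputs: strided convolutions with
# Leaky ReLU, then a single sigmoid output neuron giving the probability "real".
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(128 * 16 * 16, 1),   # single output neuron
    nn.Sigmoid(),                  # probability between 0 and 1
)

images = torch.randn(8, 3, 64, 64)       # a batch of real or generated images
print(discriminator(images).shape)       # torch.Size([8, 1])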
GANs Applications
• Image-to-image translation: Image-to-image translation is an application in which an image B is transformed so that it takes on the properties (for example, the style) of another image A.
GANs Applications
• Generate Human Faces: GANs can be trained on the
images of humans to generate realistic faces.
GANs Applications
• Generate images based on input text.
Loss in GANs
Discriminator Loss
• While the discriminator is trained, it classifies both the real data and the fake data produced by the generator.
• It penalizes itself for misclassifying a real instance as fake, or a fake instance (created by the generator) as real, by maximizing the objective E_x[log D(x)] + E_z[log(1 - D(G(z)))]:

• log(D(x)) refers to the probability that the discriminator correctly classifies the real image;
• maximizing log(1 - D(G(z))) helps it correctly label the fake images that come from the generator.
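In practice this objective is often implemented with binary cross-entropy; a sketch under that assumption (the example probabilities are made up):

import torch
import torch.nn as nn

# Sketch: minimizing BCE with target 1 for real samples and target 0 for fake
# samples is equivalent to maximizing log(D(x)) + log(1 - D(G(z))).
bce = nn.BCELoss()
d_real = torch.tensor([[0.9], [0.8]])   # D(x): example outputs on real images
d_fake = torch.tensor([[0.2], [0.1]])   # D(G(z)): example outputs on generated images

loss_real = bce(d_real, torch.ones_like(d_real))    # penalty for calling real images fake
loss_fake = bce(d_fake, torch.zeros_like(d_fake))   # penalty for calling fake images real
print((loss_real + loss_fake).item())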
Generator Loss
• While the generator is trained, it samples random noise and
produces an output from that noise. The output then goes
through the discriminator and gets classified as either “Real”
or “Fake” based on the ability of the discriminator to tell one
from the other.
• The generator loss is then calculated from the discriminator’s
classification – it gets rewarded if it successfully fools the
discriminator, and gets penalized otherwise.
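A matching sketch of the generator loss, again using binary cross-entropy; this is the commonly used variant in which generated samples are scored against the "real" target label, and the numbers are made up:

import torch
import torch.nn as nn

# Sketch: the generator is rewarded when D(G(z)) is close to 1, i.e., when the
# discriminator believes the generated samples are real.
bce = nn.BCELoss()
d_fake = torch.tensor([[0.2], [0.1]])            # D(G(z)): example outputs on generated images
g_loss = bce(d_fake, torch.ones_like(d_fake))    # large when the discriminator is not fooled
print(g_loss.item())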
GANs Loss Function
• The original GAN loss function, also known as the Min-Max loss, is a fundamental concept in GANs. It serves as the "scorecard" for both the Generator and the Discriminator, guiding their training process through a competitive dynamic.

• The generator tries to minimize this function while the discriminator tries to maximize it.
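For reference, this Min-Max objective (the standard formulation from the original GAN paper, written here in LaTeX since the slide's rendered equation is not reproduced) is:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

The individual terms are unpacked on the following slides.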
GANs Loss Function
• min_G: This indicates that we are minimizing the loss function with respect to the Generator (G).

• max_D: This signifies that we are maximizing the loss function with respect to the Discriminator (D).

• V(D,G): This is the overall loss (value) function, which depends on both the generator and the discriminator.
GANs Loss Function
• E: This denotes the expectation operator, which essentially
calculates the average value of a function across the entire
data distribution. In simpler terms, it considers all the
possible data points and their probabilities to arrive at a
representative value.
• E_x: This denotes the expectation over the real data
distribution (represented by x).
• E_z: This signifies the expectation over the random noise
distribution (represented by z).
GANs Loss Function
• log D(x): This term is the Discriminator's score on real data (x).
• Ideally, the Discriminator should output a value close to 1 (indicating "real") for real data.
• When D(x) is close to 1, log D(x) is close to 0, its largest possible value, so this term of the objective is high, which is exactly what the Discriminator wants, since it is maximizing the objective.
GANs Loss Function
• log(1 - D(G(z))): This term involves the Generator's output, G(z).
• It is the logarithm of (1 minus the Discriminator's output) when the Discriminator is presented with generated data G(z).
• The Discriminator should ideally output a value close to 0 (indicating "fake") for generated data.
• When that happens, 1 - D(G(z)) is close to 1 and the term is near its maximum, which is bad for the Generator; the Generator therefore tries to minimize this term by pushing D(G(z)) towards 1, i.e., by fooling the Discriminator.
Types of GANs

• Vanilla GAN
• Conditional GAN (CGAN)
• Deep Convolutional GAN (DCGAN)
• CycleGAN
• Generative Adversarial Text to Image Synthesis
• StyleGAN
• Super Resolution GAN (SRGAN)
Strided vs. Fractionally-strided Convolution

• In a standard convolution operation, a filter (also called a kernel) is applied to the input data, and the result is a weighted sum of the values in the input region covered by the filter. This operation reduces the spatial dimensions of the input.

• Fractionally-strided convolution is designed to perform the opposite operation, i.e., increase the spatial dimensions.
Strided vs. Fractionally-strided Convolution

• The standard convolutional operation reduces the spatial dimensions (width and height) of the input and intermediate feature maps, a process known as encoding or downsampling. It's used in the discriminator to analyze real and fake images.

• The fractionally-strided convolution gradually restores the spatial dimensions of the feature maps while reducing the number of channels. This process is known as upsampling or decoding. It's used in the generator to transform random noise into a realistic image.
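A shape comparison in PyTorch (layer sizes are illustrative assumptions):

import torch
import torch.nn as nn

# A strided convolution halves the spatial dimensions (downsampling, discriminator-style),
# while a fractionally-strided / transposed convolution doubles them (upsampling, generator-style).
x = torch.randn(1, 64, 32, 32)

down = nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)
up = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)

print(down(x).shape)   # torch.Size([1, 128, 16, 16])  spatial size reduced, channels increased
print(up(x).shape)     # torch.Size([1, 32, 64, 64])   spatial size increased, channels reduced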
Strided vs. Fractionally-strided Convolution
Strided Convolution (Downsampling)
Fractionally-Strided Convolution (Upsampling)

Transposed convolution of a 2x2 input with a 2x2 kernel to produce a 3x3 output. At each step, a single element (e.g., the red one) of the input matrix multiplies all 4 kernel elements, and the result is placed in an intermediate 3x3 matrix at a location corresponding to the position of that input element. The process is repeated for the three other input elements. The four intermediate matrices are added together to create the final 3x3 output.
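The same construction can be checked numerically; a small sketch (input and kernel values are made up for illustration):

import torch
import torch.nn.functional as F

# Transposed convolution of a 2x2 input with a 2x2 kernel (stride 1) giving a 3x3 output.
x = torch.tensor([[1., 2.],
                  [3., 4.]]).reshape(1, 1, 2, 2)
k = torch.tensor([[1., 0.],
                  [2., 1.]]).reshape(1, 1, 2, 2)

out = F.conv_transpose2d(x, k)            # library result, shape (1, 1, 3, 3)

# Manual version: each input element scales the whole kernel, the scaled copy is
# placed at the matching offset in a 3x3 grid, and the four partial results are summed.
manual = torch.zeros(3, 3)
for i in range(2):
    for j in range(2):
        manual[i:i + 2, j:j + 2] += x[0, 0, i, j] * k[0, 0]

print(out.squeeze())                           # tensor([[ 1.,  2.,  0.], [ 5.,  9.,  2.], [ 6., 11.,  4.]])
print(torch.allclose(out.squeeze(), manual))   # True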
Deep Convolutional GAN (DCGAN)

• In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications.

• Comparatively, unsupervised learning with CNNs has received less attention.

• The DCGAN paper introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs), which have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.
DCGAN Architecture

• DCGAN uses convolutional-transpose layers in the generator and strided convolutional layers in the discriminator.

• The generator consists of convolutional-transpose layers, batch normalization layers, and ReLU activations. The output is a 3x64x64 RGB image.

• The discriminator consists of strided convolution layers, batch normalization layers, and LeakyReLU as the activation function. It takes a 3x64x64 input image.
DCGAN Architecture: Generator

• The DCGAN generator is designed to synthesize realistic images from random noise.

• It typically comprises several layers of transposed convolutions, also known as deconvolutions or upsampling layers.

• Starting with a random noise vector as input, often drawn from a standard normal distribution, this vector is projected through a fully connected layer to a higher-dimensional space.
DCGAN Architecture: Generator

• The subsequent layers use transposed convolutions to progressively upsample the features, gradually transforming the initial noise into a complex image representation.

• Batch normalization, applied in each layer, helps stabilize the learning process by normalizing the input to each layer and reducing internal covariate shift.
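A minimal DCGAN-style generator sketch in PyTorch, assuming a 100-dimensional noise vector and a 3x64x64 output; channel counts follow common DCGAN implementations, and the initial projection is done here with a stride-1 transposed convolution rather than an explicit fully connected layer:

import torch
import torch.nn as nn

# Sketch: transposed convolutions + batch norm + ReLU, mapping 100-dim noise
# to a 3x64x64 image with a final tanh. Channel counts are assumptions.
dcgan_generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0, bias=False),  # 1x1 -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(),
    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1, bias=False),  # 4x4 -> 8x8
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1, bias=False),  # 8x8 -> 16x16
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1, bias=False),   # 16x16 -> 32x32
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1, bias=False),     # 32x32 -> 64x64
    nn.Tanh(),
)

z = torch.randn(16, 100, 1, 1)           # noise vector reshaped to 100x1x1
print(dcgan_generator(z).shape)          # torch.Size([16, 3, 64, 64])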
DCGAN Architecture: Discriminator

• In contrast, the discriminator network within a DCGAN functions as a binary classifier, distinguishing between real and generated images.

• It comprises a series of convolutional layers, each followed by batch normalization and activation functions like LeakyReLU to introduce non-linearity.

• Strided convolutions are employed to reduce the spatial dimensions of the input, gradually extracting high-level features to discern real images from synthetic ones.
DCGAN Architecture: Discriminator

• The discriminator's architecture does not include fully connected layers, aligning with the principles of convolutional networks to capture spatial hierarchies effectively.

• The discriminator ends with a sigmoid activation function, producing a probability score indicating the authenticity of the input image.
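A matching DCGAN-style discriminator sketch (channel counts are assumptions based on common implementations); note the absence of fully connected layers and the final sigmoid:

import torch
import torch.nn as nn

# Sketch: strided convolutions + batch norm + LeakyReLU, ending in a sigmoid,
# with no fully connected layers. Channel counts are assumptions.
dcgan_discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1, bias=False),    # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1, bias=False),  # 32x32 -> 16x16
    nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1, bias=False), # 16x16 -> 8x8
    nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1, bias=False), # 8x8 -> 4x4
    nn.BatchNorm2d(512), nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0, bias=False),   # 4x4 -> 1x1 score
    nn.Sigmoid(),
)

img = torch.randn(16, 3, 64, 64)
print(dcgan_discriminator(img).view(-1).shape)   # torch.Size([16]), probabilities in (0, 1)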
Applications of DCGAN

• Image Generation: DCGANs excel in generating high-resolution, photorealistic images across diverse domains, from human faces to landscapes and artworks.

• Data Augmentation: They find utility in augmenting training datasets, especially in scenarios with limited data, enhancing the generalization capability of machine learning models.

• Image-to-Image Translation: DCGANs can aid in tasks like converting sketches to realistic images or altering images in ways that preserve their content while changing aspects like colors or textures.
DCGAN generated images
Summary
• Generative Adversarial Networks
