The document provides an overview of Generative Adversarial Networks (GANs), detailing their architecture, which consists of a Generator and a Discriminator that engage in a competitive training process. It explains the roles of each component, how they function, and their applications in generating realistic data such as images. Additionally, it discusses various types of GANs and the loss functions used to optimize their performance.
Week 8
Deep Learning
Dr. Irfan Yousuf
Institute of Data Science, UET, Lahore (Week 8; March 09, 2025)

Outline
• Generative Adversarial Networks (GANs)

Generative Models
• A generative model is a type of machine learning model that aims to learn the underlying patterns or distributions of data in order to generate new, similar data.
• In essence, it's like teaching a computer to dream up its own
data based on what it has seen before.
• The significance of this model lies in its ability to create,
which has vast implications in various fields, from art to science.

Generative vs. Discriminative Models
• Generative models: These models focus on understanding how the data is generated.
• They aim to learn the distribution of the data itself.
• For instance, if we're looking at pictures of cats and dogs, a
generative model would try to understand what makes a cat look like a cat and a dog look like a dog. It would then be able to generate new images that resemble either cats or dogs.

Generative vs. Discriminative Models
• Discriminative models: These models, on the other hand, focus on distinguishing between different types of data.
• They don't necessarily learn or understand how the data is
generated; instead, they learn the boundaries that separate one class of data from another.
• Using the same example of cats and dogs, a discriminative
model would learn to tell the difference between the two, but it wouldn't necessarily be able to generate a new image of a cat or dog on its own.

Generative Adversarial Networks

GANs Architecture
• GANs consist of two neural networks.
• There is a Generator G(z) and a Discriminator D(x).
• The two play an adversarial game.
• The generator's aim is to fool the discriminator by producing data that are similar to those in the training set.
• The discriminator tries not to be fooled, by telling fake data apart from real data.
• Both are trained simultaneously, and together they learn to model complex data like audio, video, or image files.

What is a Generator?
• A Generator in GANs is a neural network that creates the fake data on which the discriminator is trained.
• It learns to generate plausible data.
• The generated examples/instances become negative training examples for the discriminator.
• It takes a fixed-length random vector carrying noise as input and generates a sample.

What is a Generator?
• The main aim of the Generator is to make the discriminator classify its output as real.
• The part of the GAN that trains the Generator includes:
1. a noisy input vector
2. the generator network, which transforms the random input into a data instance
3. the discriminator network, which classifies the generated data
4. the generator loss, which penalizes the Generator for failing to fool the discriminator
5. backpropagation, which adjusts each weight in the right direction by calculating the weight's impact on the output; the gradients it yields are used to update the generator weights

How the Generator Works
• Input: Noise Vector (Latent Space)
o The generator takes a random noise vector, typically of a fixed size (for example, a 100-dimensional vector).
o This noise is drawn from a simple distribution, such as a Gaussian (normal) or uniform distribution.
o This vector doesn't contain any meaningful information at first, but the generator will learn to map it to a complex image space.

How the Generator Works
• Fully Connected Layers (or Convolutional Layers)
o The noise vector is passed through a series of fully connected layers or, in some cases, convolutional layers (especially when generating images).
o The purpose of these layers is to progressively shape and expand the latent vector into a more structured representation, eventually leading to an image.

How the Generator Works
• Upsampling (Deconvolution)
o As the generator processes the noise vector, it starts with a very small representation and gradually upscales the tensor (the multi-dimensional data) into a larger, more complex one.
oThis step usually involves transposed convolutions (also
known as deconvolutions or upsampling layers), which help to increase the spatial dimensions of the feature maps.

How the Generator Works
• Activation Functions
• After each layer (or block of layers), non-linear activation functions like ReLU (Rectified Linear Unit) or Leaky ReLU are applied.
• These activations introduce non-linearity, enabling the
network to learn more complex patterns and representations.
• The final layer typically uses a tanh or sigmoid activation
function, depending on the desired output image range (tanh for pixel values in [-1, 1], sigmoid for values in [0, 1]).

How the Generator Works
• Output Image
• Eventually, after passing through all these layers, the generator produces an image-like tensor.
• This tensor has the same shape as the target image
dimensions (e.g., 64x64x3 for a 64x64 RGB image).
• The generator’s goal is to make this output indistinguishable
from real images when evaluated by the discriminator.

How the Generator Works
• Training through Adversarial Feedback
• The generator is trained to fool the discriminator into thinking that the generated image is real.
• The discriminator receives both real and generated images and attempts to classify them as real or fake.
• The generator's weights are updated through backpropagation, using feedback from the discriminator's classification errors.
• Over time, the generator learns to produce more realistic images that become harder for the discriminator to distinguish from real images.

What is a Discriminator?
• The Discriminator is a neural network that distinguishes real data from the fake data created by the Generator.
• The discriminator's training data comes from two different sources:
• The real data instances, such as real pictures of birds,
humans, currency notes, etc., are used by the Discriminator as positive samples during training.
• The fake data instances created by the Generator are used as negative examples during the training process.

How the Discriminator Works
• The discriminator in a Generative Adversarial Network (GAN) is typically a binary classifier designed to distinguish between real and fake data.
• The structure of the discriminator can vary depending on the
type of data (e.g., images, text, audio), but for the most common use case of GANs, generating images, discriminators are usually built using convolutional neural networks (CNNs) due to their effectiveness in handling image data.

How the Discriminator Works
• Input Layer:
o The input image is fed into the discriminator. It has a shape of H×W×C (height, width, and channels).
• Convolutional Layers:
o The image is passed through a series of convolutional layers that apply filters to extract high-level features.
o From one convolutional layer to the next, the number of filters (or kernels) typically increases, and the spatial dimensions (height and width) of the feature maps typically decrease.
o This helps the network learn hierarchical features (e.g., edges, textures, complex patterns).

How the Discriminator Works
• Activation Function
• After each convolutional layer, a Leaky ReLU activation function is typically used.
• Leaky ReLU is preferred over regular ReLU because it
allows small negative values to pass through, which helps avoid "dying ReLU" problems (where neurons stop learning because they always output zero).

How the Discriminator Works
• Output Layer
o The final layer is a single neuron (or unit) with a sigmoid activation function.
o The output of this neuron is a probability between 0 and 1:
o 1 indicates the input is real (from the true data distribution).
o 0 indicates the input is fake (generated by the generator).
o The output is a scalar value representing the probability that the input image is real.

[Figure: GANs Architecture]

GANs Applications
• Image-to-image translation: an application in which an image B is transformed so that it takes on the properties of an image A.

GANs Applications
• Generate Human Faces: GANs can be trained on images of humans to generate realistic faces.

GANs Applications
• Generate images based on input text.

Loss in GANs

Discriminator Loss
• While the discriminator is trained, it classifies both the real data and the fake data from the generator.
• It penalizes itself for misclassifying a real instance as fake, or a fake instance (created by the generator) as real, by maximizing the function below.

max_D E_x[log D(x)] + E_z[log(1 - D(G(z)))]
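As a rough numerical sketch, assuming the function in question is the standard GAN discriminator objective, E_x[log D(x)] + E_z[log(1 - D(G(z)))], the code below evaluates it on a few made-up discriminator outputs. The function names and the probability values are hypothetical, and the outputs are assumed to lie strictly between 0 and 1. The second function looks at the fake-data term from the generator's side.

```python
import math


def discriminator_objective(d_real, d_fake):
    """Mean log D(x) over real samples plus mean log(1 - D(G(z))) over fakes."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Mean log(1 - D(G(z))): low when the discriminator is fooled."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability that the input is real).
confident = discriminator_objective(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
confused = discriminator_objective(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
fooled = generator_loss([0.9, 0.8])   # discriminator calls the fakes real
caught = generator_loss([0.1, 0.2])   # discriminator spots the fakes

print(confident, confused)
print(fooled, caught)
```

The objective is larger for the confident discriminator than for the confused one, and the generator's loss is lower when its samples are scored as real.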
• log(D(x)) is the log of the probability that the discriminator assigns to a real image being real; maximizing it rewards correct classification of real images.
• Maximizing log(1-D(G(z))) helps it to correctly label the fake images that come from the generator.

Generator loss
• While the generator is trained, it samples random noise and produces an output from that noise. The output then goes through the discriminator and gets classified as either “Real” or “Fake” based on the ability of the discriminator to tell one from the other.
• The generator loss is then calculated from the discriminator’s classification: the generator is rewarded if it successfully fools the discriminator, and penalized otherwise.

GANs Loss Function
min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]

GANs Loss Function
• The original GAN loss function, also known as the Min-Max Loss, is a fundamental concept in GANs. It serves as the “scorecard” for both the Generator and Discriminator, guiding their training process through a competitive dynamic.
• The generator tries to minimize this function while the
discriminator tries to maximize it.

GANs Loss Function
• min_G: This indicates that we are minimizing the loss function with respect to the Generator (G).
• max_D: This signifies that we are maximizing the loss
function with respect to the Discriminator (D).
• V(D,G): It tells us about the overall loss function that
depends on both the generator and the discriminator.

GANs Loss Function
• E: This denotes the expectation operator, which calculates the average value of a function over a distribution. In simpler terms, it weighs all the possible data points by their probabilities to arrive at a representative value.
• E_x: This denotes the expectation over the real data distribution (represented by x).
• E_z: This signifies the expectation over the random noise distribution (represented by z).

GANs Loss Function
• log D(x): This term represents the Discriminator’s objective when presented with real data (x).
• Ideally, the Discriminator should output a value close to 1 (indicating “real”) for real data.
• log D(x) is then close to 0, its highest possible value (the log of a value close to 1), which corresponds to a low loss for the discriminator; this is why the discriminator wants to maximize the term.

GANs Loss Function
• log(1-D(G(z))): This term represents the Generator’s loss.
• It is the logarithm of (1 minus the Discriminator output) when the Discriminator is presented with generated data G(z).
• The Discriminator should ideally output a value close to 0 (indicating “Fake”) for generated data.
• In that case log(1-D(G(z))) is close to 0, its highest possible value (the logarithm of a value close to 1), which is a high loss for the generator; the generator wants to minimize the term by pushing D(G(z)) toward 1.

Types of GANs
• Vanilla GAN
• Conditional GAN (CGAN)
• Deep Convolutional GAN (DCGAN)
• CycleGAN
• Generative Adversarial Text to Image Synthesis
• StyleGAN
• Super Resolution GAN (SRGAN)

Strided vs. Fractionally-strided Convolution
• In a standard convolution operation, a filter (also called a
kernel) is applied to the input data, and the result is a weighted sum of the values in the input region covered by the filter. With a stride greater than 1 (a strided convolution), this operation reduces the spatial dimensions of the input.
• Fractionally-strided convolution is designed to perform the
opposite operation, i.e., increase the spatial dimensions.

Strided vs. Fractionally-strided Convolution
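As a rough illustration of the two operations, the sketch below computes output sizes with the usual size formulas for a convolution and a transposed convolution (no dilation or output padding). The function names and the kernel, stride, and padding values are assumptions chosen for illustration, not taken from the slides.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a standard (possibly strided) convolution."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose_out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a fractionally-strided (transposed) convolution."""
    return (n - 1) * stride - 2 * padding + kernel


# Assumed settings: 4x4 kernel, stride 2, padding 1.
down = conv_out_size(64, kernel=4, stride=2, padding=1)            # 64 -> 32
up = conv_transpose_out_size(down, kernel=4, stride=2, padding=1)  # 32 -> 64
print(down, up)
```

With these settings the strided convolution halves the size and the transposed convolution doubles it back.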
• The standard convolutional operation reduces the spatial
dimensions (width and height) of the input and intermediate feature maps, a process known as encoding or downsampling. It’s used in the discriminator to analyze real and fake images.
• The fractionally-strided convolution gradually restores the
spatial dimensions of the feature maps while reducing the number of channels. This process is known as upsampling or decoding. It’s used in the generator to transform random noise into a realistic image.

[Figures: Strided vs. Fractionally-strided Convolution; Strided Convolution (Downsampling)]

Fractionally-Strided Convolution (Upsampling)
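The upsampling step can be sketched as a scatter-and-add: each input element scales the whole kernel, and the scaled copy is added into the output at an offset set by the stride. The function name `conv_transpose2d` and the example values are assumptions for illustration; this is a plain-Python sketch, not a library implementation.

```python
def conv_transpose2d(x, kernel, stride=1):
    """Transposed convolution of 2-D lists (no padding), by scatter-and-add."""
    in_h, in_w = len(x), len(x[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            # Input element x[i][j] multiplies every kernel element; the
            # result is added at the position that matches (i, j).
            for a in range(k_h):
                for b in range(k_w):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


# Made-up 2x2 input and 2x2 kernel.
x = [[0, 1],
     [2, 3]]
k = [[0, 1],
     [2, 3]]
y = conv_transpose2d(x, k)
for row in y:
    print(row)
```

With stride 1, a 2x2 input and a 2x2 kernel give a 3x3 output; calling the same function with `stride=2` gives a 4x4 output.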
Transposed convolution of a 2x2 input with a 2x2 kernel to produce a 3x3 output. At each step, a single element (e.g., red) of the input matrix multiplies all 4 kernel elements, with the result placed in an intermediate 3x3 matrix at a location corresponding to the position of the red input element. The process is repeated for the three other input elements. The four intermediate matrices are added together to create the final 3x3 output.

[Figures: Fractionally-Strided Convolution (Upsampling)]

Deep Convolutional GAN (DCGAN)
• In recent years, supervised learning with convolutional
networks (CNNs) has seen huge adoption in computer vision applications.
• Comparatively, unsupervised learning with CNNs has
received less attention.
• Deep convolutional generative adversarial networks (DCGANs) are a class of CNNs that have certain architectural constraints and have been shown to be a strong candidate for unsupervised learning.

DCGAN Architecture
• DCGAN uses convolutional and convolutional-transpose
layers in the discriminator and generator, respectively.
• The generator consists of convolutional-transpose layers,
batch normalization layers, and ReLU activations. The output will be a 3x64x64 RGB image.
• The discriminator consists of strided convolution layers,
batch normalization layers, and LeakyReLU as the activation function. It takes a 3x64x64 input image.

[Figure: DCGAN Architecture]

DCGAN Architecture: Generator
• The DCGAN generator is designed to synthesize realistic
images from random noise.
• It typically comprises several layers of transposed
convolutions, also known as deconvolutions or upsampling layers.
• The input is a random noise vector, often drawn
from a standard normal distribution, which is projected through a fully connected layer to a higher-dimensional space.

DCGAN Architecture: Generator
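The input stage of the generator can be sketched as follows, in plain Python and with untrained random weights. The sizes (a 100-dimensional noise vector projected to 8 channels of 4x4) and the names `sample_noise` and `project_and_reshape` are assumptions for illustration; the slides do not fix them.

```python
import random


def sample_noise(dim, rng):
    """Draw a latent vector z from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def project_and_reshape(z, weights, channels, height, width):
    """Fully connected projection of z, reshaped to channels x height x width."""
    flat = [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in weights]
    return [[[flat[(c * height + r) * width + col] for col in range(width)]
             for r in range(height)]
            for c in range(channels)]


rng = random.Random(0)
latent_dim, channels, height, width = 100, 8, 4, 4   # assumed sizes
z = sample_noise(latent_dim, rng)
# Untrained weights: one row of latent_dim values per output unit.
weights = [[rng.gauss(0.0, 0.02) for _ in range(latent_dim)]
           for _ in range(channels * height * width)]
feature_map = project_and_reshape(z, weights, channels, height, width)
print(len(feature_map), len(feature_map[0]), len(feature_map[0][0]))
```

The resulting small feature map is the kind of tensor that transposed convolutions would then upsample toward the final image size.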
• The subsequent layers use transposed convolutions to
progressively upsample the features, gradually transforming the initial noise into a complex image representation.
• Batch normalization, introduced in each layer, helps stabilize
the learning process by normalizing the input to each layer and reducing internal covariate shift.

DCGAN Architecture: Discriminator
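Batch normalization, which the DCGAN layers rely on, can be sketched for a single feature as below: the values in a batch are shifted to zero mean and scaled to unit variance, then rescaled by a learnable gain and offset. The names `batch_norm`, `gamma`, `beta`, and `eps` are illustrative assumptions, and the sketch covers only the training-time computation.

```python
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then apply gain and offset."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]


batch = [2.0, 4.0, 6.0, 8.0]      # made-up activations of one feature
normalized = batch_norm(batch)
print(normalized)
```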
• In contrast, the discriminator network within a DCGAN
functions as a binary classifier, distinguishing between real and generated images.
• It comprises a series of convolutional layers, each followed
by batch normalization and activation functions like LeakyReLU to introduce non-linearity.
• Strided convolutions are employed to reduce the spatial
dimensions of the input, gradually extracting high-level features to discern real images from synthetic ones.

DCGAN Architecture: Discriminator
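A single discriminator-style step can be sketched in plain Python: a strided convolution halves the spatial size, and Leaky ReLU is applied to the result. The function names, the 0.2 negative slope, and the example values are assumptions for illustration.

```python
def strided_conv2d(x, kernel, stride=2):
    """2-D convolution (as cross-correlation) with a stride and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(x) - k_h) // stride + 1
    out_w = (len(x[0]) - k_w) // stride + 1
    return [[sum(x[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(k_h) for b in range(k_w))
             for j in range(out_w)]
            for i in range(out_h)]


def leaky_relu(v, negative_slope=0.2):
    """Pass positive values unchanged; scale negative values by the slope."""
    return v if v > 0 else negative_slope * v


# Made-up 4x4 single-channel input and 2x2 kernel.
image = [[4, 1, 0, 2],
         [0, 1, 3, 0],
         [2, 0, 1, 1],
         [1, 4, 0, 2]]
kernel = [[1, -1],
          [-1, 1]]
features = strided_conv2d(image, kernel, stride=2)   # 4x4 -> 2x2
activated = [[leaky_relu(v) for v in row] for row in features]
print(features)
print(activated)
```

The negative response is shrunk rather than zeroed, which is the property that distinguishes Leaky ReLU from plain ReLU.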
• The discriminator’s architecture does not include fully
connected layers, aligning with the principles of convolutional networks to capture spatial hierarchies effectively.
• The discriminator ends with a sigmoid activation function,
producing a probability score indicating the authenticity of the input image.

Applications of DCGAN
• Image Generation: DCGANs excel in generating high-
resolution, photorealistic images across diverse domains, from human faces to landscapes and artworks.
• Data Augmentation: They find utility in augmenting
training datasets, especially in scenarios with limited data, enhancing the generalization capability of machine learning models.
• Image-to-Image Translation: DCGANs can aid in tasks like
converting sketches to realistic images or altering images in ways that preserve their content while changing aspects like colors or textures.

[Figures: DCGAN generated images]

Summary
• Generative Adversarial Networks