Unit III

Unit III of the document covers generative models, focusing on Autoencoders, Variational Autoencoders (VAE), and Generative Adversarial Networks (GAN). It explains the architecture, training processes, advantages, and disadvantages of each model, emphasizing their applications in data generation and representation learning. Practical implementation using frameworks like TensorFlow and PyTorch is also included, along with hands-on exercises.

Uploaded by

itissandyprof

AD2602-Generative AI UNIT III

UNIT III
GENERATIVE MODEL
Syllabus : Introduction to Autoencoders – Variational Autoencoders (VAE) – Generative
Adversarial Networks (GAN) – Flow-based models – Practical implementation and hands-on
exercises (using TensorFlow, PyTorch, Jupyter Notebook, Keras, etc).
Introduction to Autoencoders
Autoencoders are a type of neural network designed for unsupervised learning tasks,
particularly in the context of dimensionality reduction, feature extraction, and data
reconstruction. They consist of two main components: an encoder and a decoder. The encoder
compresses the input data into a lower-dimensional latent space representation, while the
decoder reconstructs the input data from this representation. The training objective is to
minimize the reconstruction loss, often measured as the mean squared error or binary cross-
entropy between the original and reconstructed data. Autoencoders find applications in tasks
like noise reduction, anomaly detection, and pretraining for other machine learning models.
Variants such as sparse autoencoders, denoising autoencoders, and convolutional autoencoders
expand their utility by introducing regularization, robustness to noise, and spatial information,
respectively.
Autoencoders are a specialized class of artificial neural networks that learn efficient
representations of input data without labels, which makes them a form of unsupervised
learning. Learning to compress and effectively represent input data without specific labels is
the essential principle of an autoencoder. This is accomplished using a two-part structure that
consists of an encoder and a decoder. The encoder transforms the input data into a
reduced-dimensional representation, which is often referred to as the “latent space” or
“encoding”. From that representation, the decoder rebuilds the original input. This process of
encoding and decoding pushes the network to identify the essential features of the data and so
to learn meaningful patterns.
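The encoder/decoder idea can be illustrated with a very small sketch. The example below is not a reference implementation: it assumes a purely linear encoder and decoder, hand-written gradient descent, and a made-up toy dataset (4-D points lying on a 2-D subspace). The names `W_enc`, `W_dec`, `encode` and `decode` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 200 points in 4-D that lie on a 2-D subspace.
basis = rng.normal(size=(2, 4))
X = rng.normal(size=(200, 2)) @ basis

# Encoder maps 4 -> 2 (the latent "encoding"); decoder maps 2 -> 4.
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))

def encode(x):
    return x @ W_enc

def decode(z):
    return z @ W_dec

def mse(a, b):
    return float(np.mean((a - b) ** 2))

initial_loss = mse(X, decode(encode(X)))

lr = 0.01
for step in range(3000):
    Z = encode(X)                      # compress
    X_hat = decode(Z)                  # reconstruct
    err = (X_hat - X) / X.shape[0]     # gradient of the (halved) squared error w.r.t. X_hat
    grad_dec = Z.T @ err               # backpropagate into the decoder weights
    grad_enc = X.T @ (err @ W_dec.T)   # backpropagate into the encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = mse(X, decode(encode(X)))
print(f"reconstruction loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the data here has only two degrees of freedom, a 2-unit bottleneck is enough to reconstruct it well. With real data, the bottleneck size is a design choice that trades compression against reconstruction quality.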
Architecture of Autoencoder
The general architecture of an autoencoder includes an encoder, decoder, and bottleneck layer.

1) Encoder
The input layer takes the raw input data.
The hidden layers progressively reduce the dimensionality of the input, capturing important
features and patterns. These layers compose the encoder.
The bottleneck layer (latent space) is the final hidden layer, where the dimensionality is
significantly reduced. This layer represents the compressed encoding of the input data.
2) Decoder
The decoder takes the encoded representation from the bottleneck layer and expands it back to
the dimensionality of the original input.
The hidden layers progressively increase the dimensionality and aim to reconstruct the original
input.
The output layer produces the reconstructed output, which ideally should be as close as possible
to the input data.
3. The loss function used during training is typically a reconstruction loss, measuring the
difference between the input and the reconstructed output. Common choices include mean
squared error (MSE) for continuous data or binary cross-entropy for binary data.
4. During training, the autoencoder learns to minimize the reconstruction loss, forcing the
network to capture the most important features of the input data in the bottleneck layer.
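The two reconstruction losses named above can be written out as a short sketch. The function names `mse_loss` and `bce_loss` and the sample vectors are illustrative assumptions; frameworks such as Keras or PyTorch provide their own equivalents.

```python
import numpy as np

def mse_loss(x, x_hat):
    """Mean squared error, suited to continuous-valued inputs."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return float(np.mean((x - x_hat) ** 2))

def bce_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy; x holds 0/1 targets, x_hat holds predicted probabilities."""
    x = np.asarray(x, dtype=float)
    # Clip so that log(0) never occurs.
    x_hat = np.clip(np.asarray(x_hat, dtype=float), eps, 1 - eps)
    return float(-np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat)))

x = [1.0, 0.0, 1.0, 1.0]        # original input
good = [0.9, 0.1, 0.8, 0.9]     # a close reconstruction
bad = [0.4, 0.6, 0.5, 0.3]      # a poor reconstruction

print(mse_loss(x, good), mse_loss(x, bad))
print(bce_loss(x, good), bce_loss(x, bad))
```

Under both losses, the closer reconstruction scores lower than the poor one, which is the property training relies on.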
After training, typically only the encoder part of the autoencoder is retained and used to encode
data similar to the training data. To keep the network from simply copying its input, it can be
constrained in the following ways:

• Keep small Hidden Layers: If the size of each hidden layer is kept as small as possible, then
the network will be forced to pick up only the representative features of the data thus
encoding the data.
• Regularization: In this method, a loss term is added to the cost function which encourages
the network to train in ways other than copying the input.
• Denoising: Another way of constraining the network is to add noise to the input and teach
the network how to remove the noise from the data.
• Tuning the Activation Functions: This method involves changing the activation functions
of various nodes so that a majority of the nodes are dormant, thus effectively reducing the
size of the hidden layers.
Types of Autoencoders
There are several types of autoencoders. The following subsections describe the main variants
and the advantages and disadvantages of each:
Denoising Autoencoder
A denoising autoencoder works on a partially corrupted input and is trained to recover the
original undistorted input. As mentioned above, this is an effective way to keep the network
from simply copying the input, so that it learns the underlying structure and important
features of the data.
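A sketch of how the training pairs for a denoising autoencoder might be built is shown below. The corruption scheme (Gaussian noise plus randomly zeroed entries) and the parameters `noise_std` and `drop_prob` are assumptions chosen for illustration, since the right noise type depends on the data.

```python
import numpy as np

def corrupt(x, noise_std=0.2, drop_prob=0.1, rng=None):
    """Return a noisy copy of x: additive Gaussian noise plus randomly zeroed entries."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = x + rng.normal(scale=noise_std, size=x.shape)
    keep_mask = rng.random(x.shape) >= drop_prob
    return noisy * keep_mask

def denoising_pairs(X, rng=None):
    """Build (input, target) pairs: the corrupted data is the input, the clean data the target."""
    return corrupt(X, rng=rng), X

rng = np.random.default_rng(0)
X_clean = rng.random((5, 8))
X_noisy, target = denoising_pairs(X_clean, rng=rng)
```

The key point is that the loss compares the reconstruction with `target` (the clean data), not with the corrupted input the network actually sees.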
Advantages
This type of autoencoder can extract important features and reduce noise or useless
features.
Denoising autoencoders can be used as a form of data augmentation: the restored images can
be used as augmented data, generating additional training samples.
Disadvantages
Selecting the right type and level of noise to introduce can be challenging and may require
domain knowledge.
The denoising process can result in the loss of some information that is needed from the
original input. This loss can reduce the accuracy of the output.
Sparse Autoencoder
This type of autoencoder typically contains more hidden units than the input but only a few are
allowed to be active at once. This property is called the sparsity of the network. The sparsity of
the network can be controlled by either manually zeroing the required hidden units, tuning the
activation functions or by adding a loss term to the cost function.
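The "loss term added to the cost function" can take several forms. The sketch below shows two common choices, an L1 penalty and a KL-divergence penalty on the mean activation of each hidden unit. The function names, the target rate of 0.05 and the weights are illustrative assumptions, and the KL version assumes activations in (0, 1), such as sigmoid outputs.

```python
import numpy as np

def l1_sparsity(hidden, weight=1e-3):
    """L1 penalty: pushes most hidden activations toward zero."""
    return weight * float(np.mean(np.abs(hidden)))

def kl_sparsity(hidden, target=0.05, weight=1.0, eps=1e-7):
    """KL penalty between a target activation rate and each unit's mean activation.

    Assumes activations lie in (0, 1), e.g. sigmoid outputs.
    """
    rho_hat = np.clip(np.mean(hidden, axis=0), eps, 1 - eps)
    kl = (target * np.log(target / rho_hat)
          + (1 - target) * np.log((1 - target) / (1 - rho_hat)))
    return weight * float(np.sum(kl))

# Hidden activations for a batch of 10 inputs and 6 hidden units (made-up values).
dense = np.full((10, 6), 0.5)     # every unit is half-active on every input
sparse = np.full((10, 6), 0.05)   # units are active at the target rate

print(kl_sparsity(dense), kl_sparsity(sparse))
```

Either penalty is added to the reconstruction loss, so the network is rewarded for reconstructing well while keeping few units active.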
Advantages
The sparsity constraint in sparse autoencoders helps in filtering out noise and irrelevant features
during the encoding process.
These autoencoders often learn important and meaningful features due to their emphasis on
sparse activations.

Disadvantages
The choice of hyperparameters plays a significant role in the performance of this autoencoder,
because different inputs should result in the activation of different nodes of the network.
Applying the sparsity constraint increases computational complexity.
Variational Autoencoder
A variational autoencoder makes strong assumptions about the distribution of the latent
variables and uses the Stochastic Gradient Variational Bayes estimator in the training process.
It assumes that the data is generated by a directed graphical model and tries to learn an
approximation to the posterior distribution of the latent variables given the data.
Advantages
Variational autoencoders can generate new data points that resemble the original
training data. These samples are drawn from the learned latent space.
A variational autoencoder is a probabilistic framework that learns a compressed
representation of the data capturing its underlying structure and variations, so it is useful for
detecting anomalies and for data exploration.
Disadvantages
Variational autoencoders use approximations to estimate the true distribution of the latent
variables. This approximation introduces some level of error, which can affect the quality of
generated samples.
The generated samples may cover only a limited subset of the true data distribution. This can
result in a lack of diversity in generated samples.
Convolutional Autoencoder
Convolutional autoencoders are a type of autoencoder that use convolutional neural networks
(CNNs) as their building blocks. The encoder consists of multiple layers that take an image or
a grid as input and pass it through successive convolution layers, forming a compressed
representation of the input. The decoder mirrors the encoder: it deconvolves the
compressed representation and tries to reconstruct the original image.
Advantages
A convolutional autoencoder can compress high-dimensional image data into a
lower-dimensional representation. This makes image data more efficient to store and transmit.
A convolutional autoencoder can reconstruct missing parts of an image. It can also handle
images with slight variations in object position or orientation.
Disadvantages
These autoencoders are prone to overfitting. Proper regularization techniques should be used to
tackle this issue.
Compression of the data can cause information loss, which can result in a reconstructed image
of lower quality.

Variational Autoencoders (VAE)


Variational Autoencoders (VAEs) are a probabilistic extension of autoencoders that aim to
model the underlying data distribution. Unlike traditional autoencoders, which encode each input
as a fixed point in latent space, VAEs encode each input as a probability distribution defined
by a mean and variance. This probabilistic nature allows VAEs to generate new data samples
by sampling from the latent space. The training process involves optimizing a loss function
comprising two terms: a reconstruction loss that ensures the generated output resembles the
input and a Kullback-Leibler (KL) divergence term that regularizes the latent space by
encouraging it to approximate a prior distribution (typically Gaussian). VAEs are widely used
for generative tasks, including image synthesis, text generation, and data augmentation, and
they provide interpretable latent representations.
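The two loss terms and the sampling step can be sketched as follows. This is a minimal illustration under stated assumptions: a diagonal Gaussian encoder output parameterised by `mu` and `log_var`, a standard normal prior, and squared error as the reconstruction term. The function names are hypothetical, and the encoder and decoder networks themselves are omitted.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so z stays differentiable in mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims and averaged over the batch."""
    per_example = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(per_example))

def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction term (squared error, an assumption) plus the KL regularizer."""
    recon = float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))
    return recon + kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
mu = np.zeros((3, 2))        # batch of 3 inputs, 2 latent dimensions
log_var = np.zeros((3, 2))   # log of the variance; 0 means unit variance
z = reparameterize(mu, log_var, rng)
```

When the encoder outputs exactly the prior (zero mean, unit variance), the KL term is zero. It grows as the encoded distribution moves away from the prior.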

GENERATIVE ADVERSARIAL NETWORKS (GAN)


Generative Adversarial Networks (GANs) are a class of generative models that employ a game-
theoretic approach to learn the data distribution. A GAN consists of two neural networks: a
generator and a discriminator. The generator creates synthetic data samples, while the
discriminator evaluates the authenticity of these samples by distinguishing between real and
generated data. The two networks are trained simultaneously in an adversarial setup, with the
generator aiming to fool the discriminator and the discriminator striving to correctly classify
the samples. This adversarial process is governed by a minimax loss function. GANs have been
immensely successful in applications such as image synthesis (e.g., StyleGAN), video
generation, and domain adaptation. However, training GANs is challenging due to issues like
mode collapse, instability, and the need for careful hyperparameter tuning.
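For reference, the minimax objective mentioned above is usually written as follows, where $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for noise $z$, and $p_{\text{data}}$ and $p_z$ denote the data and noise distributions (notation assumed here, as the text does not define symbols):

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to make both terms large, while the generator can only influence the second term and tries to make it small.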
A GAN is a framework for unsupervised learning whose two networks, the Generator and the
Discriminator, are typically built from architectures such as convolutional neural networks
(CNNs). The generator focuses on creating new data and the discriminator evaluates it;
adversarial training between the two produces artificial data that resembles the actual data.
The name GAN can be broken down into three components −
Generative − This component focuses on learning how to generate new data by understanding
the underlying patterns in the dataset.
Adversarial − In simple terms, "adversarial" means setting two things in opposition. In GANs,
the generated data is compared to real data from the dataset. This is done using a model trained
to distinguish between real and fake data. This model is known as the discriminator.
Networks − To enable the learning process, GANs use deep neural networks.
The Generator Model

The goal of the generator model is to generate new data samples that are intended to resemble
real data from the dataset.
• It takes random noise as input and transforms it into synthetic data samples.
• Its other goal is to produce data that the discriminator cannot tell apart from real data.
• The generator is implemented as a neural network model. Depending on the type of
data being generated, it uses fully connected (Dense) layers or convolutional layers.
The Discriminator Model
The goal of the discriminator model is to evaluate the input data and distinguish between
real data samples from the dataset and fake data samples generated by the generator model.
• It takes input data and predicts whether it is real or fake.
• In doing so, it aims to correctly classify the source of the input data as the dataset or
the generator.
• Like the generator model, the discriminator model is also implemented as a neural
network model. It also uses Dense or convolutional layers.
During the training of a GAN, both the generator and the discriminator are trained
simultaneously but with opposing objectives, i.e., in competition with each other.

GANs have two main components: a generator network and a discriminator network. Given
below are the steps involved in the working of a GAN −
1) Initialization
The GAN consists of two neural networks: the Generator (say G) and the Discriminator (say
D).
The goal of the generator is to generate new data samples like images or text that closely
resemble the real data from the dataset.
The discriminator, playing the role of a critic, has the goal to distinguish between the real data
and the data generated by the generator.
2) Training Loop
The training loop involves alternating between training the generator and the discriminator.
3) Training the Discriminator

While training the discriminator, for each iteration −


• First, select a batch of real data samples from the dataset.
• Next, generate a batch of fake data samples using the current generator.
• Once generated, train the discriminator on both the real and fake data samples.
• Finally, the discriminator learns to distinguish between real and fake data by adjusting
its weights to minimize its classification error.
4) Training the Generator
While training the generator, for each iteration −
• First, generate a batch of fake data samples using the generator.
• Next, train the generator to produce fake data that the discriminator classifies as real.
To do this, we pass the fake data through the discriminator and update the
generator's weights based on the discriminator's output.
• Finally, the generator will learn to produce more realistic fake data by adjusting its
weights to maximize the discriminator's error when classifying its generated samples.
5) Adversarial Training
As the training progresses, both the generator and discriminator improve their performance in
an adversarial manner, i.e., in opposition.

The generator gets better at creating fake data that resembles real data, while the discriminator
gets better at distinguishing between real and fake data.
Through this adversarial relationship, both networks keep improving, ideally until the
discriminator can no longer tell the generated data from the real data.
6) Evaluation
Once the training is over, the generator can be used to generate new data samples that resemble
the real data from the dataset. We can evaluate the quality of the generated data either by
inspecting samples visually or by using quantitative measures such as similarity scores or
classifier accuracy.
7) Fine-Tuning and Optimization
Depending on the application, you can fine-tune the trained GAN model to improve its
performance or adapt it to specific tasks or datasets.
Generative Adversarial Networks (GANs) are among the most prominent and widely used
generative models. This section explained the basics of a GAN and how it uses
neural networks to produce artificial data that resembles actual data. The steps involved
in the working of a GAN are: Initialization, Training Loop, Training the Discriminator,
Training the Generator, Adversarial Training, Evaluation, and Fine-Tuning and Optimization.
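The alternating training steps described above can be sketched on a deliberately tiny problem. Everything here is an illustrative assumption rather than a practical GAN: the "dataset" is samples from a 1-D Gaussian, the generator is the affine map G(z) = a*z + b, the discriminator is the logistic model D(x) = sigmoid(w*x + c), gradients are written by hand, and the generator uses the common non-saturating loss -log D(G(z)).

```python
import numpy as np

rng = np.random.default_rng(0)
REAL_MEAN, REAL_STD = 3.0, 0.5   # hypothetical "dataset": samples from N(3, 0.5^2)
BATCH, STEPS, LR = 64, 5000, 0.05

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# 1) Initialization: generator G(z) = a*z + b, discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0
w, c = 0.0, 0.0

# 2) Training loop: alternate discriminator and generator updates.
for step in range(STEPS):
    # 3) Train the discriminator on one real batch and one fake batch.
    x_real = rng.normal(REAL_MEAN, REAL_STD, size=BATCH)
    z = rng.standard_normal(BATCH)
    x_fake = a * z + b
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    # Gradients of -mean(log D(x_real)) - mean(log(1 - D(x_fake))) w.r.t. w and c.
    grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
    grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
    w -= LR * grad_w
    c -= LR * grad_c

    # 4) Train the generator through the discriminator (whose weights stay fixed here).
    z = rng.standard_normal(BATCH)
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    # Gradients of the non-saturating loss -mean(log D(G(z))) w.r.t. a and b.
    grad_a = np.mean((d_fake - 1.0) * w * z)
    grad_b = np.mean((d_fake - 1.0) * w)
    a -= LR * grad_a
    b -= LR * grad_b

print(f"generator offset b = {b:.2f} (real mean = {REAL_MEAN})")
```

Because this discriminator is linear in x, it can only detect a shift in the mean. The sketch should therefore only be expected to pull the generator's offset `b` toward the data mean, not to match the spread of the data. A real GAN replaces both models with deep networks and lets a framework compute the gradients.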
The Role of Generator in GAN Architecture

The first primary part of the GAN architecture is the Generator. Let’s see its function and structure.

• Generator: Function and Structure
The primary goal of the generator is to generate new data samples that are intended to resemble
real data from the dataset. It begins with a random noise vector and transforms it through fully
connected (Dense) and convolutional layers to generate a synthetic data sample.
• Generator: Layers and Components
Below are the layers and components of the generator neural network −
Input Layer − The generator receives a low-dimensional random noise vector (or other input
data) as input.
Fully Connected Layers − Fully connected (FC) layers are used to increase the dimensionality
of the input noise vector.
Transposed Convolutional Layers − These layers are also known as deconvolutional layers.
They are used for upsampling, i.e., to generate an output feature map with greater spatial
dimensions than the input feature map.
Activation Functions − Two commonly used activation functions are Leaky ReLU and Tanh.
The Leaky ReLU activation function helps mitigate the dying ReLU problem, while the
Tanh activation function keeps the output within the range -1 to 1.
Output Layer − It produces the final data output, such as an image of a certain resolution.
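To make the upsampling and activation components concrete, the sketch below computes the spatial output size of a transposed convolution and evaluates the two activation functions. The size formula is the standard one for a dilation of 1. The layer settings (kernel 4, stride 2, padding 1) and the Leaky ReLU slope of 0.2 are common choices assumed here for illustration, not values given in the text.

```python
import math

def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (dilation assumed to be 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def leaky_relu(x, slope=0.2):
    """Leaky ReLU: a small non-zero slope for negative inputs keeps gradients alive."""
    return x if x > 0 else slope * x

# Start from a 4x4 feature map and apply four upsampling layers.
sizes = [4]
for _ in range(4):
    sizes.append(conv_transpose_out(sizes[-1], kernel=4, stride=2, padding=1))

print(sizes)
print(leaky_relu(-1.0), math.tanh(5.0))
```

With these settings each transposed convolution doubles the side length of the feature map, which is how a small noise-derived tensor grows into a full-resolution image.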
• Generator: Objective Function
The goal of the generator neural network is to create data that the discriminator cannot
distinguish from real data. This is achieved by minimizing the generator’s loss function.
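The formula itself does not appear in the text at this point. A commonly used form, assumed here rather than taken from the source, is the non-saturating generator loss, where $z$ is the noise vector drawn from $p_z$, $G$ is the generator and $D$ is the discriminator:

```latex
L_{G} = -\,\mathbb{E}_{z \sim p_{z}(z)}\big[\log D(G(z))\big]
```

The original minimax formulation instead has the generator minimize $\mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]$. Both push $D(G(z))$ toward 1, but the non-saturating form tends to give stronger gradients early in training.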

The Role of Discriminator in GAN Architecture


The second part of the GAN architecture is the Discriminator. Let’s see its function and structure.

• Discriminator: Function and Structure
The primary goal of the discriminator is to classify the input data as real or generated by the
generator. It takes a data sample as input and gives a probability as output that indicates whether
the sample is real or fake.
• Discriminator: Layers and Components
Below are the layers and components of the discriminator neural network −
Input Layer − The discriminator receives a data sample from either the real dataset or the
generator as input.

Convolutional Layers − These are used for downsampling the input data to extract relevant features.
Fully Connected Layers − Fully connected (FC) layers process the extracted features and make
the final classification.
Activation Functions − The discriminator uses the Leaky ReLU activation function, which
introduces non-linearity and helps address the vanishing gradient problem.
Output Layer − It gives a single probability value between 0 and 1 as output
that indicates whether the sample is real or fake.
• Discriminator: Objective Function
The goal of the discriminator neural network is to maximize its ability to correctly distinguish
real data from generated data. This is achieved by minimizing the discriminator’s loss function.
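The formula is not reproduced in the text here. A standard choice, assumed for this sketch, is binary cross-entropy with real samples labelled 1 and generated samples labelled 0. The function name `discriminator_loss` and the example probabilities are illustrative.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Binary cross-entropy with real samples labelled 1 and generated samples labelled 0.

    d_real: discriminator outputs D(x) on real samples.
    d_fake: discriminator outputs D(G(z)) on generated samples.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1 - eps)
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake)))

confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])   # mostly correct predictions
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])      # cannot tell real from fake
print(confident, confused)
```

A discriminator that outputs 0.5 everywhere incurs a loss of 2 ln 2, and the loss falls as its predictions on real and generated samples separate.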

Types of Generative Adversarial Networks


We can have different types of GAN models based on the way the generator and the
discriminator networks interact with each other. Here are some notable variations −
Vanilla GAN
Vanilla GAN represents the simplest form of generative adversarial networks (GANs). It
provides a fundamental understanding of how GANs work. The term "Vanilla" implies that this
is the simplest form without any advanced modifications or enhancements.
Deep Convolutional GANs (DCGANs)
DCGANs are among the most popular implementations of GANs. They use ConvNets
in place of multi-layer perceptrons and follow a set of architectural guidelines that have
significantly stabilized GAN training, particularly for image generation tasks.
Some of the key features of DCGANs include the use of:
• Strided Convolutions
• Batch Normalization
• The removal of fully connected hidden layers
Conditional GANs (cGANs)
A Conditional GAN (cGAN) feeds additional conditioning information, such as class labels,
attributes, or even other data samples, into both the generator and the discriminator. With the
help of this conditioning information, Conditional GANs give us control over the
characteristics of the generated output.
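One simple way to supply the condition, assumed here for illustration, is to concatenate a one-hot class label to the generator's noise vector (the discriminator would receive the same label alongside its input). The helper names and the sizes (100-D noise, 10 classes) are hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot rows."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def conditional_generator_input(z, labels, num_classes):
    """Append the one-hot label to each noise vector."""
    return np.concatenate([z, one_hot(labels, num_classes)], axis=1)

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 100))    # batch of 3 noise vectors
labels = np.array([0, 7, 2])         # the class each sample should belong to
g_in = conditional_generator_input(z, labels, num_classes=10)
print(g_in.shape)
```

At generation time, choosing the label chooses the class of the output, which is the control that an unconditional GAN lacks.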
CycleGANs

CycleGANs are designed for unpaired image-to-image translation tasks, where no paired
examples of input and output images are available. A cycle consistency loss is used to ensure
that translating from one domain to another and back again reproduces the original image.
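The cycle consistency idea can be sketched with two toy "translators" standing in for the generator networks. The L1 form of the loss is the usual choice. The functions `G`, `F_exact` and `F_poor` below are made-up stand-ins, not real image models.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: x -> G -> F should return x, and y -> F -> G should return y."""
    forward = np.mean(np.abs(F(G(x)) - x))
    backward = np.mean(np.abs(G(F(y)) - y))
    return float(forward + backward)

# Toy translators (hypothetical): G maps domain X to domain Y, F maps Y back to X.
G = lambda x: 2.0 * x + 1.0
F_exact = lambda y: (y - 1.0) / 2.0   # exact inverse of G
F_poor = lambda y: 0.4 * y            # not an inverse of G

x = np.linspace(0.0, 1.0, 5)
y = np.linspace(1.0, 3.0, 5)

print(cycle_consistency_loss(x, y, G, F_exact))
print(cycle_consistency_loss(x, y, G, F_poor))
```

The loss is zero (up to rounding) only when the two mappings undo each other, which is what constrains the translation in the absence of paired examples.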
Progressive GANs (ProGANs)
ProGANs generate high-resolution images by progressively increasing the resolution of both
the generator and discriminator during training. With this approach, you can create more
detailed and higher-quality images.
StyleGANs
StyleGANs, developed by NVIDIA, are specifically designed for generating photo-realistic,
high-quality images. They introduced innovative techniques for improved image
synthesis and offer better control over specific attributes of the generated image.
Laplacian Pyramid GAN (LAPGAN)
Laplacian Pyramid GAN (LAPGAN) is a type of generative adversarial network that uses a
multi-resolution approach to generate high-quality images. It uses a Laplacian pyramid
framework in which images are generated at multiple scales. LAPGANs are particularly
effective at creating detailed and realistic images compared to standard GANs.

Flow-based Models
Flow-based models are a family of generative models that learn the exact data distribution by
transforming a simple base distribution (e.g., Gaussian) into a complex data distribution using
a series of invertible mappings. These mappings are parameterized by neural networks and
ensure that both the forward and inverse transformations are computationally efficient. The key
advantage of flow-based models is that they provide exact log-likelihood computation, making
them both generative and probabilistic in nature. Popular flow-based models include RealNVP
(Real-valued Non-Volume Preserving transformations) and Glow; the family as a whole is also
known as normalizing flows. They are
particularly effective for tasks requiring precise likelihood evaluation, such as density
estimation and anomaly detection. Their ability to generate high-quality samples, coupled with
interpretability, makes them a valuable addition to the toolkit of generative modeling.
Flow-based generative models are a class of probabilistic generative models designed to learn
a bijective mapping between a simple base distribution (e.g., Gaussian) and a complex target
distribution (e.g., natural images). Unlike Generative Adversarial Networks (GANs), which
define the density only implicitly, or Variational Autoencoders (VAEs), which optimize a lower
bound on it, these models directly model the data distribution with an explicit, tractable density.
Architecture:
These models use a sequence of invertible transformations (flows) to map a simple base
distribution to a target distribution. Invertibility ensures efficient sampling and likelihood
computation.
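The likelihood computation rests on the change-of-variables formula. Writing $f$ for the invertible map from data $x$ to latent $z = f(x)$, and $p_Z$ for the base density (symbols assumed here, as the text does not name them), the exact log-likelihood is:

```latex
\log p_{X}(x) = \log p_{Z}\big(f(x)\big)
  + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```

For a sequence of flows $f = f_K \circ \cdots \circ f_1$ with intermediate values $h_0 = x$ and $h_k = f_k(h_{k-1})$, the log-determinants simply add:

```latex
\log p_{X}(x) = \log p_{Z}(h_K)
  + \sum_{k=1}^{K} \log \left| \det \frac{\partial h_{k}}{\partial h_{k-1}} \right|
```

Sampling runs the inverse direction: draw $z \sim p_Z$ and compute $x = f^{-1}(z)$.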
Popular Architectures:
• Real NVP (Real Non-Volume Preserving): A foundational flow-based model.

• Glow (Generative Flow with Invertible 1x1 Convolutions): Extends Real NVP with
additional flexibility.
• Neural Spline Flows: Introduce spline-based transformations for improved
expressiveness.
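The building block shared by Real NVP and Glow is the affine coupling layer. The sketch below is a minimal illustration under stated assumptions: the scale and shift "networks" are single linear maps (`W_s`, `W_t`, hypothetical names) rather than deep networks, and a tanh bounds the log-scale for numerical stability.

```python
import numpy as np

class AffineCoupling:
    """Affine coupling layer: the first half of x passes through unchanged and
    parameterises a scale and shift applied to the second half."""

    def __init__(self, dim, rng):
        self.half = dim // 2
        rest = dim - self.half
        # Hypothetical single-linear-layer "networks" for the log-scale s and shift t.
        self.W_s = rng.normal(scale=0.1, size=(self.half, rest))
        self.W_t = rng.normal(scale=0.1, size=(self.half, rest))

    def _scale_shift(self, x1):
        return np.tanh(x1 @ self.W_s), x1 @ self.W_t

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self._scale_shift(x1)
        y2 = x2 * np.exp(s) + t
        # The Jacobian is triangular, so its log-determinant is just the sum of s.
        log_det = np.sum(s, axis=1)
        return np.concatenate([x1, y2], axis=1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self._scale_shift(y1)   # y1 equals x1, so s and t can be recomputed
        x2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, x2], axis=1)

rng = np.random.default_rng(0)
layer = AffineCoupling(dim=4, rng=rng)
x = rng.standard_normal((5, 4))
y, log_det = layer.forward(x)
x_rec = layer.inverse(y)
print(np.allclose(x_rec, x))
```

The design makes both directions cheap: the inverse never has to invert the scale and shift networks themselves, and the log-determinant needed for the likelihood costs only a sum. Stacking such layers while alternating which half is left unchanged lets every dimension be transformed.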
Training Process:
Objective:
• Maximize the likelihood of observed data using:
• Maximum Likelihood Estimation (MLE): Optimizes model parameters by directly
maximizing the likelihood of data.
• Maximum a Posteriori (MAP) Estimation: Includes priors for regularization.
Advantages in Training:
Flow-based models have tractable likelihoods, enabling exact likelihood computation and
efficient training via backpropagation.
Challenges:
Ensuring invertibility of transformations.
Maintaining numerical stability during optimization.
Advantages:
Exact Likelihood Evaluation: Reliable estimation of the data density.
Efficient Sampling: Enables both data generation and latent space inference due to explicit
invertible mappings.
Strong Generalization: Capable of generating diverse, high-quality samples across domains.
Applications:
Computer Vision: Image generation, super-resolution, inpainting, and style transfer.
Natural Language Processing (NLP): Text generation, language modeling, and machine
translation.
Generative Design: Applications in molecular modeling and anomaly detection.
Challenges:
Computational Cost: Modeling high-dimensional data can be resource-intensive.
Scalability: Adapting these models to large-scale datasets remains an area of ongoing research.
Future Directions:
Architectural Innovations: Explore novel designs to improve expressiveness and efficiency.
Enhanced Training Algorithms: Focus on overcoming numerical stability and optimization
challenges.

Hybrid Approaches: Combine flow-based models with other generative techniques (e.g.,
VAEs, GANs) for superior performance.
Flow-based generative models stand out as a powerful framework in generative modeling.
Their ability to generate diverse, realistic samples and their tractable likelihood evaluation
make them invaluable for applications in AI and beyond. As research progresses, these models
are poised to play a pivotal role in advancing machine learning and data generation.
REAL NVP (REAL NON-VOLUME PRESERVING)
