
Enhanced Super-Resolution GAN with Edge Preservation

R V SAI SRIRAM 21BD1A053L
A SRUJAN 21BD1A0523
K RAHUL 21BD1A052L
K SATHVIK 21BD1A052Y

Guide: Mr. Para Upendar, HoD, Department of CSE


Contents
• Name Slide
• Abstract
• Introduction
• Existing system
• Disadvantages
• Proposed system
• Advantages
• Architectural diagram
• UML diagrams
• Code
• Results
• References
Abstract
• Single Image Super-Resolution (SISR) is a fundamental problem in computer vision,
aiming to reconstruct high-resolution images from their low-resolution counterparts.
The challenge lies in recovering lost high-frequency details and preserving fine textures
and edges. While deep learning models, particularly Generative Adversarial Networks
(GANs), have shown significant progress in SISR, they often struggle with accurately
restoring sharp edges and intricate details. This project presents an enhanced Super-
Resolution Generative Adversarial Network (SRGAN) model that incorporates an edge-
preserving loss function to improve the quality of reconstructed images. Utilizing the
DIV2K dataset for training and evaluation, the proposed approach demonstrates
superior performance in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural
Similarity Index Measure (SSIM) compared to baseline models. The results indicate that
incorporating edge information significantly enhances the visual fidelity of super-
resolved images.
• The project also aims to develop a B/W-to-color image conversion GAN model and to enhance the colorized image using ESRGAN.
• DOMAIN: Machine Learning
Objectives
• Implementation of an Enhanced Super-Resolution GAN architecture with edge preservation for RGB
images. (For example, given an input image of size 256x256, it is enhanced 4x to 1024x1024.)

• Given a single-channel B/W image, the system first converts it to color using a pix2pix image-colorizing
GAN and then passes it to the ESRGAN; the output is a 4x resolution-enhanced color image.

• The project thus implements two GAN architectures: one for enhanced super-resolution and
another for B/W-to-color image conversion.
Applications
Applications of Combined B/W Colorization and Super-Resolution Solutions
1. Restoration of Historical Photos & Videos
• Enhancing old black-and-white photographs by adding color and improving resolution for archival purposes.
2. Medical Imaging Enhancement
• Improving low-resolution grayscale scans (e.g., X-rays, MRIs) by colorizing and upscaling for better analysis.
3. Surveillance & Forensics
• Enhancing low-quality black-and-white security footage by adding color and increasing resolution for better
identification.
4. AI-Powered Film Restoration
• Converting old black-and-white movies into high-resolution color versions for modern viewing experiences.
5. Art and Media Enhancement
• Converting B/W sketches or manga into colored, high-resolution digital versions for artists and publishers.
6. Satellite and Aerial Imagery
• Colorizing and improving resolution of grayscale satellite images for better geographical and environmental analysis.
7. Heritage Preservation
• Restoring ancient manuscripts, paintings, and artworks that were originally in black and white by reconstructing colors
and improving clarity.
Introduction to Super-Resolution & ESRGAN
• Super-Resolution (SR) is the task of enhancing the quality of low-resolution images by reconstructing high-frequency
details; modern deep learning methods have made it highly effective.
• Traditional methods relied on interpolation techniques, but deep learning approaches, such as convolutional neural
networks (CNNs) and generative adversarial networks (GANs), have revolutionized the field.
• A key dataset in SR research is DIV2K, a large-scale, high-resolution dataset widely used for training super-resolution
models. It includes 800 training images and 100 validation images, providing a diverse set of high-quality images for
effective learning.
• This presentation explores the deep learning pipeline for SR, covering dataset preprocessing, model architectures, and
the training process.
What is GAN (Generative Adversarial Network)
• In a generative adversarial network (GAN), two neural networks compete with one another to make predictions that are as
accurate as possible. GANs typically operate unsupervised and learn through a competitive zero-sum game.
• The generator and the discriminator are the two neural networks that constitute a GAN.
• The generator is typically built from transposed (de-)convolutional layers, while the discriminator is a convolutional
neural network.
• The generator creates new data instances, and the discriminator evaluates them for authenticity, determining whether each
instance it analyzes actually belongs to the training data set.
• This adversarial training continues until the generator produces fake images that the discriminator can no longer
distinguish from the real images.
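As a concrete illustration, the sketch below shows one adversarial training step in PyTorch. The generator `G`, discriminator `D`, optimizers, and image batches are placeholders assumed to be defined elsewhere; this is a minimal sketch, not the project's exact training code.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(G, D, opt_G, opt_D, lr_imgs, hr_imgs):
    # --- Discriminator: push real images toward 1, generated toward 0 ---
    opt_D.zero_grad()
    fake = G(lr_imgs).detach()           # detach: no generator gradients here
    real_logits, fake_logits = D(hr_imgs), D(fake)
    d_loss = (bce(real_logits, torch.ones_like(real_logits))
              + bce(fake_logits, torch.zeros_like(fake_logits)))
    d_loss.backward()
    opt_D.step()

    # --- Generator: try to make the discriminator predict 1 on fakes ---
    opt_G.zero_grad()
    fake_logits = D(G(lr_imgs))
    g_loss = bce(fake_logits, torch.ones_like(fake_logits))
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```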
Existing system
• Before ESRGAN, the existing system was SRGAN (Super-Resolution Generative Adversarial Network).
• SRGAN (Super-Resolution Generative Adversarial Network) is a foundational model for image super-
resolution using GANs, while ESRGAN (Enhanced Super-Resolution Generative Adversarial Network) is an
improved version of SRGAN.
• SRGAN has difficulty generating visually realistic images with sharp details and natural textures; artifacts
are often present in SRGAN outputs.
Existing system – Disadvantages
Disadvantages of SRGAN (in brief):
1. Low-Quality Texture Generation: SRGAN often produced blurry or
unrealistic high-frequency textures in the output images.
2. Perceptual Loss Limitation: The perceptual loss used in SRGAN focused
more on content similarity rather than visual realism.
3. Mode Collapse: It sometimes generated repetitive patterns, lacking
diversity in textures.
4. Lack of Fine Details: The network struggled to reconstruct fine details,
especially in complex images.
5. Limited Adversarial Training: The discriminator wasn't optimized enough
to push the generator for more realistic images.
Proposed system
ESRGAN (Enhanced Super-Resolution Generative Adversarial Network)
• The proposed system is ESRGAN. Its main architecture is the same as SRGAN's, with modifications that
enable it to generate high-quality, photo-realistic images from low-resolution inputs using deep
learning techniques.
• ESRGAN uses a Residual-in-Residual Dense Block (RRDB), which combines a multi-level residual network and
dense connections without Batch Normalization.
Proposed system - Advantages
Advantages of ESRGAN:
1. High-Quality Texture Generation: Produces sharper and more realistic
textures compared to SRGAN.
2. Improved Perceptual Loss: Uses a more effective perceptual loss
function, enhancing fine details and visual quality.
3. Dense Residual Blocks: Helps in better feature extraction and deeper
network learning without vanishing gradients.
4. Relativistic GAN: The discriminator uses a relativistic approach,
comparing real and fake images together, which improves realism.
5. Robust Performance: Performs better on various datasets and complex
images with consistent quality.
Overall Block Diagram

[Block diagram: a 1x LR color image goes straight to ESRGAN model inferencing, while a 1x LR B/W image first passes through the B/W-to-color GAN; the ESRGAN output is a 4x HR image, presented through the Flask interface and visualisation.]
Overall Architecture diagram
In this diagram, the flow is as follows:
1. User Upload & Flask Server:
The user uploads an image via the Flask web interface, and the server saves the file.
2. Color Check & Optional Colorization:
The system checks whether the image is grayscale. If yes, it runs a colorization module (a U-Net
generator implemented in PyTorch) to produce a colorized image. Otherwise, the original image is
used.
3. Preprocessing & Downscaling:
The chosen image is preprocessed (cropped/resized) and downscaled (bicubic) to generate a low-
resolution (LR) image.
4. ESRGAN Inference:
The LR image is fed into the ESRGAN model (loaded via TensorFlow Hub) to produce a super-resolved
(SR) image.
5. Composite Generation:
A composite output is created, displaying multiple panels (e.g., HR, LR, Bicubic, ESRGAN output, and
optionally the original B/W input).
6. Output:
Finally, the composite image is returned/displayed via the Flask interface.

The diagram starts with a user upload, moves through file saving, a decision step
(grayscale or not), followed by optional colorization, preprocessing and
downscaling, ESRGAN inference, and finally the assembly of a composite output
displayed back via Flask.
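A minimal Flask sketch of this pipeline is shown below. The TensorFlow Hub URL points at a publicly listed ESRGAN module, and `colorize()` is a trivial stand-in for the PyTorch colorization model; both are illustrative assumptions, not the project's exact code.

```python
import io
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from flask import Flask, request, send_file
from PIL import Image

app = Flask(__name__)
# Publicly listed ESRGAN module on TF Hub (URL is an assumption here);
# the model may require H and W to be multiples of 4.
esrgan = hub.load("https://tfhub.dev/captain-pool/esrgan-tf2/1")

def colorize(gray_img):
    # Stand-in for the pix2pix colorization GAN described above;
    # here we simply replicate the channel so the sketch runs end to end.
    return gray_img.convert("RGB")

@app.route("/upscale", methods=["POST"])
def upscale():
    img = Image.open(request.files["image"].stream)
    if img.mode == "L":                  # grayscale check -> optional colorization
        img = colorize(img)
    img = img.convert("RGB")

    # ESRGAN inference expects a float32 batch of shape [1, H, W, 3].
    lr = tf.cast(tf.expand_dims(np.asarray(img), 0), tf.float32)
    sr = esrgan(lr)                      # 4x super-resolved output
    sr = tf.cast(tf.clip_by_value(sr[0], 0, 255), tf.uint8).numpy()

    buf = io.BytesIO()
    Image.fromarray(sr).save(buf, format="PNG")
    buf.seek(0)
    return send_file(buf, mimetype="image/png")
```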
UML Diagrams

[Figures: system architecture diagram, class diagram, use case diagram, sequence diagram, activity diagram, and data flow diagram.]
Dataset Pre-processing & Transforms
• Before training a deep learning model for Super-Resolution, the dataset undergoes preprocessing to
ensure optimal learning. High-Resolution (HR) images are resized to 256x256 pixels and undergo random
cropping to introduce variability. Horizontal flipping is applied to increase the diversity of the training
data.
• Low-Resolution (LR) images are resized to 64x64 pixels, matching the expected input dimensions for the
super-resolution model. This step ensures a consistent scale factor for upscaling.
• Additionally, images are converted into tensors and normalized, helping the model learn effectively.
Augmentation techniques such as random cropping and flipping enhance generalization, reducing
overfitting and improving real-world performance.
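A possible torchvision sketch of this preprocessing is shown below, assuming the LR image is derived from the already-augmented HR crop so the flipped/cropped pair stays aligned; the exact transform order in the project may differ.

```python
from torchvision import transforms as T
from torchvision.transforms import InterpolationMode
from torchvision.transforms import functional as F

hr_augment = T.Compose([
    T.Resize(256),              # shorter side -> 256
    T.RandomCrop(256),          # random 256x256 crop for variability
    T.RandomHorizontalFlip(),   # augmentation for diversity
])
to_tensor = T.Compose([
    T.ToTensor(),                                # PIL -> float tensor in [0, 1]
    T.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # -> [-1, 1]
])

def make_pair(pil_image):
    hr = hr_augment(pil_image)
    lr = F.resize(hr, [64, 64], interpolation=InterpolationMode.BICUBIC)
    return to_tensor(lr), to_tensor(hr)
```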
ESRGAN Architecture Diagram

This diagram shows the data pipeline from the dataset and DataLoader
into the generator and then to the discriminator (with both generated
and real HR images), followed by loss calculation and weight updates.
Generator & Discriminator Architectures

• Super-Resolution models leverage a Generator-Discriminator framework, a key principle of Generative Adversarial
Networks (GANs).
• The Generator is responsible for upscaling low-resolution images while retaining fine details. It uses Residual-in-
Residual Dense Blocks (RRDB) to enhance feature extraction and improve resolution quality.
• On the other side, the Discriminator is a deep convolutional network that classifies images as either real (from the
dataset) or fake (generated).
• This classification forces the generator to improve its outputs over successive training iterations.
• Through adversarial training, the generator and discriminator compete, leading to an enhancement in perceptual
quality. The resulting images look highly realistic, closely matching ground-truth high-resolution images.
Generator Architecture

The generator’s diagram details:


• An initial convolution block for feature extraction.
• A block of 8 RRDB modules (each RRDB contains multiple residual dense blocks).
• A skip (residual) connection.
• Two upscaling stages (each doubling resolution via interpolation, convolution, and activation).
• A final reconstruction convolution that outputs the super-resolved image.
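A simplified PyTorch skeleton of this generator is sketched below. The `ResidualBlock` is a lightweight stand-in for a full RRDB (a fuller RRDB sketch follows the next slide), and the channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Simplified stand-in for one RRDB module."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + 0.2 * self.conv2(F.leaky_relu(self.conv1(x), 0.2))

class Generator(nn.Module):
    def __init__(self, ch=64, n_blocks=8):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)        # initial feature extraction
        self.body = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_blocks)])
        self.body_conv = nn.Conv2d(ch, ch, 3, padding=1)  # end of trunk, before skip
        self.up = nn.Sequential(                          # two 2x upscaling stages
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
        )
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)        # final reconstruction conv

    def forward(self, x):
        feat = self.head(x)
        out = feat + self.body_conv(self.body(feat))      # global residual skip
        return self.tail(self.up(out))
```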
Residual Dense Block & RRDB Model
• The Residual Dense Block (RDB) is a key component in Super-Resolution models, enabling efficient feature
extraction through densely connected convolutional layers.
• This architecture ensures that information is retained across multiple layers, improving learning efficiency.
• The Residual-in-Residual Dense Block (RRDB) extends the RDB structure by removing batch normalization
layers, which enhances model performance by improving gradient flow and reducing computational
complexity.
• RRDB allows deeper networks to be trained without significant degradation in performance.
• Using RRDB improves the model’s ability to learn fine details, making it a preferred choice in state-of-the-art
Super-Resolution architectures such as ESRGAN.
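A compact PyTorch sketch of the RDB and RRDB follows, assuming the common ESRGAN conventions (growth channels, 0.2 residual scaling, three RDBs per RRDB); the exact layer counts in the project are assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualDenseBlock(nn.Module):
    """RDB: each 3x3 conv sees the concatenation of all earlier features."""
    def __init__(self, ch=64, growth=32):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch + i * growth, growth, 3, padding=1) for i in range(4)
        )
        self.fuse = nn.Conv2d(ch + 4 * growth, ch, 3, padding=1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(F.leaky_relu(conv(torch.cat(feats, dim=1)), 0.2))
        return x + 0.2 * self.fuse(torch.cat(feats, dim=1))  # residual scaling

class RRDB(nn.Module):
    """Residual-in-Residual Dense Block: three RDBs inside an outer residual."""
    def __init__(self, ch=64, growth=32):
        super().__init__()
        self.rdbs = nn.Sequential(*[ResidualDenseBlock(ch, growth) for _ in range(3)])

    def forward(self, x):
        return x + 0.2 * self.rdbs(x)  # no BatchNorm anywhere, per ESRGAN
```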
Discriminator Architecture

This diagram depicts the discriminator as a series of convolution blocks (with strides and BatchNorm)
that reduce the spatial dimensions, followed by flattening and fully connected layers to yield a final
validity score.
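A plausible PyTorch sketch of such a discriminator is shown below; the channel widths and the assumed 256x256 input size are illustrative, not taken from the slides.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
    )

class Discriminator(nn.Module):
    """Strided conv blocks shrink the feature map; FC layers score validity."""
    def __init__(self, img_size=256):
        super().__init__()
        chs = [3, 64, 128, 256, 512]
        blocks = []
        for in_ch, out_ch in zip(chs, chs[1:]):
            blocks += [conv_block(in_ch, out_ch, stride=1),
                       conv_block(out_ch, out_ch, stride=2)]  # halves H and W
        self.features = nn.Sequential(*blocks)
        feat_size = img_size // 2 ** 4                        # four stride-2 blocks
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * feat_size * feat_size, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1),                               # raw validity logit
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```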
Training Configuration & Loss Functions
• Training a Super-Resolution model requires carefully balancing multiple loss functions.
• The L1 Loss (content loss) ensures that generated images are close to their high-resolution counterparts,
while the BCEWithLogitsLoss (GAN loss) forces the generator to create more realistic outputs.
• For optimization, the Adam optimizer is used with separate learning rates: 1e-4 for the generator and 5e-5
for the discriminator.
• This configuration stabilizes training and prevents the discriminator from overpowering the generator.
Training follows an adversarial strategy, where the generator and discriminator learn iteratively.
• Periodic checkpoints are saved, allowing model performance tracking and resuming from a specific epoch
when needed.
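These choices translate to a few lines of PyTorch, sketched below. The `Generator` and `Discriminator` instances refer to the sketches earlier in this deck, and the 1e-3 adversarial weight is an assumption, not a value from the slides.

```python
import torch
import torch.nn as nn

content_loss = nn.L1Loss()           # keeps the SR output close to the HR target
gan_loss = nn.BCEWithLogitsLoss()    # adversarial realism pressure

generator, discriminator = Generator(), Discriminator()
opt_G = torch.optim.Adam(generator.parameters(), lr=1e-4)   # generator LR
opt_D = torch.optim.Adam(discriminator.parameters(), lr=5e-5)  # slower discriminator

def generator_loss(sr, hr, fake_logits):
    # The 1e-3 adversarial weight is an assumed value for illustration.
    adv = gan_loss(fake_logits, torch.ones_like(fake_logits))
    return content_loss(sr, hr) + 1e-3 * adv
```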
Training Process & Checkpointing
• Training a Super-Resolution GAN follows an adversarial approach, where the generator and discriminator
compete. This iterative training refines the model’s ability to produce high-quality images that are
indistinguishable from real ones.
• To ensure model stability, loss functions are closely monitored throughout training. Generator loss and
discriminator loss trends help in diagnosing issues like mode collapse or overfitting.
• Periodic checkpointing is implemented to save model weights, optimizer states, and epoch progress. This
allows training to resume from a previous checkpoint in case of interruptions, avoiding the need to start
over.
• This method is essential for long-duration training processes.
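A minimal checkpointing sketch in PyTorch, saving exactly the pieces listed above (weights, optimizer states, and epoch progress):

```python
import torch

def save_checkpoint(path, epoch, G, D, opt_G, opt_D):
    # Persist everything needed to resume after an interruption.
    torch.save({
        "epoch": epoch,
        "G": G.state_dict(), "D": D.state_dict(),
        "opt_G": opt_G.state_dict(), "opt_D": opt_D.state_dict(),
    }, path)

def load_checkpoint(path, G, D, opt_G, opt_D):
    ckpt = torch.load(path, map_location="cpu")
    G.load_state_dict(ckpt["G"]); D.load_state_dict(ckpt["D"])
    opt_G.load_state_dict(ckpt["opt_G"]); opt_D.load_state_dict(ckpt["opt_D"])
    return ckpt["epoch"] + 1     # epoch to resume from
```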
Comparative results
Conclusion & Key Takeaways
• Super-Resolution using deep learning has revolutionized the way we enhance image quality, benefiting fields such as
medical imaging, surveillance, and media restoration.
• Models like ESRGAN leverage Residual-in-Residual Dense Blocks (RRDB) and adversarial training to generate highly realistic high-
resolution images.
• Key takeaways include the importance of GAN-based adversarial training, the role of RRDB in improving feature
extraction, and the need for well-preprocessed datasets like DIV2K.
• Loss functions and optimizer tuning ensure stable and efficient model training. Looking ahead, advancements in AI,
computational power, and novel architectures will further push super-resolution capabilities, making it an even more
powerful tool in real-world applications.
B/W to color Image conversion

Overall Architecture

It shows the data flow: from loading RGB images (which are converted to Lab and split into L and ab channels), to feeding the L channel into the U-Net generator, forming fake and real colorized images via concatenation, and then passing them through the patch-based discriminator before computing losses and updating weights.
LAB Color Model
What does L*a*b* stand for?
• L* : Lightness
• a* : Red/Green Value
• b* : Blue/Yellow Value
• The a* axis runs from left to right. A color measurement
movement in the +a direction depicts a shift toward red.
• Along the b* axis, +b movement represents a shift
toward yellow.
• The center L* axis shows L* = 0 (black, or total absorption)
at the bottom and L* = 100 (white) at the top.
• At the center of this plane is neutral, or gray.
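A sketch of the Lab split used by the colorization model is shown below. The normalization constants (50 for L*, 110 for a*/b*) are common conventions assumed here, not values from the slides.

```python
import numpy as np
import torch
from skimage.color import rgb2lab, lab2rgb

def rgb_to_model_inputs(rgb):
    """Split an RGB image (H, W, 3, values in [0, 1]) into L and ab tensors."""
    lab = rgb2lab(rgb).astype("float32")
    L = lab[..., :1] / 50.0 - 1.0          # L* in [0, 100]       -> [-1, 1]
    ab = lab[..., 1:] / 110.0              # a*, b* in ~[-110, 110] -> [-1, 1]
    return (torch.from_numpy(L).permute(2, 0, 1),
            torch.from_numpy(ab).permute(2, 0, 1))

def model_outputs_to_rgb(L, ab):
    """Recombine predicted ab channels with the input L channel."""
    lab = torch.cat([(L + 1.0) * 50.0, ab * 110.0], dim=0).permute(1, 2, 0)
    return lab2rgb(lab.numpy())
```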
Generator Architecture

The above figure depicts the U-Net’s encoder–decoder structure. The encoder (downsampling) path consists of several
convolution blocks. The bottleneck (innermost UnetBlock) is followed by the decoder (upsampling) path where skip
connections (shown as dashed “Concat” nodes) merge features from corresponding encoder layers. Finally, the output
layer predicts the ab channels.
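A compact PyTorch sketch of one such U-Net level, in the pix2pix style, is given below; the channel sizes in the usage lines are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UnetBlock(nn.Module):
    """One U-Net level: downsample, recurse into the inner submodule,
    upsample, then concatenate the encoder skip (the dashed "Concat" nodes)."""
    def __init__(self, outer_ch, inner_ch, submodule=None, innermost=False):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(outer_ch, inner_ch, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
        )
        self.submodule = submodule if submodule is not None else nn.Identity()
        up_in = inner_ch if innermost else inner_ch * 2  # concat doubles channels
        self.up = nn.Sequential(
            nn.ConvTranspose2d(up_in, outer_ch, 4, stride=2, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        out = self.up(self.submodule(self.down(x)))
        return torch.cat([x, out], dim=1)                # skip connection

# Two nested levels as an illustration (the deck's U-Net is deeper, and its
# outermost layers map the 1-channel L input to a 2-channel ab output).
inner = UnetBlock(128, 256, innermost=True)
unet_core = UnetBlock(64, 128, submodule=inner)
```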
Discriminator architecture

The above figure visualizes a series of convolutional blocks that gradually reduce the spatial dimensions and increase
the feature depth. The final convolution outputs a patch-wise validity score.
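A sketch of a patch-based (PatchGAN-style) discriminator follows; the input is assumed to be the concatenated L and ab channels (3 channels total), and the layer widths are assumptions.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN: outputs a grid of validity logits, one per image patch,
    instead of a single score for the whole image."""
    def __init__(self, in_ch=3):   # L + ab channels concatenated
        super().__init__()
        def block(i, o, stride, norm=True):
            layers = [nn.Conv2d(i, o, 4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(o))
            layers.append(nn.LeakyReLU(0.2))
            return layers
        self.model = nn.Sequential(
            *block(in_ch, 64, 2, norm=False),
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            nn.Conv2d(512, 1, 4, stride=1, padding=1),  # patch-wise logits
        )

    def forward(self, x):
        return self.model(x)
```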
Flask Interface
Results
Test Results using Satellite Images
Test Results using Satellite Images
Using CT Images
Using Xray Images
References
1. Dong, C., Loy, C. C., He, K., & Tang, X. (2016). Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 38(2), 295-307.
2. Kim, J., Lee, J. K., & Lee, K. M. (2016). Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1646-1654).
3. Lim, B., Son, S., Kim, H., Nah, S., & Lee, K. M. (2017). Enhanced Deep Residual Networks for Single Image Super-Resolution. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 136-144).
4. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., ... & Shi, W. (2017). Photo-Realistic Single Image Super-
Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(pp. 4681-4690).
5. Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., ... & Change Loy, C. (2018). ESRGAN: Enhanced Super-Resolution Generative
Adversarial Networks. In Proceedings of the European Conference on Computer Vision Workshops (pp. 63-79).
6. Lai, W. S., Huang, J. B., Ahuja, N., & Yang, M. H. (2017). Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 624-632).
7. Haris, M., Shakhnarovich, G., & Ukita, N. (2018). Deep Back-Projection Networks for Super-Resolution. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (pp. 1664-1673).
8. Agustsson, E., & Timofte, R. (2017). NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 126-135).
9. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative Adversarial Nets.
In Advances in Neural Information Processing Systems (pp. 2672-2680).
Thank You
