ESRGAN Slides 3mar2025
• Given a single-channel B/W image, the system first colorizes it using a pix2pix image-colorization GAN and then passes the result to ESRGAN; the output is a color image with 4× enhanced resolution.
• The project implements two GAN architectures: one for enhanced super-resolution and one for B/W-to-color image conversion.
Applications
Applications of Combined B/W Colorization and Super-Resolution Solutions
1. Restoration of Historical Photos & Videos
• Enhancing old black-and-white photographs by adding color and improving resolution for archival purposes.
2. Medical Imaging Enhancement
• Improving low-resolution grayscale scans (e.g., X-rays, MRIs) by colorizing and upscaling for better analysis.
3. Surveillance & Forensics
• Enhancing low-quality black-and-white security footage by adding color and increasing resolution for better
identification.
4. AI-Powered Film Restoration
• Converting old black-and-white movies into high-resolution color versions for modern viewing experiences.
5. Art and Media Enhancement
• Converting B/W sketches or manga into colored, high-resolution digital versions for artists and publishers.
6. Satellite and Aerial Imagery
• Colorizing and improving resolution of grayscale satellite images for better geographical and environmental analysis.
7. Heritage Preservation
• Restoring ancient manuscripts, paintings, and artworks that were originally in black and white by reconstructing colors
and improving clarity.
Introduction to Super-Resolution & ESRGAN
• Super-Resolution (SR) is the task of enhancing the quality of low-resolution images by reconstructing high-frequency details; modern approaches are driven by deep learning.
• Traditional methods relied on interpolation techniques, but deep learning approaches, such as convolutional neural
networks (CNNs) and generative adversarial networks (GANs), have revolutionized the field.
• A key dataset in SR research is DIV2K, a large-scale, high-resolution dataset widely used for training super-resolution
models. It includes 800 training images and 100 validation images, providing a diverse set of high-quality images for
effective learning.
• This presentation explores the deep learning pipeline for SR, covering dataset preprocessing, model architectures, and
the training process.
What is GAN (Generative Adversarial Network)
• In a generative adversarial network (GAN), two neural networks compete with one another to make predictions that are as accurate as possible. GANs typically operate unsupervised and learn through a competitive zero-sum game.
• The generator and the discriminator are the two neural networks that constitute a GAN.
• The generator is typically built from de-convolutional (transposed-convolution) layers, while the discriminator is a convolutional network.
• The generator creates new data instances, and the discriminator evaluates them for authenticity, determining whether each instance it analyzes actually belongs to the training data set.
• This adversarial training continues until the generator produces fake images that the discriminator can no longer distinguish from real images.
Existing system
• Before ESRGAN, the existing system was SRGAN (Super-Resolution Generative Adversarial Network).
• SRGAN (Super-Resolution Generative Adversarial Network) is a foundational model for image super-
resolution using GANs, while ESRGAN (Enhanced Super-Resolution Generative Adversarial Network) is an
improved version of SRGAN.
• SRGAN struggles to generate visually realistic images with sharp details and natural textures, and its outputs often contain artifacts; ESRGAN was designed to address these issues.
Existing system – Disadvantages
Disadvantages of SRGAN (in brief):
1. Low-Quality Texture Generation: SRGAN often produced blurry or
unrealistic high-frequency textures in the output images.
2. Perceptual Loss Limitation: The perceptual loss used in SRGAN focused
more on content similarity rather than visual realism.
3. Mode Collapse: It sometimes generated repetitive patterns, lacking
diversity in textures.
4. Lack of Fine Details: The network struggled to reconstruct fine details,
especially in complex images.
5. Limited Adversarial Training: The discriminator wasn't optimized enough
to push the generator toward more realistic images.
Proposed system
ESRGAN (Enhanced Super-Resolution Generative Adversarial Network)
• The proposed system is ESRGAN. Its overall architecture is the same as SRGAN's, with modifications that allow it to generate high-quality, photo-realistic images from low-resolution inputs using deep learning techniques.
• ESRGAN uses the Residual-in-Residual Dense Block (RRDB), which combines a multi-level residual network with dense connections and omits batch normalization.
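A minimal PyTorch sketch of the RRDB idea, assuming 64 feature channels and a growth rate of 32 (illustrative sizes, not the project's exact configuration):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Five 3x3 convs with dense connections and no batch norm, as in ESRGAN."""
    def __init__(self, nf=64, gc=32):
        super().__init__()
        # Conv i sees the input plus all previous outputs; the last conv maps back to nf.
        self.convs = nn.ModuleList(
            nn.Conv2d(nf + i * gc, gc if i < 4 else nf, 3, padding=1)
            for i in range(5))
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(feats, dim=1))
            feats.append(self.lrelu(out) if i < 4 else out)
        return x + 0.2 * feats[-1]          # residual scaling

class RRDB(nn.Module):
    """Residual-in-Residual Dense Block: three dense blocks inside an outer residual."""
    def __init__(self, nf=64, gc=32):
        super().__init__()
        self.blocks = nn.Sequential(*(DenseBlock(nf, gc) for _ in range(3)))

    def forward(self, x):
        return x + 0.2 * self.blocks(x)
```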
Proposed system - Advantages
Advantages of ESRGAN:
1. High-Quality Texture Generation: Produces sharper and more realistic
textures compared to SRGAN.
2. Improved Perceptual Loss: Uses a more effective perceptual loss
function, enhancing fine details and visual quality.
3. Dense Residual Blocks: Helps in better feature extraction and deeper
network learning without vanishing gradients.
4. Relativistic GAN: The discriminator uses a relativistic approach,
comparing real and fake images together, which improves realism (a loss sketch follows this list).
5. Robust Performance: Performs better on various datasets and complex
images with consistent quality.
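To illustrate the relativistic discriminator mentioned in point 4, here is a sketch of the relativistic average losses ESRGAN uses; `real_logits` and `fake_logits` are assumed to be raw discriminator outputs for a batch:

```python
import torch
import torch.nn.functional as F

def relativistic_d_loss(real_logits, fake_logits):
    # Real images should score higher than the *average* fake, and vice versa.
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits - fake_logits.mean(), torch.ones_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits - real_logits.mean(), torch.zeros_like(fake_logits))
    return (loss_real + loss_fake) / 2

def relativistic_g_loss(real_logits, fake_logits):
    # Symmetric objective for the generator: its fakes should overtake the average real.
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits - fake_logits.mean(), torch.zeros_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits - real_logits.mean(), torch.ones_like(fake_logits))
    return (loss_real + loss_fake) / 2
```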
Overall Block Diagram
[Block diagram: an LR color image (1×) is fed directly to the ESRGAN model, while an LR B/W image (1×) first passes through the B/W-to-color GAN; ESRGAN inferencing produces the 4× HR image, which is presented through the Flask interface and visualisation layer.]
Overall Architecture diagram
In this diagram, the flow is as follows:
1. User Upload & Flask Server:
The user uploads an image via the Flask web interface, and the server saves the file.
2. Color Check & Optional Colorization:
The system checks whether the image is grayscale. If yes, it runs a colorization module (a U-Net
generator implemented in PyTorch) to produce a colorized image. Otherwise, the original image is
used.
3. Preprocessing & Downscaling:
The chosen image is preprocessed (cropped/resized) and downscaled (bicubic) to generate a low-
resolution (LR) image.
4. ESRGAN Inference:
The LR image is fed into the ESRGAN model (loaded via TensorFlow Hub) to produce a super-resolved
(SR) image.
5. Composite Generation:
A composite output is created, displaying multiple panels (e.g., HR, LR, Bicubic, ESRGAN output, and
optionally the original B/W input).
6. Output:
Finally, the composite image is returned/displayed via the Flask interface.
The diagram starts with a user upload, moves through file saving, a decision step
(grayscale or not), followed by optional colorization, preprocessing and
downscaling, ESRGAN inference, and finally the assembly of a composite output
displayed back via Flask.
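A condensed sketch of this pipeline; the `colorize` helper stands in for the PyTorch U-Net, and the TensorFlow Hub handle shown is the publicly available `captain-pool/esrgan-tf2` module (the slides do not name the exact handle, so treat both as assumptions):

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from flask import Flask, request
from PIL import Image

app = Flask(__name__)
esrgan = hub.load("https://tfhub.dev/captain-pool/esrgan-tf2/1")  # 4x SR model

def is_grayscale(img):
    # Treat the image as grayscale if its RGB channels are (nearly) identical.
    rgb = np.asarray(img.convert("RGB"), dtype=np.int16)
    return np.abs(rgb[..., 0] - rgb[..., 1]).max() < 5 and \
           np.abs(rgb[..., 1] - rgb[..., 2]).max() < 5

@app.route("/upscale", methods=["POST"])
def upscale():
    img = Image.open(request.files["image"].stream)
    if is_grayscale(img):
        img = colorize(img)                 # hypothetical U-Net colorizer call
    lr = np.asarray(img.convert("RGB"), dtype=np.float32)
    lr = lr[: lr.shape[0] // 4 * 4, : lr.shape[1] // 4 * 4]  # dims divisible by 4
    sr = esrgan(tf.expand_dims(lr, 0))      # output shape (1, 4H, 4W, 3)
    sr = tf.clip_by_value(tf.squeeze(sr), 0, 255)
    Image.fromarray(tf.cast(sr, tf.uint8).numpy()).save("sr_output.png")
    return "done"
```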
UML Diagrams
Class diagram
System Architecture diagram
Use case diagram
UML Diagrams
This diagram shows the data pipeline from the dataset and DataLoader
into the generator and then to the discriminator (with both generated
and real HR images), followed by loss calculation and weight updates.
Generator & Discriminator Architectures
This diagram depicts the discriminator as a series of convolution blocks (with strides and BatchNorm)
that reduce the spatial dimensions, followed by flattening and fully connected layers to yield a final
validity score.
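A PyTorch sketch of such a discriminator, assuming 128×128 inputs and four stride-2 stages (illustrative sizes):

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    # Conv -> BatchNorm -> LeakyReLU; stride 2 halves the spatial size.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True))

class Discriminator(nn.Module):
    def __init__(self, in_ch=3, img_size=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, 1, 1), nn.LeakyReLU(0.2, inplace=True),
            conv_block(64, 64, 2),
            conv_block(64, 128, 1), conv_block(128, 128, 2),
            conv_block(128, 256, 1), conv_block(256, 256, 2),
            conv_block(256, 512, 1), conv_block(512, 512, 2))
        feat = img_size // 16               # four stride-2 blocks: 128 -> 8
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * feat * feat, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 1))             # raw validity logit

    def forward(self, x):
        return self.classifier(self.features(x))
```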
Training Configuration & Loss Functions
• Training a Super-Resolution model requires carefully balancing multiple loss functions.
• The L1 Loss (content loss) ensures that generated images are close to their high-resolution counterparts,
while the BCEWithLogitsLoss (GAN loss) forces the generator to create more realistic outputs.
• For optimization, the Adam optimizer is used with separate learning rates: 1e-4 for the generator and 5e-5
for the discriminator.
• This configuration stabilizes training and prevents the discriminator from overpowering the generator.
• Training follows an adversarial strategy, where the generator and discriminator learn iteratively.
• Periodic checkpoints are saved, allowing model performance tracking and resuming from a specific epoch
when needed.
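The configuration above can be sketched as follows; `generator` and `discriminator` are assumed to be already-constructed modules, and the 1e-3 adversarial weighting is an illustrative assumption rather than a value stated in the slides:

```python
import torch
import torch.nn as nn

# Losses: L1 content loss plus adversarial BCE-with-logits loss.
content_loss = nn.L1Loss()
gan_loss = nn.BCEWithLogitsLoss()

# Separate Adam learning rates; the slower discriminator keeps it
# from overpowering the generator.
opt_G = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_D = torch.optim.Adam(discriminator.parameters(), lr=5e-5)

def generator_loss(sr, hr, fake_logits):
    # Content term keeps SR close to HR; adversarial term pushes realism.
    adv = gan_loss(fake_logits, torch.ones_like(fake_logits))
    return content_loss(sr, hr) + 1e-3 * adv
```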
Training Process & Checkpointing
• Training a Super-Resolution GAN follows an adversarial approach, where the generator and discriminator
compete. This iterative training refines the model’s ability to produce high-quality images that are
indistinguishable from real ones.
• To ensure model stability, loss functions are closely monitored throughout training. Generator loss and
discriminator loss trends help in diagnosing issues like mode collapse or overfitting.
• Periodic checkpointing is implemented to save model weights, optimizer states, and epoch progress. This
allows training to resume from a previous checkpoint in case of interruptions, avoiding the need to start
over.
• This method is essential for long-duration training processes.
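A minimal checkpointing sketch along these lines (names and keys are illustrative):

```python
import torch

def save_checkpoint(path, epoch, G, D, opt_G, opt_D):
    # Persist everything needed to resume: weights, optimizer states, epoch.
    torch.save({"epoch": epoch,
                "generator": G.state_dict(),
                "discriminator": D.state_dict(),
                "opt_G": opt_G.state_dict(),
                "opt_D": opt_D.state_dict()}, path)

def load_checkpoint(path, G, D, opt_G, opt_D):
    # Restore an interrupted run; returns the epoch to continue from.
    ckpt = torch.load(path, map_location="cpu")
    G.load_state_dict(ckpt["generator"])
    D.load_state_dict(ckpt["discriminator"])
    opt_G.load_state_dict(ckpt["opt_G"])
    opt_D.load_state_dict(ckpt["opt_D"])
    return ckpt["epoch"] + 1
```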
Comparative results
Conclusion & Key Takeaways
• Super-Resolution using deep learning has revolutionized the way we enhance image quality, benefiting fields such as
medical imaging, surveillance, and media restoration.
• Models like ESRGAN leverage Residual-in-Residual Dense Blocks (RRDB) and adversarial training to generate highly realistic high-resolution images.
• Key takeaways include the importance of GAN-based adversarial training, the role of RRDB in improving feature
extraction, and the need for well-preprocessed datasets like DIV2K.
• Loss functions and optimizer tuning ensure stable and efficient model training. Looking ahead, advancements in AI,
computational power, and novel architectures will further push super-resolution capabilities, making it an even more
powerful tool in real-world applications.
B/W to Color Image Conversion
Overall Architecture
The above figure depicts the U-Net’s encoder–decoder structure. The encoder (downsampling) path consists of several
convolution blocks. The bottleneck (innermost UnetBlock) is followed by the decoder (upsampling) path where skip
connections (shown as dashed “Concat” nodes) merge features from corresponding encoder layers. Finally, the output
layer predicts the ab channels.
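A compact PyTorch sketch of one such encoder-decoder level with a skip connection, plus a tiny two-level instance mapping the L channel to the two ab channels (channel sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class UnetBlock(nn.Module):
    """Downsample, recurse into an inner block, upsample, then concatenate
    the skip connection, as in pix2pix-style U-Nets."""
    def __init__(self, outer_ch, inner_ch, submodule=None, innermost=False):
        super().__init__()
        self.innermost = innermost
        self.submodule = submodule
        self.down = nn.Sequential(
            nn.Conv2d(outer_ch, inner_ch, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True))
        up_in = inner_ch if innermost else inner_ch * 2   # inner block concatenates
        self.up = nn.Sequential(
            nn.ConvTranspose2d(up_in, outer_ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.down(x)
        if not self.innermost:
            y = self.submodule(y)
        return torch.cat([x, self.up(y)], dim=1)          # skip connection

# Tiny two-level instance: L channel in, ab channels out.
colorizer = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1),                       # L -> 64 features
    UnetBlock(64, 128, submodule=UnetBlock(128, 256, innermost=True)),
    nn.Conv2d(128, 2, 3, padding=1), nn.Tanh())           # predict ab in [-1, 1]
```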
Discriminator architecture
The above figure visualizes a series of convolutional blocks that gradually reduce the spatial dimensions and increase
the feature depth. The final convolution outputs a patch-wise validity score.
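A sketch of this patch-wise discriminator in PyTorch, assuming a 3-channel (L plus ab) input as in pix2pix:

```python
import torch.nn as nn

def d_block(in_ch, out_ch, stride=2, norm=True):
    # Strided conv halves the spatial size while deepening the features.
    layers = [nn.Conv2d(in_ch, out_ch, 4, stride=stride, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

# PatchGAN: the final conv emits a grid of per-patch validity logits
# instead of a single score for the whole image.
patch_discriminator = nn.Sequential(
    d_block(3, 64, norm=False),
    d_block(64, 128),
    d_block(128, 256),
    d_block(256, 512, stride=1),
    nn.Conv2d(512, 1, 4, stride=1, padding=1))
```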
Flask Interface
Results
Test Results using Satellite Images
Using CT Images
Using X-ray Images
References
1. Dong, C., Loy, C. C., He, K., & Tang, X. (2016). Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 38(2), 295-307.
2. Kim, J., Lee, J. K., & Lee, K. M. (2016). Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1646-1654).
3. Lim, B., Son, S., Kim, H., Nah, S., & Lee, K. M. (2017). Enhanced Deep Residual Networks for Single Image Super-Resolution. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 136-144).
4. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., ... & Shi, W. (2017). Photo-Realistic Single Image Super-
Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(pp. 4681-4690).
5. Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., ... & Change Loy, C. (2018). ESRGAN: Enhanced Super-Resolution Generative
Adversarial Networks. In Proceedings of the European Conference on Computer Vision Workshops (pp. 63-79).
6. Lai, W. S., Huang, J. B., Ahuja, N., & Yang, M. H. (2017). Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 624-632).
7. Haris, M., Shakhnarovich, G., & Ukita, N. (2018). Deep Back-Projection Networks for Super-Resolution. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (pp. 1664-1673).
8. Agustsson, E., & Timofte, R. (2017). NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 126-135).
9. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative Adversarial Nets.
In Advances in Neural Information Processing Systems (pp. 2672-2680).
Thank You