Text to Image Synthesis Using Self-Attention Generative Adversarial Networks
May 2022
Name of Student : MICHELLE SARAH SIMON
Chennai - 600025
Bonafide certificate
Certified that this Project Report titled Text to Image Synthesis using Self-Attention
Generative Adversarial Network is the bonafide work of Ms. Michelle Sarah Simon, who
carried out the project under my supervision. Certified further, that to the best of my
knowledge, the work reported herein does not form part of any other Project Report on the
basis of which a degree or award was conferred on an earlier occasion on this or any other
candidate.
Abstract
Acknowledgement
Table of contents
List of tables
List of figures
CHAPTER 1
INTRODUCTION
In the image-text matching task, we pretrain an image encoder and a text encoder to learn
semantically consistent visual and textual representations of each image-text pair. At the same
time, we learn consistent textual representations by pulling together the captions of the same
image and pushing away the captions of different images via a contrastive loss. The pretrained
image encoder and text encoder are then used to extract consistent visual and textual features
in the subsequent stage of GAN training. A contrastive loss is then used to minimize the
distance between fake images generated from text descriptions of the same ground-truth image,
while maximizing the distance between those generated from descriptions of different
ground-truth images. We generalize existing text-to-image models to a unified framework so
that the approach can be integrated into them to improve their performance.
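To make the idea concrete, the sketch below shows one way such a caption-level contrastive loss could be written. It is not the project's actual training code: PyTorch, the function name, and the temperature value are assumptions for illustration only.

```python
# Minimal sketch of a caption-level contrastive loss (assumed PyTorch implementation).
# Captions that describe the same ground-truth image are pulled together; captions of
# different images are pushed apart.
import torch
import torch.nn.functional as F

def caption_contrastive_loss(caption_emb, image_ids, temperature=0.1):
    """caption_emb: (N, D) caption features; image_ids: (N,) id of the image each caption describes."""
    z = F.normalize(caption_emb, dim=1)                      # compare captions by cosine similarity
    sim = z @ z.t() / temperature                            # (N, N) pairwise similarity matrix
    pos_mask = (image_ids.unsqueeze(0) == image_ids.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)                               # a caption is not its own positive
    logits = sim - 1e9 * torch.eye(len(z), device=z.device)  # exclude self-similarity from the softmax
    log_prob = F.log_softmax(logits, dim=1)
    # average log-probability assigned to the positives (captions of the same image)
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()

# usage (hypothetical names): loss = caption_contrastive_loss(text_encoder(captions), image_ids)
```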
Current models are still far from being capable of generating complex scenes with multiple
objects based only on textual descriptions. There is also very limited work on resolutions
higher than 256 × 256 pixels. It is challenging to reproduce the quantitative results of many
approaches, even when code and pre-trained models are provided. This is reflected in the
literature, where different quantitative results are often reported for the same model.
Furthermore, we observe that many of the currently used evaluation metrics are unsuitable for
evaluating text-to-image synthesis models and do not correlate well with human perception.
Only a few approaches perform human user studies to assess whether their improvements are
evident in a qualitative sense, and when they do, the studies are not standardized, making
comparison of results difficult.
CHAPTER 2
LITERATURE REVIEW
OVERVIEW
Enormous volumes of textual content are generated every day. In-app messaging platforms
such as WhatsApp and Telegram, social media sites such as Facebook and Instagram, news
publishing sites, Google searches, and a variety of other sources all contribute. The main
focus of this section is the popular NLP task of sentiment analysis. Sentiment analysis is a
powerful tool for extracting important information and helps organisations understand the
social sentiment of their brand, product, or service while monitoring online conversations.
This section investigates the various approaches and models used in the task of sentiment
analysis.
LITERATURE SURVEY
2. Improved Techniques for Training GANs - Tim Salimans, Ian Goodfellow, Wojciech Zaremba,
Vicki Cheung, Alec Radford, Xi Chen
We present a number of new architectural features and training procedures for the
generative adversarial networks (GANs) framework. We achieve cutting-edge results in
semi-supervised classification on MNIST, CIFAR-10, and SVHN using our new
techniques. A visual Turing test confirmed the high quality of the generated images: our
model generates MNIST samples that humans cannot distinguish from real data and
CIFAR-10 samples with a human error rate of 21.3 percent. We also show ImageNet
samples with unprecedented resolution and demonstrate how our methods enable the
model to learn recognisable ImageNet class features.
CHAPTER 3
THE STUDY
1. OPEN-SOURCE DATASET
Caltech-UCSD Birds 200 (CUB-200) is an image dataset annotated with 200 bird species.
It was created to enable the study of subordinate categorization, which is not possible with
other popular datasets that focus on basic level categories. The images were downloaded
from the website data.caltech.edu/records. Each image is annotated with a bounding box, a
rough bird segmentation, and a set of attribute labels.
CUB-200 includes 6,033 annotated images of birds belonging to 200 bird species, most of
them North American.
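For illustration, a minimal sketch of iterating over the downloaded images is given below. The directory layout and file extension used here are assumptions and should be adjusted to match the extracted archive.

```python
# Hypothetical sketch of walking through the CUB-200 images with Pillow.
# The path "CUB_200/images/<species>/<file>.jpg" is an assumed layout, not the
# dataset's documented structure.
from pathlib import Path
from PIL import Image

DATA_ROOT = Path("CUB_200/images")   # assumed location of the extracted dataset

def iter_bird_images(root=DATA_ROOT):
    """Yield (species_name, PIL.Image) pairs, one per annotated image."""
    for species_dir in sorted(root.iterdir()):
        if not species_dir.is_dir():
            continue
        for img_path in sorted(species_dir.glob("*.jpg")):
            yield species_dir.name, Image.open(img_path).convert("RGB")

# usage: species, img = next(iter_bird_images())
```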
2. FEATURE EXTRACTION
Consider that we are given the image below and need to identify the objects present in it.
As a human, you recognize the objects instantly: a dog, a car, and a cat. Shape could be one
important factor, followed by colour or size.
A similar idea is to extract edges as features and use them as input to the model. An edge is
essentially a location where there is a sharp change in colour. Look at the image below:
The machine can identify the edge because there is a change in colour from white to brown,
and from brown to black. An image is represented in the form of numbers, so we look for
pixel values around which there is a drastic change.
With the help of this, we can extract several features, such as eyes, ears, wings, feathers,
colour, etc.
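A small NumPy sketch of this idea follows: it flags pixels whose neighbours differ sharply in intensity. The threshold value is an arbitrary choice for illustration.

```python
# Illustrative edge detection: mark locations where adjacent grayscale pixel
# values change sharply (a drastic change in pixel values indicates an edge).
import numpy as np

def edge_map(gray, threshold=30):
    """gray: 2-D uint8 array. Returns a boolean map of sharp horizontal/vertical changes."""
    g = gray.astype(np.int32)
    dx = np.abs(np.diff(g, axis=1))    # change between horizontally adjacent pixels
    dy = np.abs(np.diff(g, axis=0))    # change between vertically adjacent pixels
    edges = np.zeros_like(g, dtype=bool)
    edges[:, 1:] |= dx > threshold
    edges[1:, :] |= dy > threshold
    return edges
```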
Feature extraction in text processing
In text processing, words represent discrete, categorical features. We must encode such data
in a way that the algorithms can use it. Feature extraction refers to the process of mapping
textual data to real-valued vectors. Bag of Words is one of the most basic techniques for
numerically representing text.
Bag of Words (BoW): We first create a vocabulary, the list of unique words in the text corpus.
Each sentence or document is then represented as a vector, with each vocabulary word marked
1 if it is present and 0 if it is absent. Another option is to count the number of times each
word appears in a document. The most widely used weighting scheme, however, is the Term
Frequency-Inverse Document Frequency (TF-IDF) technique.
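As a rough illustration, the snippet below builds both representations with scikit-learn on a few short, made-up documents (the documents here are placeholders, not the ones discussed later).

```python
# Sketch of Bag of Words and TF-IDF vectorisation with scikit-learn on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "birds fly in the sky"]          # placeholder documents

bow = CountVectorizer()
bow_matrix = bow.fit_transform(docs)     # raw word counts per document
print(bow.get_feature_names_out())
print(bow_matrix.toarray())

tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(docs) # counts reweighted by inverse document frequency
print(tfidf_matrix.toarray().round(2))
```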
As shown in Document 1, the TF-IDF method heavily penalises the word 'beautiful' while
giving more weight to the word 'day.' This is due to the IDF part, which gives more
weightage to distinct words. In other words, in the context of the entire corpus, 'day' is an
important word for Document 1. The Python scikit-learn library includes functions for
calculating the TF-IDF of a text vocabulary given a text corpus. For natural language
processing (NLP), maintaining the context of the words is of utmost importance. For this, we
use another approach called Word Embedding.
Word Embedding is a text representation in which words with the same meaning are
represented similarly. In other words, it represents words in a coordinate system where
related words are placed closer together based on a corpus of relationships. The better-known
word-embedding models are Word2Vec and Global Vectors (GloVe). For this project, I will be
using Word2Vec for the embedding and for defining word relationships.
Word2vec takes as its input a large corpus of text and produces a vector space with each
unique word assigned a corresponding vector in the space. Word vectors are positioned in the
vector space such that words that share common contexts in the corpus are in close proximity
to one another. Word2Vec is well known for capturing meaning and demonstrating it on tasks
such as answering analogy questions of the form a is to b as c is to ___. For example, man is
to woman as uncle is to ___ (aunt), using a simple vector offset method based on cosine
distance.
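The following is a minimal sketch of this analogy computation, assuming the gensim library (not specified in this report). The tiny placeholder corpus only keeps the snippet runnable; a real run would train on a large corpus.

```python
# Sketch of training Word2Vec and answering an analogy by vector offset (gensim assumed).
from gensim.models import Word2Vec

sentences = [["the", "man", "saw", "the", "woman"],
             ["the", "uncle", "greeted", "the", "aunt"]]   # placeholder corpus only

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=50)

# "man is to woman as uncle is to ___": woman - man + uncle, ranked by cosine similarity
print(model.wv.most_similar(positive=["woman", "uncle"], negative=["man"], topn=1))
```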
3. EVALUATION METRICS
The Fréchet Inception Distance (FID) is a metric for evaluating the quality of generated
images, developed specifically to assess the performance of GANs. Activations from a
pretrained Inception network are summarized as a multivariate Gaussian by calculating their
mean and covariance. These statistics are computed for the activations of both the collection
of real images and the collection of generated images, and the deviation between the two
distributions is called the Fréchet distance.
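Given activation matrices for real and generated images, the distance can be computed roughly as in the sketch below (NumPy and SciPy assumed; extracting the Inception activations themselves is omitted here).

```python
# Sketch of the Fréchet distance between two sets of Inception activations.
# act_real and act_fake are (N, D) activation matrices computed beforehand.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(act_real, act_fake):
    mu1, sigma1 = act_real.mean(axis=0), np.cov(act_real, rowvar=False)
    mu2, sigma2 = act_fake.mean(axis=0), np.cov(act_fake, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):          # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```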
4. SELF ATTENTION
When modelling dependencies for small feature maps, it functions similarly to the local
convolution. It demonstrates how the attention mechanism empowers both the generator and
the discriminator to directly model the feature maps' long-range dependencies. Furthermore,
a comparison of our SAGAN and the baseline model without attention demonstrates the
efficacy of the proposed self-attention mechanism.
The self-attention blocks perform better than residual blocks with the same number of
parameters. Even when the training goes smoothly, replacing the self-attention block with
the residual block results in worse FID and Inception score results. This comparison shows
that the performance boost provided by SAGAN is not simply due to an increase in model
depth and capacity. To better understand what was learned during the generation process, we
visualised the generator's attention weights in SAGAN for various images.
Fig. 3.4.1: Self-attention architecture for image processing
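A minimal sketch of such a self-attention block is shown below, assuming PyTorch (the report does not fix the framework). The layer sizes follow the common SAGAN convention of reducing the query/key channels by a factor of eight, and the learnable gamma is initialised to zero so the block starts as an identity mapping.

```python
# Sketch of a SAGAN-style self-attention block over convolutional feature maps (PyTorch assumed).
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)   # 1x1 conv projections
        self.key   = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))            # block starts as an identity mapping

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw) long-range attention weights
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection back to the input
```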
5. SPECTRAL NORMALIZATION
Miyato et al. first proposed applying spectral normalisation to the discriminator network to
stabilise GAN training. By limiting the spectral norm of each layer, the discriminator's
Lipschitz constant is constrained. In comparison to other normalisation techniques, spectral
normalisation does not require additional hyper-parameter tuning (in practice, setting the
spectral norm of all weight layers to 1 consistently performs well). Furthermore, the
computational cost is relatively low.
Spectral normalisation in the generator can prevent the escalation of parameter magnitudes
and avoid unusual gradients. We find that spectral normalisation of both the generator and the
discriminator allows us to use fewer discriminator updates per generator update, significantly
lowering the computational cost of training. The method also exhibits more consistent
training behaviour.
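As a rough illustration, the snippet below wraps discriminator layers with PyTorch's built-in spectral-norm utility (PyTorch and the layer sizes are assumptions, not the project's exact architecture). Each wrapped weight is rescaled by its largest singular value at every forward pass, keeping the layer's spectral norm at 1.

```python
# Sketch of applying spectral normalisation to discriminator layers (PyTorch assumed).
import torch.nn as nn
from torch.nn.utils import spectral_norm

discriminator_block = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
)
```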
CHAPTER 4
RESULTS
After pre-training the text encoder with BERT and adding the self-attention mechanism, we
obtain the following table.
Table: FID of the SAGAN model for different numbers of attention layers and feature sizes (feat 8, feat 16, feat 32, feat 64)
We can clearly see that, as the number of features is decreased, the FID distance reduces,
reaching its optimum at 32 features; decreasing the features further causes the distance to
increase again.
Fig. 4.2: Fréchet Inception Distance of the self-attention GAN, visually represented
This project shows that our approach is satisfactory based on the FID score and in comparison
with the results of the papers discussed in the literature review. On the CUB-200 dataset, our
method improves the FID by 21.11%. Because image-text representation learning is a
fundamental task, we believe our approach has potential applicability in a wide range of
cross-domain tasks, such as visual question answering, image-text retrieval, and text-to-image
synthesis.
CHAPTER 5
CONCLUSION