
Image Caption Generator Using CNN and LSTM

Swarnim Tripathi, Ravi Sharma
swarnim0711@gmail.com, ravi.sharma@galgotiasuniversity.edu.in
Galgotias University, India

Abstract - In this paper, we use a CNN and an LSTM to generate captions for images. Image caption generation is a task that combines natural language processing and computer vision to describe the content of an image in English. In this paper we carefully review a number of important concepts of image captioning and its common approaches. We use the Keras library, NumPy, and Jupyter notebooks in the making of this work, together with the Flickr8k dataset and a CNN for image classification.

Keywords - CNN, LSTM, image captioning, deep learning.

INTRODUCTION

Every day we see many photographs: in our surroundings, on social media, and in the newspapers. Humans can interpret photographs by themselves; we can understand a photograph without its designated caption. Machines, on the other hand, must first be trained on images before they can generate a photograph's caption automatically.

Image captioning is useful for many purposes: supporting visually impaired people through real-time text-to-speech feedback about the surrounding scene from a camera feed; improving the social media experience by generating captions for photographs in a feed, along with speech for messages; helping children recognize objects while learning the language; and enabling faster, more detailed image search and indexing, since a caption can accompany every photograph on the world wide web. Image captioning has diverse applications in fields such as biomedicine, commerce, web search, the military, and many others. Social media platforms such as Instagram and Facebook can generate captions from images automatically.

The principal goal of this research paper is to gain some expertise in deep learning strategies. We use two techniques in particular, CNN and LSTM, for image classification and caption generation.

IMAGE CAPTIONING TECHNIQUES

CNN - Convolutional Neural Networks are a class of neural networks designed for data with a grid-like shape, for example a 2D lattice of pixels, which makes them well suited to working with images. A CNN scans an image from the top-left corner to the bottom-right corner, extracts significant features from it, and combines those features to classify the image. It can handle translated, rotated, scaled, and otherwise modified images. A convolutional neural network is a deep learning algorithm that takes an input image, assigns importance to the various components and objects in the image, and distinguishes them from one another.

Fig.1 CNN Architecture

The pre-processing required by a ConvNet is negligible compared with other classification algorithms. While filters are hand-engineered in primitive methods, with sufficient training, ConvNets are able to learn these filters/features themselves. The structure of a convolutional network resembles the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex: individual neurons respond to stimuli only in a limited region of the visual field known as the receptive field, and a collection of such fields tiles the entire visual area.
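The feature-extraction step described above can be sketched with a minimal 2D convolution in NumPy. The image, kernel, and function names here are illustrative assumptions for the sketch, not the paper's actual code; a trained CNN would learn kernels like this one itself.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D convolution: slide the kernel over the image
    and take a weighted sum at each position (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A tiny image with a vertical edge: dark left half, bright right half.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A hand-written vertical-edge filter, for illustration only.
kernel = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

feature_map = convolve2d(image, kernel)  # strong response at the edge column
```

The resulting 3×3 feature map responds strongly (value 2) exactly where the dark-to-bright edge lies, and is zero elsewhere, which is the sense in which convolution "extracts significant features" from the picture.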
CNN : Architecture - A plain fully connected neural network, in which every neuron in one layer connects to every neuron in the next layer, is inefficient for analyzing large images and video. For a normal-sized image with many picture elements (pixels) and three color tones (RGB, i.e. red, green, and blue), the number of parameters in such a network runs into the millions, which can lead to overfitting.

To limit the number of parameters and focus the network on significant pieces of the image, a CNN uses a 3D arrangement in which each layer of neurons analyzes a small region or "feature" of the image. Rather than every neuron passing its output to the entire next layer, each group of neurons specializes in identifying one part of the image, such as a nose, a left ear, a mouth, or a leg. The final output is a set of scores indicating how likely each of these features is to belong to a class.

Fig.2 Working of CNN

This reduces the complexity of the neural network and requires less computing power. A computer represents an image as a number at each pixel. To compare two images naively, we compare the pixel values at each position; this technique only works when the two images are identical, and the comparison fails as soon as the images differ. In a CNN, image comparison instead takes place piece by piece.

Fig.3 Feature map of a CNN picture

The main reason for using a CNN is that it takes pictures as input and, on the basis of the input pictures, builds a feature map, i.e. it classifies each pixel according to similarities and differences. The CNN classifies the pixels and produces a matrix known as a feature map: a collection of similar pixels placed in a separate category. These matrices play an important role in finding the essence of the objects in the input picture.

How does CNN work?

As discussed previously, a fully connected neural network, in which every input in the preceding layer is connected to every unit in the following layer, is inconvenient for this task. In a CNN, therefore, each neuron is connected only to a specific region of the layer before it, rather than to all the neurons in the previous layer.

More about CNN -

There are three types of layers in a CNN model:

1. Convolutional
2. Pooling
3. Fully connected

In the first layer, the input image is read by the CNN, and on that foundation a feature map is made. That feature map then serves as the input to the following layer, the pooling layer. In the pooling layer, the feature map is broken down into simpler parts so that the context of the picture can be examined more carefully; this layer makes the feature map denser, so as to discover the most critical information about the picture.

The first and second layers, convolutional and pooling, are applied repeatedly, as many times as the picture requires, to obtain dense information about it. The resulting dense feature map is used by the last layer, the fully connected layer, which performs classification: it sorts the pixels with respect to similarities and differences. Classification is carried out to a sufficient degree to capture the essence of the picture and to help identify the objects, persons, and things in it.
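The pooling step described above can be illustrated with a small NumPy sketch (function and variable names are illustrative): 2×2 max pooling halves each spatial dimension of a feature map while keeping the strongest activation in each window, which is how the feature map becomes "denser".

```python
import numpy as np

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each
    size x size window, shrinking the feature map."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            out[i // size, j // size] = fmap[i:i+size, j:j+size].max()
    return out

# A toy 4x4 feature map, e.g. the output of a convolutional layer.
fmap = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
], dtype=float)

pooled = max_pool2d(fmap)  # 4x4 -> 2x2, keeping the max of each quadrant
```

Stacking convolution and pooling repeatedly, as the text describes, shrinks the spatial size while concentrating the most critical information, before the fully connected layer performs classification.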
Fig.4 Layers of a scanned picture

These layers help the CNN locate and identify the features of the picture: the vital features present in a fixed-length input image are transformed into a fixed-size output.

CNN techniques are very widely used, for example:

· Computer vision — in the medical sciences, image analysis is done largely through CNNs; the inner structure of the body can be examined effortlessly with their help.

· Mobile phones — CNNs are used for many things, for instance to estimate a person's age, or to unlock the phone by examining the picture from the camera.

· Industry — CNNs are used to establish patents or copyright on specific photographs.

· Pharmaceutical discovery — CNNs are broadly used in drug discovery, analyzing chemical features to find the best drug to cure a particular problem.

The Problem with RNNs (Recurrent Neural Networks):-

RNNs are a family of deep learning algorithms used for a number of complex computer tasks such as object classification and speech recognition. RNNs are designed to handle sequences of events, where the information at each step depends on the preceding steps.

Ideally, we would like RNNs that can handle long sequences of data with high capability. Such RNNs could be used for many real-life problems, such as stock forecasting and robust speech recognition. Yet plain RNNs are rarely used to solve such problems, because of the vanishing gradient problem.

Vanishing Gradient Problem -

The vanishing gradient problem is the main reason that training RNNs is challenging. In general, the architecture of an RNN is such that it stores data only for a short period of time: it is not possible for an RNN to remember all the data values over a long period; it can only retain some of the data for a small period of time. Hence the memory of an RNN is useful only for shorter sequences and short time spans.

The vanishing gradient problem becomes prominent when solving a particular problem requires many time steps: with many time steps, the gradient signal is progressively lost during backpropagation, and the RNN would have to retain values for every step, which is not feasible. This is how the vanishing gradient problem arises.

What can be done to solve this Vanishing Gradient problem with RNNs -

To solve this problem, we use Long short-term memory (LSTM), a variant of the RNN. LSTMs were constructed specifically to overcome the vanishing gradient problem: the exceptional thing about an LSTM is that it can preserve data values over long intervals of time, and hence it can solve the vanishing gradient problem.

Origin of LSTM:-

LSTM was first proposed by two researchers, Sepp Hochreiter and Jürgen Schmidhuber, in 1997. LSTM stands for long short-term memory. Within the deep learning discipline of recurrent neural networks, LSTM holds a crucial place. The special property of an LSTM is that it not only stores the input data but can also make predictions about subsequent data on its own: the network retains the stored data for a particular time period and, on that basis, predicts future values of the data. This is the main reason LSTM is used here instead of a traditional RNN.

LSTMs are constructed in such a manner that the error signal is preserved across time steps. Because the error is carried over many steps, the LSTM keeps learning from the data values again and again, which makes backpropagation through time and through layers effective.
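The vanishing gradient effect described above can be shown numerically with a toy sketch (not the paper's experiment): in backpropagation through time, the gradient is a product of per-step factors, and when those factors are below 1 in magnitude the product shrinks exponentially with the number of time steps.

```python
# Toy illustration: each backprop-through-time step multiplies the
# gradient by a per-step factor; |factor| < 1 makes it vanish over
# long sequences. The factor value 0.5 is an arbitrary assumption.
def gradient_after(steps, factor=0.5):
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad

short = gradient_after(5)    # still a usable gradient
long = gradient_after(50)    # effectively zero: the signal has vanished
```

After only 50 steps the gradient is below 1e-15, which is why a plain RNN cannot learn dependencies across long sequences, while the LSTM's gated cell state is designed to keep this signal alive.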

Architecture Of LSTM :-

The architecture of an LSTM is simple: it consists of three major gates, which store data over a longer period of time and thereby overcome the difficulties that RNNs could not solve.

The three major gates of the LSTM are:

· Forget gate — the main work of the forget gate is to filter the data, i.e. to delete the data that will not be needed to solve the task at hand. This gate is responsible for the overall performance of the LSTM; it optimizes the data.

· Input gate — the LSTM pipeline starts at this gate: it takes input from the user and supplies the input data to the other gates.

· Output gate — this gate is responsible for presenting the desired result in the proper form.

Fig.5 Gates in LSTM

As the diagram above shows, the LSTM uses several gates to store the data, then processes it and passes the result to the final gate. An RNN, by contrast, passes the data to the final output without such gated processing. Through these gates, the whole LSTM network can shape the data in many ways, including storing it and reviewing it. The gates are independently capable of making judgements concerning the data, and they make those judgements on their own by opening or closing.

The ability of the LSTM gates to hold data over a period of time is what gives the LSTM its advantage over plain RNNs.

Uses of Long Short-Term Memory Networks :-

LSTMs are mostly used for deep learning tasks that involve forecasting data from preceding data. Two remarkable examples are text prediction and stock market prediction.

Text Prediction - LSTMs are widely used for predicting text. The long-term memory of an LSTM makes it capable of predicting the next words in a sentence on its own: the LSTM first stores data about the words (their feel, their styling, their use in particular situations, and so on) and on that basis predicts the words that follow; the stored input data is retained for future use. The best illustration of text prediction is a chatbot, which is widely utilized by e-commerce websites and mobile applications.

Stock Market Prediction - In the stock market too, an LSTM stores the data, i.e. the trends in how the market behaves at a particular time or instant, and on that account predicts the market's next variations and trends. Predicting variation in the stock market is a problematic task, because market movements are very challenging to forecast. The LSTM model has to be trained so that it gives correct values to its users; for that, a great deal of data has to be stored over a long period, and training can take days.

More about LSTM -

LSTMs are a variant of RNNs with the capacity to hold more data values than a plain RNN, and they are widely in use today in every field. The simplest LSTM diagram consists of the three major gates: the forget gate, the input gate, and the output gate. These gates have the capacity to store data and give out the desired output; whenever the LSTM network is discussed, these three gates come up. The diagram below shows the simplest architecture of the LSTM:
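A single step of the gated computation described above can be sketched in NumPy. The weight shapes and names here are illustrative assumptions; the sketch follows the standard LSTM equations (forget, input, and output gates plus a candidate cell state), not the paper's exact code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev; x] to the four gate
    pre-activations (forget, input, candidate, output) stacked together."""
    z = np.concatenate([h_prev, x])
    n = h_prev.shape[0]
    pre = W @ z + b                      # shape (4n,)
    f = sigmoid(pre[0:n])                # forget gate: what to erase
    i = sigmoid(pre[n:2*n])              # input gate: what to write
    g = np.tanh(pre[2*n:3*n])            # candidate cell values
    o = sigmoid(pre[3*n:4*n])            # output gate: what to reveal
    c = f * c_prev + i * g               # new cell state (long-term memory)
    h = o * np.tanh(c)                   # new hidden state (output)
    return h, c

rng = np.random.default_rng(0)
n, m = 3, 2                              # hidden size, input size (arbitrary)
W = rng.normal(size=(4 * n, n + m))
b = np.zeros(4 * n)
h, c = lstm_step(rng.normal(size=m), np.zeros(n), np.zeros(n), W, b)
```

The additive cell-state update `c = f * c_prev + i * g` is the design choice that lets the gradient flow across many time steps, which is exactly how the LSTM avoids the vanishing gradient problem discussed earlier.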
Fig.6 Working of LSTM

Image Caption Generation Model:-

To prepare an image caption generation model, we combine the two architectures discussed above; the result is called a CNN-LSTM model. We use these two architectures together to obtain a caption for the input pictures:

· CNN - used to extract the important features from the input picture. For this, we use a pre-trained model named Xception.

· LSTM - used to store the features from the CNN model, process them further, and support the generation of a good caption for the picture.

Fig.7 CNN-LSTM model

Project File Architecture :-

For our research, we downloaded a dataset consisting of the following files:

· Flickr8k_Datasets – contains all the pictures on which the model must first be trained; it contains 8091 images.

· Flickr8k_texts – contains text files and pre-formed captions for the pictures.

The following files are set up to run the system and check the working of the CNN-LSTM model:

· Model – this folder contains all the trained models. Training the model is a one-time process.

· Description.txt – contains the picture names and their related captions after preprocessing.

· Feature.p – binds each picture to the features extracted from Xception, the pre-trained CNN model.

· Tokenizers.p – contains the expressions we call tokens, each generalised with an index value.

· Models.png – a diagrammatic representation of the CNN-LSTM model.

· Testing_captions_generator.py – the Python file used to generate captions for the pictures.

· Training_captions_generator.ipynb – a Jupyter notebook (in short, a web-based application), used to train the model and, on that basis, obtain captions for the input pictures.

Fig.8 Project file structure
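The role of the Tokenizers.p file, mapping caption words to integer indices, can be sketched with a minimal pure-Python tokenizer. The class and method names here are hypothetical illustrations, not the project's actual code.

```python
class SimpleTokenizer:
    """Builds a word -> index vocabulary from captions and encodes
    sentences as integer index sequences, as an LSTM expects."""
    def __init__(self):
        self.word_index = {}

    def fit(self, captions):
        for caption in captions:
            for word in caption.lower().split():
                if word not in self.word_index:
                    # Index 0 is reserved for padding, so start at 1.
                    self.word_index[word] = len(self.word_index) + 1

    def encode(self, caption):
        return [self.word_index[w] for w in caption.lower().split()
                if w in self.word_index]

# Toy captions standing in for the Flickr8k caption texts.
captions = ["a dog runs", "a cat sleeps"]
tok = SimpleTokenizer()
tok.fit(captions)
seq = tok.encode("a dog sleeps")
```

In the real project such a vocabulary would be pickled (hence the .p extension) and reused at caption-generation time to turn the LSTM's predicted indices back into words.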
Fig.9 Flickr_Dataset

CONCLUSION

The CNN-LSTM model was built on the idea of generating captions for input pictures. This model can be used in a variety of applications. In this paper we studied the CNN model, RNN models, and LSTM models, and in the end we validated that the model generates captions for the input pictures.

REFERENCES

[1] Abhaya Agarwal and Alon Lavie. 2008. Meteor, m-bleu and m-ter: Evaluation metrics for high-correlation with human rankings of machine translation output. In Proceedings of the Third Workshop on Statistical Machine Translation. Association for Computational Linguistics, 115–118.
[2] Ahmet Aker and Robert Gaizauskas. 2010. Generating image descriptions using dependency relational patterns. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1250–1258.
[3] Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2016. SPICE: Semantic propositional image caption evaluation. In European Conference on Computer Vision. Springer, 382–398.
[4] Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2017. Bottom-up and top-down attention for image captioning and VQA. arXiv preprint arXiv:1707.07998.
[5] Jyoti Aneja, Aditya Deshpande, and Alexander G. Schwing. 2018. Convolutional image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5561–5570.
[6] Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell, Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, et al. 2016. Deep compositional captioning: Describing novel object categories without paired training data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[7] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR).
[8] Shuang Bai and Shan An. 2018. A survey on automatic image caption generation. Neurocomputing.
[9] Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Vol. 29. 65–72.
