
Team 21 Omkar Reddy Gojala Mrinalini Injeti Ramakanth

The document describes a project to generate descriptive captions for images using neural networks. The team used the Flickr8K dataset to train an encoder-decoder model with an InceptionV3 CNN encoder and LSTM decoder. The model was evaluated using BLEU scores, with the LSTM achieving a higher score than RNN. Examples of accurate, funny, and incorrect predictions are provided. Future work ideas involve using more data, visual attention techniques, and an app to aid the visually impaired.


Team 21

Omkar Reddy Gojala


Mrinalini Injeti Ramakanth
 The goal is to generate a descriptive sentence for a given image
 The project was inspired by the works of Andrej Karpathy and Marc Tanti et al. (2017)

[Diagram: image of two dogs → Neural Network → "Two dogs are wrestling in the grass"]

 Potential applications:
 Aiding the visually impaired
 Generating video summaries from individual frames
 We used the Flickr8K dataset for this project

 The Flickr8K dataset contains a variety of images depicting scenes and situations

 The dataset consists of 8000 images, and each image has 5 corresponding descriptions

 We split the data into 6000, 1000, and 1000 images as training, validation, and testing sets respectively

 The images are of different dimensions

Example captions for one image:
• A man riding his bike on a hill
• A man with helmet and backpack standing on dirt bike in a hilly grassy area
• A person rides a motorbike through a grassy field
• Man on motorcycle riding in dry field wearing a helmet and backpack
• The biker is riding through a grassy plain
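The official Flickr8K release ships predefined split files, but the 6000/1000/1000 partition described above can be sketched as a simple shuffle; `split_dataset` and the `img_*.jpg` filenames are hypothetical helpers for illustration:

```python
import random

def split_dataset(image_ids, seed=42):
    """Shuffle 8000 Flickr8K image ids and split into 6000/1000/1000."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    return ids[:6000], ids[6000:7000], ids[7000:8000]

train, val, test = split_dataset([f"img_{i}.jpg" for i in range(8000)])
```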
 Each description is tokenized and converted to lowercase
 Removed non-alphabetic characters and punctuation marks
 We use startseq and endseq as prefix and postfix for each caption respectively
 Filtered out the unique words from the corpus and represented each word by an integer
 To generate fixed-length input sequences, we calculated the length of the longest caption
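The caption-cleaning steps above can be sketched as follows; `clean_caption` and `build_vocab` are hypothetical helper names, not from the slides:

```python
import string

def clean_caption(caption):
    """Lowercase, strip punctuation, keep alphabetic words, add start/end tokens."""
    words = caption.lower().split()
    table = str.maketrans("", "", string.punctuation)
    words = [w.translate(table) for w in words]
    words = [w for w in words if w.isalpha()]  # drops numbers and empty leftovers
    return "startseq " + " ".join(words) + " endseq"

def build_vocab(captions):
    """Map each unique word in the cleaned corpus to an integer (0 is padding)."""
    vocab = sorted({w for c in captions for w in c.split()})
    return {w: i + 1 for i, w in enumerate(vocab)}

cleaned = clean_caption("A man riding his bike on a hill.")
# → "startseq a man riding his bike on a hill endseq"
```

The fixed sequence length is then simply `max(len(c.split()) for c in captions)` over the cleaned corpus.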
 Resized all images to a fixed size of 299x299x3 using OpenCV
 Employed transfer learning using the pre-trained InceptionV3 CNN model to encode images
 We removed the last softmax layer from the InceptionV3 network to extract a 2048-length image vector
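A minimal sketch of the encoding step, assuming TensorFlow/Keras (the framework is not named in the slides). In practice `weights="imagenet"` would be used; `weights=None` here only avoids the download:

```python
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

# include_top=False with global average pooling drops the classification
# (softmax) head and yields the 2048-length feature vector directly.
encoder = InceptionV3(weights=None, include_top=False, pooling="avg",
                      input_shape=(299, 299, 3))

image = np.random.rand(1, 299, 299, 3).astype("float32")  # stand-in for a resized photo
features = encoder.predict(preprocess_input(image * 255.0), verbose=0)
# features has shape (1, 2048)
```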
 For each image, we train the model by temporally injecting incremental sequences of the description
 In this phase, we essentially create the labels in our training data

Image    Partial Caption                                                      Target Word
Image    startseq                                                             a
Image    startseq a                                                           young
Image    startseq a young                                                     boy
……       ……                                                                   ……
Image    startseq a young boy wearing a helmet and riding a bike in a park    endseq
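The expansion shown in the table can be sketched as below; `make_training_pairs` is a hypothetical helper, and sequences are left-padded with 0 to the fixed length:

```python
def make_training_pairs(caption, word_index, max_len):
    """Expand one caption into (partial sequence, target word) training pairs."""
    seq = [word_index[w] for w in caption.split() if w in word_index]
    pairs = []
    for i in range(1, len(seq)):
        partial = seq[:i]
        padded = [0] * (max_len - len(partial)) + partial  # left-pad to max_len
        pairs.append((padded, seq[i]))
    return pairs

word_index = {"startseq": 1, "a": 2, "young": 3, "boy": 4, "endseq": 5}
pairs = make_training_pairs("startseq a young boy endseq", word_index, max_len=6)
# first pair: ([0, 0, 0, 0, 0, 1], 2)  i.e. "startseq" -> "a"
```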
 We used an encoder-decoder architecture

 The 2048-length image vector is fed to a Dense layer to generate a 256-length image vector

 The 34-length word sequence is fed to an LSTM/RNN to output a 256-length word vector

 The decoder model adds both encoder outputs and feeds the result to a Dense 256 layer

 The last Dense layer has as many nodes as the vocabulary size

 The final softmax layer predicts the next word from the output vocabulary
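The architecture above can be sketched in Keras (assumed framework); `vocab_size` here is an illustrative placeholder, since the actual vocabulary size depends on the corpus:

```python
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size, max_len = 7579, 34  # vocab_size is illustrative only

# Image branch: 2048-length vector -> 256-length vector
img_in = Input(shape=(2048,))
img_vec = Dense(256, activation="relu")(img_in)

# Text branch: 34-step word sequence -> 256-length vector
seq_in = Input(shape=(max_len,))
seq_emb = Embedding(vocab_size, 256, mask_zero=True)(seq_in)  # 0 = padding
seq_vec = LSTM(256)(seq_emb)

# Merge by element-wise addition, then predict the next word
merged = add([img_vec, seq_vec])
hidden = Dense(256, activation="relu")(merged)
out = Dense(vocab_size, activation="softmax")(hidden)

model = Model(inputs=[img_in, seq_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```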
 The caption is predicted word by word
 The image is fed along with the first word (startseq) to the RNN to predict the second word
 Then the same image along with the first word + second word is fed to the RNN to predict the third word, and so on until the last word (endseq) is encountered

[Diagram: Neural Network model predicting the next word at each step]

Step    Input Sequence        Target Word
(i=0)   startseq              little
(i=1)   startseq little       boy
(i=2)   …..                   ….
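The word-by-word loop can be sketched as follows; `predict_next` is a hypothetical stand-in for the trained model's argmax over the vocabulary (the real call also takes the encoded image features):

```python
def greedy_caption(predict_next, max_len=34):
    """Generate a caption word by word until endseq or the length cap."""
    words = ["startseq"]
    for _ in range(max_len):
        nxt = predict_next(words)
        words.append(nxt)
        if nxt == "endseq":
            break
    if words[-1] == "endseq":
        words = words[:-1]  # strip the end marker if the model emitted it
    return " ".join(words[1:])  # drop the start marker

# Toy stand-in that replays a fixed prediction sequence
script = iter(["little", "boy", "endseq"])
caption = greedy_caption(lambda ws: next(script))
# caption == "little boy"
```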
Bilingual Evaluation Understudy (BLEU) Score
 BLEU is a metric for comparing a generated sentence against a reference sentence
 The BLEU score lies between 0 and 1
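A simplified BLEU-1 (clipped unigram precision with brevity penalty) can be computed as below; libraries such as NLTK implement the full n-gram metric, and `bleu1` is a hypothetical helper name:

```python
import math
from collections import Counter

def bleu1(reference, hypothesis):
    """Simplified BLEU-1: clipped unigram precision times brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    ref_counts, hyp_counts = Counter(ref), Counter(hyp)
    # Clip each hypothesis word's count by its count in the reference
    clipped = sum(min(n, ref_counts[w]) for w, n in hyp_counts.items())
    precision = clipped / len(hyp)
    # Penalize hypotheses shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * precision

score = bleu1("a boy with a blue helmet is riding a bike",
              "little boy rides bike with helmet")
```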

BLEU N-GRAM    LSTM (Long Short-Term Memory)    Simple RNN (Recurrent Neural Network)
BLEU-1         0.572214                         0.364472
BLEU-2         0.339204                         0.181942
BLEU-3         0.237129                         0.103185
BLEU-4         0.116733                         0.085675
Correct Predictions

Actual Caption: a boy with a blue helmet is riding a bike
Predicted Caption: little boy rides bike with helmet

Actual Caption: white fluffy dog running in the dirt
Predicted Caption: white dog runs across the sand

Actual Caption: a boy dribbles a basketball in the gymnasium
Predicted Caption: boy in white shirt is playing basketball
Funny Predictions

Actual Caption: man fly fishing in a small river with steam in the background
Predicted Caption: Man is swinging on a swing

Actual Caption: a woman wearing a black and white outfit while holding her sunglasses
Predicted Caption: man in pink dress is holding her head

Actual Caption: a group of different people are walking in all different directions in a city
Predicted Caption: group of people walking ocean
Predictions that went really wrong!

Actual Caption: A man wearing a red life jacket is holding a purple rope while waterskiing
Predicted Caption: man in white and white and white shorts leash on swing

Actual Caption: A dog is chewing on a metal pole
Predicted Caption: dog is standing in its mouth is playing

Actual Caption: a young hockey player playing in the ice rink
Predicted Caption: chasing player in motorcycle chasing
FUTURE WORK

 We can enhance the predictions by using more training examples, for example the Flickr30k dataset, which has about 31,000 images

 Implement visual attention techniques, which focus on the interesting parts of the image

 Create an application for the visually impaired that converts the generated caption into voice output
