
LANGUAGE TRANSLATION USING MACHINE LEARNING

A MINI PROJECT REPORT


18CSC305J - ARTIFICIAL INTELLIGENCE

Submitted by
Jaya Lohith(RA2111026010206)
S.D Azhar(RA2111026010220)
Aamir Mayan(RA2111026010257)

Under the guidance of

Dr. A Robert Singh


Assistant Professor, Department of Computational Intelligence
in partial fulfillment for the award of the
degree of

BACHELOR OF TECHNOLOGY
in

COMPUTER SCIENCE & ENGINEERING


of

FACULTY OF ENGINEERING AND TECHNOLOGY

S.R.M. Nagar, Kattankulathur, Chengalpattu District


MAY 2024

SRM INSTITUTE OF SCIENCE AND
TECHNOLOGY
(Under Section 3 of UGC Act, 1956)

BONAFIDE CERTIFICATE

Certified that the mini project report titled "LANGUAGE TRANSLATION USING MACHINE LEARNING" is the bona fide work of Jaya Lohith (RA2111026010206), S.D Azhar (RA2111026010220), and Aamir Mayan (RA2111026010257), who carried out the mini project under my supervision. Certified further that, to the best of my knowledge, the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

SIGNATURE SIGNATURE

Dr. A Robert Singh Dr. R Annie Uthra


Assistant Professor Head of the Department
CINTEL CINTEL

ABSTRACT

 Language translation has undergone a significant transformation with the advent of machine learning techniques, particularly neural machine translation (NMT). Traditional rule-based and statistical approaches have been supplanted by deep learning architectures like recurrent neural networks (RNNs) and transformers. These advancements have vastly improved translation quality and efficiency by leveraging mechanisms such as attention and self-attention, which enable models to capture complex linguistic structures and nuances effectively.

 However, challenges persist, including data scarcity for low-resource languages, domain adaptation issues, and the need to mitigate bias and cultural nuances. Ongoing research efforts are addressing these challenges through techniques like data augmentation, multi-task learning, and adversarial training. Despite these challenges, machine learning-based translation systems have found wide-ranging applications in domains such as e-commerce, healthcare, legal, and diplomacy, facilitating cross-cultural communication and global collaboration.

 In conclusion, recent developments in machine learning have revolutionized language translation, empowering NMT systems to handle diverse linguistic contexts with remarkable accuracy. While challenges remain, ongoing research is driving innovation to overcome these hurdles and further enhance the capabilities and applicability of machine learning-based translation systems in diverse real-world scenarios.

TABLE OF CONTENTS

1. ABSTRACT

2. TABLE OF CONTENTS

3. INTRODUCTION

4. LITERATURE SURVEY

5. SYSTEM ARCHITECTURE AND DESIGN

6. METHODOLOGY

7. CODING AND TESTING

8. OUTPUT

9. CONCLUSION AND FUTURE ENHANCEMENT

10. REFERENCES

CHAPTER I
INTRODUCTION

• In this project, we build a deep neural network that functions as part of a machine translation pipeline. The pipeline accepts English text as input and returns the French translation. The goal is to achieve the highest translation accuracy possible.

• To translate a corpus of English text to French, we need to build a recurrent neural network (RNN). Before diving into the implementation, let's first build some intuition about RNNs and why they're useful for NLP tasks.

• Depending on the use case, you'll want to set up your RNN to handle inputs and outputs differently. For this project, we'll use a many-to-many process where the input is a sequence of English words and the output is a sequence of French words.

• Below is a summary of the various preprocessing and modeling steps. The high-level steps include:

1. Preprocessing: load and examine data, cleaning, tokenization, padding.

2. Modelling: build, train, and test the model.

3. Prediction: generate specific translations of English to French, and compare the output translations to the ground truth translations.

4. Iteration: iterate on the model, experimenting with different architectures.
• We use Keras for the frontend and TensorFlow for the backend in this
project. I prefer using Keras on top of TensorFlow because the syntax is
simpler, which makes building the model layers more intuitive.
However, there is a trade-off with Keras as you lose the ability to do
fine-grained customizations. But this won’t affect the models we’re
building in this project.

CHAPTER II
LITERATURE SURVEY

 Digitalization is not a new process, per se, but the speed and scope of digital processes, and their integration in companies and labour, have significantly increased, according to the latest research. The digital transformation is quickly changing the world of work in the developed societies of Europe and North America. Although the view that new technologies are not deterministic currently prevails, their implementation impacts both labour and employment. The debate on the future of work in recent years has been dominated by pessimistic scenarios of job extinction.
 The translating profession is considered creative [7] because translators solve communication problems in different cultural and communicational environments, which is a highly creative activity. In recent years, this particular profession has changed in a few ways. Firstly, the share of self-employed translators is increasing worldwide at the expense of permanently employed ones. Secondly, a decent share of job and task seeking is taking place through online work-finding platforms. According to a forecast, in 2025 online platforms are expected to be accountable for about one third of all labour relationships [8].
 Finally, interest in new technologies has been growing in translation work. This kind of digitalization started in the 1990s, with the implementation of machine translation technologies. This was most marked in agencies, government services, and multinational companies, where translations, primarily of technical documentation, were produced on a large scale. This was the major market for the mainframe systems Systran, Logos, METAL, and ATL. Currently, the opinion that the translating profession will be heavily impacted by digitalization, automation, and AI prevails in the literature. However, in practice, the data on the adoption of these technologies in translation does not support such a deterministic view.
 For example, a relatively recent survey shows that digital tools are widely used in Denmark, but machine translation tools are still less common. Further, relatively recent research in Spanish-speaking countries shows that 45% of surveyed companies offering language services in Spanish already use machine translation. There is a predominant view that these technologies will further penetrate translation work.
 This process opens up a number of debates. For example, some authors point out that it is necessary to re-think the freedom of translation, as translation technology is quickly ousting human translators. According to others, this process has to change the way translators are educated, and the way they work.
 Some publications even show that translators are leaving the profession because of the technological impact, but no large-scale research results have been published that would shed light on the numbers or reasons behind this process. In this respect, the present study of their motivation, views, and expectations contributes not only to the body of research on a specific creative profession, but to the future development of labour in general.

CHAPTER III
SYSTEM ARCHITECTURE AND DESIGN

 First, let's break down the architecture of an RNN at a high level.


o Inputs:
Input sequences are fed into the model with one word for
every time step. Each word is encoded as a unique integer or
one-hot encoded vector that maps to the English dataset
vocabulary.
o Embedding Layers:
Embeddings are used to convert each word to a vector. The size
of the vector depends on the complexity of the vocabulary.
o Recurrent Layers (Encoder):
This is where the context from word vectors in previous time
steps is applied to the current word vector.
o Dense Layers (Decoder):
These are typical fully connected layers used to decode
the encoded input into the correct translation sequence.
o Outputs:
The outputs are returned as a sequence of integers or one-hot
encoded vectors which can then be mapped to the French dataset
vocabulary.
 Embeddings allow us to capture more precise syntactic and semantic word relationships. This is achieved by projecting each word into n-dimensional space. Words with similar meanings occupy similar regions of this space; the closer two words are, the more similar they are. And often the vectors between words represent useful relationships, such as gender, verb tense, or even geopolitical relationships.

 Since our dataset for this project has a small vocabulary and low
syntactic variation, we’ll use Keras to train the embeddings ourselves.
 Our sequence-to-sequence model links two recurrent networks: an
encoder and decoder. The encoder summarizes the input into a context
variable, also called the state. This context is then decoded and the
output sequence is generated.

 Since both the encoder and decoder are recurrent, they have loops which process each part of the sequence at different time steps. To picture this, it's best to unroll the network so we can see what's happening at each time step.
 As an example, suppose it takes four time steps to encode the entire input sequence. At each time step, the encoder "reads" the input word and performs a transformation on its hidden state. Then it passes that hidden state to the next time step. The bigger the hidden state, the greater the learning capacity of the model, but also the greater the computation requirements.

 For now, notice that for each time step after the first word in the sequence
there are two inputs: the hidden state and a word from the sequence. For
the encoder, it’s the next word in the input sequence. For the decoder, it’s
the previous word from the output sequence.
 To implement bidirectionality, we train two RNN layers simultaneously: the first layer is fed the input sequence as-is and the second is fed a reversed copy. A minimal sketch of how these pieces fit together in Keras is given below.
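
The following is a minimal, illustrative sketch of how the components described above (embedding, bidirectional encoder, decoder, dense output) could be wired together with Keras. The layer sizes are assumptions chosen only for illustration; this is not the exact model trained in Chapter V.

# Illustrative encoder-decoder sketch; layer sizes are assumptions, not the trained model from Chapter V
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, GRU, RepeatVector, TimeDistributed, Dense

def encoder_decoder_sketch(input_length, output_length, english_vocab_size, french_vocab_size):
    model = Sequential()
    # Embedding: map each English word ID to a dense vector
    model.add(Embedding(english_vocab_size, 128, input_length=input_length))
    # Encoder: a bidirectional GRU summarizes the sentence into a single context vector
    model.add(Bidirectional(GRU(128)))
    # Repeat the context vector once for every output time step
    model.add(RepeatVector(output_length))
    # Decoder: a GRU unrolls the context into an output sequence
    model.add(GRU(128, return_sequences=True))
    # Softmax over the French vocabulary at every time step
    model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))
    return model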

CHAPTER IV
METHODOLOGY

 Load & Examine Data
o The inputs are sentences in English; the outputs are the corresponding translations in French.
o When we run a word count, we can see that the vocabulary for the dataset is quite small. This was by design for this project. This allows us to train the models in a reasonable time.

 Cleaning
o No additional cleaning needs to be done at this point. The data has already been converted to lowercase and split so that there are spaces between all words and punctuation.
o For other NLP projects you may need to perform additional steps such as: remove HTML tags, remove stop words, remove punctuation or convert to tag representations, label the parts of speech, or perform entity extraction.

 Tokenization
o Next, we need to tokenize the data, i.e., convert the text to numerical values. This allows the neural network to perform operations on the input data. For this project, each word and punctuation mark will be given a unique ID. (For other NLP projects, it might make sense to assign each character a unique ID.)
o When we run the tokenizer, it creates a word index, which is then used to convert each sentence to a vector.
 Padding
o When we feed our sequences of word IDs into the model, each sequence needs to be the same length. To achieve this, padding is added to any sequence that is shorter than the max length (i.e. shorter than the longest sentence). A short tokenization and padding example is sketched at the end of this chapter.

 Encoding and Decoding
o The encoder summarizes the input into a context variable, also called the state. This context is then decoded and the output sequence is generated.
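
As a small illustration of the tokenization and padding steps above, the sketch below runs the Keras Tokenizer and pad_sequences utilities on a couple of made-up sentences; the real tokenizers are fit on the project dataset in the next chapter.

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

sample_sentences = ['the cat is small', 'the small cat likes warm milk']
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sample_sentences)                    # build the word index
sequences = tokenizer.texts_to_sequences(sample_sentences)  # words -> integer IDs
padded = pad_sequences(sequences, padding='post')           # pad shorter sequences with trailing zeros
print(tokenizer.word_index)  # e.g. {'the': 1, 'cat': 2, 'small': 3, ...}
print(padded)                # both rows now have the same length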

CHAPTER V
CODING AND TESTING
import helper
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Model, Sequential
from keras.layers import GRU, Input, Dense, TimeDistributed, Activation, RepeatVector, Bidirectional, Dropout, LSTM
from keras.layers import Embedding
from keras.optimizers import Adam
from keras.losses import sparse_categorical_crossentropy
import tensorflow as tf
import os

english_path = 'https://raw.githubusercontent.com/projjal1/datasets/master/small_vocab_en.txt'
french_path = 'https://raw.githubusercontent.com/projjal1/datasets/master/small_vocab_fr.txt'

def load_data(path):
    input_file = os.path.join(path)
    with open(input_file, "r") as f:
        data = f.read()
    return data.split('\n')

# Downloading the dataset files
english_data = tf.keras.utils.get_file('file1', english_path)
french_data = tf.keras.utils.get_file('file2', french_path)

# Now loading data
english_sentences = load_data(english_data)
french_sentences = load_data(french_data)

for i in range(5):
    print('Sample :', i)
    print(english_sentences[i])
    print(french_sentences[i])
    print('-' * 50)

import collections

english_words_counter = collections.Counter(
    [word for sentence in english_sentences for word in sentence.split()])
french_words_counter = collections.Counter(
    [word for sentence in french_sentences for word in sentence.split()])
print('English Vocab:', len(english_words_counter))
print('French Vocab:', len(french_words_counter))

def tokenize(x):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(x)
    return tokenizer.texts_to_sequences(x), tokenizer

# Tokenize Sample output
text_sentences = [
    'The quick brown fox jumps over the lazy dog .',
    'By Jove , my quick study of lexicography won a prize .',
    'This is a short sentence .']

text_tokenized, text_tokenizer = tokenize(text_sentences)
print(text_tokenizer.word_index)
print()

for sample_i, (sent, token_sent) in enumerate(zip(text_sentences, text_tokenized)):
    print('Sequence {} in x'.format(sample_i + 1))
    print('  Input:  {}'.format(sent))
    print('  Output: {}'.format(token_sent))

def pad(x, length=None):
    return pad_sequences(x, maxlen=length, padding='post')

def preprocess(x, y):
    """
    Preprocess x and y
    :param x: Feature List of sentences
    :param y: Label List of sentences
    :return: Tuple of (Preprocessed x, Preprocessed y, x tokenizer, y tokenizer)
    """
    preprocess_x, x_tk = tokenize(x)
    preprocess_y, y_tk = tokenize(y)
    preprocess_x = pad(preprocess_x)
    preprocess_y = pad(preprocess_y)
    # Keras's sparse_categorical_crossentropy function requires the labels to be in 3 dimensions
    # Expanding dimensions
    preprocess_y = preprocess_y.reshape(*preprocess_y.shape, 1)
    return preprocess_x, preprocess_y, x_tk, y_tk

preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer = \
    preprocess(english_sentences, french_sentences)

max_english_sequence_length = preproc_english_sentences.shape[1]
max_french_sequence_length = preproc_french_sentences.shape[1]
english_vocab_size = len(english_tokenizer.word_index)
french_vocab_size = len(french_tokenizer.word_index)

print('Data Preprocessed')
print("Max English sentence length:", max_english_sequence_length)
print("Max French sentence length:", max_french_sequence_length)
print("English vocabulary size:", english_vocab_size)
print("French vocabulary size:", french_vocab_size)
def logits_to_text(logits, tokenizer):
    index_to_words = {id: word for word, id in tokenizer.word_index.items()}
    index_to_words[0] = ''
    # For each time step, pick the highest-scoring word ID and map it back to a word
    return ' '.join([index_to_words[prediction] for prediction in np.argmax(logits, 1)])
def embed_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build an RNN model using word embedding on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # Hyperparameters
    learning_rate = 0.005

    # Build the layers
    model = Sequential()
    model.add(Embedding(english_vocab_size, 256, input_length=input_shape[1],
                        input_shape=input_shape[1:]))
    model.add(GRU(256, return_sequences=True))
    model.add(TimeDistributed(Dense(1024, activation='relu')))
    model.add(Dropout(0.5))
    model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))

    # Compile model
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])
    return model

# Reshaping the input to work with a basic RNN
tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2]))

simple_rnn_model = embed_model(
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index) + 1,
    len(french_tokenizer.word_index) + 1)

simple_rnn_model.summary()

history = simple_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024,
                               epochs=20, validation_split=0.2)
simple_rnn_model.save('model.h5')

def final_predictions(text):
    y_id_to_word = {value: key for key, value in french_tokenizer.word_index.items()}
    y_id_to_word[0] = ''
    sentence = [english_tokenizer.word_index[word] for word in text.split()]
    sentence = pad_sequences([sentence], maxlen=preproc_french_sentences.shape[-2],
                             padding='post')
    print(sentence.shape)
    print(logits_to_text(simple_rnn_model.predict(sentence[:1])[0], french_tokenizer))

import re
txt = input().lower()
final_predictions(re.sub(r'[^\w]', ' ', txt))
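
To compare a model translation against its ground-truth French sentence (the Prediction step described in Chapter I), one might add a short check along the lines of the sketch below; it assumes the variables defined above (tmp_x, simple_rnn_model, english_sentences, french_sentences) are still in scope.

# Sketch: compare the model's translation of a training sample to the reference French sentence
sample_index = 0  # any index into the dataset
prediction = simple_rnn_model.predict(tmp_x[sample_index:sample_index + 1])[0]
print('English input :', english_sentences[sample_index])
print('Model output  :', logits_to_text(prediction, french_tokenizer))
print('Ground truth  :', french_sentences[sample_index])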

CHAPTER VI
OUTPUT

CHAPTER VII
CONCLUSION AND FUTURE ENHANCEMENT

 We have successfully constructed a machine learning model that translates English to French using an RNN; it yielded about 79% accuracy after 27 training epochs.

 Future Improvements:
o Do proper data split (training, validation, test):
Currently, there is no test set, only training and validation; a small sketch of such a split is given after this list.
o LSTM + attention:
This has been the de facto architecture for RNNs over the past few
years, although there are some limitations.
o Train on a larger and more diverse text corpus:
The text corpus and vocabulary for this project are quite small with
little variation in syntax. As a result, the model is very brittle. To
create a model that generalizes better, you’ll need to train on a
larger dataset with more variability in grammar and sentence
structure.
o Residual layers:
You could add residual layers to a deep LSTM RNN, as described
in this paper. Or, use residual layers as an alternative to LSTM
and GRU, as described here.
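
As a rough sketch of the first improvement above (a proper training/validation/test split), one could hold out a test set before fitting and report final accuracy only on data the model has never seen. The 80/10/10 proportions and the use of scikit-learn's train_test_split here are illustrative assumptions, not part of the current implementation.

from sklearn.model_selection import train_test_split

# Hold out 10% of the data for testing, then 10% of the remainder for validation
x_train, x_test, y_train, y_test = train_test_split(
    tmp_x, preproc_french_sentences, test_size=0.1, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=0.1, random_state=42)

simple_rnn_model.fit(x_train, y_train, validation_data=(x_val, y_val),
                     batch_size=1024, epochs=20)
print(simple_rnn_model.evaluate(x_test, y_test))  # loss and accuracy on unseen data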

REFERENCES
https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571
https://tommytracey.github.io/AIND-Capstone/machine_translation.html
