Problem Statement:
Implementing the pre-trained model BERT for Machine Translation from English to Telugu.
Introduction:
Machine translation has been studied for a very long time, and many approaches have been developed, such as Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Statistical MT uses predictive algorithms to teach a computer how to translate text. NMT is loosely modelled on the neural networks of the human brain, with information being passed through different "layers" to be processed before producing an output. Statistical machine translation usually works less well for language pairs with significantly different word order, whereas neural networks make it possible to translate longer sentences into the required language. Previous work in neural machine translation uses models such as the Transformer, attention mechanisms and encoder-decoder architectures. In this proposal we would like to introduce a fine-tuned model, BERT (Bidirectional Encoder Representations from Transformers). BERT's key technical innovation is applying the bidirectional training of the Transformer, a popular attention model, to language modelling. This is in contrast to previous efforts, which looked at a text sequence either from left to right or combined left-to-right and right-to-left training. The researchers detail a novel technique named Masked LM (MLM) which allows bidirectional training in models. Here we are going to implement BERT for translating one language to another (English to Telugu).
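To make the Masked LM idea concrete, here is a minimal sketch, assuming the Hugging Face transformers library is installed; the public bert-base-multilingual-cased checkpoint is our choice for illustration, not fixed by this proposal. The pipeline predicts the masked token using context from both sides of the sentence, which is what bidirectional training enables.

from transformers import pipeline

# fill-mask predicts the token hidden behind [MASK] using context
# from BOTH sides of the sentence.
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

for prediction in fill_mask("Machine translation converts text from one [MASK] to another."):
    print(prediction["token_str"], round(prediction["score"], 3))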
RNN
The idea behind a recurrent neural network is that the input is a sequence and the network feeds information from one step forward into the next. In other words, a recurrent network uses an internal state (memory) to process sequences of inputs, whereas a feedforward network takes the whole input at once and makes a decision. RNNs have shown great success in many NLP tasks. The most commonly used type of RNN is the LSTM, which is much better at capturing long-term dependencies than a vanilla RNN. LSTMs are essentially the same as RNNs; they simply compute the hidden state in a different way. LSTMs are described in more detail below.
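As a minimal sketch of the recurrent idea above (assuming PyTorch; the sizes are illustrative only), the hidden state returned at each step is what carries the "memory" forward:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=16, hidden_size=32, batch_first=True)

x = torch.randn(4, 10, 16)          # batch of 4 sequences, 10 steps, 16 features
h0 = torch.zeros(1, 4, 32)          # initial hidden state (the "memory")

outputs, hn = rnn(x, h0)            # outputs: hidden state at every time step
print(outputs.shape)                # torch.Size([4, 10, 32])
print(hn.shape)                     # torch.Size([1, 4, 32]) -- final hidden state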
LSTMs
LSTM networks are quite popular these days, and we briefly mentioned them above. LSTMs do not have a fundamentally different architecture from RNNs, but they use a different function to compute the hidden state. The memory in an LSTM is called a cell, and we can think of cells as black boxes that take the previous state and the current input as inputs. Internally, these cells decide what to keep in (and what to erase from) memory. They then combine the previous state, the current memory and the input. It turns out that these types of units are very efficient at capturing long-term dependencies.
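A minimal sketch of an LSTM as such a black box, again assuming PyTorch with illustrative sizes; the cell state is the memory that the unit decides to keep or erase:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

x = torch.randn(4, 10, 16)      # 4 sequences, 10 steps, 16 features
h0 = torch.zeros(1, 4, 32)      # initial hidden state
c0 = torch.zeros(1, 4, 32)      # initial cell state (the memory)

outputs, (hn, cn) = lstm(x, (h0, c0))
print(outputs.shape, hn.shape, cn.shape)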
Bidirectional RNNs
These are based on the idea that the output at time t may depend not only on the previous elements in the sequence but also on future elements. For example, to predict a missing word in a sequence you want to look at both the left and the right context. Bidirectional RNNs are quite simple: they are just two RNNs stacked on top of each other, one reading the sequence forwards and one reading it backwards. The output is then computed based on the hidden states of both RNNs.
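A minimal sketch of a bidirectional recurrent layer (assuming PyTorch): setting bidirectional=True runs one RNN left-to-right and another right-to-left, and their per-step outputs are concatenated.

import torch
import torch.nn as nn

birnn = nn.LSTM(input_size=16, hidden_size=32, batch_first=True, bidirectional=True)

x = torch.randn(4, 10, 16)
outputs, _ = birnn(x)
print(outputs.shape)   # torch.Size([4, 10, 64]) -- 32 forward + 32 backward features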
BERT
BERT uses a multi-layer bidirectional Transformer encoder, whose self-attention layers attend in both directions. Google has released two variants of the model: BERT-Base and BERT-Large.
Pretrained Language Models (LMs) such as ELMo and BERT (Peters et al., 2018; Devlin et al.,
2018) have turned out to significantly improve the quality of several Natural Language
Processing (NLP) tasks by transferring the prior knowledge learned from data-rich
monolingual corpora to data-poor NLP tasks such as question answering, bio-medical
information extraction and standard benchmarks (Wang et al., 2018; Lee et al., 2019). In
addition, it was shown that these representations contain syntactic and semantic information in
different layers of the network (Tenney et al., 2019). Therefore, using such pretrained LMs for
Neural Machine Translation (NMT) is appealing.
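As a sketch of why this is appealing for NMT (assuming the Hugging Face transformers library; the multilingual checkpoint name is our assumption for illustration), a pretrained BERT encoder can be loaded and its contextual token representations extracted, which an NMT decoder could then attend over:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")

inputs = tokenizer("How are you?", return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# One vector per (sub)word token; a decoder would attend over these states.
print(outputs.last_hidden_state.shape)   # [1, num_tokens, 768]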
Transformer Model
• The Transformer model in NLP has truly changed the way we work with text data
• Transformer is behind the recent NLP developments, including Google’s BERT
• The Transformer in NLP is a novel architecture that aims to solve sequence-to-sequence
tasks while handling long-range dependencies with ease.
• The encoder block has a Multi-Head Attention layer followed by a Feed-Forward Neural Network layer. The decoder, on the other hand, has an extra Masked Multi-Head Attention layer.
• The encoder and decoder blocks are actually multiple identical encoders and decoders
stacked on top of each other. Both the encoder stack and the decoder stack have the
same number of units.
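A minimal sketch of this stacked encoder-decoder structure, assuming a recent PyTorch version; the layer counts and dimensions are illustrative only:

import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512, nhead=8,
    num_encoder_layers=6,   # encoder stack
    num_decoder_layers=6,   # decoder stack of the same depth
    batch_first=True,
)

src = torch.randn(2, 12, 512)   # source-token embeddings (batch, tokens, d_model)
tgt = torch.randn(2, 9, 512)    # shifted target embeddings
tgt_mask = model.generate_square_subsequent_mask(9)   # masked self-attention in the decoder

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)                # torch.Size([2, 9, 512])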
Transformer is undoubtedly a huge improvement over RNN-based seq2seq models, but it comes with its own share of limitations:
• Attention can only deal with fixed-length text strings. The text has to be split into a
certain number of segments or chunks before being fed into the system as input
• This chunking of text causes context fragmentation. For example, if a sentence is
split from the middle, then a significant amount of context is lost. In other words, the
text is split without respecting the sentence or any other semantic boundary
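A small sketch of the fixed-length chunking described above (plain Python; the helper name is ours): splitting purely by token count can cut a sentence in half, so later chunks lose the earlier context.

def chunk_tokens(tokens, chunk_size):
    # split a token list into fixed-size chunks, ignoring sentence boundaries
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

tokens = "the model was trained on parallel data and it performed well".split()
for chunk in chunk_tokens(tokens, chunk_size=5):
    print(chunk)   # the sentence is split without respecting any semantic boundary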
Text Preprocessing:
It is the process of modifying the initial dataset into a form that is acceptable to machine learning algorithms, so that they produce output with maximum accuracy. Preprocessing is necessary for an NMT dataset, in which tokenization followed by normalization is performed to produce better output and the best accuracy.
Tokenization:
It is the process of splitting the given dataset into tokens.
Normalization:
It is the process in which the dataset is checked for redundancy and repeated sentences, which affect the training of the system and decrease accuracy.
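A minimal preprocessing sketch in plain Python (the helper names and the tiny example pair are ours, purely for illustration): whitespace tokenization of each sentence, plus a normalization pass that drops exact duplicate sentence pairs.

def tokenize(sentence):
    # simple whitespace tokenization; real systems may use subword tokenizers
    return sentence.strip().split()

def normalize(parallel_pairs):
    # drop repeated (source, target) pairs so duplicates do not skew training
    seen, cleaned = set(), []
    for src, tgt in parallel_pairs:
        key = (src.strip(), tgt.strip())
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned

pairs = [("How are you ?", "మీరు ఎలా ఉన్నారు ?"),
         ("How are you ?", "మీరు ఎలా ఉన్నారు ?")]   # duplicate pair to be removed
print([(tokenize(s), tokenize(t)) for s, t in normalize(pairs)])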
Below is the literature survey on preprocessing in NMT for different languages.
Literature Survey:
1. There are many challenges in machine translation for Indian languages; the small size of parallel corpora and differences amongst languages, mainly word-order differences due to syntactic divergence, are two of the major ones. Indian languages suffer from both of these problems, especially when they are translated from English. The authors applied NMT to the English-Tamil language pair and proposed a novel neural machine translation technique using word embeddings along with Byte-Pair Encoding (BPE) to develop an efficient translation system that overcomes the OOV (Out-Of-Vocabulary) problem for languages which do not have many translations available online (a brief BPE sketch is given after this survey).
2. Neural systems are compared with traditional phrase-based systems using various parallel corpora including UN, ISI and Ummah. The UN corpus is an obvious choice for many researchers and is used in these experiments; it is composed of parliamentary documents of the United Nations since 1990 for Arabic, Chinese, English, French, Russian and Spanish. In these experiments, Arabic preprocessing, which includes normalization and tokenization, was applied to see its impact on both SMT and NMT systems, using Farasa (Abdelali et al., 2016), a fast Arabic segmenter.
3. A study on word-level machine translation for English-German translation is carried out. Unlike traditional systems, Neural Machine Translation (NMT) systems learn the parameters of the model and require only minimal preprocessing. Memory and time constraints allow only a fixed number of words to be taken into account, which leads to the out-of-vocabulary (OOV) problem.
4. In this system description paper, the authors report details of training neural machine translation with multi-source Romance languages using the Transformer model, in the evaluation frame of the biomedical WMT 2018 task. Using multi-source languages from the same family allows improvements of over 6 BLEU points.
5. The authors build Neural Machine Translation (NMT) systems for English-Hindi, Bengali-Hindi and Gujarati-Hindi with two different units of translation, i.e. word and subword, and present a comparative study of subword NMT and word-level NMT systems, along with strong results and case studies. They train an attention-based encoder-decoder model at the word level and use Byte-Pair Encoding (BPE) in subword NMT for word segmentation, and conduct case studies to examine the effects of BPE.
6. In this paper, the authors used a novel sequence encoding model named feedforward sequential memory networks (FSMNs) [16][17][18] to replace the RNN model in both the encoder and decoder modules of the end-to-end framework. The FSMN model is a standard feedforward neural network with single or multiple memory blocks in its hidden layers and can learn long-term dependencies in language modelling [16] as well as speech recognition [17]. On account of its ability to memorize the context information of a word, the FSMN is used as the encoder model. They also modified the attention module so that the FSMN can be employed as the decoder model and generate output symbols simultaneously during training. The FSMN-based encoder-decoder model can be trained quickly because it has no recurrent connections. (2017-govind)
7. The authors proposed a hierarchical RNN encoder that models the input sentence in a hierarchical structure. This is the first attempt to explore the source-side word-clause-sentence hierarchical structure for NMT. Based on the hierarchical structure modelled by the encoder, they incorporate two types of attention mechanism into the decoder, and thus different scopes of context can be distinguished and exploited for NMT. Experimental results show that the model significantly outperforms several state-of-the-art models in machine translation. (2018 - govind)
8. The authors use sentence-level context to model source topic information and design a topic attention to integrate the learned latent topic representations into the existing NMT architecture for improving target-word prediction. They first represent source topic information over a source sentence as latent topic representations (LTRs) by a variant of convolutional neural networks (CNNs) [11]. A topic attention, which is then learned based on word context and topic context, is used to compute an additional topic context vector for predicting target words. Compared with the approach of Zhang et al. [6], the main contributions are: 1) this work focuses on learning a novel independent sequence of sentence-level context representations to capture latent topic information for predicting target words; 2) the learned latent topic representations can be estimated together with the existing NMT architecture by an encoder with a convolutional neural network instead of being pretrained. (2019 - govind)
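As referenced in item 1 above, here is a minimal sketch of learning Byte-Pair Encoding subwords, assuming the sentencepiece library and a hypothetical corpus file train.en (one sentence per line); subword segmentation lets rare words be built from known pieces instead of becoming unknown tokens.

import sentencepiece as spm

# learn a BPE subword vocabulary from a plain-text corpus (the path is illustrative)
spm.SentencePieceTrainer.train(
    input="train.en",
    model_prefix="bpe_en",
    vocab_size=8000,
    model_type="bpe",
)

sp = spm.SentencePieceProcessor(model_file="bpe_en.model")
# a rare word is broken into known subword pieces instead of becoming <unk>
print(sp.encode("untranslatable", out_type=str))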