LANGUAGE DETECTION & TRANSLATION
CONTENTS
Problem Statement
Dataset
Methodology
Results
Conclusion
PROBLEM STATEMENT
In the modern interconnected world, where information effortlessly crosses borders and language divides, the need for automatic language detection and translation systems has become pressing. Our task is to create a system that can reliably identify the language of any given text input and translate it into English. By handling a variety of languages, our system aims to bridge linguistic gaps, enabling effective communication and collaboration across diverse cultural landscapes.
DATASET
• The dataset contains a total of 22 languages, each represented by 1000 text samples.
• It covers a diverse range of languages from various regions and linguistic families, including English, Russian, Spanish, Hindi, Chinese, French, Portuguese, Urdu, Arabic, and others.
• Every language class has exactly 1000 samples, giving an equal, balanced distribution across classes.
• This balanced distribution facilitates unbiased model training and evaluation across the different languages (a quick balance check is sketched below).
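A minimal loading and balance-check sketch in Python, assuming the dataset is a single CSV file with "Text" and "Language" columns; the file name and column names are assumptions, not taken from the slides.

```python
# A sketch of loading the dataset and checking class balance.
# The file name and the "Text" / "Language" column names are assumptions.
import pandas as pd

df = pd.read_csv("language_detection.csv")   # hypothetical file name

# Each of the 22 languages should appear exactly 1000 times.
print(df["Language"].value_counts())
print("Number of languages:", df["Language"].nunique())
```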
METHODOLOGY
Vectorization
• TF-IDF (Term Frequency-Inverse Document Frequency).
• This technique considers the frequency of a word within a document as well as its frequency across the other documents in the corpus.
• It assigns higher weights to words that appear frequently in a document but rarely across the rest of the corpus.
• Text data → numerical feature vectors (see the sketch below).
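Continuing the loading sketch above, a minimal TF-IDF vectorization step using scikit-learn's TfidfVectorizer; the 80/20 split, random seed, and default vectorizer settings are assumptions.

```python
# A sketch of the TF-IDF vectorization step (scikit-learn).
# The 80/20 split and the random seed are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df["Text"], df["Language"],
    test_size=0.2, random_state=42, stratify=df["Language"],
)

vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights on the training text only,
# then map both splits to sparse numerical feature vectors.
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
```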
METHODOLOGY
Model 2 - RandomForestClassifier
• RandomForestClassifier builds a forest of decision trees using the TF-IDF feature vectors as input and the
language labels as targets.
• Each decision tree in the forest is trained on a bootstrap sample of the training data, drawn with replacement.
• At each node of the tree, a random subset of features is considered for splitting, helping to reduce correlation
between trees and improve model diversity.
• Given a new text document, after vectorization the RandomForestClassifier aggregates the predictions of all decision trees in the forest and assigns the final predicted language by majority voting or by averaging the class probabilities (see the sketch below).
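A minimal sketch of this model with scikit-learn's RandomForestClassifier, continuing from the TF-IDF features above; the number of trees and the example sentence are assumptions.

```python
# A sketch of the Random Forest model on the TF-IDF features.
# n_estimators and the sample sentence are assumptions.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train_tfidf, y_train)   # each tree is fit on a bootstrap sample

# Prediction aggregates the votes of all trees in the forest.
sample = vectorizer.transform(["Bonjour, comment allez-vous ?"])
print(rf.predict(sample))        # e.g. ['French']
```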
METHODOLOGY
Model 3 - LogisticRegression
• Logistic Regression calculates the probability that the text belongs to each language class using the learned
weights and bias terms.
• The final predicted language is typically the one with the highest predicted probability.
• While both MNB and Logistic Regression can perform well on text classification tasks, Multinomial Naive Bayes is well suited to large text datasets because it is fast and copes well with very large vocabularies. Logistic Regression, on the other hand, can be the better choice when more complex patterns must be captured, since it can handle different kinds of features and model the relationships between them (both models are sketched below).
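A minimal sketch of Logistic Regression alongside the Multinomial Naive Bayes model it is compared with, on the same TF-IDF features from the vectorization step; max_iter and the example text are assumptions.

```python
# A sketch of Logistic Regression (and the Multinomial Naive Bayes model
# it is compared with) on the TF-IDF features. max_iter is an assumption.
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train_tfidf, y_train)

mnb = MultinomialNB()
mnb.fit(X_train_tfidf, y_train)

# Logistic Regression yields a probability per language class;
# the predicted language is the class with the highest probability.
text = vectorizer.transform(["Hola, ¿cómo estás?"])
probs = log_reg.predict_proba(text)
print(log_reg.classes_[probs.argmax()])   # e.g. 'Spanish'
```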
RESULTS
RANDOM FOREST
• Accuracy score on training data: 1.0
• Accuracy score on testing data: 0.9225757575757576
LOGISTIC REGRESSION
• Accuracy score on training data: 0.9864935064935065
• Accuracy score on testing data: 0.9540909090909091
COMPARISON
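The slides compare the models' accuracy scores side by side; below is a minimal sketch of how such a comparison could be produced from the models trained above. The figures reported in the results slides come from the authors' own runs, not from this code.

```python
# A sketch of comparing training and testing accuracy for the three models.
from sklearn.metrics import accuracy_score

models = {
    "Multinomial Naive Bayes": mnb,
    "Random Forest": rf,
    "Logistic Regression": log_reg,
}
for name, model in models.items():
    train_acc = accuracy_score(y_train, model.predict(X_train_tfidf))
    test_acc = accuracy_score(y_test, model.predict(X_test_tfidf))
    print(f"{name}: train accuracy = {train_acc:.4f}, test accuracy = {test_acc:.4f}")
```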
CONCLUSION
In conclusion, our language detection and translation project aimed to automatically identify the language of a given text input and translate it
into English. We experimented with three different machine learning models: Multinomial Naive Bayes (MNB), Logistic Regression, and Random
Forest. However, despite our efforts, none of these models were able to provide accurate predictions for all languages. This indicates that the
problem at hand may require more sophisticated techniques beyond traditional machine learning approaches.
Considering the limitations encountered with the existing models, it is evident that Natural Language Processing (NLP) techniques could offer a
more effective solution. NLP methods, such as deep learning models like recurrent neural networks (RNNs) or transformer-based architectures
like BERT, have demonstrated superior performance in language-related tasks, including language identification and translation. These models
can learn complex linguistic patterns and relationships, capturing details that traditional machine learning models may struggle with.
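As an illustration of the transformer-based direction suggested above, a hedged sketch using the Hugging Face transformers pipeline; the specific pretrained checkpoint is an assumption chosen for illustration and was not used in this project.

```python
# A sketch of the transformer-based direction suggested above, using the
# Hugging Face `transformers` pipeline. The checkpoint name is an
# illustrative assumption and not part of this project.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="papluca/xlm-roberta-base-language-detection",
)
print(detector("Bonjour tout le monde"))
# e.g. [{'label': 'fr', 'score': 0.99...}]
```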
THANK YOU
Nikita Chorge
PRN: 23070243006