Word Embedding Methods of Text Processing in Big Data

Lahcen Idouglid and Said Tkatek
Computer Sciences Research Laboratory, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
{lahcen.idouglid,said.tkatek}@uit.ac.ma
Abstract. One of the biggest challenges any NLP data scientist faces is choosing the best numeric/vector representation of a text string for running a machine learning model. This paper provides a comprehensive study of big data and its impact on the performance of word embedding techniques for text processing. We propose a method for text processing that combines word embedding techniques with machine learning (ML) algorithms to improve the performance of data analysis and decision-making. Several word embedding methods can be used for text processing, in particular the most popular ones, CountVectorizer, TF-IDF, and HashingVectorizer, combined with supervised ML algorithms such as decision trees, random forest classifiers, and logistic regression for text classification. Compared with recent work, our comparative study shows the impact of dataset size on the performance of text classification algorithms and yields good results.
1 Introduction
The term big data refers to vast, complex, and real-time data that necessitates sophisticated management, analytical, and processing approaches to extract insights. Few people can think critically about big data problems and have the skills and knowledge to tackle them [14].
Machine learning (ML) is a science focused on understanding and developing learning algorithms [13]. It is considered a component of artificial intelligence [7]. Without being explicitly programmed to do so, machine learning algorithms build a model from sample data, also referred to as training data, in order to make predictions or decisions.
The study of theories and techniques that enable information exchange between humans and computers through natural language is known as “natural language processing” [9]. NLP combines linguistics, computer science, and mathematics [2]. It is an area of AI that helps computers manipulate and interpret human language.
Text mining is a technique used in NLP to extract relevant information from text. The goal of NLP is to glean knowledge from natural language. One of the most common NLP tasks is text classification, which is the subject of this paper.
2 Related Work
This section reviews related work on methods for evaluating word embeddings and on existing studies that evaluate embeddings in downstream tasks.
The performance of word embedding approaches was compared, examined, and evaluated using machine learning algorithms in a Turkish sentiment analysis study conducted on a dataset of user comments shared on multiple shopping websites in recent years; this study serves as our starting point for the discussion of word embeddings [3]. The second study, “BnVec: Towards the Development of Word Embedding for Bangla Language Processing”, proposes approaches for Bengali word embedding. Six well-known word embedding techniques, CountVectorizer, TF-IDF, HashingVectorizer, Word2vec, fastText, and GloVe, are included in the first of them, which highlights their well-known functionality [9]. Various qualitative and quantitative tests were conducted on a few tasks to show each technique’s ability to infer word proximity and to examine how well it performs in comparison with the other word embedding techniques [9]. The third work concerns sentiment analysis of film reviews in the Gujarati language using machine learning; the paper is a comparative study of TF-IDF Vectorizer and CountVectorizer features after applying sentiment analysis [12], comparing the results of two machine learning algorithms on the Accuracy, Recall, Precision, and F-score performance parameters. The last work cited, “Measuring associational thinking through word embeddings”, investigates various ways of combining existing embeddings to estimate the semantic or non-semantic associative strength between words so that the correlation with human judgments is maximized [11].
3 Methodology
3.1 System Architecture
The text classification process consists of four steps, with initial steps for collecting and preparing the datasets. Preprocessing techniques play an important role in improving the performance of the models. There are three key steps of data preprocessing, namely tokenization, stop word removal, and stemming, as sketched below. Tokenization involves separating a stream of text into recognizable tokens, such as words, phrases, symbols, or other practical elements; its objective is to analyze each word in a statement. Stemming is a technique for reducing a word’s numerous forms to a common stem. Bag of Words is one of the most popular methods: it is a text representation that records the occurrence of words in a text while disregarding their order. After stemming and lemmatization, the next step is the division of the dataset into a training set and a test set.
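The three preprocessing steps can be sketched as follows with NLTK (an assumption on our part; the paper does not name its preprocessing library), where the function name preprocess is introduced here purely for illustration:

```python
# A minimal preprocessing sketch using NLTK (an assumption; the paper does
# not name its preprocessing library). `preprocess` is an illustrative name.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)      # tokenizer model
nltk.download("stopwords", quiet=True)  # stop word lists

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(text: str) -> str:
    # 1) Tokenization: split the text stream into word tokens.
    tokens = word_tokenize(text.lower())
    # 2) Stop word removal: drop frequent, low-information words.
    tokens = [t for t in tokens if t.isalpha() and t not in STOP_WORDS]
    # 3) Stemming: reduce each word's inflected forms to a common stem.
    return " ".join(STEMMER.stem(t) for t in tokens)

print(preprocess("Tokenization splits a stream of text into tokens."))
```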
In this part, we present the machine learning algorithms implemented in our work.
Decision Tree Classifier: A decision tree is an induction approach that has been applied to a variety of classification problems. It is based on splitting on features and determining their worth [4]. The splitting procedure continues until each branch carries only one classification label.
Random Forest Classifier: Random Forest (RF) is a well-known decision tree ensemble that is often used for classification. It generates decision trees from random data samples, obtains a prediction from each tree, and selects the best solution by voting. The popularity of RF stems from its superior performance compared with other classification methods [8].
Logistic Regression Classifier: Logistic regression is one of the most often used classification techniques. It is employed in a variety of fields since it is easy to understand and its results are interpretable, allowing for what-if scenarios. It is a classification method that models the probability of class membership by applying the logistic function to a linear combination of the predictors [10].
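As a minimal sketch, the three classifiers can be instantiated with scikit-learn as follows; the hyperparameter values shown are library defaults used for illustration, not necessarily the settings of the experiments reported here:

```python
# Illustrative scikit-learn instantiations of the three classifiers; the
# hyperparameters are library defaults, not the settings used in the study.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

classifiers = {
    # Splits recursively on informative features until leaves are pure.
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    # Grows many trees on random samples and aggregates their votes.
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Models the class probability with the logistic function; coefficients
    # remain interpretable, which supports what-if analysis.
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
```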
4 Results and Discussion
In this section, we compare three machine learning algorithms, namely Decision Tree, Random Forest, and Logistic Regression, and their compatibility with three word embedding methods, namely TfidfVectorizer, CountVectorizer, and HashingVectorizer. In the first part of this comparative study, we examine the evolution of the accuracy of the algorithms with respect to the size of the dataset; for this, we take subsets of our dataset (2000, 5000, 20000, 50000, and 150000 entries) and present the results in graphs, following the sketch below.
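A hedged sketch of this sweep with scikit-learn is given below; the file name questions.csv, the column names text and label, and the 80/20 train/test split are illustrative assumptions rather than details reported in this paper:

```python
# A sketch of the comparative sweep. The file "questions.csv" and the column
# names "text"/"label" are assumptions for illustration; the paper does not
# publish its exact loading code or hyperparameters.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import (CountVectorizer,
                                              HashingVectorizer,
                                              TfidfVectorizer)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("questions.csv")  # hypothetical path to the QA dataset

vectorizers = {
    "TfidfVectorizer": TfidfVectorizer(),
    "CountVectorizer": CountVectorizer(),
    "HashingVectorizer": HashingVectorizer(n_features=2**18),
}
classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}
sizes = [2000, 5000, 20000, 50000, 150000]  # subset sizes from the study

results = {}
for n in sizes:
    subset = df.sample(n=n, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        subset["text"], subset["label"], test_size=0.2, random_state=0)
    for v_name, vectorizer in vectorizers.items():
        for c_name, classifier in classifiers.items():
            # Chain the vectorizer and classifier, fit on the training split,
            # and record test accuracy for this (size, method, model) cell.
            model = make_pipeline(vectorizer, classifier)
            model.fit(X_train, y_train)
            results[(n, v_name, c_name)] = accuracy_score(
                y_test, model.predict(X_test))
```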
The dataset was downloaded from GitHub (“Large Question Answering Datasets”). All experiments were implemented, executed, and tested on Google Colaboratory.
Table 1. Comparative accuracy of ML algorithms with word embedding techniques for text processing
This table shows the results of the comparative study using the most famous machine learning algorithms, namely Decision Tree, Random Forest Classifier, and Logistic Regression, with the TfidfVectorizer, CountVectorizer, and HashingVectorizer word embedding methods. The evaluation is conducted using evaluation parameters such as accuracy. The HashingVectorizer word embedding method achieved the highest accuracy score, 98.5%, with the Random Forest Classifier ML algorithm at a dataset size of 2000.
The graphical representation (a) shows the evolution of the accuracy of the TfidfVectorizer method with the number of entries in the training dataset. As the graph shows, the accuracy increases with the number of entries in the training dataset, and the best result is reached when the size of the training dataset is 5000 entries, with the best performance obtained for this model being 85%.
The second graph, (b), shows the evolution of CountVectorizer performance with increasing dataset size. The accuracy of CountVectorizer increases with the number of inputs in the training dataset, but there is a degradation when the dataset size is 50000; the best performance achieved for this model is 85.47%, and the algorithms that give the best results are DT and RF.
As shown in graph (c), which presents the evolution of HashingVectorizer performance with increasing dataset size, the accuracy increases with the number of inputs in the training dataset, and the best result is reached when the size of the training dataset is 15000 inputs, with the best performance achieved for this model being 98%. The HashingVectorizer method works efficiently with all the algorithms tested.
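For illustration, graphs of this kind could be generated with matplotlib from the results dictionary of the sweep sketched in Sect. 4; the plotting code below is an assumed reconstruction, not the scripts actually used:

```python
# An illustrative matplotlib sketch for graphs of type (a)-(c), assuming the
# `results` dictionary produced by the sweep sketched earlier in Sect. 4.
import matplotlib.pyplot as plt

sizes = [2000, 5000, 20000, 50000, 150000]
for v_name in ("TfidfVectorizer", "CountVectorizer", "HashingVectorizer"):
    plt.figure()
    for c_name in ("DT", "RF", "LR"):
        # One accuracy curve per classifier, over increasing dataset sizes.
        accuracies = [results[(n, v_name, c_name)] for n in sizes]
        plt.plot(sizes, accuracies, marker="o", label=c_name)
    plt.xscale("log")  # subset sizes span two orders of magnitude
    plt.xlabel("Training dataset size (entries)")
    plt.ylabel("Accuracy")
    plt.title(v_name)
    plt.legend()
plt.show()
```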
5 Conclusion
This work presents the results of a comparative study of word embedding methods for text processing, namely TfidfVectorizer, CountVectorizer, and HashingVectorizer.
The results show the impact of big data and of the size of the training data on the performance of machine learning algorithms with the three word embedding methods: the larger the dataset, the higher the performance becomes. The best performance achieved for our model is 98%, and the best method is HashingVectorizer, which works efficiently with all the algorithms tested. In future work, we will investigate semantic similarity for text categorization using genetic algorithms. We will also build a larger model for several languages.
References
1. Alammary, A.S.: Arabic Questions Classification Using Modified TF-IDF. IEEE Access 9,
95109–95122 (2021). https://doi.org/10.1109/ACCESS.2021.3094115
2. Al-Ansari, K.: Survey on Word Embedding Techniques in Natural Language Processing
3. Aydoğan, M.: Comparison of word embedding methods for Turkish sentiment classification
4. Guezzaz, A., Benkirane, S., Azrour, M., Khurram, S.: A reliable network intrusion detection
approach using decision tree with enhanced data quality. Secur. Commun. Netw. 2021, 1–8
(2021). https://doi.org/10.1155/2021/1230593
5. Haque, F., Md Manik, M.H., Hashem, M.M.A.: Opinion mining from Bangla and phonetic Bangla reviews using vectorization methods. In: 2019 4th International Conference on Electrical Information and Communication Technology (EICT), Khulna, Bangladesh, pp. 1–6. IEEE (2019)
6. Haque, R., Islam, N., Islam, M., Ahsan, M.M.: A comparative analysis on suicidal ideation detection using NLP, machine, and deep learning. Technologies 10, 57 (2022). https://doi.org/10.3390/technologies10030057
7. Hilbert, S., et al.: Machine learning for the educational sciences. Rev. Educ. 9(3), e3310 (2021). https://doi.org/10.1002/rev3.3310
8. Jain, A., Sharma, Y., Kishor, K.: Financial administration system using ML algorithm
9. Kowsher, M., et al.: BnVec: towards the development of word embedding for Bangla language
processing. IJET 10, 95 (2021). https://doi.org/10.14419/ijet.v10i2.31538
10. Mahesh, B.: Machine Learning Algorithms – A Review. 9, 7 (2018)
11. Periñán-Pascual, C.: Measuring associational thinking through word embeddings. Artif. Intell.
Rev. 55(3), 2065–2102 (2021). https://doi.org/10.1007/s10462-021-10056-6
12. Shah, P., Swaminarayan, P., Patel, M.: Sentiment analysis on film review in Gujarati language using machine learning. IJECE 12, 1030 (2022). https://doi.org/10.11591/ijece.v12i1.pp1030-1039
13. Tkatek, S.: A hybrid genetic algorithms and sequential simulated annealing for a constrained personal reassignment problem to preferred posts. IJATCSE 9, 454–464 (2020). https://doi.org/10.30534/ijatcse/2020/62912020
14. Tkatek, S., Belmzoukia, A., Nafai, S., Abouchabaka, J., Ibnou-ratib, Y.: Putting the world back to work: an expert system using big data and artificial intelligence in combating the spread of COVID-19 and similar contagious diseases. WOR 67, 557–572 (2020). https://doi.org/10.3233/WOR-203309