ScienceDirect
Available online at www.sciencedirect.com
www.elsevier.com/locate/procedia
Procedia Computer Science 142 (2018) 270–277
Abstract

How can we know what is going on in the world with the click of a button? With the increase of digital data everywhere, it is becoming difficult to categorize and retrieve information from such huge data. Topic detection is considered a powerful way to mine data and relate similar documents together. Although the Arabic content on the web is increasing every day, the application of topic detection on Arabic text is not keeping pace with this increase. In this paper we investigate well-known topic detection techniques and the latest significant scholarly articles related to topic detection in general and in the Arabic domain in particular. This survey paper will help researchers interested in the domain of topic detection become familiar with commonly used techniques and stay updated with the latest technologies in this area.
© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the scientific committee of the 4th International Conference on Arabic Computational Linguistics.
Keywords: Topic modeling; LDA; Clustering; Data Mining
1. Introduction

Over the past few years the world has witnessed a huge increase in digital data across the internet. Nowadays people use various types of social media, news sites, blogs, etc. as a source of information and to express themselves in different ways. Along with this massive amount of data, the need to classify and sort these data has become crucial. Sophisticated approaches have been applied to classify these data in an organized manner.
* E-mail address: rafea@aucegypt.edu and nadaaym@aucegypt.edu

1877-0509 © 2018 The Authors. Published by Elsevier B.V.
From the massive amount of data streams coming from different social media, detecting new events and tracking current ones has been an area of interest to many researchers over the past years. The idea of topic detection and tracking originated in 1996 in the US Government Defense Advanced Research Projects Agency (DARPA) within its broadcast news work [1]. The idea has been developed over the years using different techniques, which can be summarized under three main categories: document-pivot approaches, feature-pivot approaches, and probabilistic approaches [2].
The remainder of the paper is structured as follows. Section 2 reviews the document-pivot approach. Section 3 provides an overview of the feature-pivot approach. Section 4 presents the probabilistic topic modelling approaches. Section 5 shows the application of topic detection methods on Arabic texts. Finally, in Section 6 we conclude the paper and provide our remarks.
2. Document-pivot approaches
The document-pivot approach is based on clustering documents about the same topic together based on document similarity. Different clustering methods have been used: in [3], hierarchical agglomerative clustering with time decay was used to identify events in news. The time decay feature helps cluster posts about the same event and detect a new event when one happens. The same method is applied in [4] to detect topics within financial news. Incremental clustering is widely used, especially in detecting topics from social media. The idea of the incremental clustering method is that if a received item exceeds the similarity threshold to existing clusters, it is added to the most similar one; otherwise a new cluster is created with this item [5]. Incremental clustering is well suited to online topic detection because the number of clusters does not need to be known in advance.
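As a minimal illustration of the incremental clustering idea described above (a sketch, not the exact algorithm of any cited system; the similarity threshold and the simple running-mean centroid update are assumptions made for the example):

```python
import numpy as np

def incremental_clustering(items, threshold=0.5):
    """Assign each incoming item (a feature vector) to the most similar
    existing cluster, or open a new cluster when no cosine similarity
    reaches the threshold."""
    centroids = []    # one running centroid vector per cluster
    assignments = []  # cluster index chosen for each item, in arrival order
    for x in items:
        x = np.asarray(x, dtype=float)
        if centroids:
            sims = [float(np.dot(x, c) / (np.linalg.norm(x) * np.linalg.norm(c)))
                    for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                # merge into the most similar cluster and update its centroid
                centroids[best] = (centroids[best] + x) / 2.0
                assignments.append(best)
                continue
        # no cluster is similar enough: start a new one
        centroids.append(x)
        assignments.append(len(centroids) - 1)
    return assignments, centroids
```

Note that, unlike k-means, the number of clusters grows with the stream, which is exactly why this family of methods fits online detection.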
With the rapid growth of social media, Twitter needed different handling due to the nature of its short posts. An enhancement to incremental clustering for online event detection from Twitter is presented in [6]. The enhancement is based on representing the centroid of the cluster as a feature vector, so when a new tweet arrives, a pairwise similarity between the features of the tweet and the centroid is computed. The weights of the features are updated in the tweet vector and in the centroid vector of the active cluster. They called this method the Incremental Clustering Vector Expansion (ICVE) method. Their experiments showed a significant improvement in terms of precision, recall, and F-measure compared to the traditional incremental clustering method.
Tackling the area of online topic detection from social media, 'TwitterNews+' is presented in [7]. Although the incremental clustering method has lower computational complexity than other methods, when applied online this cost can still be high. The system reduces it by discarding old clusters after a time threshold to make room for new ones; a time stamp is assigned to each cluster based on the time stamps of the tweets in it. The system achieved a recall of 0.96 and a precision of 0.89 against state-of-the-art techniques, including an older version of the system, 'TwitterNews' [8], and the first story detection method in [9].
3. Feature-pivot method
This approach relies on statistical methods to extract a set of terms representing the topics. It uses the similarity and co-occurrence of terms to detect a topic. This method has been adopted in much of the research on topic detection from Twitter, as it suits the limited number of words in a tweet. In [10], emerging topics were detected by considering the posting time of a tweet and its growth/decay in a certain time interval; the author of the tweet was also considered as a feature to better cluster tweets related to the same topic together. 'TwitterMonitor', presented in [11], detects bursty terms that suddenly appear with high frequency, indicating a new topic. It clusters those terms based on their probabilistic co-occurrence to identify the topics. Additional information from the tweets, such as geo-location and news sources, is used in a post-processing phase to better visualize the results.
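As a rough illustration of bursty-term detection (a simplified sketch, not the actual TwitterMonitor algorithm; the burst ratio and minimum-count thresholds are assumptions), a term can be flagged as bursty when its frequency in the current time window far exceeds its historical average:

```python
from collections import Counter

def bursty_terms(current_window, history_windows, ratio=3.0, min_count=5):
    """Flag terms whose count in the current window exceeds `ratio` times
    their average count over past windows. Each window is a list of
    tokenized posts (each post a list of terms)."""
    now = Counter(t for post in current_window for t in post)
    past = Counter(t for w in history_windows for post in w for t in post)
    n_past = max(len(history_windows), 1)
    burst = []
    for term, count in now.items():
        avg_past = past[term] / n_past
        # require a minimum absolute count so rare noise is not flagged
        if count >= min_count and count > ratio * max(avg_past, 1.0):
            burst.append(term)
    return burst
```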
Four approaches based on the feature-pivot approach are presented in [12] and compared to document-pivot as a baseline. The first approach is the graph-based feature-pivot method, which uses the structural clustering algorithm for networks (SCAN) [13] to group terms together. The algorithm groups nodes that share similarities into what is called a community; a node connected to more than one community is called a hub. The nodes of the graph are the terms, and the communities represent the topics. By detecting the hubs, related topics can be grouped based on the number of hubs connecting them. The second algorithm presented is frequent pattern mining (FPM), which relies on
pairwise co-occurrence between unigrams. The third algorithm is Soft FPM (SFPM), which extends FPM to group sets of co-occurring unigrams, not just pairs. The fourth, called BNgram, considers n-gram co-occurrences. All the algorithms use a modified version of term frequency-inverse document frequency (TF-IDF) that considers the change of frequency over time. All algorithms were applied to three datasets collected from Twitter during three major events about sports, politics, and a social event in the USA in 2012, and were evaluated using topical recall, keyword precision, and keyword recall measures. The results showed that BNgram and the document-pivot method achieved the highest topical recall score of 0.7692, followed by SFPM with a score of 0.615. Regarding keyword precision, FPM achieved a precision score of 1 on one of the datasets and 0.75 on another, but zero on the third dataset. For keyword recall, SFPM achieved a recall score of 0.8982. It is worth noting that the superiority of one algorithm is not consistent across the three datasets. This is related to the nature of the targeted events, as the structure and coherence of topics differ from one domain to another.
A comparison between the feature-pivot and document-pivot approaches was presented in [14]. The feature-pivot approach is based on grouping co-occurring unigrams according to their proportional frequency in the set of tweets they occur in and their frequency in the whole dataset, while the document-pivot approach used the bisecting k-means clustering technique. Both approaches were applied to an Egyptian Twitter dataset; the feature-pivot approach achieved an F-score of 0.923 compared to 0.8 achieved by document-pivot. Validating the results on datasets of different sizes from three domains (sports, entertainment, and politics), the average F-score of the feature-pivot approach was 0.83 compared to 0.56 for the document-pivot approach.
For online event detection, feature-pivot approaches do not work as well in new event detection: in [7], 'TwitterNews+', which is based on the document-pivot approach, achieved a recall score of 0.96 and a precision score of 0.89, while two feature-pivot based approaches presented in [15] and [16] achieved recall scores of 0.58 and 0.71, and precision scores of 0.55 and 0.64, respectively.
4. Probabilistic topic modelling approaches

Topic modeling is an area of research that focuses on classifying data into groups [17]. It is prominent for modeling discrete data and gives a productive approach to finding hidden structures in huge data [18]. The way topic modeling works can be simply described as treating each document as a mixture of topics with different probabilities; mathematical formulas are applied to find the probability of each topic in each document.
The idea of topic modeling can be traced back to the 1990s, when Latent Semantic Analysis (LSA), formerly known as latent semantic indexing (LSI), was presented as a novel approach for retrieving documents not just by the occurrence of query words in them but based on the conceptual content these words imply [19]. Since then, clustering textual data based on content has been a growing area of research, improving each day. In 1999, probabilistic latent semantic analysis (PLSA) was presented in [20] and was a great contribution towards the enhancement of topic modeling approaches. In 2003 the famous Latent Dirichlet Allocation (LDA) approach was presented in [21]. LDA is a generative probabilistic model of a corpus; the main idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. The major advantage of LDA over LSA and PLSA is its ability to perform dimensionality reduction and to be embedded in more complex methods.
Many variants of topic modeling techniques have appeared since then, yielding approaches that can be applied in various domains, offering a better understanding of large data and more coherent resulting topics. In this section we present the well-known probabilistic topic modelling techniques, Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Then significant recent works, between 2017 and 2018, investigating variations of LDA are presented.
Before presenting PLSA, we give a brief background on LSA to provide more insight into why PLSA was introduced. LSA is a topic modeling technique in natural language processing. Its main objective is to create a vector-based representation of texts that helps group related words [22]. A detailed review of LSA in [23] explains the technical aspects of the approach. The mathematical foundation of Latent Semantic Analysis is the Vector Space
Model (VSM), an algebraic model for representing documents as vectors in a space where dictionary terms are used as dimensions. Using matrix notation, VSM represents a collection of d documents (a corpus) in a space of t dictionary terms as the t × d matrix X. Two main term reduction techniques are applied to matrix X: stop word removal and stemming. TF-IDF is one of the most common techniques used for weighting the entries of the matrix. The similarity between terms in documents is calculated using different similarity metrics, one of the most common being cosine similarity. The main contribution of LSA is that it considers not only the similarity of terms in documents but also takes related terms into consideration to give more insight into the topic. The LSA approach was compared in [19] against a straightforward term-matching method, and the precision of LSA in detecting relevant documents outperformed the other approach by 13%.
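A minimal LSA sketch, assuming scikit-learn for the TF-IDF weighting and truncated SVD (the toy documents and the choice of two latent dimensions are illustrative only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat and a dog played",
    "stock markets fell sharply today",
    "investors sold stock as markets dropped",
]

# TF-IDF weighting of the term-document structure (sklearn builds d x t)
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Truncated SVD projects documents into a low-dimensional latent space
lsa = TruncatedSVD(n_components=2, random_state=0)
Z = lsa.fit_transform(X)

# Documents about the same theme end up closer in the latent space
sims = cosine_similarity(Z)
```

Here the two financial documents (rows 2 and 3) come out far more similar to each other than either is to the animal-themed documents, even though they share few exact terms.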
LSA is used in many applications such as online customer support, spam filtering, and text summarization. Among the drawbacks of LSA are its high computational and memory costs, as well as the difficulty of determining the optimal number of dimensions to use for singular value decomposition (SVD). An application of LSA to the 2016 presidential debates in the USA is investigated in [24]; it could capture the policy adopted by each candidate and the change in topics based on people's reactions.
PLSA was first introduced in 1999 [20]. This approach has important theoretical advantages over standard LSA, since it is based on the likelihood principle, defines a generative data model, and directly minimizes word perplexity. It can also take advantage of standard statistical methods for model fitting, overfitting control, and model combination. The core of PLSA is a statistical model called the aspect model. It is a latent variable model for general co-occurrence data which associates an unobserved class variable z ∈ {z_1, …, z_K} with each occurrence of a word w ∈ {w_1, …, w_M} in a document d ∈ {d_1, …, d_N}.
The Expectation Maximization (EM) algorithm is commonly used for maximum likelihood estimation in latent variable models. It has two main steps:
• The expectation step (E), where posterior probabilities are calculated for the latent variable z based on the current estimates of the parameters.
• The maximization step (M), where the parameters are updated given the posterior probabilities calculated in the (E) step.
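The two EM steps for the PLSA aspect model can be sketched in NumPy as follows. This is a didactic dense implementation (real corpora need sparse handling); the iteration count, random initialization, and the small smoothing constant that avoids division by zero are assumptions made for the sketch:

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """EM for the PLSA aspect model on an (n_docs x n_words) count matrix.
    Returns P(z|d), shape (n_docs, n_topics), and P(w|z), shape (n_topics, n_words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) for every (document, word) pair
        post = p_z_d[:, :, None] * p_w_z[None, :, :]          # shape (d, z, w)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate parameters from expected counts n(d,w)*P(z|d,w)
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

On a small block-structured count matrix, the recovered P(z|d) separates the two groups of documents into distinct dominant topics.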
Regarding word perplexity, both approaches were applied to more than one dataset and their performance was compared to a unigram baseline. PLSA could reduce perplexity by more than a factor of 3, while LSA achieved less than a factor of two.
PLSA suffers from drawbacks related to its number of parameters and to discovering new topics when the data increases significantly. A recent approach addressing the problem of discovering new topics from large textual data is presented in [25]: the Weighted Incremental PLSA (WPLSA) algorithm. The authors compared their approach to standard PLSA, maximum a posteriori PLSA (MAP-PLSA), and Quasi-Bayes PLSA (QB-PLSA). Using the perplexity measure, WPLSA achieved lower values than the other approaches across different numbers of topics.
The idea of latent Dirichlet allocation (LDA) continues from PLSA, trying to solve the model's disadvantages regarding the number of parameters, which grows linearly with the size of the corpus, and the difficulty of assigning a probability to a new document outside the trained corpus. LDA builds on the fundamental probabilistic feature both LSA and PLSA were built upon, the "bag of words" assumption: in the bag-of-words model, the order of words is not taken into consideration, applying the exchangeability property of words. The LDA approach was presented for the first time in [21]. It is defined as a generative probabilistic model of a corpus; the basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words.
The model assumes the following generative process for each document w in a corpus D:
1. Choose N ~ Poisson(ξ)
2. Choose θ ~ Dir(α)
3. For each of the N words w_n:
− Choose a topic z_n ~ Multinomial(θ)
− Choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n
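The generative process above can be simulated directly. In this sketch the number of topics, vocabulary size, priors, and the randomly drawn per-topic word distributions β are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 3, 8                     # number of topics, vocabulary size
alpha, xi = 0.5, 10             # Dirichlet prior, Poisson mean document length
beta = rng.dirichlet(np.ones(V), size=K)   # per-topic word distributions

def generate_document():
    N = rng.poisson(xi)                        # 1. N ~ Poisson(xi)
    theta = rng.dirichlet(alpha * np.ones(K))  # 2. theta ~ Dir(alpha)
    words = []
    for _ in range(N):                         # 3. for each of the N words
        z = rng.choice(K, p=theta)             #    z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])           #    w_n ~ p(w | z_n, beta)
        words.append(int(w))
    return words

doc = generate_document()
```

Inference in LDA inverts this process: given only the observed words, it estimates θ and β.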
Comparing LDA to PLSA and the mixture of unigrams model in [21], the perplexity of LDA at 100 topics was lower than that of PLSA by 16% and lower than the mixture of unigrams model by 45% on one dataset. It was also shown that the other approaches did not perform well when presented with new documents different from those in the training set. Regarding the application of LDA on social media, especially Twitter, the feature-pivot based approaches in [12] achieved better topical recall, keyword recall, and keyword precision than LDA over three datasets; on one of the datasets LDA scored zero on all three evaluation measures. In [7], LDA achieved a recall score of 0.45 and a precision score of 0.49, while feature-pivot approaches and 'TwitterNews', a document-pivot based approach, achieved significantly better results for online event detection from Twitter.
With the variation of digital content, it became more intuitive to look beyond just the words in the documents, and researchers started to augment documents with associated auxiliary data for better topic quality. A supervised approach tackling the idea that different contexts hold different sentiments for the same word, which implies different topic features, discriminatively objective-subjective LDA (dosLDA), is proposed in [26]. The basic idea their approach is built on is the Bag of Discriminative Words (BoDW). They elaborated with an example: the same word, such as "bug", can appear in scientific research where it means an insect and carries a neutral or perhaps positive sentiment, but it can also appear in a software document meaning a "problem", which holds a negative sentiment. Taking this into consideration helps the resulting topics be of better quality. They applied their model to different datasets, including Twitter, Flickr, and a multi-domain sentiment dataset. They defined the subjective part as the sentiment (positive, neutral, and negative) and the objective part as the categories of documents, and showed that the BoDW representation is more predictive than the Bag of Topics (BoT) [27] representation used for discriminative tasks.
Probabilistic topic modeling is mostly based on a three-layer hierarchical Bayesian structure [21], where each document is modelled as a probability distribution over topics and each topic is a probability distribution over words. A new approach suggests adding a latent concept layer between the topic layer and the word layer, so that each topic is a probability distribution over concepts and each concept is a probability distribution over words. The unsupervised model Conceptual LDA (CLDA) and the supervised model Conceptual Labelled LDA (CLLDA) are presented in [28]. A concept knowledge base tool called Probase [29] is used to get the probability distribution of each concept over words, and Gibbs sampling is used in the estimation of the parameters. The models were applied to different datasets, compared to the traditional Labeled LDA (LLDA) [30] model, and showed better results.
5. Topic detection on Arabic texts

Although the topic detection area of research has been actively evolving over the past years, we can hardly find solid applications of topic modelling approaches to Arabic text. The amount of research on Arabic is falling behind the increase of Arabic content on the web [31], mainly due to the different dialects used over the internet, the decreasing use of standard Arabic in social media and other blogs, and the lack of annotated corpora for categorization. In this section we present the most significant works related to topic modelling and topic detection in the Arabic language from 2011 to 2018.
In 2011, LDA was applied to real-world Arabic datasets collected by the authors in [31], based on 'Echorouk', 'Reuters', and 'Xinhua' web articles between 2008 and 2009. The research focused on investigating the effect of different stemmers on the results of topic modelling; the experiments showed that proper use of stemming yields better topics. Another application of LDA is presented in [32] to discover the thematic structure of the Quran. Each chapter of the Quran is considered a document, and LDA was applied to extract the topics in each chapter. The algorithm was able to identify the major topics in each chapter of the Quran, characterized by the distinct themes of the Makki and Madani chapters.
A comparative study between LDA and the k-means clustering technique is presented in [33]. The objective of the research was to compare the influence of the morpho-syntactic characteristics of the Arabic language in the pre-processing phase on the performance of topic detection using both approaches. They used an Arabic corpus called "OSAC" (open
source Arabic corpus) collected from multiple sites, consisting of 11 topics. Their results showed that LDA performed better when applied to raw documents. After applying stop word removal, both approaches yielded better results. However, when a stemmer was applied, ambiguity in the words increased, resulting in performance degradation. The authors related this degradation to the parameters used in the pre-processing phase and suggested including other parameters, such as lemmatization, to enhance the performance.
Recently, tackling Arabic dialects and their challenges, a topic and sentiment model was applied to Colloquial (Maghrebi) Arabic in [34]. The corpus used was collected from Facebook pages; a supervised approach was applied to extract the sentiment, then LDA was applied to extract the topics. They proposed a new semi-supervised approach combining topic and sentiment to join each topic to a specific sentiment. A sentiment layer between the document layer and the topic layer was added, where sentiment labels are associated with documents, topics are associated with sentiment labels, and words are associated with both sentiment labels and topics. The results were promising, yet some challenges need to be addressed. One challenge is the lack of a lexicon for the dialect, so the authors had to build their own, which was not large enough. Another is that the stemmer did not perform well with the dialect, so more improvement is needed.
In 2018, an approach combining k-means and topic modeling was presented in [35]. They used a Modern Standard Arabic news dataset composed of 2700 documents over 9 topics. The k-means algorithm is applied to cluster the documents; to cluster topics, LDA is applied before the clustering algorithm. The mean-normalized vectors of the data act as input to the LDA algorithm. After that, the output probabilities of topics in documents are mean-normalized and k-means clustering is applied. In this manner the feature dimensionality is reduced thanks to LDA. The model was evaluated by applying it to different datasets: "Aljazeera", "Alkhaleej", "Alwatan", "BBC", and "CNN". The results showed that the combined approach could detect topics with an F-score of 0.8163, while k-means alone scored 0.551.
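A pipeline in this spirit (LDA for dimensionality reduction followed by k-means) can be sketched with scikit-learn. The toy documents are illustrative, and simple L2 normalization stands in for the mean normalization used in the paper:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

docs = [
    "parliament passed the new election law",
    "the minister discussed the election results",
    "the team won the football match",
    "the striker scored in the final match",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)

# LDA reduces each document to a low-dimensional topic distribution
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Normalize the topic proportions, then cluster documents with k-means
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    normalize(doc_topics))
```

The key point is that k-means operates on a dense, low-dimensional topic space instead of the sparse, high-dimensional term space.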
In the domain of Arabic social media, we can summarize the significant works in this area. For detecting events in Arabic social media, the research presented in [36] tackles the detection of disruptive events from Twitter; their model is based on the co-occurrence of terms over time. Another work in [37] presented an end-to-end event detection framework comprising six main components: data collection, pre-processing, classification, feature selection, topic clustering, and summarization. A dataset of over 16 million Arabic Twitter messages was used. They focused on the temporal, spatial, and textual features of each cluster to help detect the event. They compared their results to LDA and showed that LDA could not perform well on tweets as short messages. Considering the detection of bursty features from Arabic Twitter, [38] investigated a new technique based on TF-IDF, entropy, and stream chunking. They collected tweets from Egypt using the Twitter API between December 26th, 2015 and May 20th, 2016. The results were compared against known events that occurred on the days of streaming and showed that the technique could capture the bursty terms related to events happening in real life. A comparison of different clustering techniques in the application of the document-pivot approach to Egyptian Twitter datasets is presented in [39]. The results showed that bisecting k-means performed better than other methods such as agglomerative and traditional k-means.
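Bisecting k-means, the best performer reported in [39], repeatedly splits a cluster with 2-means until the target number of clusters is reached; a minimal sketch (splitting the largest cluster at each step is one common variant, assumed here):

```python
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, k, seed=0):
    """Split the largest cluster with 2-means until k clusters exist.
    Returns an integer label per row of X."""
    clusters = [np.arange(len(X))]          # start with one cluster of all rows
    while len(clusters) < k:
        # pick the largest cluster and bisect it with standard 2-means
        biggest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(biggest)
        halves = KMeans(n_clusters=2, n_init=10,
                        random_state=seed).fit_predict(X[idx])
        clusters.append(idx[halves == 0])
        clusters.append(idx[halves == 1])
    labels = np.empty(len(X), dtype=int)
    for c, idx in enumerate(clusters):
        labels[idx] = c
    return labels
```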
6. Conclusion

This paper reviewed three main approaches in the topic detection area of research: document-pivot, feature-pivot, and probabilistic topic modeling. The performance of each approach relied heavily on the datasets used. The feature-pivot approach appeared more suitable for offline Twitter and short-message datasets than the document-pivot approach, while for online topic detection from Twitter, incremental clustering as a document-pivot approach performed better. The traditional LDA method could not perform well on social media, yet variants of it such as dosLDA [26] and CLDA [28] achieved significant improvements. Applying topic detection techniques to Arabic text still needs more research; traditional methods have been applied, yet more improvement is on the way. Probabilistic topic modelling methods are an active area of research, and new algorithms are presented each day to enhance the results of topics detected from large textual corpora. Future work includes applying word embeddings to enhance the capture of related topics based on their content. Combining LDA and word2vec representations for topic detection can be found in [40]. Dynamic embedding, presented in [41], formulates word embeddings with conditional probabilistic models. To enhance the results of topic detection for short texts, auxiliary word
embeddings are used in [42]. This paper provided a summary of different methods for topic detection and the recent research in this area, to serve as guidance for researchers interested in this domain.
References
[1] J. Allan, “Introduction to Topic Detection and Tracking,” in Topic Detection and Tracking, vol. 12, J. Allan,
Ed. Boston, MA: Springer US, 2002, pp. 1–16.
[2] N. Alkhamees and M. Fasli, “Event detection from social network streams using frequent pattern mining with
dynamic support values,” in 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 2016, pp. 1670–1679.
[3] X. Dai and Y. Sun, “Event identification within news topics,” in 2010 International Conference
on Intelligent Computing and Integrated Systems, Guilin, China, 2010, pp. 498–502.
[4] X.-Y. Dai, Q.-C. Chen, X.-L. Wang, and J. Xu, “Online topic detection and tracking of financial news based
on hierarchical clustering,” in 2010 International Conference on Machine Learning and Cybernetics,
Qingdao, China, 2010, pp. 3341–3346.
[5] H. Becker, M. Naaman, and L. Gravano, “Beyond Trending Topics: Real-World Event Identification on
Twitter,” 2011.
[6] O. Ozdikis, P. Karagoz, and H. Oğuztüzün, “Incremental clustering with vector expansion for online event
detection in microblogs,” Social Network Analysis and Mining, vol. 7, no. 1, Dec. 2017.
[7] M. Hasan, M. A. Orgun, and R. Schwitter, “Real-time event detection from the Twitter data stream using the
TwitterNews+ Framework,” Information Processing & Management, Mar. 2018.
[8] M. Hasan, M. A. Orgun, and R. Schwitter, “TwitterNews: Real time event detection from the Twitter data
stream,” PeerJ PrePrints, vol. 4, 2016.
[9] S. Petrovic, M. Osborne, and V. Lavrenko, “Streaming First Story Detection with application to Twitter,”
presented at the Human language technologies: The 2010 annual conference of the north american chapter of
the association for computational linguistics, 2010, pp. 181–189.
[10] M. Cataldi, L. Di Caro, and C. Schifanella, “Emerging topic detection on Twitter based on temporal and
social terms evaluation,” in Proceedings of the Tenth International Workshop on Multimedia Data Mining -
MDMKDD ’10, Washington, D.C., 2010, pp. 1–10.
[11] M. Mathioudakis and N. Koudas, “TwitterMonitor: trend detection over the twitter stream,” in Proceedings of
the 2010 international conference on Management of data - SIGMOD ’10, Indianapolis, Indiana, USA, 2010,
p. 1155.
[12] L. M. Aiello et al., “Sensing Trending Topics in Twitter,” IEEE Transactions on Multimedia, vol. 15, no. 6,
pp. 1268–1282, Oct. 2013.
[13] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, “SCAN: a structural clustering algorithm for networks,” in
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining -
KDD ’07, San Jose, California, USA, 2007, p. 824.
[14] N. A. Mostafa, “Trending Topic Extraction from Social Media,” American University in Cairo, Egypt, 2016. http://dar.aucegypt.edu/bitstream/handle/10526/4691/Trending_Topic_Extraction_from_Social_Media_Nada_Ayman.pdf?sequence=1
[15] A. Guille and C. Favre, “Event detection, tracking, and visualization in Twitter: a mention-anomaly-based
approach,” Social Network Analysis and Mining, vol. 5, no. 1, Dec. 2015.
[16] J. Benhardus and J. Kalita, “Streaming trend detection in Twitter,” International Journal of Web Based
Communities, vol. 9, no. 1, p. 122, 2013.
[17] C. H. V. R. Padmaja, S. Lakshmi Narayana, and C. H. Divakar, “Probabilistic topic modeling and its variants – a survey,” International Journal of Advanced Research in Computer Science, vol. 9, no. 3, pp. 173–177, Jun. 2018.
[18] H. Jelodar, Y. Wang, C. Yuan, and X. Feng, “Latent Dirichlet Allocation (LDA) and Topic modeling: models,
applications, a survey,” arXiv:1711.04305 [cs], Nov. 2017.
[19] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391–407, 1990.
[20] T. Hofmann, “Probabilistic Latent Semantic Analysis,” Proceedings of the Fifteenth conference on