Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2103–2109

Marseille, 11–16 May 2020

© European Language Resources Association (ELRA), licensed under CC-BY-NC

Event Extraction from Unstructured Amharic Text

Ephrem Tadesse, Rosa Tsegaye Aga, Kuulaa Qaqqabaa
Jimma University, Armauer Hansen Research Institute, Addis Ababa Science and Technology University
Jimma, Addis Ababa, Akaki Kality Sub-City
ephe11ta@gmail.com, rosatsegaye@gmail.com, kuulaa@gmail.com

Abstract
In information extraction, event extraction is the task of extracting specific knowledge about certain incidents from texts. Event extraction has been done for texts in various languages, but not for Amharic, a Semitic language. In this study, we present a system that extracts events from unstructured Amharic text. The system was designed by integrating supervised machine learning and rule-based approaches; we call it a hybrid system. It uses supervised machine learning to detect events in the text and handcrafted rules to extract those events. For event extraction, event arguments are used. Event arguments identify the event-triggering words or phrases that clearly express the occurrence of the event. Event argument attributes can be verbs, nouns, sometimes adjectives (such as ˜rg/wedding), and time expressions. The hybrid system was compared with the standalone rule-based method, which is well known for event extraction. The study shows that the hybrid system outperformed the standalone rule-based method.

Keywords: Event extraction, under-resourced language, Machine learning algorithms, Nominal events.

1. Introduction

Amharic is a Semitic language, related to Hebrew, Arabic, and Syriac. With around 27 million speakers (Mulugeta and Gasser, 2012), primarily in Ethiopia, it is the second most spoken Semitic language after Arabic. It is currently the official language of government in Ethiopia, and has been since the 13th century. In addition, it is the medium of instruction in primary and secondary schools as well as the source language for a large body of historical texts. As a result, most documents in the country have been produced in Amharic, and there has been an enormous production of electronic and online accessible Amharic documents.

The predominant problem of under-resourced languages is the lack of resources (Sohail and Elahi, 2018). Even today, relatively few Amharic textual resources are available online for people in their everyday lives, and researchers and other interested groups in linguistic and computing disciplines face difficulties because Amharic presents sophisticated language-specific issues. The existing information extraction systems developed for Hebrew, Arabic, or other languages do not capture the linguistic structure and morphological richness of Amharic, in which events are predominantly expressed through verbs and nouns. Therefore, these systems cannot be used directly for Amharic texts.

For example, consider the sentence "˜~ Œs¼m 1965 ‚tÓÍÑ ¤wµt ¶g› ¶¤r." / "Ethiopia was in turmoil on Monday, September 1965". In this sentence, "¤wµt" and "¶g›" refer to an event, whereas the phrase "˜~ Œs¼m 1965" is a time argument that indicates when the event happened. The word "‚tÓÍÑ" refers to the named entity or participant of the event.

Because extracting events from unstructured Amharic text is significant for high-level Natural Language Processing (NLP) tasks, we are interested in tackling this problem. In this study, we present a comprehensive technique for extracting events from unstructured Amharic text.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 explains the methodology of the study; it motivates and elaborates the event extraction models and algorithms used. Section 4 describes the model evaluation setup. Section 5 presents the experimental results, with a discussion and comparison of the results of the proposed models. Section 6 concludes the study and outlines future work.

2. Related Work

Event extraction has recently gained popularity due to its wide applicability in various NLP applications. Most event extraction systems support English and European language texts from different domains using a variety of techniques. Nowadays, Semitic languages are also a topic of interest for researchers. Event extraction for Amharic has not been done yet; therefore, this study is the first in this particular information extraction (IE) application. Due to differences in language structure, the existing techniques and tools applied to other languages cannot be directly used for this task.

Some progressive work has been done so far on Amharic NLP tasks with promising results, including part-of-speech tagging, morphological analysis, named entity recognition, base phrase chunking and text classification (Adafre, 2005; Ibrahim and Assabie, 2014; Sikdar and Gambäck, 2018; lasker et al., 2007). Various techniques have been employed for each task to enhance accuracy and to handle linguistic exceptions. However, there have not been ready-made components and well
organized datasets. Besides these limitations, there has been no ongoing research on event extraction from unstructured Amharic text, owing to difficulties with the syntactic and semantic status of the class of functional verbs. Another challenge is identifying event arguments. In our case, temporal event arguments are considered, and these are challenging in Amharic texts: they are represented in various forms, such as sequences of words, Arabic numerals and Ge'ez script numerals. As such, extra normalization and syntactic analysis are needed to handle temporal arguments.

Semitic languages like Arabic, Hebrew and Amharic have much more complex morphology than English, and this morphological variation limits research progress in natural language processing in general. However, there are studies on other Semitic languages. For example, Al-Smadi and Qawasmeh (2016) studied automatic event extraction for Arabic using a knowledge-driven approach that concentrates on tagging event trigger instances and related entities. One of their main contributions is linking events with entity mentions. In our case, however, we mainly concentrate on extracting events and their arguments using handcrafted rules and machine learning classifiers.

Hindi is another under-resourced language, an Indo-European language that shares a number of words with Arabic. Ramrakhiyani and Majumder (2015) focused solely on temporal expression recognition in Hindi using interactive handcrafted rules. They pursue two basic goals: identifying temporal expressions in plain text and classifying the identified temporal expressions. However, extracting events together with their corresponding arguments has the further advantage of easing the chronological ordering of events by their occurrence. In addition, it can be extended to event argument relationship extraction tasks.

Smadi and Qawasmeh (2018) proposed a supervised machine learning approach for extracting events from Arabic tweets. The study mainly focuses on four tasks: event trigger extraction, event time expression extraction, event type identification, and temporal resolution for ontology population. Significant scores were reported for the tasks covered, including F1 = 92.6 for event trigger extraction (T1), F1 = 92.8 for event time expression extraction (T2) and accuracy = 80.1 for event type identification (T3). They claim that their results on the third task are better than those of previous work using similar techniques such as document-term matrices or bag-of-words.

Arnulphy et al. (2015) also proposed a supervised machine learning approach, but to detect French and English TimeML (Time Markup Language) events. The study suggested combining different supervised machine learning algorithms, such as conditional random fields, decision trees and k-nearest neighbors, together with language models.

Al-Smadi and Qawasmeh (2016) proposed a knowledge-based approach for event extraction from Arabic tweets. Their study covers three subtasks: event trigger extraction, event time extraction and event type identification. The event expression includes important event arguments: event agent, event location, event trigger, event target, event product and event time. The data used in their study were collected with the Twitter streaming API and preprocessed with the Java-based AraNLP package. Moreover, after event extraction, visualization services such as a calendar and a timeline were supplied with the help of ontological knowledge bases. Their experimental results show an accuracy of 75.9 for event trigger extraction (T1), 87.5 for event time extraction (T2) and 97.7 for event type identification (T3). They claim that applying this kind of domain-dependent approach to extract events from tweets yields significant results.

In general, there has been a lot of work on event extraction in European languages, predominantly English (Arnulphy et al., 2015; Tourille et al., 2017), but much less research in other languages. An approach or technique used to extract events in one language might also be used in other languages if they have similar grammars and character sets. However, if languages have very different grammars or very different written representations, it is difficult to reuse such approaches or techniques to extract events.

There has been research on part-of-speech tagging of Amharic text (Adafre, 2005) and on Amharic morphology (Mulugeta and Gasser, 2012), which is helpful for event detection but not directly related to the actual event extraction task from Amharic text. For this task, state-of-the-art event detection systems typically use robust machine learning techniques; see, for example, Arnulphy et al. (2015). Because of the lack of sufficient labeled training data for Amharic, we bootstrap an event extractor using a rule-based algorithm.

3. Methodology

According to (Frederik Hogenboom and Kaymak, 2016), event extraction techniques can be evaluated along a set of qualitative dimensions: the amount of required data, knowledge and expertise, the interpretability of the results, and the required development and execution times. In this study, supervised machine learning techniques, handcrafted rules and hybrid techniques are employed to detect and extract events and their arguments from unstructured text. Our focus has been extracting events and event arguments from unstructured Amharic text. Event arguments include the identification of event trigger words, and in unstructured Amharic text nominal events become ambiguous: such events can be arguments of other events, and they are often hard to identify.

3.1. Dataset preparation

Unlike many other languages, Amharic does not have any standardized, publicly available annotated corpora like the Treebank (https://catalog.ldc.upenn.edu/LDC99T42) and PropBank (http://www.nltk.org/howto/propbank.html) for English. The news domain is a preferable data source because it is publicly available and contains a rich source of information that helps
for any NLP application, such as entity extraction, event and temporal information extraction, and co-reference resolution. In this study, we built our own dataset by scraping top local websites, namely Zehabesha (http://www.zehabesha.com/amharic/), Satenaw (https://www.satenaw.com/amharic/) and Ezega (https://www.ezega.com/News/am/), and one international website, BBC Amharic (https://www.bbc.com/amharic), all of which contain relevant unstructured Amharic text. The Python Beautiful Soup library (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) was used to scrape the sites. The scraped texts come from all domains, such as economy, politics, technology and sport. Simple regular expressions were used to retrieve only the relevant text content. A total of 659,848 words were extracted. Along with our own dataset, we used the Amharic corpus prepared by the Ethiopian Languages Research Center of Addis Ababa University in a project called the annotation of Amharic news documents (Demeke and Getachew, 2006). The project manually tagged each Amharic word in its context with the most appropriate part-of-speech tag. The corpus contains 210,000 words collected from 1,065 Amharic news documents of the Walta Information Center (Demeke and Getachew, 2006), a private news and information service provider located in Addis Ababa, Ethiopia.

3.2. Data preprocessing

In this step, the data are converted into the format required for the respective information extraction process. The scraped texts contain a lot of junk, such as markup tags and other special characters, so the first step in our study is raw text preprocessing: cleaning unwanted junk, sentence splitting, tokenizing, word stemming, character normalization, stop word removal and part-of-speech (POS) tagging. Unlike many other languages, Amharic is morphologically rich and has complicated syntactic features, which makes analyzing the morphological features of tokens during preprocessing cumbersome. The sentence splitter splits text on the Amharic sentence delimiters ( ~ ; ? !).

Amharic has different characters with the same meaning and pronunciation. These characters should be treated as equivalent, because the meaning does not change regardless of which variant is written. For example, in (€,K,ƒ), (˜, P), (a, A, €) and (Ð, Ø), the characters of each group have the same meaning (Gasser, 2011). We therefore developed a character normalizer that maps these characters to a single canonical form, which helps the performance of our system.

The other preprocessing task is stop word removal. Like other languages, Amharic has its own stop words, such as conjunctions, articles and prepositions. We adopted the stop word list used in (Tsedalu, 2010) and, in addition, built our own stop word list with the help of linguistic experts. In total, 235 stop words were identified.

Another important preprocessing task is analyzing Amharic verb morphology to identify the lemmas of words and their derivation. The lemma of a word is a crucial feature for the classifier. We applied HornMorpho (https://www.cs.indiana.edu/~gasser/HLTD11/), a system for processing the morphology of Amharic that also works for other Ethiopian languages such as Afaan Oromo and Tigrinya. However, the system misses some unique and compound words, so we developed our own dictionary of exceptions (a gazetteer) to handle such keywords. Extracting only the lemma from the HornMorpho output has other difficulties, because sometimes the output for a word does not contain full information: HornMorpho then skips the subject, object, grammatical features or word class of that word. For Amharic, HornMorpho was evaluated in (Gasser, 2011) using 200 randomly selected verbs and nouns/adjectives, comparing its output with manually analyzed Amharic verbs and nouns; the reported accuracy was 99% for verbs and 95.5% for nouns. We nevertheless use this tool because of the lack of other ready-made NLP components for Amharic. The Jython library (https://www.jython.org/) was used to integrate the Python-based Amharic morphological analyzer and obtain the morphological features of words.

Besides analyzing verb morphology, annotating the exact word class of each instance is also a required preprocessing task. For this we used TreeTagger (https://reckart.github.io/tt4j/), a publicly available, language-independent tool for annotating text with part-of-speech and lemma information. It has been successfully used to tag German, English, French, Italian, Danish, Swedish, Norwegian, Dutch, Spanish, Bulgarian and Coptic texts, and it can be adapted to other languages if a lexicon and a manually tagged training corpus are available (Schmid, 1994). It consists of two programs: a training program, which creates a parameter file from a full-form lexicon and a hand-tagged corpus, and a tagger program, which reads the parameter file and annotates text with part-of-speech and lemma information. To prepare a parameter file for TreeTagger, we used a total of 217,000 manually tagged Amharic words with 9 distinct word classes and the corresponding lemmas. We evaluated TreeTagger on 92,456 randomly selected untagged tokens; its output achieved 99.9% accuracy compared with manually tagged Amharic words.

The other crucial step in our preprocessing module is normalizing Amharic temporal arguments. Date and time expressions in Amharic are written in various representations, using Arabic numerals, Ge'ez numerals or alphanumeric characters. For example, the following sentences show the different date representations:

(€¨l ¤1995 A.m °Â†Ô ~ ) using Arabic numerals
(€¨l ¤] È°} Œµ È°¹ €mst A.m °Â†Ô ~ ) using alphanumeric characters
(€¨l ¤ |:C9CB5| A.m °Â†Ô ~ ) using Ge'ez numerals
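The preprocessing steps of Section 3.2 (sentence splitting, character normalization and converting Ge'ez numerals in date expressions to Arabic numerals) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: the delimiter set and the homophone groups are our reading of the garbled lists in the paper, and the converter only handles values below 10,000 (enough for years and days), leaving numerals with ፼ (ten thousand) untouched.

```python
import re

# ASSUMPTION: reading of the paper's garbled delimiter list: Ethiopic full stop
# (U+1362), Ethiopic semicolon (U+1364), Ethiopic question mark (U+1367), ? ! ;
DELIMS = "\u1362\u1364\u1367?!;"
SENTENCE = re.compile(f"[^{DELIMS}]+[{DELIMS}]?")

# ASSUMPTION: commonly cited homophone groups (ሀ ሐ ኀ), (ሰ ሠ), (አ ዐ), (ጸ ፀ).
# Each key is the first code point of a consonant row; its first 7 vowel
# orders are mapped onto the same orders of the canonical row.
ROW_MAP = {0x1210: 0x1200, 0x1280: 0x1200, 0x1220: 0x1230,
           0x12D0: 0x12A0, 0x1340: 0x1338}
CHAR_MAP = {src + k: dst + k for src, dst in ROW_MAP.items() for k in range(7)}

UNITS = {chr(0x1369 + i): i + 1 for i in range(9)}        # ፩..፱ -> 1..9
TENS = {chr(0x1372 + i): 10 * (i + 1) for i in range(9)}  # ፲..፺ -> 10..90
HUNDRED, TEN_THOUSAND = "\u137B", "\u137C"                # ፻, ፼
GEEZ_RUN = re.compile("[\u1369-\u137C]+")


def split_sentences(text):
    """Split on Amharic sentence delimiters, keeping each delimiter."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def normalize_chars(text):
    """Map homophone characters to one canonical character."""
    return text.translate(CHAR_MAP)


def geez_to_int(numeral):
    """Convert a Ge'ez numeral below 10,000, e.g. ፲፱፻፺፭ -> 1995."""
    hundreds = 0  # value already multiplied by ፻
    pair = 0      # running 1..99 value (tens plus units)
    for ch in numeral:
        if ch in UNITS:
            pair += UNITS[ch]
        elif ch in TENS:
            pair += TENS[ch]
        elif ch == HUNDRED:
            hundreds += (pair or 1) * 100  # a bare ፻ means 100
            pair = 0
        elif ch == TEN_THOUSAND:
            raise ValueError("numerals using ፼ are not handled in this sketch")
        else:
            raise ValueError(f"not a Ge'ez numeral: {ch!r}")
    return hundreds + pair


def normalize_numerals(text):
    """Replace runs of Ge'ez numerals with Arabic digits (skipping ፼ runs)."""
    def repl(m):
        run = m.group()
        return run if TEN_THOUSAND in run else str(geez_to_int(run))
    return GEEZ_RUN.sub(repl, text)


def preprocess(text):
    return [normalize_numerals(normalize_chars(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    # Made-up sample: spelling variants ሐገር/ሀገር and ሠላም/ሰላም, year ፲፱፻፺፭.
    print(preprocess("ሐገር ፲፱፻፺፭። ሠላም፧"))  # ['ሀገር 1995።', 'ሰላም፧']
```

Splitting before normalizing keeps the delimiter characters intact; the homophone table only touches letters, so the order of the two normalization steps does not matter here.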
The three example sentences above convey the same meaning with different representations. To handle the temporal arguments of an event, a normalization and conversion scheme converts these temporal representations into one form. Converting Ge'ez numerals to the uniform Arabic number system is not as straightforward as the other normalization tasks because of the irregularities of the Unicode values of Ge'ez numerals. Some of the Ge'ez numerals are presented in Figure 1.

Figure 1: Ge'ez numerals

3.3. Event detection using supervised machine learning

In this study, a supervised machine learning approach is employed to detect events in text. Supervised machine learning classifiers predict new events based on a labeled training set: they learn event properties and characteristics from the training data and generalize to unseen situations to predict events.

The datasets in this study are unstructured text documents. Therefore, the unstructured text sequences were converted into a structured feature space using mathematical modeling. For classification, feature extraction can be seen as a search among all possible transformations of the feature set for the best one, i.e. the one that preserves class separability as much as possible in the space with the lowest possible dimension. The features carry information about the text that is associated with a given event, and they increase the confidence of predicting a token as an event. The feature extractor component is thus responsible for extracting candidate attributes for the classifier. The features used in this study are the following:

• Words of the instance
• POS of the corresponding word
• Lemma of the corresponding word
• List of lexicons for exceptional events

A binary classifier is used to detect events in Amharic text: it classifies each instance as on-event or off-event. The on-event class refers to instances that contain event trigger keywords, whereas the off-event class refers to instances that do not. Among machine learning algorithms, Naive Bayes, decision tree and SVM were chosen because they are widely used in text classification tasks (Pranckevicius and Marcinkevicius, 2017).

The naive Bayes classifier is a linear classifier known for being simple and very efficient in text classification tasks. Its probabilistic model is based on Bayes' theorem, and it assumes that the features in a dataset are independent of each other given the class.

LIBSVM is a library for Support Vector Machines (SVM) that has gained wide popularity in machine learning and many other areas. An SVM finds the optimal hyperplane, the one that maximizes the distance between the hyperplane and the difficult points close to the decision boundary. As stated in (Chang and Lin, 2011), if there are no points near the decision surface, then there are no very uncertain classification decisions.

The other classifier used in this study is a decision tree, a tree-based classifier for instances represented as feature vectors. There is one branch for each value of a feature, and the leaves specify the category; a decision tree can represent an arbitrary classification function over discrete feature vectors. For the decision tree, the J48 algorithm (the Weka implementation of C4.5) was used. Because the trees generated by J48 can be used for classification, J48 is often referred to as a statistical classifier.

The above algorithms were used to train models with the labeled dataset as input, and the trained models then classified the instances of a test set into the on-event and off-event classes. Based on the feature selection recommendation, the POS tag proved to be the best syntactic feature for detecting events.

3.4. Event Extraction using Rule Based Approach

Rule-based learning is an information extraction method that uses extraction patterns to retrieve information from a text document. In this study, a standalone rule-based approach is proposed to enhance the accuracy of the event extraction system. Unlike many other languages, Amharic has subject-object-verb agreement and other morphological features that make rule construction cumbersome. As mentioned in (Yunita Sari and Zamin, 2010), extraction patterns are constructed from syntactic or semantic constraints and delimiters, or from a combination of both syntactic and semantic constraints. Events predominantly occur as nominals and verbs (Ramesh and Kumar, 2016). Nominal events are ambiguous because they can appear as deverbal or non-deverbal nouns, so the morphological features of the instances are needed to disambiguate them. To do so, a morphological analyzer is employed to obtain the morphological features of the events mentioned in the instances. For example, in the sentence (Î‚tÓÍÑ hz©m ¼Êh ¤ö‰ àÝ ŒÈ¹Ýt €yàlÛm ~), the underlined word (àÝ) is derived from the verb (fÕm). It looks like an adjective, but it is a deverbal entity, which we call a nominal event. The rules were developed from the syntactic features of words with the help of a carefully constructed gazetteer list. The POS tag and lemma of a word serve as the basis for the handcrafted linguistic rules, and these syntactic features are obtained with TreeTagger and HornMorpho. The pattern extractor was developed based on these syntactic features, and simple rules are applied to extract the detected events.
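The disambiguation rules described in Section 3.4 (verb-group tags as primary candidates, nominals whose citation form is a verb as deverbal events, gazetteer nouns as non-deverbal events) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Token fields, the tag sets (built from the tag names N, VN and VP in the paper's example) and the English placeholder words are assumptions, and citation_pos is a hypothetical field standing in for what a morphological analyzer such as HornMorpho reports about a word's citation form.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# ASSUMPTION: which tags form the "verb group" is our guess from the paper's
# examples (VN, VP); the full 9-class tagset is not listed in the text.
VERB_GROUP_TAGS = {"V", "VP", "VN"}
# ADJ is included because a deverbal word can look like an adjective.
NOMINAL_TAGS = {"N", "ADJ"}


@dataclass
class Token:
    form: str                           # surface word
    pos: str                            # tag assigned by the POS tagger
    lemma: str                          # lemma from the tagger or analyzer
    citation_pos: Optional[str] = None  # hypothetical: POS of the citation form
                                        # reported by a morphological analyzer


def event_kind(tok: Token, gazetteer: Set[str]) -> Optional[str]:
    """Return 'verbal', 'deverbal', 'non-deverbal', or None if no rule fires."""
    if tok.pos in VERB_GROUP_TAGS:
        return "verbal"                 # verbs and infinitives: primary candidates
    if tok.pos in NOMINAL_TAGS:
        if tok.citation_pos == "V":
            return "deverbal"           # nominal whose citation form is a verb
        if tok.lemma in gazetteer:
            return "non-deverbal"       # listed event noun, e.g. 'wedding'
    return None


def extract_events(tokens: List[Token], gazetteer: Set[str]) -> List[Tuple[str, str]]:
    """Return (form, kind) pairs for every token that a rule marks as an event."""
    events = []
    for tok in tokens:
        kind = event_kind(tok, gazetteer)
        if kind is not None:
            events.append((tok.form, kind))
    return events


if __name__ == "__main__":
    # English placeholders stand in for tagged Amharic tokens (hypothetical data).
    gazetteer = {"wedding"}
    sentence = [
        Token("Abebe", "N", "Abebe"),
        Token("wedding", "N", "wedding"),
        Token("broken", "ADJ", "break", citation_pos="V"),
        Token("arrived", "V", "arrive"),
    ]
    print(extract_events(sentence, gazetteer))
    # [('wedding', 'non-deverbal'), ('broken', 'deverbal'), ('arrived', 'verbal')]
```

The plain noun "Abebe" is rejected because no rule fires for it, mirroring the point that not every noun is an event.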
For example, consider the tagged sentence (€¤¤) N (t‰nt) ADV (ÎÚËw) VN (¤–) N (°) VP ( ~ ). Here the handcrafted rules operate on the POS tagger output. However, the formal structures are not always regular enough to build stable rules. In such cases the morphological analyzer is very helpful, because deverbal events behave ambiguously.

Some of the rules applied in this study for the hybrid system are the following:

1. Automatically label the preprocessed texts with their corresponding word classes (parts of speech).

2. Obtain the morphological features of words, including the word, subject, root, lemma, object, grammatical features and preposition.

3. Events are usually expressed using verbs and nouns. Check the neighboring words using bigram language models, because not all nouns are events: nouns at the beginning of a sentence are sometimes the subject or participant of the event rather than the event itself.

4. Identify the nominal events. The morphological analyzer plays the main role here by indicating the citation form of each noun: purely nominal words can be deverbal or non-deverbal nouns, but deverbal nouns have a verb as their citation form.

5. Words tagged as verbs or other verb-group word classes, together with their infinitive forms, are selected as primary candidates.

6. Check non-deverbal nouns (which usually act as events) against carefully built gazetteers (lists of non-deverbal noun lexemes). Because of our limited dictionary, a ternary search tree algorithm is applied to enhance efficiency.

7. Identify words that contain temporal keywords. A list of commonly used Amharic temporal expressions was carefully built to serve as temporal indicator keywords. In addition, regular expressions were constructed to handle regular date-time expressions, and bigram language models were applied to find temporal arguments.

For example, in Î€¤¤ ˜rg ¶Ú ¶w ~ / "Abebe's wedding is tomorrow.", the word ˜rg is a deverbal noun that is extracted as an event (and is indeed an event), while the word ¶Ú is extracted as the temporal argument of the main event ˜rg.

3.5. Event Extraction using hybrid Approach

Compared with knowledge-driven systems, hybrid event extraction systems require more data because they use supervised machine learning techniques, yet typically still less than purely data-driven techniques. Complexity, and hence the required expertise, is generally higher than for purely knowledge-driven techniques because multiple techniques are combined. Moreover, the interpretability of a system benefits to some extent from the use of semantics, as in knowledge-based techniques (Baradaran and Mineai-Bidgoli, 2015).

The other technique employed in this study combines supervised machine learning and rule-based techniques to extract events from unstructured Amharic text. The machine learning approach mainly focuses on coverage (recall) rather than precision, while the handcrafted rules aim at the highest achievable precision given the incorporated rules. In our case, the machine learning classifiers miss nominal events in comparison with verbal events. Therefore, we incorporate rules to recover the events missed by the machine learning classifiers. Deverbal nouns exhibit both nominal and verbal syntactic representations: they serve as concrete nouns, but also participate in verbal constructions, where they require arguments and accept aspectual modification. Nominal events appear as deverbal or non-deverbal nouns: deverbal entities are derived from verbs, whereas non-deverbal entities are not. For example, (àÕ) is a deverbal entity, an event derived from the verb (fÕm). An event is a situation that lasts for a moment; by this definition, a nominal can be an event, e.g. (˜rg)/wedding is a non-deverbal nominal event. Another example is (Î€¤¤ ˜rg ƒmŠ 16, 2010 A.m nw.). Knowing the morphological variation of words and having a common list of non-deverbal nominals in the gazetteers (a list of exceptional non-deverbal events) helps to resolve event ambiguity: we obtain deverbal events from the morphological analyzer and non-deverbal events from the gazetteers. Applying this disambiguation scheme improves the accuracy of our system relative to the standalone rule-based approach.

4. Model Evaluation

The standard information extraction evaluation metrics precision, recall and F-measure were used to evaluate the performance of the models. In this study, a 10-fold cross-validation technique was used to split the dataset; the dataset was shuffled randomly, with 80% of the data used for training and 20% for testing.

5. Experimental Results

In this study, a total of five experiments were conducted. Three experiments use supervised learning algorithms (Naive Bayes, Decision Tree and SVM) to detect events in unstructured Amharic text. One experiment uses the rule-based approach to extract events from unstructured Amharic text, and the last combines the supervised learning and rule-based approaches.

The first three experiments train a model with each of the three selected supervised learning algorithms. All features were used to see the effect of each attribute on event detection, and each algorithm was run on the full feature set.

As the results in Table 1 show, among the three algorithms, the Naïve Bayes (NB) classifier outperformed
the other classifiers in detecting events. It achieved an F-score of 0.915 on the weighted average, 0.831 on the On-Event class, and 0.944 on the Off-Event class. This experimental result confirms the advantage of the Naïve Bayes classifier for the event detection task, and we obtained encouraging results using a machine learning classifier for event detection. The remaining problem lies in the ambiguity of deverbal entities.

Table 1: Experimental results for machine learning algorithms to detect events

Algorithm  Class          Precision  Recall  F-measure
NB         On-Event       0.866      0.798   0.831
           Off-Event      0.932      0.957   0.944
           Weighted Avg.  0.915      0.916   0.915
LIBSVM     On-Event       0.895      0.395   0.548
           Off-Event      0.825      0.984   0.897
           Weighted Avg.  0.843      0.833   0.808
J48        On-Event       0.891      0.698   0.783
           Off-Event      0.903      0.971   0.935
           Weighted Avg.  0.9        0.9     0.896

From the machine learning event detection system, we observed that, due to linguistic features, the classifier gives verb-triggered events the same weight as the non-event class. This motivated the study to develop handcrafted rules to resolve the ambiguities. For this technique, a similar dataset was used in order to make a clear comparison with the hybrid event extraction system.

The next experiment is on the rule-based approach. As Table 2 shows, the F-score of this approach is 0.959, which outperforms all three supervised machine learning models.

The last experiment conducted in this study is on the hybrid event extraction technique. The performance of this method relies on combining the advantages of the rule-based and supervised machine learning methods. The machine learning classifiers label instances with the binary classes on-event and off-event by assigning them different weights. An instance assigned a high probability by the classifier is categorized under the on-event class, i.e. it is actually an event. On the other hand, an instance which has been assigned a low probabil-

technique. As the table shows, the combination of rule-based and supervised machine learning classifiers brings a significant improvement in extracting events from unstructured Amharic text.

Table 2: Overall event extraction evaluation: comparison of experimental results

Technique            Precision  Recall  F-measure
Rule-based approach  0.976      0.952   0.959
Hybrid approach      0.979      0.962   0.971

6. Conclusion and future work

In this study, we have presented a system that extracts events from unstructured Amharic text. The system was built by combining supervised learning and standalone rule-based techniques: supervised machine learning is used to detect events, and the rule-based technique to extract the events from unstructured Amharic text. For the supervised machine learning, three algorithms (naïve Bayes, support vector machine and decision tree) were evaluated, and naïve Bayes performed best at detecting events in unstructured Amharic text.

The standalone rule-based approach was also evaluated independently for extracting events from unstructured Amharic text. However, the proposed hybrid system, using the naïve Bayes algorithm for event detection, outperformed it.

In the future, we need to address other relevant event extraction tasks, such as building a larger event and temporally annotated corpus, employing powerful deep learning techniques to extract relations between events and time, and extracting relations between events and document creation time.

7. Bibliographical References

Adafre, S. F. (2005). Part of speech tagging for Amharic using conditional random fields. In Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages.
Al-Smadi, M. and Qawasmeh, O. (2016). Knowledge-based approach for event extraction from Arabic tweets.
ity value than the on-event class instance has been mostly International Journal of Advanced Computer Science
non-event or categorized as off-event class. Thus positive and Applications, 7(6).
predicated values accepted as it’s i.e. instances categorized Arnulphy, B., Claveau, V., Tannier, X., and Vilnat, A.
as on-event with highest weighted value. Because, it is pre- (2015). Supervised Machine Learning Techniques to
dicted exactly as an event, while instances getting equal Detect TimeML Events in French and English. In Chris
weight by the classifier in both class are going to be the tar- Beimann, et al., editors, 20thInternational Conference
get instances for the heuristics. Equal weighted instances on Applications of Natural Language to Information Sys-
are considered as ambiguous. Using the help of syntactic tems, NLDB 2015, volume 9103 of Proceedings of the
features, ambiguous instances have been handled. As a re- NLDB conference, Passau, Germany, June. Springer.
sult, the number of on-event instances correctly extracted Baradaran, R. and Mineai-Bidgoli, B. (2015). Event ex-
increases when heuristics has applied. In order to get the traction from classical arabic texts. International Arab
false negative and the false positive values we have used a Journal of Information Technology", 12(5).
manual scanning of the result to be accurate. Chang, C.-C. and Lin, C.-J. (2011). Libsvm: A library for
Table 2 shows the hybrid technique experimental result as support vector machines. ACM Trans. Intell. Syst. Tech-
well and compare with the experimental result of rule based nol., 2(3):27:1–27:27, May.

2108
Demeke, G. and Getachew, M. (2006). Manual annotation of Amharic news items with part-of-speech tags and its challenges.
Hogenboom, F., Frasincar, F., and Kaymak, U. (2016). A survey of event extraction methods from text for decision support systems. Elsevier.
Gasser, M. (2011). HornMorpho: a system for morphological processing of Amharic, Oromo, and Tigrinya. In Conference on Human Language Technology for Development.
Ibrahim, A. and Assabie, Y. (2014). Amharic sentence parsing using base phrase chunking. In COLING 2014.
Asker, L., Argaw, A. A., and Gambäck, B. (2007). Applying machine learning to Amharic text classification. In Proceedings of the 5th World Congress of African Linguistics.
Mulugeta, W. and Gasser, M. (2012). Learning morphological rules for Amharic verbs using inductive logic programming.
Pranckevicius, T. and Marcinkevicius, V. (2017). Comparison of naïve Bayes, random forest, decision tree, support vector machines, and logistic regression classifiers for text reviews classification. Baltic J. Modern Computing.
Ramesh, D. and Kumar, S. S. (2016). Event extraction from natural language text. International Journal of Engineering Sciences and Research Technology (IJESRT), 5(7).
Ramrakhiyani, N. and Majumder. (2015). Approaches to temporal expression recognition in Hindi. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 14(1).
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing.
Sikdar, U. and Gambäck, B. (2018). Named Entity Recognition for Amharic Using Stack-Based Deep Learning. In 18th International Conference, CICLing 2017, Budapest, Hungary, April 17–23, 2017, Revised Selected Papers, Part I, pages 276–287.
Smadi, M. and Qawasmeh, O. (2018). A supervised machine learning approach for events extraction out of Arabic tweets. In Fifth International Conference on Social Networks Analysis, Management and Security, SNAMS 2018, Valencia, Spain, October 15–18, 2018, pages 114–119.
Sohail, O. and Elahi, I. (2018). Text classification in an under-resourced language via lexical normalization and feature pooling. In Twenty-Second Pacific Asia Conference on Information Systems.
Tourille, J., Ferret, O., Tannier, X., and Névéol, A. (2017). Temporal information extraction from clinical text. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2 of EACL, pages 739–745.
Tsedalu, G. (2010). Information extraction model from Amharic news texts. Master's thesis, Addis Ababa University.
Sari, Y., Hassan, M. F., and Zamin, N. (2010). Rule based pattern extractor and named entity recognition: A hybrid approach. IEEE.
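
As a supplementary check on Table 1, the per-class F-measures can be reproduced from the reported precision and recall. This is a minimal sketch that assumes the standard balanced F1 score, F = 2PR / (P + R). The paper does not state which F variant it uses, and the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Per-class precision and recall of the NB classifier, copied from Table 1.
nb_rows = {
    "On-Event": (0.866, 0.798),   # reported F-measure: 0.831
    "Off-Event": (0.932, 0.957),  # reported F-measure: 0.944
}

for label, (p, r) in nb_rows.items():
    print(f"{label}: F1 = {f_measure(p, r):.3f}")
```

This prints 0.831 for On-Event and 0.944 for Off-Event, which matches the table. The Weighted Ave. rows are usually averages of the per-class scores weighted by class frequency. If that is the case here, they cannot be recomputed from the weighted precision and recall alone.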

2109
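
The decision logic of the hybrid event extraction technique evaluated in Table 2 can be sketched as follows. Confident classifier predictions are accepted as they are, and instances to which the classifier gives equal weight for both classes are handed to rule-based heuristics. This is an illustration under stated assumptions, not the authors' implementation. The following are all hypothetical stand-ins for the paper's syntactic heuristics:

- the function names;
- the tolerance `tol` used to decide that two weights are "equal";
- the (token, tag) instance format;
- the verb-tag rule `verb_trigger_rule`.

```python
from typing import Callable, List, Tuple

# Hypothetical instance format: a list of (token, POS tag) pairs.
Instance = List[Tuple[str, str]]


def hybrid_label(
    p_on: float,
    p_off: float,
    instance: Instance,
    heuristic: Callable[[Instance], bool],
    tol: float = 1e-6,  # assumed threshold for treating two weights as equal
) -> str:
    """Combine a probabilistic classifier decision with a rule-based fallback.

    p_on and p_off are the classifier's weights for the On-Event and
    Off-Event classes. If they differ by more than `tol`, the classifier
    decides. Otherwise the instance is ambiguous and the heuristic decides.
    """
    if abs(p_on - p_off) <= tol:
        # Ambiguous: the classifier weighted both classes equally.
        return "On-Event" if heuristic(instance) else "Off-Event"
    return "On-Event" if p_on > p_off else "Off-Event"


def verb_trigger_rule(instance: Instance) -> bool:
    """Hypothetical syntactic heuristic: fire if any token carries a verb tag."""
    return any(tag.startswith("V") for _, tag in instance)


# Placeholder tokens with hypothetical tags.
example = [("w1", "N"), ("w2", "V")]
print(hybrid_label(0.5, 0.5, example, verb_trigger_rule))  # On-Event (tie, rule fires)
print(hybrid_label(0.2, 0.8, example, verb_trigger_rule))  # Off-Event (classifier decides)
```

The heuristic is a separate callable, so the rule set can be changed or extended without retraining the classifier.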
