Sentiment Mining Model For Opinionated Amharic Texts

This thesis proposes a sentiment mining model for opinionated Amharic texts. The model uses a lexicon-based approach to analyze Amharic reviews and classify their sentiment polarity. The thesis presents the design and implementation of the proposed system. It builds an Amharic sentiment lexicon using a manual approach and natural language processing techniques. The system pre-processes texts, detects sentiment words using the lexicon, assigns weights and propagates polarity. It then classifies the overall polarity and strength of reviews. The thesis contributes to sentiment analysis in Amharic, an under-resourced language.

Uploaded by

Yared Arega

ADDIS ABABA UNIVERSITY

SCHOOL OF GRADUATE STUDIES

SENTIMENT MINING MODEL FOR OPINIONATED AMHARIC TEXTS

By: Selama Gebremeskel

A THESIS SUBMITTED TO
THE SCHOOL OF GRADUATE STUDIES OF THE ADDIS ABABA UNIVERSITY IN
PARTIAL FULFILLMENT FOR THE DEGREE OF MASTERS OF SCIENCE IN
COMPUTER SCIENCE

November, 2010
ADDIS ABABA UNIVERSITY
SCHOOL OF GRADUATE STUDIES
FACULTY OF COMPUTER AND MATHEMATICAL SCIENCES
DEPARTMENT OF COMPUTER SCIENCE

SENTIMENT MINING MODEL FOR OPINIONATED AMHARIC TEXTS

BY: Selama Gebremeskel

ADVISOR:
Solomon Atnafu (PhD.)

APPROVED BY:

EXAMINING BOARD:

1. Dr. Solomon Atnafu, Advisor _____________________


2. _______________________ ______________________
3. _______________________ ______________________
Acknowledgments

First and foremost, I would like to express my special thanks to my advisor, Solomon Atnafu
(PhD), for his brilliant guidance, constructive suggestions and encouragement during this
research work. I wish to express my sincere gratitude to Melese Tamiru for his valuable help
and support. Special thanks go to my beloved friends Teklay G., Abel T., Filimon G. and
Rehmet M. for their support and faithful friendship. My special thanks also go to all staff
members of the Computer Science Department, AAU, who helped me throughout my work. I wish
to express my special thanks to Tsegaye Aregay, head of the Department of Ethiopian Language
Studies, AAU, and his students Genet W. and Gebre-egziabher T. for their cooperation in
constructing the Amharic sentiment lexicon. I would also like to thank Mr. Matt Robinson for
his help in buying the SelamSoft electronic English-Amharic dictionary online. Finally, I
thank all who helped me throughout my life, directly or indirectly. Thank you, my classmates;
I respect you.
Dedication

Dedicated to my country ETHIOPIA


ሁሉም ነገር ኢትዮጵያ እንዳለችው ይሁን!!!
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDIXES
LIST OF ALGORITHMS
LIST OF ACRONYMS
ABSTRACT

1. INTRODUCTION
1.1. Overview
1.2. Statement of the Problem
1.3. Motivation
1.4. Objectives
1.5. Scope and Limitations
1.6. Methodology
1.7. Procedures
1.8. Application of Results
1.9. Thesis organization

2. LITERATURE REVIEW
2.1. Introduction
2.2. Sentiment mining techniques
2.2.1. Machine learning techniques
2.2.2. Natural language processing techniques
2.2.3. Linguistic techniques
2.2.4. Ontology based techniques
2.2.5. Lexicon-based techniques
2.3. General steps in lexicon-based sentiment mining
2.4. Opinion lexicon generation
2.4.1. Manual Approach
2.4.2. Dictionary based approaches
2.4.3. Corpus based approach
2.5. Basic rules of opinions
2.6. Summary

3. RELATED WORKS
3.1. Sentiment mining from opinionated English texts
3.2. Sentiment mining from opinionated non-English texts
3.3. Review’s spam detection
3.4. Summary

4. DESIGN AND IMPLEMENTATION
4.1. Introduction
4.2. General system architecture
4.2.1. Pre-processing
4.2.2. Detection of sentiment words
4.2.3. Weight assignment and polarity propagation
4.2.4. Polarity classification
4.2.5. Review’s Polarity strength
4.3. Implementation
4.3.1. Building sentiment lexicon
4.3.2. Lexicon building guidelines
4.3.3. Tools
4.3.4. The Proposed Algorithms
4.4. Summary

5. EXPERIMENTAL RESULTS
5.1. Procedures and Experimental Setups
5.1.1. Opinionated Data Collection
5.2.2. Manual classification
5.3. Evaluation
5.4. Results
5.4.1. Experiment one: Basic system
5.4.2. Experiment two: using general purpose and domain specific lexica
5.4.3. Experiment three: using both lexica and contextual valence shifter terms
5.6. Discussion of the results

6. CONCLUSIONS AND RECOMMENDATIONS
6.1. Conclusions
6.2. Contributions of the study
6.3. Recommendations
6.4. Future Works

REFERENCES

ANNEXES
List of tables
Table 4.1: Sentiment terms’ polarity propagation example
Table 5.1: Results of experiment one
Table 5.2: Results of experiment two
Table 5.3: Results of experiment three
Table 5.4: Results of experiment three with additional reviews
List of figures

Figure 4.1: The sentiment mining model for opinionated Amharic texts
Figure 4.2: Reviews’ polarity classification
Figure 4.3: SelamSoft electronic English-Amharic dictionary used for sentiment translation
Figure 4.4: Sample movie reviews input
Figure 4.5: Sample of polarity-classified movie reviews
Figure 4.6: Sample of classified reviews with their polarity strength
Figure 4.7: Sample of accepting a review from the user and its polarity classification
Figure 4.8: Sample of browsed opinionated Amharic texts and polarity classifications
List of Appendixes

Appendix A: Sample subjectivity lexicon of OpinionFinder
Appendix B: Sample representation of sentiment terms in our dictionary
Appendix C: Questionnaire and sample responses
Appendix D: Sample of movie reviews
Appendix E: List of approved Amharic sentiment terms and approval letter
List of Algorithms
Algorithm 4.1: Review’s sentiment detection and polarity classification
Algorithm 4.2: Sentiment polarity propagation
List of Acronyms
ASCII American Standard Code for Information Interchange
CTRW Choose the Right Word
DAL Dictionary of Affect in Language
GI General Inquirer
IMDB Internet Movie Database
IR Information Retrieval
MCC Media and Communication Center
ME Maximum Entropy
NB Naïve Bayes
NLP Natural Language Processing
SentiWN SentiWordNet
SVM Support Vector Machine
POS Part of Speech
UN Unclassified
Abstract
Opinions are so important that whenever we need to make a decision, we want to hear others’
opinions. This is true not only for individuals but also for organizations. Due to the rapid
growth of opinionated documents, reviews and posts on the Web, there is a growing need to find
relevant sources, extract the sentences that contain opinions, summarize them and organize
them into a useful form. Sentiment mining can play an important role in satisfying these
needs. The process of sentiment mining involves categorizing an opinionated document into
predefined categories such as positive, negative or neutral based on the sentiment terms that
appear within the document. In this research work, a sentiment mining model is proposed for
determining the sentiments expressed in opinionated Amharic texts or reviews. The polarity
classification or semantic orientation of an opinionated text can be positive, negative or
neutral. The system designed on the basis of the proposed model detects positive and negative
sentiment terms, together with contextual valence shifters such as negations, and assigns an
initial polarity weight to every detected sentiment term in order to determine the polarity
classification of the opinionated text. Lexica of Amharic sentiment terms are used to identify
the sentiment terms and to assign their initial polarity values. A prototype system is
developed to validate the proposed model and the algorithms designed. The prototype is tested
using movie and newspaper reviews, and the results obtained with these test data are very
encouraging.

Keywords: opinions, sentiments, sentiment mining from opinionated Amharic texts, polarity
classification from opinionated Amharic texts, sentiment lexicon, opinionated Amharic text

CHAPTER ONE
1. INTRODUCTION
1.1. Overview
An important part of our information-gathering behavior has always been to find out what
other people think about an issue. With the growing availability and popularity of opinion-rich
resources such as online review sites and personal blogs, new opportunities and challenges
arise as people now can, and do, actively use information technologies to seek and understand
the opinions of others [1]. The ability to automatically extract and classify opinions from
texts would be enormously helpful in decision making for individuals, business intelligence,
government intelligence and others. Extracted opinions can also be used effectively by
recommendation and collaboration systems. A collaboration system helps users to explore
recommendations from various viewpoints. Given ratings and reviewers from reviews, such a
system provides virtual reviewers that represent particular viewpoints and recommendations
[2, 3].
Opinion mining, which is also known as sentiment analysis, emotion mining, attitude mining or
subjectivity mining [2], is an active research discipline concerned with the computational
study of opinions, sentiments and emotions expressed in opinionated text. Why is opinion
mining important now? It is mainly because of the Web, which holds a huge volume of
opinionated text.
Sentiment mining can be done at the sentence level, the document level or the feature level.
In sentence-level opinion mining there are two tasks: subjectivity classification and
sentiment classification. Subjectivity classification assigns each sentence to one of two
predefined classes: subjective (e.g. it is such a nice phone) or objective (e.g. I bought an
iPhone a few days ago). Sentiment classification is concerned with polarity: sentences are
classified as positive (e.g. it is just a nice phone), negative (e.g. the phone broke in two
days) or neutral. Document-level sentiment classification classifies a whole document as
positive, negative or neutral based on the overall opinion expressed by the opinion holder.
In feature-level sentiment mining, the features commented on are identified and extracted,
and the sentiment towards these features is determined [4]. In this research our focus is on
document-level sentiment classification of Amharic texts such as movie reviews.
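
As a rough illustration of what document-level classification means, the sketch below labels a whole review from the polarities of the sentiment terms it contains. The tiny English lexicon, the simple tokenization and the function name `classify_document` are assumptions made for this example only; they are not the lexicon or the algorithm developed in this thesis.

```python
# Toy lexicon (hypothetical): +1 for positive terms, -1 for negative terms.
TOY_LEXICON = {"nice": 1, "good": 1, "broke": -1, "bad": -1}


def classify_document(text, lexicon=TOY_LEXICON):
    """Label a whole document by summing the polarities of its known terms."""
    # Very simple tokenization: lowercase, drop periods and commas, split on spaces.
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    total = sum(lexicon.get(tok, 0) for tok in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    # No sentiment terms were found, or positives and negatives cancel out.
    return "neutral"
```

With this toy lexicon, a review is labeled neutral either when it contains no known sentiment terms or when its positive and negative terms carry the same total weight.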

1.2. Statement of the Problem
One of the main reasons for the lack of study on opinions is the fact that there was little
opinionated text available before the World Wide Web. Before the Web, when an individual
needed to make a decision, he/she typically asked for opinions from friends and family. When
an organization wanted to find the opinions or sentiments of the general public about its
products and services, it conducted opinion polls, surveys and focus groups. However, with
the Web, and especially with the explosive growth of user-generated content on the Web in the
past few years, the world has been transformed [5].
We can have different opinion search queries for different purposes, such as to:
1. Find the opinion of a person or organization on a particular object or a feature of the
object (e.g. what is Obama’s opinion on abortion?).
2. Find positive, negative or neutral opinions on a particular object (e.g. customers’
opinions on a digital camera, public opinion on a political topic).
3. Determine how object A compares with object B (e.g. Gmail versus Hotmail).
As online business becomes more and more popular, the quantity of customer reviews of
products is growing rapidly as well. Hence it is difficult for a customer, seller or producer
to read all of the reviews and make a reasonable decision about whether or not to purchase a
certain product or use a certain service [4]. Because opinion-rich documents are available on
review sites, forums, discussion groups, blogs etc., there are too many opinions and reviews
to read, and traditional techniques are inadequate. There is therefore a need for good
sampling and classification techniques for these reviews and opinions. For this reason, much
research on sentiment analysis has been done, and is being undertaken, for English and other
languages such as French [6]. To the best of my knowledge, however, sentiment classification
of Amharic documents has never been studied, even though the amount of opinionated Amharic
documents on the Web is increasing [7]. Therefore, this study investigates sentiment
classification of opinionated Amharic texts and aims to develop a model for it.

1.3. Motivation
For opinion search queries that, like the second type above, look for positive, negative or
neutral opinions on a particular object, we can have Amharic opinion queries such as “የሚኒባስ
ባለንብረቶች ስለ አዲሱ በቀጠና መስራት ምን አስተያየት ኣላቸዉ ?” (What is the opinion of minibus owners about
the new zone-based operation?). The collected reviews could be positive, such as “በ አዲስ አበባ
ያለዉን የትራንስፖርት ችግር ስለሚፈታ በጣም ጥሩ ነው ብየ አምናለሁ።” (I believe it is very good because it
solves the transport problem in Addis Ababa.), negative, such as “ይሄ አሰራር ዘላቂ መፍትሄ ኣይሆንም ።”
(This practice will not be a lasting solution.), or neutral (containing both negative and
positive opinions with the same strength or weight). We can then analyze these opinions for
the purpose of decision making. Similarly, for the query “የፊልም ኣፍቃርያን ስለ ‘የዎንዶች ጉዳይ ቁ.2’
ፈልም ምን አስተያየት አላቸዉ” (What is the opinion of film lovers about the film ‘የዎንዶች ጉዳይ ቁ.2’?),
the collected opinions may be positive, such as “ድርሰቱም ሆነ ቅንብሩ በጣም ደስ ይላል። ስለ ሆነም በጣም
ተመችቶኛል።” (Both the story and its arrangement are very pleasing, so I liked it very much.),
negative, such as “በአጠቃላይ ፊልሙ ለኔ አልተመቸኝም።” (Overall, I did not like the film.), or neutral
(containing both positive and negative opinions with the same strength or weight).
Therefore, in this research we investigate the possibility of developing an Amharic sentiment
mining model that can automatically analyze the sentiment of a huge amount of collected
reviews prior to decision making.

1.4. Objectives
The general and specific objectives of this study are given below.
General Objective: the general objective of this research work is to design and develop a
sentiment mining model for opinionated Amharic documents.
Specific Objectives: the specific objectives of this research work are to:
• Analyze the general structure of Amharic statements related to opinions and
sentiments, such as identifying negative, positive and neutral statements.
• Analyze the relationship between opinions and Amharic words and their intensity
or strength.
• Design a model for sentiment mining from opinionated Amharic texts.
• Build both a domain-specific and a general-purpose lexicon of Amharic opinion
terms, in which the terms are tagged as positive (+), negative (-),
overstatement (>), understatement (<) or negation (Negate).
• Develop the algorithms necessary to realize the proposed Amharic sentiment
mining model.
• Develop a prototype to demonstrate that the designed model is valid.
• Evaluate the designed model using movie reviews.
1.5. Scope and Limitations
Opinion mining is a complex and recent research discipline that requires the effective analysis
and processing of documents. Since there are no publicly available Natural Language
Processing (NLP) tools and other resources for the Amharic language that can be integrated
with our model, the scope of our research work is limited as follows:
• The work is limited to sentiment (polarity) classification (positive, negative or neutral
only); i.e. it does not cover subjective/objective classification.
• We use domain-specific review texts that are grammatically checked and organized.
• Opinion holder identification and the reasons for positive and negative classifications
are not covered in this research work.
• Attention is given to the most common Amharic words used to express opinions, such as
“ጥሩ፣ ደስ ይላል፣ በጣም ጥሩ፣ መልካም ወዘተ” (good, pleasing, very good, fine, etc.) for positive
opinions and “መጥፎ፣ አይመችም፣ ደስ አይልም ወዘተ” (bad, not comfortable, not pleasing, etc.) for
negative opinions. Because of their complicated nature, Amharic expressions such as
“ቅኔያዊ አነጋገር” (poetic, figurative speech) are outside the scope of this research work.
• Opinion spam detection, the process of detecting fake reviews, is not covered in this
research work as it is a very complicated problem. Fake reviews are reviews that
contain false positive or malicious negative opinions [4, 8].
1.6. Methodology
Literature review
Literature related to opinion mining from different sources, such as published papers, journal
articles and other materials, is reviewed in detail to gain a better understanding of the area
and detailed knowledge of the various sentiment mining techniques.
Analysis of existing opinion based texts in Amharic
Since this research work is mainly concerned with opinionated Amharic texts, it was
necessary to analyze the nature of Amharic documents that contain opinions. Rules and
methods were therefore proposed to identify and categorize Amharic opinion terms.
Lexicon of Opinion terms
The proposed model is fully dependent on the lexicon of Amharic opinion terms. This lexicon
contains Amharic opinion terms tagged as positive (+), negative (-), negation (negate),
overstatement (>) or understatement (<). Negations and intensifiers (overstatements and
understatements) are collectively known as contextual valence shifters; they change the
initial value of a sentiment term. Negations are terms that reverse the semantic orientation
of a term: they switch a positive term to negative and vice versa. For example, ጥሩ (good) is
positive while ጥሩ አይደለም (not good) is negative because of the contextual valence shifter
አይደለም (not). Intensifiers are terms that change the degree of the expressed sentiment. For
example, in the sentence ‘ፊልሙ በጣም ጥሩ ነዉ’ (the film is very good), በጣም ጥሩ (very good) is
more positive than ‘ጥሩ’ (good) alone [9].
We have built two lexica of opinion terms: a domain-specific lexicon and a general-purpose
lexicon. The domain-specific lexicon contains opinion terms restricted to a specific domain
(e.g. the movie review domain). The general-purpose lexicon contains Amharic opinion terms
that are not restricted to a specific domain, so it can be used in any domain, such as
product reviews, movie reviews or hotel reviews. The procedures and guidelines for building
the lexica are given in chapter four.
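
The effect of contextual valence shifters on a tagged lexicon can be sketched as follows. This is a minimal illustration and not the algorithm presented in chapter four: the four-entry lexicon, the numeric values `BASE_WEIGHT` and `SHIFT`, the function name `score_terms`, and the positional rules (an intensifier directly before the term, a negation directly after it, as in the examples above) are all assumptions made for the example.

```python
# Hypothetical mini-lexicon using the tag set described above.
LEXICON = {
    "ጥሩ": "+",          # good
    "መጥፎ": "-",         # bad
    "አይደለም": "negate",  # not
    "በጣም": ">",         # very (overstatement)
}

BASE_WEIGHT = 2  # assumed initial weight of a sentiment term
SHIFT = 1        # assumed amount an intensifier adds or removes


def score_terms(tokens, lexicon=LEXICON):
    """Return (term, signed score) pairs after applying valence shifters."""
    scores = []
    for i, tok in enumerate(tokens):
        tag = lexicon.get(tok)
        if tag not in ("+", "-"):
            continue  # only positive and negative terms receive a score
        weight = BASE_WEIGHT
        # Assumption: an intensifier directly before the term changes its strength.
        prev_tag = lexicon.get(tokens[i - 1]) if i > 0 else None
        if prev_tag == ">":
            weight += SHIFT
        elif prev_tag == "<":
            weight = max(weight - SHIFT, 0)
        sign = 1 if tag == "+" else -1
        # Assumption: a negation directly after the term flips its orientation.
        next_tag = lexicon.get(tokens[i + 1]) if i + 1 < len(tokens) else None
        if next_tag == "negate":
            sign = -sign
        scores.append((tok, sign * weight))
    return scores
```

Under these assumed weights, ጥሩ alone scores +2, ጥሩ አይደለም scores -2, and ጥሩ in ፊልሙ በጣም ጥሩ ነዉ scores +3.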
Data Source
Most of the reviews used in the experiments were collected manually from cinemas in Addis
Ababa or taken from reviews previously collected by undergraduate students of the Department
of Theatrical Arts at Addis Ababa University. The rest of the dataset was collected from
www.habeshafilms.com [10].
Prototyping
In order to test the proposed model, we developed a prototype. To implement it, we built a
lexicon of opinion terms according to the lexicon-building procedures and guidelines and then
integrated this lexicon into the prototype.
1.7. Procedures
Developing the Amharic sentiment mining model involves several components and development
phases. Three main tasks run throughout this research work: linguistic study of Amharic,
building the lexicon of Amharic opinion terms, and implementation. The linguistic study
covers the structure of Amharic statements that express opinions, while lexicon construction
is concerned with building a dictionary of opinion terms and assigning prior polarity values
to them. Implementation refers to choosing a suitable working environment and implementing
the prototype.

1.8. Application of Results
In the current business and political situation, knowing what other people think is a
determining factor in decision making. Hence, the Amharic sentiment mining model can be
used for different purposes. Some of them are:
• Businesses and organizations can use the system (for product review mining, service
analysis and market intelligence) to reduce the money spent on finding consumers’
sentiments and opinions.
• Individuals who are interested in others’ opinions can use it when purchasing a product
(as in the case of www.epinions.com [11]), using a service or looking for opinions on
political topics.
• Government intelligence can use the system for mining the opinions of people on a
particular issue.
• The system can be used to classify movie reviews as positive, negative or neutral.
• The system can be used to answer opinion questions, for instance: what is the
international reaction to the 4th Ethiopian national election conducted on May 23, 2010?
1.9. Thesis organization

The remainder of this thesis is organized as follows. Chapter two gives an overview of
opinion mining (sentiment mining) and the different techniques used in sentiment mining
research; the general steps in sentiment mining are also discussed in that chapter.
Chapter three reviews related research on opinion mining/sentiment mining, covering in depth
work that uses different techniques for different languages. Chapter four describes the
general architecture of the proposed Amharic sentiment mining model and the construction of
the lexicon of Amharic sentiment terms. Implementation-related issues such as pre-processing,
dictionary/lexicon integration and classification are also explained in the same chapter.
Chapter five presents the experimental results of the proposed model in general and of the
different algorithms in particular. Finally, conclusions, recommendations and future work
are given in the last chapter.

CHAPTER TWO
2. LITERATURE REVIEW
2.1. Introduction
Texts contain two main types of information: facts and opinions. Facts are objective
statements about entities and events in the world, whereas opinions are subjective
statements that reflect people’s sentiments or perceptions about those entities and events.
Automatic sentiment analysis of texts, also called opinion mining, has attracted considerable
attention in recent years, primarily because of its potential use in marketing studies. It
aims to answer questions such as ‘Is the customer who sent a mail to an after-sale service
particularly dissatisfied?’, ‘Are the opinions about a product posted in blogs positive or
negative?’ and ‘What is the image of a political party or leader in the press?’. All these
questions, which relate to the way something is presented or evaluated in a text, are
particularly difficult for traditional information extraction techniques [12], which are
concerned with factual information. The sudden eruption of activity in the area of opinion
mining and sentiment analysis, which deals with the computational treatment of opinion,
sentiment and subjectivity in text, has occurred at least in part as a direct response to the
surge of interest in new systems that deal directly with opinions as a first-class object
[1]. This chapter describes the area, its methodologies, the languages studied and related
issues.
Opinions or sentiments
Opinion is a private state that is not open to objective observation or verification. It is defined
as a person’s idea and thought towards something and it is an assessment, judgment or
evaluation of something [13]. The web contains a wealth of opinions about products,
politicians, and more, which are expressed in newsgroup posts, review sites, and elsewhere.
These opinions are so important that whenever we need to make a decision, we usually want to
hear others’ opinions. As a result, the problem of opinion mining has received increasing
attention [14].
Opinion words
Opinion words, whether positive (e.g. beautiful, wonderful) or negative (e.g. bad, poor,
terrible), are instrumental for sentiment mining. Some opinion words are context independent
(e.g. good) while others are context dependent (e.g. increase, which is positive for employees
when it refers to an increase in salary and negative for taxpayers when it refers to an increase
in tax) [15].
Types of opinion words

According to [16], there are three types of opinion words: personal emotion (e.g. happy,
delighted, proud, sad, angry, horrified), appreciation (e.g. flexible, stable, efficient,
reduced, ideal, backward, poor, highest) and judgment (e.g. active, decisive, caring,
dedicated, intelligent, negligent, evil). Bing Liu [5] divided opinion words into two types,
base types and comparative types. Base-type opinion words are used to express desired or
undesired states (e.g. wonderful, poor), whereas comparative-type opinion words are used to
express comparative or superlative opinions (e.g. better, worse, best).
Context valence shifters

Context valence shifters are terms that cause the valence of a sentiment term to shift from one
pole to the other or, less forcefully, to move towards a more neutral position. Negations are
the most obvious valence shifters: “not” flips the valence of a term. In addition to “not”,
negations can belong to various classes. Simple negations include never, none, nobody,
nowhere, nothing and neither, as in:
John is clever versus John is not clever
John is successful at tennis versus John is never successful at tennis
Of course, for a shift in attitude to take place there has to be an attitude expressed in the first
place. A simple sentence such as “John is home” might express a simple fact without betraying
an attitude (i.e. the attitude score is 0). When negated, as in “John is not home”, there is no shift
in attitude (i.e. the negation of 0 is 0). Combining a positive word with a negation such as “not”
flips the positive valence to a negative valence [9].
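The flipping behaviour described above can be illustrated with a minimal Python sketch. The
valence scores, the small word lists and the function name sentence_valence are illustrative
assumptions and are not taken from [9]; the sketch also assumes that a negation applies to the
next scored term in the sentence.

```python
# Hypothetical valence scores: positive > 0, negative < 0, factual terms = 0.
VALENCE = {"clever": 2, "successful": 2, "home": 0}

# Simple negations listed in the text.
NEGATIONS = {"not", "never", "none", "nobody", "nowhere", "nothing", "neither"}


def sentence_valence(sentence):
    """Sum the term valences, flipping the sign of a term that follows a negation."""
    total = 0
    negate = False
    for token in sentence.lower().split():
        if token in NEGATIONS:
            negate = True
        elif token in VALENCE:
            score = VALENCE[token]
            total += -score if negate else score
            negate = False  # the negation has been consumed by this term
    return total


clever = sentence_valence("John is clever")
not_clever = sentence_valence("John is not clever")
not_home = sentence_valence("John is not home")
```

A term with a score of 0 stays at 0 when negated, which matches the “John is not home” case.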
Components of an opinion

The basic components of an opinion are the opinion holder, the object and the actual opinion.
The opinion holder is the person or organization that holds a specific opinion on a particular
object. The object is the entity about which an opinion is expressed, while the opinion is the
view, attitude or appraisal of the object by the opinion holder. Example: John said that the
scanner is slow. Here John is the opinion holder, the scanner is the object and slow is the actual
opinion or sentiment towards the scanner [13].

Sentiment mining

Sentiment mining is an area of text mining that has recently received a lot of attention due to
the amount of opinion information that resides in web documents. It is concerned with the
identification of opinions in a text and their classification as positive, negative and neutral.
Sentiment mining refers to a broad area of natural language processing, computational
linguistics and text mining that aims to determine the attitude of a speaker or writer with
respect to some topic [17]. Feiyu Xu and Xiwen Cheng [13] defined sentiment mining as a
recent discipline at the crossroads of information retrieval, text mining and computational
linguistics which tries to detect the opinions expressed in natural language texts. Sentiment
mining is a complex field as it involves the processing and interpretation of natural language.
Hence it must deal with the inherent ambiguity of natural language, the importance of
context, and other complications that do not lend themselves to automation. The following
example demonstrates how important and how difficult it is to grasp the idea expressed in a
given statement. If “Just go read the book” is said about a book, it could be considered a
recommendation. But if it is said about a film adaptation of a book, it would seem to suggest
that the film is not worth watching [18].
Main sentiment mining activities
The main activities needed for building a sentiment mining system are: development of
linguistic resources (e.g. build a lexicon of subjective terms), classification of text (entire
documents, sentences) based on their content (e.g. classifying a news article either as positive
or negative in relation to the subject), extraction of opinion expression from text, including
relations with the rest of content (e.g. recognizing an opinion, who is expressing it, who/what is
the target of the opinion), mining tools and visualization tools to extract meaningful
information from the mined articles based on the sentiment tags [16].
Levels of sentiment mining

The sentiment mining process could be conducted on three different levels: document level,
sentence level, or feature level. In the sequel, each of them is discussed [13].
Sentiment mining on document level: here, a document is classified as positive, negative or
neutral based on the overall sentiment expressed by the opinion holder, with the assumption that
each document focuses on a single object and contains opinions from a single opinion holder
(e.g. “I bought an iPhone a few days ago. It was such a nice phone. The touch screen was
really cool. The voice quality was clear too. Although the battery life was not long, that is ok
for me. However, my mother was mad with me as I did not tell her before I bought the phone.
She also thought the phone was too expensive, and wanted me to return it to the shop. …”) [4].
Sentiment mining on sentence level: this involves two basic tasks, subjectivity classification
and sentiment classification. Subjectivity classification is concerned with classifying
sentences as objective (e.g. “I bought an iPhone a few days ago”) or subjective (e.g. “it is such
a nice iPhone”). Sentiment classification is concerned with classifying a subjective sentence
as positive (e.g. “it is such a nice iPhone”), negative (e.g. “the battery life of the
phone was not long”) or neutral.
Sentiment mining at feature level: in this level, commented features are identified and
extracted and the sentiment towards these features is determined. Sentiment classification at
both document level and sentence level are not enough to tell what people like and/or dislike,
because a positive opinion on an object does not mean that the opinion holder likes everything
(e.g. “the touch screen was really cool” where the ‘touch screen’ is the feature of an iPhone).
Similarly a negative opinion on an object does not mean that the opinion holder dislikes
everything (e.g. “the phone was too expensive” where ‘price’ is a feature of an iPhone).
However, some people are not very interested in features; they just want to know the
general information about an object [19].
General Sentiment mining tasks

In general the tasks of sentiment mining are: determining document subjectivity, determining
document polarity and determining strength of document orientation [20].
Determining document subjectivity: deciding whether a given text has a factual nature or
expresses an opinion on its subject matter. This amounts to performing binary text
categorization under categories of objective and subjective.
Determining document polarity: decides if a given subjective text expresses positive,
negative or neutral opinion on its subject matter.
Determining strength of document orientation: decides whether the positive opinion
expressed by a text on its subject matter is weakly positive, mildly positive or strongly positive.
Similarly decides whether the negative opinion expressed by a text on its subject matter is
weakly negative, mildly negative or strongly negative.

Components of sentiment mining

Subjectivity mining: subjectivity is the linguistic expression of somebody’s opinions,
sentiments, emotions, evaluations and beliefs. Subjectivity analysis classifies content as
objective or subjective.
Polarity mining: attempts to identify the opinion or sentiment that a person may hold towards
an object and classifies it as positive, negative or neutral [21].
Opinion spam

Reviews are used by potential customers to find opinions of existing users before deciding to
purchase a product or use a service. They are also used by product manufacturers to identify
problems of their products and to find competitive intelligence information about their
competitors. Unfortunately, this importance of reviews also gives a strong incentive for spam,
which consists of false positive opinions (giving undeserved positive opinions to some target
products in order to promote them) and malicious negative opinions (giving unjust negative
reviews to some other products in order to damage their reputation) [8].
2.2. Sentiment mining techniques
There are a number of different approaches that have been used in an attempt to solve the
problem of sentiment classification. One of the most widely used methods involves classifying
a single word or phrase with sentiment, and then calculating an overall sentiment rating for a
target document using some weighting [18]. The most commonly applied techniques for
sentiment mining are described as follows.
2.2.1. Machine learning techniques

Machine learning treats sentiment classification simply as a special case of topic based
categorization (with the two topics being positive sentiment and negative sentiment). The
traditional topic based categorization attempts to sort documents according to their subject
matter (e.g. sports vs. politics). The three standard machine learning algorithms commonly
used for sentiment classification are Naïve Bayes (NB) classification, maximum entropy (ME)
classification and support vector machine (SVM) classification [22]. According to the work of
B. Pang et al. [22], the experimental results produced via machine learning techniques are
quite good. In terms of relative performance, NB tends to perform worst and SVM tends to
perform best, although the differences are not very large. While machine learning techniques
have been found to produce good results, they have associated disadvantages. Machine learning
classification is dependent on the training data, so there is little indication of how the
classification would perform in more general cases. Gathering such a training set is
difficult, as it involves collecting and manually classifying a huge number of different
documents. In addition, with machine learning algorithms it can be difficult to incorporate
contextual valence shifters [23].
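To make the dependence on training data concrete, the following is a minimal sketch of a
multinomial Naïve Bayes classifier with add-one smoothing, written in plain Python. It is not
the implementation used in [22]; the four training reviews, the labels "pos" and "neg" and the
function names train_nb and classify_nb are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(labelled_docs):
    """Collect class and word counts from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify_nb(tokens, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / n_docs)  # log prior
        total = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            likelihood = (word_counts[label][tok] + 1) / (total + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set (hypothetical reviews, already tokenized).
TRAIN = [
    ("good great film".split(), "pos"),
    ("wonderful good acting".split(), "pos"),
    ("bad poor plot".split(), "neg"),
    ("terrible bad film".split(), "neg"),
]
MODEL = train_nb(TRAIN)
prediction = classify_nb("good wonderful".split(), MODEL)
```

The classifier knows only the words it has seen in the training set, which is the limitation
noted above: a review written with unseen vocabulary gives it no evidence either way.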
2.2.2. Natural language processing techniques

There are a number of different language analysis techniques that fall under the umbrella of
natural language processing, of which the most common are part of speech (POS) tagging,
coreference resolution and full syntactic parsing. POS tagging is the process of labeling each
word occurrence with its word class, for example whether a word occurs as an adjective,
noun or verb. Effective tagging requires knowledge of not just the word but also its context,
such as its position within the sentence and the surrounding words. The hidden Markov model
is a common technique used in POS tagging. To avoid constantly referring to a subject by
name, natural language usually contains alternative words that can be used when referring to a
previously mentioned subject. Coreference resolution automates the process of
connecting such references. Creating parse trees for natural languages is another central area
of study in NLP. Parsing is related to POS tagging, as determining sentence structure requires
knowledge of the sense in which each word is being used. ‘Chunking’, a simplified form of
parsing that does not analyze sentences in as much depth, can be used in place of parsing for
some applications [18].
2.2.3. Linguistic techniques

Others have approached the sentiment mining problem from a different angle, believing that the
complexity of natural language makes the existence of a general solution to sentiment problems
unlikely. These works instead focus on specific sentence or text types, for example using
linguistics in combination with machine learning techniques to classify conditional sentences.
For a topic so intrinsically linked with natural language, the use of linguistics in sentiment
classification is surprisingly limited, though there are some cases where it has been applied
successfully [24]. It should be noted, though, that even when linguistic methods are used, it is
often in combination with machine learning techniques [18].

2.2.4. Ontology based techniques

Ontology defines the common words and concepts (the meaning) used to describe and
represent an area of knowledge. This definition has two parts: describing and representing an
area of knowledge, and defining the common words and concepts of the description [25].
Ontology appears especially promising for sentiment mining. The use of ontology has the
potential to refine and improve the process of sentiment mining by identifying specific
properties of a domain as well as relationships between different concepts from that domain
[26]. Ontology itself is an explicitly defined reference model of an application domain with the
purpose of improving information consistency and knowledge sharing. It describes the
semantics of a domain in both a human-understandable and a computer-processable way. In
general, opinion mining is quite context sensitive and, at a coarser granularity, quite domain
dependent. As a result, a fine-grained approach to opinion mining is needed [19].
2.2.5. Lexicon-based techniques

This technique uses a lexicon of sentiment and subjectivity terms. The basic idea is to classify
reviews based on how many positive and negative terms are present in the document. It is a
rule-based classifier: if there are more positive than negative terms, the document is
considered to be positive. If there are more negative than positive terms, it is considered to
be negative. If there is an equal number of positive and negative terms, it is neutral. When
using this technique, it is relatively easy to incorporate contextual valence shifters [23]. The
performance of this technique depends on the effectiveness of the lexicon of opinion terms. The
main resource used for identifying positive and negative terms in English is the General
Inquirer (GI) [27]. GI is a system which lists terms as well as the different senses of each
term. For each sense it provides a short definition as well as other information about the term.
This includes tags that label the term as being positive, negative, a negation term, an
overstatement or an understatement. Some researchers, as in [23], add extra terms from other
resources such as Choose the Right Word (CTRW) [28], a dictionary of synonyms. Adding extra
opinion terms from different sources strengthens the lexicon. The capabilities of this technique
are employed in this research work.
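The counting rule described above can be sketched as follows. The two tiny word sets stand in
for a real lexicon such as GI and are purely hypothetical, as is the function name
count_classify; whitespace tokenization is an additional simplifying assumption.

```python
# Hypothetical miniature lexicon; a real system would load a full resource.
POSITIVE = {"good", "beautiful", "wonderful"}
NEGATIVE = {"bad", "poor", "terrible"}


def count_classify(text):
    """Classify a review by comparing its counts of positive and negative terms."""
    tokens = text.lower().split()
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # equal counts, including the case of no opinion terms


label = count_classify("a good and wonderful phone with poor battery")
```

Note that a review with no lexicon terms at all is also reported as neutral under this rule,
which is one reason the quality of the lexicon matters so much.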

2.3. General steps in lexicon-based sentiment mining

In this subsection, the general steps in the lexicon based sentiment mining technique are
discussed in short. This is because our research employs lexicon based approach.
Text collection

Sentiment mining starts with collecting raw texts. This can be done manually or automatically
from the internet.
Text pre-processing

Noise that does not carry content is removed in this step. Pre-processing may include word
segmentation and POS tagging [19]. In this research work, tokenization and normalization are
the activities done during the review pre-processing step.
Polarity words detection

This step relies on a lexicon of tagged positive and negative sentiment terms which are used to
quantify positive/negative sentiments. This tagged lexicon provides readily interpretable
positive and negative polarity values for a set of ‘affective/sentiment’ terms. In this step,
every content word is checked to see whether it is a polarity word defined in the sentiment
lexicon and, if found, it is given the corresponding sentiment polarity [20].
Weight assignment and propagation

In this step, every polarity word and modifier gets the initial weight defined in the sentiment
lexicon. If the word is linked to a modifier, the polarity value is multiplied by a coefficient [20]
or some value is added to the initial value [23].
Classification
In this step, the text document or review is classified as positive, negative or neutral based on
the numerical results obtained from the previous steps.
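The pre-processing, detection, weighting and classification steps can be put together in a
short sketch. It uses the multiplicative variant of weight propagation; the lexicon entries,
the modifier coefficients, the English-only tokenizer and the assumption that a modifier acts
only on the polarity word immediately following it are all illustrative choices, not values
taken from [20] or [23].

```python
import re

# Hypothetical initial weights for polarity words.
LEXICON = {"good": 1.0, "nice": 1.0, "bad": -1.0, "expensive": -1.0}

# Hypothetical modifier coefficients (intensifier, diminisher, negation).
MODIFIERS = {"very": 2.0, "slightly": 0.5, "not": -1.0}


def preprocess(text):
    """Tokenize and normalize: lower-case and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def score(text):
    """Sum the polarity weights, multiplying each by its preceding modifiers."""
    total = 0.0
    coefficient = 1.0
    for token in preprocess(text):
        if token in MODIFIERS:
            coefficient *= MODIFIERS[token]
        elif token in LEXICON:
            total += coefficient * LEXICON[token]
            coefficient = 1.0
        else:
            coefficient = 1.0  # the modifier was not linked to a polarity word
    return total


def classify(text):
    """Map the numerical result to a polarity class."""
    value = score(text)
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


result = classify("The screen is slightly bad but the sound is very nice.")
```

The additive variant mentioned in the text would replace the multiplication by adding or
subtracting a fixed amount from the initial weight.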
2.4. Opinion lexicon generation

Opinion words are employed in many sentiment classification tasks. Opinion words are also
known as polar words, opinion-bearing words and sentiment words. To compile or collect the
opinion words list, three main approaches have been investigated: manual approach, dictionary
based approach and corpus-based approach [5].

2.4.1. Manual Approach

This approach is simply a process of hand-picking sentiment words from different sources with
the goal of populating a lexicon with polar words. The manual approach is very time
consuming, and it is usually combined with automated approaches as the final check because
automated methods make mistakes. The opinion words lexicon used in [29] was hand-picked
based on a reading of several thousand messages. J. Yi et al. [30] manually collected a
sentiment lexicon of 3000 English sentiment terms from different sources. The General
Inquirer (GI), the Dictionary of Affect in Language (DAL) [31] and WordNet were the
main sources, and 2500 of the total terms are adjectives.
2.4.2. Dictionary based approaches

One of the techniques in this approach is based on bootstrapping using a small set of seed
opinion words and an online dictionary, e.g. WordNet [32]. The strategy is to first collect a
small set of opinion words with known orientations manually, and then to grow this set by
searching the online dictionary for their synonyms and antonyms. The newly found words are
added to the seed list and the next iteration starts. The iteration stops when no more new words
are found. After the process completes, manual inspection can be carried out to remove and/or
correct errors. The dictionary based approach has a major shortcoming: it is unable to find
opinion words with domain specific orientations, which are quite common. For example, if a
speakerphone is quiet, that is usually negative. However, if a car is quiet, that is positive.
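The bootstrapping loop can be sketched with a toy dictionary. The SYNONYMS and ANTONYMS
tables below are hypothetical stand-ins for an online dictionary such as WordNet, and the
function name bootstrap is likewise an assumption; synonyms inherit the orientation of the
known word and antonyms receive the opposite one.

```python
# Hypothetical miniature dictionary standing in for an online resource.
SYNONYMS = {"good": ["fine", "nice"], "nice": ["pleasant"],
            "bad": ["poor"], "poor": ["awful"]}
ANTONYMS = {"good": ["bad"], "nice": ["nasty"]}


def bootstrap(seeds, synonyms, antonyms):
    """Grow a {word: orientation} lexicon from seeds until no new words appear."""
    lexicon = dict(seeds)
    frontier = list(seeds)  # words whose neighbours have not been explored yet
    while frontier:
        word = frontier.pop()
        orientation = lexicon[word]
        for synonym in synonyms.get(word, []):
            if synonym not in lexicon:
                lexicon[synonym] = orientation
                frontier.append(synonym)
        for antonym in antonyms.get(word, []):
            if antonym not in lexicon:
                lexicon[antonym] = -orientation
                frontier.append(antonym)
    return lexicon


GROWN = bootstrap({"good": 1}, SYNONYMS, ANTONYMS)
```

Because orientations are copied from the dictionary alone, a word such as “quiet” would receive
one fixed orientation regardless of domain, which is the shortcoming noted above.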
2.4.3. Corpus based approach

Corpus based approaches rely on syntactic or co-occurrence patterns and also a seed list of
opinion words to find other opinion words in a large corpus. The technique starts with a list of
seed opinion adjectives, and uses a set of linguistic constraints or conventions on
connectives to identify additional adjective opinion words and their orientations. One of the
constraints concerns conjunction (AND), and says that conjoined adjectives usually have the
same orientation. For example, in the sentence “this car is beautiful and spacious,” if
“beautiful” is known to be positive, it can be inferred that “spacious” is also positive. This is so
because people usually express the same opinion on both sides of a conjunction. The sentence
“this car is beautiful and difficult to drive” is rather unnatural; if it is changed to
“this car is beautiful but difficult to drive”, it becomes acceptable. Rules or constraints can
also be designed for other connectives: OR, BUT, EITHER-OR, and NEITHER-NOR. This
idea is called sentiment consistency.
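A minimal sketch of the AND and BUT constraints is given below. It assumes that the set of
adjectives is already known (a real system would obtain it from a POS tagger), that the two
adjectives stand directly on either side of the connective, and that BUT signals opposite
orientations; the corpus sentences and the function name propagate are illustrative only.

```python
import re


def propagate(sentences, seeds, adjectives):
    """Infer orientations of unknown adjectives joined to known ones by and/but."""
    lexicon = dict(seeds)
    changed = True
    while changed:  # repeat until a full pass adds no new word
        changed = False
        for sentence in sentences:
            tokens = re.findall(r"[a-z]+", sentence.lower())
            for i in range(1, len(tokens) - 1):
                connective = tokens[i]
                if connective not in ("and", "but"):
                    continue
                left, right = tokens[i - 1], tokens[i + 1]
                if left not in adjectives or right not in adjectives:
                    continue
                sign = 1 if connective == "and" else -1
                for known, unknown in ((left, right), (right, left)):
                    if known in lexicon and unknown not in lexicon:
                        lexicon[unknown] = sign * lexicon[known]
                        changed = True
    return lexicon


CORPUS = ["The seats are spacious but uncomfortable.",
          "This car is beautiful and spacious."]
ADJECTIVES = {"beautiful", "spacious", "uncomfortable"}
FOUND = propagate(CORPUS, {"beautiful": 1}, ADJECTIVES)
```

The outer loop lets an orientation learned from one sentence be reused in another, so the order
of the sentences in the corpus does not matter.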
2.5. Basic rules of opinions
A rule of opinion is an implication with an expression on the left and an implied opinion on the
right. The expression is a conceptual one as it represents a concept, which can be expressed in
many ways in an actual sentence. The application of opinion words/phrases can also be
represented as such rules. Let Neg be a negative opinion word/phrase and Pos be a positive
opinion word/phrase. The rules for applying opinion words/phrases in a sentence are given as
follows.
1: Neg → Negative
2: Pos → Positive

These rules say that Neg implies a negative opinion (denoted by Negative) and Pos implies a
positive opinion (denoted by Positive) in a sentence. The effect of negations can be represented
as well:
3: Negation Neg → Positive
4: Negation Pos → Negative
These rules state that negated opinion words/phrases take their opposite orientation in a
sentence. Other related rules are also outlined as follows.
Deviation from the norm or some desired value range: in some domains, an object feature
may have an expected or desired value range or norm. If the value is above and/or below the
normal range, it is negative, e.g. “this drug causes low (or high) blood pressure”. We then have
the following rules.
5: Desired value range → Positive
6: Below or above desired value range → Negative
Decreased and increased quantities of opinionated items: This set of rules is similar to the
negation rules above. Decreasing or increasing the quantities associated with some opinionated
items may change the orientation of the opinions. For example, “this drug reduced my pain
significantly.” Here “pain” is a negative opinion word, and the reduction of “pain” indicates a
desired effect of the drug. Hence the decreased pain implies a positive opinion on the drug. The
concept of decreasing also extends to “removal” or “disappearance”, e.g. “my pain has
disappeared after taking the drug”.
7: Decreased Neg → Positive
8: Decreased Pos → Negative
9: Increased Neg → Negative
10: Increased Pos → Positive
The last two rules are less important, as there is no change of orientation.
Producing and consuming resources and wastes: If an object produces resources, it is positive.
If it consumes resources, especially a large quantity of them, it is negative. For example,
“money” is a resource. The sentence, “Company-x charges a lot of money” gives a negative
opinion on “Company x”. Likewise, if an object produces wastes, it is negative. If it consumes
wastes, it is positive. These give us the following rules:
11. Consume resource → Negative
12. Produce resource → Positive
13. Consume waste → Positive
14. Produce waste → Negative
These basic rules can also be combined to produce compound rules, e.g., “Consume decreased
waste →Negative” which is a combination of rules 7 and 13. To build a practical system, all
these rules and their combinations need to be considered. As noted above, these are conceptual
rules. They can be expressed in many ways using different words and phrases in an actual text,
and in different domains they may also manifest differently. However, it is by no means
claimed that these are the only basic rules that govern expressions of positive and negative
opinions. With further research, additional new rules may be discovered and the current rules
may be refined or revised. Nor is it claimed that any manifestation of such rules implies
opinions in a sentence. As with opinion words and phrases, just because a rule is satisfied in a
sentence does not mean that it actually is expressing an opinion, which makes sentiment
analysis a very challenging task [5].
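Several of the conceptual rules above can be written down as small lookup tables. The sketch
below covers rules 1 to 4 and 7 to 14 and the compound rule; the function names apply_rule and
resource_rule and the string labels are assumptions made for illustration, and mapping actual
words and phrases onto these concepts is left out.

```python
OPPOSITE = {"Positive": "Negative", "Negative": "Positive"}

# Rules 1-2: the base orientation of an opinion word/phrase.
BASE = {"Neg": "Negative", "Pos": "Positive"}

# Rules 11-14: producing and consuming resources and wastes.
RESOURCE_RULES = {
    ("Consume", "resource"): "Negative",
    ("Produce", "resource"): "Positive",
    ("Consume", "waste"): "Positive",
    ("Produce", "waste"): "Negative",
}


def apply_rule(item, modifier=None):
    """Return the implied opinion for 'Neg' or 'Pos' under an optional modifier."""
    orientation = BASE[item]
    if modifier in ("Negation", "Decreased"):  # rules 3-4 and 7-8 flip
        return OPPOSITE[orientation]
    if modifier in (None, "Increased"):  # rules 1-2 and 9-10 keep
        return orientation
    raise ValueError(f"unknown modifier: {modifier}")


def resource_rule(action, kind, decreased=False):
    """Apply rules 11-14; a decreased quantity flips the result (compound rule)."""
    orientation = RESOURCE_RULES[(action, kind)]
    return OPPOSITE[orientation] if decreased else orientation


compound = resource_rule("Consume", "waste", decreased=True)
```

The compound case reproduces “Consume decreased waste → Negative”: consuming waste is
positive, and the decrease flips it.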

2.6. Summary

In this chapter, the two different types of textual information, opinions, opinion words, and
components of opinions are explained. Opinion mining, levels of sentiment mining, sentiment
mining main tasks and components of sentiment mining are also described in the same chapter.
In addition, the different techniques of sentiment mining, including machine learning,
natural language processing, linguistic, ontology based and lexicon based techniques, are
discussed. The general steps in lexicon based sentiment mining are highlighted, where the high
level steps are text collection, pre-processing, polarity words detection, weight assignment and
propagation, and classification. The different approaches for building a sentiment lexicon,
including the manual, dictionary based and corpus based approaches, are also described in
this chapter. Finally, the basic rules of opinions are described in detail.

CHAPTER THREE

3. RELATED WORKS

In this chapter, research on sentiment mining from opinionated documents in different
languages, such as English [33] [23] [34], Chinese [20] and French [6], using different
techniques and approaches, is reviewed. Different authors used different techniques such as
machine learning [33], ontology based approaches [34], lexicon-based approaches [23] [20]
and others [35]. In addition to the techniques, the employed approaches, goals, motivation,
domain, target language, dataset source, procedures, experimental results, performance and
challenges are the main points of focus when going through the different works.
3.1. Sentiment mining from opinionated English texts

Sandeep Balijepalli [33] used a machine learning technique to categorize opinionated English
documents taken from political blogs based on their sentiments and to determine the polarity
strength of the sentiments. Contents collected from the political domain are passed
through pattern matching (Naïve Bayes filter, bag of words and part of speech tagging) to
obtain the sentiment oriented sentences, which are then indexed. The index helps to
avoid delays in fetching the data.
The framework proposed by Balijepalli gets contents from the database of blogs, passes the
sentences to the sentence chunker to strip unrelated data, passes each sentence
through filters that filter out objective sentences and classify subjective sentences, indexes
opinionated sentences, divides the results by the bloggers’ party and finally sorts them by their
polarity strength. In this approach, filter analysis is done by first passing each sentence through
the pattern recognizer to check whether it follows the custom developed
subjective pattern. If the sentence matches the pattern, it is indexed; otherwise it is passed
through Naïve Bayes (unigram, bigram) for further analysis, where this filter depends on the
training dataset. If the sentence is not indexed at this filter, it passes through part of
speech tagging; if the sentence is found to be subjective, it is indexed, otherwise it is
considered an objective sentence and is skipped. The next sentence then undergoes the same
procedure. The experimental results show that the system performs well with the unigram
approach.
Alistair Kennedy and Diana Inkpen [23] proposed a method that counts positive and negative
terms but also takes contextual valence shifters such as negations and intensifiers into account.
Two approaches are compared in their work. The first approach simply counts positive and
negative terms: a review is positive if it contains more positive than negative
terms, negative if it contains more negative than positive terms, and neutral
if it contains an equal number of positive and negative terms. The term counting method can be
easily modified to use valence shifters. The second method counts positive and negative terms,
but takes contextual valence shifters into account. Their approaches are classified as basic
(using the first approach) and improved (using the second approach).
The main lexicon used in this work was the General Inquirer (GI), though they added extra
terms from other sources. As the authors stated, their motivation for using this approach was to
see the effect of incorporating contextual valence shifters into the basic method of sentiment
classification. The data sets they used for experimental purposes are taken from two sources.
The first data set is taken from www.epinions.com [11], a general consumer review
site. This data set contains 70 positive and 70 negative reviews. The reviews were
collected for a variety of different products, including air conditioners, sewing machines,
vacuum cleaners, TVs, cookware, beer and wine. The second data set is a set of movie reviews
that contains 2000 reviews, 1000 positive and 1000 negative, taken from other movie review
sources.
The experimental result indicated that the proposed approaches perform well as indicated in the
following. The basic approach using GI lexicon gives an accuracy of 0.679 for product reviews
and 0.595 for movie reviews. The improved method using GI lexicon gives an accuracy of
0.686 for product reviews and 0.627 for movie reviews. The experimental results of adding
extra terms from other resources to GI are also given in their work and some improvements are
shown. In most cases the method of classification performs better when classifying product
reviews than movie reviews. This is because movie reviews are known to be more difficult to
classify than other reviews such as product reviews [36].
Lili Zhao and Chunping Li [34] used ontology based opinion mining for movie reviews with
the goal of improving feature level opinion mining by employing ontology. The use of this
approach was motivated by the role of ontology in conceptualizing domain specific
information. The main components of the proposed approach are text collection (movie
reviews), preprocessing, feature identification, polarity identification and sentiment analysis
with the support of ontology development. As in other works, the polarity identification fully
relies on a lexicon of tagged positive and negative sentiment terms which are used to quantify
positive/negative sentiment. For this purpose SentiWordNet (SentiWN) [37] was used, as it
provides readily interpretable positive and negative polarity values for a set of ‘affective’
terms.
The target of the ontology development is to define common terminologies in the area and to
define the relationships among the terminologies. An iterative approach with two steps is used
for developing the ontology. The first step is selecting the relevant sentences
that include concepts and the second step is extracting the concepts from those sentences. The
criteria used for selecting the sentences are that the sentence contains a conjunction word and
that it contains at least one concept seed. In the initial state, manually labeled features are
used as seeds. 1400 movie reviews randomly selected from the Internet Movie Database (IMDb)
[38] were used as the dataset, where half of them are positive and the other half are negative.
The experimental results indicate that the accuracy is satisfactory and that it is reasonable to
compute the polarity score by the proposed method, where the main factor is found to be the
ontology structure. Even though this work is ontology based, it depends on the lexicon of
opinion terms for assigning weights to the sentiment terms. The ontology is mainly needed for
feature extraction.
3.2. Sentiment mining from opinionated non-English texts

Xiaoying Xu et al. [20], in their work titled “categorizing term’s subjectivity and polarity
manually for opinion mining”, proposed principles and guidelines for manually creating a
large-scale Chinese sentiment lexicon for opinion mining. Two experiments are conducted in
their work: the first investigates the reliability of manual subjectivity labeling of the terms,
and the second examines the effectiveness of the lexicon in judging the polarity of subjective
sentences.
The paper indicates that, in establishing the first large-scale human-tagged Chinese sentiment
lexicon, the agreement among annotators and the reliability of the sentence polarity judging
system are the key issues. Annotation principles and guidelines therefore need to be
established. The principles the authors set for tagging the subjectivity and polarity of terms
are: the terms should be opinion mining oriented, the lexicon will be used only on subjective
sentences in opinion mining, and the word should indicate polarity in the subjective sentence.
Clear guidelines for annotating sentiment words and qualified annotators are needed to realize
these principles. During the tagging campaign and lexicon building, the main resource used
was HowNet [39]. HowNet is an on-line common-sense knowledge base unveiling
inter-conceptual relations and inter-attribute relations of concepts as connoted in lexicons of
Chinese and their English equivalents. According to their analysis, the answer they gave to the
question “what kind of word can be selected in the lexicon?” is ‘if a term has subjective
meaning either in concept meaning or in emotion meaning overtone, and it can indicate the
polarity in subjective sentence, it must be selected in our lexicon’.
The results of the first experiment show that the polarity of word senses can be reliably
annotated even though polarity ambiguity of words is common in Chinese, and that, because
of its large scale, the lexicon could be a very useful fundamental resource in opinion mining
and related fields. To evaluate the lexicon in a real application (the second experiment), they
built a simple sentence sentiment recognition system that consists of text pre-processing,
polarity word detection and weight assignment, link construction and polarity propagation.
The text pre-processing segments words and tags their parts of speech (POS). In polarity word
detection, every word is checked against the sentiment lexicon and, if it is found, receives the
corresponding sentiment polarity. Every polarity word and modifier word gets the initial weight
defined in the sentiment lexicon. If a polarity word is linked to a modifier word, its polarity
value is multiplied by a coefficient in the polarity propagation step. The results of the second
experiment show that a system using the sentiment lexicon can achieve an accuracy of more
than 70%.
Sigrid Maurel et al. [6] used a combined approach (symbolic and statistical) for the
classification of opinionated texts in the French language. The symbolic approach includes
systems for extracting information, adapted to the corpora and based on the rules of a
syntactic and semantic analyzer. This approach analyzes texts sentence by sentence and
extracts relationships that convey feelings. The statistical method is based on machine
learning techniques. It processes the text in a single step and assigns a global opinion to the
whole text at the end. The hybrid approach is used in their work to increase the quality of the
results. As they indicated, the experimental results show that the combination of the
statistical and symbolic approaches gives more accurate results than either method used
separately.

3.3. Review’s spam detection


Evaluative texts on the Web have become a valuable source of opinions. Existing research has
focused on the classification and summarization of opinions. An important issue that has been
neglected so far is opinion spam, or the trustworthiness of online opinions [35]. Nitin Jindal
and Bing Liu [8] studied review spam and its detection. A spam review contains false positive
or malicious negative opinions. They proposed duplicate finding and classification techniques
to detect spam reviews. The duplicate approach detects duplicate reviews using the shingle
method [54] with a similarity score of >0.9. The classification approach is based on a 2-class
classification: spam and non-spam. As they indicated, the experiment on the manufactured
product review domain showed promising results.
3.4. Summary

This chapter reviewed different research attempts to solve the problem of sentiment mining for
different languages. The review showed that machine learning, ontology based and lexicon
based approaches are the ones commonly used for sentiment mining. The works reviewed
indicate that all of these approaches except machine learning rely on a tagged list of positive
or negative sentiment terms to identify the polarity of terms. The machine learning technique
is based on training the machine to classify opinionated texts into the predefined categories
positive, negative or neutral. Ontology is employed particularly to extract the features of an
object for the purpose of refining feature level sentiment analysis. The lexicon based
approaches are based on counting the sentiment terms that occur in the opinionated texts.

CHAPTER FOUR

4. DESIGN AND IMPLEMENTATION

4.1. Introduction

In this chapter, the design and implementation of the proposed sentiment mining model for
opinionated Amharic texts is described in detail. The proposed model has the following
components: pre-processing, sentiment word detection, weight manipulation, polarity
classification and polarity strength (post-polarity classification analysis). Each component is
composed of sub components which are the building blocks of the system. Pre-processing is
responsible for the normalization of reviews and for word segmentation. In the sentiment word
detection component, all possible sentiment words and contextual valence shifter terms are
checked for existence in the sentiment lexicon. The weight manipulation component contains
two sub systems: weight assignment and polarity propagation. After the weight manipulation
is completed, the next step is the polarity classification of the reviews. The strength of the
polarity (whether it is positive or negative) is rated in the post-classification analysis step.
The sentiment word detection and weight manipulation activities are fully dependent on the
lexicon of Amharic opinion terms, in which the opinion terms are tagged with readily
interpretable values. The procedures for building the sentiment lexica, the types of lexicon, and
the guidelines and principles followed during the lexicon building process are also described in
this chapter. In addition, the tools used for implementing the prototype and the proposed
algorithms are presented.

4.2. General system architecture


The general architecture of the proposed model (sentiment mining model for opinionated
Amharic texts) is shown in figure 4.1. As shown in the Figure, the system contains different
components based on the processes required. These components are: pre-processing, sentiment
words detection, weight manipulation, polarity classification and post-classification analysis
(reviews polarity strength). The sentiment lexicon is also part of the general systems
architecture.

[Figure 4.1 (flow diagram): Reviews → Tokenization (using delimiters) → Normalization →
Normalized terms list → Checking sentiment words against the sentiment lexica (domain
specific Amharic lexicon; general purpose Amharic sentiment words lexicon) → Sentiment
words detection (polarity words, valence shifters) → Weight assignment and polarity
propagation → Polarity classification → Review polarity strength]

Figure 4.1: The sentiment mining model for opinionated Amharic texts

4.2.1. Pre-processing

The first phase of the sentiment mining model for opinionated Amharic texts is the
pre-processing component. This component is responsible for accepting the input review and
producing a set of terms after performing lexical analysis (tokenization) and normalization.
For our work, the pre-processing components needed are adopted from the work of [40]. The
adopted components are described below.
Tokenization
Tokenization is the first step in pre-processing of the input review. Tessema [40] used a string
tokenizer to construct words from a sequence of characters. The input for this activity is the
actual review which is going to be categorized. This activity reads a sequence of characters as a
string and tokenizes them using predefined list of delimiters such as new lines and space.
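As a rough illustration of this step, the sketch below splits a review on a predefined set of delimiters. The function name `tokenize` and the exact delimiter set (whitespace plus a few Ethiopic punctuation marks) are assumptions made for this example; they are not taken from the component adopted from [40].

```python
import re

# Assumed delimiter set: any whitespace (spaces, new lines) plus the Ethiopic
# word space, full stop, comma and semicolon. Adjust to the delimiters in use.
DELIMITERS = r"[\s፡።፣፤]+"


def tokenize(review):
    """Split a raw review string into a list of non-empty tokens."""
    return [token for token in re.split(DELIMITERS, review) if token]


print(tokenize("ፊልሙ በጣም ጥሩ ነዉ።"))
```

The forward slash and the period are deliberately left out of the delimiter set here, so that short forms such as ዶ/ር survive as single tokens until normalization.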
Normalization
Tokenization is followed by the normalization of homophones. The Amharic writing system
has homophone characters, that is, characters with the same pronunciation but different
symbols; for example, the characters ስ and ሥ are commonly used interchangeably, as in ስራ
and ሥራ, both meaning work. Such inconsistencies in writing words are handled by replacing
characters of the same sound with a common symbol. The normalization handles:
• The replacement of Amharic characters that have the same pronunciation and use but
different representations with a common character.

• Short forms that are usually written using a forward slash (“/”) or a period (“.”); for
example, ጠቅላይ ሚንስተር can be written as ጠ/ሚኒስተር, አዲስ አበባ as አ.አ and ዶክተር
as ዶ/ር.
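The two normalization tasks listed above can be sketched as follows. This is a minimal illustration, not the adopted component from [40]: the homophone table covers only the ሠ/ሰ series used in the example above (a full table would cover the other homophone series as well), the short forms are the three quoted in the text, and the names `HOMOPHONES`, `ABBREVIATIONS` and `normalize` are hypothetical.

```python
# Illustrative homophone table: the "ሠ" series is mapped onto the "ሰ" series,
# so that ሥራ and ስራ become the same string (only one series is shown here).
HOMOPHONES = str.maketrans("ሠሡሢሣሤሥሦ", "ሰሱሲሳሴስሶ")

# Short forms taken from the examples in the text.
ABBREVIATIONS = {
    "ጠ/ሚኒስተር": "ጠቅላይ ሚንስተር",
    "አ.አ": "አዲስ አበባ",
    "ዶ/ር": "ዶክተር",
}


def normalize(tokens):
    """Expand known short forms, then unify homophone characters."""
    normalized = []
    for token in tokens:
        expanded = ABBREVIATIONS.get(token, token)
        # An expansion may contain several words, so split it again.
        for word in expanded.split():
            normalized.append(word.translate(HOMOPHONES))
    return normalized


print(normalize(["ሥራ", "ዶ/ር", "አ.አ"]))
```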

4.2.2. Detection of sentiment words

This activity is responsible for detecting polarity terms and contextual valence shifter terms.
After the review is pre-processed, every valid term in the review is checked to see whether it
is a sentiment word. This is done by a simple detection mechanism in which the lexicon is
searched for every term. If the term exists in the dictionary, it is either a polarity word
(positive or negative) or a contextual valence shifter (negation or intensifier). Polarity words
are terms that can express opinions towards an object, such as ‘ጥሩ’ (good), which expresses a
positive opinion, and ‘መጥፎ’ (bad), which expresses a negative opinion. These terms are tagged
in the lexicon with computer interpretable values: ‘+’ for positive opinion terms and ‘-’ for
negative opinion terms. Thus, if a term is found in the lexicon with the value ‘+’, the opinion
term is positive; if it is found with the value ‘-’, the opinion term is negative. As shown in
figure 4.1, there are two lexica of opinion terms: the domain specific lexicon and the general
purpose lexicon. This division is similar to the key sentiment words and general sentiment
words indicated as future work in [20].
The key sentiment words correspond to the domain specific lexicon and the general sentiment
words correspond to the general purpose lexicon. The terms in the domain specific lexicon can
be selected according to the characteristics of different domains such as product reviews, film
reviews, political opinions etc. In this work, we have a single domain specific lexicon, for
movie reviews. This lexicon contains Amharic opinion terms that are used in the movie
reviews domain, such as ‘አዝናኝ’ (cheerful), ‘ምርጥ’ (best), ‘የሚያስጠላ’ (morbid), ‘የሚያስቅ’ (funny)
etc. [41].
The general purpose lexicon, as its name indicates, can be used by an opinion mining system
in any domain, because its opinion terms are not restricted to a specific domain; it contains
opinion terms of the Amharic language in general. The valid terms in the review are therefore
first checked in the domain specific lexicon, on the assumption that the review and the
specific lexicon are from the same domain (e.g. the movie reviews domain). If at least a single
term is found in the domain specific lexicon, the process continues to the next step (weight
assignment and polarity propagation); otherwise the general lexicon is searched. If a term
taken from the review is found in neither lexicon, it is considered a non-sentiment word and is
discarded, as such terms are not important for the sentiment classification problem.
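One possible reading of this lookup order, applied term by term, is sketched below. The two tiny lexica are stand-ins that reuse example terms from the text (their '+'/'-' tags are inferred from the English glosses), and the function names `lookup` and `detect` are hypothetical. Valence shifters, which are also stored in the lexicon, are left out to keep the sketch short.

```python
# Tiny stand-in lexica; the tags are assumptions inferred from the glosses.
DOMAIN_LEXICON = {"አዝናኝ": "+", "ምርጥ": "+", "የሚያስጠላ": "-", "የሚያስቅ": "+"}
GENERAL_LEXICON = {"ጥሩ": "+", "መጥፎ": "-"}


def lookup(term):
    """Return '+' or '-' for a sentiment term, or None for any other term."""
    if term in DOMAIN_LEXICON:          # the domain specific lexicon is tried first
        return DOMAIN_LEXICON[term]
    return GENERAL_LEXICON.get(term)    # then the general purpose lexicon


def detect(terms):
    """Keep only the terms found in either lexicon, paired with their tags."""
    found = []
    for term in terms:
        tag = lookup(term)
        if tag is not None:             # non-sentiment words are discarded
            found.append((term, tag))
    return found


print(detect(["ፊልሙ", "አዝናኝ", "ጥሩ", "ነዉ"]))
```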

Incorporating contextual valence shifters
Two kinds of valence shifters are used to improve the basic system (a system that does not
consider contextual valence shifters): negations and intensifiers.
Negations are terms that reverse the sentiment polarity of a certain term [9]. For example,
compare the sentences ‘ፊልሙ ጥሩ ነዉ’ (the film is good) and ‘ፊልሙ ጥሩ አይደለም’ (the film is not
good). In the first one, ‘ጥሩ’ (good) is a positive term, so the sentence is positive. When
‘አይደለም’ (not) is applied to the clause, ‘ጥሩ’ (good) is used in a negative context, so the
sentence is negative.
Intensifiers are terms that change the degree of the expressed sentiment. For example, in the
sentence ‘ፊልሙ በጣም ጥሩ ነዉ’ (the film is very good), the phrase ‘በጣም ጥሩ’ (very good) is more
positive than ‘ጥሩ’ (good) alone. On the other hand, in the sentence ‘ፊልሙ ጥሩ ቢሆንም’ (even
though the film is good), the term ‘ቢሆንም’ (even though) makes the statement less positive.
These are examples of an overstatement and an understatement respectively. Overstatements
are terms that increase the intensity of a positive/negative term, while understatements
decrease the intensity of that term. Terms that overstate or understate are also listed in our
lexicon.
To account for overstatements and understatements, all positive sentiment terms in our model
are given a default value of +2. If they are preceded by an overstatement in the same clause,
they are given a value of +3. If they are followed by an understatement in the same clause,
they are given a value of +1. Negative terms are given a value of -2 by default. If they are
preceded by an overstatement in the same clause, they are given a value of -3. If they are
followed by an understatement in the same clause, they are given a value of -1.
4.2.3. Weight assignment and polarity propagation

In this phase the main activities are weight assignment and polarity propagation. All the
positive sentiment terms are tagged in the lexica with ‘+’ and given a default value of +2 at
run time. All the negative sentiment terms are tagged with ‘-’ and given a default value of -2.
Before the final total polarity weight is calculated, polarity propagation is performed to
modify the initial values of the sentiment terms. The initial value or weight is modified only if
the sentiment word is linked to a modifier term (a negation or an intensifier). The polarity
propagation is done according to the following rules.
Rule 1: if any polarity term is followed by a negation term, the initial polarity value or weight
of the term will be reversed.

For example, in the sentence ‘ፊልሙ ጥሩ አይደለም’ (the film is not good), the sentiment term ‘ጥሩ’
(good) is given an initial value of +2. But due to the negation term ‘አይደለም’ (not), the polarity
value of the term is reversed to -2. Similarly, in the sentence ‘ፊልሙ መጥፎ አይደለም’ (the film is
not bad), the sentiment term ‘መጥፎ’ (bad) is given an initial value of -2. But due to the
negation term ‘አይደለም’ (not), the polarity value of that term is reversed to +2.
Rule 2: if a positive sentiment term is preceded by an overstatement term, the initial value of
that term is increased from +2 to +3.
For example, in the sentence ‘ፊልሙ በጣም ጥሩ ነዉ’ (the film is very good), due to the
overstatement term ‘በጣም’ (very), the initial polarity value of the sentiment term ‘ጥሩ’ (good)
is increased by 1, from +2 to +3.
Rule 3: if a positive sentiment term is followed by an understatement term, the initial value of
that term is decreased from +2 to +1.
For example, in the sentence ‘ፊልሙ ጥሩ ቢሆንም’ (even though the film is good), the polarity
weight of the sentiment term ‘ጥሩ’ (good) is decreased from the initial value +2 to +1 due to
the understatement term ‘ቢሆንም’ (even though).
Rule 4: if a negative sentiment term is preceded by an overstatement term, the initial value of
the term is decreased by 1, from -2 to -3.
For example, in the sentence ‘ፊልሙ በጣም መጥፎ ነዉ’ (the film is very bad), due to the
overstatement term ‘በጣም’ (very), the initial weight of the sentiment word ‘መጥፎ’ (bad) is
decreased from -2 to -3.
Rule 5: if a negative sentiment term is followed by an understatement term, the initial weight
of that term is increased by 1, from -2 to -1.
For example, in the sentence ‘ፊልሙ መጥፎ ቢሆንም’ (even though the film is bad), the initial
weight of the sentiment term ‘መጥፎ’ (bad) is increased from -2 to -1 due to the
understatement term ‘ቢሆንም’ (even though).
Rule 6: if a sentiment term is not linked to any contextual valence shifting term, the initially
assigned weight is used in further processing.
Rule 7: a contextual valence shifting term is applied only to the nearest single sentiment
term.
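The rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the prototype's code: a shifter is taken to be "linked" to a sentiment term only when it is the immediately adjacent token (before the term for overstatements, after it for negations and understatements), the word lists contain just the examples from the text, and the name `propagate` is hypothetical. How several shifters on the same term should combine is not specified in the text; the order used here is arbitrary.

```python
POLARITY = {"ጥሩ": "+", "መጥፎ": "-"}   # sample polarity terms
NEGATIONS = {"አይደለም"}               # "not"
OVERSTATEMENTS = {"በጣም"}            # "very"
UNDERSTATEMENTS = {"ቢሆንም"}          # "even though"


def propagate(tokens):
    """Return the adjusted weight of every polarity term in a token list."""
    weights = []
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        sign = 1 if POLARITY[token] == "+" else -1
        magnitude = 2                                   # default weight (Rule 6)
        previous = tokens[i - 1] if i > 0 else None
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if previous in OVERSTATEMENTS:                  # Rules 2 and 4
            magnitude = 3
        if following in UNDERSTATEMENTS:                # Rules 3 and 5
            magnitude = 1
        if following in NEGATIONS:                      # Rule 1
            sign = -sign
        weights.append(sign * magnitude)
    return weights


print(propagate(["ፊልሙ", "በጣም", "ጥሩ", "ነዉ"]))
```

Because overstatements only look backwards and negations and understatements only look forwards by one token, each shifter can affect at most one sentiment term, which is one simple way to respect Rule 7.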
4.2.4. Polarity classification

In this component as shown in figure 4.2, the criteria for classifying a review into predefined
categories: positive, negative or neutral are described in detail. The total polarity weight of a
review is calculated by adding the polarity weight of the individual sentiment terms in the
review by the formula given in equation 1 [23].
$$R_p = \sum_{i=1}^{n} T_{p_i} \qquad \text{(equation 1)}$$
Where Rp is the review polarity value, Tpi is the polarity value of the i-th sentiment term, n
is the number of sentiment terms within the given review and i is the term instance.
According to the result of the equation, if the value of Rp is greater than zero, the review is
assigned to the predefined category positive. Similarly, if the value of Rp is less than zero, the
review is assigned to the predefined category negative. Finally, if the total weight of all the
individual terms is equal to zero, the review is assigned to the category neutral.
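[Figure 4.2 (flowchart): the review polarity value is compared with 0; a value > 0 leads to
Positive, a value < 0 leads to Negative and a value = 0 leads to Neutral; the flowchart ends
with Review polarity strength]

Figure 4.2: review polarity classification
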


For example, in the sentence ‘ሳራ ጥሩ የፈጠራ ዉጤት ፊልም ነዉ፣ ወጣቶች የተካተቱበት ስለሆነ በጣም
የሚበረታታ ፊልም ነዉ’, the sentiment terms are ‘ጥሩ’, with an initial value of +2, and ‘የሚበረታታ’,
with an initial value of +2; since the latter is preceded by an overstatement, its value becomes
+3. The total weight is therefore computed as shown in Table 4.1.

Table 4.1 sentiment terms’ polarity propagation example

Sentiment term    Initial weight    Overstatement    Adjusted weight
ጥሩ                +2                none             +2
የሚበረታታ            +2                በጣም              +3

Total score: +5
Category: Positive
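The classification rule of equation 1 and the example of Table 4.1 can be reproduced with a short sketch. The function name `classify` is hypothetical, and the input is assumed to be the list of adjusted term weights produced by the polarity propagation step.

```python
def classify(weights):
    """Sum the adjusted term weights (equation 1) and map the total to a category."""
    total = sum(weights)
    if total > 0:
        return total, "Positive"
    if total < 0:
        return total, "Negative"
    return total, "Neutral"


# Adjusted weights from Table 4.1: +2 for ጥሩ and +3 for የሚበረታታ
print(classify([2, 3]))   # (5, 'Positive')
```

Note that a review with no sentiment terms at all sums to zero and is therefore reported as neutral under this rule.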

4.2.5. Review’s Polarity strength


Sentiment polarity strength determines how strongly positive or how strongly negative a word
is [42]. This is different from rating, which generalizes sentiment classification to
fine-grained scales. Rating attempts to classify sentiment using ratings such as “three stars”
or “four stars” rather than simply determining whether a review is “positive” or “negative”
[43].

Wilson et al. [44] used different clues and mechanisms to determine the polarity strength of
individual opinion terms and phrases. In our work, we instead devised a technique that
determines the polarity strength of the whole review rather than of the individual sentiment
terms. The method computes the absolute value of the total polarity weight of all the
sentiment terms within a given review and maps the result to a five star scale. One star (*)
indicates weak polarity strength, whereas five stars (*****) indicate strong polarity strength.
This representation helps a reader see at a glance how strongly or how weakly positive or
negative a given review is. No scale is used to indicate the strength of neutral reviews. The
polarity strength of every review is computed as follows. If the absolute value of the total
polarity weight equals one, the review gets one star, which indicates that it is weakly positive
or weakly negative. If the absolute value equals two, the review gets two stars, which indicates
that its polarity is medium. If the absolute value equals three, the review gets three stars,
which indicates strong positivity or strong negativity. Finally, if the absolute value of the
total polarity weight of all the sentiment terms in the review is equal to or greater than four,
the review gets four or five stars, which indicates that it is very strongly positive or very
strongly negative. Four and five stars both denote very strong polarity, to different degrees.
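A minimal sketch of this mapping is given below. The text does not state where the boundary between four and five stars lies, so the sketch assumes that a total of exactly four gives four stars and anything larger gives five; the function name `polarity_strength` is also hypothetical.

```python
def polarity_strength(total_weight):
    """Map a review's total polarity weight to a star rating string."""
    magnitude = abs(total_weight)
    if magnitude == 0:
        return ""                 # neutral reviews get no strength scale
    stars = min(magnitude, 5)     # assumption: 4 -> ****, 5 or more -> *****
    return "*" * stars


for weight in (1, -2, 3, 4, -7):
    print(weight, polarity_strength(weight))
```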

4.3. Implementation
In this sub section, the Amharic sentiment lexicon building issues, the tools used for
implementing the prototype, the procedures to integrate the different components, the proposed
algorithm, the input review, output result and other related issues are described.

4.3.1. Building sentiment lexicon

The quality of lexicon-based sentiment classification systems depends on the effectiveness of
the sentiment lexicon [45]. We have therefore followed some principles and guidelines when
building the sentiment lexicon. In addition to the principles and guidelines, different sources
and mechanisms are used to build the sentiment lexicon, as there are no publicly available
resources in the Amharic language that can be used and integrated with our model.
The main resource we used is the subjectivity lexicon of OpinionFinder [46], which contains a
list of English subjectivity terms compiled from several sources [47]. The terms in the lexicon
are tagged as strongly subjective or weakly subjective. A clue that is subjective in most
contexts is considered strongly subjective, and one that may have only certain subjective
usages is considered weakly subjective. Moreover, the word length, the part of speech (POS)
and the prior polarity value (positive, negative or neutral) are given for each term. A sample
of this lexicon is given in Appendix A. This subjectivity lexicon was used in the work of
Theresa Wilson et al. [47] for recognizing contextual polarity in phrase-level sentiment
analysis. We took these subjectivity terms, which number more than 8000 words, and
translated them to their corresponding Amharic meanings using the SelamSoft electronic
Amharic-English dictionary software [41]. This is a web based dictionary that works in both
directions (English to Amharic and Amharic to English), as shown in figure 4.3. The objective
of translating the English terms to Amharic is to find the corresponding subjective or
sentiment terms in Amharic. Before going to the dictionary, we devised criteria for selecting
terms from the source subjectivity lexicon. The two main criteria are strong subjectivity and
the POS value. Terms that indicate strong subjectivity and have a POS value of adjective are
given high priority for selection. Selecting terms with these criteria reduces the size of the
subjectivity lexicon, so that only the selected terms are translated. As a result, more than
3000 words are selected using the criteria.
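The selection step can be sketched as a simple filter. The sketch assumes that each lexicon entry is a line of space-separated `key=value` fields with the field names `type`, `word1`, `pos1` and `priorpolarity`; these names, the sample lines and the function names are assumptions for illustration and should be checked against the sample in Appendix A. The text gives such terms "high priority" rather than stating a strict rule, so the strict filter below is a simplification.

```python
def parse_entry(line):
    """Turn 'key=value key=value ...' into a dict."""
    return dict(field.split("=", 1) for field in line.split() if "=" in field)


def select_for_translation(lines):
    """Keep the words tagged as strongly subjective adjectives."""
    selected = []
    for line in lines:
        entry = parse_entry(line)
        if entry.get("type") == "strongsubj" and entry.get("pos1") == "adj":
            selected.append((entry["word1"], entry["priorpolarity"]))
    return selected


# Made-up sample entries in the assumed format.
sample = [
    "type=strongsubj len=1 word1=awful pos1=adj stemmed1=n priorpolarity=negative",
    "type=weaksubj len=1 word1=able pos1=adj stemmed1=n priorpolarity=positive",
    "type=strongsubj len=1 word1=admire pos1=verb stemmed1=y priorpolarity=positive",
]
print(select_for_translation(sample))
```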

Figure 4.3 SelamSoft electronic English-Amharic dictionary used for sentiment translations

Furthermore, the hard copy Amharic dictionary “ኣማርኛ መዝገበ ቃላት” published by Addis Ababa
University [48] was used to collect additional Amharic sentiment terms. This was done by two
post-graduate students at Addis Ababa University in the department of languages and
literatures. They used the dictionary to collect the opinion terms based on the following
guidelines.
• A term that can express subjectivity (positive or negative) independently of any other
term is selected into the sentiment lexicon.
• Terms that have a POS value of adjective or noun are given priority.
• Only the most commonly used contextual valence shifter terms are selected.
Accordingly, 895 Amharic terms are collected from the first process (through translation), of
which 392 are positive (+) and the remaining 503 are negative (-). From the second process,
393 Amharic opinion terms are collected, of which 159 are positive (+) and the remaining 234
are negative (-).
From the total of 1,288 Amharic opinion terms collected from the two processes, duplicated
terms are removed, leaving 955 terms in the final lexicon, of which 411 are positive (+) and the
remaining 544 are negative (-). Finally, these Amharic opinion terms are validated by a
professional from the Linguistics Department at Addis Ababa University. Part of the list of the
collected Amharic opinion terms and the approval letter are presented in Appendix E.
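Merging the two collections and removing duplicates amounts to a keyed union. The sketch below uses four made-up sample entries rather than the real lists, and assumes that a term appearing in both sources keeps the tag from the first one; the function names are hypothetical.

```python
def merge_lexica(translated, collected):
    """Merge two {term: tag} mappings, keeping one entry per term."""
    merged = dict(translated)
    for term, tag in collected.items():
        merged.setdefault(term, tag)   # a duplicate keeps the first tag seen
    return merged


def count_tags(lexicon):
    """Return (number of positive terms, number of negative terms)."""
    positive = sum(1 for tag in lexicon.values() if tag == "+")
    return positive, len(lexicon) - positive


translated = {"ጥሩ": "+", "መጥፎ": "-"}            # from the translation process
collected = {"ጥሩ": "+", "በጎ": "+", "ጥንብ": "-"}   # from the printed dictionary
merged = merge_lexica(translated, collected)
print(len(merged), count_tags(merged))
```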
As opinion mining systems are quite domain dependent [1], a small additional lexicon specific
to movie reviews, containing 97 terms, is also built. The purpose of this domain specific
lexicon is to improve the effectiveness of the proposed model in the domain selected for
evaluation, because some opinion words indicate different polarities in different domains. For
example, the word “ገዳይ” as in “በሳቅ ገዳይ” is positive in the movie reviews domain, whereas in
other domains such as law it is negative, as in “ነብስ ገዳይ”.

4.3.2. Lexicon building guidelines

When building the lexicon of Amharic opinion terms, we established the following principles
and guidelines:
• Every sentiment term is selected with its opinion mining orientation in mind.
• The lexicon will be used only in Amharic sentiment classification systems.
• The word should indicate polarity in any subjective sentence.

Many studies have found that adjectives are important indicators of subjectivity and opinion
[5]. As a result:
• Terms with a POS value of adjective are given high priority when selecting terms for the
Amharic sentiment lexicon in both of the above processes.
To the best of our knowledge, there is at this time no publicly available literature that clearly
describes the opinion expressing terms of Amharic and their properties. As a result:
• In the first process, only terms found in the dictionary are selected for our lexicon.
Similarly, in the second process, the terms that the students, to the best of their
knowledge, consider to be sentiment terms are selected for our lexicon.
• Commonly used and unambiguous contextual valence shifter terms are considered.

From the collected Amharic sentiment terms, those whose prior polarity is ambiguous to
annotate are removed before approval. Some of the ambiguous Amharic sentiment terms
removed are ‘ቀላል’, ‘አይንአፋር’, ‘ተጠያቂ’, ‘ስርየት’, ‘ብልጥ’, ‘ከባድ’ etc. Removing these terms from
our sentiment lexicon does not mean that they are unimportant; through further analysis of
different domains and discussion with professionals of the language, their ambiguity can be
resolved.

4.3.3. Tools

To achieve our objective, we used different environments and tools. The Python programming
language is used to develop the prototype. Python is an interpreted, object oriented, high
level programming language with dynamic semantics. Its high level built in data structures,
combined with dynamic typing and dynamic binding, make it very attractive for Rapid
Application Development, as well as for use as a scripting or glue language to connect
existing components together [49]. Python is also well suited to natural language processing
(NLP) because it is simple, easy to debug (exceptions and an interpreted language), easy to
structure (modules and object orientation) and powerful for string manipulation.
We used Python version 3.0.1 because it allows encodings other than ASCII in Python source
files. As a result, Amharic characters are interpreted directly by Python 3.0.1 and later
versions, without the need for transliteration or for entering the Unicode representation of the
characters. All the source code and rules of the prototype are written in a Python 3.0.1
compatible format, because Python 3 is not backward compatible with earlier versions.
The SelamSoft electronic Amharic-English dictionary software is used as the main resource
for building the Amharic sentiment lexicon.

Dictionary representation

The dictionary is a useful data type built into Python. In the Python version used here,
regular dictionaries iterate over key: value pairs in an arbitrary order (from Python 3.7
onwards, insertion order is preserved). Dictionaries are sometimes found in other languages
as “associative memories” or “associative arrays”. Unlike sequences, which are indexed by a
range of numbers, dictionaries are indexed by keys, which can be of any immutable type;
strings and numbers can always be keys. Tuples can be used as keys if they contain only
strings, numbers, or tuples. If a tuple contains any mutable object, directly or indirectly, it
cannot be used as a key. A dictionary can be thought of as a set of key: value pairs, with the
requirement that the keys are unique (within one dictionary). A pair of braces creates an
empty dictionary: {}. Placing a comma separated list of key: value pairs within the braces
adds initial key: value pairs to the dictionary. The main operations on a dictionary are
storing a value with some key and extracting the value given the key [51].

The sentiment terms in our lexicon are written in a text file using the dictionary syntax of
Python 3.0.1, which is {"key": "value", ...}. The key is the sentiment term, whereas the value
is the corresponding initial polarity tag of that term. For example, the list of Amharic
sentiment terms ‘ጥሩ +, በጎ +, መጥፎ -, ጥንብ -’ can be written as the Python dictionary
{"ጥሩ": "+", "በጎ": "+", "መጥፎ": "-", "ጥንብ": "-"}. Python treats single quoted and double
quoted string literals identically, so the same dictionary can also be written as
{'ጥሩ': '+', 'በጎ': '+', 'መጥፎ': '-', 'ጥንብ': '-'}. A sample of the dictionary representation of the
Amharic sentiment terms used in this research work is given in Appendix B.

The lexicon of sentiment terms (the dictionary) can be placed within the source code or
imported from a text file at run time. Each term in the input review is used as a key: the
dictionary is searched for it and, if the term is present, its corresponding value is returned for
further processing. A sample of the input review data is given in figure 4.4.
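Loading a lexicon written in dictionary syntax from a text file can be sketched as follows. This is one possible way to do it, assuming the file is UTF-8 encoded and contains a single dictionary literal; `ast.literal_eval` from the standard library is used here because it parses literals without executing arbitrary code. The function names and the file path are hypothetical.

```python
import ast


def parse_lexicon(text):
    """Parse text written in Python dictionary syntax into a dict."""
    lexicon = ast.literal_eval(text)
    if not isinstance(lexicon, dict):
        raise ValueError("lexicon text must contain a dictionary literal")
    return lexicon


def load_lexicon(path):
    """Read a UTF-8 lexicon file (hypothetical path) and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_lexicon(handle.read())


lexicon = parse_lexicon('{"ጥሩ": "+", "በጎ": "+", "መጥፎ": "-", "ጥንብ": "-"}')
print(lexicon.get("ጥሩ"), lexicon.get("ፊልም"))   # + None
```

A lookup such as `lexicon.get(term)` uses the dictionary's hash table, so the program does not need to walk through every entry for each term of the review.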

Figure 4.4 sample movie reviews input

The corresponding output of the movie review sample input given above is given in figure 4.5
as follows

Figure 4.5 sample of polarity classified movie reviews

Sample of review polarity classification with its polarity strength is given in figure 4.6.

Figure 4.6 sample classified reviews with polarity strength

A sample of the prototype accepting opinionated Amharic text from the user through the data
input widget and returning the polarity classification of that text is given in figure 4.7. This
demo indicates that a user or reviewer can write comments or opinions in Amharic about a
target object in the input text widget on their terminal or computer and submit them, so that
the opinion can be pre-processed, classified and used for further analysis.

Figure 4.7 sample of accepting review from user and its polarity classification

Similarly, a sample of the prototype browsing Amharic opinionated texts from a file, together
with their polarity classifications, is presented in figure 4.8. A large number of reviews or
opinionated texts can be collected manually or automatically and stored in a file, and then
processed and classified at once. In this case, each opinionated text is processed and labeled
with its polarity category as positive, negative or neutral. Final statistics showing the number
of positive, negative and neutral opinionated texts out of the total are then generated so that
this data can be used for further analysis and decision making. This is what the simple
prototype in figure 4.8 tries to show.
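
The final statistics described above could be produced along the following lines. This is only a sketch: the function name summarize, the label strings and the sample labels are hypothetical, and the fragment assumes that each review in the file has already been labeled by the classifier.

```python
from collections import Counter

CATEGORIES = ("positive", "negative", "neutral")


def summarize(labels):
    """Return {category: (count, share of total)} for a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    summary = {}
    for category in CATEGORIES:
        count = counts.get(category, 0)
        share = count / total if total else 0.0
        summary[category] = (count, share)
    return summary


# Hypothetical labels for five reviews read from a file.
labels = ["positive", "positive", "negative", "neutral", "positive"]
summary = summarize(labels)
for category, (count, share) in summary.items():
    print(f"{category}: {count} ({share:.0%})")
```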

Figure 4.8 sample of browsing Amharic opinionated texts and their classification results

4.3.4. The Proposed Algorithms


Given a pre-processed review, the proposed sentiment mining model operates in three steps.
First, it takes the tokenized and normalized review terms and checks whether they bear
sentiment. This is done by checking whether the terms exist in the dictionary of sentiment
terms. Next, the sentiment terms are assigned an initial polarity weight, and polarity
propagation is done if the sentiment terms are linked to contextual valence shifter terms.
Finally, the review is assigned to one of the predefined categories, positive (+), negative (-)
or neutral, based on the total weight obtained from the previous step. A high-level view of the
proposed algorithms, showing how the sentiment terms are detected and classified and how the
sentiment polarity value is propagated, is given as follows.

Algorithm 4.1: review’s sentiment detection and polarity classification

1. For every pre-processed review R

2. For every term T in the review R, check its existence in the lexicon of
sentiments D

3. If a term T exists in the dictionary D

3.1. Its corresponding initial polarity weight Tpi is given

3.2. If it is linked to a contextual valence shifter term C

3.2.1. The initial polarity value Tpi of the term is propagated

3.3. Add all the polarity weights of the individual terms to get the review
polarity value Rp

3.4. If the total polarity weight Rp is greater than zero, then the review is
assigned to the predefined category positive (+)

3.5. If the total polarity weight Rp is less than zero, then the review is
assigned to the predefined category negative (-)

3.6. Else the review is assigned to the predefined category neutral

4. Else the review is assigned to the unclassified class because there are no
sentiment terms Ts in the given review
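
Algorithm 4.1 can be read as the following Python sketch. It is an illustrative interpretation of the steps, not the prototype's actual source. The function name classify_review, the optional adjust callback standing in for step 3.2, and the numeric initial weights of +2 and -2 are assumptions; the magnitude 2 is chosen only because the discussion in chapter five mentions an original polarity value of +2 for a positive term.

```python
# Assumed initial polarity weights (Tpi) for '+' and '-' lexicon entries.
INITIAL_WEIGHT = {"+": 2, "-": -2}


def classify_review(tokens, lexicon, adjust=None):
    """Return (category, Rp) for a tokenized, normalized review.

    adjust, if given, is a hypothetical hook called as
    adjust(tokens, index, weight) to propagate polarity (step 3.2).
    """
    weights = []
    for index, term in enumerate(tokens):
        if term not in lexicon:          # step 2: not a sentiment term
            continue
        weight = INITIAL_WEIGHT[lexicon[term]]   # step 3.1
        if adjust is not None:                   # step 3.2
            weight = adjust(tokens, index, weight)
        weights.append(weight)

    if not weights:                      # step 4: no sentiment terms found
        return "unclassified", 0

    total = sum(weights)                 # step 3.3: review polarity Rp
    if total > 0:
        return "positive", total         # step 3.4
    if total < 0:
        return "negative", total         # step 3.5
    return "neutral", total              # step 3.6


LEXICON = {"ጥሩ": "+", "በጎ": "+", "መጥፎ": "-", "ጥንብ": "-"}

print(classify_review("ፊልሙ ጥሩ ነው".split(), LEXICON))
print(classify_review("ፊልሙ መጥፎ ነው".split(), LEXICON))
print(classify_review("ፊልሙ በቤቱ ውስጥ ብቻ አለቀ".split(), LEXICON))
```

In this reading, the unclassified outcome is decided once per review, after all terms have been checked, rather than once per term.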

Sentiment polarity propagation

Polarity propagation is done only if a sentiment term T is linked to a contextual valence
shifter term C. Accordingly, the procedure followed during polarity propagation is given as
follows.

Algorithm 4.2: Sentiment polarity propagation

1. For every sentiment term T in review R

2. If a sentiment term T is linked to a contextual valence shifter term C
2.1. If the sentiment term T is linked to a negation contextual valence shifter
term C, then the prior polarity value of the term T is reversed from Tpi to
-Tpi.
2.2. If the sentiment term T is linked to an overstatement contextual valence
shifter term C, then the prior polarity value of the term T is modified
from Tpi to Tpi+1 (for positive sentiment terms) and from Tpi to
Tpi-1 (for negative sentiment terms)
2.3. If the sentiment term T is linked to an understatement contextual
valence shifter term C, then the prior polarity value of the term is
modified from Tpi to Tpi-1 (for positive sentiment terms) and from
Tpi to Tpi+1 (for negative sentiment terms)

3. Else the initial polarity value of the term Tpi is maintained
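
The propagation rules of Algorithm 4.2 can be sketched as a small Python function. The function name propagate_polarity and the string labels for the shifter kinds are hypothetical. How a sentiment term is determined to be linked to a shifter is not specified in the algorithm, so the sketch takes the kind of the linked shifter as an argument and leaves that decision to the caller.

```python
def propagate_polarity(initial_weight, shifter_kind=None):
    """Apply one contextual valence shifter to a prior polarity value Tpi.

    shifter_kind is None (no linked shifter), "negation",
    "overstatement" or "understatement"; these labels are assumptions.
    """
    if shifter_kind is None:
        return initial_weight                 # step 3: value maintained
    if shifter_kind == "negation":
        return -initial_weight                # step 2.1: Tpi -> -Tpi
    # Direction of the term's own polarity: +1 for positive, -1 for negative.
    sign = 1 if initial_weight > 0 else -1
    if shifter_kind == "overstatement":
        return initial_weight + sign          # step 2.2: strengthen
    if shifter_kind == "understatement":
        return initial_weight - sign          # step 2.3: weaken
    raise ValueError("unknown shifter kind: " + str(shifter_kind))


# Prior value +2 for a positive term and -2 for a negative term (assumed).
print(propagate_polarity(2, "negation"))        # -2
print(propagate_polarity(2, "overstatement"))   # 3
print(propagate_polarity(-2, "overstatement"))  # -3
print(propagate_polarity(2, "understatement"))  # 1
```

Overstatement moves the value away from zero and understatement moves it toward zero, which is what the separate rules for positive and negative terms amount to.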


4.4. Summary
In this chapter, we presented our proposal for the sentiment mining model of opinionated
Amharic texts. The architecture of the proposed model contains the following components:
preprocessing, sentiment words detection, weight assignment and polarity propagation, polarity
classification, polarity strength representation and sentiment lexica. After the reviews are
preprocessed, each term is checked for existence in the sentiment lexica at the sentiment words
detection component. The detected sentiment terms are assigned weights, and the values of
sentiment terms that are linked to contextual valence shifters are propagated, in the weight
assignment and polarity propagation component. Based on the weights of the sentiment terms,
the reviews are classified into predefined categories: positive, negative or neutral. Finally,
the polarity strength of the reviews is rated.
In addition, implementation-related issues such as Amharic sentiment lexica construction,
guidelines for building the lexicon, the tools used for developing the prototype, the dictionary
representation of the sentiment terms and the proposed algorithms are presented in this chapter.
The sentiment lexica are built manually from different sources based on the principles and
guidelines. The Python programming language and the Python 3.0.1 interpreter are used to
develop the prototype. The proposed algorithms are the algorithm for sentiment polarity
classification and the algorithm for sentiment polarity value propagation.

CHAPTER FIVE
5. EXPERIMENTAL RESULTS
This chapter presents the experimental results of the developed prototype system. The
experimental setups and procedures, the evaluation parameters, the results and their discussion
are presented in this chapter. The lack of readily available resources such as a lexicon of
opinion terms, data sources and well defined tools made conducting the experiments challenging.
5.1. Procedures and Experimental Setups

To evaluate the developed sentiment mining model for opinionated Amharic texts, we used
procedures and setups that include data collection, methods and manual classifications. These
are described in the subsequent sections.
5.1.1. Opinionated Data Collection

As indicated in the previous chapters, we have considered the movie reviews domain as the
major reviews domain for conducting the experiments. The main reason is the lack of readily
available reviews written in the Amharic language: it is relatively easier and more manageable
to collect movie reviews manually than reviews from other domains, because questionnaires can
be distributed to movie fans at the different cinemas in Addis Ababa. In addition, movie viewers
can write comments freely as compared to other domains such as politics. Hence most of the movie
reviews we used for conducting the experiments were collected manually, by preparing
questionnaires and distributing them to movie fans at the different cinemas in the city of
Addis Ababa. The questionnaires used and sample responses to them are given in Appendix C.
The remaining few movie reviews were collected from two additional sources. The first
source is habeshafilms.com, a recently launched website for promoting the Ethiopian film
industry that allows fans to leave comments on a film they select. The second source
is a set of movie reviews collected by an undergraduate student of the Department of
Theatrical Arts at Addis Ababa University. The movies to be reviewed were randomly selected and
randomly distributed by the author. As a result, a total of 254 movie reviews were collected
from all the sources described above.
In addition to the movie reviews, 49 additional reviews were taken from another domain (the
newspaper reviews domain). These reviews were collected from Reporter [52], a local Amharic
bi-weekly newspaper. The reviews were given by readers when the newspaper was celebrating its
1000th edition. The purpose of using these additional reviews is to see the performance of the
developed system prototype in different domains.
Movie reviews
Movie reviews are known to be more difficult for sentiment mining. This is because movie
reviews often contain many sentences with objective information about the characters, directors
or actors of the movie. Although these sentences are not used to express the author's opinion,
they may contain many positive and negative terms. In addition, movie reviews contain more
literary description than product reviews, which brings more implicit comments and results in
low performance [53].
According to [36], the unique characteristic of movie reviews is that when a person writes a
movie review, he/she probably comments not only on movie elements (e.g. screenplay, visual
effects, music) but also on movie related people (e.g. director, script writer, actor), while in
product reviews few people care about issues like who has designed or manufactured the
product. Therefore, the commented features in a movie review are much richer than those in
product reviews. As a result, movie review mining is more challenging than mining in other
domains such as product reviews.
5.2.2. Manual classification

This activity is concerned with labeling the reviews for experimental purposes. All the 303
reviews (254 movie domain and 49 newspaper domain reviews) were manually categorized by
an independent individual into the predefined categories: positive (+), negative
(-), neutral (N) or unclassified (UN). If a given review is not related to the target topic, it
is assigned to the unclassified (UN) category. As a result, 170 of the movie reviews are
labeled as positive (+), 28 as negative (-) and 29 as neutral (N), and the remaining 27 are
unclassified. Similarly, 32, 14, 2 and 1 of the newspaper reviews are labeled as positive,
negative, neutral and unclassified respectively. The manually classified reviews helped us in
crosschecking the results obtained from our prototype system, the sentiment mining model for
opinionated Amharic texts.
5.3. Evaluation

This section describes the evaluation parameters of the designed model and its results. The
prototype system is evaluated by comparing the number of reviews that are categorized
correctly and incorrectly. Specifically, the comparison is made between the reviews categorized
by the proposed prototype system and the manually labeled (categorized) reviews.
Precision and recall, which are evaluation parameters from information retrieval (IR), are also
used in text classification. Precision measures the exactness of a classifier. It is the ratio
of the number of reviews classified correctly into a given category to the total number of
reviews the classifier assigned to that category. A high precision means fewer false positives,
while a lower precision means more false positives.
$$P = \frac{TC}{TC + FC} \qquad \text{(equation 2)}$$
where TC denotes the number of reviews which are classified correctly and FC denotes the
number of reviews which are classified incorrectly.
Recall measures the completeness or sensitivity of a classifier. It is the ratio of TC to
all the reviews belonging to the category. A high recall means fewer false negatives, while a
lower recall means more false negatives.
$$R = \frac{TC}{TC + MC} \qquad \text{(equation 3)}$$
where MC denotes the number of reviews which are missed by the classifier, i.e. classified
neither correctly nor incorrectly (the unclassified category).
There is a trade-off between precision and recall: greater precision tends to decrease recall,
and greater recall tends to decrease precision. The F-measure is the harmonic mean of P and R
and takes both measures into account. It is defined as follows:
$$F = \frac{2PR}{P + R} \qquad \text{(equation 4)}$$
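
Equations 2 to 4 can be computed directly from the three counts. The fragment below is a small sketch; the function names and the counts TC = 3, FC = 3 and MC = 1 are hypothetical and are not taken from the experimental data.

```python
def precision(tc, fc):
    """Equation 2: P = TC / (TC + FC)."""
    return tc / (tc + fc) if (tc + fc) else 0.0


def recall(tc, mc):
    """Equation 3: R = TC / (TC + MC)."""
    return tc / (tc + mc) if (tc + mc) else 0.0


def f_measure(p, r):
    """Equation 4: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for one category.
tc, fc, mc = 3, 3, 1
p = precision(tc, fc)
r = recall(tc, mc)
f = f_measure(p, r)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.75 0.6
```

The guards against a zero denominator are an addition for robustness; the equations themselves do not define the measures when no review falls into the category.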
5.4. Results

In this section, we present the results of three different experiments. The first experiment
(basic system) uses a single general purpose dictionary without considering the contextual
valence shifter terms. The second experiment uses two sentiment lexica: the general purpose
lexicon and the domain specific lexicon. The third experiment uses the two lexica and also
considers the contextual valence shifter terms. A comparison of the different experimental
results is also presented in this section.
The 254 movie reviews and 49 newspaper reviews are used for conducting the experiments. Each
review was classified by the system prototype according to the procedures described earlier and
all the results were recorded. The results were then compared with the manually labeled
classifications. The results obtained for each experiment are given as follows.
5.4.1. Experiment one: Basic system

This experiment used the standard lexicon of sentiment terms, i.e. the general purpose Amharic
sentiment terms. The experiment is conducted for both the movie and newspaper reviews domains.
The results, measured by precision, recall and F-measure for each domain and class, are
presented in table 5.1 as follows.
Table 5.1: Results of experiment one
System        Reviews    Class     Precision  Recall  F-measure
Basic system  Movie      Positive  0.929      0.823   0.867
Basic system  Movie      Negative  0.6        0.573   0.589
Basic system  Newspaper  Positive  0.93       0.9     0.914
Basic system  Newspaper  Negative  0.5        0.75    0.6

5.4.2. Experiment two: using general purpose and domain specific lexica

This experiment is conducted mainly to see the effect of using a domain specific lexicon. As
indicated in chapter four, a domain specific lexicon is a list of opinion terms specific to
a given domain such as movies, politics, economics or products. In this experiment
we used both the general purpose lexicon and the movie reviews domain lexicon. Only the 254
movie reviews are used, because we did not build a lexicon of opinion terms specific to the
newspaper reviews domain. As presented in table 5.2, using the same measurements as in
experiment one, the results of this experiment show improvements over the results of
experiment one on movie reviews. This improvement is mainly due to the use of the domain
specific lexicon in addition to the general purpose lexicon of terms.

Table 5.2: Results of experiment two
System                  Reviews  Class     Precision  Recall  F-measure
Basic + domain lexicon  Movie    Positive  0.937      0.943   0.939
Basic + domain lexicon  Movie    Negative  0.62       0.78    0.69

5.4.3. Experiment three: using both lexica and contextual valence shifter terms

This is the last experiment, conducted taking the contextual valence shifter terms into
account. As explained in the previous chapters, contextual valence shifter terms are terms that
reverse or modify the initial polarity value of a term. This experiment is done using the
domain specific lexicon, the general purpose lexicon and the contextual valence shifter terms
for the movie reviews. Only the general purpose lexicon and the contextual valence shifter
terms are used for the newspaper reviews. The results of this experiment are presented in
table 5.3, using the same measurements as in the above experiments.
Table 5.3: Results of experiment three
System                              Reviews    Class     Precision  Recall  F-measure
Complete                            Movie      Positive  0.943      0.949   0.945
Complete                            Movie      Negative  0.666      0.842   0.743
General lexicon + valence shifters  Newspaper  Positive  0.93       0.900   0.914
General lexicon + valence shifters  Newspaper  Negative  0.500      0.750   0.600

The experiments have shown that the results are promising even though the research work is in
its infancy. One reason for this promising result is the overlap between the sentiment
terms used by movie reviewers and the Amharic sentiment terms collected from different
sources. This is because almost all of the reviews used in the experiments are very short, as
shown in Appendix D, when compared to the reviews used by other researchers, where most of
the reviews are composed of many paragraphs. As a result, reviewers used very common
Amharic opinion words to express their opinions within those short reviews. Similarly, when we
collected Amharic sentiment terms for our lexicon, the main criterion we used was the
commonality of the sentiment terms. This was done by selecting sentiment terms that are
commonly used, to the best of our knowledge, and terms that represent opinion polarity without
ambiguity. Therefore, when the length and complexity of reviews written in Amharic increase,
the size and quality of the Amharic sentiment lexicon should also be increased to maintain and
improve the system's performance.
In addition, the third experiment is re-conducted using additional reviews given about a
newspaper. Those reviews are taken from Reporter, a local bi-weekly Amharic newspaper
printed on October 8, 2010 by the Media and Communication Center (MCC) [55]. The reviews are
written by readers of the newspaper about that newspaper when it was celebrating its 15th
year crystal anniversary. The experimental results using this dataset are given in table 5.4.
As usual, the reviews are first classified manually. As a result, 21 of the 35 reviews are
assigned to the positive category while the remaining 10 and 4 are assigned to the negative and
neutral categories respectively.
Table 5.4: Results of experiment three with additional reviews
System                              Reviews    Class     Precision  Recall  F-measure
General lexicon + valence shifters  Newspaper  Positive  0.857      0.94    0.896
General lexicon + valence shifters  Newspaper  Negative  0.555      0.8     0.655

5.6. Discussion of the results


As shown above, the three experiments were done with different experimental setups and have
shown very good and promising results. These different experimental setups are the reason for
the variations in the experimental results. The variations in the results, the reasons for them
and important examples are discussed in this section.

The results of the first experiment show that the system prototype performs relatively better
with newspaper reviews than with movie reviews. This is mainly due to the complex nature of
movie reviews. As explained above, movie reviews are known to be difficult for sentiment
classification systems as compared to other domains. Similarly, in both domains, the system
prototype performs better with positive reviews than with negative reviews. This can be caused
by different reasons related to the nature of natural language. In this research work, we have
learnt some reasons for the slanted results. The first reason is that when writing reviews in
Amharic, many reviewers use positive opinion terms to express negative opinions. For
example, in the review “ከራሳችን ጭንቅላት ፊልቆ የወጣ ቢሆን ጥሩ ነው፡፡ ከውጭ ሀገረ ፊልሞች ባንሰርቅ
ይመረጣል Polarity: Positive”, the expressed opinion is negative but the system prototype labeled
it as positive. This is because the reviewer used the positive opinion term ‘ጥሩ’ (good) in
his/her complex sentence to express his/her negative opinion towards the film. The second
reason we have learnt is that most of the reviews collected and used for experimental purposes
are reviews that contain positive opinions. For example, of the 254 movie reviews only 28
are negative according to the manual labeling. This small number of negative reviews may have
some influence on the precision of the negative class. In addition, many reviewers do not use
explicit Amharic sentiment terms to express negative opinions. For example, “ፊልሙ በቤቱ ውስጥ
ብቻ አለቀ Polarity: Unclassified” is a negative opinion expressed without using explicit Amharic
sentiment terms, and this may affect the precision of the negative class. These kinds of
complexities of natural languages make sentiment mining more challenging. As a result,
detailed analysis of the constraints of the Amharic language is needed to solve such problems.

In the second experiment, the results show that the system prototype performs better on movie
reviews than the system prototype in the first experiment. The improvement of performance in
the second experiment is due to the incorporation of a domain specific lexicon of opinion
terms, in this case for the movie reviews domain. Experimental results using newspaper reviews
are not given for this experiment, because we did not build a media specific lexicon of opinion
terms. The justifications given for experiment one also apply to the better performance of the
system in the second experiment with positive reviews as compared with negative reviews.

Finally, in the last experiment, the results show relative improvements. As indicated earlier,
this experiment is conducted considering contextual valence shifter terms such as negations. In
this experiment, we can see two different improvements: subjective and objective. The
negation terms are the cause of the objective improvement while the intensifiers are the
cause of the subjective improvement. The objective improvement can be measured in
numbers, as given in table 5.3. The subjective improvement is observed in the polarity strength
representation. This is because the intensifiers do not change the polarity orientation of an
opinion term; rather, they modify the polarity strength of an opinion term. For
example, the movie review “እሽ በጣም ቆንጆ አይደለም ብዙ የተሰራበት ነገር ከመምረጡም በላይ አንድ
ቤተሰብ ላይ ብቻ ገንዘብ አፍቃሪ መሆናቸውን አገነነው ለነገሩ ፊልሞቻችን አንደ አካልን ላይ ማተኮር ይወዳሉ….
Polarity: Positive” is labeled as “positive” by the system in the first and second experiments
even though the review is “negative”. In the third experiment, however, the system
prototype labeled this review as negative: “እሽ በጣም ቆንጆ አይደለም ብዙ የተሰራበት ነገር ከመምረጡም
በላይ አንድ ቤተሰብ ላይ ብቻ ገንዘብ አፍቃሪ መሆናቸውን አገነነው ለነገሩ ፊልሞቻችን አንደ አካልን ላይ ማተኮር
ይወዳሉ…. Polarity: Negative”. This is because the negation term “አይደለም (not)” reverses the
polarity orientation of the sentiment term “ቆንጆ (cute)” from positive to negative.

In addition, the polarity strength of the movie review “በጣም ቀሽት ነው Polarity: Positive” was
represented by two stars, as in “በጣም ቀሽት ነው Polarity: Positive ❶❷”, in the first and second
experimental results. In the third experiment, the incorporation of intensifier terms modifies
the polarity strength of the review from a two-star to a three-star representation, as in
“በጣም ቀሽት ነው Polarity: Positive ❶❷❸”. This is because the intensifier term “በጣም (very)”
modifies the polarity strength of the opinion term “ቀሽት (cute)” by adding a value of +1 to the
original polarity value of +2.
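
The star representation in this example can be sketched as follows. The mapping from the absolute polarity value to the number of marks, the cap of five marks and the function name strength_marks are assumptions made for illustration; the text only shows the cases of two and three marks.

```python
MARKS = "❶❷❸❹❺"  # assumed upper bound of five marks


def strength_marks(polarity_value):
    """Render the polarity strength as circled numbers, one per unit."""
    count = min(abs(polarity_value), len(MARKS))
    return MARKS[:count]


base_value = 2                     # original polarity value of "ቀሽት"
intensified = base_value + 1       # "በጣም (very)" adds +1

print(strength_marks(base_value))   # ❶❷
print(strength_marks(intensified))  # ❶❷❸
```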

As shown in table 5.3, there is no change in the objective experimental results for the
media (newspaper) reviews when compared with the results of experiment one, though there is an
improvement in the polarity strength representation of the results using stars. This is because
the reviews used from this domain are very few in number (only 49 reviews), so that the
probability of the occurrence of negation terms is minimal.

In general, every component of the experimental setup was constructed from scratch for
conducting the above experiments, and the experimental results obtained are encouraging and
promising. Nevertheless, we have observed opinion statements written in Amharic that express
strong positivity or negativity without using explicit opinion terms. Such statements
pose challenges to sentiment mining systems. For example, the review “ላለፉት ብዙ አመታት
ሪፖርተር ጋዜጣ ከሌሎች ከማነባቸው እንደ ዋሽንግተን ፖስት እና ኒውዮርክ ታይምስ ከሚባሉ ጋዜጦች ያልተለየኝ ነው፡፡”
expresses a positive opinion towards the newspaper, but the system assigns this review to
neither the positive nor the negative category. This is because the reviewer did not use
any explicit Amharic opinion terms in expressing his/her opinion. This kind of review can be
handled by a comparative sentiment mining system, which is concerned with addressing and
mining comparative opinions expressed in documents or reviews.

CHAPTER SIX
6. CONCLUSIONS AND RECOMMENDATIONS
6.1. Conclusions
The web has dramatically changed the way that people express their views and opinions. They
can now post reviews of products at merchant sites and express their views on almost
everything in Internet forums, discussion groups and blogs. This online word-of-mouth
behavior represents a new and measurable source of information with many practical
applications. Now if one wants to buy a product, he/she is no longer limited to asking
friends and family, because there are many product reviews on the web which give the opinions
of existing users of the product. For a company, it may no longer be necessary to conduct
surveys, organize focus groups or employ external consultants to find consumer opinions.
However, it is difficult for a human reader to find relevant sources, extract related sentences
with opinions, read them, summarize them and organize them into useful forms. As a result,
automated opinion discovery and summarization systems are needed. Sentiment analysis, a text
mining problem, grows out of this need. Due to its tremendous value for practical applications,
there has been an explosive growth of both research in academia and applications in
industry.
This research work has explored techniques of sentiment mining for opinionated
Amharic texts. To classify a given opinionated document or text into predefined classes, the
opinionated document passes through pre-processing, detection of sentiment words, weight
assignment and polarity propagation, and polarity classification. Pre-processing involves
normalization and tokenization. The detection of sentiment words is a process of detecting
polarity words and contextual valence shifters based on the sentiment lexicon. Weight
assignment and polarity propagation is responsible for assigning an initial weight to the
detected sentiment terms and propagating the polarity value of sentiment terms that are linked
to contextual valence shifters. Polarity classification is concerned with categorizing a given
opinionated document into predefined categories based on the weights obtained from the weight
assignment and polarity propagation process.
In order to detect the sentiment terms in a given opinionated document, assign initial values
to the sentiment terms and propagate the initial polarity values, lexica of properly tagged
Amharic sentiment terms are used.

The high-level tasks undertaken to accomplish the objective, and the results obtained,
are: identifying the techniques for building an Amharic sentiment lexicon; building the
sentiment lexicon, where two lexica (general purpose and domain specific) are built; designing
the general architecture of the proposed sentiment mining model; developing an initial
prototype of the sentiment mining model for opinionated Amharic texts; and testing the
developed prototype with movie reviews as the main experimental dataset.
The results of the lexicon-based sentiment mining model for opinionated Amharic texts using
the processes explained above are encouraging. However, further work can be done to improve
the proposed model's results.
6.2. Contributions of the study
Some of the main contributions of this research work are given below.
• A model is proposed for sentiment mining of opinionated Amharic texts.
• We collected more than 950 Amharic sentiment terms.
• Different techniques are employed to build the sentiment mining lexicon, which can also
be advantageous to those who need to collect additional Amharic sentiment terms.
• Algorithms are developed to realize the proposed model.
• We created a general understanding of the subject matter: sentiment analysis for
opinionated Amharic texts.
• This research can be used as a base work for sentiment mining related research works
for opinionated Amharic documents.
• A prototype system that is based on the model is developed.
• The prototype is evaluated for effectiveness and encouraging results are obtained.

6.3. Recommendations
Even though much has been done in this work to develop a sentiment mining model for
opinionated Amharic texts or documents, developing a full-fledged, fully functional and more
efficient sentiment mining system needs a coordinated team effort comprising linguistics
professionals, computer science professionals and other people such as those who have
experience in collecting large numbers of comments from the public. Good
coordination of those different professionals can result in a sentiment mining system with full
functionality and better performance.

6.4. Future Works
There are many possible directions for future work. In this research work, we used only
subjective Amharic sentences (opinionated Amharic sentences), but future work can broaden
the scope to subjectivity classification, which is concerned with classifying arbitrary
documents as subjective (opinionated) or objective (non-opinionated). This may
help reduce the manual effort needed to separate opinionated from non-opinionated
documents.
Another direction that can be considered in the future is to improve the performance of the
sentiment mining model by enriching the available Amharic sentiment lexicon. This can be
done by increasing the size of the lexicon, by considering phrase-level Amharic sentiment
terms and by improving the quality of the lexicon through coverage of more domains. In
addition, more precise analysis can be applied to the polarity strength of the Amharic
sentiment terms, because some positive and negative sentiment terms may not be equally positive
or negative. Positive and negative terms can then be given explicit weights to show how
positive or how negative they are. Likewise, overstatements and understatements may not all be
equally weighted.
Feature level sentiment mining can be another future research direction. It is
concerned with the identification and extraction of commented features and with determining the
sentiments towards these features. For example, movie features such as director, actor and
lighting can be identified and their corresponding opinions can be determined. This is a more
detailed area of study in sentiment mining research.
Handling reviews or opinionated documents with idiomatic expressions, and longer reviews or
opinionated documents from different domains, can also be a focus of future research
work.

References
[1]. B. Pang and L. Lee, “Opinion Mining and Sentiment Analysis”, Foundations and
Trends in Information Retrieval, Now publishers, 2008.
[2]. Lina Zhou and Pimwadee Chaovalit, “Ontology-supported polarity mining”, Journal of
the American Society for Information Science and Technology, 2008.
[3]. J. Tatemura, “Virtual reviewers for collaborative exploration of movie reviews,” in
Proceedings of Intelligent User Interfaces (IUI), New Orleans, Louisiana, USA, 2000.
[4]. Bing Liu, “Sentiment Analysis”, 5th Text Analytics Summit, Boston, June 1-2, 2009.
[5]. Bing Liu. Sentiment Analysis and Subjectivity, Handbook of Natural Language
Processing, Second Edition, Chemical Rubber Company (CRC) Press, Taylor and
Francis Group, 2010.
[6]. Sigrid Maurel, Paolo Curtoni and Luca Dini, “A Hybrid Method for Sentiment
Analysis”, Statistical Analysis Software (SAS) press, Grenoble, France, 2008.
[7]. Tessema Mindaye, Meron Sahlemariam, Teshome Kassie. The Need for Amharic
WordNet, Global WordNet Conference, Mumbai, India, 2010.
[8]. N. Jindal and B. Liu, “Review spam detection,” Proceedings of WWW, California,
2007.
[9]. Livia Polanyi and Annie Zaenen, “Contextual valence shifters”, In proceedings of the
AAAI Symposium on Exploring Attitude and Affect in Text: Theories and Applications
(published as AAAI technical report SS-04-07), 2004.
[10]. Habesha Films, http://www.habeshafilms.com/, last accessed on June 2010.
[11]. “Unbiased reviews by real people”, http://www.epinions.com/, a shopping company,
last accessed October, 2009.
[12]. Yves Bestgen ,” Building Affective Lexicons from Specific Corpora for Automatic
Sentiment Analysis”, in proceedings of the European Language Resources Association
(ELRA), Belgium, 2008.
[13]. Feiyu XU & Xiwen CHENG, "Opinion Mining", Saarbrucken, Germany, Dec 3 2007.
[14]. Anne Kao and Stephen R. Poteet (Eds),"Natural Language Processing and Text
Mining", Springer-Verlag London Limited 2007.
[15]. Ding, X., Liu, B. and Yu, P, A Holistic Lexicon-Based Approach to Opinion Mining, in
proceedings of the first ACM International Conference on Web search and Data Mining
(WSDM’08), Stanford, 2008.
[16]. Opinion Miner - Online sentiment analysis, http://www.slideshare.net/igmelig/opinion-
miner-online-sentiment-analysis, last accessed on April 2010.
[17]. Anna Stavrianou and Jean-Hugues Chauchat, "Opinion Mining Issues and Agreement
Identification in Forum Texts", Atelier FODOP, 2008.
[18]. Alexander O’Neill, “Sentiment Mining for Natural Language Documents”, Australian
National University, November 2009.
[19]. Lili Zhao and Chunping Li, “Ontology Based Opinion Mining for Movie reviews”,
Springer-Verlag Berlin Heidelberg 2009.
[20]. Xiaoying Xu et al., “Categorizing Terms’ Subjectivity and Polarity Manually for
Opinion Mining in Chinese”, IEEE, 2009.
[21]. Carmen Banea, "Subjectivity and Sentiment Analysis", NLP handbook, 2009.
[22]. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using
machine learning techniques,” in Proceedings of the Conference on Empirical Methods
in Natural Language Processing (EMNLP), 2002.
[23]. Alistair Kennedy and Diana Inkpen, "Sentiment Classification of Movie and Product
Reviews Using Contextual Valence Shifters", Computational Intelligence, Volume 22,
Number 2, 2006.
[24]. Narayanan, Ramanathan, Liu, Bing & Choudhary, Alok, “Sentiment Analysis of
Conditional Sentences”, Proceedings of the Conference on Empirical Methods in Natural
Language Processing, Singapore, 2009.
[25]. Michael C. Daconta, Leo J. Obrst, Kevin T. Smith, “The Semantic Web: A Guide to the
Future of XML, Web Services, and Knowledge Management”, 2003.
[26]. Xiwen Cheng, Feiyu Xu,"Fine-grained Opinion Topic and Polarity Identification",
European Language Resources Association (ELRA).
[27]. Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, Daniel M. Ogilvie, and
associates. The General Inquirer: A Computer Approach to Content Analysis. The MIT
Press, 1966.
[28]. S. I. Hayakawa, editor. Choose the Right Word. Second Edition, revised by Eugene
Ehrlich. HarperCollins Publishers, 1994.
[29]. S. R. Das and M. Y. Chen, “Yahoo! for Amazon: Sentiment extraction from small talk
on the Web,” Management Science, 2007.

[30]. J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, “Sentiment analyzer: Extracting
sentiments about a given topic using natural language processing techniques,”
Proceedings of the IEEE International Conference on Data Mining (ICDM), 2003.
[31]. Whissell. The dictionary of affect in language. Emotion: Theory, Research, and
Experience, Plutchik & Kellerman Eds. Academic Press, 1989.
[32]. C. Fellbaum, ed., Wordnet: An Electronic Lexical Database. MIT Press, 1998.
[33]. Sandeep Balijepalli,"Blogvox2: A Modular Domain Independent Sentiment Analysis
System", Master’s thesis of computer science, University of Maryland, 2007.
[34]. Esuli, A., Sebastiani, F.: Sentiwordnet: A publicly available lexical resource for
opinion mining. In: Proceedings of 5th Conference on Language Resources and
Evaluation, LREC 2006.
[35]. N. Jindal and B. Liu, “Opinion spam and analysis,” Proceedings of the Conference on
Web Search and Web Data Mining (WSDM), Palo Alto, California, USA, 2008.
[36]. L. Zhuang, F. Jing, X.-Y. Zhu, and L. Zhang, “Movie review mining and
summarization,” in Proceedings of the ACM SIGIR Conference on Information and
Knowledge Management (CIKM), 2006.
[37]. Esuli, A., Sebastiani, F.: Sentiwordnet: A publicly available lexical resource for opinion
mining. In Proceedings of 5th Conference on Language Resources and Evaluation,
LREC 2006.
[38]. The Internet Movie Database, (http://www.imdb.com), an online database of
information related to movies, October 17, 1990.
[39]. Dong Zhendong and Dong Qiang. HowNet and the Computation of Meaning. World
Scientific Publishing Co. Pte. Ltd., 2006.
[40]. Tessema Mindaye, “Design and Implementation of Amharic search engine”, Master's
thesis, Addis Ababa University, July 2007.
[41]. SelamSoft Inc., “http://www.AmharicDictionary.com”, SelamSoft Amharic-English
Dictionary Basic 1.0, 2009.
[42]. Soo-Min Kim and Eduard Hovy. Determining the sentiment of opinions. In
Proceedings of the International Conference on Computational Linguistics (COLING),
2004.
[43]. Pang, Bo and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment
categorization with respect to rating scales. In Proceedings of the ACL, 2005.

[44]. Wilson, Theresa, Janyce Wiebe, and Rebecca Hwa. Just how mad are you? Finding
strong and weak opinion clauses. In Proceedings of AAAI, 2004.
[45]. Das, S. & Chen, M., “Yahoo! for Amazon: Extracting market sentiment from stock
message boards”, paper presented at the 8th Asia Pacific Finance Association Annual
Conference, Bangkok, Thailand, 2001.
[46]. University of Pittsburgh, MPQA Releases - Corpus and Opinion Recognition System,
http://www.cs.pitt.edu/mpqa/, last accessed on June 2010.
[47]. Theresa Wilson, Janyce Wiebe and Paul Hoffmann. Recognizing Contextual Polarity in
Phrase-Level Sentiment Analysis. Proceedings of HLT/EMNLP 2005.
[48]. Ethiopian Languages Research Center, Addis Ababa University, “ኣማርኛ መዝገበ ቃላት”
(Amharic Dictionary), Addis Ababa University Press, 2001.
[49]. python, Python Programming Language, “http://www.python.org/doc/essays/blu, last
accessed on June 29, 2010.
[50]. Nitin Madnani, “Getting Started on Natural Language Processing with Python”,
published in ACM Crossroads, 2007.
[51]. Fred L. Drake, Jr., Python Tutorial Release 2.3.3, Python Software Foundation,
December 19, 2003.
[52]. Reporter, a local bi-weekly newspaper, http://www.ethiopianreporter.com, last accessed
January 2010.
[53]. P.D. Turney. Thumbs up or thumbs down? Semantic orientation applied to
unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the
Association for Computational Linguistics (ACL), 2002.
[54]. Broder, A. Z. On the resemblance and containment of documents. In Proceedings of
Compression and Complexity of Sequences, IEEE Computer Society, 1997.
[55]. Reporter, local bi-weekly Amharic newspaper, Media and Communication Center,
Addis Ababa, Ethiopia, October 10, 2010.

Annexes
Appendix A: Sample subjectivity lexicon of OpinionFinder
type=weaksubj len=1 word1=good pos1=anypos stemmed1=n priorpolarity=positive
type=weaksubj len=1 word1=goodly pos1=adverb stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=goodness pos1=noun stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=goodwill pos1=adj stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=goodwill pos1=noun stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=goof pos1=noun stemmed1=n priorpolarity=negative
type=strongsubj len=1 word1=goof pos1=verb stemmed1=y priorpolarity=negative
type=strongsubj len=1 word1=gorgeous pos1=adj stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=gorgeously pos1=anypos stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=gossip pos1=adj stemmed1=n priorpolarity=negative
type=strongsubj len=1 word1=gossip pos1=noun stemmed1=n priorpolarity=negative
type=strongsubj len=1 word1=gossip pos1=verb stemmed1=y priorpolarity=negative
type=strongsubj len=1 word1=grace pos1=noun stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=graceful pos1=adj stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=gracefully pos1=adverb stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=graceless pos1=adj stemmed1=n priorpolarity=negative
type=strongsubj len=1 word1=gracelessly pos1=adverb stemmed1=n priorpolarity=negative
type=strongsubj len=1 word1=gracious pos1=adj stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=graciously pos1=adverb stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=graciousness pos1=noun stemmed1=n priorpolarity=positive
type=weaksubj len=1 word1=graft pos1=noun stemmed1=n priorpolarity=negative
type=weaksubj len=1 word1=grail pos1=adj stemmed1=n priorpolarity=positive
type=weaksubj len=1 word1=grail pos1=noun stemmed1=n priorpolarity=positive
type=weaksubj len=1 word1=grand pos1=adj stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=grandeur pos1=noun stemmed1=n priorpolarity=positive
type=strongsubj len=1 word1=grandiose pos1=adj stemmed1=n priorpolarity=negative
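
The entries above follow a regular layout, so they can be read programmatically. The following is a minimal sketch in Python, assuming each entry is a single line of whitespace-separated key=value fields exactly as printed in this appendix. The function names parse_lexicon_line and load_prior_polarity are illustrative only and are not part of OpinionFinder.

```python
def parse_lexicon_line(line):
    """Split one 'key=value key=value ...' entry into a dict of strings."""
    entry = {}
    for field in line.split():
        key, _, value = field.partition("=")
        entry[key] = value
    return entry


def load_prior_polarity(lines):
    """Map (word, part of speech) to (subjectivity type, prior polarity)."""
    lexicon = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        e = parse_lexicon_line(line)
        lexicon[(e["word1"], e["pos1"])] = (e["type"], e["priorpolarity"])
    return lexicon


# Two entries copied from the sample above.
sample = [
    "type=weaksubj len=1 word1=good pos1=anypos stemmed1=n priorpolarity=positive",
    "type=strongsubj len=1 word1=goof pos1=verb stemmed1=y priorpolarity=negative",
]
lexicon = load_prior_polarity(sample)
print(lexicon[("good", "anypos")])  # ('weaksubj', 'positive')
```

Keying on the pair (word, part of speech) keeps entries such as the noun and verb readings of "goof" separate, which matches the way the sample lists the same word more than once.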

Appendix B: Sample representation of sentiment terms in our dictionary

Gen_ictionary={"ሰላም":"+","ምላሽ":"-","ተአምረኛ":"+","ጉዳት":"-","የሚያወላውል":"-","ዱብዳ":"-",
"ባለፀጋ":"+","ትምክህተኛ":"-","ቁጥብ":"+","ጎሽ":"+","እድል":"+","ተመፃዳቂ":"-","አርበኛ":"+","ልክ":"+","ፍጭት":"-",
"ጠቢብ":"+","ያልሰመረ":"-","አቤቱታ":"-","ፍቃድ":"+","ውሸት":"-","ግርማ":"+","ዘለቄታ":"+","አውደልዳይ":"-",
"ርህራሄ":"+","አሜን":"+","ተብታባ":"-","ድካም":"-","ቅፅበታዊ":"-","አዝናኝ":"+","አስቸጋሪ":"-","ተመፃዳቂ":"-",
"ትንግርት":"+","ከሀድ":"-","እርካሽ":"-","ርካሽ":"-","ቂላቂል":"-","ቅናታም":"-","ሸባ":"-","ፅዱ":"+","እብድ":"-","ቀማኛ":"-",
"ቁጣ":"-","ጥገኛ":"-","አስጨናቂ":"-","ተሀድሶ":"+","ቁርጠት":"-","ማሸንክ":"-","ኋላቀርነት":"-","የማይለወጥ":"-",
"ጠቃሚ":"+","ሀሰት":"-","ደህና":"+","ፈጣጣ":"-",
"ቸር":"+","ምስጋና":"+","ሰብአዊ":"+","ማአረግ":"+","ስምምነት":"+","መደብደብያ":"-","ሸጋ":"+","ልቅ":"-","በቀል":"-",
"ሀዘን":"-","ፍቅር":"+","የተሳሳተ":"-","በደል":"-","ወዳጅ":"+","አስተዋይ":"+","ከባድ":"-","ድንግልና":"+","ፈሪ":"-",
"ድንግል":"+","ቋጣሪ":"-","ረብሻ":"-","ሞኝነት":"-","ይሉኝታ":"-","ፍቃደኛ":"+","የሞተ":"-","በደለኛ":"-",
"ምሁር":"+","እርቃን":"-","ከይሲ":"-","ቸርነት":"+","አህዛብ":"+","መተዛዘኛ":"+","ተስፋ":"+","ምክር":"+","እንከን":"-",
"ነጋሲ":"+","ኮስታራ":"-","ምሳሌ":"+","ተጨባጭ":"+","አስደናቂ":"+","እድፍ":"-","አበሳ":"-","የተወደደ":"+","የሚያመነታ":"-",
"መረዳዳት":"+","ያልተረጋገጠ":"-","ራእይ":"+","ዝምተኛ":"+","ግልፅ":"+","ፅንፈኛ":"-","አፈንጋጭ":"-","ኩራተኛ":"-",
"ደብዛዛ":"-","እክል":"-","መቅጫ":"-","መረጋጋት":"+","ጥዱ":"+","ጥል":"-","አላስፈላጊ":"-","ቀናተኛ":"-","ፈሳም":"-","ቦዘኔ":"-",
"ሸባ":"-","ጭቁን":"-","አረንቋ":"-","የተስተካከለ":"+","ጀግንነት":"+","ቆራጥ":"+","አዳሚ":"-","መጮህ":"-","ወረርሽኝ":"-",
"ትችት":"-","ውድ":"+","ተፈላጊ":"+","ማጭበርበር":"-","ገነት":"+","ከፍተኛ":"+","ከዳተኛ":"-","ንደት":"-","ሽባ":"-","ጋጋታ":"-",
"አፀያፊ":"-","አመርቂ":"+","መቃብር":"-","አድናቆት":"+","ወልጋዳ":"-","አእምሮ":"+","ሞልጣፋ":"-","ግጭት":"-","ጥለኛ":"-",
"ቃል":"+","ዘመናዊ":"+","ሰላማዊ":"+","አሰልቺ":"-","ውስብስብ":"-","ታማኝ":"+","የማይበገር":"+","ዝሙት":"-","ምቅኝነት":"-",
"ደነዝ":"-","ደማቅ":"+","ምቀኛ":"-","ጭንቀት":"-","እርቅ":"+","እድገት":"+","ጩኸት":"-","ሰይጣን":"-",
"አስተማማኝ":"+","ዘልዛላ":"-","ጥቅም":"+","ባርነት":"-","ውሸታም":"-","አክራሪ":"-","ፈታኝ":"+","ንፁህ":"+","አጭበርባሪ":"-",
"መርዘኛ":"-","አማፂ":"-","ምቾት":"+","ልዩ":"+","ችሎታ":"+","ቁርጠኝነት":"+","መግረፍያ":"-","የሚያሳዝን":"-","ምፀት":"-",
"የሚቃረን":"-","ወርቃማ":"+","ተፅእኖ":"-","ዘገምተኛ":"-","አመል":"-","ደፋር":"+","አፋኝ":"-","አስደማሚ":"+","ሀሰት":"-",
"ወሬ":"-","ተአምራዊ":"+","ወሬኛ":"-","ዋጋ":"+","ጥንብ":"-","ግርማዊ":"+","ዝና":"+","አወንታ":"+","ያለመሳካት":"-",
"ሸክም":"-","ጋኔን":"-","ምሁራዊ":"+","ህብረት":"+","እምቢተኛ":"-","ልምላሜ":"+","የተረጋጋ":"+","ብልሀት":"+","ከፋፋይ":"-",
"ትንሽ":"-","ክህደት":"-","ነፃነት":"-","አሉታዊ":"-","አሸባሪ":"-","ጣፋጭ":"+","ሸረኛ":"-","መላምታዊ":"-","ፌዝ":"-",
"ያልተጠበቀ":"-","ንቃት":"+","በጎ":"+","መርዝ":"-","ችኩል":"-","አድመኛ":"-","ተቃዋሚ":"-","ሌሊት":"-","ሹጣም":"-",
"ትጋት":"+","ያልነቃ":"-","ጥበብ":"+","ከንቱ":"-","ወላዋይ":"-","ታታሪ":"+","እውነት":"+","የላቀ":"+","ቃጠሎ":"-","መካን":"-",
"አሉባልታ":"-","ሽብርተኛ":"-","ጠሳ":"-","ወዘና":"+","ተጠቂ":"-","ሀላፊነት":"+","ዋልጌ":"-","ሚና":"-",
"አብነት":"+","ባልንጀራ":"+","ድርቅ":"-","ስልጡን":"+","ገውጋዋ":"-","ምሩቅ":"+","ምሩቅ":"+","ብድር":"-","እብለት":"-",
"ንፍግ":"-","ቀኝ":"+","ሌባ":"-","ሙደኛ":"+","ቀጥታ":"+","ፈታኝ":"-","ርካሽ":"-","ሙሰኛ":"-","ኩራት":"+","እጦት":"-",
"ሀዋሪያ":"+","ሀርነት":"+","ተጠቃሚ":"+","አመል":"-","ጥላሸት":"-","ጥጋብ":"+","ቡካን":"-","ሰይጣን":"-",
"ትብብር":"+","ትግል":"+","ጉስቁል":"-","ለፍላፊ":"-","ተንኮል":"-","አርአያ":"+","የማይረሳ":"+","ክብር":"+","ኢፍትሀዊነት":"-",
"ጥፋት":"-","ብኩን":"-","ቀልጣፋ":"+","ነሁላላ":"-","ስድ":"-","ጩሀት":"-","አለመተማመን":"-",
"ፅናት":"+","ቅዱስ":"+","የማይታመን":"-","ፀፅታ":"+","ድልዝ":"-","ዘላቂ":"+","ወሸኔ":"+","ደባሪ":"-","ጎስቃላ":"-",
"ሀቅ":"+","ሀብት":"+","ሀይለኛ":"+","ደስተኛ":"+","ነውጠኛ":"-","ነፍናፋ":"-","ፍትሀዊ":"+","ህሊና":"+","ሸርሙጣ":"-",
"ዘረኝነት":"-","ስርቅታ":"-","እሺ":"+","ተድላ":"+","ደህንነት":"+","ቅናት":"-","ምስኪን":"-","አምባገነን":"-",}
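
A dictionary of this shape can be queried directly with the terms found in a review. The following is a minimal sketch in Python, assuming whitespace tokenization and exact matching of surface forms. It only counts positive and negative hits and does not reproduce the weight assignment and polarity propagation steps of the proposed system. The names sample_dictionary, count_polarity and classify are hypothetical, and sample_dictionary holds four entries copied from the dictionary above.

```python
# Four entries copied from the dictionary in this appendix (illustrative subset).
sample_dictionary = {"ሰላም": "+", "ጉዳት": "-", "ውሸት": "-", "አዝናኝ": "+"}


def count_polarity(text, dictionary):
    """Count tokens tagged '+' and '-' in a whitespace-tokenized text."""
    positive = negative = 0
    for token in text.split():
        tag = dictionary.get(token)
        if tag == "+":
            positive += 1
        elif tag == "-":
            negative += 1
    return positive, negative


def classify(text, dictionary):
    """Label a text by simple majority of positive versus negative hits."""
    positive, negative = count_polarity(text, dictionary)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


print(classify("አዝናኝ ፊልም", sample_dictionary))
print(classify("ውሸት እና ጉዳት", sample_dictionary))
```

Because matching is exact, inflected forms and words followed directly by punctuation are missed in this sketch; handling them would require the pre-processing described in the body of the thesis.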

Appendix C: Questionnaire and sample responses

Appendix D: Sample of movie reviews
1. ግሩም የሆነ ፊልም ነው
2. በጣም ጥሩ ነው በተለይ በተለይ ትብብራቸው በተመለከተ በቀጣይነት አብረው እንድሰሩ እመኝላቸዋለሁ
3. ለማሳቅ የሚደረግ ከመጠን ያለፈ ቀልድ ሴቶች ወንዶችን የሚያደርጉት ጥረት መጥፎ ቦታ እንደሚጥቀው የድምፅ ጥራት
ችግርና ተያያዥ ችግሮች
4. ጥሩ ነው ግን copy pest ነው፡፡ ይቅርብን copy pest
5. ፊልሙ በጣም አሪፍ ነው ግን በራሳችን በሀገራችን ታሪክ ቢሰራ ጥሩ ነው copy ነው
6. አሪፍ ነው ግን አይኮርጅ
7. በጣም የሚደነቅ የአገራችንን የፊልም ደረጃ የሚያሳድግ ፊልም ነው፡፡ በርቱ ተበራቱ እንላለን
8. ያምራል
9. በጣም የሚገርም ፊልም ነው
10. ፊልሙን በአገራችን ደረጃ ካሉት ፊልሞች ምርጡ እና እጅግ በጣም ማራኪ መሆኑን መገለፅ እወዳለሁ፡፡ Thank you
11. ፊልሙ ጥሩ ትእይንት ቢኖረውም ውጥረቱ አናሳ ከመሆኑ የተነሳ ውበቱ ላይ የሚያህል ውበት ቀኑሶበታል የግል
አስተያየት በተለይ ሴቷ መሪ ተዋናይ
12. እጅግ በጣም ምርጥ እና ተመልካችን ይዞ የሚገዝ ሲሆን ነገር ግን በጥንቃቄና በማስተዋል የሚታይ ፊልም ነው
13. እስካሁን ካየሀቸው ፊልሞች በጣም የሚያስጠላ አላማ የሌለው? ገንዘብ ለመሰብሰብ ብቻ የተደረገ ስራ ፈልም፡፡
ማንኛውም ሰው ማየት የሌለበት ፊልም ቢኖር ነው
14. What is the objective of the film
15. ብሬን ብትምልሱልኝ ደስ ይደኝ ለቃጠልኩት ሰአት በጣም አዝናለሁ
16. ደግማችሁ ለማገም ለሰው ልጅ በሙሉ ባታሳዮት
17. መጀመሪያ የፊልሙ objective አልገባኝም፣ በአጠቃላይ ብዙ አልተመቸኝም
18. እሽ በጣም ቆንጆ አይደለም ብዙ የተሰራበት ነገር ከመምረጡም በላይ አንድ ቤተሰብ ላይ ብቻ ገንዘብ አፍቃሪ
መሆናቸውን አገነነው ለነገሩ ፊልሞቻችን አንደ አካልን ላይ ማተኮር ይወዳሉ….
19. በጣም የሚያስጠላ ፊልም ነው ደደብ ደራሲና ኘሮድውሰር የስራውና ያየሁት ቢቻል ከአሁን በሀላ ለህዝቡ አታሳዩ
አንዳንደ ከገንዘብ ውጭ ማማሰብ ጀምሮ ለፊልሙ ለተሳተፋ
20. ከስርየት ቀጥሎ mid ውስጥ የቀረ……
21. ፊልሙ በጣም ወድጅዋለሁ በይበልጥ የፊልሙ ታሪክና ትወናው ግን የተወሰነ sod truck እና የፊልሙ ቀረፃ የተሻለ
ቢሆን ፊልሙ በጣም ቆንጆ ነው
22. ብዙም እንደጠበኩት አላገኘሁትም ነገር ግን የተረደሁት ነገር ቢኖር “ ለበጐ ነገር የሚደረጉ መልካም ነገሮች ሁሉ ዋጋ
አይከፈልባቸውም”
23. በጣም ጥሩ ነው

Appendix E: List of validated Amharic sentiment terms and approval letter

ቅናት/- ትግል/+ ድርቅ/- ተቃዋሚ/- ተፅእኖ/-
ምስኪን/- ጉስቁል/- ስልጡን/+ ሌሊት/- ዘገምተኛ/-
አምባገነን/- ለፍላፊ/- ገውጋዋ/- ከፋፋይ/- አመል/-
ነውጠኛ/- ተንኮል/- ምሩቅ/+ ትንሽ/- ደፋር/+
ነፍናፋ/- አርአያ/+ ምሩቅ/+ ክህደት/- አፋኝ/-
ፍትሀዊ/+ የማይረሳ/+ ብድር/- ነፃነት/- አስደማሚ/+
ህሊና/+ ክብር/+ ቃጠሎ/- አሉታዊ/- ሀሰት/-
ብቸኛ/- ኢፍትሀዊነት/- መካን/- አሸባሪ/- መርዘኛ/-
ወለፌንድ/- ጥፋት/- አሉባልታ/- ጣፋጭ/+ አማፂ/-
ትእቢት/- እጦት/- ሽብርተኛ/- ሸረኛ/- ምቾት/+
ቁጠባ/+ ሀዋሪያ/+ ጠሳ/- መላምታዊ/- ልዩ/+
መማቀቅ/- ሀርነት/+ ወዘና/+ ሸክም/- ችሎታ/+
ፀፅታ/+ ተጠቃሚ/+ ተጠቂ/- ጋኔን/- ቁርጠኝነት/+
ድልዝ/- አመል/- ሀላፊነት/+ ምሁራዊ/+ መግረፍያ/-
ዘላቂ/+ ጥላሸት/- ዋልጌ/- ህብረት/+ የሚያሳዝን/-
ወሸኔ/+ ጥጋብ/+ ሹጣም/- እምቢተኛ/- ምፀት/-
ደባሪ/- ሰይጣን/- ትጋት/+ ልምላሜ/+ አስተማማኝ/+
ጎስቃላ/- ትብብር/+ ያልነቃ/- ግሩም/+ ዘልዛላ/-
ሀቅ/+ እብለት/- ጥበብ/+ የተረጋጋ/+ ጥቅም/+
ሀብት/+ ንፍግ/- ከንቱ/- ብልሀት/+ ባርነት/-
ሀይለኛ/+ ቀኝ/+ ወላዋይ/- ወሬ/- ውሸታም/-
ደስተኛ/+ ሌባ/- ታታሪ/+ ተአምራዊ/+ አክራሪ/-
ብኩን/- ሙደኛ/+ እውነት/+ ወሬኛ/- ፈታኝ/+
ቀልጣፋ/+ ቀጥታ/+ የላቀ/+ ዋጋ/+ ንፁህ/+
ነሁላላ/- ፈታኝ/- ፌዝ/- ጥንብ/- አጭበርባሪ/-
ስድ/- ርካሽ/- ያልተጠበቀ/- ግርማዊ/+ ምቅኝነት/-
ጩሀት/- ሙሰኛ/- ንቃት/+ ዝና/+ ደነዝ/-
አለመተማመን/- ኩራት/+ በጎ/+ አወንታ/+ ደማቅ/+
ፅናት/+ ሚና/- መርዝ/- ያለመሳካት/- ምቀኛ/-
ቅዱስ/+ አብነት/+ ችኩል/- የሚቃረን/- ጭንቀት/-
የማይታመን/- ባልንጀራ/+ አድመኛ/- ወርቃማ/+ እርቅ/+

እድገት/+ ጭቁን/- እድፍ/- ፈጣጣ/- ቅፅበታዊ/-
ጩኸት/- አረንቋ/- አበሳ/- ቸር/+ አዝናኝ/+
ሰይጣን/- የተስተካከለ/+ የተወደደ/+ ምስጋና/+ አስቸጋሪ/-
ጥለኛ/- ጀግንነት/+ የሚያመነታ/- ሰብአዊ/+ ተመፃዳቂ/-
ቃል/+ ቆራጥ/+ ምሁር/+ ማአረግ/+ ትንግርት/+
ዘመናዊ/+ አዳሚ/- እርቃን/- ስምምነት/+ ከሀድ/-
ሰላማዊ/+ መጮህ/- ከይሲ/- መደብደብያ/- ጠቢብ/+
አሰልቺ/- ወረርሽኝ/- ቸርነት/+ ሸጋ/+ ያልሰመረ/-
ውስብስብ/- እክል/- አህዛብ/+ ልቅ/- አቤቱታ/-
ታማኝ/+ መቅጫ/- መተዛዘኛ/+ አስጨናቂ/- ፍቃድ/+
የማይበገር/+ መረጋጋት/+ ተስፋ/+ ተሀድሶ/+ ውሸት/-
ዝሙት/- ጥዱ/+ ምክር/+ ቁርጠት/- ግርማ/+
ጋጋታ/- ጥል/- እንከን/- ማሸንክ/- ዘለቄታ/+
አፀያፊ/- አላስፈላጊ/- ፈሪ/- ኋላቀርነት/- አውደልዳይ/-
አመርቂ/+ ቀናተኛ/- ድንግል/+ የማይለወጥ/- ርህራሄ/+
መቃብር/- ፈሳም/- ቋጣሪ/- ጠቃሚ/+ ባለፀጋ/+
አድናቆት/+ ቦዘኔ/- ረብሻ/- ሀሰት/- ትምክህተኛ/-
ወልጋዳ/- መረዳዳት/+ ሞኝነት/- ደህና/+ ቁጥብ/+
አእምሮ/+ ያልተረጋገጠ/- ይሉኝታ/- እርካሽ/- ጎሽ/+
ሞልጣፋ/- ራእይ/+ ፍቃደኛ/+ ርካሽ/- እድል/+
ግጭት/- ዝምተኛ/+ የሞተ/- ቂላቂል/- ተመፃዳቂ/-
ትችት/- ግልፅ/+ በደለኛ/- ቅናታም/- አርበኛ/+
ውድ/+ ፅንፈኛ/- በቀል/- ሸባ/- ልክ/+
ተፈላጊ/+ አፈንጋጭ/- ሀዘን/- ፅዱ/+ ፍጭት/-
ማጭበርበር/- ኩራተኛ/- ፍቅር/+ እብድ/- ኢምንት/-
ገነት/+ ደብዛዛ/- የተሳሳተ/- ቀማኛ/- ፋራ/-
ከፍተኛ/+ ነጋሲ/+ በደል/- ቁጣ/- የሚያስመሰግን/-
ከዳተኛ/- ኮስታራ/- ወዳጅ/+ ጥገኛ/- ሰላም/+
ንደት/- ምሳሌ/+ አስተዋይ/+ አሜን/+ ምላሽ/-
ሽባ/- ተጨባጭ/+ ከባድ/- ተብታባ/- ተአምረኛ/+
ሸባ/- አስደናቂ/+ ድንግልና/+ ድካም/- ጉዳት/-

የሚያወላውል/- ማስወገድ/+ ግፍ/- ብልፅግና/+ ትጉ/+
ዱብዳ/- መራር/- ሲኦል/- ረጋ/+ አንጎል/-
ለምፍ/- አንጠልጣይ/+ አስቀያሚ/- ፉንጋ/- ፈት/-
የሚያወላውል/- ዘግናኝ/- ጭፍጨፋ/- ነቀፋ/- ታዋቂ/+
ጎሰኛ/- ኢፍትሀዊ/- ተመራጭ/+ ስስታም/- አይብ/+
ጠማማ/- ጉልበታም/+ ወንጀለኛ/- መሳካት/+ ለዛ/+
ጣልቃ/- አርነት/- ስደተኛ/- ወሰክ/+ አስቸካይ/-
ክፉ/- ቅልጣን/- ተሰጥኦ/+ አቅል/+ ፀፀት/-
አረመኔ/- አሰቃቂ/- ዘዋሪ/- ህቅታ/- እምነት/+
እርካታ/+ አድላዊ/- ጨቅጫቃ/- ብርቅ/+ ይቅርታ/+
የሚረብሽ/- ኮተት/- ሰለባ/- አንበሳ/+ አስከፊ/-
ጨዋ/+ ተግሳፅ/+ አደገኛ/- የተማረ/+ ህያው/+
ትሁት/+ ነውር/- ስኬታማ/+ ልሙጥ/+ ውሻ/-
ያልተገደበ/- ብልጥ/+ ፍርሀት/- ገናናነት/+ ህፀፅ/-
አመፅ/- መቅሰፍት/- ታጋሽ/+ ስህተት/- ቂመኛ/-
አስጠያፊ/- ጥፋተኛ/- አስቂኝ/+ ማግለል/- አደናቃፊ/-
መአት/- ዘለአለማዊ/+ መቃወም/- ዳተኛ/- ተራ/-
ምርቃት/+ ጭብት/+ ክብረት/+ ሎጋ/+ ጥገኝነት/-
ጀልጋጋ/- ውድመት/- አሳሳች/- ዝነኛ/+ ዋስትና/+
ቁጥብነት/+ ቸር/+ ጨካኝ/- ቆንጆ/+ ባዶ/-
ጭንቅ/- እጥረት/- እድል/+ መጋኛ/- ጠላት/-
ፍሰሀ/+ አጠራጣሪ/- እንቁ/+ ጠብ/- ድምቀት/+
ተቀባይነት/+ ሲሳይ/+ ማስጠንቀቅያ/- የማየሻሻል/- ቱባ/-
ንትርክ/- ትእቢተኛ/- ህዝባዊ/+ ሀያል/+ ደግ/+
የሚያበሳጭ/- ምህረት/+ መልእክት/+ ተከላካይ/+ ሴረኛ/-
ወቀሳ/- ዝቃጭ/- ልብ/+ ብሩክ/+ ቅጥ/+
ሁከት/- ጤና/+ ስመጥር/+ ተጠራጣሪ/- የዋህ/+
አይነተኛ/+ ጥንታዊ/- ጌጣጌጥ/+ ተመጣጣኝ/+ ህጋዊ/+
ሳይሳካ/- ጮሌ/- ልፍያ/- ሽልማት/+ ገደብ/+
ገቢራዊ/+ ቀልቃላ/- ቂል/- መተጋገዝ/+ ደመቀ/+
እቡይ/- ባለጌ/- ግድያ/- የተጋለጠ/- ባርያ/-

ጥርት/+ እውነተኛ/+ ጤናማጣት/- ትህትና/+ ነፃ/+
ወግ/+ መሰረት/+ ማታለል/- ገዳይ/- ምቹ/+
እብድ/- ስቃይ/- የሚያበረታታ/+ ጋደኛ/+ የማይመች/-
አዛኝ/+ እመርታ/+ ጥፊ/- ያልተዛባ/- ሀሰት/-
ጅንን/- ትምክህት/- የማይታለም/- ብልሀተኛ/+ በዘፈቀደ/-
ኋላቀር/- ትጉህ/+ ሀካይ/- ቀውላላ/- ሀጢአት/-
የተዘበራረቀ/- ያበደ/- ፎጋሪ/- ፍዳ/- መጥፎ/-
አነካኪ/- ግፊት/- ፍሰሀ/+ ሙድ/+ ተልካሽ/-
ኋላቀርነት/- ትርምስ/- ነቃ/+ መራራ/- አመኔታ/+
አዛዥ/- ጨላማ/- እርቃን/- እዳ/- አሪፍ/+
ጎበዝ/+ ሀሴት/+ ተጫዋች/+ ትእግስት/+ የሚቃወም/-
ጌጃ/- አስመሳይ/- በሽተኛ/- ስብእና/+ ተናደደ/-
ፈዛዛ/- አዱኛ/+ አሳመረ/+ ፀጋ/+ ህገወጥ/-
ፈገግታ/+ በፍፁም/+ ማንባት/- ምርጥ/+ ዝርክርክ/-
እርግጠኛ/+ ሀሰተኛ/- አስገራሚ/+ ሂስ/- ጉድለት/-
አስፈሪ/- ቅፅበት/- መልካም/+ እከካም/- ፈተና/+
ግርፋት/- ተልእኮ/+ ቀበጥ/- አውዳሚ/- ኮተታም/-
የተወናበደ/- የተጋነነ/- ዘራፊ/- እልቂት/- መልቲ/-
ዋና/+ ማራኪ/+ አታላይ/- ሀብት/+ አቅም/+
ቀውስ/- ሽታ/+ ህግ/+ በጎፍቃድ/+ አድማ/-
አስመሳይ/- ስልጣኔ/+ ባእድ/- መልከመልካም/+ ቀጣፊ/-
ስርዝ/- ኪሳራ/- የሚያስቅ/+ ጥርጣሬ/-
ቅን/+ ዝንጉ/- ደንብ/+ የማይካድ/+ ዘመናዊነት/+
ድንገተኛ/- የተካነ/+ ንብረት/+ ያልተገራ/- ጠንካራ/+
ሀቀኛ/+ ግብ/+ የማይታገስ/- ሻገተ/- ቅጥፈት/-
ቅሬታ/- እስከነአካቴው/- ባለሙያ/+ ማአረግ/+ የማይጎዳ/+
ወሸን/+ የሚበረታታ/+ ስጦታ/+ ወረኛ/- አሸናፊ/+
ድባቅ/- ሽንፈት/- ቸልተኛ/- ጭንቅንቅ/- አዋራጅ/-
መንፈሳዊ/+ ሀራም/- ትጋተኛ/+ ለጋሽ/+ ሰላምተኛ/+
ሀመልማል/+ ወራዳ/- ጠቀሜታ/+ ቆሻሻ/- ችግር/-
ጋጠወጥ/- ትእግስት/+ ጠንቃቃ/+ ስጋት/- ታዛዥ/+

አጥጋቢ/+ ስመጥር/+ እሽሩሩ/+ ድል/+ ስንዱ/+
ቅር/- ረባሽ/- ህፍረት/- አንዛራጭ/- ማስተዋል/+
አዘኔታ/+ ትረባ/- ግም/- ቅልጥፍና/+ መርዘኛ/+
ቻይ/+ ነገረኛ/- የባሰ/- መጫር/- ከውካዋ/-
ቆፎ/- ጤንነት/+ ወሮታ/+ ሀሜት/- ቀላል/+
ጥማት/- መቅዘፍት/- ነዳይ/- ግርፍያ/- ገገማ/-
ወንጀል/- የበለጠ/+ ጭብጥ/+ ዝግጁ/+ ወለብላባ/-
ሸካካ/- ሀሴት/- አጣዳፊ/- በሽታ/- አጥፊ/-
ልማት/+ መዘዝ/- ሞቅ/+ ጠባብ/- ደግነት/+
ዋነኛ/+ የሚያሳፍር/- ሙየያዊ/+ አንገብጋቢ/- ታጋች/+
አለቃ/+ ፍጥጫ/- ፍሬ/+ ሀይማኖተኛ/+ ታጋሽ/+
ሹም/+ ፋና/+ ዘላቂነት/+ ስልት/+ ብልጣብልጥ/-
የሚያስጠላ/- የሚያጎድል/- ፀዳል/+ ተቃራኒ/- ድንክ/-
ሳያመነታ/+ ጎርባጣ/- ንዝንዝ/- ልእልና/+ እምቅ/+
ስሜታዊ/- ነጭናጫ/- ተአምር/+ ቀለጤ/- አማፅያን/-
ከረከረ/- ጀብድ/+ ትክክለኛ/+ ታድያስ/+ እርጉም/-
እሪታ/- ቅልጥ/+ ደመኛ/- እውቅ/+ ፀያፍ/-
እርግጫ/- ድንቅ/+ የማይሰጥ/- አድካሚ/- ልምድ/+
ማለፍያ/+ አደናጋሪ/- ያማረ/+ እርኩስ/- ከሀዲ/-
የማይስማማ/- አግባብ/+ ዘረኛ/- እንቆቅልሻዊ/- ወረተኛ/-
እልሀኛ/- ቅንጣት/- መንድስ/- ጉርምርምታ/- መንዛዛት/-
ብልሹ/- ንዋይ/+ ውጤት/+ ድክመት/- ቅንጦት/+
ድህነት/+ አለቅጥ/- የሚያሰቀይም/- ግራ/- ስድብ/-
ሰናይ/+ ማስገደድ/- ሽበት/- ጉደኛ/- ተነሳሽነት/+
ውርጅብኝ/- አስደንጋጭ/- ፅኑ/+ ተመፅዋች/- ሞኝ/-
ጠቀሜታ/+ ብርሀን/+ በጎሪጥ/- ቀሽት/+ መአት/-
ጥሩ/+ ፈውስ/+ ድቃላ/- እሙን/+ ወሳኝ/+
ኮርማታ/- መግባባት/ ፈንጠዝያ/+ የተረጋገጠ/+ ብስጭት/-
ልል/- ችኮላ/- ህመም/- እጣ/+ ሩሁሩ/+
ሽፍታ/- ኩምትር/- አፍራሽ/- ፍሰሀ/+ ገሀነም/-
ፍርሀት/- መድሀኒት/+ ተፃራሪ/- ልሞሾ/- አለመግባባት/-

ነፍጠኛ/- በቂ/+ ጨፍጫፊ/- ሸሌ/- ሴራ/-
ዝቅተኛ/- ሚዛን/+ ንቁ/+ ጤነኛ/+ እቁብ/+
ጦርነት/- ሞያ/+ ጉረኛ/- ልእልና/+ ብልህ/+
ንጭጭ/- ሞገስ/+ ጥፉ/- ብክለት/- ልግመኛ/-
ቅራኔ/- እምባ/- ቁርጠኛ/+ ተመናመነ/- ብርቱ/+
ፋይዳ/+ መሸወድ/- ከበሬታ/+ አስፈላጊ/+ ውል/+
ግትር/- አልባሌ/- ያልታወቀ/- ሚና/+ ሸፋፋ/-
ማደናቀፍ/- ቅሌት/- እፁብ/+ ንጭንጭ/- ጭቅጭቅ/-
በረከት/+ ገናና/+ ብላሽ/- አሳሳቢ/+ ጉጉ/+
ግብረገብ/+ ዱርየ/- እዳ/- ደስታ/+ መከራ/-
ውጤታማ/+ ጫና/- ተፅእነኖ/- ፍሬአማ/+ ፅድቅ/+
ሞልፋግ/- አላግባብ/- ምስጢር/- ወዳጅነት/+ ቀሽም/-
ደካማ/- ይሁንታ/+ ሚስጢር/- ቀሳፊ/- ቁስል/-
አላማ/+ ሴሰኛ/- ደዌ/- ታላቅ/+ እድለኛ/+
ሀብታም/+ አሳማኝ/- ንፉግ/- ጥረት/+ ጥብቅ/+
ቅጣት/- ሰነፍ/- መደናገር/- ክልክል/- ማስጨነቅ/-
አስተካካይ/+ ስንኩል/- ሀዘን/- ሽብር/- ካልቾ/-
ውዥምብር/- ህክምና/+ አጋዥ/+ ዘንካታ/+ ባለውለታ/+
ውስብስብ/- ጠቃሚነት/+ ለቅሶ/- ጀግና/+ መና/-
ተወዳጅ/+ አመፅ/- ጨለምተኛ/- ማጣት/- ጣእም/+
አዋረደ/- ድሎት/+ ኡኡታ/- እመቤት/+ አሳማኝ/+
ነውጥ/- ውይይት/+ ቂም/- ተአማኒነት/+ ጌታ/+
አንድነት/+ ፍላጎት/+ አወንታዊ/+ ሞልቃቃ/- የማይረባ/-
ተግባቢ/+ ለዘብተኛ/- ደንበኛ/+ አክሳሪ/- ገብጋባ/-
ነዝነዛ/- ገደብ/- እንቅፋት/- ሀያል/+ ብቸኝነት/-
ግፈኛ/- ፈጣን/+ ናፋቂ/- ግድፈት/- ጅል/-
አለመስማማት/- ጥድፍያ/- ውብ/+ ውርደት/- ችስታ/-
ነውጠኛ/- ክስ/- ሎሌ/- ሳፋሪ/- ደንታ/+
ጠባሳ/- የበላይ/+ ትእግስተኛ/+ ጥበበኛ/+ መታዘዝ/+
እኩይ/- ቀዥቃዣ/- ያለአግባብ/- ፀባይ/+ ሙጥኝ/-
ድንጋጤ/- ግሽበት/- ሩህሩህ/+ አሸበረቀ/+ ወከባ/-

ፈተና/- ጥራት/+ ያስጠላል/- ጥሩ/+ ኋላቀር/-
ማይገኝ/- ምርጥ/+ አስተማሪ/+ ችሎታ/+ ይደብራል/-

የማይገኝ/- ደስ/+ ሙስና/- ትምህርት/+ ገንቢ/+

ለጋስ/+ አሪፍ/+ ምጣኔ/+ ጥረት/+


ድሀ/- ወደነዋል/+ ዋጋ/+ አስተዋፅኦ/+
አልማዝ/+ ድክመት/- መሳጭ/+ እርዳታ/+
ጋባዥ/+ አሳዛኝ/- ቢሆንም/< ሐራጅ/-
መሀይምነት/- መንፈሳዊ/+ አስቂኝ/+ ስህተት/-
የደመቀ/+ ያስደንቃል/+ መጥፎ/- በሽታ/-
አይደለም/Negate ጠቃሚ/+ ጉድለት/- ቆንጆ/+
የሚያምር/+ አስደሳች/+ ምዝበራ/-
የማይል/Negate በጎ/+ ታዋቂ/+ ድንቅ/+
ማራኪ/+ ተመችቶኛል/+ የሚደነቅ/+
ቢሆንም/< አዝናኝ/+ ይሻላል/+ ተገቢ/+
ግን/< ኪሳራ/- ጥርጣሬ/- የሚስብ/+
በጣም/> ነፃነት/+ ወድጀዋለሁ/+ በሚገባ/+
የሚደነቅ/+ ዝቅተኛ/- ግሩም/+ በርቱ/+
የሚገርም/+ ብራቮ/+ ቅጂ/- ዋዉ/+
አልተመቸኝም/- አዲስ/+ ብቃት/+ ለቅሶ/-
አስተማሪ/+ ዉበት/+ ጥቅም/+ ያስደስታል/+
ሸርሙጣ/- አስገራሚ/+ ቅር/- ጅል/-
ዘረኝነት/- መልካም/+ ይመሰገናል/+ የተሻለ/+
ስርቅታ/- ደደብ/- እውን/+ ጠንካራነት/+
እሺ/+ የሚያስደስት/+ ኩረጃ/- አስደናቂ/+
ተድላ/+ የሚያዝናና/+ የሚያስጠላ/- ማነስ/-
ደህንነት/+ የራቀ/- እድገት/+ አክብሮት/+
አርአያ/+ ፈጣሪ/+ የተዋጣለት/+ ገለልተኛ/+
ተስፋ/+ መርዝ/- ስህተቶች/- አለመሆኑ/Negate

ትልቅ/+ እውነት/+ ችግር/-


ውስን/-
እስር/- ፍቅር/+ ታማኝ/+
ሙጭጭ/-
እዉነት/+ ተበዳሪ/- መልእክት/+
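
The term/tag notation used in this list can be split mechanically. The following is a minimal sketch in Python, assuming each row holds whitespace-separated items of the form term/tag. Besides + and -, the list carries the tags <, > and a negation tag. Reading > as an intensifier and < as a diminishing or contrastive marker is an assumption inferred from the tagged words (for example በጣም and ግን), not a definition given in this appendix. The function names parse_entries and group_by_tag are illustrative.

```python
def parse_entries(row):
    """Return (term, tag) pairs from one row; items lacking a term or tag are skipped."""
    entries = []
    for item in row.split():
        term, sep, tag = item.rpartition("/")
        if sep and term and tag:
            entries.append((term, tag))
    return entries


def group_by_tag(rows):
    """Collect the terms of all rows under their tag, in order of appearance."""
    groups = {}
    for row in rows:
        for term, tag in parse_entries(row):
            groups.setdefault(tag, []).append(term)
    return groups


# Two rows copied from the list above.
rows = [
    "ግን/< ኪሳራ/- ጥርጣሬ/- የሚስብ/+",
    "በጣም/> ነፃነት/+ ወድጀዋለሁ/+ በሚገባ/+",
]
groups = group_by_tag(rows)
print(sorted(groups))  # ['+', '-', '<', '>']
```

Splitting on the last slash keeps the sketch tolerant of items whose tag was lost in printing: such items are dropped rather than assigned a guessed polarity.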

Declaration

I, the undersigned, declare that this thesis is my original work, that it has not been presented
for a degree in any other university, and that all sources of material used for the thesis have
been acknowledged.

_________________________________________________

SELAMA GEBREMESKEL

This thesis has been submitted for examination with my approval as an advisor.

_______________________________________________________________

SOLOMON ATNAFU (Ph. D.)

Addis Ababa, Ethiopia


October, 2010

Authorization

I authorize the Department of Computer Science, Addis Ababa University to lend this thesis to
other institutions or individuals for the purpose of scholarly research.

I further authorize the Department of Computer Science, Addis Ababa University to reproduce
the thesis by photocopy or by other means, in total or in part at the request of other institutions or
individuals for the purpose of scholarly research.
