Gebreyes Gebeyehu Coffee Thesis 2021
COLLEGE OF COMPUTING
DEPARTMENT OF INFORMATION SYSTEMS
IMAGE BASED COFFEE BEAN CLASSIFICATION
USING DEEP LEARNING TECHNIQUE.
By
GEBREYES GEBEYEHU GELETA
Name and signature of members of the examining board
Title Name Signature Date
Advisor Michael Melese (PhD) --------------- ----------------
Chair Person -------------------------------- ---------------- ----------------
External Examiner -------------------------------- ---------------- ----------------
Internal Examiner --------------------------------- ---------------- ----------------
This is to declare that the thesis prepared by Gebreyes Gebeyehu, titled: Image Based Coffee
Bean Classification Using Deep Learning Technique, and submitted in partial fulfillment of the
requirements for the Degree of Master of Science in Information Systems, complies with the
regulations of the University and meets the accepted standards with respect to originality and
quality.
_______________________________
GEBREYES GEBEYEHU GELETA
JUNE 2021
This thesis has been submitted for examination with our approval as university advisor.
_________________________________
Advisor: Michael Melese (PhD)
JUNE 2021
ACKNOWLEDGEMENT
As usual, I am not sure what I did to deserve your blessing. I do not believe we should spend our
lives praying for things, since God already knows what we deserve. But I believe I owe it to the
almighty GOD to thank Him for what He has bestowed upon me. I would like to express my
gratitude to you, Lord, for life and everything that comes with it. Thank you for the day, the hour,
and the minute that you have given us. I would like to express my deepest gratitude to my advisor,
Dr. Michael Melese, for his boundless support, the pleasant working atmosphere, and his helpful
counsel, and particularly for his sociability. Whenever I had a question, he was always available.
He continually made it possible for me to be productive and guided me in the proper direction.
Thank you for everything, and I wish you a happy life. I must also express my deepest gratitude
to the experts who were involved in this research, Mr. Silesh and the other Buna board employees.
Finally, I want to thank my parents for providing me with unfailing support and continuous
encouragement throughout my years of study, and my lovely postgraduate classmates.
__________________________________________
GEBREYES GEBEYEHU GELETA
JUNE 2021
ABSTRACT
Coffee is one of the most popular mild beverages in the world. Many people take a cup or two of
coffee every morning as a stimulant, or drink it after lunch; in total, over 2.25 billion cups are
consumed every day. Economically, coffee is the second most exported commodity after oil, and
it employs more than 100 million people around the world. Coffee Arabica has grown for
thousands of years in Ethiopia, in the forests of the southwestern highlands.
The classification and grading of coffee at the Ethiopian coffee quality inspection and certification
center, or Coffee board, is manual. This leads to many problems: it is prone to error, inefficient,
labor-intensive, and not cost effective. This research was conducted with the objective of
developing an appropriate computer algorithm that can characterize different varieties of coffee
based on their growing region.
To address this problem, we propose a deep learning approach for classifying coffee beans based
on their growing region. The proposed system has two main components: training the model, and
testing the trained model through a web application developed using Flask. In this study, we
applied different image preprocessing techniques, such as removing noise, normalizing images,
and resizing images. We propose two segmentation algorithms to extract the region of interest
(ROI), both of which achieved excellent accuracy in this study. Two frameworks are proposed:
one trains a deep neural network model from scratch, and the other applies transfer learning to a
pre-trained network model. The best-performing model was obtained by testing different network
layers, optimizers, learning rates, loss functions, numbers of epochs, batch sizes, and steps.
The developed classification model was trained on 3120 images collected from ECQIAC. To
increase the dataset, we applied different augmentation techniques. We split the dataset into
different train/test ratios, namely 90:10, 80:20, and 70:30. The model classifies an input coffee
bean image into Gujji, Jimma, Kaffa, Nekempti, Sidamo, or Yirgacheffe using a softmax classifier,
with 97.8% accuracy. The entire system was evaluated by employees from ECQIAC; the analysis
shows that the system is effective at classifying green coffee beans based on their growing region.
Keywords: Ethiopian coffee, Coffee Bean Classification, Deep Learning, Convolutional Neural
Network (CNN).
Table of Contents
ACKNOWLEDGEMENT ............................................................................................................. i
ABSTRACT ................................................................................................................................... ii
LIST OF FIGURES ..................................................................................................................... vi
LIST OF TABLES ..................................................................................................................... viii
LIST OF ABBREVIATIONS ..................................................................................................... ix
CHAPTER ONE ........................................................................................................................... 1
INTRODUCTION......................................................................................................................... 1
1.1. Background .......................................................................................................................... 1
1.2. Motivation of Research ........................................................................................................ 3
1.3. Problem Statement ............................................................................................................... 4
1.4. Objectives of the study......................................................................................................... 6
1.4.1. General objective .......................................................................................................... 6
1.4.2. Specific Objectives ....................................................................................................... 6
1.5. Significance of the Study ..................................................................................................... 6
1.6. Scope and Limitation of the study ....................................................................................... 7
1.7. Organization of the Thesis ................................................................................................... 7
CHAPTER TWO .......................................................................................................................... 9
LITERATURE REVIEW ............................................................................................................ 9
2.1. Overview .............................................................................................................................. 9
2.2. Coffee in Ethiopia ................................................................................................................ 9
2.2.1. Classifications and Grading of Ethiopian Coffee ......................................................... 9
2.2.2. Coffee Varieties in Ethiopia........................................................................................ 12
2.2.3. Ethiopian Coffee Processing ....................................................................................... 17
2.3. Image Representation......................................................................................................... 19
2.4. Image Processing ............................................................................................................... 22
2.4.1. Image Acquisition ....................................................................................................... 22
2.4.2. Image Pre-processing .................................................................................................. 23
2.4.3. Image segmentation .................................................................................................... 27
2.4.4. Feature Extraction ...................................................................................................... 31
2.5. Machine Learning .............................................................................................................. 33
2.5.1. Classification approaches............................................................................................ 33
2.5.2. Artificial Neural Network ........................................................................................... 34
2.6. Deep Learning Approach. .................................................................................................. 35
2.6.1. Recurrent Neural Network (RNN) .............................................................................. 36
2.6.2. Long Short-Term Memory (LSTM) Neural Network ............................................... 37
2.6.3. Convolutional Neural Network ................................................................................... 38
2.6.4. Examples of CNN Architectures ................................................................................ 48
2.7. Related Works.................................................................................................................... 51
2.7.1. Related to Coffee and Crop Product ........................................................................... 51
2.7.2. Related to convolutional neural network approach .................................................... 55
CHAPTER THREE .................................................................................................................... 59
SYSTEM DESIGN AND ARCHITECTURE .......................................................................... 59
3.1. Overview ........................................................................................................................... 59
3.2. Design Science Research Methodology............................................................................. 59
3.3. The Proposed System Architecture ................................................................................... 62
3. 3.1. Image Acquisition ...................................................................................................... 64
3.3.2. Image Preprocessing .................................................................................................. 64
3.3.3. Image Segmentation.................................................................................................... 67
3.3.4. Data augmentation ...................................................................................................... 69
3.3.5. Normalization ............................................................................................................. 69
3.3.6. Convolutional Neural Network Model Architecture .................................................. 70
3.3.7. Hyperparameters ......................................................................................................... 73
3.3.8. Classification............................................................................................................... 74
3.4. Model Evaluation Metrics.................................................................................................. 74
CHAPTER FOUR ....................................................................................................................... 76
EXPERIMENT AND EVALUATION ...................................................................................... 76
4.1. Introduction ....................................................................................................................... 76
4.2. Description and Preparation of Dataset ............................................................................. 76
4.3. Experimental Design .......................................................................................................... 77
4.4. Simulation Environment .................................................................................................... 77
4.5. Experimental Results ......................................................................................................... 78
4.5.1. Experiments on Training from the scratch.................................................................. 78
4.5.2. Training and testing state-of-the-art CNN architecture .............................................. 84
4.6. Summary of performance comparison ............................................................................... 87
4.7. Comparison of Related Works with this Study.................................................................. 88
4.8. User Acceptance Testing ................................................................................................... 89
4.9. Discussion of the Research Questions ............................................................................... 91
4.10. Web application design and Implementation ................................................................... 92
CHAPTER FIVE ........................................................................................................................ 97
CONCLUSION AND RECOMMENDATION ........................................................................ 97
5.1. Conclusion ......................................................................................................................... 97
5.2. Contribution of this study .................................................................................................. 98
5.3. Recommendation ............................................................................................................... 98
REFERENCES .......................................................................................................................... 100
APPENDIX A: .......................................................................................................................... 106
APPENDIX B: ........................................................................................................................... 107
List of Figures
Figure 2. 1 Moisture Apparatus .................................................................................................... 10
Figure 2. 2 Screen Size Apparatus ................................................................................................ 11
Figure 2. 3 Sidamo coffee bean image.......................................................................................... 13
Figure 2. 4 Yirgacheffe Coffee Bean Image ................................................................................. 14
Figure 2. 5 Jimma Coffee Bean Image ......................................................................................... 14
Figure 2. 6 Kaffa Coffee Bean Image ........................................................................................... 15
Figure 2. 7 Nekempti Coffee Bean Image .................................................................................... 16
Figure 2. 8 Gujji Coffee Bean Image ............................................................................................ 16
Figure 2. 9 Digitization of a continuous image............................................................................. 20
Figure 2. 10 Fundamentals steps of digital image processing ...................................................... 22
Figure 2. 11 White Color Pixel Value Corrupted with Salt & Pepper Noise ............................... 25
Figure 2. 12 classification of image segmentation techniques ..................................................... 28
Figure 2. 13 Watershed segmentation process .............................................................................. 29
Figure 2. 14 Otsu-Thresholding .................................................................................................... 30
Figure 2. 15 Classification of Feature Extraction Method ............................................................ 32
Figure 2. 16 Schematic of a typical Artificial Neural Network (ANN) architecture .................... 35
Figure 2. 17 Venn diagram which describes deep learning .......................................................... 36
Figure 2. 18 RNN structure ........................................................................................................... 37
Figure 2. 19 Long Short-Term Memory neural network .............................................................. 38
Figure 2. 20 Architecture of convolutional neural network .......................................................... 39
Figure 2. 21 convolution operation using same padding and one stride ....................................... 40
Figure 2. 22 Max Pooling ............................................................................................................. 41
Figure 2. 23 ADAM algorithms .................................................................................................... 47
Figure 2. 24 AlexNet Architecture................................................................................................ 49
Figure 2. 25 VGGNet architecture ................................................................................................ 50
Figure 2. 26 Inception module, naïve version............................................................................... 51
Figure 2. 27 Inception module with dimensionality reduction ..................................................... 51
Figure 3. 1 Design Science Research Process Model Adopted From [63] ................................... 60
Figure 3. 2 Design Science Research Framework ........................................................................ 62
Figure 3. 3 Proposed System Process ........................................................................................... 63
Figure 3. 4 Image resizing Algorithm ........................................................................................... 65
Figure 3. 5 Resized Image ............................................................................................................ 65
Figure 3. 6 Algorithm for median filtering ................................................................................... 66
Figure 3. 7 Enhanced Imaged Through Median Filtering............................................................. 66
Figure 3. 8 Image after watershed segmentation was applied ...................................................... 68
Figure 3. 9 Coffee bean Image after Otsu applied ........................................................................ 69
Figure 3. 10 Image normalization algorithm ................................................................................ 70
Figure 3. 11 Proposed CNN classification model ......................................................................... 71
Figure 3. 12 Customized pretrained model ................................................................................... 73
Figure 4. 1 Graph of training and Test accuracy for experiment one ........................................... 78
Figure 4. 2 Training and Test Loss for experiment one ................................................................ 79
Figure 4. 3 Graph of training and Test accuracy for experiment two ........................................... 81
Figure 4. 4 Graph of training and Test Loss for experiment two ................................................. 81
Figure 4. 5 Trained Model Sample Screenshot of experiment three ............................................ 83
Figure 4. 6 Graph of training and Test accuracy for experiment three ......................................... 84
Figure 4. 7 Graph of training and Test loss for experiment three ................................................. 84
Figure 4. 10 sample Output of VGG19 ......................................................................................... 86
Figure 4. 11 Screenshot image of VGG16 classification report ................................................... 87
Figure 4. 13 Command prompt window to launch Flask server and to get IP address ................ 93
Figure 4. 14 Command prompt window to launch Flask server and to get IP address ................ 93
Figure 4. 15 Home page of proposed classification system.......................................................... 94
Figure 4. 16 Uploading coffee bean image from local disk .......................................................... 94
Figure 4. 17 Home page after the coffee bean image uploaded .................................................... 95
Figure 4. 18 Sample Result of classification system .................................................................... 95
Figure 4. 19 uploading sample Gujji image .................................................................................. 96
Figure 4. 20 uploaded Gujji image interface ................................................................................ 96
Figure 4. 21 Classification Result ................................................................................................. 96
List of Tables
Table 2. 1 Coffee Bean Characteristics......................................................................................... 17
Table 2. 2 Class index ................................................................................................................... 26
Table 2. 3 One-Hot encoding ........................................................................................................ 27
Table 2. 4 Descriptions of segmentation Techniques ................................................................... 30
Table 2. 5 Summary of related work ............................................................................................ 57
Table 3. 1 Total number of Image dataset .................................................................................... 64
Table 3. 2 Label Encoding ............................................................................................................ 67
Table 3. 3 One-Hot Encoding ....................................................................................................... 67
Table 3. 4 Hyperparameters Description ...................................................................................... 74
Table 3. 5 Confusion Matrix ......................................................................................................... 75
Table 4. 1 Confusion matrix of experiment one ........................................................................... 79
Table 4. 2 Performance result of experiment one ......................................................................... 79
Table 4. 3 Confusion matrix of experiment two ........................................................................... 80
Table 4. 4 Performance results of experiment two ....................................................................... 80
Table 4. 5 Confusion matrix of experiment three ......................................................................... 82
Table 4. 6 Confusion matrix for binary thresholding test size 0.2 ................................................ 83
Table 4. 7 confusion matrix of VGG19 ........................................................................................ 85
Table 4. 8 classification report of VGG19 .................................................................................... 86
Table 4. 9 Comparison Table of All Experiment .......................................................................... 88
Table 4. 10 Comparison of Related Works with this Study ......................................................... 89
Table 4. 11 User Acceptance Evaluation Criteria and Their Results ........................................... 90
List of Abbreviations
ADAM---------------------Adaptive Moment Estimation
AI---------------------------Artificial Intelligence
ANN-----------------------Artificial Neural Network
CCD-----------------------Charge Coupled Device
CNN-----------------------Convolutional Neural Network
CSS------------------------Cascading Style Sheet
DIP-------------------------Digital Image Processing
DNs------------------------Digital Numbers
DS-------------------------Design Science
DSRM--------------------Design Science Research Methodology
ECQIAC-----------------Ethiopian Coffee Quality Inspection and Auction Center
ECX----------------------Ethiopian Commodity Exchange
FIFO---------------------First in First Out
GDP---------------------Gross Domestic Product
GPU---------------------Graphics Processing Unit
HIS----------------------Hue, Saturation and Intensity
HOG--------------------Histogram of Oriented Gradients
HTML------------------Hypertext Markup Language
HTTP-------------------Hypertext Transfer Protocol
ILSVRC----------------ImageNet Large Scale Visual Recognition Challenge
JPEG--------------------Joint Photographic Experts Group
LBP---------------------Local Binary Pattern
ML----------------------Machine Learning
PDE-------------------- Partial Differential Equations
ReLu-------------------Rectified Linear Unit
SGD--------------------Stochastic Gradient Descent
SURF------------------Speeded Up Robust Features
URL-------------------Uniform Resource Locator
VGG------------------ Visual Geometry Group
WSGI-----------------Web Server Gateway Interface
CHAPTER ONE
INTRODUCTION
1.1. Background
The agricultural sector in Ethiopia represents 45% of the gross domestic product, and about 85%
of the population gains its livelihood directly or indirectly from agricultural production, including
livestock. The importance of agricultural research and its impact on development in Ethiopia can
hardly be overemphasized [1]. Relative to other African countries, agricultural research in
Ethiopia is quite young. Organized agricultural research activities, and actual relations between
agricultural research and development, started with the inception of the Institute of Agricultural
Research in 1966.
Coffee is one of the most popular mild beverages in the world. Many people take a cup or two of
coffee every morning as a stimulant, or drink it after lunch; in total, over 2.25 billion cups are
consumed every day [2]. Economically, coffee is the second most exported commodity after oil,
and it employs more than 100 million people around the world [3].
Many nations' economies depend on coffee production for stabilization and development. Coffee
is now cultivated in over 60 tropical nations around the world and accounts for a significant
portion of many countries' foreign exchange earnings. Ethiopia is the motherland of Coffee
Arabica. It has a wide selection of coffee with diverse origins. Ethiopian coffee is rich in original
taste and aroma because of the country's geography (altitude, soil, temperature, rainfall,
topography, ecology), genotypes, and cultural range. Coffee has grown for thousands of years in
Ethiopia, in the southwestern highland forests. The word coffee derives from Kaffa, the name of
a place in the highlands of southwest Ethiopia where coffee was first found. Ethiopia is also
considered the first African exporter of Coffee Arabica, and it is currently the world's fifth-largest
coffee producer. Exceptional varieties of Coffee Arabica grow within Ethiopia. The criteria used
for coffee classification include bean size, shape, colour, acidity of the aroma, taste, and body [4].
According to the international coffee organization's survey, Ethiopia, the country of origin of the
crop, produces premium quality coffee. After Brazil, Vietnam, Colombia, and Indonesia, it is the
leading producer in Africa and the fifth in the world. Looking at Arabica alone, Ethiopia is the
third-largest producer after Brazil and Colombia. Ethiopia also has the largest highland region
suitable for Coffee Arabica production and therefore has the potential to be a leading producer in
terms of both quality and quantity. Nearly all coffee produced in Ethiopia is shade grown, with
40-60% canopy cover, apart from a few home garden systems in Eastern Ethiopia. The coffee
vegetation also consists mainly of either local varieties (landraces) or plants of wild origin. The
chemical inputs for production are very low, and even non-existent in most cases, while
processing involves both the wet and dry methods.
The dominant technique, however, is the dry (natural) approach, with low environmental impact.
Despite its importance for millions of people worldwide, coffee production is currently limited
by abiotic (humidity, dew and rainfall) and biotic (fungi, bacteria and nematodes) factors in the
center of origin and in other major producing countries [5].
Digital image processing is the use of computer algorithms to process digital images. Image
processing is the examination and manipulation of a digital image, in particular to improve its
quality [6]. Digital image processing techniques can be applied in various fields, such as crop
products and the medical industry [7].
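As a minimal illustration of this idea (a sketch assuming NumPy; the array values are hypothetical, not data from this study), a grayscale digital image is simply a 2-D array of pixel intensities, and a basic processing step such as intensity normalization, one of the preprocessing techniques applied in this thesis, is an element-wise operation on that array:

```python
import numpy as np

# A toy 2x3 grayscale "image" with 8-bit pixel intensities (0-255).
image = np.array([[0, 128, 255],
                  [64, 32, 192]], dtype=np.uint8)

# Normalize intensities to the [0, 1] range, a common preprocessing step
# before feeding images to a neural network.
normalized = image.astype(np.float32) / 255.0

print(normalized.min(), normalized.max())  # → 0.0 1.0
```

Other preprocessing steps used later in this work, such as resizing and median filtering, are likewise algorithms operating on such arrays.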
In this study, the researcher intended to apply image processing to the classification and grading
of Ethiopian coffee beans. A computer vision application using image processing techniques
includes five basic steps: image acquisition, preprocessing, segmentation, feature extraction, and
classification [3]. Artificial Neural Networks (ANNs) are an attempt to mimic the neurons of the
brain. The models used, however, contain many simplifications and thus do not reflect the true
behavior of the brain. The development of ANNs had its first peak in the 1940s, and development
has since gone up and down. A model of a neuron is created by computing the weighted sum of
signals from other connected nodes. The nodes are connected in two main patterns: no loops occur
in feed-forward networks, while loops occur in recurrent networks. There are also multi-layer
feed-forward networks with an initial input stage, hidden layers, and an output layer [8].
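To make this description concrete, the sketch below (a simplified illustration in NumPy, not code from this thesis; all weights and inputs are made-up values) computes a neuron's output as the weighted sum of its inputs passed through an activation function, and chains two such layers into a small multi-layer feed-forward network:

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit activation: max(0, x) element-wise.
    return np.maximum(0, x)

def layer(inputs, weights, bias):
    # Each neuron computes the weighted sum of its inputs plus a bias,
    # then applies the activation function.
    return relu(weights @ inputs + bias)

x = np.array([1.0, 2.0])                  # input stage
W1 = np.array([[0.5, -0.5],
               [1.0, 1.0]])               # hidden-layer weights
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0, 2.0]])               # output-layer weights
b2 = np.array([0.5])

hidden = layer(x, W1, b1)                 # hidden layer: [0.0, 2.0]
output = layer(hidden, W2, b2)            # output layer: [4.5]
print(output)
```

Because the signal only flows forward from input to output with no loops, this is a feed-forward network; adding a connection from a layer back to an earlier one would make it recurrent.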
The main objective of machine learning (ML) is to develop systems that can change their behavior
autonomously based on experience. ML methods use training data to induce general models that
can detect the presence or absence of patterns in new (test) data. In the case of images, training
data may take the form of a set of pixels, regions, or images, which may or may not be labeled.
Patterns can correspond to low-level attributes, such as a label for a group of pixels in a
segmentation task, or to high-level concepts [9].
Deep learning is a technology inspired by the functioning of the human brain. In deep learning,
networks of artificial neurons analyze large datasets to automatically discover underlying patterns
without human intervention; deep learning identifies patterns in unstructured data such as images,
sound, video, and text. Convolutional neural networks (CNNs) have become very popular for
image classification in deep learning; CNNs perform better than human subjects on many image
classification datasets [10].
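The core CNN building blocks discussed in detail in Chapter Two, convolution, a non-linear activation such as ReLU, and max pooling, can be sketched in plain NumPy (an illustrative toy with made-up filter values, not the model developed in this thesis):

```python
import numpy as np

def conv2d(image, kernel):
    # Valid 2-D convolution (technically cross-correlation, as implemented
    # in most deep learning libraries): slide the kernel over the image
    # and take the weighted sum at each position.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    # Non-overlapping max pooling with a size x size window,
    # keeping the strongest response in each window.
    h, w = feature_map.shape
    return feature_map[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[-1.0, 0.0],
                   [0.0, 1.0]])                    # toy 2x2 filter
feature = np.maximum(0, conv2d(image, kernel))     # convolution + ReLU
pooled = max_pool(feature)                         # 2x2 max pooling
print(pooled.shape)                                # → (2, 2)
```

A real CNN such as the one trained in this study stacks many such convolution/activation/pooling stages, learns the kernel values from data, and ends in fully connected layers with a softmax classifier.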
1.3. Problem Statement
Ethiopia is the leading African country in exporting high quality Coffee Arabica abroad. To keep
this rank, and to become the world's leading exporter of Coffee Arabica, it is necessary to maintain
coffee quality. The Ethiopian coffee quality inspection and certification center, or Coffee board,
is responsible for classifying and grading coffee beans before export. To be competitive in the
market, the organization should care specifically about the quality of the product.
The classification and grading of coffee at the Ethiopian coffee quality inspection and certification
center (the Coffee Board) is manual. This leads to many problems: the process is error-prone,
inefficient, labor-intensive and not cost-effective. Moreover, it suffers from a lack of well-educated
personnel, the use of traditional grading techniques and a lack of advanced measuring equipment.
This method employs visual and manual inspection of the major attributes used, including the
appearance, texture, shape, size and color of coffee beans, exposing the quality assessment to
inconsistent results and subjectivity. In addition, the tedious and time-consuming inspection
activity of the human operator for grading this high-value product is very expensive, less efficient
and less effective, generating less descriptive and biased information for quality control and other
innovative improvement activities. The lack of a specialized field of study and qualification at the
country level for sorting, grading and classifying this item is an important drawback that affects
the reliability, efficiency and consistency of the practice. The cost incurred to fill this gap through
training at various scales to produce capable experts is also significant. This becomes a serious
problem when viewed from the perspective of extending and decentralizing the classification and
grading activities to many other regions of the country. For example, the inspectors measure the
size and shape of the beans using an apparatus (Paul Kaack & Co, Hamburg 20): they add 300 g
of coffee beans to the apparatus and shake it until it finishes screening out the beans below screen
size 14 (holes of 14/64 inch). According to the inspectors, this can take more than five minutes for
300 g of coffee beans, which shows how time-consuming and tedious the process is. Previously,
different studies [11] [12] were conducted in this area. However, those studies did not apply deep
learning techniques, mainly convolutional neural networks. Nowadays, CNNs achieve promising
accuracy in image classification and allow researchers to reuse models trained for different
situations through their weights. According to [11], the
researcher only applied manual binary thresholding to obtain the region of interest. This type of
segmentation is simple and easy to apply, but it is not adequate for segmenting images in which
beans overlap, because it is very difficult to determine the intensity value of the overlapped area
of the beans. The classification performance of classical machine learning algorithms is not
promising because they lack many deep hidden layers [12]. Human intervention during feature
extraction is also one of the challenging tasks in classical machine learning. The classification
accuracies achieved by machine learning algorithms in previous studies are 77% [11] and 82.8%
[12]. As a result, replacing the human operator with a consistent, nondestructive, fast, precise and
cost-effective automated system for coffee quality classification and determination is necessary
for such a commercial product, which generates a huge amount of income for the country. An
automated computer-vision classification and grading system eliminates potential human error
and bias in the process. The application of these three main objective inspection techniques,
namely morphological, texture and color analysis, has expanded into many diverse food and
agriculture industries to assist the inspection and grading of various fruits and vegetables in a
non-destructive manner. Their speed and accuracy satisfy ever-increasing production and quality
requirements, thereby promoting the development and expansion of fully automated processes. In
addition, they have been successfully applied in the analysis of grain characteristics and the
evaluation of food crops [11]. Therefore, due to the above-mentioned problems, this study employs
relevant and applicable image analysis, processing and classification techniques and models, with
the aim of achieving the mentioned benefits of technological approaches for the classification and
determination of raw coffee bean quality. The new technologies of image analysis and machine
vision have not been fully explored in the development of automated machines for the agricultural
and food industries. This calls for exploratory research on applying, evaluating and developing
these emerging technologies to improve the quality control and productivity of these sectors.
In general, manual sorting and classification, which is based on traditional visual quality inspection
performed by human operators, are tedious, time-consuming, slow and inconsistent [3].
At the end of this study, this research will answer the following research questions:
Which convolutional neural network architecture is best for the classification of Ethiopian
coffee beans?
Which segmentation algorithm is best for the classification of Ethiopian coffee beans?
To what extent can a deep learning algorithm classify coffee bean images?
It would minimize the processing time and labor cost. This would also improve quality-based
export of coffee beans.
It would enable the Ethiopian coffee quality inspection and certification center to apply the
same standard across all products, and quality control would become easy, because it
provides a platform to conduct classification at one specific, centralized place.
It would reduce decision-making errors that come from the human inspector's physical
condition, such as fatigue and eyesight, mental state caused by biases and work pressure,
and working conditions such as improper lighting, climate, etc.
It would help different merchants to get fair quality control from the organization without
any bias.
It would benefit researchers who are motivated to achieve the goal of developing efficient
digital image processing techniques for different agricultural products and other areas.
The cup test is the other method. To recognize and classify coffee, it uses the human sense of taste
(tongue) and looks into the coffee's chemical properties. Acidity, body, and flavor are the
parameters of the cup test [12]. Based on this, this research uses only green analysis to classify
coffee beans into Sidamo, Jimma, Nekempti, Kaffa, Gujji and Yirgacheffe as class labels. Grading
is not incorporated in this study because, as the employees of the organization stated, grading is
done using the cup test; other coffee varieties are not included because of the lack of a
well-prepared dataset.
study, which introduces the domain area, mainly agriculture and specifically coffee beans. This is
followed by the motivation of the study, the statement of the problem, the general and specific
objectives, the scope and limitations, the significance of this research for the organization,
employees and farmers, and the methodology of the study. In Chapter Two, different literature is
reviewed to understand the area and the techniques used to conduct this study. The first section
deals with background information on coffee, such as its origin, methods of quality control, coffee
processing types, and the physical and chemical characteristics of coffee. This is followed by a
discussion of the background of images, artificial neural networks, machine learning and deep
learning. Next, the basic concepts and theory of CNNs and other relevant image classification
tasks are reviewed, and related work on coffee beans is discussed. Chapter Three discusses the
research design in detail, which follows the design science research methodology, including the
framework and steps of design science. This is followed by the proposed system architecture,
which includes data acquisition, preprocessing of images, segmenting images to obtain the region
of interest, data augmentation and classification for both the training and test phases. Finally,
this chapter discusses the evaluation metrics in detail. Chapter Four presents the experimentation
and evaluation of the developed model: the experiment design, selecting appropriate
hyper-parameters, coding the experiment based on the proposed CNN architecture, training the
model and testing the trained model. At the end of this chapter, we describe the web application
developed to let users of the organization interact with the model. Chapter Five presents the
conclusions and recommendations for those who are motivated to conduct further study in this
area.
CHAPTER TWO
LITERATURE REVIEW
2.1. Overview
This chapter primarily focuses on understanding the theoretical background of coffee in Ethiopia
as well as in the world. In addition, coffee classification and grading are reviewed, and the two
types of coffee quality inspection, the different coffee varieties and their physical and chemical
properties are discussed in detail. An introduction to images and image processing steps, deep
learning techniques, and different state-of-the-art models and their architectures is given. Finally,
different related works on coffee and other agricultural crops and on CNN algorithms are
reviewed and discussed.
The major conventional production systems in Ethiopia are forest and semi-forest coffee (10%),
garden coffee (85%) and plantation coffee (5%) [17]. About 66% of world production comes from
Coffea arabica and 34% from Coffea canephora Pierre ex A. Froehner (the Robusta type) [12].
Ethiopia is the home and the cradle of biodiversity of Arabica coffee. Greater genetic diversity of
Arabica exists in Ethiopia than anywhere else in the world, which has led botanists and scientists
to agree that Ethiopia is the center of origin, diversification and dissemination of the coffee plant.
more. The primary issues of coffee grading are country (region) of origin, physical characteristics
and sensory standards (taste). There is no universal coffee grading system except the recommended
standards. Each producing country has its own national grading system standard [11].
In Ethiopia, there are two main components of coffee quality inspection. They are green analysis
(visual test) and liquor tasting (cup test). These two approaches are universally appropriate
methods adapted to the quality management systems of the respective countries, in both coffee
producing and consuming countries [12].
Green analysis accounts for 40% of a coffee's overall grading, with the remaining 60% determined
by cup test. The green analysis uses additional coffee identification and classification procedures
as well as the human sense of sight (eye). This method checks physical aspects of coffee, such as
shape, size, color, consistency or regularity, and the number of defects in the beans. The parameters
moisture content and bean size are preliminarily assessed in coffee grading. The maximum
moisture content allowed is 11.5 percent. The Ethiopian coffee bean has a screen size restriction
of 14 units, where one unit equals 1/64 of an inch. If these two characteristics are not met, the
coffees are considered of lower grade, and no further grading examination is performed. For the
case of moisture content, it is recommended to reprocess the coffees to minimize the moisture
content [11] [12].
To assess the size of each coffee bean, a sieve-like instrument is used to conduct a screen size
examination as shown in figure 2.2. The analysis is carried out by placing 300g of coffee beans in
the device and shaking them repeatedly. The number of coffee beans that pass through the holes
is then weighed to determine the fraction of the coffee beans that fall within the prescribed screen
size. The size of a coffee bean screen is typically reported to be 14 to 20. The numbers represent
the size of the sieve's holes, which are 1/64 of an inch. For example, screen size 14 indicates that
the hole's diameter is 14/64 of an inch [11].
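The screen-size arithmetic described above can be expressed directly; the helper below is our own illustration (the function name and the metric conversion are our additions, not part of the standard):

```python
def screen_hole_diameter_mm(screen_number):
    """Diameter of the sieve holes for a given screen number.
    Each unit is 1/64 of an inch; 1 inch = 25.4 mm."""
    inches = screen_number / 64.0
    return inches * 25.4

# Screen 14, the minimum size for Ethiopian export coffee, and screen 20:
d14 = screen_hole_diameter_mm(14)   # 14/64 inch
d20 = screen_hole_diameter_mm(20)   # 20/64 inch
```

Screen 14 thus corresponds to holes of about 5.56 mm and screen 20 to about 7.94 mm.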
After the beans have passed the preliminary tests, other parameters related to their processing type
will be evaluated. The parameters shape, odor, and color are used to grade washed coffee. Color,
odor, and shape parameters account for 15%, 10%, and 15%, respectively, of the total weight of
grading by green analysis, which is 40%. Similarly, the parameters of unwashed coffee are defect
count and odor, which contribute 30% and 10% respectively to the total weight of grading by green
analysis, which is 40% [12].
The cup test is the other component of grading. To identify and classify coffee, it is based on the
human sense of taste (tongue). It focuses on the chemical properties of coffee. The cup test
parameters are acidity, body, and flavor. Acidity is a primary coffee taste sensation produced when
acids in coffee combine with sugars to increase the overall sweetness of the coffee. The body of
coffee refers to its texture and sensation in the mouth; for instance, coffee can feel light or heavy.
Flavor refers to the aroma or smell perception of the elements present in roasted coffee. Each of
these parameters accounts for 20% of the total weight of the grading by cup test, which is 60%
[12].
According to [18], Ethiopian coffee is graded using a 40 percent green analysis and a 60 percent
cup test. Based on the overall test value of each parameter for either unwashed or washed coffee,
the final grade is determined using the Ethiopian coffee grading system's national standard. For
example, a grade one coffee has a cumulative value of 81-100 percent from both the visual and
cup tests. The grades are set specific to the growing region.
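For illustration, the weighting scheme described above (40% green analysis and 60% cup test, with the stated sub-weights for washed coffee) can be expressed as a simple weighted sum. The sketch below is our own, and the per-parameter scores are invented only to show the computation:

```python
# Maximum contribution of each parameter to the final grade (in %),
# following the weights stated in the text for washed coffee.
WEIGHTS = {
    "color": 15, "odor": 10, "shape": 15,      # green analysis: 40%
    "acidity": 20, "body": 20, "flavor": 20,   # cup test: 60%
}

def total_grade(scores):
    """scores: fraction of each parameter's weight earned, in [0, 1].
    Returns the cumulative percentage used to assign the final grade."""
    return sum(WEIGHTS[p] * scores[p] for p in WEIGHTS)

# A hypothetical washed coffee earning 90% of every parameter's weight:
value = total_grade({p: 0.9 for p in WEIGHTS})
```

A cumulative value of 90 falls in the 81-100 range, which the text above identifies as grade one.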
According to [19], in practice, grading and classification systems are usually based on some, or
all, of the following criteria. This means that most systems are often very detailed and diverse,
leaving room for misunderstanding and misinterpretation regarding the 'transferability' of certain
descriptions and terminologies between producing countries:
Altitude
Region
Botanical variety
Preparation method (wet vs dry)
Bean (screen) size
Bean shape and color
Number of defect counts
Density
Cup quality
and brightness than their unwashed counterparts. Although some coffees are sold through the
Soddo hub, Hawassa is the main arrival point for Sidama [21].
2.2.2.2. Yirgacheffe
According to [20], Yirgacheffe is a small microregion within the much larger region of Sidama.
Yirgacheffe coffees, on the other hand, are so distinct and well-known that they have their own
category. The quality of Yirgacheffe coffee, though the area is much, much smaller than the other
regions, has enabled it to become as well known as, or even better known than, the big popular
coffee-producing regions of Harrar and Sidama proper. Yirgacheffe is a small town of about
20,000 people,
geographically located somewhat centrally in relation to the other coffee growing areas of Sidama,
between the large towns of Dilla and Agere Maryam.
Many characteristics are shared by the best Yirgacheffe coffees and the best Sidama coffees. Fruit
flavors, a bright acidity, and a silky mouthfeel are some of its distinguishing characteristics.
Yirgacheffe grows both washed and unwashed coffee beans. While it is best known for its washed
coffees, it has recently begun to export some highly sought-after top-quality unwashed coffees as
well.
Yirgacheffe's premium washed coffees are known for their vibrant citrus acidity, mostly lemony
in flavor, and excellent sweetness. Other distinguishing features of the coffee include a light,
herbaceous consistency that complements the fruit flavors well. The nearby large town of Dilla
serves as the distribution center for all Yirgacheffe coffees [21].
2.2.2.3. Kaffa
More than a hundred Ethiopian investors have built estates and farms in the Bonga district, a town
in the Kaffa zone, to grow high-quality Arabica coffee. It has ideal Agroecological conditions for
specialty coffee production. The altitude ranges between 1600 and 1900 meters, the soil is red in
color, and the temperatures are ideal for coffee production. The area is known for its high
precipitation levels and is thus regarded as one of Ethiopia's rainiest regions. The majority of farms,
estates, and cooperatives supply both washed and natural sun-dried coffee to international markets.
Because of its distinct flavor and bean appearance, many cuppers associate Kaffa washed coffee
with coffee from the Borena region, while others compare its flavor to that of neighboring Limu
coffee [21].
2.2.2.4. Nekempti
Nekempti is an area located within the state of Wollega. Usually, this coffee is sold as "Nekempti,"
a coffee trade name for Western Ethiopian coffees traded through the town of Nekempti, although
the coffee actually comes from East Wollega, also known as "Misraq Wollega," which is the Gimbi
woreda. Nekempti is a sun-dried natural bean originally
grown in western Ethiopia. The coffee is distinguished by its large bean size, and the flavor can
have a strong perfume-like aftertaste. Wollega's coffee processing methods have traditionally been
sun-dried natural [21].
Figure 2. 7 Nekempti Coffee Bean Image [15]
2.2.2.5. Gujji
Gujji is another fantastic region. Coffee from Gujji, located in the southern Sidamo region, is
sought after by some of the world's best roasters. Sweet floral notes, such as jasmine with melon
and peach, and a tea-like body can be found in the cup.
Table 2. 1 Coffee Bean Characteristics
Red cherry is picked and freshly sorted before pulping. Over-ripe and under-ripe beans are
handpicked and separated before processing. Fresh red cherries are supplied to the washing
stations. Coffees are pulped and allowed to ferment naturally. The fermented coffee is washed in
clean running water, soaked in clean water, and dried to retain approximately 11.5 percent
moisture. Dried parchment coffee is stored in a field warehouse until it is transported to Addis
Abeba for further processing. The husks are removed from the parchment coffee in a dry
processing warehouse, and the clean beans are packaged in labeled bags (60 kg bags / 132 lbs) for
export [4].
The coffee soaks in the tank for about 12 hours before being transferred to the raised bed, where
it dries to the proper moisture level for about two weeks. After the coffee has been dried to the
appropriate moisture level, it will be handpicked to remove any exposed or damaged coffee. The
dried parchment will be transported to the cooperative warehouse where the coffee will be stored
before being loaded into the Addis Ababa warehouse for dry processing [5].
The outer skin of the coffee cherry is removed immediately after harvesting, usually on the same
day the cherries were picked, in the washed coffee processing. This is done using machines which
"pick" or scrape away just the very outer layer of the cherry, leaving behind the parchment coffee
covered in sticky mucilage. The mucilage-coated beans are then fermented with water in large
tanks made of cement. The process of fermentation breaks down the sugars in the mucilage and
frees it from the parchment. Fermentation usually takes around 24 hours [11], though shorter or
longer fermentation times are possible, depending on the local climate, altitude, and other factors.
Once fermentation is complete, the coffee is removed from the fermentation tank and physically
pushed down lengthy pipes using flowing water. Any leftover mucilage is freed and separated
from the parchment coffee by agitation. The coffee enters another tank at the end of the channels,
where it is rinsed with new water. The result is wet coffee in parchment, free of the sticky mucilage
[18].
From the final washing tank, the wet parchment coffee [22] is taken to dry in the sun on raised
beds. This process of drying happens quickly, because there is no skin or mucilage between the
sun and the parchment. After one or two days in the sun, the coffee is removed from the beds and
stored in sacks in a warehouse. When it is to be exported, the coffee is usually taken to a larger
central mill where the parchment is removed, and the coffee is sorted and bagged for export.
Washed coffee has a clarity of flavor and scent that natural coffees frequently lack. Many cuppers
believe that with washed coffees, the influence of soil and varietals is more immediately evident.
The acidity is more noticeable, and the cup is generally cleaner. The cleanest, highest-quality,
high-altitude washed coffees can have an intensely refreshing flavor; however, the washed process
requires a large amount of water and more infrastructure. The washing procedure is simply not
practical in many places [12].
According to [22], the cherries' skin and sticky juices dry out in the sun. Depending on the
temperature and intensity of the sun, this process can take several days to a few weeks. The coffee
is covered up at night or when it rains. The cherries shrink in size during the drying process and
eventually become hard and completely dry. Following the completion of the process, sacks of
dried cherries are transported to a hulling station for the removal of the outer cherry.
Care must be taken to ensure that the cherries are dried uniformly and that no contaminating
substances come into contact with the cherries, such as direct contact with soil. Inadequate
attention to this information may result in muddy, dirty, or fermented flavors in the cup. Natural
processing has the significant advantage of not requiring any water or elaborate machinery or
facilities. As a result, more naturally processed coffees can be found in drier areas, as well as
poorer or more remote areas [4].
which can be the basis for a region. In a sophisticated image processing system, specific image
processing operations should be applicable to selected regions.
A digital image a[m,n] described in a 2D discrete space is derived from an analog image a(x,y) in
a 2D continuous space through a sampling process that is frequently referred to as digitization.
The 2D continuous image a(x,y) is divided into N rows and M columns. The intersection of a row
and a column is termed a pixel. The value assigned to the integer coordinates [m,n] with
{m=0,1,2,…,M–1} and {n=0,1,2,…,N–1} is a[m,n]. In fact, in most cases a(x,y) – which we might
consider to be the physical signal that impinges on the face of a 2D sensor – is actually a function
of many variables including depth (z), color (λ), and time (t).
An image captured by a sensor or a camera is expressed as a continuous function f(x, y) defined
on continuous variables x and y; the function values give the amplitude at each point (x, y) [3].
To convert such an image to digital form, we must sample it in both coordinates and amplitude.
Sampling is the process of digitizing the coordinate values x and y in order to produce the set of
pixels. Quantization is the process of digitizing the amplitude, that is, converting a continuous
gray level into a discrete quantity [12].
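Sampling and quantization can be illustrated with a short NumPy sketch (our own example): a continuous function f(x, y) is sampled on an M × N grid and the resulting amplitudes are quantized to a small number of discrete gray levels:

```python
import numpy as np

# Sampling: evaluate a continuous image f(x, y) on a discrete M x N grid.
M, N = 8, 8                         # number of columns and rows
x = np.linspace(0.0, 1.0, M)
y = np.linspace(0.0, 1.0, N)
xx, yy = np.meshgrid(x, y)
f = (xx + yy) / 2.0                 # a smooth "analog" image with values in [0, 1]

# Quantization: map the continuous amplitudes to 4 discrete gray levels.
levels = 4
a = np.floor(f * (levels - 1) + 0.5).astype(int)   # a[m, n] in {0, 1, 2, 3}
```

The array `a` plays the role of the digital image a[m, n]: its indices come from sampling and its integer values from quantization.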
Digital Image Processing (DIP) deals with manipulation of digital images through a digital
computer. DIP focuses on developing a computer system that is able to perform processing on an
image. Digital image processing has been employed in a number of areas, such as image
classification, pattern recognition, remote sensing, image sharpening, colour and video
processing, and medicine [7]. A digital image can be considered a discrete representation of data
possessing
both spatial (layout) and intensity (colour) information [24].
An image contains one or more colour channels that define the intensity or colour at a particular
pixel location I(m, n). In the simplest case, each pixel location only contains a single numerical
value representing the signal level at that point in the image. The conversion from this set of
numbers to an actual (displayed) image is achieved through a colour map. A colour map assigns
a specific shade of colour to each numerical level in the image to give a visual representation of
the data. The most common colour map is the greyscale, which assigns all shades of grey from
black (zero) to white (maximum) according to the signal level.
The most commonly used colour channels in an image are the red, green and blue channels and
the grey-scale channel, each defined in a colour space. A colour space can be considered a
mathematical entity: an image is really only a spatially organized set of numbers with a value at
each pixel location. Gray-scale
(intensity) or binary images are 2-D arrays that assign one numerical value to each pixel which is
representative of the intensity at that point. They use a single-channel colour space that is either
binary (1-bit) or intensity (grey-scale). By contrast, red, green and blue
or true-colour images are 3-D arrays that assign three numerical values to each pixel, each value
corresponding to the red, green and blue components respectively [24].
The RGB color space is the most commonly used for digital image representation because it
corresponds to the three primary colors that are mixed for display on a monitor or similar device.
A common misconception is that items perceived as blue, for example, will only appear in the blue
channel, and so on. While items perceived as blue will undoubtedly appear brightest in the blue
channel (i.e. contain more blue light than the other colors), they will also contain milder red and
green components.
A simple transform can be used to convert an RGB color space to a greyscale image. Many image
analysis algorithms begin with grey-scale conversion because it essentially simplifies (i.e. reduces)
the amount of information in the image. Although a greyscale image contains less information than
a color image, the majority of important, feature-related information, such as edges, regions, blobs,
junctions, and so on, is preserved [24].
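As an illustration of such a transform, the sketch below (our own) converts an RGB image to grayscale using the common luminance weights 0.299, 0.587 and 0.114, one of several possible weightings:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an H x W x 3 RGB image to a single-channel grayscale
    image using the common luminance weights 0.299, 0.587, 0.114."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights          # weighted sum over the colour axis

# A 1 x 2 image: one pure-blue pixel and one white pixel.
img = np.array([[[0.0, 0.0, 1.0],
                 [1.0, 1.0, 1.0]]])
gray = rgb_to_gray(img)
```

Note that the pure-blue pixel does not vanish in grayscale; it simply maps to a low intensity (0.114), consistent with the point above that perceived colours contribute to every channel.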
2.4. Image Processing
Digital image processing refers to analyzing digital images by means of a digital computer,
performing operations to improve the images and to extract important information or
representations; it is the extraction of meaningful information from an image, which consists of
elements (pixels) [25].
Image processing and image analysis are at the heart of computer vision, with a plethora of
algorithms and methods available to achieve the necessary classification and grading. According
to this viewpoint, digital image processing is divided into two major tasks. These include
improvement of pictorial information for human interpretation; and the other task is processing of
image data for storage, transmission and representation for autonomous machine perception [3].
A computer-vision application using image processing techniques involves five basic steps: image
acquisition, preprocessing, segmentation, feature extraction, and classification and interpretation
[3] [26].
Image acquisition is the process of retrieving a digital image from a physical source, captured
using sensors or cameras; the quality of the acquired images is affected by different factors [3].
2.4.2.3. Image Restoration
According to [26], image restoration is an area that also deals with improving the appearance of
an image. However, unlike enhancement, which is subjective, image restoration is objective, in the
sense that restoration techniques tend to be based on mathematical or probabilistic models of image
degradation.
Types of Noise
Noise is an unwanted interference that degrades image quality. It causes undesirable effects on the
image such as blurring, artifacts, edge distortion, and so on. Before applying any filter, it is critical
to understand the type of noise in the image [30]. There are various types of noise depending upon
the source and cause that can affect the image.
The standard model of amplifier noise is additive Gaussian noise, independent at each pixel and
independent of the signal intensity, caused primarily by Johnson–Nyquist noise (thermal noise),
including that which comes from the reset noise of capacitors ("kTC noise"). It is a simplified
model of the white noise caused by random fluctuations in the signal. Amplifier noise accounts
for a significant portion of image sensor noise, that is, the constant noise level in dark areas of
the image. In Gaussian noise, each pixel in the image is changed from its original value by a
(usually) small amount
[30]. A normal distribution of noise is shown by a histogram, which is a plot of the amount of
distortion of a pixel value against the frequency with which it occurs. While other distributions are
possible, due to the central limit theorem, which states that the sum of different noises tends to
approach a Gaussian distribution, the Gaussian (normal) distribution is usually a good model. This
type of noise directly affects the grey-scale values of an image, and it is described by the
probability density function (PDF):
P(x) = (1/√(2πσ²)) e^(−(x−m)²/2σ²) ……...........................Equation 2. 1
where x is the intensity, m is the mean value, and σ² is the variance of x.
Salt-and-pepper noise is also called impulse noise or data-drop noise because it causes drops in
the original data values. Consider a 3x3 matrix of an image and assume that the value of an upper
white pixel (intensity 255) is corrupted by salt-and-pepper noise. This noise inserts dead pixels in
the image, and a dead pixel can be either dark or light, as shown in figure 2.11 [29].
Figure 2. 11 White Color Pixel Value Corrupted with Salt & Pepper Noise [27]
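Both noise models described above can be simulated directly; the following sketch (our own, with arbitrary parameters) adds Gaussian noise to every pixel of a flat gray image and turns a random fraction of pixels into dead (salt or pepper) pixels:

```python
import numpy as np

rng = np.random.default_rng(0)
img = np.full((64, 64), 128.0)           # a flat gray test image

# Gaussian noise: every pixel is shifted by a normally distributed amount.
gaussian = img + rng.normal(loc=0.0, scale=10.0, size=img.shape)

# Salt-and-pepper noise: about 5% of pixels become dead pixels (0 or 255).
salted = img.copy()
mask = rng.random(img.shape)
salted[mask < 0.025] = 0.0               # "pepper" (dark dead pixels)
salted[mask > 0.975] = 255.0             # "salt" (light dead pixels)
```

Gaussian noise perturbs every pixel slightly, while salt-and-pepper noise leaves most pixels untouched but drives a few to the extreme values, which is why different filters suit each type.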
Denoising Techniques
We use the filtering process to enhance and denoise the image. There are various types of filters,
such as the Gaussian filter, median filter, morphological filter, Wiener filter, and so on. Gaussian
noise can be removed using the Gaussian denoising algorithm [31]. Salt-and-pepper noise can be
removed by the median filter or by combined median and mean filter algorithms.
Gaussian Filter: Gaussian denoising is used to smooth images effectively. It is often a first step
in noise removal, but it is ineffective at removing salt-and-pepper noise. It is based on the Gaussian
distribution [32].
Median Filter: We use a median filter instead of a Gaussian filter to improve edge estimation. It
is a nonlinear filter that is used to remove salt-and-pepper noise [33].
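To show why the median filter removes salt-and-pepper noise, the minimal 3x3 implementation below (our own sketch; a practical system would use a library routine such as scipy.ndimage.median_filter) replaces each interior pixel by the median of its neighborhood, so extreme dead-pixel values land at the ends of the sorted neighborhood and are never selected:

```python
import numpy as np

def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3x3 neighborhood.
    Border pixels are left unchanged in this simple sketch."""
    out = img.copy().astype(float)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.median(img[i-1:i+2, j-1:j+2])
    return out

# A flat gray patch corrupted by one salt (255) and one pepper (0) pixel.
noisy = np.full((5, 5), 100.0)
noisy[2, 2] = 255.0   # salt
noisy[1, 3] = 0.0     # pepper
restored = median_filter_3x3(noisy)
```

Both dead pixels are restored to the surrounding gray level, while a Gaussian (averaging) filter would only smear them across their neighbors.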
There are two main ways to convert categorical values into numerical values: one-hot encoding
and label encoding. Both of these encoders are part of the scikit-learn library (one of the most
widely used Python libraries) and are used to convert text or categorical data into the numerical
data that models expect and perform better with [34].
Label Encoding
Label encoding is a transformation of string values into numerical values [28]. It is a popular
encoding technique for handling categorical variables. In this technique, each label is assigned a
unique integer based on alphabetical ordering, so each category is assigned a value from 0 to
N − 1 (where N is the number of categories for the feature) [35].
Image File      Class Label     Encoded Label
Img1.jpg        Gujji           0
Img2.jpg        Jimma           1
Img3.jpg        Kaffa           2
Img4.jpg        Nekempti        3
Img5.jpg        Sidamo          4
Img6.jpg        Yirgacheffe     5
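The alphabetical mapping shown above can be reproduced in a few lines of plain Python; scikit-learn's LabelEncoder performs the same sorted-order assignment (the sample list below is our own illustration):

```python
# The six class labels used in this study.
labels = ["Gujji", "Jimma", "Kaffa", "Nekempti", "Sidamo", "Yirgacheffe"]

# Label encoding: assign each class a unique integer in alphabetical
# (sorted) order, exactly as scikit-learn's LabelEncoder does.
encoding = {name: idx for idx, name in enumerate(sorted(set(labels)))}

sample = ["Sidamo", "Gujji", "Yirgacheffe"]
encoded = [encoding[name] for name in sample]
```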
One-Hot Encoding
Once we have label encoding, we might confuse the model into thinking that a column has data
with some sort of order (0 < 1 < 2) or hierarchy when it clearly does not. To avoid this, the column
is one-hot encoded. One-hot encoding takes a column with categorical data that has been label
encoded and splits it into multiple columns; the labels are replaced by 1s and 0s, depending on
which column holds which value [8].
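A minimal sketch of this splitting in plain Python (scikit-learn's OneHotEncoder and Keras' to_categorical produce the same vectors):

```python
# One-hot encoding sketch: replace each integer label with a binary
# vector that is 1 only at the label's position.
def one_hot(label, num_classes):
    vec = [0] * num_classes
    vec[label] = 1
    return vec

# label 2 ("Kaffa" in the table above) among 6 classes
print(one_hot(2, 6))  # [0, 0, 1, 0, 0, 0]
```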
Image segmentation is used to partition an image into meaningful parts having similar features
and properties. Its main aim is simplification, i.e., representing an image in a meaningful and
easily analyzable way, and it is a necessary first step in image analysis. The basic applications of
image segmentation are content-based image retrieval, medical imaging, object detection and
recognition tasks, automatic traffic control systems, video surveillance, and the like. Image
segmentation can be classified into two basic types: local segmentation (concerned with a specific
part or region of the image) and global segmentation (concerned with segmenting the whole
image, consisting of a large number of pixels) [36].
Image Segmentation Methods
1. Compute a segmentation function. This is an image whose dark regions are the objects you are
trying to segment.
2. Compute foreground markers. These are connected blobs of pixels within each of the objects.
3. Compute background markers. These are pixels that are not part of any object.
4. Modify the segmentation function so that it only has minima at the foreground and background
marker locations.
5. Compute the watershed transform of the modified segmentation function.
Figure 2. 13 Watershed segmentation process
Suppose that a gray-level image f can take K possible gray levels 0, 1, 2, . . . , K − 1. Define an
integer threshold T that lies in the gray-scale range 0, 1, 2, . . . , K − 1. The process of thresholding
is a process of simple comparison: each pixel value in f is compared to the threshold T. Based on
this comparison, a binary decision is made that defines the value of the corresponding pixel in an
output binary image g. Then g(x, y), a thresholded version of f(x, y) at some global threshold T,
is given by:
           { 1   if f(x, y) ≥ T
g(x, y) =  {                          …………………………………...Equation 2. 2
           { 0   otherwise
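A minimal NumPy sketch of this per-pixel comparison (Equation 2.2), assuming an 8-bit grayscale image:

```python
import numpy as np

# Global thresholding: g(x, y) = 1 where f(x, y) >= T, else 0.
def threshold(f, T):
    return (f >= T).astype(np.uint8)

f = np.array([[ 12, 200],
              [130,  40]])
print(threshold(f, T=128))
# [[0 1]
#  [1 0]]
```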
Figure 2. 14 Otsu-Thresholding
Algorithm:
Step 1: Compute the histogram for the 2D image.
Step 2: Calculate the foreground and background variances (measures of spread) for a single threshold.
Step 3: Repeat Step 2 for every candidate threshold and compute the within-class variance (the weighted sum of the two variances).
Step 4: Select the threshold with the minimum within-class variance.
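A pure-NumPy sketch of Otsu's procedure, which computes the weighted within-class variance for every candidate threshold and keeps the best one (8-bit gray levels assumed):

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Otsu's method sketch: pick the threshold T that minimizes the
    weighted within-class variance of background and foreground."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    prob = hist / hist.sum()
    best_T, best_var = 0, np.inf
    for T in range(1, levels):
        w0, w1 = prob[:T].sum(), prob[T:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue                               # empty class, skip
        mu0 = (np.arange(T) * prob[:T]).sum() / w0
        mu1 = (np.arange(T, levels) * prob[T:]).sum() / w1
        var0 = ((np.arange(T) - mu0) ** 2 * prob[:T]).sum() / w0
        var1 = ((np.arange(T, levels) - mu1) ** 2 * prob[T:]).sum() / w1
        within = w0 * var0 + w1 * var1             # within-class variance
        if within < best_var:
            best_T, best_var = T, within
    return best_T

# A clearly bimodal toy image: half dark (10), half bright (200).
img = np.concatenate([np.full(50, 10), np.full(50, 200)])
T = otsu_threshold(img)
print(T)
```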
Method              Description                             Advantages                                    Disadvantages
Region Based        based on partitioning the image         more immune to noise; useful when it is       expensive in terms of
Method              into homogeneous regions                easy to define similarity criteria            time and memory
Clustering          based on division into                  fuzzy variants use partial membership and     determining the membership
Method              homogeneous clusters                    are therefore more useful for real problems   function is not easy
Watershed           based on topological                    results are more stable; detected             complex calculation
Method              interpretation                          boundaries are continuous                     of gradients
PDE Based           based on the working of                 fastest method; best for                      more computational
Method              differential equations                  time-critical applications                    complexity
ANN Based           based on the simulation of the          no need to write                              more wastage of
Method              learning process for decision making    complex programs                              time in training
2.4.4. Feature Extraction
Feature extraction is a low-level image processing operation. For an image, a feature is the
"interesting" part of it [27]. In the pattern recognition literature, the term feature is frequently used
to designate a descriptor. Repeatability is the desirable property of a feature detector. After image
segmentation, the next step is to extract the image features useful for describing the target object.
Once the image has been represented, its important features are extracted using various feature
extraction techniques; the similar features together form a feature vector used to identify and
classify an object [25]. Various features can be extracted from an image: color, shape, size, and
texture. There are also local feature detectors and visual descriptors used for object recognition
and classification, such as Speeded Up Robust Features (SURF), Histogram of Oriented Gradients
(HOG) and Local Binary Patterns (LBP) [27].
2.5. Machine Learning
Machine learning (ML) is a subset of artificial intelligence that involves teaching a machine to
make decisions using input data. It is used to build models with self-learning capabilities. These
models are trained automatically, learning repeatedly from experience or past data, and then
providing knowledge to the machine. This technique is widely used for forecasting the future or
categorizing data to assist people in making important decisions [35].
According to [28], the primary goal of machine learning (ML) is to create systems that can change
their behavior on their own based on experience. ML methods employ training data to generate
general models capable of detecting the presence or absence of new (test) data patterns. In the case
of images, training data can be in the form of a set of pixels, regions, or images that can or cannot
be labeled. Machine learning is used in a variety of fields, including hospitals, manufacturing
industries, robotics, computer games, pattern recognition, natural language processing, image
processing and classification, data mining, traffic prediction, product recommendation, marketing,
medical diagnosis, agriculture advisory, e-mail spam filtering, crime prediction through video
surveillance systems, and the like [35].
[43]. The degree of difficulty of the classification task depends on the variability in the feature
values of images from the same category, relative to the difference between feature values of
images from different categories. However, a perfect classification performance is often
impossible. This is mainly due to the presence of noise (in the form of shadows, occlusions,
perspective distortions, etc.), outliers (e.g., images from the category “buildings” might also
contain people, animals, or cars), ambiguity (e.g., the same rectangular shape could
correspond to a table or a building window), the lack of labels, the availability of only small
training samples, and the imbalance of positive/negative coverage in the training data samples.
Thus, designing a classifier to make the best decision is a challenging task [43].
Figure 2. 16 Schematic of a typical Artificial Neural Network (ANN) architecture
As described in [35], deep learning builds multi-layer neural network architectures; the main idea
behind deep learning is to mimic the human brain, i.e., the way humans learn things. Similarly, we
create models that learn using deep learning methods. The most popular and widely used deep
learning methods are the Convolutional Neural Network (CNN), Denoising Autoencoder (DAE),
Deep Belief Networks (DBNs), and Long Short-Term Memory (LSTM).
Figure 2. 17 Venn diagram which describes deep learning [9]
Another widely used and popular algorithm in deep learning, especially in NLP and speech
processing, is RNN [46]. Unlike traditional neural networks, RNN utilizes the sequential
information in the network. This property is essential in many applications where the embedded
structure in the data sequence conveys useful knowledge. For example, to understand a word in a
sentence, it is necessary to know the context. Therefore, an RNN can be seen as short-term memory
units that include the input layer x, hidden (state) layer s, and output layer y.
Figure 2.17 above illustrates a simple RNN with one input unit, one output unit, and one recurrent
hidden unit expanded into a full network, where Xt is the input at time step t and ht is the output
at time step t. During the training process, the RNN uses the backpropagation algorithm, a
prevalent algorithm for calculating gradients and adjusting weight matrices in an ANN; the
weights are adjusted following the feedback process.
One main issue of an RNN is its sensitivity to vanishing and exploding gradients [47]. In other
words, the gradients might decay or explode exponentially due to the multiplication of many small
or large derivatives during training. This sensitivity decreases over time, which means the
network forgets the initial inputs with the entrance of the new ones. Therefore, Long Short-Term
Memory (LSTM) is utilized to handle this issue by providing memory blocks in its recurrent
connections.
four interacting layers with a unique method of communication. The structure of the LSTM neural
network is shown in Figure 2.19.
In figure 2.19, 𝐶, 𝑥, ℎ represent cell, input and output values. Subscript 𝑡 denotes time step value,
i.e., 𝑡 − 1 is from previous LSTM block (or from time 𝑡 − 1) and 𝑡 denotes current block values.
The symbol σ is the sigmoid function and 𝑡𝑎𝑛ℎ is the hyperbolic tangent function. The operator +
is element-wise summation and × is element-wise multiplication [47].
A convolutional neural network (CNN) is a deep, feed-forward artificial neural network in which
the neural network preserves the hierarchical structure by learning internal feature representations
and generalizing the features in the common image problems like object recognition and other
computer vision problems. It is not restricted to images; it also achieves state-of-the-art results in
natural language processing problems and speech recognition [49].
CNN is typically used when practitioners need to extract information from an unstructured data
set (e.g., images). For example, suppose the task is to predict an image caption. The CNN receives
an image of, say, a cat; in computer terms, this image is a collection of pixels arranged in one
channel for a greyscale image or three channels for a color image. During feature learning (i.e., in
the hidden layers), the network identifies unique features such as the cat's tail, ears, and so on.
Once the network has thoroughly learned how to recognize a picture, it can provide a probability
for each class it is familiar with; the label with the highest probability becomes the prediction of
the network [48]. The network consists of three types of layers, namely the convolution layer, the
sub-sampling layer and the output layer [50].
Given a 2D input feature map and a convolution filter of matrix sizes 4x4 and 2x2, respectively, a
convolution layer multiplies the 2x2 filter with a highlighted patch (also 2x2) of the input feature
map and sums up all values to generate one value in the output feature map. Note that the filter
slides along the width and height of the input feature map and this process continues until the filter
can no longer slide further [43].
Let the output feature map size be A_f, the input image size A_i, and the filter (kernel) size A_k.
For a stride of S, the feature map size can be found using the formula A_f = (A_i − A_k)/S + 1;
for the 4x4 input and 2x2 filter above with stride 1, this gives a 3x3 output feature map.
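As an illustrative sketch (pure NumPy, stride 1 assumed, and cross-correlation as commonly implemented in CNN frameworks), the sliding-filter computation on a 4x4 input with a 2x2 filter looks like:

```python
import numpy as np

def conv2d_valid(x, k, stride=1):
    """Slide filter k over input x (no padding), multiplying each
    highlighted patch element-wise and summing to one output value."""
    out_h = (x.shape[0] - k.shape[0]) // stride + 1
    out_w = (x.shape[1] - k.shape[1]) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i*stride:i*stride + k.shape[0],
                      j*stride:j*stride + k.shape[1]]
            out[i, j] = (patch * k).sum()
    return out

x = np.arange(16).reshape(4, 4)   # 4x4 input feature map
k = np.ones((2, 2))               # 2x2 filter
print(conv2d_valid(x, k).shape)   # (3, 3)
```

With padding P, the more general size formula is (A_i − A_k + 2P)/S + 1.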
Pooling is a technique for reducing dimensionality by selecting the most representative feature.
This reduces the number of parameters, which both shortens training time and combats
overfitting. Pooling layers downsample each feature map independently, reducing the height and
width while leaving the depth intact [53]. One limitation of the feature maps output by a
convolutional layer is that they record the precise position of features in the input; even minor
changes in the position of a feature in the input image, such as re-cropping, rotation, or shifting,
result in a different feature map. The most common type of pooling is max pooling, which simply
takes the maximum value in the pooling window. Contrary to the convolution operation, pooling
has no parameters: it slides a window over its input and takes the maximum value in the window.
As with a convolution, we specify the window size and stride [54]. The other type of pooling is
average pooling, which computes the average value within the pooling window.
Input (4x4):        Max pooling with        Output (2x2):
1 0 5 4             2x2 window and          6 8
5 6 7 8             stride 2                3 4
3 2 1 0
1 2 3 4
Figure 2. 22 Max pooling with a 2x2 window and stride 2
Figure 2.22 above shows max pooling with a 2x2 window and stride 2: the values within each
window are compared and the maximum is taken.
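A small NumPy sketch that reproduces the example in Figure 2.22 (2x2 window, stride 2):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling sketch: take the maximum of each size x size
    window, moving the window by `stride` pixels."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i*stride:i*stride + size,
                          j*stride:j*stride + size].max()
    return out

x = np.array([[1, 0, 5, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool(x))
# [[6 8]
#  [3 4]]
```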
2.6.3.4. Activation Function
The idea of an activation function comes from the analysis of how a neuron works in the human
brain. The neuron becomes active beyond a certain threshold, better known as the activation
potential. It also attempts to put the output into a small range in most cases. Sigmoid, hyperbolic
tangent (tanh), ReLU, softmax and LeakyReLU are most popular activation functions [49].
ReLU
f(z) = max(0, z)………………………………………...Equation 2. 4
Softmax
The softmax function of z is a generalization of the sigmoid function which represents a
probability distribution over a discrete variable with n possible values. The softmax function is
generally used to treat the output of each unit as the probability of belonging to each class, and is
described in Eq. (2.5).
f(x_i) = e^(x_i) / Σ_{k=1}^{K} e^(x_k) …………………………………………….. Equation 2. 5
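A small NumPy sketch of the two activations above; the max-subtraction inside softmax is a standard numerical-stability trick, not part of the equation itself:

```python
import numpy as np

def relu(z):
    # Equation 2.4: f(z) = max(0, z)
    return np.maximum(0, z)

def softmax(z):
    # Equation 2.5: exponentiate and normalize so outputs sum to 1;
    # subtracting max(z) first avoids overflow for large inputs.
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([-1.0, 0.0, 2.0])
print(relu(z))             # [0. 0. 2.]
p = softmax(z)
print(round(p.sum(), 6))   # 1.0
```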
2.6.3.5. Hyperparameters
Stride: It specifies how much we move the convolution filter at each step. By default, the value is
1 [54].
Filter size: The most typically used filter size is 3x3, but 5x5 or 7x7 filters are also used
depending on the application [54].
Padding: It is commonly used in CNNs to preserve the size of the feature maps, which would
otherwise shrink at each layer, something that is not desirable. The most common value for the
padding hyperparameter is 'same' (zero padding), which indicates that the input and the output
have the same size [54].
Batch size: The batch size is a hyperparameter that defines the number of samples to work through
before updating the internal model parameters.
Epoch: The number of epochs is a hyperparameter that defines the number of times the learning
algorithm will work through the entire training dataset. One epoch means that each sample in the
training dataset has had an opportunity to update the internal model parameters.
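As a worked example of how these two hyperparameters interact (the dataset size here is hypothetical), each epoch performs ceil(N / batch size) parameter updates:

```python
import math

# Relationship between batch size, epochs, and weight updates:
# each epoch makes ceil(N / batch_size) updates.
num_samples = 1200   # hypothetical training-set size
batch_size = 32
epochs = 10

updates_per_epoch = math.ceil(num_samples / batch_size)
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)  # 38 380
```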
Cross-Entropy Loss
The cross-entropy loss (also termed “log loss” and “soft-max loss”) is defined as follows:
L(p, y) = − Σ_j y_j log(p_j) ………………………………………………Equation 2. 6
where p denotes the vector of predicted class probabilities and y the one-hot ground-truth vector.
For a total of N neurons in the output layer, p, y 𝜖 R^N. The probability of each class can be
computed with the softmax function:
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k) ………………………………………………Equation 2. 7
for j = 1 … K. We can see that the softmax function normalizes a K dimensional vector z of
arbitrary real values into a K dimensional vector σ(z) whose components sum to 1.
Data augmentation
Data augmentation is the easiest, and often a very effective way of enhancing the generalization
power of CNN models. Especially for cases where the number of training examples is relatively
low, data augmentation can enlarge the dataset (by factors of 16x, 32x, 64x, or even more) to allow
a more robust training of large-scale models. Data augmentation is performed by making several
copies from a single image using straightforward operations such as rotations, cropping, flipping,
scaling, translations, and shearing [55].
Horizontal and Vertical Shift Augmentation: A shift to an image is the movement of all pixels
in one direction, such as horizontally or vertically, while keeping the image dimensions constant.
This means that some pixels will be clipped from the image, and a region of the image will require
new pixel values to be specified. The ImageDataGenerator constructor's width shift range and
height shift range arguments control the amount of horizontal and vertical shift, respectively. These
arguments can include a floating point value indicating the percentage (between 0 and 1) of the
image's width or height to shift. Alternately, a number of pixels can be specified to shift the image.
Specifically, a value in the range between no shift and the percentage or pixel value will be sampled
for each image and the shift performed, e.g. [0, value]. Alternately, you can specify a tuple or array
of the min and max range from which the shift will be sampled; for example: [-100, 100] or [-0.5,
0.5] [56].
Horizontal and Vertical Flip Augmentation: In the case of a vertical or horizontal flip, an image
flip means reversing the rows or columns of pixels. A boolean horizontal flip or vertical flip
argument to the ImageDataGenerator class constructor specifies the flip augmentation [56].
Random Brightness Augmentation: The brightness of the image can be increased by randomly
darkening, brightening, or both. The goal is for a model to generalize across images trained under
different lighting conditions. This can be accomplished by passing the brightness range argument
to the ImageDataGenerator constructor, which specifies the min and max range as a float
representing a percentage for determining the amount of brightening. Values less than 1.0 darken
the image, e.g. [0.5, 1.0], whereas values larger than 1.0 brighten the image, e.g. [1.0, 1.5], where
1.0 has no effect on brightness [56].
Random Zoom Augmentation: A zoom augmentation randomly zooms in on an image and either
adds new pixel values around it or interpolates pixel values. The zoom range argument to the
ImageDataGenerator constructor can be used to configure image zooming. The zoom percentage
can be specified as a single float or as a range as an array or tuple.
If a float is specified, then the range for the zoom will be [1-value, 1+value]. For example, if you
specify 0.3, then the range will be [0.7, 1.3], or between 70% (zoom in) and 130% (zoom out).
The zoom amount is uniformly randomly sampled from the zoom region for each dimension
(width, height) separately. The zoom may not feel intuitive. Note that zoom values less than 1.0
will zoom the image in, e.g. [0.5,0.5] makes the object in the image 50% larger or closer, and
values larger than 1.0 will zoom the image out by 50%, e.g. [1.5, 1.5] makes the object in the
image smaller or further away. A zoom of [1.0,1.0] has no effect [56].
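The shift, flip and brightness augmentations above can be sketched in pure NumPy (Keras' ImageDataGenerator provides equivalents via its width_shift_range, horizontal_flip and brightness_range arguments; the tiny 3x3 image here is only for illustration):

```python
import numpy as np

def horizontal_flip(img):
    # reverse the columns of pixels
    return img[:, ::-1]

def shift_right(img, pixels):
    # clipped region on the left gets new (zero) pixel values
    out = np.zeros_like(img)
    out[:, pixels:] = img[:, :img.shape[1] - pixels]
    return out

def brighten(img, factor):
    # factor < 1.0 darkens, factor > 1.0 brightens, 1.0 has no effect
    return np.clip(img * factor, 0, 255).astype(img.dtype)

img = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]])
print(horizontal_flip(img)[0].tolist())  # [30, 20, 10]
print(shift_right(img, 1)[0].tolist())   # [0, 10, 20]
print(brighten(img, 1.5)[0].tolist())    # [15, 30, 45]
```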
Dropout
During network training, each neuron is activated with a fixed probability (usually 0.5 or set using
a validation set). This random sampling of a sub-network within the full-scale network introduces
an ensemble effect during the testing phase, where the full network is used to perform prediction.
Activation dropout works really well for regularization purposes and gives a significant boost in
performance on unseen data in the test phase. Dropout has predominantly been applied to
fully-connected (FC) layers.
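A sketch of inverted dropout, the common variant in which surviving activations are scaled by 1/(1 − p) at training time so the full network needs no rescaling at test time:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted-dropout sketch: at train time, zero each activation
    with probability p and scale survivors by 1/(1 - p)."""
    if not training or p == 0.0:
        return x                      # test time: use the full network
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p   # keep each unit with prob 1 - p
    return x * mask / (1.0 - p)

x = np.ones(8)
y = dropout(x, p=0.5)
# dropped units are exactly 0, survivors are scaled to 2.0
print(set(y.tolist()) <= {0.0, 2.0})  # True
```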
θ_t = θ_{t−1} − η δ_t ………………………………....Equation 2. 8
δ_t = ∇_θ F(θ_{t−1}) ……………………………………….……...Equation 2. 9
Where F(.) denotes the function represented by the neural network with parameters θ, ∇ represents
the gradient, and η denotes the learning rate.
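A minimal sketch of this update rule on the toy objective F(θ) = θ², whose gradient is 2θ:

```python
# Gradient-descent sketch of Equations 2.8-2.9 on F(theta) = theta^2.
eta = 0.1          # learning rate
theta = 5.0        # initial parameter value

for _ in range(100):
    grad = 2.0 * theta            # delta_t = dF/dtheta      (Eq. 2.9)
    theta = theta - eta * grad    # theta_t = theta_{t-1} - eta * delta_t (Eq. 2.8)

print(abs(theta) < 1e-6)  # True: iterates converge to the minimum at 0
```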
Being computationally efficient, ADAM requires less memory and performs well on large datasets.
It requires p_0, q_0 and t to be initialized to 0, where p_0 corresponds to the 1st moment vector,
i.e., the mean, q_0 corresponds to the 2nd moment vector, i.e., the uncentered variance, and t
represents the timestep. Considering ƒ(w) to be the stochastic objective function with parameters
w, the proposed values of the ADAM parameters are as follows:
α = 0.001, m1 = 0.9, m2 = 0.999, ϵ = 10^−8.
Another major advantage discussed in the study of ADAM is that the parameter update is
completely invariant to gradient rescaling, and the algorithm will converge even if the objective
function changes with time. The drawback of this particular technique is that it requires
computation of a second-order derivative, which results in increased cost [58].
AdaDelta
The underlying idea of AdaDelta algorithm is to improve the two main drawbacks of AdaGrad:
The continual decay of learning rates throughout training and the need for a manually selected
global learning rate. To this end, AdaDelta restricts the window of past gradients to be some fixed
size w instead of accumulating the sum of squared gradients over all time. As mentioned in the
previous section, AdaGrad accumulates the squared gradients from each iteration starting at the
beginning of training. This accumulated sum continues to grow during training, effectively
shrinking the learning rate on each dimension. After many iterations, the learning rate becomes
infinitesimally small. With the windowed accumulation AdaGrad becomes a local estimate using
recent gradients instead of accumulating to infinity. Thus, learning continues to make progress
even after many iterations of updates have been done [57].
RMSProp
Another algorithm that modifies AdaGrad is RMSProp. It is proposed to perform better in the
nonconvex setting by changing the gradient accumulation into an exponentially weighted moving
average. AdaGrad shrinks the learning rate according to the entire history of the squared gradient.
Instead, RMSProp uses an exponentially decaying average to discard history from the extreme
past so that it can converge rapidly after finding a convex bowl [57].
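A sketch of the RMSProp update on a toy objective (the learning rate and decay values here are illustrative, not taken from [57]):

```python
import math

def rmsprop_step(theta, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update: keep an exponentially decaying average of
    squared gradients and divide the step by its square root."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    theta = theta - lr * grad / (math.sqrt(cache) + eps)
    return theta, cache

theta, cache = 5.0, 0.0
for _ in range(2000):
    grad = 2.0 * theta                 # gradient of F(theta) = theta^2
    theta, cache = rmsprop_step(theta, grad, cache)
print(abs(theta) < 1.0)  # True: the iterate settles near the minimum
```

Because the exponentially weighted average discards history from the extreme past, the effective step size does not shrink toward zero the way AdaGrad's does.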
2.6.4. Examples of CNN Architectures
Various architectures have been developed and implemented with CNNs. Practically, most CNN
architectures follow the same universal design philosophy: convolutional layers applied to the
input image, pooling layers to extract important features, and finally fully connected layers;
activation functions and dropout layers are applied around the convolutions to obtain the best
architecture, and some architectures use batch normalization to accelerate training of the model
and reduce covariate shift [59]. Brief explanations of those architectures are given below.
[AlexNet layer flow: Input image (224x224x3) → Layer 1: Conv + Pool, filter size 11x11 →
Layer 2: Conv + Pool, filter size 5x5 → … → Layer 6: Fully connected + Dropout →
Layer 7: Fully connected + Dropout → Layer 8: Softmax over the output classes]
Figure 2. 24 AlexNet Architecture
The VGGnet architecture strictly uses 3x3 convolution kernels with intermediate max-pooling
layers for feature extraction and a set of three fully connected layers toward the end for
classification. Each convolution layer is followed by a ReLU layer in the VGGnet architecture.
The design choice of using smaller kernels leads to a relatively reduced number of parameters, and
therefore an efficient training and testing. Similar to AlexNet, it also uses activation dropouts in
the first two fully connected layers to avoid over-fitting [43].
Figure 2. 25 VGGNet architecture
Figure 2. 26 Inception module, naïve version
According to [3], classification and grading of sesame grain has been proposed using image
processing techniques based on the Ethiopian Commodity Exchange criteria. The researcher
aimed to sort and classify sesame grain, based on its growing area, into three corresponding
classes (white Humera, white Wollega and reddish Wollega) and five grades (grade one, grade
two, grade three, grade four and under grade). The researcher took pictures of sample sesame
grains and processed the images to determine the classes and grades. A segmentation technique
is proposed to separate the foreground from the background, partitioning both sesame grains and
foreign particles. The segmentation process also forms the groundwork from which feature
extraction is made. A color structure tensor is applied to come up with better preprocessing,
segmentation and feature extraction. Furthermore, watershed segmentation is applied to separate
connected objects. The delta E standard color difference algorithm, which generates six color
features, is employed for classification of the sesame grain samples. These six color features are
used as inputs for classification, and the system generates three outputs corresponding to the
classes (types) of Ethiopian sesame grain. Grading of the sesame grain samples is performed
using a rule-based approach, where the classifier is fed with four inputs, the morphological (size
and shape) features, and produces five or six outputs, the grades. On top of that, calibration is
introduced to standardize the whole system. Experiments were conducted to gauge the
performance of the proposed system design. The classifier achieved an overall accuracy of 88.2%;
for grading of the sesame grain samples, an accuracy of 93.3% was obtained, much better than
the manual way of grading. However, the human interpretation used for feature extraction from
the sesame grain images is time consuming, the approach is only applicable to sesame
classification and grading, and the study is highly dependent on color.
According to [26], the researchers developed image-based flower disease detection and
identification using an artificial neural network in order to group flowers into the following
classes: rose-aphid, rose-japanese-beetle, rose-rosette-disease, rose-goldenrod-soldier,
rose-mossy-rose-gall-wasp, rose-normal, rose-rust, and rose-soldier-beetles. They divided their
work into two main phases. In the first phase, normal and diseased flower images are used to
create a knowledge domain. During the creation of the knowledge domain, images are
pre-processed and segmented to identify the region of interest. Then, seven different texture
features are extracted from the images using Gabor texture feature extraction. Finally, an
artificial neural network is trained using the seven input features extracted from each image and
eight output vectors that represent the eight different classes of disease, to represent the
knowledge domain. In the second phase, the knowledge domain is used to identify the disease of
a flower. To build the knowledge domain and to check the effectiveness of the developed system,
they used 40 flower images for each of the eight classes of flower disease, for a total of 320
flower images. Of those images, 85% of the dataset is used for training and 15% for testing. The
experimental results demonstrate that the proposed technique is effective for the identification of
flower disease: the developed system can successfully identify the examined flowers with an
accuracy of 83.3%. However, feature extraction is time-consuming, needs to be changed
whenever the problem or the dataset changes, and is expensive since manual feature extraction is
applied.
Asma Redi [11] produced a raw-quality value classification of Ethiopian coffee beans in the case
of the Wollega area. This research work uses various techniques to eliminate noise from the
image. Background subtraction was performed to prevent blur, light reflections, and other noise
that may be created on the background due to lighting effects and certain external artifacts.
Image enhancement and histogram thresholding were used for the extraction of morphological
and color features from the images of the 7 grade levels of Wollega coffee beans. A dataset
combining the morphological and color features into aggregate feature values was used to
develop the classification model. The classification models were built with Naïve Bayes, C4.5
and ANN, yielding performances of 82.72%, 82.09% and 80.25%, respectively. To reinforce the
classification performance, discretization of the raw-quality values of the dataset into three bins
was used. A regression model for the relation between the raw-quality values and the combined
aggregate feature values of the sample coffee beans was designed to support the suitability and
accuracy of the dataset for classification. However, this research work was limited to a
classification model for raw-quality value classification purposes, utilizing a smaller number of
samples from each grade level of the coffee bean sample. In addition, the study is limited to
Wollega-region coffee only and did not consider other varieties of coffee, even though the
physical and chemical characteristics of each variety are different. Also, the researcher applied
machine learning algorithms to classify the coffee beans into the corresponding classes with a
small amount of data. To extract deeper features and to increase the classification performance of
the model, it is better to apply a deep learning approach with a large number of images.
Habtamu Minasie [12] developed automated coffee bean classification from coffee bean images
using machine learning techniques based on morphological and color features, to classify
different types of Ethiopian coffee according to their growing region (Bale, Nekempti, Jimma,
Limu, Sidamo and Welega). The total number of images taken was 309, containing 4844 coffee
beans. For the classification analysis, ten morphological and six color features were extracted
from each coffee bean image. The researcher used Naïve Bayes and Neural Network
classification approaches on each classification parameter: morphology, color and the
combination of the two. The classification was supervised, using the predefined classes of the
growing regions. The researcher also showed that the discrimination power of the morphological
features was better than that of the color features, but that the classification accuracy increased
when both morphology and color features were used together. The best classification accuracies
(80.7%, 72.6%, 56.8%, 96.77%, 95.42% and 69.9% for Bale, Nekempti, Jimma, Limu, Sidamo
and Welega, respectively) were obtained using neural networks when both morphology and color
features were used together; the overall classification accuracy was 77.4%. This study, however,
is limited to six coffee bean classes, and the researcher did not incorporate Yirgacheffe, Kaffa and
Gujji. A deep learning approach is best suited to replace the hand-crafted feature extraction
process of machine learning, which requires human intervention during feature extraction. The
extent to which the classifier classified the coffee beans is not that promising. Deep learning
based approaches have performed exceptionally well in solving complex problems where, as
here, the features of each coffee bean image are relatively similar.
a pre-trained network model. The proposed frameworks are evaluated on the most important
publicly available dermoscopy image database, HAM10000. The first framework achieves
approximately 94.90% sensitivity, 97.19% specificity, and 96.51% accuracy, and the second
framework achieves about 94.33% sensitivity, 96.87% specificity, and 96.51% accuracy. The two
frameworks are compared with the prevailing skin carcinoma classification methods, and the
experimental results show that the proposed frameworks have higher classification accuracy than
the other methods, which indicates that they are effective and feasible. However, this study is
limited to skin lesion classification only, and the dataset used to conduct it was downloaded
directly from the internet, which puts its reliability under question.
Sampada Gulavnai and Rajashri Patil [63] proposed image-based mango plant disease detection
using a deep learning approach. They identified that techniques for detecting mango diseases are
required to promote better control and avoid a crisis. Considering this, the paper describes image
recognition as a cost-effective and scalable disease detection technology, and further describes
new deep learning models which allow straightforward deployment of this technology. The
image dataset, termed the "original mango dataset", comprised 8,853 images taken to conduct the
experiment. The four classes correspond to four types of disease; the image count in the original
dataset for each class is: mango anthracnose disease (1952 images), mango mildew disease (1217
images), red rust (3479 images) and mango golmich (2205 images). The photographs were then
transformed, using data augmentation techniques implemented in Python, into separate images to
create the secondary dataset. Transfer learning is employed to train a deep Convolutional Neural
Network (CNN), achieving 91% recognition accuracy.
Vaibhav Amit Patel and Manjunath V. Joshi [64] aimed to classify rice into five
corresponding groups using a convolutional neural network. The main gap they identified is that
manual classification of rice is neither practical nor economically feasible. In addition,
traders adulterate a particular type of rice with a poor quality type, and the mix may include
broken rice, stones, damaged seeds, etc. They proposed two methods to classify the rice types.
In the first method, they train a deep convolutional neural network (CNN) using the given
segmented rice images. In the second method, they train a combination of a pretrained VGG16
network and the proposed method, using transfer learning in which the weights of a pretrained
network are reused to achieve better accuracy. Their approach can also be used to classify rice
grains as broken or fine. They train a 5-class model for classifying rice types using 4,000
training images and another 2-class model for the classification of broken and normal rice using
1,600 training images. The overall accuracy of the 5-class and 2-class (broken and normal)
models with transfer learning is 94.20% and 99.31%, respectively. The limitation of this study
is that the researchers applied only one pretrained model, VGG16, even though better performing
CNN architectures have appeared since 2014, such as GoogleNet, which won the 2014 ImageNet
Large-Scale Visual Recognition Challenge (ILSVRC).
No. | Author(s) | Title | Technique | Accuracy
5 | Getahun Tigistu | Automatic Flower Disease Identification using Image Processing | Artificial Neural Network | 83.3%
6 | Habtamu Minassie | Image Analysis for Ethiopian Coffee Classification | Naïve Bayes and Neural Network classifiers | 77.4% using the neural network classifier
7 | Atnafu Solomon | Automatic Skin Lesion Classification in Dermoscopic Images using Deep Neural Networks | CNN | 96.51%
8 | Sampada Gulavnai, Rajashri Patil | Deep Learning for Image Based Mango Leaf Disease Detection | CNN | 91%
CHAPTER THREE
SYSTEM DESIGN AND ARCHITECTURE
3.1. Overview
Recently, defining and classifying objects into corresponding classes using image processing
techniques has been applied in many areas. Recognizing and classifying an object goes through
many processing steps, and the classification relies on the object's attributes or features. In
this chapter, we discuss the research design methodology and the proposed system architecture
for the classification of coffee beans based on their growing origin.
In this study, different literature and related works on agricultural products, primarily coffee and
sesame, were reviewed in order to obtain concepts, theories, methods, and to understand the
difference between what is expected in the domain and what is actually happening [35]. According
to Ken Peffers [14], the DS process includes six steps: problem identification and motivation,
definition of the objectives for a solution, design and development, demonstration, evaluation, and
communication.
Figure 3. 1 Design Science Research Process Model Adopted From [63]
Problem identification and motivation: The researcher used a variety of ways to learn about
the problem domain: reviewing previous literature, observing reports from the organization,
discussing with domain experts, and other sources. This enabled the researcher to identify and
understand the current problem. Ethiopia produces a large amount of coffee in various regions
and exports it to countries around the world, yet the classification and quality control
mechanism is still manual. Partly because of this and other issues, Ethiopia remains only the
world's fifth largest producer of coffee, and because of scarcity and inconsistent coffee
quality it does not gain the expected and planned foreign currency. As stated in Chapter One,
the lack of automated technology causes a variety of problems.
Define the objectives for a solution: The method that the organization currently uses to
classify coffee beans into their corresponding regions is time consuming and inefficient. We
therefore developed an image-based classification model to classify green coffee beans based on
their growing regions, which supports the organization's existing classification method and
allows users to run cost-effective classification with reduced processing time.
Design and development: In this study, we designed a detailed architecture of the proposed
system and a convolutional neural network algorithm. Based on this architecture, we developed a
model that can classify coffee beans into their corresponding regions. The code was written in
Python 3.7.11, a popular and accessible programming language. The Anaconda distribution was
installed, which provides deep learning libraries such as Keras 2.3 and TensorFlow 2.0 for
processing images in Python. The Spyder IDE, bundled with Anaconda, was used to edit the source
code. Finally, we developed a coffee bean classification model that reduces the organization's
processing time.
Demonstration: We developed a graphical user interface using the Flask module to enable
communication between the end users at the organization and the developed model or artifact.
Evaluation: The evaluation focuses on assessing the results and determining how well the
artifact solves the problem. At this stage, the model is evaluated by comparing its accuracy in
terms of confusion matrix metrics such as precision, recall, and F1-score. The developed
artifact was also evaluated for functionality by providing it to the employees of EQAIC. System
performance testing was carried out by importing various coffee beans into the developed
prototype, in order to test how effectively the designed coffee bean classification prototype
saves time. In addition, user acceptance testing was carried out by preparing questions for
users related to functional and non-functional requirements.
Communication: In this study, we will submit the document to Debre Berhan University's
Department of Information Systems and then present it to the audience. As research, our study
also provides contributions that researchers motivated to conduct work in this area using deep
learning can use as a reference.
Figure 3. 2 Design Science Research Framework
Figure 3. 3 Proposed System Process
After the data was collected, image pre-processing techniques were applied to the acquired
images to extract useful features necessary for further analysis. Several analytical
discriminating techniques are then used to classify the images according to the specific problem
at hand. Figure 3.3 above depicts the basic procedure of the proposed vision-based
classification of coffee beans. The classification starts with acquiring images of coffee beans
using a digital camera. To remove noise introduced during the image acquisition step, we applied
an image enhancement technique. Then, the features best suited to represent the image are
extracted using an image analysis technique. Based on the extracted features, the training and
testing data used for identification are extracted in order to classify the coffee beans.
Finally, an appropriate deep learning classifier is selected to classify an image into its
coffee bean origin [26].
Pictures of both the front and back sides of the beans were taken. Image preprocessing and
image segmentation techniques were applied to isolate each bean. The size of each image was set
to 224 × 224 pixels. The images were manually labeled as Gujji, Jimma, Kaffa, Nekempti, Sidamo
or Yirgacheffe with the help of domain experts. All sample images were divided into two groups,
training data and test data. The training data was used for the learning of the neural networks,
while the validation data was used to monitor the transition of classification accuracy during
the learning phase.
It is necessary to resize the coffee bean images to delete the irrelevant regions as well as to reduce
the computational time and space for the deep learning network [28].
steps to make it functional in the descriptive stages of coffee classification. First, the
images were converted into grayscale. Noise introduced during image acquisition was then removed
using median filtering; image smoothing techniques help reduce noise, and in OpenCV image
smoothing (also called blurring) can be done in several ways. The images collected and prepared
for training the network in this study were of poor quality (not free from noise) due to a
variety of factors, including the device (camera resolution) and uncontrolled environment or
temperature effects. Therefore, the median filtering technique was used to remove salt and
pepper noise from the images. For the implementation of this preprocessing step, the OpenCV
library, which is primarily aimed at real-time computer vision tasks, was used because it
supports multiple programming languages, including Python, and multiple platforms [3].
Table 3. 2 Label Encoding
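A minimal sketch of how such a label encoding might look in Python; only the six class names come from the text, and the particular integer assignment order is illustrative.

```python
# The six growing-region labels, mapped to integer indices (order assumed).
CLASSES = ["Gujji", "Jimma", "Kaffa", "Nekempti", "Sidamo", "Yirgacheffe"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}

def one_hot(label: str) -> list:
    """Return the one-hot vector for a region label, as fed to the softmax output."""
    vec = [0] * len(CLASSES)
    vec[LABEL_TO_INDEX[label]] = 1
    return vec
```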
The Watershed Transform is a unique technique for segmenting digital images that uses a type of
region growing method based on an image gradient. The concept of Watershed Transform is based
on visualizing an image in three dimensions: two spatial coordinates versus gray levels.
The Watershed Transform effectively combines elements from both the discontinuity and
similarity based methods. Since its original development with grey-scale images, the Watershed
Transform has been extended to a computationally efficient form (using FIFO queues) and applied
to colour images. In addition, it requires little computation time, is fast, simple and
intuitive, and produces a complete division of the image into separate regions [69].
As figure 3.9 clearly shows, based on the threshold, pixel values within an object of interest
are set to 1 and the remaining (background) pixels are set to 0. A pixel value of 1 indicates
white (the object of interest) and a pixel value of 0 indicates black (the background). Finally,
each object of interest in the binarized image has been isolated from the background and labeled
in order to ease image analysis.
[Figure: original image (left) and the same image after Otsu segmentation (right)]
3.3.5. Normalization
Normalization is another preprocessing technique applied before further processing. In this
study, normalization is used to ensure that each input parameter (here, an image pixel) has a
similar data distribution, by changing the range of pixel intensity values. The main reason this
preprocessing technique is applied is that it makes the network faster and easier to train: when
the data is not normalized, the shared weights of the network have different calibrations for
different features, which can make the cost function converge very slowly and ineffectively.
Image pixel values are integers in the range 0 to 255. Although these pixel values can be
presented directly to the model, doing so can result in slower training and overflow. Overflow
is what happens when numbers get too big and the machine fails to compute them correctly. So, we
normalize our data values down to a decimal between 0 and 1 by dividing the pixel values by
255 [71].
As shown in figure 3.10 above, all resized images were used as input iteratively; each image
was converted to float and divided by 255 so that its pixel values fall between 0 and 1.
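This normalization step can be sketched in one function:

```python
import numpy as np

def normalize(img: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to floats in [0.0, 1.0]."""
    return img.astype("float32") / 255.0
```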
As shown in figure 3.11, the input layer consists of 224 by 224 pixel images, which means that
the network takes 50,176 neurons as input, and the input pixels are grayscale. This CNN model
has nine hidden layers. The first hidden layer is convolution layer 1, which is responsible for
feature extraction from the input data. This layer performs a convolution operation on small
localized areas by convolving a filter with the previous layer. In addition, it consists of
multiple feature maps with learnable kernels and rectified linear units (ReLU). The kernel size
determines the locality of the filters. ReLU is used as the activation function at the end of
each convolution layer, as well as at the fully connected layer, to enhance the performance of
the model. The next hidden layer is pooling layer 1. It reduces the output information from the
convolution layer, along with the number of parameters and the computational complexity of the
model. The different types of pooling are max pooling, min pooling and average pooling. Here,
max pooling
is used to subsample the dimension of each feature map. The third and fourth hidden layers are
convolution layer 2 and pooling layer 2, respectively, which have the same function as
convolution layer 1 and pooling layer 1 and operate in the same way, except that their feature
maps and kernel sizes vary. Convolution layer 3 and pooling layer 3 have the same function as
convolution layer 2 and pooling layer 2 and operate in the same way, except for their number of
filters and dropout value. A flatten layer is used after the pooling layer; it converts the 2D
feature map matrix into a 1D feature vector and allows the output to be handled by the fully
connected layers. A fully connected layer, also known as a dense layer, is another hidden layer.
It is similar to the hidden layer of Artificial Neural Networks (ANNs), but here it is fully
connected: every neuron of the previous layer is connected to every neuron of the next layer. To
reduce overfitting, the dropout regularization method is used at each convolution layer and at
fully connected layer 1. It randomly switches off some neurons during training to improve the
performance of the network by making it more robust. This makes the network capable of better
generalization and less prone to overfitting the training data. The final fully connected layer
computes the class scores, resulting in a volume of size 1x1x6, where the 6 numbers correspond
to class scores for the 6 categories of coffee bean. The output layer uses the softmax
activation function, which classifies the output coffee bean based on its growing region (Gujji,
Jimma, Kaffa, Nekempti, Sidamo or Yirgacheffe) according to the highest activation value, as
shown in figure 3.11.
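A Keras sketch of an architecture matching this description (three convolution/pooling blocks with ReLU, a flatten layer, one fully connected layer, dropout, and a 6-way softmax); the filter counts and dense-layer width are illustrative assumptions, since the exact values in figure 3.11 are not reproduced here.

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(224, 224, 1), num_classes=6):
    """Sketch of the nine-hidden-layer CNN described in the text."""
    model = models.Sequential([
        layers.Input(shape=input_shape),                 # 224 x 224 grayscale input
        layers.Conv2D(32, (3, 3), activation="relu"),    # convolution layer 1
        layers.MaxPooling2D((2, 2)),                     # pooling layer 1
        layers.Conv2D(64, (3, 3), activation="relu"),    # convolution layer 2
        layers.MaxPooling2D((2, 2)),                     # pooling layer 2
        layers.Conv2D(128, (3, 3), activation="relu"),   # convolution layer 3
        layers.MaxPooling2D((2, 2)),                     # pooling layer 3
        layers.Flatten(),                                # 2D feature maps -> 1D vector
        layers.Dense(128, activation="relu"),            # fully connected layer 1
        layers.Dropout(0.5),                             # dropout rate from figure 3.11
        layers.Dense(num_classes, activation="softmax"), # class scores for 6 regions
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```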
3.3.7. Hyperparameters
Hyperparameters and their selection are very important concepts in machine learning, especially
in the context of neural networks, since these models employ a variety of them. Intuitively,
hyperparameters can be seen as handles that are tuned independently of the model parameters and
are typically chosen before the learning process begins. In this study, we mainly discuss
optimization-specific, layer-specific, and regularization-specific hyperparameters [70].
Table 3. 4 Hyperparameters Description
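Since Table 3.4 itself is not reproduced here, the following dictionary only illustrates how such hyperparameters might be grouped; apart from the 0.5 dropout rate and the roughly 50 training epochs mentioned elsewhere in the text, the values are common defaults, not the thesis's actual settings.

```python
# Illustrative hyperparameter settings for the scratch-trained CNN.
HYPERPARAMS = {
    "optimizer": "adam",       # optimization-specific (assumed)
    "learning_rate": 1e-3,     # optimization-specific (Adam default, assumed)
    "batch_size": 32,          # optimization-specific (assumed)
    "epochs": 50,              # training ran past epoch 47 in chapter four
    "kernel_size": (3, 3),     # layer-specific (assumed)
    "pool_size": (2, 2),       # layer-specific (assumed)
    "activation": "relu",      # layer-specific (from the text)
    "dropout": 0.5,            # regularization-specific (from figure 3.11)
}
```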
3.3.8. Classification
After the learnable features are extracted, the next step is identification and classification
of coffee beans into six classes based on their origin, using the softmax classifier. Softmax is
in effect a normalizing function, because it outputs the probability of a sample belonging to
each particular class. The node outputs coming out of the softmax classifier sum to one; for
example, if all six classes were equally likely, each would have probability 0.166
(0.166+0.166+0.166+0.166+0.166+0.166 ≈ 1) [25].
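The softmax computation described here can be sketched as:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: outputs are class probabilities summing to 1."""
    e = np.exp(z - np.max(z))  # shift by the max to avoid overflow
    return e / e.sum()
```

With six equal logits, every class receives probability 1/6 ≈ 0.166, matching the example above.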
Table 3. 5 Confusion Matrix
True positive (TP): represents the number of images that are correctly classified as positive
by the developed classifier model.
False positive (FP): represents the number of images that are classified as positive in the
predicted class but are actually negative.
False negative (FN): represents the number of images that are classified as negative in the
predicted class but are actually positive.
True negative (TN): represents the number of images that are correctly classified as negative
in both the actual and predicted classes.
Accuracy measures how close the model's predictions are to the actual class values over all
test samples.
Accuracy = ((TP + TN) / (TP + TN + FP + FN)) × 100% ............................ Equation 3. 1
Sensitivity measures the ability of a test to be positive when the condition is actually
present. It is also known as the True Positive rate, or recall [72].
Sensitivity = (TP / (TP + FN)) × 100% ............................ Equation 3. 2
Specificity measures the ability of a test to be negative when the condition is actually not
present. It is also known as the True Negative rate, Specificity = TN / (TN + FP) [72].
Precision = TP / (TP + FP) ............................ Equation 3. 3
F1-Score = 2 × (Recall × Precision) / (Recall + Precision) ............................ Equation 3. 4
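Equations 3.1 to 3.4, plus specificity, can be computed from the four confusion-matrix counts as follows:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the evaluation metrics of equations 3.1-3.4 from raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # recall / true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}
```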
CHAPTER FOUR
EXPERIMENT AND EVALUATION
4.1. Introduction
This chapter describes the dataset used in this study, the experimental design, the performance
evaluation metrics, the training and evaluation of our model and of state-of-the-art CNN
architectures, and the development of the web application.
4.3. Experimental Design
After preparing the dataset to feed the network, the next step is designing the model. In this
study, we used the TensorFlow framework and the Keras deep learning API as a benchmark, because
Keras provides models already pre-trained on the ImageNet dataset. In the model building
process, the total of 3,120 images covering the six classes was partitioned under different test
options into training, validation and testing sets. Accordingly, two distinct convolutional
neural network methods were applied to develop classification models. In the first case, the
network was trained from scratch (i.e., setting different hyperparameters in the network, as
discussed in the hyperparameters section 3.3.7 of chapter three). In the second case, a transfer
learning technique was used (i.e., loading the VGG16 and VGG19 pre-trained models), fine-tuning
these networks, and finally building a new model. In this study, the classification of coffee
bean images based on their growing region was developed using a deep learning approach with
convolutional neural networks.
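A sketch of the transfer-learning setup described above, assuming VGG16 with a frozen convolutional base and a new classifier head; the 256-unit dense layer and dropout placement are illustrative choices, not the thesis's exact configuration.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_transfer_model(num_classes: int = 6, weights: str = "imagenet"):
    """Reuse VGG16 convolutional features; train only a new classifier head."""
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # freeze the pretrained convolutional layers
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```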
The proposed model is implemented with Python 3.7 using Keras [73], a deep learning framework
for Python, with a TensorFlow backend, on an HP machine with an Intel Core i7-7020 @ 3.20 GHz
processor, 8 GB of RAM and a 64-bit operating system. In addition to Keras [73] and TensorFlow,
Flask, a Python framework for web application development, was used for the backend, while HTML,
CSS and JavaScript were used for the frontend in production.
Figure 4. 2 Training and Test Loss for experiment one
Confusion matrix (rows: actual label; columns: predicted label)
Types of coffee | Gujji | Jimma | Kaffa | Nekempti | Sidamo | Yirgacheffe
Gujji | 8 | 0 | 0 | 0 | 16 | 10
Jimma | 1 | 45 | 0 | 0 | 0 | 0
Kaffa | 0 | 0 | 38 | 2 | 0 | 0
Nekempti | 0 | 0 | 0 | 53 | 0 | 0
Sidamo | 0 | 0 | 0 | 0 | 10 | 20
Yirgacheffe | 0 | 0 | 4 | 0 | 0 | 43

Performance Metrics
Types of coffee | Precision | Recall | F1-Score
Gujji | 89% | 24% | 38%
Jimma | 100% | 98% | 99%
Kaffa | 90% | 95% | 92%
Nekempti | 96% | 100% | 98%
Sidamo | 38% | 33% | 35%
Yirgacheffe | 59% | 91% | 72%
Weighted Average | 79% | 74% | 72%
Figure 4.3 shows the training and validation accuracy and loss progress of the model after
image segmentation was applied. The model reached 99% training accuracy and 98% validation
accuracy. After training and validation, the model was evaluated using 936 images and achieved
97.8% test accuracy, since feeding the model the dataset after watershed segmentation simplifies
the feature learning process by removing unwanted parts from the images.
Confusion matrix (rows: actual label; columns: predicted label)
Types of coffee | Gujji | Jimma | Kaffa | Nekempti | Sidamo | Yirgacheffe
Gujji | 157 | 0 | 0 | 0 | 1 | 0
Jimma | 0 | 169 | 0 | 5 | 0 | 0
Kaffa | 1 | 2 | 159 | 0 | 0 | 0
Nekempti | 0 | 0 | 0 | 141 | 0 | 0
Sidamo | 9 | 0 | 0 | 0 | 140 | 0
Yirgacheffe | 0 | 1 | 0 | 1 | 0 | 150

Performance Metrics
Types of coffee | Precision | Recall | F1-Score
Gujji | 94% | 99% | 97%
Jimma | 98% | 97% | 98%
Kaffa | 100% | 98% | 99%
Nekempti | 96% | 100% | 98%
Sidamo | 99% | 94% | 97%
Yirgacheffe | 100% | 99% | 99%
Weighted Average | 98% | 98% | 98%
Figure 4. 3 Graph of training and Test accuracy for experiment two
As figure 4.3 shows, the training progress exhibits an increase in training accuracy and a
simultaneous decrease in loss as the number of epochs increases. During training and testing,
the loss is the sum of the errors over the samples in the training and test sets; the lower the
loss, the better the model and the recognition result. As the results in figure 4.3 show, the
classification accuracy for training and test is better than in experiment one, and the training
and test losses also decrease from epoch to epoch. Therefore, we can say that our model's
generalization capability became much better: training accuracy is only slightly above test
accuracy and test loss only slightly above training loss, so there is no model overfitting in
this experiment.
4.5.1.3. Experiment 3: After applying Otsu Thresholding Segmentation Algorithm
Thresholding is a significant part of image segmentation for producing binary images. Binary
image analysis is useful for image feature extraction and it shortens the computation of an
image's geometrical features. Hence, for this research work we used thresholding-based
segmentation, as it is simple and computationally inexpensive [25]. For this experiment, 70% of
the total dataset was used for training the model and the remainder to test the trained model.
As shown in figure 4.5, both the accuracy and the loss curves indicate that the model has some
overfitting, in which the training accuracy is less than the test accuracy and the test loss is
less than the training loss from epoch 1 up to 19. After epoch 20, however, with
thresholding-based image segmentation applied, the training and test accuracy increase in
parallel. The highest training accuracy, 98.4%, is obtained at epochs 44 and 47, with a training
loss of 0.03 at epoch 44; the testing accuracy is 97.1% with a test loss of 0.17.
Confusion matrix (rows: actual label; columns: predicted label)
Types of coffee | Gujji | Jimma | Kaffa | Nekempti | Sidamo | Yirgacheffe
Gujji | 153 | 1 | 0 | 0 | 4 | 0
Jimma | 2 | 172 | 0 | 0 | 0 | 0
Kaffa | 0 | 0 | 162 | 0 | 0 | 0
Nekempti | 0 | 1 | 0 | 140 | 0 | 0
Sidamo | 5 | 3 | 1 | 1 | 139 | 0
Yirgacheffe | 4 | 1 | 2 | 0 | 2 | 143

Performance Metrics
Types of coffee | Precision | Recall | F1-Score
Gujji | 93% | 97% | 95%
Jimma | 97% | 99% | 98%
Kaffa | 98% | 100% | 99%
Nekempti | 99% | 99% | 99%
Sidamo | 96% | 93% | 95%
Yirgacheffe | 100% | 94% | 97%
Weighted Average | 97% | 97% | 97%
Figure 4. 5 Trained Model Sample Screenshot of experiment three
We also conducted another experiment using an 80% split for training the model. The result is
essentially the same, but this run showed some increase in model training accuracy, as shown in
table 4.6 below. With this percentage split, 2,496 images were used for training the model and
the remainder for testing. As table 4.6 below shows, after applying thresholding-based image
segmentation the training and validation accuracy increase in parallel. The highest training
accuracy, 98.64%, is obtained at epoch 44, with a training loss of 0.03 at that epoch; the
testing accuracy is 97.5% with a test loss of 0.14.
Performance Metrics
Types of coffee | Precision | Recall | F1-Score
Gujji | 97% | 98% | 98%
Jimma | 98% | 99% | 99%
Kaffa | 95% | 99% | 97%
Nekempti | 99% | 97% | 98%
Sidamo | 98% | 96% | 97%
Yirgacheffe | 99% | 96% | 98%
Weighted Average | 98% | 98% | 98%
Figure 4. 6 Graph of training and Test accuracy for experiment three
even though it didn't win the ILSVRC, it took 2nd place with very good performance. We only
need 6 categories of images, so we thought VGG19 would be sufficient for our dataset. Second,
the VGG19 architecture is very simple: if you understand the basic CNN model, you will instantly
notice that VGG19 looks similar. Third, we have 8 GB of memory; this is not the best choice, but
we thought it would be enough to run VGG19, even though VGG19 is big in size. Lastly, since a
lot of people
Confusion matrix (rows: actual label; columns: predicted label)
Types of coffee | Gujji | Jimma | Kaffa | Nekempti | Sidamo | Yirgacheffe
Gujji | 71 | 0 | 0 | 1 | 28 | 0
Jimma | 0 | 116 | 0 | 0 | 0 | 0
Kaffa | 0 | 0 | 114 | 0 | 0 | 0
Nekempti | 0 | 0 | 0 | 93 | 0 | 0
Sidamo | 3 | 0 | 3 | 0 | 87 | 0
Yirgacheffe | 0 | 0 | 4 | 3 | 16 | 86
Table 4. 8 classification report of VGG19
Performance Metrics
Types of coffee | Precision | Recall | F1-Score
Gujji | 96% | 71% | 82%
Jimma | 100% | 100% | 100%
Kaffa | 94% | 100% | 97%
Nekempti | 96% | 100% | 98%
Sidamo | 66% | 94% | 78%
Yirgacheffe | 100% | 79% | 88%
Weighted Average | 92% | 91% | 91%
We also conducted the experiment by splitting the dataset by 70% for training and 30% for testing.
In this case the training dataset is 2184 and 936 is for testing the trained model. Before conducting
the experiment all parameters were adjusted without any customization of the pre-trained network.
Finally, we got 85.7% training accuracy and 85.8% test accuracy, with a training loss of 0.33
and a test loss of 0.26.
We also conducted the experiment with the test-set split changed from 10% to 20%. In this case,
the training accuracy is 89.82% and the testing accuracy 88.4%, with losses of 0.98 and 0.05 for
training and test, respectively. Compared to the previous 10% split, training accuracy increased
by 0.04 and validation accuracy by 0.7.
Our proposed model uses a deep CNN feature learning process, because its depth (number of
layers) is higher. Table 4.9 above indicates that the proposed model, after applying the
watershed algorithm, obtained a test accuracy of 97.8%. The proposed model also shows no
overfitting, compared with the pre-trained models.
Table 4.10 shows that the proposed method performs better than the related works, which
presented systems that used machine learning techniques to create models for classifying coffee
beans. The overall experimental evaluation, conducted through the performance measures of coffee
bean classification, shows good results. To minimize computational resources, we applied two
image segmentation algorithms to extract the region of interest; the segmented result is the
immediate input fed into our proposed algorithm for classifying coffee beans into distinct
regions. Although human visual inspection is invaluable in determining the class of a coffee
bean, false estimations can also occur, as bias and losing concentration are natural human
behaviors. Our classification algorithm was tested using sample data selected from the dataset.
On top of that, the proposed automated approach performed better with respect to the manual
system. As can be seen in Table 4.10, the coffee bean classification model achieved 97.8%, which
is a promising result.
provide a value for each checklist. Thus, this method helps us to manually examine user acceptance
based on the evaluator’s response.
No. | Prototype evaluation criteria | Poor | Fair | Good | Very Good | Excellent | Average score | Performance %
1 | Usability | 0 | 0 | 2 | 2 | 1 | 3.8 | 76
2 | Efficiency and effectiveness of the system | 0 | 0 | 0 | 3 | 2 | 4.4 | 88
3 | Attractiveness of the prototype | 0 | 0 | 1 | 2 | 2 | 4.2 | 84
4 | Simplicity to use | 0 | 0 | 0 | 2 | 3 | 4.6 | 92
5 | Accuracy | 0 | 0 | 1 | 3 | 1 | 4.0 | 80
6 | Error tolerance | 0 | 0 | 0 | 2 | 3 | 4.6 | 92
7 | Importance of the system | 0 | 0 | 0 | 3 | 2 | 4.4 | 88
Total Average | | | | | | | 4.28 | 85.7%
As shown in table 4.11, 40% of the evaluators assessed the prototype system's usability as
Good, 40% gave it a Very Good rating, and the remaining 20% rated it as Excellent. For the
second evaluation criterion, the prototype's effectiveness and efficiency, 40% of the evaluators
gave an Excellent rating and 60% a Very Good rating. In the third category, attractiveness of
the prototype, 40% of the evaluators gave an Excellent rating, 40% a Very Good rating, and 20% a
Good rating. For the fourth, 60% of respondents assessed its simplicity of use and interaction
as Excellent, while 40% ranked it as Very Good.
Regarding the accuracy of the developed system, 20% of the evaluators gave an Excellent rating,
60% a Very Good rating, and 20% a Good rating. Meanwhile, 60% of respondents assessed the
system's capacity to prevent errors as Excellent, and the other 40% ranked it as Very Good. The
final evaluation criterion is the importance of the system, which was scored as Very Good by 60%
of the evaluators and as Excellent by the remaining 40%. Finally, according to the domain
experts' evaluation results, the prototype system's average performance is
4.28 out of 5. This result indicates that the coffee bean classification prototype overall average
performance is 85.7%, which is above Very Good.
Question #1: Which convolutional neural network method is best for the classification of
Ethiopian coffee beans?
In this study, we applied two convolutional neural network training methods: training from
scratch, and transfer learning with pre-trained models developed by different researchers for
various situations. Accordingly, we designed our own convolutional neural network architecture
for the training-from-scratch experiments, varying the parameters and their corresponding
values, and applied two segmentation algorithms, watershed and binary thresholding, with this
proposed architecture. Secondly, two pre-trained models were adopted. Comparatively, training
from scratch with the watershed segmentation algorithm, with a classification accuracy of 97.8%,
gave a more promising result for the classification of coffee beans based on their growing
regions than transfer learning. Therefore, we highly recommend applying segmentation before the
convolutional neural network, as it helps the model easily identify the high level features in
the given images.
Question #2: Which segmentation algorithm is best for the classification of Ethiopian coffee
beans?
In this study, the researcher applied two different image segmentation algorithms to let the
convolutional neural network easily learn the important features from the given images.
Different researchers currently apply the Otsu thresholding and watershed segmentation
algorithms [25] to find the region of interest and to minimize computation time and space in
deep learning. As we have seen, applying segmentation easily differentiates the background from
the foreground. Comparatively, the watershed algorithm with the proposed convolutional neural
network architecture achieved a better result, with a performance accuracy of 97.8%, than Otsu
thresholding.
Question #3: To what extent can the deep learning algorithm classify the coffee bean images?
Developing a classifier model is not enough; after developing it, you have to assess the
model's predictive ability by comparing it with previous related work. In this study, the
performance of the developed classifier model was evaluated using a confusion matrix test.
Besides accuracy, the most widely used classifier performance evaluation metrics of precision,
recall, and F-measure were also applied. Comparatively, training from scratch with the watershed
segmentation algorithm, with a classification accuracy of 97.8%, gave a more promising result
for the classification of coffee beans based on their growing regions than transfer learning.
Generally, the following steps show how to develop the web application using Flask:
Step 1: Import the required modules, specifically the load_model module for loading the trained model, the Flask module, the request module for handling requests, and the render_template module for displaying the designed HTML templates.
Step 2: Create the Flask application using the Flask constructor.
Step 3: Specify the path of the trained model and load it using the load_model method.
Step 4: Define a prediction method whose first argument specifies the path of the image and whose second argument specifies the path of the trained model. After the paths are specified, the image preprocessing and image segmentation stated in the proposed architecture are applied.
Step 5: Use an if/else-if conditional statement to map the prediction to the coffee bean classes.
Step 6: Use the @app.route decorator to specify how the HTML page is reached and how the template is rendered.
Step 7: Finally, call the app.run method to execute the program; during execution, do not forget to specify values for the port and debug parameters.
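The steps above can be sketched as a minimal Flask application. The class names, function names, and the plain-string home page are assumptions for illustration; the real system loads the trained .h5 model and renders the HTML templates developed for this thesis.

```python
from flask import Flask

app = Flask(__name__)  # Step 2: create the Flask application

# Assumed class names, taken from the six growing regions in this study.
CLASS_NAMES = ["Gujji", "Jimma", "Kaffa", "Nekempti", "Sidamo", "Yirgacheffe"]

def load_classifier(model_path):
    # Step 3: load the trained model; imported lazily so the sketch can run
    # even where Keras is not installed.
    from keras.models import load_model
    return load_model(model_path)

def predict_class(model, image_batch):
    # Steps 4-5: run the model and map the highest-probability index to a
    # class name (this stands in for the if/else-if chain described above).
    import numpy as np
    probabilities = model.predict(image_batch)
    return CLASS_NAMES[int(np.argmax(probabilities))]

@app.route("/")  # Step 6: route for the home page
def home():
    # The real system renders an HTML template via render_template;
    # a plain string keeps this sketch self-contained.
    return "Coffee bean classification system"

if __name__ == "__main__":
    app.run(port=5000, debug=True)  # Step 7: start the development server
```

Running the script starts the development server on http://127.0.0.1:5000/, the URL used throughout this section.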
User Interface
The user interface is the means of communication with the end user, enabling easy interaction with the coffee bean classification system. The end user can start the web application by entering the URL of the coffee bean classification system in any web browser; the browser then sends an HTTP request to the web server, and the web server returns an HTTP response to the browser.
Figure 4.10 Command prompt window to launch Flask server and to get IP address
Figure 4.13 shows that the user must enter the URL of the coffee bean prediction system, which is http://127.0.0.1:5000/, to get the home page.
Figure 4.12 Home page of proposed classification system
After the end user enters the URL of the coffee bean prediction system, the home page shown in the figure above is displayed. Once the interface illustrated in Figure 4.15 is displayed, the user can click the predict button and import a coffee bean image to find out the category of the coffee. Once the user imports an image, the image is displayed and a button called predict is added to the interface. Figure 4.16 below shows the user importing an image from the local disk.
Figure 4.17 shows the home page of the green coffee bean classification system after a coffee bean image has been imported.
Once the interface illustrated in Figure 4.17 is displayed, the user can click the predict button and see the classification result, which contains the predicted coffee class name for the given coffee bean image.
In this study, we tried to test all classes; for the sake of documentation, we documented two sample screenshots.
CHAPTER FIVE
CONCLUSION AND RECOMMENDATION
5.1. Conclusion
Coffee is a commercial commodity which, among Ethiopia's export commodities, plays a major role in earning foreign currency. Due to its importance in commercial activities, the sub-sector attracts governmental and non-governmental attention. In recent periods, developing a brand patent for each coffee variety based on its growing region has been a problem; for instance, one of these issues was the recent controversy over Starbucks' Yirgacheffe coffee brand [12]. Automated classification systems for agricultural products have proven to be less costly, efficient, and non-destructive. Accordingly, this research has focused on using image processing techniques and approaches to classify the raw quality value of sample coffee beans.
In this research, the methodology was the design science research approach, which grows from relevance and rigor. The researcher therefore tried to follow the relevance cycle to identify the problems and opportunities that exist in the business, and the rigor cycle to draw on the applicable knowledge that allows us to develop an artifact.
We proposed a convolutional neural network, which has recently achieved promising accuracy in image processing thanks to its automatic feature extraction. Image preprocessing and image segmentation were applied to improve the proposed model's classification performance. Based on the production amount and the availability of coffee beans in ECQIAC, we were limited to conducting this study on six classes; the others are beyond our scope. In the classification problem of Ethiopian coffee based on growing region, morphological and color features were automatically extracted, using image analysis techniques, from coffee bean images taken from six regions of Ethiopia: Gujji, Jimma, Kaffa, Nekempti, Sidamo and Yirgacheffe. 520 images per class, and in total 3,120 sample coffee bean images, were collected from ECQIAC to conduct this research.
To perform this analysis, deep learning-based techniques were used. Specifically, the proposed CNN architecture and transfer learning techniques were used to create a model that can classify coffee beans based on their region of origin. To mitigate the overfitting problem during training, augmented images were also provided: we generated another image dataset by modifying the original images. The VGG-16 and VGG-19 pre-trained models were utilized; the last layers of these models were modified for this study's problem, and a softmax layer and fully connected layers were added to the network architectures. Accordingly, to develop the classification model, 80%:20%, 70%:30%, and 90%:10% percentage-split test options were utilized, and a confusion matrix and classification report were used to visualize the model performance.
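A percentage split like the 80%:20% option above can be sketched with scikit-learn. The arrays below are stand-ins for the 3,120 images and their six region labels; the stratify argument is an illustrative choice that keeps the 520-per-class balance in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(3120).reshape(-1, 1)   # stand-in for 3,120 image samples
y = np.repeat(np.arange(6), 520)     # 520 labels for each of the 6 classes

# 80%:20% percentage split; stratification preserves the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
```

The 70%:30% and 90%:10% options differ only in the test_size value.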
In order to produce a computationally effective and more accurate CNN model, we applied image segmentation to separate the region of interest from the background of an image. The results are satisfactory: the proposed model obtained a classification test accuracy of 97.8%, which is higher than the recognition ability of the state-of-the-art CNN architectures selected for comparison, VGG16 and VGG19. Finally, the developed CNN model was tested through the developed user interface and achieved 85.7% user acceptance accuracy. Secondly, this work differs from previous coffee bean classification in its improved performance and in four new classes considered; with different parameters, the previous coffee bean classification achieved 77.4% accuracy, as it fell into local minima, whereas our work reached a 97.8% test accuracy.
5.2. Recommendation
Based on the investigation and findings of the study, we recommend the following future and further research works:
To improve the performance of the model, future works need to increase and clean the dataset by continuing to collect more samples from the field. Future studies should examine the CNN model by restructuring its layers with deeper hidden layers on a better dataset size, and by experimenting with better architectures such as GoogLeNet and ResNet on a large dataset.
Due to different constraints, such as budget and time, we did not incorporate various coffee bean varieties such as Harar, Wollega, and more. So, we highly recommend this as a future research direction.
Image processing technology has grown significantly over the past decade. The developed model is only applicable on the web or desktop. However, its application on low-power mobile devices has been of interest to a wide research group in newly emerging contexts. With the emergence of general-purpose computing on embedded GPUs and their programming models, like OpenGL ES 2.0 and OpenCL, mobile processors are gaining more parallel computing capability. So, an advanced image-processing mobile application for coffee bean classification can be recommended as further research work.
References
[2] S. Ponte, "The ‘Latte Revolution’? Regulation, Markets and Consumption in the Global Coffee Chain," World Development, 2002.
[3] H. Desta, "Development of Automatic Sesame Grain Classification and Grading System
Using Image Processing Techniques," 2017.
[4] M. Mogese, "Connecting Ethiopian Coffee to Sustainable Market," March 2016. [Online].
Available: https://etbuna.com/ethiopian-coffee/ethiopian-coffee-processing/. [Accessed 22
February 2021].
[8] A. S. Getachew, "Automatic Skin Lesion Classification in Dermoscopic Images using Deep
Neural Networks," College of Software Nankai University, China, May, 2019.
[10] K. Chauhan and S. Ram, "Image Classification with Deep Learning and Comparison between Different Convolutional Neural Network Structures Using Tensorflow and Keras," International Journal of Advance Engineering and Research Development, vol. 5, no. 2, February 2018.
[11] A. R. Baleker, "Raw Quality Value Classification of Ethiopian Coffee Using Image
Processing Techniques: In the case of Wollega region," Addis Ababa University, Addis
Ababa, 2011.
[14] K. Peffers, T. Tuunanen, M. A. Rothenberger and S. Chatterjee, "A Design Science Research Methodology for Information Systems Research," Journal of Management Information Systems, vol. 24, no. 3, 2007.
[15] F. Alemu, "Assessment of the Current Status of Coffee Diseases at Gedeo and Sidama zone,
Ethiopia," International Journal of Advanced Research, vol. 1, No. 8, 2013.
[17] Mekuria, "The Status of Coffee Production and the Potential for Organic Conversion in Ethiopia," in International Agricultural Research for Development, 2004.
[19] "Food and Agriculture Organization of the United Nations," [Online]. Available: www.fao.org. [Accessed 22 February 2021].
[21] W. J. Boot, "Practical Guideline for Purchasing and Importing Ethiopian Specialty Coffee
Bean," in United States Agency for International Development, Addis Ababa, March 2011.
[24] C. Solomon and T. Breckon, Fundamentals of Digital Image Processing, Wiley-Blackwell, 2011.
[25] D. A. Worku, "Automatic Flower Disease Identification Using Deep Convolutional Neural
Network," February 2020.
[26] G. Tigistu, "Automatic Flower Disease Identification Using Image Processing," Addis
Ababa University, Addis Ababa, 2015.
[27] S. Naik and B. Patel, "Machine Vision based Fruit Classification and Grading - A Review," International Journal of Computer Applications, vol. 170, 2017.
[28] A. Solomon, "Automatic Skin Lesion Classification in Dermoscopic Images using Deep
Neural Network," Nankai University, May, 2019.
[29] A. M. Hambal, Z. Pei and F. L. Ishabailu, "Image Noise Reduction and Filtering Techniques," International Journal of Science and Research, vol. 6, no. 3, March 2017.
[30] "Statistical Approach to Compare Image Denoising Techniques in Medical MR Images," International Conference on Pervasive Computing Advances and Applications, pp. 367-374, 2019.
[31] A. Buades, B. Coll and J. M. Morel, "A Review of Image Denoising Algorithms, with a New One," Society for Industrial and Applied Mathematics, vol. 4, no. 2, 2010.
[32] A. Kumar and S. S. Sodhi, "Comparative Analysis of Gaussian Filter, Median Filter and Denoise Autoenocoder," International Conference on Computing for Sustainable Global Development, 2020.
[35] A. B. Mekonnen, "A Deep Learning-Based Approach for Potato Leaf Diseases
Classification," Debre Berhan University, June 2020.
[37] A. Bala, "An Improved Watershed Image Segmentation Technique using MATLAB,"
International Journal of Scientific & Engineering Research, vol. 3, no. 6, 2012.
[38] J. Yousef, "Image Binarization using Otsu Thresholding Algorithm," University of Guelph,
Ontario, Canada, April 18, 2011.
[40] D. p. Tian, "A Review on Image Feature Extraction and Representation Techniques,"
International Journal of Multimedia and Ubiquitous Engineering, vol. 8, July, 2013.
[41] T. A. a. S. Mansi, "Feature Extraction for Object and Image Classification," International Journal of Engineering Research and Technology, vol. 2, pp. 1238-1246, 2013.
[44] A. Habeeb, "Artificial intelligence," Discover the world's scientific knowledge, June, 2018.
[45] D. A. Rosebrock, Deep Learning for Computer Vision with Python, Pyimagesearch, 2017.
[47] S. Pouyanfar, "A Survey on Deep Learning: Algorithms, Techniques, and Applications," ACM Computing Surveys, vol. 51, no. 5, 2018.
[49] N. K. Manaswi, Deep Learning with Applications using Python, Apress, 2018.
[51] V. D. a. F. Visin, A Guide to Convolution Arithmetic for Deep Learning, January, 2018.
[52] J. Gu, "Recent Advances in Convolutional Neural Networks," Elsevier, October 2017.
[53] A. Gulli, A. Kapoor and S. Pal, Deep Learning with TensorFlow 2 and Keras, Packt, December 2019.
[55] C. S. a. T. M. Khoshgoftaar, "A Survey on Image Data Augmentation for Deep Learning,"
Journal of Big Data, 2019.
[57] D. Soydaner, "A Comparison of Optimization Algorithms for Deep Learning," International
Journal of Pattern Recognition and Artificial Intelligence, 08 May 2020.
[58] A. Saha, "ADAM (Adaptive Moment Estimation) Optimization | ML," 14 January 2020.
[Online]. Available: https://www.geeksforgeeks.org/adam-adaptive-moment-estimation-
optimization-ml/. [Accessed 08 December 2020].
[61] A. D. Mengistu, D. M. Alemayehu and S. G. Mengistu, "Ethiopian Coffee Plant Diseases Recognition Based on Imaging and Machine Learning Techniques," International Journal of Database Theory and Application, vol. 9, pp. 79-88, 2016.
[63] S. G. a. R. Patil, "Deep Learning for Image Based Mango Leaf Disease Detection,"
International Journal of Recent Technology and Engineering, vol. 8, November 2019.
[64] V. A. Patel, "Convolutional Neural Network with Transfer Learning for Rice Type Classification," September 2017.
[65] D. T. Ayane, "Automatic Plant Species Identification Using Image Processing Techniques,"
Addis Ababa University, October, 2018.
[70] F. Schilling, "The Effect of Batch Normalization on Deep Convolutional Neural Networks,"
KTH Royal Institute of Technology, Sweden, 2016.
[71] B. Lake, "Mobile Based Expert System for Diagnosis of Cattle Skin Diseases with Image
Processing Techniques," Addis Ababa University, October 2019.
[75] F. A. A. a. H. N. Mohammed, "Efficient Way of Web Development Using Python and Flask," International Journal of Advanced Research in Computer Science, vol. 6, no. 2, 2015.
[77] S. Ponte, "The ‘Latte Revolution’? Regulation, Markets and Consumption in the Global Coffee Chain," World Development, 2002.
Appendix A:
Interview Questions
Dear Respondents,
First of all, I would like to thank you for your willingness to attend the interview and to share your knowledge. My name is Gebreyes Gebeyehu and I am studying for a Master of Science in Information Systems at Debre Berhan University. I am conducting my research work on developing image-based coffee bean classification based on growing region. To build the above model successfully, as a researcher I need answers to the questions provided below.
Appendix B:
# -*- coding: utf-8 -*-
"""
@author: UNKNOWN
"""
import os

import cv2
import numpy as np

os.environ['KERAS_BACKEND'] = 'tensorflow'

### Reading images
data_path = 'A:/Y/Dataset/normal/'

# One sub-folder of filtered images per class, in label order:
# 0 Gujji, 1 Jimma, 2 Kaffa, 3 Nekempti, 4 Sidamo, 5 Yirgacheffe.
class_folders = ['G_filtered', 'J_filtered', 'K_filtered',
                 'N_filtered', 'S_filtered', 'Y_filtered']

dataset = []
label = []
for class_index, folder in enumerate(class_folders):
    folder_path = os.path.join(data_path, folder)
    for image in os.listdir(folder_path):
        input_img = cv2.imread(os.path.join(folder_path, image))
        # Convert to a single grayscale channel so that the
        # (224, 224, 1) reshape below is valid.
        input_img = cv2.cvtColor(input_img, cv2.COLOR_BGR2GRAY)
        input_img = cv2.resize(input_img, (224, 224))
        dataset.append(input_img)
        label.append(class_index)

### Changing into a numpy array and normalizing the input data
img_data = np.array(dataset)
img_data = img_data.astype('float32')
img_data /= 255
img_data = img_data.reshape((-1, 224, 224, 1))
### Model definition
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPool2D, Dropout, Flatten, Dense

input_shape = (224, 224, 1)
num_classes = 6

yes = Sequential()
yes.add(Conv2D(32, (3, 3), padding='valid', input_shape=input_shape))
yes.add(Activation('relu'))
yes.add(MaxPool2D(pool_size=(2, 2), strides=2))
yes.add(Dropout(rate=0.2))
yes.add(Flatten())
yes.add(Dense(128, activation='relu'))
yes.add(Dropout(rate=0.5))
yes.add(Dense(num_classes, activation='softmax'))
print(yes.summary())
### Data transformation and training
# datagen (an ImageDataGenerator), the x_train/y_train, x_validate/y_validate
# and X_test/Y_test splits, and callbacks_list are prepared in earlier steps
# that are not reproduced here.
epochs = 50
batch_size = 32
history = yes.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                            epochs=epochs, validation_data=(x_validate, y_validate),
                            verbose=1, steps_per_epoch=x_train.shape[0] // batch_size,
                            callbacks=callbacks_list)
loss, accuracy = yes.evaluate(X_test, Y_test, verbose=1)
loss_v, accuracy_v = yes.evaluate(x_validate, y_validate, verbose=1)
print("Validation: accuracy = %f ; loss_v = %f" % (accuracy_v, loss_v))
print("Test: accuracy = %f ; loss = %f" % (accuracy, loss))
#yes.save("model_without_seg_flow.h5")