Leaf Disease Detection On Cucumber Leaves Using
Abstract—In India, smart organic farming is gaining importance. Problems may arise in this kind of farming due to environment, temperature, humidity or nutrient deficiency. With a monitoring system in place, it is possible to produce healthy plants. The aim is to address this issue using computer-aided image processing techniques. The main solution is to create an automated system that can detect the disease present in the leaf of the plant. In this paper, a first-level attempt is made to detect diseases present in the leaf of salad cucumber. The most common diseases of salad cucumber are Alternaria leaf blight, bacterial wilt, cucumber green mottle mosaic, leaf miner, leaf spot, Cucumber Mosaic Virus (CMV) disease and so on. K-means clustering, an unsupervised algorithm, is used in this work along with a Support Vector Machine (SVM) to address this problem.

Index Terms—k-means clustering, Multiclass SVM, Image processing.

I. INTRODUCTION

Weather and other environmental conditions cannot be controlled by farmers. Disease mitigation is the prime factor to be considered in farming practice. Immediate attention has to be given to crops that are affected by pests or disease. If the leaves are properly monitored from the beginning stage itself, further spread of the disease in the plants can be avoided. It is difficult to recognize a leaf disease by observation with the naked eye. This might result in the wrong application of pesticides and ultimately in crop failure.

Several environmental factors give rise to several diseases that affect crops. This reduces the quality and productivity of the plants. Hence an automatic system that can detect diseases is necessary, since it is useful in monitoring the crops and allows immediate action to be taken.

Image processing techniques have been found to give successful results in disease detection [1]. Machine learning methodologies for classification, together with k-means clustering, can be used in agricultural research such as disease detection. This kind of disease detection has earlier been carried out on cotton [1] and paddy leaves [2].

India occupies the 30th position in total cucumber production and accounts for less than one percent of the world's supply. Cucumber is available at a low cost throughout the year. The warm climate in India is suitable for the development of cucumbers. The three main states involved in the production of cucumbers are Karnataka, Tamil Nadu, and Andhra Pradesh. Karnataka grows 60 percent of India's cucumbers, with the other two states accounting for 20 percent.

The overseas demand for cucumber is greater than the domestic demand. The health benefits of cucumber are as follows:
• It is a low-calorie vegetable.
• It is a good source of dietary fiber.
• It is a very good source of potassium.
• It contains unique antioxidants.
• It has a mild diuretic property.
• It has a high amount of vitamin K.

The rest of the paper is organized as follows. The related works are discussed in Section II. In Section III, a brief description of the diseases present in cucumber leaves is given. The methodology is explained in Section IV. The results are discussed in Section V. The future scope of this work is discussed in the last Section.

II. RELATED WORKS

Research has been carried out on the use of digital image processing techniques in agriculture. This section discusses the existing methods.

Chaudhary, Piyush, et al. [3] segmented the diseased region from the RGB image by performing a color transform. Three color spaces were compared: YCbCr, CIELAB and HSI. A median filter was used for image smoothing to remove noise. The experiment was conducted on dicot and monocot leaves. Rothe, P. R., and R. V. Kshirsagar [4] extracted colour layout descriptors and classified them using neural networks. This experiment was done on cotton leaves. Revathi, P., and M. Hemalatha [5] used an edge detection method for the detection of diseases. The features used for the analysis were color, boundary, shape and texture, taken from the disease spots. D.S. Guru, P.B. Mallikarjuna and S. Manjunath [6] used the gray-level co-occurrence matrix (GLCM) for feature extraction and determined diseases in tobacco leaves. Schikora, Marek, and Adam Schikora [7] showed that SVM performed better than neural networks for classification. Jian, Zhang, and Zhang Wei [8] considered the entire leaf instead of the diseased portion. The tests were conducted using Radial Basis Function (RBF) and it gave better performance using Support
978-1-5090-4442-9/17/$31.00 ©2017 IEEE
This full-text paper was peer-reviewed and accepted to be presented at the IEEE WiSPNET 2017 conference.
Fig. 5. Flowchart.
The points $(r_1, s_1)$ and $(r_2, s_2)$ determine the shape of the transformation function and produce various degrees of spread in the intensity levels of the output image [15].

C. Color Space Conversion

This conversion is necessary because the RGB color space is highly device dependent and also depends on the light intensity [16]. Image processing and analysis operations, especially those involving color differences, are easier to perform in the L*a*b color space than in RGB. The L*a*b space is also device independent. Initially a color transformation structure is created, and then the structure is applied to the image to perform the conversion. The advantage of the L*a*b color model is that it is designed to approximate the human perception of light. L*a*b also separates the chrominance information better than other models.

D. K Means Clustering

The colors in the a*b color space are segmented using k-means clustering, which is an unsupervised learning algorithm [17]. The number of clusters is fixed in advance. In our case k = 3 is chosen, and the three clusters contain the background, the normal green leaf and the diseased portion of the leaf. A centroid is fixed for each cluster. Placing the centroids is a crucial task; ideally they are kept far away from one another. Each data point is associated with its nearest centroid. Once all the points in the data have been assigned, the first-stage grouping is complete. In further steps, new centroids are determined from the grouping of the previous step, and the data points are matched again against these new centroids. This is repeated in a loop until the locations of the k centroids no longer change. The algorithm aims at minimizing the squared error function

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \tag{2}$$

where $J$ is the error function, $x_i^{(j)}$ is a data point, $c_j$ is a cluster center, and $\| x_i^{(j)} - c_j \|^2$ is the squared distance between the data point and the cluster center. The algorithm can be explained in the following steps:
1) Initial k centroids are placed in the space represented by the objects that are being clustered.
2) The closest centroid to each data point is chosen, and the data point is associated with that centroid.
3) The k centroids are recalculated.
4) Steps 2 and 3 are repeated until the centroids no longer change.

After this, three different clusters are generated for the given input image. The cluster in which the disease segment is present is selected for feature extraction.

E. Feature Extraction

The main aim of feature extraction is to minimize the resources required to represent a large set of data accurately [18]. When a large number of variables is considered, the classification algorithm performs poorly. Certain unique and underlying characteristics can be determined using texture analysis. The statistical texture features are calculated using the gray-level co-occurrence matrix (GLCM). The four main features taken from the GLCM for image analysis are Contrast, Correlation, Energy and Homogeneity. Apart from these, other statistical features such as mean, standard deviation, Entropy, RMS, Variance, Smoothness, Kurtosis, and Skewness are also taken from the image. The choice of statistical parameters depends on the requirements of the output image [19].

Consider $p(i, j)$ as an element of the co-occurrence matrix. It represents the probability of moving from a pixel with gray level $i$ to a pixel with gray level $j$. The parameters from the GLCM can then be represented by the equations below.

1) Contrast: Measures the local variations of gray levels present in the image.

$$\text{Contrast} = \sum_{i,j} |i - j|^2 \, p(i, j). \tag{3}$$

2) Correlation: Determines the correlation of a pixel to its neighbour.

$$\text{Correlation} = \sum_{i,j} \frac{(i - \mu_i)(j - \mu_j)\, p(i, j)}{\sigma_i \sigma_j}. \tag{4}$$

3) Energy: Provides the sum of squared elements in the GLCM.

$$\text{Energy} = \sum_{i,j} p(i, j)^2. \tag{5}$$

4) Homogeneity: Measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal.

$$\text{Homogeneity} = \sum_{i,j} \frac{p(i, j)}{1 + |i - j|}. \tag{6}$$

5) Entropy: The amount of information which must be coded is described by a quantity called image entropy.

$$\text{Entropy} = -\sum_{i} P_i \log_2 P_i. \tag{7}$$

$P_i$ is the probability that the difference between 2 adjacent pixels is equal to $i$.

6) Average: Consider a region $R$, where $a[m, n]$ represents the pixel brightness. The sample mean of this brightness over the $\Lambda$ pixels in the region gives the average brightness $m_a$ in that region:

$$m_a = \frac{1}{\Lambda} \sum_{(m,n) \in R} a[m, n] \tag{8}$$

where $(m, n) \in R$.

7) Standard Deviation (S.D): Consider the brightness of the pixels in region $R$ of an image. The sample standard deviation $s_a$ is the estimate of the standard deviation of this brightness and is given by:

$$s_a = \sqrt{\frac{1}{\Lambda - 1} \sum_{(m,n) \in R} \left( a[m, n] - m_a \right)^2}. \tag{9}$$

The other statistical parameters mentioned earlier were also extracted from the diseased portion of the leaf.
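The four GLCM features in Eqs. (3) to (6) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `glcm` and `glcm_features`, the single horizontal offset, the non-symmetric counting, and the small 4-level demo image are all assumptions made for illustration.

```python
import numpy as np


def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix p(i, j) for a non-negative offset (dy, dx).

    Assumes `image` holds integer gray levels in the range [0, levels).
    """
    image = np.asarray(image, dtype=int)
    h, w = image.shape
    # Pair every pixel with its neighbour located (dy, dx) away.
    src = image[: h - dy, : w - dx]
    dst = image[dy:, dx:]
    counts = np.zeros((levels, levels), dtype=float)
    # Accumulate one count per (source level, neighbour level) pair.
    np.add.at(counts, (src.ravel(), dst.ravel()), 1.0)
    return counts / counts.sum()


def glcm_features(p):
    """Contrast, correlation, energy and homogeneity as in Eqs. (3) to (6)."""
    i, j = np.indices(p.shape)
    mu_i = (i * p).sum()
    mu_j = (j * p).sum()
    sigma_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sigma_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    return {
        "contrast": (np.abs(i - j) ** 2 * p).sum(),
        # Undefined (division by zero) for a constant image, where sigma is 0.
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (sigma_i * sigma_j),
        "energy": (p ** 2).sum(),
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
    }


# Hypothetical 4-level test image, used only to exercise the functions.
demo = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 2, 2, 2],
                 [2, 2, 3, 3]])
print(glcm_features(glcm(demo, levels=4)))
```

In practice the gray levels of the diseased cluster would first be quantized to a small number of levels, and several offsets could be averaged; those choices are not specified in the text above.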
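The four k-means steps of Section D and the squared-error objective of Eq. (2) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the optional `init` argument, the random seeding, and the toy points standing in for per-pixel (a*, b*) values are all hypothetical, and the conversion of the image to L*a*b is assumed to have been done beforehand.

```python
import numpy as np


def nearest_centroid(points, centroids):
    """Index of the closest centroid for every point (step 2)."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k=3, init=None, max_iter=100, seed=0):
    """Plain k-means; returns labels, centroids and the objective J of Eq. (2)."""
    points = np.asarray(points, dtype=float)
    if init is None:
        # Step 1: place the initial centroids on k distinct data points.
        rng = np.random.default_rng(seed)
        centroids = points[rng.choice(len(points), size=k, replace=False)]
    else:
        centroids = np.asarray(init, dtype=float)
    for _ in range(max_iter):
        # Step 2: associate each point with its nearest centroid.
        labels = nearest_centroid(points, centroids)
        # Step 3: recompute each centroid as the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = nearest_centroid(points, centroids)
    J = ((points - centroids[labels]) ** 2).sum()
    return labels, centroids, J


# Toy stand-in for per-pixel (a*, b*) values: three well-separated groups.
ab = np.array([[0, 0], [0, 1], [10, 10], [10, 11], [-10, 10], [-10, 11]])
labels, centroids, J = kmeans(ab, k=3, init=[[1, 1], [9, 9], [-9, 9]])
print(labels, centroids, J)
```

For a real leaf image, `ab` would be the a* and b* channels reshaped to one row per pixel, and the returned labels would be reshaped back to the image size to obtain the three segments. With random initialization the result can depend on the starting centroids, which is why the text calls their placement a crucial task.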