
Computers and Electronics in Agriculture 143 (2017) 238–248


Normalization of data for delineating management zones


Kelyn Schenatto a,⁎, Eduardo Godoy de Souza b, Claudio Leones Bazzi c, Alan Gavioli c, Nelson Miguel Betzek c, Humberto Martins Beneduzzi d

a Computer Science Department, Technological Federal University of Paraná, Rua Cerejeira s/n, 85892-000 Santa Helena, Paraná State, Brazil
b Technological and Exact Sciences Center, West Paraná State University, Rua Universitária 2069, 85819-110 Cascavel, Paraná State, Brazil
c Computer Science Department, Technological Federal University of Paraná, Av. Brasil 4232, 85884-000 Medianeira, Paraná State, Brazil
d Analysis and Systems Development Department, Paraná Federal Institute, Av. Araucária 780, 85860-000 Foz do Iguaçu, Paraná, Brazil

ARTICLE INFO

Keywords: Standardization; Fuzzy C-Means; Precision agriculture; Smoothness index; Variance reduction; Euclidean distance

ABSTRACT

Management zones (MZs) are an economically viable alternative to variable-rate application (VRA) based on prescription maps; unlike the latter, MZs can employ conventional machinery. Their use is considered economically viable because of the low initial cost and the high return in economic and environmental benefits. Data clustering techniques, and the Fuzzy C-Means algorithm in particular, are the most widely used processes for delineating MZs. The most common similarity measure is Euclidean distance; however, because the algorithm is sensitive to the range of the input variables, these variables are typically normalized by dividing the value by the standard deviation, maximum value, average, or range of the data set. The objective of this study was to assess the influence of data normalization methods on the delineation of MZs. The experiment was conducted in three experimental fields of 9.9, 15.0, and 19.8 ha, located in Southern Brazil, between 2010 and 2014. The variables used for delineating MZs were selected using spatial correlation statistics, and data were normalized using the standard score, range, and average methods. The MZs were delineated using the Fuzzy C-Means algorithm, which created two, three, and four clusters. The normalization methods were evaluated by five indices (modified partition entropy [MPE], fuzziness performance index [FPI], variance reduction [VR], smoothness index [SI], and kappa) and by ANOVA. It was found that normalization is required when the delineation of MZs uses more than one variable with different scales in a clustering process based on Euclidean distance. The range method was considered the best normalization method overall.

1. Introduction

The study of the spatial distribution of soil and plant variables is important for establishing appropriate management zones (MZs) for fertilizer application, soil management, and irrigation. Appropriate MZs may maximize yield while reducing costs and minimizing potential environmental damage (Tilman et al., 2011; Li et al., 2013; Bansod and Pandey, 2013; Hedley, 2015).

An MZ is defined as a subregion of a field that exhibits similar combinations of yield-limiting factors (Tagarakis et al., 2013). This facilitates the application of precision agriculture (PA) techniques by reducing the costs of their adoption and implementation, since MZs can use constant-rate equipment and may reduce the number of samples needed to characterize soil nutrient availability. Delineating MZs is not a simple task because numerous variables may influence crop yield. Considering that an MZ is often used for several years, the variables considered should be temporally stable (Doerge, 2000) and correlated to the yield. Among the variables identified in the literature as having good potential to delineate temporally stable MZs are elevation (Bazzi et al., 2015; Fraisse et al., 2001; Jaynes et al., 2005; Peralta and Costa, 2013; Farid et al., 2016; Schepers et al., 2004), soil electrical conductivity (ECa) (Li et al., 2007; Farid et al., 2016), soil penetration resistance (Gavioli et al., 2016), and soil texture (Farid et al., 2016).

Techniques such as principal component analysis (PCA) (Bansod and Pandey, 2013) and Moran's bivariate spatial autocorrelation statistic, proposed by Czaplewski and Reich (1993) and used by Reich et al. (1994) and Bonham et al. (1995), can be used to create (when PCA is used) or select layers for delineating MZs. When more than one crop is cultivated in the same field during the year, which is a common practice in Brazil, normalizing yield data makes it possible to create a more representative variable (Bunselmeyer and Lauer, 2015) to be used in ANOVA and Tukey's test.

Several techniques to delineate MZs are proposed in the literature (Pedroso et al., 2010; Xiang et al., 2007); however, the most used is


Corresponding author at: Computer Science Department, Technological Federal University of Paraná, Rua Cerejeira, s/n – Bairro São Luiz, 85892-000 Santa Helena, Paraná State,
Brazil.
E-mail addresses: kschenatto@utfpr.du.br (K. Schenatto), humberto.beneduzzi@ifpr.edu.br (H.M. Beneduzzi).

http://dx.doi.org/10.1016/j.compag.2017.10.017
Received 28 February 2017; Received in revised form 18 October 2017; Accepted 21 October 2017
Available online 04 November 2017
0168-1699/ © 2017 Elsevier B.V. All rights reserved.

cluster analysis (Li et al., 2007; Iliadis et al., 2010). The most commonly used clustering methods to delineate MZs are the K-means algorithm (Rodrigues Jr. et al., 2011; Ortega and Santibañez, 2007) and fuzzy C-means (Li et al., 2007, 2013; Fu et al., 2010; Zhang et al., 2013; Moral et al., 2010). The latter algorithm, which incorporates fuzzy logic theory into the partitioning algorithm, uses a weighting exponent to control the degree of sharing between classes (Bezdek, 1981), allowing individuals to exhibit partial membership in each of the classes, which is important when dealing with the continuous variability of natural phenomena (Burrough, 1989). Before a dataset can be clustered, it is necessary to establish an appropriate measure of similarity. Euclidean distance is the most commonly used; this measure gives equal weight to all measured variables and is sensitive to correlated variables (Bezdek, 1981). In geometrical terms, the Euclidean distance creates clusters with a spherical shape, which rarely occur in soil (Odeh et al., 1992). Fridgen et al. (2004) report that Euclidean distance should be used only for statistically independent variables with equal variances. In this sense, when the Euclidean distance is used for clustering, data normalization can be a very important step before creating MZs.
The standard score (Z-score) normalization method (Eq. (1)) has been used by many researchers for the delineation of MZs (Anderberg, 1973; Romesburg, 1984; Larscheid and Blackmore, 1996; Stafford et al., 1996; Molin, 2002; Kitchen et al., 2005). This method transforms normal variables to standard scores, so that the transformed variable has a mean of 0.0 and a variance of 1.0.

$$Z = \frac{X - \bar{X}}{s} \tag{1}$$

where $X$ is the original data value; $\bar{X}$ is the sample average; and $s$ is the standard deviation.
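As a minimal sketch of the three normalization methods compared in this study (the standard score of Eq. (1), plus the average and range methods given as Eqs. (2) and (3) in the paragraphs that follow), the NumPy functions below apply each transformation to a single variable. The function names, the example elevation values, and the choice of the sample standard deviation (`ddof=1`) are assumptions made for illustration; they are not taken from the SDUM software.

```python
import numpy as np


def standard_score(x):
    """Eq. (1): subtract the sample mean, divide by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)  # ddof=1 is an assumption


def average_norm(x):
    """Eq. (2): divide each value by the sample mean."""
    x = np.asarray(x, dtype=float)
    return x / x.mean()


def range_norm(x):
    """Eq. (3): rescale to [0, 1] using the minimum and the data range."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


# Hypothetical elevation values in meters (not data from the study).
elevation_m = np.array([455.0, 458.0, 460.0, 463.0, 464.0])

z = standard_score(elevation_m)  # mean 0, sample variance 1
a = average_norm(elevation_m)    # mean 1
r = range_norm(elevation_m)      # minimum 0, maximum 1
```

The standard score and range methods remove both the offset and the scale of the variable, whereas the average method only rescales it, so a variable with a large mean relative to its spread stays compressed around 1.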
Several studies reported the use of the average method (Eq. (2)) for the delineation of MZs (Stafford et al., 1996; Molin, 2002; Kitchen et al., 2005), with the assumption that the average represents the dataset well; however, the average is sensitive, can be modified by adding any constant, and can easily change the distribution of the normalized data (Anderberg, 1973).

$$Z = \frac{X}{\bar{X}} \tag{2}$$

Good results were also reported by Milligan and Cooper (1988), Bazzi et al. (2013), Gavioli et al. (2016), and Schenatto et al. (2016) using the range normalization method (Eq. (3)). The normalized values are bounded by 0.0 and 1.0, with at least one observed value at each of these end points. The Min(X) value used in Eq. (3) can be replaced by Median(X) (Mielke and Berry, 2007) with the same behavior, because Min(X) and Median(X) are constants and do not change the data distribution.

$$Z = \frac{X - \mathrm{Min}(X)}{\mathrm{Max}(X) - \mathrm{Min}(X)} \tag{3}$$

The goal of this study was to evaluate the performance of these methods, which are frequently used in the data clustering process by the Fuzzy C-Means algorithm to delineate MZs.

2. Materials and methods

A step-by-step flowchart (Fig. 1) was created to show the methodology used.

Fig. 1. Step-by-step flowchart of the methodology used to evaluate the normalization methods for delineation of MZs.

2.1. Datasets

This research was conducted in three fields (Fig. 2) located in Paraná State, Brazil: Field A (15 ha), located in the municipality of Céu Azul (central geographic location of 25°06′32″S, 53°49′55″W, and average elevation of 460 m); Field B (9.9 ha), located in the municipality of Serranópolis do Iguaçu (central geographic location of 25°24′28″S, 54°00′17″W, and average elevation of 355 m); and Field C (19.8 ha), located in the municipality of Cascavel (central geographic location of 24°57′08″S, 53°33′59″W, and average elevation of 650 m).

For the delineation of MZs, only variables considered temporally stable, collected between 2010 and 2014 (Table 1), were used, to meet the recommendation of Doerge (2000). To meet the constraints of geostatistical analysis (Journel and Huijbregts, 1978) in terms of the minimum number of pairs (30) to calculate the semivariances of the semivariogram, a dense sampling grid (Table 1) was used, with 2.7 points ha−1 for Field A, 4.2 points ha−1 for Field B, and 3.4 points ha−1 for Field C. The irregular sampling grids were defined taking into account an imaginary central line between the elevation contour lines of each field (Fig. 2).

Elevation was determined with a total station (Topcon GPT-7505, Topcon Corporation, Tokyo, Japan), and soil penetration resistance (SPR) was determined with a soil penetrometer (penetroLOG PGL1020, Falker Automação Agrícola, Porto Alegre, Brazil). Soil samples were collected at a depth of 0–0.2 m and sent to the laboratory for analysis. Soybean yield for Field A was determined with a yield monitor (AFS


Fig. 2. Experimental fields: (a) Field A, (b) Field B, and (c) Field C. All the fields had the same soil type, classified as Rhodic Ferralsol (Embrapa, 2006), and were cultivated under a no-tillage system, with the sequence of soybean, wheat, corn, and oats in Field A and the succession of soybean and corn in Fields B and C.

Table 1
Identification of the types of variables and collection periods for each experimental field.

Variables Field A (40 samples) Field B (42 samples) Field C (68 samples)

2012 2013 2014 2012 2013 2014 2010 2011

SPR 0.0–0.1 m (MPa) X X X X X X X
SPR 0.1–0.2 m (MPa) X X X X X X X
SPR 0.2–0.3 m (MPa) X X X X X X X
Elevation (m) X X X
Slope (°) X X X
Density (g cm−3) X X X
Sand (%) X X X
Silt (%) X X X
Clay (%) X X X
OM (%) X X X
Soybean yield (t ha−1) X X X X X X X X
Corn yield (t ha−1) X X

SPR – soil penetration resistance; OM – organic matter.

PRO 600, Case IH, Racine, USA) coupled to a harvester (CASE IH® model 2388, Sorocaba, Brazil). For Fields B and C, yield was determined by hand-harvesting a sample (from an area of approximately 1 m²) at each soil sampling point (42 points in Field B and 68 in Field C). Yield values of all fields were adjusted to 13% water content. To reduce the temporal variability of yield data, which is strongly influenced by the weather and rainfall, and to create a single variable (Jaynes et al., 2003, 2005) for each field, the standard score normalization technique (Eq. (1)) was used (Kitchen et al., 2005; Milani et al., 2006; Suszek et al., 2011).

2.2. Variable selection

Moran's bivariate spatial autocorrelation statistic (Eq. (4)) (Czaplewski and Reich, 1993) was calculated among all the variables by using SDUM (Software for definition management zones, Bazzi et al., 2013). Variables were selected by the procedure proposed by Bazzi et al. (2013): (a) removal of variables with no significant spatial autocorrelation at 95% significance; (b) removal of the variables that were not correlated with yield; (c) ordering of the remaining variables by decreasing degree of correlation with yield; and (d) removal of variables that are correlated with each other, preferentially withdrawing those with lower correlation with yield. The idea is to keep only the variables that are most correlated with yield and to remove those that are less influential, even though they are correlated with yield. Since these management zones are delineated using parameters selected according to their relationship with yield, they are referred to as yield-based management zones or productivity zones.

$$I_{XY} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}\, X_i\, Y_j}{W \sqrt{m_X^2\, m_Y^2}} \tag{4}$$

where $W_{ij}$ is the spatial association matrix, calculated by $W_{ij} = 1/(1 + D_{ij})$; $D_{ij}$ is the distance between points $i$ and $j$; $X_i$ is the value of the transformed variable $X$ at point $i$; $Y_j$ is the value of the transformed variable $Y$ at point $j$; $W$ corresponds to the sum of the degrees of spatial association, obtained from the $W_{ij}$ matrix, for $i \neq j$; $m_X^2$ corresponds to the sample variance of $X$; and $m_Y^2$ corresponds to the sample variance of $Y$. Note that the transformation of a variable $Z$ should be interpreted as the procedure performed on its values so that its average is equal to zero, by applying $Z_k = z_k - \bar{Z}$, where $\bar{Z}$ is the sample average of $Z$.

2.3. Interpolation of the selected variables

In the geostatistical analysis of the selected variables, the spherical, exponential, and Gaussian models were fitted to the experimental semivariogram (Table 4), and the best-fitting model was determined through cross-validation statistics (Sun et al., 2009; Arslan, 2012). The data were then interpolated by ordinary kriging to create a 5 × 5 m grid, seeking a denser set of points per area and therefore smoother MZs.
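The bivariate Moran statistic of Eq. (4) in Section 2.2 can be sketched in a few lines of NumPy. This is an illustrative reading of the formula, not the SDUM implementation: the function name, the toy coordinates and values, the exclusion of the diagonal (i = j) from the double sum, the square root over the product of the variances, and the use of `ddof=1` for the sample variance are all assumptions.

```python
import numpy as np


def bivariate_moran(coords, x, y):
    """Bivariate Moran statistic I_XY with weights W_ij = 1 / (1 + D_ij)."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Pairwise Euclidean distances D_ij between sampling points.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = 1.0 / (1.0 + d)
    np.fill_diagonal(w, 0.0)  # assumption: only pairs with i != j contribute

    # Transformed variables: centered so that their average is zero.
    xc = x - x.mean()
    yc = y - y.mean()

    numerator = xc @ w @ yc
    denominator = w.sum() * np.sqrt(xc.var(ddof=1) * yc.var(ddof=1))
    return numerator / denominator


# Hypothetical sampling points on a unit square and two toy variables.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

i_xy = bivariate_moran(coords, x, y)
i_yx = bivariate_moran(coords, y, x)
```

Because the weight matrix is symmetric, the statistic does not depend on the order of the two variables, and because both variables are centered it is unaffected by adding a constant to either of them.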


Fig. 3. Difference in the calculated Euclidean distance when different units of measurement are used for the input data: clay (%) and elevation, in meters (a) and in kilometers (b).
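The effect summarized in the Fig. 3 caption can be reproduced numerically. The sketch below assigns two points to the nearest of two centroids, first with elevation in meters and then in kilometers, and then checks that range normalization gives the same values in both unit systems. All coordinates are hypothetical and were chosen only to make the switch visible; they are not the points A to E of Fig. 3.

```python
import numpy as np


def nearest_centroid(points, centroids):
    """Index of the closest centroid (Euclidean distance) for each point."""
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def range_norm(data):
    """Column-wise range normalization to [0, 1]."""
    return (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))


# Hypothetical data: columns are clay (%) and elevation (m).
points_m = np.array([[68.0, 452.0], [61.0, 468.0]])
centroids_m = np.array([[60.0, 450.0], [70.0, 470.0]])

# Same data with elevation converted from meters to kilometers.
to_km = np.array([1.0, 1e-3])
labels_m = nearest_centroid(points_m, centroids_m)
labels_km = nearest_centroid(points_m * to_km, centroids_m * to_km)

# After range normalization the unit of elevation no longer matters.
all_m = np.vstack([points_m, centroids_m])
same_after_norm = np.allclose(range_norm(all_m), range_norm(all_m * to_km))
```

With elevation in meters the elevation difference dominates the distance; in kilometers it becomes negligible next to clay content, so both points change to the other centroid even though the data are physically identical.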

2.4. Data normalization methods

Euclidean distance should not be used in clustering methods when the variables are not statistically independent with equal variances (Fridgen et al., 2004), because the distance between each n-dimensional point and the centroid (also n-dimensional) is calculated on the basis of the values of each element in the input data matrix. When these elements have different measurement units and scales (e.g., elevation in meters; SPR in kilopascals or megapascals; and clay, sand, and silt in %), the calculated Euclidean distance can lead to incorrect results because of the different metric of each element (Fig. 3). Fig. 3 shows that if elevation is reported in meters, vector V1, which holds the elements closest to centroid 1, receives points A and E, because these elements are closer to centroid 1; the other elements are closer to centroid 2 and belong to vector V2. When elevation data are input in kilometers, despite the identical distribution of data, the change in units makes points C and D the nearest elements to centroid 1. This demonstrates the importance of applying data normalization methods before inputting the data to the clustering algorithm.

To normalize the selected variables, after interpolation by kriging, we used three methods: standard score, average, and range.

2.5. Delineation of the management zones (MZs)

Considering the widespread acceptance of the fuzzy C-means algorithm (Iliadis et al., 2010; Arno et al., 2011; Valente et al., 2012; Li et al., 2013), it was used to delineate the MZs. This algorithm yields good results (Jipkate and Gohokar, 2012; Mingoti and Lima, 2006), performs zoning automatically and in a non-subjective way (Fridgen et al., 2004), and allows the division of a dataset into C clusters with reference to a center of mass or centroid for each cluster (Fridgen et al., 2004).

Statistically, the Fuzzy C-means technique minimizes the sum of squared errors within each class following some criteria, and the data are grouped iteratively to the nearest class using the minimum distance criterion. The method assumes a dataset $X = \{x_1, x_2, \ldots, x_n\}$, where $x_k$ corresponds to a feature vector $x_k = \{x_{k1}, x_{k2}, \ldots, x_{kp}\} \in R^p$ for each $k \in \{1, 2, \ldots, n\}$, and $R^p$ is the p-dimensional space. The aim is to find a fuzzy pseudo-partition, corresponding to a family of $c$ fuzzy sets of $X$, which best represents the data structure and is denoted by $P = \{A_1, A_2, \ldots, A_c\}$, satisfying $\sum_{i=1}^{c} A_i(x_k) = 1$ and $0 < \sum_{k=1}^{n} A_i(x_k) < n$, where $k \in \{1, 2, \ldots, n\}$ and $n$ represents the number of elements of $X$. The algorithm is controlled by parameters such as the number of groups, the distance between the points and the centroids, the weighting exponent ($m \in (1, \infty)$), and an error used as a stopping criterion ($\varepsilon > 0$) (Bezdek, 1981).

The position of each centroid is calculated considering the distance defined as a parameter. For each of the $c$ clusters, the centers $v_1^{(t)}, \ldots, v_c^{(t)}$ (Eq. (5)) are calculated for the partition $P^{(t)}$, where $t = 1, 2, \ldots$ is the iteration index. The vector $v_i$ corresponds to the center of cluster $A_i$ and is the weighted average of the data in $A_i$. The weight of a datum $x_k$ is the m-th power of its membership degree in the fuzzy set $A_i$.

$$v_i = \frac{\sum_{k=1}^{n} [A_i(x_k)]^m\, x_k}{\sum_{k=1}^{n} [A_i(x_k)]^m} \tag{5}$$

The membership degree of the element $x_k$ in the class $A_i$ (Eq. (6)) is calculated for each $x_k \in X$ and for all $i \in \{1, 2, \ldots, c\}$, if $\|x_k - v_i^{(t)}\|^2 > 0$.

$$A_i^{(t+1)}(x_k) = \left[\sum_{j=1}^{c} \left(\frac{\|x_k - v_i^{(t)}\|^2}{\|x_k - v_j^{(t)}\|^2}\right)^{\frac{1}{m-1}}\right]^{-1} \tag{6}$$

where $\|x_k - v_i^{(t)}\|^2$ represents the distance between $x_k$ and $v_i$.

After all variables were normalized, the MZs were delineated with an error parameter of 0.0001 and a weighting exponent of 1.3 in the Fuzzy C-Means algorithm, creating 2, 3, and 4 zones in each of Fields A, B, and C. MZs were also delineated without applying normalization to the input data, for comparison with the evaluated methods.

2.6. Evaluation of MZs

The performance of the normalization methods in the delineation of MZs was assessed using:

(a) Variance Reduction Index (VR, Eq. (7)) (Dobermann et al., 2003; Xiang et al., 2007). This index was calculated for the normalized average yield variable, with the expectation that the sum of the variances for each MZ will be smaller than the total variance.


$$VR = \left(1 - \frac{\sum_{i=1}^{c} W_i\, V_{mz_i}}{V_{field}}\right) \times 100 \tag{7}$$

where $c$ is the number of MZs; $W_i$ is the proportion of the area in each management zone; $V_{mz_i}$ is the variance of the data from each management zone; and $V_{field}$ is the variance of the sample data for the entire area.

(b) Fuzziness Performance Index (FPI, Eq. (8)) (Fridgen et al., 2004). This index allows for the determination of the degree of separation (i.e., fuzziness) between the fuzzy c-clusters of a dataset X. When the FPI values approach 0, distinct classes are indicated, with only a small degree of sharing among members (data), whereas values close to 1 indicate no distinct classes, with a high degree of sharing among members of classes.

$$FPI = \frac{c}{c-1}\left[1 - \sum_{j=1}^{n}\sum_{i=1}^{c} (u_{ij})^2 / n\right] \tag{8}$$

where $c$ is the number of clusters; $n$ is the number of observations; and $u_{ij}$ is the element of the fuzzy membership matrix.

Table 2
Summary of the descriptive statistics calculated for the original yield data (sampling data) of each considered year and for the normalized yield data.

Field  Year/crop  Mean    Median  Max.    Min.    SD
Original yield (sampling data) (t ha−1)
A      2012/S     3.984   4.067   5.068   2.541   0.472
A      2013/S     3.941   4.012   6.166   2.057   0.518
A      2014/S     4.525   4.553   5.354   3.635   0.281
B      2012/S     5.255   5.255   6.980   2.563   0.749
B      2013/Co    8.945   9.195   10.799  6.678   0.992
B      2013/S     5.079   5.107   6.808   3.366   0.586
B      2014/Co    10.276  10.415  13.342  6.763   1.095
B      2014/S     3.888   3.819   4.759   2.934   0.496
C      2010/S     2.638   2.565   4.340   1.550   0.606
C      2011/S     3.243   3.263   4.644   2.300   0.484
Normalized yield
A      –          0.000   −0.045  1.813   −1.184  0.601
B      –          0.000   −0.017  0.635   −1.538  0.398
C      –          0.000   0.005   1.759   −1.680  0.071

Co: corn; S: soybean; SD: standard deviation.
(c) Modified Partition Entropy Index (MPE, Eq. (9)) (Boydell and Mcbratney, 2002). This index estimates the amount of disorganization created by a specific number of clusters. MPE values close to 1 indicate that disorganization predominates, whereas values approaching 0 indicate better organization.

$$MPE = \frac{-\sum_{j=1}^{n}\sum_{i=1}^{c} u_{ij} \log(u_{ij}) / n}{\log c} \tag{9}$$

where $c$ is the number of clusters; $n$ is the number of observations; and $u_{ij}$ is the element of the fuzzy membership matrix.

(d) Smoothness Index (SI, Eq. (10)) (Gavioli et al., 2016). This index calculates the frequency of class changes in the thematic map in the horizontal, vertical, and diagonal directions. It characterizes the smoothness of the contour curves by pixel. If a hypothetical map consisted of a single uniform area, the smoothness index would be 100% because of the lack of class changes. Likewise, if a map was created with random values, the smoothness index would be near zero.

$$SI = 100 - \left(\left(\frac{\sum_{i=1}^{k} NMH_i}{4 P_H} + \frac{\sum_{j=1}^{k} NMV_j}{4 P_V} + \frac{\sum_{l=1}^{k} NMDD_l}{4 P_{DD}} + \frac{\sum_{m=1}^{k} NMDE_m}{4 P_{DE}}\right) \times 100\right) \tag{10}$$

where $NMH_i$ is the number of changes in line $i$ (horizontal); $NMV_j$ is the number of changes in column $j$ (vertical); $NMDD_l$ is the number of changes in diagonal $l$ (right diagonal, DD); $NMDE_m$ is the number of changes in diagonal $m$ (left diagonal, DE); $k$ is the maximum number of pixels in the line, column, or diagonal; $P_H$ is the possibility of changing pixels horizontally; $P_V$ is the possibility of changing pixels vertically; $P_{DD}$ is the possibility of changes in the right diagonal (DD); and $P_{DE}$ is the possibility of changes in the left diagonal (DE).

(e) Analysis of Variance (ANOVA). The yield values were compared between MZs by using the normalized average yield and performing Tukey's range test to identify whether the delineated sub-regions showed significant differences (significance level of 0.05) in normalized average yield (assuming that there was no spatial dependence within each MZ).

(f) Kappa index (K, Eq. (11)) (Cohen, 1960). The MZs delineated from non-normalized data and from data normalized by the three methods (standard score, range, and average) were compared using the K index. K evaluates the level of agreement, where 0 < K ≤ 0.2 indicates no agreement, 0.2 < K ≤ 0.4 weak agreement, 0.4 < K ≤ 0.6 moderate agreement, 0.6 < K ≤ 0.8 strong agreement, and 0.8 < K ≤ 1 very strong agreement (Landis and Koch, 1977).

$$K = \frac{n \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} (x_{i+}\, x_{+i})}{n^2 - \sum_{i=1}^{r} (x_{i+}\, x_{+i})} \tag{11}$$

where $K$ is the Kappa concordance index; $n$ is the total number of observations (sample points); $r$ is the number of error matrix classes; $x_{ii}$ is the number of observations on the diagonal (line $i$, column $i$); $x_{i+}$ is the total of observations in line $i$; and $x_{+i}$ is the total of observations in column $i$.

Table 3
Scheme for selection and disposal of variables for the generation of management zones.

Variables Field A Field B Field C
2012 2013 2014 2012 2013 2014 2010
SPR 0.0–0.1 m (MPa) ₣ X ₣ X* X* X
SPR 0.1–0.2 m (MPa) X X X X X †
SPR 0.2–0.3 m (MPa) X X* † X X †
Elevation (m) ₣ ₣ ₣
Slope (°) X X*
Density (g cm−3) X X
Sand (%) † X †
Silt (%) X X* X
Clay (%) † X* †
OM (%) X

[X] - Eliminated for not having spatial autocorrelation; [X*] - Eliminated for not having spatial correlation with yield; [†] - Eliminated for being redundant; [₣] - Selected to generate the MZs.
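The indices of Eqs. (7), (8), (9), and (11) can be sketched as short NumPy functions. This is an illustrative reading, not the code used in the study: the function names and toy inputs are hypothetical, the zone weights W_i are taken as the share of points per zone (which assumes equal-area grid cells), variances use `ddof=0`, and 0·log(0) is treated as 0. The FPI is coded so that a crisp partition yields 0, matching the interpretation given in item (b).

```python
import numpy as np


def variance_reduction(values, zones):
    """Eq. (7): percent reduction of the weighted within-zone variance."""
    values = np.asarray(values, dtype=float)
    zones = np.asarray(zones)
    v_field = values.var()  # assumption: population variance (ddof=0)
    within = 0.0
    for z in np.unique(zones):
        in_zone = values[zones == z]
        # Assumption: equal-area cells, so W_i is the share of points in zone i.
        within += (in_zone.size / values.size) * in_zone.var()
    return (1.0 - within / v_field) * 100.0


def fpi(u):
    """Eq. (8) for a membership matrix u of shape (c, n)."""
    u = np.asarray(u, dtype=float)
    c, n = u.shape
    return (c / (c - 1.0)) * (1.0 - (u ** 2).sum() / n)


def mpe(u):
    """Eq. (9) for a membership matrix u of shape (c, n)."""
    u = np.asarray(u, dtype=float)
    c, n = u.shape
    terms = np.zeros_like(u)
    positive = u > 0
    terms[positive] = u[positive] * np.log(u[positive])  # 0*log(0) taken as 0
    return (-terms.sum() / n) / np.log(c)


def kappa(zones_a, zones_b, r):
    """Eq. (11) from the r x r error matrix of two zone maps."""
    x = np.zeros((r, r))
    for a, b in zip(zones_a, zones_b):
        x[a, b] += 1
    n = x.sum()
    chance = (x.sum(axis=1) * x.sum(axis=0)).sum()
    return (n * np.trace(x) - chance) / (n ** 2 - chance)


# Toy inputs (hypothetical): memberships for c = 2 clusters and n = 3 points.
crisp = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
uniform = np.full((2, 3), 0.5)
yield_values = [1.0, 1.0, 3.0, 3.0]
map_a = [0, 0, 1, 1]
map_b = [0, 1, 1, 1]

vr_a = variance_reduction(yield_values, map_a)
k_ab = kappa(map_a, map_b, 2)
```

On these toy inputs a crisp membership matrix gives FPI = MPE = 0 and a uniform one gives FPI = MPE = 1, which are the two extremes described in items (b) and (c).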


Table 4
Geostatistical analysis of the variables selected for the generation of management zones.

Variable               Field  Model        Nugget  Sill   Range
Elevation              A      Spherical    0       54.75  221
SPR 0.0–0.1 m (2013)   A      Exponential  8538    1872   571
Elevation              B      Exponential  12.68   37.49  356
SPR 0.0–0.1 m (2012)   B      Exponential  4523    16544  123
Elevation              C      Exponential  0.19    12.53  450

3. Results and discussion

3.1. Yield analysis and variables selected

Descriptive statistics were calculated for the yield data at the sampling points for each crop considered, and for the normalized yield data (Table 2). The best years for soybean were 2014 (Field A), 2012 (Field B), and 2011 (Field C), and for corn, 2014. For each field, a more representative yield variable was created from the yield data of all evaluated years by using the standard score normalization technique (Eq. (1)) (Kitchen et al., 2005; Milani et al., 2006; Suszek et al., 2011). This variable was used to analyze the cross-correlation among variables and to evaluate the MZs.

By using the spatial correlation matrix, the variables exhibiting spatial autocorrelation were selected: elevation, sand, clay, SPR 0.0–0.1 (2013), and SPR 0.2–0.3 (2014) for Field A; silt, sand, elevation, SPR 0.0–0.1 (2012), SPR 0.0–0.1 (2013), SPR 0.0–0.1 (2014), and SPR 0.2–0.3 (2012) for Field B; and SPR 0.1–0.2, SPR 0.2–0.3, elevation, slope, sand, and clay for Field C. Then, the variables that did not have spatial correlation with yield were eliminated: SPR 0.2–0.3 (2014) for Field A; silt, clay, SPR 0.0–0.1 (2013), and SPR 0.0–0.1 (2014) for Field B; and slope for Field C. Finally, the redundant variables were eliminated: sand and silt for Field A; SPR 0.2–0.3 (2012) for Field B; and SPR 0.1–0.2, SPR 0.2–0.3, silt, and clay for Field C. The variables finally selected for delineating the MZs were elevation for all three fields (Table 3) and SPR 0.0–0.1 m for Fields A and B. The choice of these variables is consistent with other studies (Peralta et al., 2013a), which reported spatial association of elevation and soil physical properties with the yield of soybean and wheat. Good results in the delineation of MZs were also obtained in fields cultivated with soybean and corn, using elevation and soil electrical conductivity data (Jaynes et al., 2005).

3.2. Thematic maps and MZs

Geostatistical analysis of the selected variables (elevation and SPR [0.0–0.1 m] for Fields A and B; elevation for Field C) was used to interpolate the data by kriging. The spherical model had the best fit for the elevation data for Field A, and the exponential model best fit the other datasets (Table 4).

The variables were then normalized (standard score, range, and average) and imported to SDUM (Bazzi et al., 2013) for delineating MZs (Figs. 4–7). For Fields A and B (which used more than one variable), the range method provided better visual delineation of MZs. However, with only one variable (Field C), the type of normalization had no influence. This was expected because the Euclidean distance, which is sensitive to variables with different ranges (Bezdek, 1981), was used in the delineation of MZs.

For Field B, MZs were also delineated using elevation in meters and SPR in kilopascals (Fig. 7) to illustrate the influence of the data unit for non-normalized variables. It was found that when the data were normalized, all methods gave the same results even upon changing the SPR unit from megapascals to kilopascals; this behavior was not observed in the case of non-normalized data (compare Figs. 5 and 7).

3.3. Evaluation of MZs

For Fields A and B, there was a greater variance reduction (VR, Table 5) when the fields were divided into only two MZs and the range method was used (VR = 42.5% and VR = 6.5%, respectively). For Field C, the best results for VR occurred with a division into four MZs (VR = 30.1%), with no distinction between the methods of normalization.

Using ANOVA and Tukey's test, it was possible to verify that the two
Fig. 4. Management zones for Field A, delineated using altitude (m) and SPR 0–0.1 m (MPa) collected in 2013 as input variables, for each number of MZs, considering the original (non-normalized) data (a) and data normalized by the standard score (b), range (c), and average (d) methods.
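Zone maps such as those in Fig. 4 come from the Fuzzy C-Means iteration of Section 2.5 (Eqs. (5) and (6)). The following is a minimal NumPy sketch of that iteration, not the SDUM implementation. The defaults m = 1.3 and ε = 0.0001 follow the values reported in Section 2.5; the function name, the deterministic choice of initial centroids, the clipping of zero distances, the iteration cap, and the toy data are assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(x, c, m=1.3, eps=1e-4, max_iter=300):
    """Return (centroids, memberships) for data x of shape (n, p)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Assumption: initial centroids are data points spread over the array.
    v = x[np.linspace(0, n - 1, c).astype(int)].copy()
    u = np.full((c, n), 1.0 / c)
    for _ in range(max_iter):
        # Squared Euclidean distances between centroids and points, shape (c, n).
        d2 = ((x[None, :, :] - v[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # assumption: avoid division by zero
        # Eq. (6): memberships from distance ratios raised to 1 / (m - 1).
        inv = d2 ** (-1.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0)
        # Eq. (5): centroids as averages weighted by memberships to the power m.
        um = u_new ** m
        v = (um @ x) / um.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < eps
        u = u_new
        if converged:
            break
    return v, u


# Toy data (hypothetical): two well-separated groups of already normalized points.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, memberships = fuzzy_c_means(data, c=2)
zones = memberships.argmax(axis=0)  # hard zone label for each point
```

Each column of `memberships` sums to 1, and taking the cluster with the largest membership for each point yields the hard zone map that would be drawn and compared across normalization methods.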


Fig. 5. Management zones for Field B, delineated using altitude (m) and SPR 0–0.1 m (MPa) collected in 2012 as input variables, for each number of MZs, considering the original (non-normalized) data (a) and data normalized by the standard score (b), range (c), and average (d) methods.

Fig. 6. Management zones for Field C, delineated using altitude (m) as the input variable, for each number of MZs, considering the original (non-normalized) data (a) and data normalized by the standard score (b), range (c), and average (d) methods.

MZs had different average yields in Fields A and C; however, for Field A, this was only when the data were normalized by standard score or range. In Field C, when more than two MZs were delineated, despite the reduction in variance, it was not possible to verify significantly different yields. In Field B, although there were differences in the variance when data were normalized by the standard score and range methods (VR = 2.9% and VR = 6.5%, respectively), average yields were equal.

MPE and FPI showed diverse results for each set of normalized data,


Fig. 7. Management zones for Field B, delineated using altitude (m) and SPR 0–0.1 m (kPa) collected in 2012 as input variables, for each number of MZs, considering the original (non-normalized) data (a) and data normalized by the standard score (b), range (c), and average (d) methods.

indicating that the normalization process had a significant influence on the grouping results (Fig. 8).
For all fields, regardless of the normalization method or non-normalization, the indices showed that
grouping into two MZs was best, except for Field A, where three MZs were indicated as best when
non-normalized data or data normalized by average were used. The normalization method affected the
analyzed indices (Fig. 8; VR, FPI, and MPE) when there was more than one variable, as was already shown
in Table 4. The behavior of FPI and MPE was quite similar, providing the same interpretation (except for
Field B with non-normalized data).

The smoothness index (Fig. 9) decreased as the number of MZs increased, indicating that the smoothness
of the contour curves decreased, which complicates the visual interpretation and site-specific
management of agricultural inputs. Perfect agreement among normalization methods occurred for Field C
(MZs delineated using a single variable), as expected, agreeing with the data illustrated in Fig. 4.
The standard score and range methods behaved similarly in all cases, with slight advantages for range.
Additionally, the average and non-normalized methods performed more poorly than the others.

The Kappa Index (Fig. 10) showed perfect agreement among MZs for Field C (MZs delineated using a single
variable), as expected, agreeing with the results presented in Fig. 6. For Fields A and B (MZs
delineated with more than one variable), the agreement among MZs decreased as the number of zones
increased. When using more than one variable (SPR and elevation) in the delineation of MZs (Fields A
and B), the non-normalized and average methods showed perfect agreement (kappa ≅ 1), indicating that
the average method had less influence. However, the range method showed the greatest influence, because
it had the poorest agreement.

4. Conclusions

The normalization methods influenced the clustering process when using two or more attributes with
different scales (for Fields A and B). This indicates that each method influenced the data differently.
When only one variable (Field C) was used, the maps created with and without normalization were
identical.

The optimal number of zones (two zones) selected using the evaluation indices (MPE, FPI, and VR) and
ANOVA was the same for each applied normalization method. Furthermore, the agreement among MZs created
using different normalization methods varied from none (kappa = 0.1) to perfect (kappa = 1), indicating
that the selection of the normalization method is very important.

The range normalization method yielded the largest variance reduction (VR), indicating that yield data
were better separated among the delineated zones, and the MZs showed less fragmentation (greater
smoothness). The MZs delineated by the average method showed more fragmentation.

Acknowledgments

The authors are grateful to the State University of Western Paraná, the Technological Federal
University of Paraná, the Araucária Foundation (Fundação Araucária), the Coordination for the
Improvement of Higher Education Personnel (CAPES), and the National Council for Scientific and
Technological Development (CNPq) for the support received, and to the agronomists Aldo Tasca and
Agassis Linhares for making the research area available.


Table 5
Evaluation indices calculated for different methods of normalization in each field.

                                            ANOVA and Tukey's test*
Field  No. of zones  Normalization method   MZ 1  MZ 2  MZ 3  MZ 4   VR%    FPI     MPE    SI%

A      2             Non-normalized         a     a     -     -       8.3   0.119   0.022  95.03
A      2             Standard Score         a     b     -     -      15.3   0.124   0.024  97.74
A      2             Range                  a     b     -     -      42.5   0.090   0.018  98.47
A      2             Average                a     a     -     -       8.3   0.121   0.023  95.16
A      3             Non-normalized         a     ac    bc    -       4.8   0.100   0.021  95.07
A      3             Standard Score         a     ac    bc    -      17.4   0.211   0.043  95.75
A      3             Range                  a     a     b     -      25.4   0.141   0.029  97.02
A      3             Average                a     ac    bc    -       4.8   0.104   0.021  95.04
A      4             Non-normalized         a     bd    ad    c      36.7   0.122   0.025  93.31
A      4             Standard Score         a     ac    b     bc     28.6   0.213   0.046  95.02
A      4             Range                  a     a     b     b      32.9   0.194   0.041  95.69
A      4             Average                a     a     bc    ac     36.7   0.123   0.025  93.29

B      2             Non-normalized         a     a     -     -       2.0   0.123   0.024  97.68
B      2             Standard Score         a     a     -     -       2.9   0.171   0.034  95.93
B      2             Range                  a     a     -     -       6.5   0.157   0.032  96.92
B      2             Average                a     a     -     -       0.0   0.087   0.018  93.21
B      3             Non-normalized         a     ac    bc    -       2.0   0.092   0.029  96.76
B      3             Standard Score         ac    a     bc    -       1.0   0.230   0.049  93.95
B      3             Range                  a     ac    bc    -       5.7   0.190   0.041  95.10
B      3             Average                a     a     a     -       2.0   0.150   0.032  82.96
B      4             Non-normalized         a     ac    bc    ac     13.0   0.129   0.028  94.81
B      4             Standard Score         ac    a     bc    ac      5.7   0.225   0.052  94.13
B      4             Range                  a     ac    bc    ac      3.8   0.190   0.045  93.80
B      4             Average                a     a     a     a      -9.9   0.180   0.040  77.87

C      2             Non-normalized         a     b     -     -      27.0   0.0795  0.015  98.07
C      2             Standard Score         a     b     -     -      27.0   0.0795  0.015  98.07
C      2             Range                  a     b     -     -      27.0   0.0795  0.015  98.07
C      2             Average                a     b     -     -      27.0   0.0795  0.015  98.07
C      3             Non-normalized         a     b     b     -      27.5   0.1022  0.020  96.47
C      3             Standard Score         a     b     b     -      27.5   0.1022  0.020  96.47
C      3             Range                  a     b     b     -      27.5   0.1022  0.020  96.47
C      3             Average                a     b     b     -      27.5   0.1022  0.020  96.47
C      4             Non-normalized         a     a     a     b      30.1   0.1180  0.024  94.60
C      4             Standard Score         a     a     a     b      30.1   0.1178  0.024  94.60
C      4             Range                  a     a     a     b      30.1   0.1178  0.024  94.60
C      4             Average                a     a     a     b      30.1   0.1178  0.024  94.60

* Significance level of 0.05. A dash marks a zone that does not exist for that number of zones.
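
The letters in the ANOVA columns of Table 5 come from comparing mean yields among zones. As a rough illustration of the first step only, the sketch below computes the one-way ANOVA F statistic for yield grouped by zone. The function name and the yield values are hypothetical; the significance decision (which needs the F distribution) and Tukey's post hoc grouping are not shown.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Between-group sum of squares: group size times squared offset of the group mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical yields (t/ha) for two zones; not data from the study.
zone_yields = [[2.0, 2.2, 2.1], [3.9, 4.0, 4.1]]
print(f"F = {anova_f(zone_yields):.1f}")
```

A large F suggests that zone means differ more than the within-zone scatter would explain; zones sharing a letter in Table 5 are those whose means the post hoc test could not separate.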

Fig. 8. Variance reduction index (VR), fuzziness performance index (FPI), and modified partition entropy index (MPE) obtained after the clustering process for Fields A, B, and C.
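
The three indices plotted in Fig. 8 can be sketched as follows. The definitions used are assumptions based on forms common in the management zone literature, not equations quoted from this section: FPI = 1 - (cF - 1)/(c - 1), where F is the partition coefficient; MPE as the partition entropy divided by log c; and VR as the percentage reduction of the area-weighted within-zone yield variance relative to the whole-field variance, with sample proportions standing in for area proportions. All names and sample values are hypothetical.

```python
import math
from statistics import pvariance


def fpi(memberships):
    """Fuzziness performance index: 0 for a crisp partition, 1 for a uniform one."""
    n, c = len(memberships), len(memberships[0])
    partition_coeff = sum(u * u for row in memberships for u in row) / n
    return 1.0 - (c * partition_coeff - 1.0) / (c - 1.0)


def mpe(memberships):
    """Modified partition entropy: partition entropy normalized by log(c)."""
    n, c = len(memberships), len(memberships[0])
    # Terms with u == 0 contribute nothing (limit of u*log(u) as u -> 0).
    entropy = -sum(u * math.log(u) for row in memberships for u in row if u > 0) / n
    return entropy / math.log(c)


def variance_reduction(yields, zones):
    """VR (%): 1 minus weighted within-zone variance over total variance."""
    n = len(yields)
    total = pvariance(yields)
    within = 0.0
    for zone in set(zones):
        members = [y for y, z in zip(yields, zones) if z == zone]
        within += (len(members) / n) * pvariance(members)
    return (1.0 - within / total) * 100.0


# Hypothetical memberships for six points and two zones, plus yields (t/ha).
u = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.1, 0.9], [0.2, 0.8], [0.05, 0.95]]
yields = [2.0, 2.2, 2.1, 3.9, 4.0, 4.1]
zones = [row.index(max(row)) for row in u]  # hard label = highest membership

print(f"FPI={fpi(u):.3f}  MPE={mpe(u):.3f}  VR={variance_reduction(yields, zones):.1f}%")
```

Lower FPI and MPE indicate a less fuzzy, better organized partition, while a higher VR indicates zones that separate yield more clearly. Under the population-variance and sample-proportion assumptions used here, VR cannot be negative, so a negative reported value may reflect a different weighting or variance estimator.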


Fig. 9. Smoothness Index calculated for Fields A, B, and C, as a function of normalization methods.

Fig. 10. Agreement between management zones, measured with the Kappa index.
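
The agreement measure in Fig. 10 can be illustrated with a minimal sketch of Cohen's kappa applied to two zone maps sampled at the same locations. The function name and the example maps are hypothetical. The sketch also assumes that zone labels have already been matched between maps, because labels produced by separate clustering runs are arbitrary.

```python
from collections import Counter


def cohen_kappa(map_a, map_b):
    """Cohen's kappa for two equally long sequences of zone labels."""
    n = len(map_a)
    # Observed agreement: share of locations assigned the same zone in both maps.
    observed = sum(a == b for a, b in zip(map_a, map_b)) / n
    # Chance agreement: product of the marginal proportions, summed over labels.
    count_a, count_b = Counter(map_a), Counter(map_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    # Note: undefined (division by zero) if both maps contain a single zone.
    return (observed - expected) / (1.0 - expected)


# Hypothetical zone maps over eight sample points; not data from the study.
not_normalized = [1, 1, 2, 2, 1, 2, 2, 1]
range_method = [1, 1, 2, 2, 2, 2, 1, 1]

print(cohen_kappa(not_normalized, not_normalized))
print(cohen_kappa(not_normalized, range_method))
```

Identical maps give kappa = 1, and values near 0 indicate agreement no better than chance, which matches the interpretation of the none-to-perfect range reported in the text.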

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compag.2017.10.017.
