Fpls 14 1290078
Mr.Bean: a comprehensive statistical and visualization application for modeling agricultural field trials data

Johan Aparicio 1†, Salvador A. Gezan 2, Daniel Ariza-Suarez 1†, Bodo Raatz 1†, Santiago Diaz 1*, Ana Heilman-Morales 3 and Juan Lobaton 1†

1 Bean Program, Crops for Nutrition and Health, Alliance Bioversity-International Center for Tropical Agriculture (CIAT), Cali, Colombia
2 Department of Statistical Genetics, VSN International, Hemel Hempstead, United Kingdom
3 Big Data Pipeline Unit, North Dakota State University AES, Fargo, ND, United States

OPEN ACCESS

EDITED BY
Xueqiang Wang, Zhejiang University, China

REVIEWED BY
Zhen Fan, University of Florida, United States
Waseem Hussain, International Rice Research Institute (IRRI), Philippines

*CORRESPONDENCE
Santiago Diaz, w.s.diaz@cgiar.org

†PRESENT ADDRESS
Johan Aparicio, College of Agricultural & Life Sciences, University of Wisconsin-Madison, WI, United States
Daniel Ariza-Suarez, Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Zurich, Switzerland
Bodo Raatz, Limagrain Vegetable Seed, La Menitré, France
Juan Lobaton, Department of Evolutionary Biology, National Australian University, Canberra, Australia

RECEIVED 06 September 2023
ACCEPTED 01 December 2023
PUBLISHED 03 January 2024

CITATION
Aparicio J, Gezan SA, Ariza-Suarez D, Raatz B, Diaz S, Heilman-Morales A and Lobaton J (2024) Mr.Bean: a comprehensive statistical and visualization application for modeling agricultural field trials data. Front. Plant Sci. 14:1290078. doi: 10.3389/fpls.2023.1290078

COPYRIGHT
© 2024 Aparicio, Gezan, Ariza-Suarez, Raatz, Diaz, Heilman-Morales and Lobaton. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Crop improvement efforts have exploited new methods for modeling spatial trends using the arrangement of the experimental units in the field. These methods have shown improvement in predicting the genetic potential of evaluated genotypes. However, the use of these tools may be limited by the exposure and accessibility to these products. In addition, these new methodologies often require plant scientists to be familiar with the programming environment used to implement them; constraints that limit data analysis efficiency for decision-making. These challenges have led to the development of Mr.Bean, an accessible and user-friendly tool with a comprehensive graphical visualization interface. The application integrates descriptive analysis, measures of dispersion and centralization, linear mixed model fitting, multi-environment trial analysis, factor analytic models, and genomic analysis. All these capabilities are designed to help plant breeders and scientists working with agricultural field trials make informed decisions more quickly. Mr.Bean is available for download at https://github.com/AparicioJohan/MrBeanApp.

KEYWORDS
spatial analysis, experimental designs, multi-environmental analysis, trial, breeding

1 Introduction

The selection of high-yielding and environmentally adapted genotypes in field trials is a fundamental challenge in plant breeding. In these types of trials, multiple genotypes are evaluated to estimate genetic parameters and determine the performance of traits of interest in breeding programs (Mackay et al., 2019). Experimental field design plays a crucial role in plant breeding (Piepho et al., 2022). Two experimental designs are widely used in traditional breeding field trials: (i) randomized complete block design (RCBD) and (ii) incomplete block design (Alvarado et al., 2020).

Field trials are usually designed to account for spatial heterogeneity, traditionally controlled by blocking. Researchers divide replicates into blocks, as in the so-called incomplete block design. However, spatial variation in trials cannot be fully captured, and has been recognized as a major source of experimental error (Yan, 2021). Spatial heterogeneity in the field can be associated with intrinsic biotic factors such as soil microorganisms, pests, diseases, and weeds. Abiotic factors also drive spatial heterogeneity, including the effects of soil fertility, nutrient concentration, presence of toxic elements, water availability, soil structure, and slope, among others. Agronomic management of the trial can also vary within and across sites (Isik et al., 2017). These conditions promote the generation of localized patterns or microenvironments that differ between experimental units in the field, reducing the overall uniformity of the trial (Bernardeli et al., 2021). For this reason, the experimental designs commonly used in plant breeding aim to separate genotypic information from the environmental variability (non-genetic variation). Separation of genotypic and environmental variability can improve selection accuracy in field trials, reducing the experimental error with increasing genetic gain (Cursi et al., 2021).

To model the genotypic and environmental components in a field trial, researchers use linear mixed models (LMM). These approaches contain a mixture of fixed and random effects to estimate and infer the variance components (Veturi et al., 2012). Some of these procedures can incorporate a component to model the spatial variation in breeding trials (Mao et al., 2020). Understanding spatial variation can improve predictions of the genetic potential of the evaluated genotypes. Towards this end, several approaches have been proposed to correct for spatial heterogeneity in the field (Cullis and Gleeson, 1991; Currie and Durban, 2002; Piepho and Williams, 2010; Robbins et al., 2012). There are two major classes of spatial analysis for field trials in plant breeding: (1) using neighboring plots to adjust the mean of the plot of interest, and (2) predicting the plot values by adding a spatial covariate to the mixed model (Zystro et al., 2018). These approaches can be further classified into those that use spatial variance-covariance structures and those using smoothing techniques to model spatial trends (Rodriguez-Alvarez et al., 2018).

One of the great challenges in the data analysis of plant breeding trials is that it requires significant computational resources (Harrison and Caccamo, 2022). The complexity of the data and models can make these analyses difficult. Besides, the analysis of these data often involves multiple steps, including modeling, preprocessing, feature selection, and interpretation of results (Xu et al., 2022). Multiple software tools have been implemented with the aim of solving these problems. However, the implementation of these approaches into end-user tools is limited either by the accessibility of these tools or by the requirements and experience needed to program computer instructions for the models. Intending to help breeders or plant science researchers, this work describes “Mr.Bean”, a free R-Shiny application with a friendly and easy-to-use graphical user interface (GUI). This application simplifies the analysis of large-scale plant breeding experiments by using the power and versatility of LMM with or without spatial correction. This application combines the analytical robustness and speed offered by several R packages such as ASReml-R (Butler et al., 2017), SpATS (Rodriguez-Alvarez et al., 2018), and lme4 (Bates et al., 2015) with the interactive features and visual power offered by Shiny R (Chang et al., 2023) and plotly (Sievert, 2020). The application also provides a graphical workflow for importing data from the Breeding Management System (BMS) and Breedbase through application programming interfaces (APIs), identifying outliers, and fitting field data. Mr.Bean can analyze data from single-location or multi-environmental trials (MET), calculating the best linear unbiased estimator (BLUE), the best linear unbiased predictor (BLUP) (Piepho et al., 2008), and the broad-sense heritabilities. In addition, Mr.Bean offers a module for exploring results from Factor Analytic (FA) MET models using several graphical and multivariate techniques. The application integrates genomic and phenotypic data using the R-package sommer (Covarrubias-Pazaran, 2016). It estimates marker effects, variance components with genomic predictions, marker-based heritability, and genomic breeding values (GEBVs).

This application is a convenient and accurate way to analyze agronomic data, visualize field patterns and select genotypes for breeding programs. Mr.Bean aims to help statisticians, quantitative geneticists, and breeders who want to simplify and automate (or semi-automate) routine analysis to accurately predict the genetic potential of genotypes coming out of plant breeding pipelines. Moreover, Mr.Bean offers an alternative way to analyze field data for end-users with no previous experience in the R programming language.

2 Methods

2.1 Mr.Bean implementation

Mr.Bean (v2.0.8) was developed in R using the package Shiny (Chang et al., 2023), an elegant and powerful web framework for creating R applications. Shiny supports developers with no previous experience using HTML, CSS, or JavaScript. Our developers improved the application’s interactive experience by employing additional extensions like ShinyJS, bs4dash, shinyWidgets, and ShinyBS. Mr.Bean uses a graphical interface designed to work under any web browser or R software as an R-Shiny application, executed in the x86_64-pc-linux-gnu (64-bit) platform. The core component consists of a set of 41 R attached packages, for r-base:4.1.1 or higher. Mr.Bean uses the packages SpATS (Rodriguez-Alvarez et al., 2018), ASReml-R (Butler et al., 2017), and lme4 (Bates et al., 2015) for fitting LMM with or without spatial corrections. The sommer package (Covarrubias-Pazaran, 2016) within Mr.Bean integrates genomic information to estimate genomic best linear unbiased predictions (GBLUPs).
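
As a point of reference for the modules described in the following sections, the linear mixed models fitted by these packages can be written in the usual matrix form. The notation below (X, Z, β, u, G, R) is the conventional textbook one and is an assumption of this sketch; it is not notation taken from Mr.Bean itself.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\epsilon}, \\
\mathbf{u} &\sim N(\mathbf{0}, \mathbf{G}), \qquad
\boldsymbol{\epsilon} \sim N(\mathbf{0}, \mathbf{R})
\end{aligned}
```

Under this formulation, BLUEs are the estimates of the fixed effects β and BLUPs are the predictions of the random effects u. Spatial correction can enter either through additional smooth terms in the model (as with penalized splines) or through the structure assumed for R (as with autoregressive correlations between rows and columns).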

2.2 Running Mr.Bean

Mr.Bean can be installed through the R software console from GitHub (https://github.com/AparicioJohan/MrBeanApp). It can also be installed and run locally by downloading it directly from Docker Hub (https://hub.docker.com/r/johanstevenapa/mrbeanapp). A video tutorial that explains the installation step by step using GitHub or Docker is available at the following link: https://www.youtube.com/watch?v=YubFj5DEQ2s. For users without sufficient processing power, a beta version of the application can be run online from any web browser through the following link: https://beanteam.shinyapps.io/MrBean_BETA/ (Figure 1). The beta version is hosted on a server of the Bioversity-CIAT alliance. Its only disadvantages are that the ASReml, Two-Stage analysis, and GBLUP modules are not available and that a permanent internet connection is required. Mr.Bean follows a logical process through data loading, statistical analysis, model development, and results generation (Figure 2).

FIGURE 1
Mr.Bean web application home page.

FIGURE 2
Flow diagram showing the logical process that Mr.Bean follows to perform several analyses.

2.3 Data upload

The Data module allows users to upload their trial data. This module has several ways to import data from the Upload function. Data can be uploaded from a personal computer or, via an internet connection, through the Breeding API (BrAPI) (https://brapi.org/), BMS, and BreedBase APIs. The application is prepared to receive datasets with a maximum file size of 100 MB, following the tidy format in which every variable has a single column, and every observation a single assigned row (see Wickham, 2014 for a detailed explanation). Users can upload data in several formats, including comma-separated values (csv), tab-separated values (tsv), plain text (txt), and two different Excel formats (“xlsx” or “xls”). These upload capabilities allow users to identify the missing value character for their dataset.

Once the dataset has been uploaded, the module provides a quick view of the information for navigation (sorting, filtering, and pagination). Additionally, users can create subsets of variables for further analysis. The Descriptives section provides the ability to visually compare different qualitative and quantitative variables using box plots and two-dimensional scatter plots. The Distribution section helps visualize the frequency distribution for each individual trait using a histogram plot, with accompanying summary statistics such as mean, standard deviation, quartiles, and kurtosis, among others. For beginner users, a video tutorial on importing data and making plots in this section is available at the following link: https://www.youtube.com/watch?v=IlahWdDOOzU.

2.4 SpATS module

Here, the user can fit an LMM with spatial correction. SpATS is an attractive alternative to classical analyses of field trials, which model spatial variation as correlated noise (Rodriguez-Alvarez et al., 2018). It uses two-dimensional smoothing surfaces with penalized splines to model the spatial trends within the LMM framework. Hence, the implemented SpATS model is

$$y = \mu + \text{gen} + f_{u,v}(\text{col}, \text{row}) + \epsilon$$

where y is the trait of interest, µ is the overall mean, gen is the effect of the genotype, f_{u,v}(col, row) are the row, column, bilinear polynomial, and smoothing spline effects, and ϵ is the effect of experimental error.

The Single-Site function allows the SpATS model to be run for experiments in a single location, evaluating one trait at a time. Users can calibrate the model with the Model Specs function. This function requires the user to specify the response variable, genotypes, and spatial coordinates for the plots, which are
represented in rows and columns. At their discretion, users can select genotype checks for the trial and add additional variables as fixed or random effects, as well as covariates in the LMM. There is a Help button for beginner users that guides them step-by-step through each of the parameters required to run the model. The application generates a table with an estimate of the broad-sense heritability, residual standard deviation, R-squared, and coefficient of variation of the fitted model. Users can perform the Least Significant Difference (LSD) test if the genotype factor is selected as a fixed effect in the model. The application also produces tables and graphs summarizing the model’s variance components, spatial trends of raw data, fitted data, residuals, and genotype BLUPs with their respective histograms. Moreover, users can visualize spatial trends in the trial plots with two- and three-dimensional graphs.

The BLUPs/BLUEs subsection returns the predicted values for each genotype with their respective standard errors, including a histogram showing the distribution. The application also displays an error-bar plot that ranks the genotypic values for the variable of interest. Finally, the Residuals subsection provides tools to identify outlier observations from the analysis of residuals. It uses the assumption that residuals from the model follow a normal distribution with a mean of zero, using a 99% confidence interval to identify outlier data that fall beyond the range of ±3 standard deviations from the mean. The application graphs the outliers in field plots, identifying potential outliers or comparing residuals against other traits or factors. These functions contribute to the data cleaning process (quality assurance/quality check), before the user downloads a clean dataset.

The Site-by-Site function fits models for experiments evaluated in several locations, one trait at a time. This function also has a Model Specs subsection for fitting the model. As with the Single-Site function, the user selects the parameters required to run the model (response variable, genotype, spatial coordinates). The Experiment parameter allows the user to select sites for evaluation. Users can add other optional parameters (components with random or fixed effect, covariates). In addition, users can visualize the genotypes or lines that are shared between sites or experiments.

The Results subsection compares variance components between sites using a bar graph. As with the Single-Site function, the application summarizes spatial trends of raw data, fitted data, residuals, and genotype BLUPs with their respective histograms. The application creates a ranked error bar-plot of genotype BLUPs. Between evaluated experiments, the application generates correlation plots of phenotypic coefficients and their significance. Corresponding model components and summaries of each experiment are reported with the heritability estimated using the following equation:

$$H_g^2 = \frac{ED_g}{m_g - 1}$$

where ED_g is the effective dimension for genetic effects, and m_g is the number of genotypes (Rodriguez-Alvarez et al., 2018). As with other parts of this application, users can identify outliers and download clean datasets.

The Trait-by-Trait section has only one subsection, Model Specs. Users can run the model and observe the results for experiments evaluated at a single site, fitting multiple traits one at a time. This module was designed to compare the quantitative response of different variables. In plant breeding experiments, it is common to compare the behavior of one or more traits in one or more trials. This part of the application generates the same results described in the previous sections: spatial plots for each trait,
summaries, model components, heritability, genotype ranking, outlier identification, etc. It also shows the genetic correlation between traits, offering a graphical display of Pearson’s product-moment correlation coefficients, a dendrogram plot, and a Principal Component Analysis (PCA) for the traits and genotypes evaluated in the trial.

For beginner users, a video tutorial about Single-Site, Site-by-Site, and Trait-by-Trait analysis in this module is available at the following link: https://www.youtube.com/watch?v=QU_2O2ycZWA&t=303s.

2.5 ASReml-R module

Licensed researchers can use the ASReml-R and Two-Stage-Analysis modules. ASReml-R is a statistical software package for fitting linear mixed models using residual maximum likelihood (REML), as reported by Gilmour et al. (1995). For spatial analyses, the application models the natural variation in the data as the product of autoregressive (AR) correlation structures for columns and rows, denoted by AR1xAR1. ASReml-R is designed to fit the general LMM to moderately large datasets with complex variance models. The package has applications in the analysis of repeated measures data from multivariate analysis of variance and spline-type models, unbalanced design experiments, multi-environment trials, and regular or irregular spatial data (Butler et al., 2017). Many of these features are implemented in Mr.Bean.

Similar to the SpATS section, users can run the model for experiments in a single site using the ASReml-R function. Using the same interface as in previously described modules, the user selects the parameters of the response variable, genotype, and spatial coordinates with Model Specs. Optionally, users can include spatial coordinates (rows and columns) as splines or factors, and other covariates. The application generates spatial trend plots for raw data, fitted data, residuals, environmental variables, and genotype. It also generates a table with goodness-of-fit statistics, such as Akaike information criterion (AIC), Bayesian information criterion (BIC), heritability based on variance components (herit.VC), and heritability based on prediction error variance (herit.PEV), in addition to other statistics. Furthermore, the application generates a summary with the variance components, an ANOVA Wald test, and a 3D empirical variogram for the spatial trend of the residuals. In a BLUPs/BLUEs subsection, the ASReml-R module generates a table with predicted values and their respective standard errors and weights, a histogram of predicted values, and a ranking of genotypes using error bar plots.

In breeding trials, field experiments often test hundreds of genotypes with few or poor replications, mainly in the early stages of genotype screening. In these cases, checks are used to detect trends and allow the calculation of the residual variance. These trials using local controls assume that checks should have a similar response to the tested genotypes. Typically, augmented designs are the base for unreplicated trials, and their statistical analysis can be based on RCBD or on other spatial configurations (Gezan, 2023). For this reason, the ASReml-R module also allows fitting models for single-site unreplicated trials. The Unreplicated section presents a similar architecture to the Single-Site section in terms of the input parameters and the output results (spatial plots, residuals information, variance components, BLUPs, etc.). This section generates a table with goodness-of-fit statistics (AIC, BIC, herit.PEV, herit.VC, A optimality, D optimality) to select the best spatial model by comparing the AR structure for columns, the AR structure for rows, or the AR structure for both spatial coordinates simultaneously.

The ASReml-R module can find the best spatial model for the data to be analyzed (Model Selector section). Similar to the other parts of this application, the user selects the available parameters. Mr.Bean then generates goodness-of-fit statistics. This section tests all the possible parameters for a model and then internally compares all the models to select the one with the best fit. Models are compared by block, complete blocks, splines, rows and columns, and the residual variance structures.

2.6 Two-stage analysis module

The MET Analysis function fits LMMs for multi-environmental trials using ASReml-R. This module has its own import data section, in a csv format, and it is independent from the other modules. Similar to the other modules, the user selects the parameters in the Model Specs subsection, providing the response variables, genotypes, and experiments, which are the different trials to be analyzed. The user can analyze all trials of the dataset or select which trials to evaluate with the subset option. Additionally, there is an option allowing users to include weights in the two-stage analysis. These weights can be calculated by using the standard errors of the BLUEs, or by using the diagonal elements of the inverse of the variance-covariance matrix associated with the genotype effect (Smith et al., 2001). In the option Covariance structure, the user can select the type of covariance structure to fit the model in the MET analysis. The covariance structures offered by Mr.Bean are diagonal (diag), uniform correlation (corv), uniform heterogeneous (corh), factor analytic 1 (FA1), factor analytic 2 (FA2), factor analytic 3 (FA3), factor analytic 4 (FA4), and US covariance matrix defined with correlations (corgh). The user can assess the data before running the model, by observing a barplot with the number of genotypes per trial, a heatmap for the shared genotypes between locations, and a barplot for means with standard errors for the selected trait.

The Results section shows a correlation matrix and dendrogram between the trials evaluated, together with a covariance matrix for trials. Similar to the outputs of the previous modules, the application generates variance components, a summary of the model, residuals analysis, BLUPs for each genotype in each location, and a PCA biplot for the trials and genotypes (GxE option). Moreover, the section has a tool for comparing the model with different covariance structures using the likelihood ratio test (LR-statistic). When a factor analytic structure has been selected as the covariance matrix to fit the model, the Factor analytic section will be enabled. This section displays a bar chart for each factor selected, genotypic variance, and variance explained for each location. In addition, the latent regression can be reviewed for each of the genotypes in each of the selected factors. A dot plot with scores by genotype and a dot plot for loadings by environment are produced for each component selected.
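
To make the "variance explained for each location" output more concrete, the following minimal sketch uses the standard factor analytic parameterization, in which the genetic variance of an environment is the sum of its squared loadings plus a specific variance. It is written in Python for illustration only; the function name, the input layout, and the numbers are hypothetical, and the source does not confirm that Mr.Bean computes the quantity in exactly this way.

```python
from typing import Dict, List, Sequence


def fa_variance_explained(
    loadings: Sequence[Sequence[float]],
    specific_variances: Sequence[float],
) -> List[Dict[str, float]]:
    """Per-environment genetic variance and percentage explained by the factors.

    Assumes the usual FA(k) form G = Lambda Lambda' + Psi, where each row of
    `loadings` holds the k loadings of one environment and
    `specific_variances` holds the diagonal of Psi.
    """
    if len(loadings) != len(specific_variances):
        raise ValueError("one specific variance is needed per environment")

    results = []
    for env_loadings, psi in zip(loadings, specific_variances):
        # Variance captured by the common factors in this environment.
        common = sum(value * value for value in env_loadings)
        total = common + psi
        results.append(
            {
                "genetic_variance": total,
                "pct_explained": 100.0 * common / total,
            }
        )
    return results


# Hypothetical FA2 fit for two trials (values invented for illustration).
example = fa_variance_explained(
    loadings=[[3.0, 1.0], [2.0, 0.0]],
    specific_variances=[2.5, 4.0],
)
for trial, row in zip(["trial_1", "trial_2"], example):
    print(trial, row["genetic_variance"], round(row["pct_explained"], 1))
```

In this invented example the two factors account for 80% of the genetic variance in the first trial and 50% in the second, which is the kind of contrast the bar charts in the Factor analytic section are meant to expose.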

2.7 Traditional designs module

Mr.Bean’s Traditional Designs module addresses the common lack of information about the spatial arrangement of field plots in trials. The module uses the R package lme4 (Bates et al., 2015) to fit an LMM without spatial correction. The user must first select the response variable and genotype, before selecting the experimental design. Several traditional experimental designs for plant breeding have been implemented in Mr.Bean, such as completely randomized designs (CRD), RCBD, row-column design and alpha-lattice design. Mr.Bean provides these models to analyze data from these designs:

$$y_{ij} = \mu + \text{gen}_i + \epsilon_{ij}$$ for CRD
$$y_{ijk} = \mu + \text{gen}_i + \text{rep}_j + \epsilon_{ijk}$$ for RCBD
$$y_{ijk} = \mu + \text{gen}_i + \text{rep}_j + \text{row}_k(\text{rep})_j + \text{col}_k(\text{rep})_j + \epsilon_{ijk}$$ for row-column design
$$y_{ijk} = \mu + \text{gen}_i + \text{rep}_j + \text{block}_k(\text{rep})_j + \epsilon_{ijk}$$ for alpha-lattice design

where y is the trait of interest, µ is the overall mean, gen is the effect of the genotype, block is the effect of the block, rep is the effect of the replication, col and row are the effects of the spatial location and ϵ is the effect of the experimental error. Mr.Bean also offers the ability to specify any other model formula using the lme4 syntax, which is similar to the regular mathematical notation for specifying linear models (Bates et al., 2015).

Like the SpATS module, the application provides the significance of the fixed effects in the model using the F statistic, and reports variance components, likelihood-ratio test information, and the broad-sense heritability estimate (Cullis et al., 2006), together with some regularly used information for comparing different fitted models, such as AIC and BIC. The user can also make multiple comparisons when the genotype is taken as a fixed factor. Likewise, as in previously described modules, this module provides an analysis of residuals using a QQ plot, a histogram, an analysis of outliers, as well as a list of ranked genotypes.

2.8 GBLUP module

The last module implemented in Mr.Bean is the GBLUP module. The app allows users to integrate genomic and phenotypic data with the aim of performing genomic prediction analysis using the R-package sommer (Covarrubias-Pazaran, 2016). In the Genomic Prediction section, the user only needs to import the phenotypic data and the genotypic data. The marker genotype data must be in numerical format (-1, 0, 1); importing a genetic map with the physical positions of the markers is also possible. In the same section, users only have to select the phenotypic variables they want to analyze and the model can be executed. The current method available for this kind of analysis is GBLUP.

Mr.Bean estimates the variance components with genomic predictions, marker-based heritability, and GEBVs for each trait evaluated. Accuracy and reliability data, correlation plots between predicted and observed GBLUP values, and the estimated squared marker effect at each physical position, similar to the plots used in genome-wide association studies (GWAS), can also be viewed. Finally, in the Results section, the app shows the predictions plot with the fitted and predicted values for each genotype.

2.9 Testing dataset

The dataset comes from a breeding population (Vivero Equipo Frijol or VEF population) of common bean (Phaseolus vulgaris L.) developed by the Andean bean breeding program of the Alliance Bioversity-CIAT (Keller et al., 2020). For the single-site analysis, a subset of 260 genotypes of the VEF population was planted in 2022 at the Alliance Bioversity-CIAT’s Palmira experimental field station (Colombia, 1,000 m a.s.l. altitude, latitude 3°32′N and longitude 76°18′W), under drought and irrigation.

For multi-environmental trial analysis, a historical dataset of 1,142 genotypes was planted at the Palmira experiment station, and at two additional sites: Darien, Colombia, with an altitude of 1,600 m a.s.l. (latitude 3°55′N and longitude 76°29′W) and Quilichao, Colombia, with an altitude of 1,000 m a.s.l. (latitude 3°1′N and longitude 76°28′W) over a period of seven years (2013, 2014, 2015, 2016, 2017, 2018, and 2019). For Darien, the trials were planted under three levels of phosphorus concentration (high phosphorus, medium phosphorus, and low phosphorus) with optimal precipitation conditions (590 mm) for these trials. For Palmira, the trials were planted under drought and irrigated conditions. In Quilichao, the trials were planted under drought conditions. In total, 13 different trials were conducted (Supplementary Table 1).

The experimental units were row plots of 2.22 m² laid out for each replicate of each genotype. The experimental design was an alpha-lattice with two and three replicates. Four traits were evaluated and reported in both datasets. The number of days to flowering (DF) was measured from the planting day to when 50% of the plants in the plot had at least one open flower. Days to physiological maturity (DPM) was measured as the number of days from planting until 50% of plants had at least one pod that had lost its green pigmentation. Yield (YDHA, kg ha⁻¹) was determined for each plot and corrected for seed moisture of 14%. Seed weight (SW100, g 100 seeds⁻¹) was obtained from 100 seeds (Diaz et al., 2020).

3 Results

3.1 Single site analysis

Mr.Bean enabled the analysis of the phenotypic distribution of SW100, DPM, DF, and YDHA for 260 lines belonging to the VEF panel dataset, evaluated in Palmira under drought and irrigated conditions in 2022 (Figure 3; Table 1). Water availability conditions (drought and irrigated) affected SW100 and YDHA, two traits that also showed the highest coefficients of variation, 0.24 and 0.14 for drought and 0.29 and 0.13 for irrigation, respectively. The
FIGURE 3
Phenotypic distribution of 100 seed weight (SW100), days to physiological maturity (DPM), days to flowering (DF) and yield (YDHA) of 260 lines
belonging to VEF evaluated in drought (red plot) and irrigation (blue) conditions in 2022. (Figure generated directly by Mr.Bean).
phenotypic correlation between the traits for the two conditions is and SW100 (-0.35 – 0.5). YDHA was negatively correlated with DF
shown in the correlation plot (Figure 4). In both conditions, a strong and DPM under drought conditions. However, under irrigated
positive correlation was observed between DF and DPM (0.68 – 0.7). conditions the correlation was positive. Mr.Bean generates a
On the other hand, a negative correlation was observed between DF clustering dendrogram from the correlation matrix and a PCA
TABLE 1 Summary statistics for phenotypic response of 100 seed weight (SW100), days to physiological maturity (DPM), days to flowering (DF) and
yield (YDHA) of 260 lines belonging to VEF evaluated in drought and irrigation conditions in 2022.
Std. Dev 182.21 308.87 1.38 1.7 2.34 2.54 6.36 7.07
FIGURE 4
Pearson’s second moment correlation coefficients and their significances between best linear unbiased estimators (BLUEs) of evaluated traits. The
broad-sense heritabilities of the best linear unbiased predictors (BLUPs) are located within the diagonal with the gray background. 100 seed weight
(SW100), days to physiological maturity (DPM), days to flowering (DF), and yield (YDHA) of 260 lines belonging to VEF evaluated in drought (left side)
and irrigation (right side) conditions in 2022. (Figure generated directly by Mr.Bean) Significance of correlations indicated as ***: p < .0001; **:
p < .001; ns, not significant.
biplot graph for the first two principal components of the distance matrix (Figure 5). The biplot shows the correlation between DF and
DPM in both trial conditions (Figures 5A, B). Figure 5 also shows the differences in the performance of the Mesoamerican genotype checks
compared to the Andean genotypes.
Model fitting was performed with SpATS (Rodriguez-Alvarez et al., 2018), ASReml-R (Butler et al., 2017), and lme4 under a
row-column design (Bates et al., 2015), considering the genotype effect as random. The heritability and variance
components were then calculated. Next, the application calculated the spatial trends for raw data, fitted data, residuals,
fitted spatial trend, and genotypic BLUPs for YDHA, using SpATS and ASReml-R models under drought and irrigated conditions
(Figure 6 and Table 2).
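The step from variance components to heritability can be illustrated with a minimal sketch. It assumes the textbook entry-mean form of broad-sense heritability, H2 = varG / (varG + varE / r); the text does not state which estimator Mr.Bean applies, and spatial models often use a generalized alternative instead. The function name, the replicate count `n_reps`, and the numeric inputs are hypothetical and are not taken from Table 2.

```python
def broad_sense_heritability(var_g: float, var_e: float, n_reps: int = 1) -> float:
    """Entry-mean broad-sense heritability: var_g / (var_g + var_e / n_reps).

    var_g  : genotypic variance component
    var_e  : residual (error) variance component
    n_reps : number of replicates per genotype (hypothetical parameter)
    """
    if var_g < 0 or var_e < 0:
        raise ValueError("variance components must be non-negative")
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    denominator = var_g + var_e / n_reps
    if denominator == 0:
        raise ValueError("total variance is zero; heritability is undefined")
    return var_g / denominator


# Hypothetical variance components, for illustration only.
h2 = broad_sense_heritability(var_g=400.0, var_e=600.0, n_reps=3)
print(f"H2 = {h2:.3f}")
```

With `n_reps=1` the function reduces to the plot-level ratio of genotypic to total variance, so the same helper covers unreplicated layouts under the stated assumption.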
FIGURE 5
Biplot of principal components analysis (top side) and dendrograms (bottom side) of the phenotypic correlation for 100 seed weight (SW100), days
to physiological maturity (DPM), days to flowering (DF) and yield (YDHA) of 260 lines (Black points) belonging to VEF population evaluated in:
(A) drought (left side) and (B) irrigation (right side) conditions in 2022. (Figure generated directly by Mr.Bean).
FIGURE 6
Spatial trends plots for raw data, fitted data, residuals, fitted spatial trend, and genotypic BLUPs for YDHA of 260 lines belonging to VEF population
evaluated in drought (top side) and irrigation (bottom side) conditions in 2022. The models used for generating the spatial trends were SpATS (left
side) and ASReml-R (right side) (Figure generated directly by Mr.Bean).
TABLE 2 Heritability and variance components for yield (YDHA), using SpATS (Rodriguez-Alvarez et al., 2018), ASReml-R (Butler et al., 2017), and row-
columns design with lme4 (Bates et al., 2015), of 260 lines belonging to the VEF panel dataset, evaluated under drought and irrigated conditions
in 2022.
col:f(row) 0 0.001 0 0
row:col!R 1 1
FIGURE 7
Correlation plot (A) and dendrogram (B) for yield (YDHA) of VEF population evaluated in 13 trials. (Figure generated directly by Mr.Bean). Significance
of correlations indicated as ***: p < .0001; **: p < .001; *: p < .01; ns, not significant.
TABLE 3 Variance components for yield (YDHA), using SpATS (Rodriguez-Alvarez et al., 2018), and ASReml-R with one analytic factor as covariance
matrix (Butler et al., 2017) of VEF population evaluated in 13 trials.
Experiment    SpATS varG    SpATS varE    ASReml-R varG    ASReml-R PVE (%)
Dar16C_hiP    53372.31      53845.07      52765.65         24.4
the open-source application with individual modifications to meet their needs and requirements. Mr.Bean’s individual modules are
easy to understand and accessible to novice users. The workflow starts with downloading, cleaning, processing, and filtering the raw
data for further analysis. The modules can be used for different analyses depending on the nature and purpose of the trials being
evaluated. Users can generate graphs and tables with detailed information for future interpretation. Mr.Bean includes several
visual tools such as real-time interactive statistical graphs developed in the R Shiny package. These tools support
understanding and analyzing the behavior of the raw or processed data.
Mr.Bean models spatial variability, one of the major sources of error in field trials (Singh et al., 2003). The application uses linear
mixed models with spatial components of field experiments implemented with the SpATS and ASReml-R packages. The
application also accommodates traditional experimental designs lacking spatial information, such as randomized complete block
designs or alpha-lattice designs, and separates genotypic variance from environmental variance. Ultimately, Mr.Bean facilitates data
analysis towards improving genetic gain and making breeding programs more efficient (Covarrubias-Pazaran, 2020).
With single-site and multi-environment trial analysis, Mr.Bean enables breeders to make better use of their data and more robust
decisions about genotype performance by calculating BLUEs and BLUPs for every trait and every location, within and across sites. The
application estimates the selection response and provides breeders with critical tools to select the best performing genotypes. In
addition, Mr.Bean can adjust any variable as a covariate to estimate its effect on the trial. The application supports multi-trait
and genetic correlation analysis, enabling the development of a selection index for implementation in breeding programs.
Supplementary materials and educational videos can be found on GitHub (https://github.com/AparicioJohan/MrBeanApp) and
YouTube (https://www.youtube.com/@ndsubigdatapipelineunit5201/).

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be
directed to the corresponding author.

Author contributions
JA: Conceptualization, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing – original draft. SG:
Formal analysis, Methodology, Software, Writing – review & editing. DA-S: Conceptualization, Investigation, Software, Writing
– review & editing. BR: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing – review &
editing. SD: Data curation, Formal analysis, Supervision, Writing – original draft, Writing – review & editing. AH-M: Investigation,
Methodology, Software, Validation, Writing – review & editing. JL: Funding acquisition, Project administration, Resources,
Supervision, Writing – original draft, Writing – review & editing.

Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work
was funded by the Tropical Legumes III-Improving Livelihoods for
Smallholder Farmers: Enhanced Grain Legume Productivity and Production in Sub-Saharan Africa and South Asia (OPP1114827),
and by the AVISA-Accelerated varietal improvement and seed delivery of legumes and cereals in Africa (OPP1198373) projects
funded by the Bill and Melinda Gates Foundation. We would like to thank USAID for their contributions through the CGIAR
Research Program on Grain Legumes and Dryland Cereals.

Acknowledgments
We would like to thank the Bean team of the Alliance Bioversity-CIAT for their great support. We also want to thank
the AES Big Data Pipeline Unit of the North Dakota State University for their constant help and knowledge in the
construction of this application, and we thank VSN International for being part of this project.

Conflict of interest
employed by AES Big Data Pipeline Unit.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that
could be construed as a potential conflict of interest.

Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated
organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or
claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material
The Supplementary Material for this article can be found online at:
https://www.frontiersin.org/articles/10.3389/fpls.2023.1290078/full#supplementary-material

SUPPLEMENTARY TABLE 1
Combination of location, year and conditions which established VEF population in each trial.

SUPPLEMENTARY FIGURE 1
Phenotypic distribution of 100 seed weight (SW100), days to physiological maturity (DPM), days to flowering (DF) and yield (YDHA)
of VEF population evaluated in 13 trials. (Figure generated directly by Mr.Bean).

SUPPLEMENTARY FIGURE 2
Biplot of the first two principal components of the correlation for yield (YDHA) of 1,146 lines (Black points) belonging to the
VEF population, evaluated in 13 trials (blue arrows) (Figure generated directly by Mr.Bean).

SUPPLEMENTARY FIGURE 3
Scores of 1,146 lines belonging to VEF population (a) and loading factor of 13 trials by Factor analytic (b) (Figure generated
directly by Mr.Bean). The size and color of each individual point correspond to BLUE values for each environment or genotype.
Large, dark blue points correspond to environments or genotypes with higher BLUE values, and small, yellow points correspond to
environments or genotypes with lower BLUE values.

References
Alvarado, G., Rodriguez, F. M., Pacheco, A., Burgueño, J., Crossa, J., Vargas, M., et al. (2020). META-R: A software to analyze data from multi-environment plant breeding trials. Crop J. 8, 745–765. doi: 10.1016/j.cj.2020.03.010
Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. J. Stat. Software 67 (1), 1–48. doi: 10.18637/jss.v067.i01
Bernardeli, A., Rocha, J. R., Borem, A., Lorenzoni, R., Aguiar, R., Basilio, J. N., et al. (2021). Modeling spatial trends and enhancing genetic selection: An approach to soybean seed composition breeding. Crop Sci. 61, 976–988. doi: 10.1002/csc2.20364
Butler, D. G., Cullis, B. R., Gilmour, A. R., Gogel, B. G., and Thompson, R. (2017). ASReml-R reference manual version 4 (Hemel Hempstead, HP1 1ES, UK: VSN International Ltd).
Chang, W., Cheng, J., Allaire, J., Sievert, C., Schloerke, B., Xie, Y., et al. (2023). shiny: Web application framework for R. R package version 1.8.0.9000. Available at: https://github.com/rstudio/shiny, https://shiny.posit.co/.
Covarrubias-Pazaran, G. (2016). Genome-assisted prediction of quantitative traits using R package sommer. PloS One 11 (6), e0156744. doi: 10.1371/journal.pone.0156744
Covarrubias-Pazaran, G. (2020). Manual Breeding process assessment: Genetic gain as a high-level key performance indicator. In: Excellence in breeding platform. Excellenceinbreeding.org/toolbox/tools/eib-breeding-schemeoptimization-manuals (Accessed March 10, 2023).
Cullis, B. R., and Gleeson, A. C. (1991). Spatial analysis of field experiments-an extension to two dimensions. Biometrics 47, 1449–1460. doi: 10.2307/2532398
Cullis, B. R., Smith, A. B., and Coombes, N. E. (2006). On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat 11 (4), 381–393. doi: 10.1198/108571106x154443
Currie, I. D., and Durban, M. (2002). Flexible smoothing with P-splines: a unified approach. Stat. Modeling 2 (4), 333–349. doi: 10.1191/1471082x02st039ob
Cursi, D. E., Gazaffi, R., Hoffmann, H. P., Brasco, T. L., do Amaral, L. R., and Neto, D. D. (2021). Novel tools for adjusting spatial variability in the early sugarcane breeding stage. Front. Plant Sci. 12. doi: 10.3389/fpls.2021.749533
Diaz, S., Ariza-Suarez, D., Izquierdo, P., Lobaton, J. D., de la Hoz, J. F., Acevedo, F., et al. (2020). Genetic mapping of agronomic traits in a MAGIC population of common bean (Phaseolus vulgaris L.) under drought conditions. BMC Genomics 21, 799. doi: 10.1186/s12864-020-07213-6
Gezan, S. A. (2023). Unreplicated trials: What can they really do? Part 1. Available at: https://vsni.co.uk/blogs/unreplicated-trials-part-1 (Accessed March 15, 2023).
Gilmour, A. R., Thompson, R., and Cullis, B. R. (1995). Average information REML, an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51, 1440–1450. doi: 10.2307/2533274
Harrison, R. J., and Caccamo, M. (2022). "Managing data in breeding, selection and in practice: A hundred year problem that requires a rapid solution," in Towards responsible plant data linkage: Data challenges for agricultural research and development. Eds. H. F. Williamson and S. Leonelli (Springer, Cham), 37–64. doi: 10.1007/978-3-031-13276-6_3
Isik, F., Holland, J., and Maltecca, C. (2017). "Spatial analysis," in Genetic data analysis for plant and animal breeding (Springer, Cham). doi: 10.1007/978-3-319-55177-7
Keller, B., Ariza-Suarez, D., de la Hoz, J., Aparicio, J. S., Portilla-Benavides, A. E., Buendia, H. F., et al. (2020). Genomic prediction of agronomic traits in common bean (Phaseolus vulgaris L.) under environmental stress. Front. Plant Sci. 11. doi: 10.3389/fpls.2020.01001
Mackay, I., Piepho, H.-P., and Franco, A. A. F. (2019). "Statistical methods for plant breeding," in Handbook of statistical genomics. Eds. D. Balding, I. Moltke and J. Marioni. doi: 10.1002/9781119487845.ch17
Mao, X., Dutta, S., Wong, R. K., and Nettleton, D. (2020). Adjusting for spatial effects in genomic prediction. J. Agric. Biol. Environ. Stat 25, 699–718. doi: 10.1007/s13253-020-00396-1
Piepho, H., Boer, M. P., and Williams, E. R. (2022). Two-dimensional P-splines smoothing for spatial analysis of plant breeding trials. Biometrical J. 64, 5. doi: 10.1002/bimj.202100212
Piepho, H., Mohring, J., Melchinger, A., and Buchse, A. (2008). BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 161, 209–228. doi: 10.1007/s10681-007-9449-8
Piepho, H., and Williams, E. (2010). Linear variance models for plant breeding trials. Plant Breed. 129 (1), 1–8. doi: 10.1111/j.1439-0523.2009.01654.x
Robbins, K., Backlund, J., and Schnelle, K. (2012). Spatial corrections of unreplicated trials using a two-dimensional spline. Crop Sci. 52 (3), 1138–1144. doi: 10.2135/cropsci2011.08.0417
Rodriguez-Alvarez, M. X., Boer, M. P., van Eeuwijk, F. A., and Eilers, P. H. (2018). Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Stat 23, 52–71. doi: 10.1016/j.spasta.2017.10.003
Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. (Chapman and Hall/CRC). Available at: https://plotly-r.com.
Singh, M., Malhotra, R. S., Ceccarelli, S., Sarker, A., Grando, S., and Erskine, W. (2003). Spatial variability models to improve dryland field trials. Exp. Agric. 39, 151–160. doi: 10.1017/S0014479702001175
Smith, A., Cullis, B., and Gilmour, A. (2001). Applications: the analysis of crop variety evaluation data in Australia. Aust. New Z. J. Stat 43 (2), 129–145. doi: 10.1111/1467-842X.00163
Veturi, Y., Kump, K., Walsh, E., Ott, O., Poland, J., Kolkman, J. M., et al. (2012). Multivariate mixed linear model analysis of longitudinal data: an information-rich statistical technique for analyzing plant disease resistance. Analytical Theor. Plant Pathol. 102 (11), 1016–1025. doi: 10.1094/PHYTO-10-11-0268
Wickham, H. (2014). Tidy data. J. Stat. Software 59 (10), 1–23. doi: 10.18637/jss.v059.i10
Xu, Y., Zhang, X., Li, H., Zheng, H., Zhang, J., and Olsen, M. (2022). Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. Mol. Plant 15 (1), 1664–1695. doi: 10.1016/j.molp.2022.09.001
Yan, W. (2021). A systematic narration of some key concepts and procedures in plant breeding. Front. Plant Sci. 12. doi: 10.3389/fpls.2021.724517
Zystro, J., Colley, M., and Dawson, J. (2018). "Alternative experimental designs for plant breeding," in Plant breeding reviews. Ed. I. Goldman. doi: 10.1002/9781119521358.ch3