v29 Multi-Site (Genotype by Environment) Analysis
This tutorial describes a genotype by environment (GxE) analysis for a five-location field trial (SimTrial). It builds on the adjusted means (BLUEs) and summary statistics calculated for the individual locations in the previous tutorial, Single Site Analysis: 5 Location Batch.
Select Data from the Database
Open Multi-Site Analysis from the STUDIES menu. Select Browse.
Select SimTrial under the 2024 folder to use the BLUEs and summary statistics uploaded to the BMS after the single site analysis. You need at least three sites for a GxE analysis and four or more is better.
Define environments and groups.
Environments: LOCATION_NAME is often used, but this example uses TRIAL-INSTANCE because it makes the interpretation simpler.
Genotype: usually DESIGNATION
Environment Grouping Factor: None
The form asks whether your environments are already grouped in a way that would account for significant GxE interactions. At an early stage the groupings are usually unknown, and there are rarely enough environments to form subsets, so leave this set to None.
Traits with means available from all trial locations are selected by default. Traits that are not observed or could not be fitted with a mixed model in multiple environments in the single-site analysis are not selected for Multi-Site analysis.
Select Next.
Generate BV Input Files from BMS
Review the five environments and traits to include in the multi-site analysis. This allows you to eliminate environments for which there is insufficient data or for which the heritability is too low, and similarly at the bottom you can eliminate traits.
Select Download Input Files. This downloads a zip file containing three files, one of which is the XML means file that will be the input to Breeding View. Unzip the downloaded file.
Load Project & Data
As before, leave the BMS and open Breeding View (BV) on your desktop. In BV, open the project by browsing to the XML means file. BV loads the project, this time showing the multi-site analysis flow diagram.
Run Analysis
When a project has been created or opened, a visual representation of the analytical pipeline is displayed in the Analysis Pipeline tab. The analysis pipeline includes a set of connected nodes, which can be used to run and configure pipelines.
Node Descriptions:
Quality Control Phenotypes: Summary statistics within and between environments for the trait(s)
Finlay-Wilkinson: Performs a Finlay-Wilkinson joint regression (Finlay and Wilkinson, 1963)
AMMI Analysis: Fits an AMMI model and generates summaries and a biplot (Gauch, 1988)
GGE Biplot: Fits a GGE model and generates a biplot (Yan et al., 2000).
Variance-Covariance Modeling: Fits different variance-covariance models to the GxE data and selects the best one for the data
Stability Coefficients: Estimates different stability coefficient parameters to assess genotype performance
Generate report: Generates an HTML report of the results.
Select only grain yield (GY_M_kgPlot) for the analysis.
Run the analysis using the default settings by right-clicking the Quality Control Phenotypes node and choosing Run Pipeline.
When the analysis is complete, a popup will notify the user and there will be a directory in your downloads with a date-stamp as shown.
This directory contains the generated files describing the association between sites (e.g. heat maps) and the stability measures.
Analysis Report & Graphs
The analysis output can be viewed from the Breeding View interface under the results and graphs tabs. Analysis results can also be reviewed as individual files, which are automatically saved in the location specified by your browser settings, generally the Downloads folder.
Descriptive Statistics
Breeding View provides descriptive statistics describing the entire dataset's variance and covariance.
Trait Summary Statistics
The trait summary statistics describe each trait based on the means calculated for each environment in the single-site analysis.
The box plot of means provides a visual representation of the summary statistics.
Boxplot of Grain Yield Means: Location 2 has the highest grain yield while locations 3 and 5 have high variances.
Best Variance-Covariance Model for Each Trait
The GxE analysis pipeline formally models the variance-covariance structure in the means data and selects the best model for each trait. The main purpose is to establish a model for later testing of fixed effects, like determining marker effects in a quantitative trait loci by environment (QTLxE) analysis using BLUPs calculated in the single site analysis.
Best Variance-Covariance Model: In this example, grain yield means are best described by Compound symmetry.
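To make the compound symmetry result more concrete, the sketch below builds the covariance matrix such a model implies: one shared covariance between every pair of environments and one shared variance on the diagonal. The variance components (0.6 and 0.4) and the function name are invented for illustration and are not SimTrial estimates; Breeding View's exact parameterization may differ.

```python
def compound_symmetry(n_env, var_g, var_e):
    """Covariance matrix with var_g + var_e on the diagonal and var_g elsewhere."""
    return [
        [var_g + var_e if i == j else var_g for j in range(n_env)]
        for i in range(n_env)
    ]

# Hypothetical variance components for five environments (illustrative only).
cov = compound_symmetry(5, var_g=0.6, var_e=0.4)

# Under compound symmetry every pair of environments shares one correlation.
corr = cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5
```

Under this structure a single correlation describes every pair of environments, so it is the simplest of the candidate models.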
Genotype By Environment (GxE) Interactions
Stability, or lack of phenotypic plasticity, is calculated for each genotype considering all traits using the following analyses:
Cultivar-Superiority Measure
Static Stability Measures Coefficients
Wricke’s Ecovalence Stability Coefficients
GxE interactions are also examined for each individual trait using the following analyses:
Finlay and Wilkinson Modified Joint Regression
AMMI Model
GGE Model
Best Variance-Covariance Model
Correlation Matrix
Scatter Plot Matrix
Stability Superiority Measure
The Stability Superiority Measure (Lin & Binns, 1988) is the sum of the squared differences between a genotype's mean in each environment and the mean of the best genotype in that environment, divided by twice the number of environments. Genotypes with the smallest superiority values tend to be more stable and closer to the best genotype in each environment.
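The definition above can be sketched in a few lines. This is a minimal illustration, assuming higher trait values are better (as for yield); the genotype names G1 to G3 and their means are invented and are not SimTrial results.

```python
# Hypothetical genotype means, one value per environment (illustrative only).
means = {
    "G1": [5.0, 6.0, 4.0],
    "G2": [4.0, 7.0, 3.0],
    "G3": [3.0, 5.0, 4.0],
}

def superiority(means):
    """P_i = sum_j (X_ij - M_j)^2 / (2n), where M_j is the best mean in environment j."""
    n_env = len(next(iter(means.values())))
    best = [max(row[j] for row in means.values()) for j in range(n_env)]
    return {
        g: sum((x - m) ** 2 for x, m in zip(row, best)) / (2 * n_env)
        for g, row in means.items()
    }

p = superiority(means)
most_superior = min(p, key=p.get)  # the smallest value is the most superior genotype
```

In this toy table G1 is never far from the best entry in any environment, so it receives the smallest value.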
Static Stability Measures Coefficients
The Static Stability Coefficient is defined as the variance around the germplasm’s phenotypic mean across all environments. This provides a measure of the consistency of the genotype without accounting for performance.
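As a rough illustration, the static stability coefficient is the variance of a genotype's means across environments. The sketch assumes the sample variance (divisor n - 1), which may not match Breeding View's exact divisor, and uses invented means.

```python
from statistics import variance

# Hypothetical genotype means, one value per environment (illustrative only).
means = {
    "G1": [5.0, 6.0, 4.0],
    "G2": [4.0, 7.0, 3.0],
    "G3": [3.0, 5.0, 4.0],
}

# Variance of each genotype's means across environments (sample variance, n - 1).
static = {g: variance(row) for g, row in means.items()}
```

Here G1 and G3 receive the same coefficient even though G1 has the higher mean, which shows that the measure does not account for performance.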
Wricke’s Ecovalence Stability Coefficients
Wricke’s Ecovalence Stability Coefficient (Wricke, 1962) is the contribution of each genotype to the genotype-by-environment sum of squares in an unweighted analysis of the genotype-by-environment means. A low value indicates that the genotype responds in a consistent manner to changes in environment, i.e. it is stable from a dynamic point of view. Like static stability, Wricke’s ecovalence does not account for genotype performance.
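The ecovalence can be sketched as the row sums of squared interaction residuals from a balanced two-way table of means. The data and function name below are invented for illustration.

```python
# Hypothetical genotype means, one value per environment (illustrative only).
means = {
    "G1": [5.0, 6.0, 4.0],
    "G2": [4.0, 7.0, 3.0],
    "G3": [3.0, 5.0, 4.0],
}

def ecovalence(means):
    """W_i = sum_j (X_ij - genotype mean - environment mean + grand mean)^2."""
    genos = list(means)
    n_env = len(means[genos[0]])
    row_mean = {g: sum(means[g]) / n_env for g in genos}
    col_mean = [sum(means[g][j] for g in genos) / len(genos) for j in range(n_env)]
    grand = sum(col_mean) / n_env  # valid because the table is balanced
    return {
        g: sum(
            (means[g][j] - row_mean[g] - col_mean[j] + grand) ** 2
            for j in range(n_env)
        )
        for g in genos
    }

w = ecovalence(means)
gxe_sum_of_squares = sum(w.values())  # ecovalences add up to the GxE sum of squares
```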
Finlay and Wilkinson Modified Joint Regression Analysis
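The Finlay and Wilkinson Modified Joint Regression Analysis ranks germplasm based on phenotypic stability for individual traits. Each genotype's means are regressed on an environmental index derived from the environment means, and the slope (sensitivity) indicates how strongly the genotype responds to changes in environment.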
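The idea behind the joint regression can be sketched with ordinary least squares: regress each genotype's means on an environmental index (environment mean minus grand mean) and read the slope as its sensitivity. This is the simple textbook version on invented data, not the modified procedure that Breeding View runs.

```python
# Hypothetical genotype means, one value per environment (illustrative only).
means = {
    "G1": [5.0, 6.0, 4.0],
    "G2": [4.0, 7.0, 3.0],
    "G3": [3.0, 5.0, 4.0],
}

def fw_slopes(means):
    """Least-squares slope of each genotype's means on the environmental index."""
    rows = list(means.values())
    n_env = len(rows[0])
    env_mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_env)]
    grand = sum(env_mean) / n_env
    index = [m - grand for m in env_mean]  # sums to zero by construction
    sxx = sum(e * e for e in index)
    # Because the index sums to zero, the slope reduces to sum(e * x) / sum(e^2).
    return {g: sum(e * x for e, x in zip(index, row)) / sxx for g, row in means.items()}

slopes = fw_slopes(means)
mean_slope = sum(slopes.values()) / len(slopes)
```

Slopes average 1 across genotypes. A slope below 1 suggests a genotype that responds less than average to environmental change, and a slope above 1 a more responsive one.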
AMMI Model
In the Additive Main Effects and Multiplicative Interaction (AMMI) model, a two-way additive ANOVA model is fitted first (additive main effects), followed by a principal component analysis of the residuals (multiplicative interaction). As a result, the interaction is characterized by Interaction Principal Components (IPCA), and genotypes and environments can be plotted simultaneously in biplots.
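The two AMMI steps can be sketched with NumPy: remove the additive genotype and environment effects, then decompose the remaining interaction residuals with a singular value decomposition. The data are invented, and the symmetric scaling of the scores is one common choice that is assumed here rather than taken from Breeding View.

```python
import numpy as np

# Hypothetical means: rows are genotypes, columns are environments (illustrative only).
X = np.array([[5.0, 6.0, 4.0],
              [4.0, 7.0, 3.0],
              [3.0, 5.0, 4.0]])

# Step 1: additive main effects from the two-way table.
grand = X.mean()
g_eff = X.mean(axis=1, keepdims=True) - grand
e_eff = X.mean(axis=0, keepdims=True) - grand
resid = X - grand - g_eff - e_eff  # interaction residuals

# Step 2: principal components of the residuals via SVD.
U, s, Vt = np.linalg.svd(resid, full_matrices=False)
k = 2  # number of interaction principal components kept for the biplot
geno_scores = U[:, :k] * np.sqrt(s[:k])
env_scores = Vt[:k].T * np.sqrt(s[:k])
explained = s**2 / np.sum(s**2)  # share of interaction sum of squares per component
```

Plotting geno_scores and env_scores on the same axes gives the biplot coordinates.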
GGE Model
In the Genotype Main Effects and Genotype × Environment Interaction Effects (GGE) model (Yan et al., 2000; Yan & Kang, 2003), a one-way ANOVA with environment as the main effect is run, followed by a principal component analysis of the residuals. Like AMMI, principal component scores can be used to construct biplots. Unlike the AMMI model, in GGE the genotypic main effects are also represented in the plot. The GGE model is superior to AMMI analysis at differentiating mega-environments (Yan et al., 2007).
Environments 1, 3 & 4 cluster, indicating that these three locations have similar environmental effects on phenotype and small GxE interactions.
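The difference from AMMI can be sketched in NumPy: GGE removes only the environment means before the decomposition, so genotype main effects stay in the matrix being decomposed. The data and the symmetric score scaling are assumptions made for illustration.

```python
import numpy as np

# Hypothetical means: rows are genotypes, columns are environments (illustrative only).
X = np.array([[5.0, 6.0, 4.0],
              [4.0, 7.0, 3.0],
              [3.0, 5.0, 4.0]])

# Remove environment main effects only; G and GxE remain in the centred matrix.
centred = X - X.mean(axis=0, keepdims=True)

U, s, Vt = np.linalg.svd(centred, full_matrices=False)
k = 2  # first two principal components for the biplot
geno_scores = U[:, :k] * np.sqrt(s[:k])
env_scores = Vt[:k].T * np.sqrt(s[:k])

# Genotype main effects are still present: they are the row means of the centred data.
geno_main = centred.mean(axis=1)
```

Environments whose score vectors point in similar directions rank genotypes similarly, which is the basis for reading clusters from the biplot.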
Variance-Covariance Model & Correlation Matrix
Details on the variance-covariance model, including the pairwise correlation matrix from the covariance model, are presented in a table in the Report tab. In the correlation matrix, values close to 1 indicate higher correlation between environments. A value of 1 indicates a perfect correlation, such as when an environment is compared to itself.
Correlation Matrix for Grain Yield (GY_M_kgPlot).
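For intuition about how such a matrix is read, the sketch below computes plain Pearson correlations between environments from a small table of genotype means. It is not the model-based matrix from the report, which comes from the fitted covariance model; the data and names are invented.

```python
# Hypothetical genotype means per environment (illustrative only).
env_means = {
    "Env1": [5.0, 4.0, 3.0],
    "Env2": [6.0, 7.0, 5.0],
    "Env3": [4.0, 3.0, 4.0],
}

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

envs = list(env_means)
corr = {(p, q): pearson(env_means[p], env_means[q]) for p in envs for q in envs}
```

The diagonal entries are 1 because each environment is compared with itself.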
Correlation Heat Map
The correlation heat map visualizes correlations with color: warm colors (red) indicate high positive correlation between environments, and cool colors (blue) indicate high negative correlation between environments.
Correlation Heat Map of Grain Yield (GY_M_kgPlot): Environment 2 is most positively correlated (red) with Environment 4, suggesting that these two locations have similar environmental effects on phenotype and small GxE interactions.
Scatter Plot Matrix
The scatter plot matrix illustrates the association of genotypic performance between each pair of environments.
Scatter Plot Matrix for Grain Yield (GY_M_kgPlot):
Interpretation
The most important of these files capture the correlation between sites and the stability of lines. They are really all you need:
· GxE_Means
· GGE biplot mega environs
The GxE means table looks like this.
It needs simplification and rearranging to group sites according to the GGE biplot mega environs.
The GGE biplot showed that sites 1, 3 and 4 ranked genotypes similarly and should be considered together, and that they differed from sites 2 and 5.
Presenting the data according to these clusters reveals some interesting conclusions. The GxE table above is then rearranged as:
Putting sites 1, 3 and 4 together reveals patterns that may otherwise be overlooked, e.g. Meghan1 yields very highly in cluster one only, Bert1 has good yield but is very variable, and Meghan5 Check is adapted only to site 5.
Within this one table you have captured correlation of sites (sites 1, 3 & 4 giving similar information) and stability (Bill1 = high and stable yield, Meghan1 = high yield but unstable).
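The rearrangement described above can also be done programmatically. The sketch below groups site columns by the two clusters read from the GGE biplot (sites 1, 3 and 4 versus sites 2 and 5) and averages within each cluster. The genotype names, the means and the cluster labels "A" and "B" are invented for illustration and are not the SimTrial values.

```python
# Site grouping taken from the GGE biplot discussed in this tutorial.
clusters = {"A": [1, 3, 4], "B": [2, 5]}

# Hypothetical GxE means keyed by site number (illustrative only).
gxe_means = {
    "G1": {1: 5.0, 2: 3.0, 3: 5.2, 4: 4.8, 5: 3.4},
    "G2": {1: 4.0, 2: 4.1, 3: 4.2, 4: 3.8, 5: 3.9},
}

def by_cluster(gxe_means, clusters):
    """Average each genotype's site means within each cluster of sites."""
    return {
        g: {
            c: sum(site_means[s] for s in sites) / len(sites)
            for c, sites in clusters.items()
        }
        for g, site_means in gxe_means.items()
    }

cluster_means = by_cluster(gxe_means, clusters)
```

In this toy table G1 does well only in cluster A while G2 is similar across both clusters, which is the kind of pattern the rearranged table is meant to expose.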
References
Gauch, H. G. (1988). Model selection and validation for yield trials with interaction. Biometrics, 44, 705–715.
Gauch, H.G. (1992). Statistical Analysis of Regional Yield Trials – AMMI analysis of factorial designs. Elsevier, Amsterdam.
Finlay, K.W. & Wilkinson, G.N. (1963). The analysis of adaptation in a plant-breeding programme. Australian Journal of Agricultural Research, 14, 742-754.
Murray, D., Payne, R., & Zhang, Z. (2014). Breeding View, a Visual Tool for Running Analytical Pipelines: User Guide. VSN International Ltd.
Lin, C.S. & Binns, M.R. (1988). A superiority performance measure of cultivar performance for cultivar x location data. Canadian Journal of Plant Science, 68, 193-198.
Lin, C. S., Binns, M. R., & Lefkovitch, L. P. (1986). Stability analysis: Where do we stand? Crop Science, 26, 894–900.
Oakey, H., Verbyla, A. P., Pitchford, W., Cullis, B., & Kuchel, H. (2006). Joint modeling of additive and non-additive genetic line effects in single field trials. Theoretical and Applied Genetics, 113, 809–819.
Wricke, G. (1962). Über eine Methode zur Erfassung der ökologischen Streubreite in Feldversuchen. Zeitschrift für Pflanzenzüchtung, 47, 92-96.
Yan, W., Hunt, L. A., Sheng, Q., & Szlavnics, Z. (2000). Cultivar Evaluation and Mega-Environment Investigation Based on the GGE Biplot. Crop Science, 40, 597–605.
Yan, W. & Kang, M.S. (2003). GGE Biplot Analysis: a Graphical Tool for Breeders, Geneticists and Agronomists. CRC Press, Boca Raton.
Yan, W., Kang, M.S., Ma, B., Woods, S., & Cornelius, P.L. (2007). GGE Biplot vs. AMMI Analysis of Genotype-by-Environment Data. Crop Science, 47, 643–653.