v29 Multi-Site (Genotype by Environment) Analysis

This tutorial describes a genotype by environment (GxE) analysis for a five-location field trial (SimTrial). It builds upon the adjusted means (BLUEs) and summary statistics calculated for the individual locations in the previous tutorial, Single Site Analysis: 5 Location Batch.

Select Data from the Database

  • Open Multi-Site Analysis from the STUDIES menu. Select Browse.

image-20240710-061540.png
  • Select SimTrial under the 2024 folder to use the BLUEs and summary statistics uploaded to the BMS after the single site analysis. You need at least three sites for a GxE analysis and four or more is better.

  • Define environments and groups.

    • Environments: LOCATION_NAME is often used, but this example uses TRIAL-INSTANCE because it makes the interpretation simpler.

    • Genotype: usually DESIGNATION

    • Environment Grouping Factor: None

      • The form asks whether your environments are already grouped in a way that would account for significant GxE interactions. At an early stage such groupings are usually unknown, and there are rarely enough environments to form subsets, so leave this as None.

Traits with means available from all trial locations are selected by default. Traits that are not observed or could not be fitted with a mixed model in multiple environments in the single-site analysis are not selected for Multi-Site analysis.

image-20241029-033909.png
  • Select Next.

Generate BV Input Files from BMS

  • Review the five environments and the traits to include in the multi-site analysis. Here you can exclude environments with insufficient data or with heritability that is too low; similarly, at the bottom of the page, you can exclude traits.

  • Select Download Input Files. This downloads a zip file containing three files, one of which is the XML means file that serves as the input to Breeding View. Unzip this file.

image-20241029-034532.png

Load Project & Data

  • As before, leave the BMS and open Breeding View (BV) on your desktop. In BV, open the project by browsing for the XML means file. This populates BV, this time with the Multi-Site Analysis flow diagram.

image-20241029-035150.png

Run Analysis

When a project has been created or opened, a visual representation of the analytical pipeline is displayed in the Analysis Pipeline tab. The analysis pipeline includes a set of connected nodes, which can be used to run and configure pipelines. 

Node Descriptions:

  • Quality Control Phenotypes: Summary statistics within and between environments for the trait(s)

  • Finlay-Wilkinson: Performs a Finlay-Wilkinson joint regression (Finlay and Wilkinson, 1963)

  • AMMI Analysis: Fits an AMMI model and generates summaries and a biplot (Gauch, 1988)

  • GGE Biplot: Fits a GGE model and generates a biplot (Yan et al., 2000). 

  • Variance-Covariance Modeling: Fits different variance-covariance models to the GxE data and selects the best one for the data

  • Stability Coefficients: Estimates different stability coefficient parameters to assess genotype performance

  • Generate report: Generates an HTML report of the results.

 

  • Select only grain yield (GY_M_kgPlot) for the analysis.

  • Run the analysis using the default settings by right-clicking the Quality Control Phenotypes node and choosing Run Pipeline.

  • When the analysis is complete, a popup notifies you, and a date-stamped directory appears in your Downloads folder, as shown.

image-20241029-040438.png
  • This directory contains the generated output files, including those describing the association between sites (e.g. heat maps) and the stability measures.

image-20241029-040604.png

Analysis Report & Graphs

The analysis output can be viewed in the Breeding View interface under the Results and Graphs tabs. The results can also be reviewed as individual files, which are automatically saved in the location specified by your browser settings, generally the Downloads folder.

Descriptive Statistics

Breeding View provides descriptive statistics describing the entire dataset's variance and covariance.

Trait Summary Statistics

The trait summary statistics describe each trait based on the means calculated for each environment in the single-site analysis.

M1.png

The box plot of means provides a visual representation of the summary statistics.

M2.png

Boxplot of Grain Yield Means: Location 2 has the highest grain yield while locations 3 and 5 have high variances.

Best Variance-Covariance Model for Each Trait

The GxE analysis pipeline formally models the variance-covariance structure in the means data and selects the best model for each trait. The main purpose is to establish a model for later testing of fixed effects, such as determining marker effects in a quantitative trait loci by environment (QTLxE) analysis using BLUPs calculated in the single site analysis.

M3.png

Best Variance-Covariance Model: In this example, grain yield means are best described by the compound symmetry model.
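To make the term concrete: a compound symmetry structure assumes the same variance in every environment and the same covariance between every pair of environments. The sketch below builds such a matrix for five environments. The function name and the two variance values are hypothetical and are not taken from the Breeding View output.

```python
def compound_symmetry(n_env, var_common, var_specific):
    """Build a compound symmetry matrix.

    Diagonal (variance within an environment): var_common + var_specific.
    Off-diagonal (covariance between environments): var_common.
    """
    return [
        [var_common + (var_specific if i == j else 0.0) for j in range(n_env)]
        for i in range(n_env)
    ]

# Hypothetical variance components, chosen only for illustration.
cov = compound_symmetry(5, var_common=2.0, var_specific=1.0)

# Under compound symmetry every pair of environments has the same correlation.
correlation = cov[0][1] / cov[0][0]
```

A consequence of this structure is that no pair of environments is modelled as more alike than any other pair, which is why the more flexible models in the pipeline are compared against it.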

Genotype By Environment (GxE) Interactions

Stability, or lack of phenotypic plasticity, is calculated for each genotype considering all traits using the following analyses:

  • Cultivar-Superiority Measure

  • Static Stability Measures Coefficients

  • Wricke’s Ecovalence Stability Coefficients

GxE interactions are also examined for each individual trait using the following analyses:

  • Finlay and Wilkinson Modified Joint Regression

  • AMMI Model

  • GGE Model

  • Best Variance-Covariance Model

  • Correlation Matrix

  • Scatter Plot Matrix

Stability Superiority Measure

The Stability Superiority Measure (Lin & Binns, 1988) is the sum of the squared differences between a genotype's mean in each environment and the mean of the best genotype in that environment, divided by twice the number of environments. Genotypes with the smallest superiority values tend to be more stable and closer to the best genotype in each environment.
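The definition above can be sketched in a few lines of Python. The means table and the function name `superiority` are hypothetical; this illustrates the formula and is not the Breeding View implementation.

```python
# Hypothetical genotype-by-environment means (one list entry per environment).
means = {
    "G1": [5.0, 6.0, 7.0],
    "G2": [4.0, 6.5, 5.0],
    "G3": [5.5, 5.0, 6.0],
}

def superiority(means):
    """Squared distance to the best mean in each environment, divided by 2n."""
    n_env = len(next(iter(means.values())))
    # Best (maximum) genotype mean in each environment.
    best = [max(row[j] for row in means.values()) for j in range(n_env)]
    return {
        g: sum((x - b) ** 2 for x, b in zip(row, best)) / (2 * n_env)
        for g, row in means.items()
    }

p = superiority(means)
most_superior = min(p, key=p.get)  # smallest value = closest to the best
```

This sketch treats larger trait values as better, which suits grain yield; for a trait where smaller is better the maximum would need to be replaced by the minimum.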

M4.png

Static Stability Measures Coefficients

The Static Stability Coefficient is defined as the variance around the germplasm’s phenotypic mean across all environments. This provides a measure of the consistency of the genotype without accounting for performance.
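As a minimal sketch, the coefficient is simply the sample variance of each genotype's means across environments. The means table is hypothetical.

```python
from statistics import variance

# Hypothetical genotype-by-environment means (one list entry per environment).
means = {
    "G1": [5.0, 6.0, 7.0],
    "G2": [4.0, 6.5, 5.0],
    "G3": [5.5, 5.0, 6.0],
}

# Sample variance (n - 1 denominator) of each genotype across environments.
static_stability = {g: variance(row) for g, row in means.items()}

# The smallest variance marks the most consistent genotype,
# regardless of how high or low its mean is.
most_static = min(static_stability, key=static_stability.get)
```

In this hypothetical table G3 varies least across environments even though G1 has the higher overall mean, which illustrates why the coefficient should be read alongside performance.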

M5.png

Wricke’s Ecovalence Stability Coefficients

Wricke’s Ecovalence Stability Coefficient (Wricke, 1962) is the contribution of each genotype to the genotype-by-environment sum of squares in an unweighted analysis of the genotype-by-environment means. A low value indicates that the genotype responds in a consistent manner to changes in environment; i.e. stable from a dynamic point of view. Like static stability, the Wricke’s Ecovalence does not account for genotype performance.
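A short sketch of the calculation, assuming a complete and unweighted table of means: each cell is corrected for its genotype mean and environment mean, and the squared interaction residuals are summed per genotype. The data and the function name `ecovalence` are hypothetical.

```python
# Hypothetical genotype-by-environment means (one list entry per environment).
means = {
    "G1": [5.0, 6.0, 7.0],
    "G2": [4.0, 6.5, 5.0],
    "G3": [5.5, 5.0, 6.0],
}

def ecovalence(means):
    """Sum of squared GxE residuals for each genotype."""
    n_env = len(next(iter(means.values())))
    n_gen = len(means)
    env_mean = [sum(row[j] for row in means.values()) / n_gen for j in range(n_env)]
    grand = sum(env_mean) / n_env
    result = {}
    for g, row in means.items():
        gen_mean = sum(row) / n_env
        # Interaction residual: cell - genotype mean - environment mean + grand mean.
        result[g] = sum(
            (x - gen_mean - env_mean[j] + grand) ** 2 for j, x in enumerate(row)
        )
    return result

w = ecovalence(means)
gxe_sum_of_squares = sum(w.values())  # ecovalences add up to the GxE sum of squares
```

A table with no interaction at all (every genotype shifted by a constant from the others) would give an ecovalence of zero for every genotype.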

M6.png
M7.png

Finlay and Wilkinson Modified Joint Regression Analysis

The Finlay and Wilkinson Modified Joint Regression Analysis ranks germplasm based on phenotypic stability for individual traits.
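The idea behind joint regression can be sketched as follows: each genotype's means are regressed on an environment index (the environment mean minus the grand mean), and the slope measures sensitivity to the environment. This is the classic simple form on hypothetical data; the modified procedure in Breeding View may estimate the terms differently, so treat it as an illustration only.

```python
# Hypothetical genotype-by-environment means (one list entry per environment).
means = {
    "G1": [5.0, 6.0, 7.0],
    "G2": [4.0, 6.5, 5.0],
    "G3": [5.5, 5.0, 6.0],
}

def fw_slopes(means):
    """Least-squares slope of each genotype on the environment index."""
    n_env = len(next(iter(means.values())))
    n_gen = len(means)
    env_mean = [sum(row[j] for row in means.values()) / n_gen for j in range(n_env)]
    grand = sum(env_mean) / n_env
    index = [m - grand for m in env_mean]  # environment index, sums to zero
    sxx = sum(t * t for t in index)
    slopes = {}
    for g, row in means.items():
        gen_mean = sum(row) / n_env
        slopes[g] = sum(t * (x - gen_mean) for t, x in zip(index, row)) / sxx
    return slopes

slopes = fw_slopes(means)
```

A slope near 1 indicates average sensitivity, a slope above 1 indicates a genotype that responds strongly to better environments, and a slope below 1 indicates a less responsive genotype. By construction the slopes average to 1 across genotypes.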

M8.png
M9.png

AMMI Model

In the Additive Main Effects and Multiplicative Interaction (AMMI) model, a two-way ANOVA additive model is performed (additive main effects), followed by a principal component analysis on the residuals (multiplicative interaction). As a result, the interaction is characterized by Interaction Principal Components (IPCA), where genotypes and environments can be simultaneously plotted in biplots.
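The two steps can be sketched with NumPy: remove the additive genotype and environment effects, then apply a singular value decomposition to the interaction residuals. The data are hypothetical, and the symmetric scaling of the scores (square root of the singular values on both sides) is one common choice, assumed here rather than taken from Breeding View.

```python
import numpy as np

# Hypothetical means: rows are genotypes, columns are environments.
y = np.array([
    [5.0, 6.0, 7.0],
    [4.0, 6.5, 5.0],
    [5.5, 5.0, 6.0],
])

# Step 1: additive main effects.
grand = y.mean()
g_eff = y.mean(axis=1, keepdims=True) - grand  # genotype effects
e_eff = y.mean(axis=0, keepdims=True) - grand  # environment effects
resid = y - grand - g_eff - e_eff              # GxE interaction residuals

# Step 2: multiplicative interaction via SVD of the residuals.
u, s, vt = np.linalg.svd(resid, full_matrices=False)
geno_scores = u * np.sqrt(s)     # genotype IPCA scores (one column per axis)
env_scores = vt.T * np.sqrt(s)   # environment IPCA scores
explained = s**2 / np.sum(s**2)  # share of interaction SS per IPCA axis
```

Plotting the first two columns of `geno_scores` and `env_scores` on the same axes gives the kind of biplot described above.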

 

M10.png

 

M11.png

GGE Model

In the Genotype Main Effects and Genotype × Environment Interaction Effects (GGE) model (Yan et al., 2000; Yan & Kang, 2003), a one-way ANOVA with environment as a main effect is run, followed by a principal component analysis on the residuals. Like AMMI, the principal component scores can be used to construct biplots. Unlike AMMI, the genotypic main effects are also represented in the GGE plot. The GGE model is superior to AMMI analysis at differentiating mega-environments (Yan et al., 2007).
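A minimal NumPy sketch of the difference from AMMI: only the environment means are removed, so the matrix passed to the decomposition still contains the genotype main effects plus the interaction. The data and the symmetric score scaling are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical means: rows are genotypes, columns are environments.
y = np.array([
    [5.0, 6.0, 7.0],
    [4.0, 6.5, 5.0],
    [5.5, 5.0, 6.0],
])

# Remove environment main effects only (environment-centered data).
centered = y - y.mean(axis=0, keepdims=True)

# What remains is G + GE: genotype main effects plus interaction.
g_eff = y.mean(axis=1, keepdims=True) - y.mean()
ge = centered - g_eff

# Principal components of the environment-centered matrix.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
geno_scores = u[:, :2] * np.sqrt(s[:2])    # first two axes for the biplot
env_scores = vt.T[:, :2] * np.sqrt(s[:2])
explained = s[:2] ** 2 / np.sum(s**2)      # share of G + GE variation shown
```

Environments whose score vectors point in similar directions rank genotypes similarly, which is the basis for reading clusters of environments from the biplot.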

M12.png
M13.png

Environments 1, 3 and 4 cluster, indicating that these three locations have similar environmental effects on phenotype and small GxE interactions.

Variance-Covariance Model & Correlation Matrix

Details of the variance-covariance model, including the pairwise correlation matrix derived from it, are presented in a table in the Report tab. In the correlation matrix, values close to 1 indicate higher correlation between environments. A value of 1 indicates perfect correlation, such as when an environment is compared to itself.
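For intuition about how such a matrix is read, the sketch below computes plain Pearson correlations between environments from a hypothetical means table. Note the assumption: the report's matrix comes from the fitted variance-covariance model, so its values will generally differ from these raw correlations.

```python
from math import sqrt

# Hypothetical genotype-by-environment means (one list entry per environment).
means = {
    "G1": [5.0, 6.0, 7.0],
    "G2": [4.0, 6.5, 5.0],
    "G3": [5.5, 5.0, 6.0],
}

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# One column of genotype means per environment.
columns = list(zip(*means.values()))

# corr[i][j] is the correlation between environments i and j.
corr = [[pearson(a, b) for b in columns] for a in columns]
```

The diagonal is 1 because each environment is compared with itself, and the matrix is symmetric.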

M14.png

Correlation Matrix for Grain Yield (GY_M_kgPlot).

Correlation Heat Map 

The correlation heat map visualizes correlations with color: warm colors (red) indicate high positive correlation between environments, and cool colors (blue) indicate high negative correlation.

M15 Heat map.png

 

Correlation Heat Map of Grain Yield (GY_M_kgPlot): Environment 2 is most positively correlated (red) with Environment 4, suggesting that these two locations have similar environmental effects on phenotype and small GxE interactions.

Scatter Plot Matrix

The scatter plot matrix illustrates the association of genotypic performance between each pair of environments.

Scatter Plot Matrix for Grain Yield (GY_M_kgPlot):

M16.png

Interpretation

  • The most important of these files capture the correlation between sites and the stability of lines. They are really all you need:

    • GxE_Means

    • GGE biplot mega environs

  • The GxE means table looks like this.

image-20241029-142414.png
  • The table needs simplifying and rearranging so that sites are grouped according to the GGE biplot mega-environments.

  • The GGE biplot showed that sites 1, 3 and 4 cluster together and rank genotypes similarly, so they should be considered together; they differ from sites 2 and 5.

image-20241029-142621.png
  • Presenting the data according to these clusters reveals some interesting conclusions. The GxE table above is rearranged as follows:

image-20241029-142712.png
  • Grouping sites 1, 3 and 4 reveals patterns that might otherwise be overlooked, e.g. Meghan1 yields very highly in cluster one only, Bert1 yields well but is very variable, and Meghan5 Check is adapted only to site 5.

  • This one table captures both the correlation of sites (sites 1, 3 and 4 give similar information) and stability (Bill1 = high and stable yield, Meghan1 = high but unstable yield).
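The rearrangement described in this section can also be done programmatically. In the sketch below only the grouping of sites (1, 3 and 4 versus 2 and 5) comes from the text; the genotype labels, the means and the function name `by_cluster` are hypothetical.

```python
# Hypothetical means keyed by genotype, then by site number.
means = {
    "G1": {1: 5.0, 2: 3.0, 3: 5.2, 4: 4.9, 5: 3.1},
    "G2": {1: 4.0, 2: 4.1, 3: 4.2, 4: 3.9, 5: 4.0},
}

# Site clusters read from the GGE biplot.
clusters = {"cluster_1": [1, 3, 4], "cluster_2": [2, 5]}

def by_cluster(means, clusters):
    """Regroup each genotype's site means by cluster and add a cluster mean."""
    table = {}
    for genotype, row in means.items():
        table[genotype] = {}
        for name, sites in clusters.items():
            values = [row[site] for site in sites]
            table[genotype][name] = {
                "sites": values,
                "mean": sum(values) / len(values),
            }
    return table

table = by_cluster(means, clusters)
```

In this hypothetical table G1 performs well only in the first cluster while G2 is similar in both, the same kind of contrast the rearranged GxE table is meant to expose.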

References

Gauch, H. G. (1988). Model selection and validation for yield trials with interaction. Biometrics, 44, 705–715.

Gauch, H.G. (1992). Statistical Analysis of Regional Yield Trials – AMMI analysis of factorial designs. Elsevier, Amsterdam.

Finlay, K.W. & Wilkinson, G.N. (1963). The analysis of adaptation in a plant-breeding programme. Australian Journal of Agricultural Research, 14, 742-754.

Murray, D., Payne, R., & Zhang, Z. (2014). Breeding View, a Visual Tool for Running Analytical Pipelines: User Guide. VSN International Ltd.

Lin, C.S. & Binns, M.R. (1988). A superiority measure of cultivar performance for cultivar x location data. Canadian Journal of Plant Science, 68, 193-198.

Lin, C. S., Binns, M. R., & Lefkovitch, L. P. (1986). Stability analysis: Where do we stand? Crop Science, 26, 894–900.

Oakey, H., Verbyla, A. P., Pitchford, W., Cullis, B., & Kuchel, H. (2006). Joint modeling of additive and non-additive genetic line effects in single field trials. Theoretical and Applied Genetics, 113, 809–819.

Wricke, G. (1962). Über eine Methode zur Erfassung der ökologischen Streubreite in Feldversuchen. Zeitschrift für Pflanzenzüchtung, 47, 92-96.

Yan, W., Hunt, L. A., Sheng, Q., & Szlavnics, Z. (2000). Cultivar Evaluation and Mega-Environment Investigation Based on the GGE Biplot. Crop Science, 40, 597–605.

Yan, W. & Kang, M.S. (2003). GGE Biplot Analysis: a Graphical Tool for Breeders, Geneticists and Agronomists. CRC Press, Boca Raton.

Yan, W., Kang, M.S., Ma, B., Woods, S., & Cornelius, P.L. (2007). GGE Biplot vs. AMMI Analysis of Genotype-by-Environment Data. Crop Science, 47, 643–653.