v29 Single-Site Analysis
- 1 Introduction
- 2 Select Dataset to Analyze
- 3 Load Data into Breeding View
- 3.1 Run Analysis
- 4 Review Results
- 4.1 Quality Assurance
- 4.2 Report
- 4.3 Heritability Table
- 4.4 Combined File of Predicted Means
- 4.5 Individual Environment Report using the Performance Trial 2018 as an example
- 4.5.1 Environment Report Summary
- 4.5.2 Genotypes by Environment Sorted by BLUPs
- 4.5.3 Summary of Traits by BLUP
- 4.5.4 Genetic Correlations Between Traits
- 4.5.5 Summary Statistics for Individual Trait Raw Data
- 4.5.6 Estimated Heritability of Individual Trait
- 4.5.7 Genotypes by Trait Sorted by BLUEs
- 4.5.8 Standard Errors of Difference
- 4.5.9 Wald/F Test
- 4.5.10 Diagnostic Residual Plots of Individual Traits
- 5 Upload BLUEs & Summary Stats to BMS -SimTrial
- 6 SSA analysis Result in BMS
- 7 References
Introduction
The BMS links with a statistical analysis package called Breeding View. This package is proprietary software developed by VSNi; it is based on Genstat and uses ASReml for mixed model analysis of plant breeding trials. Breeding View is designed to quickly perform routine analyses for plant breeding. It is not as versatile as a full statistical package, such as the full version of Genstat, which is required for research analysis.
Breeding View’s single site analysis produces adjusted means, best linear unbiased estimators and best linear unbiased predictors (BLUEs and BLUPs) per genotype, as well as summary statistics to describe the data. The next tutorial, Multi-Site (GxE) Analysis, uses the summary statistics and adjusted means (BLUEs) to perform a genotype by environment (GxE) analysis. Adjusted means can also be used in a QTL (quantitative trait loci) analysis pipeline.
(Click here to learn more about Statistical Analysis)
Select Dataset to Analyze
Open Single Site Analysis from the Statistical Analysis menu of the Workbench. Select Browse to find the SimTrial dataset.
Select 2024 > SimTrial. This trial has 16 genotypes x 3 reps x 5 sites.
The study displays the factors it contains:
By default, all the traits are selected for Analysis – only one trait in the example. Deselect those which do not require analysis.
Any of the non-analysed variables can be specified to be covariates in an analysis of covariance for the analysed variables. We will not specify any covariates. Click Next.
On the Specify Options for Breeding View Analysis form, you must specify the following:
Site/Environment: First select the variable which distinguishes between sites. This is often LOCATION_NAME. In this example, select TRIAL_INSTANCE and select all sites.
Design details: These fields will be filled out if you have designed the trial in the BMS. You need to specify the design type if you have imported the trial design – here Randomised Complete Block. Only the fields marked with “*” need filling.
Genotypes: DESIGNATION is usually the best choice.
Note : BV (Breeding View) icon on desktop
Select Download Input Files.
This downloads a zip file named after the study (SimTrial.zip in this example).
Notice the green and white “BV” button on the desktop toolbar at the bottom outside the BMS.
You need to extract the files from this zip file into a directory where the analysis will be performed. There are two files in the zip, one csv file which contains the data to be analysed and an xml file which contains the instructions for the analysis.
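The extraction step can also be scripted. The following is a minimal sketch, assuming the zip is laid out as described above (one csv file and one xml file); the function name and the working directory are hypothetical and not part of the BMS or Breeding View.

```python
import zipfile
from pathlib import Path


def extract_bv_inputs(zip_path, work_dir):
    """Extract a downloaded input zip and return (csv_files, xml_files)."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(work_dir)
    # Search recursively in case the archive nests its files in a folder.
    csv_files = sorted(work_dir.rglob("*.csv"))
    xml_files = sorted(work_dir.rglob("*.xml"))
    if not csv_files or not xml_files:
        raise ValueError("expected at least one .csv and one .xml file in the zip")
    return csv_files, xml_files
```

For example, `extract_bv_inputs("SimTrial.zip", "SimTrial_analysis")` would unpack the download into a hypothetical `SimTrial_analysis` folder and report the data and instruction files it found.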
Load Data into Breeding View
BMS Version 27.0 is a server application. Breeding View (BV) is a Windows-compatible desktop application (see Install Breeding View). Trial data exported from the BMS are perfectly formatted for analysis by the BV application.
You now leave the BMS as it is.
Launch the Breeding View application on your local computer and select Open Project.
In BV, select “Open Project” on the top toolbar, browse to the xml file you just extracted, and open it.
The Single Site Analysis pipeline is now populated.
Run Analysis
The analysis pipeline includes a set of connected nodes, which can be used to run and configure pipelines.
Right-click the Quality Control Phenotypes node to reveal two options:
“Settings” – you can add extra statistics to be displayed in the output, e.g. “Boxplots”.
“Run selected environment pipelines” – this begins the analysis; its progress is shown in the bottom left-hand corner of the screen.
Wait until you see the “Pipeline complete” popup.
All nodes in the analysis pipeline are green when the analysis is complete.
Select OK to view the analysis results.
Review Results
Quality Assurance
Breeding View provides an overview of potentially influential measurements to help users identify and possibly exclude observations. Influential observations may reflect true genotypic variation and care should be taken not to exclude these data from the trial. Observations that deserve exclusion are obvious errors or measurements influenced by heterogeneous environmental variation within a block, like damage to a single plot.
Once you have eliminated these possibilities, you can choose to mark these outliers as missing. The other reps of the entry with an outlier are tabulated at the bottom to aid your decision. You can set a plot to “missing” and rerun the pipeline. If the heritability is improved by removing outliers, this is a good reason to leave them out. If the heritability is worse or not improved, put the outliers back in and rerun.
In this example, the outlier in Environ 3 is set to missing. Note the other reps of this entry “12” are at the bottom.
You now rerun the pipeline.
Rerunning it shows an improved heritability for this site, so the outlier stays excluded.
Report
This tutorial section provides a brief guide to interpreting the results, including graphs, under the Report tab.
Report Tab Contents
Heritability Table
Combined File of Predicted Means: Excel file of BLUEs and BLUPs
Links to Individual Environment Reports
Heritability Table
The heritability table summarizes the generalized heritabilities calculated for each trait by location as described by Oakey et al. 2006. The method uses the average pairwise prediction error variance to obtain genetic and error covariance matrices and allows for the estimation of heritability in unbalanced data with complex error and genetic structures. If a model cannot be fitted to the trait data, such as when there is no variability in the trait measurement, that trait will not be included in the heritability table.
Combined File of Predicted Means
Select the link to the combined file of predicted means.
Open in Excel. Non-informative traits that could not be fitted with a model will appear as missing data.
BLUPs across all locations
About BLUEs & BLUPs
Breeding View reports BLUPs and BLUEs for each entry in each location. BLUEs (Best Linear Unbiased Estimates) are the classical least squares means from the fixed effect model, and BLUPs are the Best Linear Unbiased Predictors from a mixed model in which the entries are modeled as a random effect. BLUEs are at a disadvantage in predicting the future performance of an entry because extreme values can result from either a good genotype or a lucky environment (e.g. an extra-fertile plot). Mixed models estimate the variability of these two components, genotype and environment, and adjust the means towards the grand mean by an amount which depends on the ratio of genetic to total phenotypic variance (the heritability). If this ratio is large, near 1, the adjustment is very small and BLUEs and BLUPs will be almost the same. If this ratio is small, near 0, the adjustment will be large and the BLUPs for all entries will be close to the grand mean. BLUPs have been empirically shown to be much better predictors of future performance than BLUEs, and the heritability measure presented (more accurately described as the repeatability) is a very good guide to how well the results from each location distinguish the genetic merit of the entries. If heritability is moderate to low, users should be guided in their selections by the BLUPs; where heritability is high, there is little difference between the two.
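The adjustment towards the grand mean can be illustrated with a deliberately simplified sketch. It assumes a balanced trial with independent genotype effects, where each BLUP is the grand mean plus the BLUE's deviation multiplied by the variance ratio. This is not how Breeding View computes BLUPs (it fits a mixed model); the function name and the numbers are hypothetical.

```python
def shrink_toward_grand_mean(blues, ratio):
    """Approximate BLUPs by shrinking BLUEs toward their grand mean.

    ratio is the genetic-to-phenotypic variance ratio (between 0 and 1).
    Illustration only: assumes balanced data and independent genotypes.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie between 0 and 1")
    grand_mean = sum(blues) / len(blues)
    return [grand_mean + ratio * (b - grand_mean) for b in blues]


# Hypothetical BLUEs for three entries (illustrative numbers only).
example_blues = [4.0, 5.0, 9.0]
high_ratio = shrink_toward_grand_mean(example_blues, 0.9)  # stays close to the BLUEs
low_ratio = shrink_toward_grand_mean(example_blues, 0.1)   # clusters near the grand mean
```

With a ratio near 1 the values barely move; with a ratio near 0 every entry ends up close to the grand mean, which matches the behaviour described above.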
Individual Environment Report using the Performance Trial 2018 as an example
At this point we switch to the Performance Trial 2018, as it has six traits measured and is better suited to illustrating, for example, the correlation between traits. (The SimTrial has only one trait.)
Although the four locations for the Performance Trial are summarized together in the table, each location is represented by an individual analysis. Select the link to the Environment 4 individual trial report to review the analysis performed at this location.
Individual Environment Reports Include:
File of predicted means: Link to Excel (.xls) file containing an environment-specific subset of BLUEs and BLUPs
Best Genotypes Table: Best genotypes as defined by BLUPs sorted by factors defined in the report options
Summary of Traits: A table presenting the minimum, mean, maximum, and heritability for each trait within this location based on BLUPs
Estimated Genetic Correlations Between Traits
Principal Components Biplot
Individual Trait Analysis: Summary statistics of raw data, heritability, sorted genotype table (BLUEs), standard errors of differences, and residual diagnostic plots.
Environment Report Summary
The environment report provides the project name, the environment name, and the field design, along with a date stamp for the analysis. Users are presented with a link to the adjusted means data for this environment, which is a subset of the data presented in the combined means file reporting all locations. The file reports the BLUPs and BLUEs for each entry on separate sheets (see details above). Users are also notified about analysis failures.
About the Statistics
This file also contains summary statistics, which are sometimes a little different from those computed by older statistical packages. This is because they are computed from the results of different models: mixed models, which are better suited to representing field trial data and are now available through the computing algorithms in Genstat and Breeding View. For example, for the CV, Breeding View uses 100 x sqrt(estimated residual m.s.) / (estimated grand mean), where the residual sum of squares is estimated from fitting the FIXED model by GLS with a covariance matrix derived from the estimated variance components. The estimated grand mean is calculated by taking the mean of the fitted values. The heritability is the generalized heritability, which is different to the simple heritability calculation of Var(G)/Var(P); the simple calculation is adequate for simple models, but not for more complex ones such as spatial models. Generalized heritability is described in Oakey et al. 2006 and Cullis et al. 2006.
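The CV expression quoted above is simple arithmetic once the residual mean square and grand mean are known. The sketch below only restates that formula; the function name and input values are hypothetical, and it does not reproduce the GLS fit that Breeding View uses to obtain them.

```python
import math


def coefficient_of_variation(residual_ms, grand_mean):
    """Return the CV in percent: 100 * sqrt(residual m.s.) / grand mean."""
    if residual_ms < 0:
        raise ValueError("residual mean square cannot be negative")
    if grand_mean == 0:
        raise ValueError("grand mean must be non-zero")
    return 100.0 * math.sqrt(residual_ms) / grand_mean


# Hypothetical values: residual m.s. of 0.25 and grand mean of 5.0 give a CV of 10%.
example_cv = coefficient_of_variation(0.25, 5.0)
```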
Genotypes by Environment Sorted by BLUPs
Predicted means for the genotypes at Environment 1, sorted by Ant_DT_day values:
Summary of Traits by BLUP
Tlaltizapan Summary of Traits: The minimum, mean, maximum, and heritability for each trait based on BLUPs
Genetic Correlations Between Traits
BLUPs Principal Components Biplot
Tlaltizapan Principal Components Biplot of BLUPs: The biplot shows that field weight, grain yield dry & fresh weight (GY_DW_gPlot & GY_FW_kgPlot), plant height (PHTsIB_M_cm), and ear height (EH_M_cm) are all positively correlated with each other (acute angle between vectors), and negatively correlated with anthesis date (DTA_days_obs) (obtuse angle). In addition, all traits are weakly correlated with grain moisture (GMoi_NIRS_pct) (right angle). These relationships can also be seen in the genetic correlation matrix below.
Genetic Correlation Matrix
Tlaltizapan Estimated Genetic Correlations: Pairwise correlation (r) of phenotypic traits. There is a strong positive correlation (0.9978) between the related yield measures, grain yield and field weight (GY_DW_gPlot & GY_FW_kgPlot). These two yield traits are moderately negatively correlated (-0.6890 & -0.6830 respectively) with anthesis date (Ant_DT_days). In other words, late anthesis is correlated with low yield.
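The link between vector angles in the biplot and the values in the correlation matrix can be sketched as follows. In an idealized biplot the cosine of the angle between two trait vectors approximates their correlation; a real biplot shows only the first two principal components, so the match is approximate. The function name is hypothetical.

```python
import math


def biplot_angle_degrees(r):
    """Return the angle (degrees) whose cosine equals the correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return math.degrees(math.acos(r))


# Correlations quoted in the text above.
yield_pair_angle = biplot_angle_degrees(0.9978)      # acute: strong positive correlation
yield_anthesis_angle = biplot_angle_degrees(-0.6890)  # obtuse: negative correlation
```

A correlation of zero corresponds to a right angle, which is why traits that are only weakly correlated with grain moisture appear roughly perpendicular to its vector.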
Summary Statistics for Individual Trait Raw Data
Tlaltizapan Summary Statistics for Grain Yield (GY_FW_kgPlot) based on raw data
Estimated Heritability of Individual Trait
BV reports the generalised heritability. This is different to the simple heritability calculation of Var(G)/Var(P) which is adequate for simple models, but not for more complex ones such as spatial models. Generalized heritability is described in Oakey et al. 2006 and Cullis, Smith and Coombes, 2006.
Estimated Heritability of Grain Yield (GY_FW_kgPlot) calculated at Tlaltizapan
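For orientation, the two calculations can be contrasted in a short sketch. The generalized form shown here is the one commonly attributed to Cullis et al. 2006: one minus the average variance of a difference between two BLUPs, divided by twice the genetic variance. Whether Breeding View evaluates exactly this expression is an assumption, and the function names and inputs are hypothetical.

```python
def simple_heritability(var_g, var_p):
    """Simple heritability: Var(G) / Var(P)."""
    if var_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / var_p


def generalized_heritability(avg_var_blup_difference, var_g):
    """Generalized heritability: 1 - (avg variance of BLUP differences) / (2 * Var(G))."""
    if var_g <= 0:
        raise ValueError("genetic variance must be positive")
    return 1.0 - avg_var_blup_difference / (2.0 * var_g)
```

The generalized form depends only on the genetic variance and on how precisely pairs of entries can be compared, which is why it remains usable for unbalanced data and spatial models.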
Genotypes by Trait Sorted by BLUEs
20 Best Grain Yield (GY_FW_kgPlot) Genotypes Calculated at Tlaltizapan: Genotypes are sorted by BLUEs in descending order by value as specified in the Report Options.
Standard Errors of Difference
Two genotypes are considered different when their means differ by more than 2 times the standard error of the difference (SED), which is equivalent to the LSD (Least Significant Difference). In general, a breeder would use the average SED and consider the minimum and maximum to get some sense of the differences in precision of comparisons among means.
Standard Errors of Difference for Grain Yield (GY_FW_kgPlot): In a balanced design without missing data, as in this example, the average, maximum, and minimum SED are equal.
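The rule of thumb for comparing two genotype means against the SED can be written as a one-line check. This is a minimal sketch; the function name and the example values are hypothetical.

```python
def differ_by_sed(mean_a, mean_b, sed, multiplier=2.0):
    """Return True when two means differ by more than multiplier * SED."""
    if sed <= 0:
        raise ValueError("SED must be positive")
    return abs(mean_a - mean_b) > multiplier * sed


# Hypothetical means with an SED of 0.4, so the threshold is 0.8.
clearly_different = differ_by_sed(5.0, 4.0, 0.4)   # difference 1.0 exceeds 0.8
not_different = differ_by_sed(5.0, 4.5, 0.4)       # difference 0.5 does not
```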
Wald/F Test
Diagnostic Residual Plots of Individual Traits
Diagnostic residual plots are used to check the model assumptions. Residuals are defined as the difference between the observed and fitted values. A good model “fit” for adjusted means will have residuals that are independent and follow a normal distribution with a mean of zero and constant variance. All but the independence assumption can be checked with the residual plots; independence follows from the randomization of the experimental design.
Utility of Each Diagnostic Plot
Histogram of Residuals: Check for a normal, or Gaussian distribution, as well as centering on a mean of zero.
Fitted-Value Plot: Check for constant variance as well as centering on zero. A random distribution, or “shot-gun pattern”, reflects constant variance. Positive or negative correlations between residuals and fitted values, or ‘loud-speaker-shaped’ distributions, point to a violation of the constant variance assumption.
Normal Plot: Check normality. Distribution in a straight line across the diagonal reflects a normal distribution. The Normal Plot has the same use as the Histogram of Residuals, but is generally a better visualization.
Half-Normal Plot: Check normality. Distribution in a straight line across the diagonal reflects a normal distribution. The Half-Normal plot is the same as the normal plot, but considers the absolute value of residuals. This plot is useful with small data sets.
Diagnostic plots for Grain Yield (GY_FW_kgPlot) in Tlaltizapan: Example of a continuous variable exhibiting a good model fit
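The coordinates behind the normal and half-normal plots can be sketched with the Python standard library. This assumes one common plotting-position convention, (i - 0.5) / n; Breeding View may use a different one. The function names are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(residuals)))


def half_normal_plot_points(residuals):
    """Pair each sorted absolute residual with its expected half-normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()
    quantiles = [
        std_normal.inv_cdf(0.5 + 0.5 * (i - 0.5) / n) for i in range(1, n + 1)
    ]
    return list(zip(quantiles, sorted(abs(r) for r in residuals)))
```

Plotting the second element of each pair against the first should give an approximately straight line when the residuals are normally distributed.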
Upload BLUEs & Summary Stats to BMS -SimTrial
We return to the SimTrial for the upload, as it is well suited to illustrating the subsequent Multi-Site Analysis.
Return to the Breeding Management System Single-Site Analysis. Select Upload Breeding View Output Files to BMS to save the adjusted means and summary statistics to the BMS database for use in the subsequent Multi-Site (genotype by environment) analysis.
Select Browse, highlight the SimTrial, and choose Select.
Select Browse.
Your download contains a directory named “Upload”.
Browse to the zipped Breeding View upload file on your computer, inside the upload folder. The zip file is date and time stamped. There may be more than one zip file, one for each time you ran the pipeline. Select the most recent zip file and upload it. Do not unzip the file.
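When the pipeline has been run several times, the most recent zip can be located with a short script. This sketch uses the file modification time rather than parsing the date and time stamp in the file name, because the exact name format is not given here; the function name is hypothetical.

```python
from pathlib import Path


def latest_upload_zip(upload_dir):
    """Return the most recently modified .zip in upload_dir, or None if there is none."""
    zips = list(Path(upload_dir).glob("*.zip"))
    if not zips:
        return None
    return max(zips, key=lambda p: p.stat().st_mtime)
```

The returned path is the file to select in the upload dialog, still zipped.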
Once the import is successful, the means and summary statistics from the single site analysis are available to perform a Multi-Site (genotype by environment) analysis.
SSA analysis Result in BMS
If an SSA result has been uploaded for a study, the results in BMS can be viewed in the Summary View or as a tab when the study is opened.
SSA Result in Summary view
Confirm the uploaded SSA analysis data by selecting the SimTrial from the STUDIES menu option in the Manage Studies Tool.
You will notice there is now an SSA Results tab, which displays the summary statistics, including heritability and CV.
You can also select Means (BLUEs) to see the individual location means. Your data is now ready for the Multi-Site Analysis.
References
Cullis, B. R., Smith, A. B., & Coombes, N. E. (2006). On the design of early generation variety trials with correlated data. Journal of Agricultural, Biological, and Environmental Statistics, 11(4), 381–393. DOI: 10.1198/108571106X
Oakey, H., Verbyla, A. P., Pitchford, W., Cullis, B., & Kuchel, H. (2006). Joint modeling of additive and non-additive genetic line effects in single field trials. Theoretical and Applied Genetics, 113, 809–819.
Murray, D., Payne, R., & Zhang, Z. (2014). Breeding View, a Visual Tool for Running Analytical Pipelines: User Guide. VSN International Ltd.