v28 Import Data Set

v28 Import Data Set

About

The dataset importer allows users to upload historical studies from Excel files into the Breeding Management System. Note : There are preliminary steps to before a dataset can be imported.

Preliminary Steps

Dataset import cannot proceed until preliminary steps are completed.

  1. Germplasm represented in the dataset must first be imported into to the BMS and a list created.

  2. All of the phenotypes measured in the dataset must first be defined it the BMS ontology.

  3. All of the locations represented in the dataset must first be defined in the BMS.

  4. All of the people represented in the dataset must first be added as BMS users.

Example Data

The following instructions follow import of an example data set after the completion of required preliminary steps.

2014 CIMMYT Maize Study -

  • 42 maize lines

  • Evaluated at 7 locations (instances) across Mexico.
       (1.) Palmira
       (2.) Agua Fria
       (3.) Obregon Station
       (4.) Palmira
       (5.) San Julian
       (6.) Tlaltizapan
       (7.) Veracruz

  • 2 reps per location

  • Randomized complete block design

  • 5 phenotypes measured

image-20250110-055202.png


The example dataset: Gold colored columns of data represent those that should be formatted before import. The darker gold columns are BMS-specific indentifiers that must be matched before import. 

Data Import Wizard

The Data Import Wizard is the most flexible way to import historical data from trials and nurseries.  Some critical identifiers must be perfectly formatted, but other data, like phenotypic names can be mapped to BMS-specific terms. Once a mapping is completed, the BMS "remembers" past mapping to
ease subsequent dataset uploads.

image-20240626-054108.png

Excel File Format

The Wizard will consider one worksheet of observation data (.xls).  Formatting and some data matching is needed for trial design and BMS identifiers. The phenotypes do not require any formatting. At least four columns of data are required to import phenotypic observations: TRIAL_INSTANCE, ENTRY_NO, PLOT_NO and GID. These column headers must be exact. However you will probably also want to include additional descriptive data.

Common Columns of Data

About

Common Columns of Data

About

EXPT_DESIGN

Needed to be included and filled for the study to be available from Manage Studies

  • RCBD = randomized complete block design

  • RIBD = resolvable incomplete block design

  • RRCD = row-and-rolumn design

  • Augmented = augmented design

  • EGDGN = other design

ENTRY_TYPE

Needed to differentiate test entries from checks

  • T = test entry

  • C = check entry

GID

REQUIRED to match germplasm to the BMS database. Can be found in BMS Manage Germplasm. Open the germplasm list, lock it, and export as .xls file. Importing this list into the BMS is one the required preliminary steps (see above).

DESIGNATION

Human readable germplasm name.

ENTRY_NO

REQUIRED sequential numbering of the germplasm list (1,2,3.....,n).

REP_NO

Sequential numbering of replicates per instance (1,2,3.....,n).

PLOT_NO

Sequential numbering of plots per instance (1,2,3.....,n).

SUB_BLOCK

Sequential numbering of blocks per instance (1,2,3.....,n).

LOCATION_NAME

Human readable location name.

LOCATION_ID

Needed to match locations to the BMS database. Can be exported directly from the database tables or read from the Manage Program Settings>Locations interface,

TRIAL_INSTANCE

REQUIRED sequential numbering of the instances per study (1,2,3.....,n). Generally there is a one-to-one relationship between instance and location, but instance can also represent time. For example, a single location can be associated with two instances of  study that took place over two \years.

PI_NAME

First and last name of a BMS user

PI_NAME_ID

Needed to match users to the BMS database. Can be exported directly from the database tables or read from the description sheet (.xls) of study created by the person of interest.

Simple Data Matching

EXPT_DESIGN

  • The example dataset has a randomized complete block design. Enter "RCBD" into the first empty cell of the column. Copy.

    image-20250110-055427.png
  • Paste to fill all cells of the column with "RCBD".

PI_NAME_ID

  • There is only one person associated with this dataset, McLaren Graham. His PI_NAME_ID in the example BMS is 7. Enter PI_NAME_ID in the first empty column. Copy. Paste to fill all cells of the column with PI_NAME_ID.

    image-20250110-065529.png

Matching with VLOOKUP

In columns where there isn't a one-to-one relationship for data matching you can use a matching array and VLOOKUP to match data.

  • Open a second worksheet. Create matching arrays for the 42 germplasm and their GIDs as well as the 7 TRIAL_INSTANCEs and their Location_IDs.

image-20250110-070024.png

Two data matching arrays. The germplasm array contains columns of data copied from a BMS list export file. The location array was created manually by matching location, instance, and LOCATION_IDs.  LOCATION_IDs were copied from the Manage Program Settings-Locations interface.

GID

  • Return to the dataset worksheet. Develop the VLOOKUP formula needed to retrieve GID based on DESIGNATION. Lock the coordinates for the matching array by inserting $ before each position.


=VLOOKUP(E2,Sheet1!$A$2:$B$43,2,0) Retrieves GID from 2nd column of the matching array based on exact DESIGNATION matches. The dollar signs lock the match array so that the formula can be copied to down the column.

  • Copy the formula and paste down the entire column to match all designations with GID.

image-20250110-070438.png

LOCATION_ID

  • Develop the VLOOKUP formula needed to retrieve LOCATION_ID based on TRIAL_INSTANCE. Lock the coordinates for the matching array by inserting $ before each position.

    image-20250110-070916.png
  • Copy the formula and paste down the entire column to match all TRIAL_INSTANCES with LOCATION_ID.

Remove Formulas & Save

The BMS will reject an .xls file with cells that contain formulas.

  • You must create empty columns for each colum of data that contains formulas. Copy and paste "As Values" all columns of data that contain formulas. Delete the columns that contain formulas.

  • Save the (.xls) file. If you are using a newer version of Excel, you may be asked to confirm the file type.

    image-20250110-070643.png

Start Import

Example File:

  • In Import Datasets, select the file and click Submit.

image-20250110-073407.png
  • Specify all of the study details and select Next.

    image-20250110-073813.png

Review & Save Mapping

The BMS will attempt to map all the headers in the dataset file. Once a dataset has been mapped, the BMS will remember the mapping.

  • Carefully review BMS mapping. If a term is unmatched drag it to the appropriate variable category and match it to the BMS ontology. If a term is mapped incorrectly, use the re-mapping function to find the correct ontology term. If a match term can't be found, data import will need to stop until the term is added to the BMS ontology. Save mapping.

image-20250110-074659.png

The un-mapped variables can be mapped into BMS ontology by dragging the variable into its group. In the example above, drag location name into the Trial Environment Group. Select LOCATION_NAME. Apply Mapping.

image-20250110-075102.png
  • Map PI_NAME to COOPERATOR and PI_NAME_ID to COOPERATOR_ID. Click Save Mapping.

image-20250110-075832.png

 

  • Confirm header mapping.

Import

  • Import observations.

    image-20250110-080223.png
  • Keep or discard out of range values (defined by the ontology).

The study is now available for analysis and queries. Use Manage Studies to open the study.

image-20250110-080330.png

Correct Mapping Errors

Upon saving, the BMS validates mapping and will flag errors.

  • Select Re-do header mapping.

  • Remap to the correct term and apply mapping. Save mapping again.

Import Means Data

  • Click on Import Datasets and Select a File. After choosing the means file, click Submit.

image-20240626-030339.png

Fill out the information for Observation and Study Details. Take note that in the Dataset Type, select “Means data for each trial”. Click Next.

image-20240626-033159.png

Map the Means Variables. In this example, the variable EH_cm_CV was automatically mapped.

image-20240626-034433.png

Click Save Mapping and Click Confirm Header Mapping

image-20240626-034606.png

Click Import Study Book

image-20240626-034647.png