v29 Import Data Set
- 1 About
- 1.1 Preliminary Steps
- 1.2 Example Data
- 2 Data Import Wizard
- 2.1 Excel File Format
- 2.1.1 Simple Data Matching
- 2.1.2 Matching with VLOOKUP
- 2.1.3 Remove Formulas & Save
- 2.2 Start Import
- 2.3 Review & Save Mapping
- 2.4 Import
- 2.4.1 Correct Mapping Errors
- 2.1 Excel File Format
- 3 Import Means Data
About
The dataset importer allows users to upload historical studies from Excel files into the Breeding Management System. Note : There are preliminary steps to before a dataset can be imported.
Preliminary Steps
Dataset import cannot proceed until preliminary steps are completed.
Germplasm represented in the dataset must first be imported into to the BMS and a list created.
All of the phenotypes measured in the dataset must first be defined it the BMS ontology.
All of the locations represented in the dataset must first be defined in the BMS.
All of the people represented in the dataset must first be added as BMS users.
Example Data
The following instructions follow import of an example data set after the completion of required preliminary steps.
2014 CIMMYT Maize Study -
42 maize lines
Evaluated at 7 locations (instances) across Mexico.
(1.) Palmira
(2.) Agua Fria
(3.) Obregon Station
(4.) Palmira
(5.) San Julian
(6.) Tlaltizapan
(7.) Veracruz2 reps per location
Randomized complete block design
5 phenotypes measured
The example dataset: Gold colored columns of data represent those that should be formatted before import. The darker gold columns are BMS-specific indentifiers that must be matched before import.
Data Import Wizard
The Data Import Wizard is the most flexible way to import historical data from trials and nurseries. Some critical identifiers must be perfectly formatted, but other data, like phenotypic names can be mapped to BMS-specific terms. Once a mapping is completed, the BMS "remembers" past mapping to
ease subsequent dataset uploads.
Excel File Format
The Wizard will consider one worksheet of observation data (.xls). Formatting and some data matching is needed for trial design and BMS identifiers. The phenotypes do not require any formatting. At least four columns of data are required to import phenotypic observations: TRIAL_INSTANCE, ENTRY_NO, PLOT_NO and GID. These column headers must be exact. However you will probably also want to include additional descriptive data.
Common Columns of Data | About |
---|---|
EXPT_DESIGN | Needed to be included and filled for the study to be available from Manage Studies
|
ENTRY_TYPE | Needed to differentiate test entries from checks
|
GID | REQUIRED to match germplasm to the BMS database. Can be found in BMS Manage Germplasm. Open the germplasm list, lock it, and export as .xls file. Importing this list into the BMS is one the required preliminary steps (see above). |
DESIGNATION | Human readable germplasm name. |
ENTRY_NO | REQUIRED sequential numbering of the germplasm list (1,2,3.....,n). |
REP_NO | Sequential numbering of replicates per instance (1,2,3.....,n). |
PLOT_NO | Sequential numbering of plots per instance (1,2,3.....,n). |
SUB_BLOCK | Sequential numbering of blocks per instance (1,2,3.....,n). |
LOCATION_NAME | Human readable location name. |
LOCATION_ID | Needed to match locations to the BMS database. Can be exported directly from the database tables or read from the Manage Program Settings>Locations interface, |
TRIAL_INSTANCE | REQUIRED sequential numbering of the instances per study (1,2,3.....,n). Generally there is a one-to-one relationship between instance and location, but instance can also represent time. For example, a single location can be associated with two instances of study that took place over two \years. |
PI_NAME | First and last name of a BMS user |
PI_NAME_ID | Needed to match users to the BMS database. Can be exported directly from the database tables or read from the description sheet (.xls) of study created by the person of interest. |
Simple Data Matching
EXPT_DESIGN
The example dataset has a randomized complete block design. Enter "RCBD" into the first empty cell of the column. Copy.
Paste to fill all cells of the column with "RCBD".
PI_NAME_ID
There is only one person associated with this dataset, McLaren Graham. His PI_NAME_ID in the example BMS is 7. Enter PI_NAME_ID in the first empty column. Copy. Paste to fill all cells of the column with PI_NAME_ID.
Matching with VLOOKUP
In columns where there isn't a one-to-one relationship for data matching you can use a matching array and VLOOKUP to match data.
Open a second worksheet. Create matching arrays for the 42 germplasm and their GIDs as well as the 7 TRIAL_INSTANCEs and their Location_IDs.
Two data matching arrays. The germplasm array contains columns of data copied from a BMS list export file. The location array was created manually by matching location, instance, and LOCATION_IDs. LOCATION_IDs were copied from the Manage Program Settings-Locations interface.
GID
Return to the dataset worksheet. Develop the VLOOKUP formula needed to retrieve GID based on DESIGNATION. Lock the coordinates for the matching array by inserting $ before each position.
=VLOOKUP(E2,Sheet1!$A$2:$B$43,2,0) Retrieves GID from 2nd column of the matching array based on exact DESIGNATION matches. The dollar signs lock the match array so that the formula can be copied to down the column.
Copy the formula and paste down the entire column to match all designations with GID.
LOCATION_ID
Develop the VLOOKUP formula needed to retrieve LOCATION_ID based on TRIAL_INSTANCE. Lock the coordinates for the matching array by inserting $ before each position.
Copy the formula and paste down the entire column to match all TRIAL_INSTANCES with LOCATION_ID.
Remove Formulas & Save
The BMS will reject an .xls file with cells that contain formulas.
You must create empty columns for each colum of data that contains formulas. Copy and paste "As Values" all columns of data that contain formulas. Delete the columns that contain formulas.
Save the (.xls) file. If you are using a newer version of Excel, you may be asked to confirm the file type.
Start Import
In Import Datasets, select the file and click Submit.
Specify all of the study details and select Next.
Review & Save Mapping
The BMS will attempt to map all the headers in the dataset file. Once a dataset has been mapped, the BMS will remember the mapping.
Carefully review BMS mapping. If a term is unmatched drag it to the appropriate variable category and match it to the BMS ontology. If a term is mapped incorrectly, use the re-mapping function to find the correct ontology term. If a match term can't be found, data import will need to stop until the term is added to the BMS ontology. Save mapping.
The un-mapped variables can be mapped into BMS ontology by dragging the variable into its group. In the example above, drag location name into the Trial Environment Group. Select LOCATION_NAME. Apply Mapping.
Map PI_NAME to COOPERATOR and PI_NAME_ID to COOPERATOR_ID. Click Save Mapping.
Confirm header mapping.
Import
Import observations.
Keep or discard out of range values (defined by the ontology).
The study is now available for analysis and queries. Use Manage Studies to open the study.
Correct Mapping Errors
Upon saving, the BMS validates mapping and will flag errors.
Select Re-do header mapping.
Remap to the correct term and apply mapping. Save mapping again.
Import Means Data
Click on Import Datasets and Select a File. After choosing the means file, click Submit.
Fill out the information for Observation and Study Details. Take note that in the Dataset Type, select “Means data for each trial”. Click Next.
Map the Means Variables. In this example, the variable EH_cm_CV was automatically mapped.
Click Save Mapping and Click Confirm Header Mapping
Click Import Study Book