The following validations and standardizations were performed to improve the quality and utility of data in BIEN 2:
- Taxonomic name resolution
- Taxon names standardized according to Tropicos using the external Taxonomic Name Resolution Service (TNRS)
- Family classification standardized according to APG III
- Due to shortcomings of Tropicos's Computed Acceptance algorithm, not all synonymous names have been resolved. In particular, many slight spelling variants remain in the BIEN 2 database. See Caveat #3.
- Geographic name resolution
- Political division names (country, state/province, county/parish) are standardized where possible
- A required preparation for geovalidation
- Scripts by Brad Boyle, with assistance from John Donoghue
- Geovalidation
- Uses a point-in-polygon method to check whether coordinates fall within the asserted political divisions
- All observations validated at country level
- US, Mexico and Brazil validated at state/province level
- Distance error is provided for points falling outside the declared political division
- Scripts by John Donoghue
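
The point-in-polygon check described for geovalidation can be illustrated with a minimal ray-casting sketch. This is not the BIEN 2 script itself: the function name, the rectangular `toy_country` boundary and the planar treatment of longitude/latitude are all assumptions made for illustration, and the distance-error calculation is not shown.

```python
def point_in_polygon(lon, lat, polygon):
    """Return True if (lon, lat) lies inside `polygon` (ray-casting test).

    `polygon` is a list of (lon, lat) vertices; the last vertex is
    implicitly joined back to the first. Coordinates are treated as
    planar, which is a simplification for this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings strictly to the east of the point
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangle standing in for a declared country boundary
toy_country = [(-10.0, 0.0), (10.0, 0.0), (10.0, 20.0), (-10.0, 20.0)]

inside_ok = point_in_polygon(0.0, 10.0, toy_country)    # point within the rectangle
outside_ok = point_in_polygon(15.0, 10.0, toy_country)  # point east of the rectangle
```

An odd number of edge crossings means the point is inside the polygon; an observation whose point falls outside its declared division would fail validation at that level.
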
- Counts of individuals per species per plot
- Aggregation of individuals and counts of abundance implemented during building of core database table PlotAggregateFact (by students of Steve Dolins et al.)
- This aggregation appears to have been done incorrectly, so abundance values in BIEN 2 cannot be used (see Caveats)
- Detection and flagging of suspected cultivated specimens
- Uses original cultivated flags, if any, plus algorithms based on (a) key words in the locality description (e.g., "cultivated", "planted", "garden"), (b) known distributions of specific higher taxa (e.g., no pines south of Nicaragua), and (c) proximity to locations of herbaria and botanical gardens.
- Scripts by Brad Boyle
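
Criterion (a) above, key words in the locality description, might look roughly like the following sketch. The function name, the `original_flag` parameter and the exact matching rule (case-insensitive, anchored at the start of a word) are assumptions. Only the three example key words come from the text, and criteria (b) and (c) are not covered.

```python
import re

# Example key words taken from the description above; the real list is likely longer
CULTIVATED_KEYWORDS = ("cultivated", "planted", "garden")

# Match a key word at the start of a word, so "gardens" matches
# but "transplanted" does not (an assumed matching rule)
_KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(CULTIVATED_KEYWORDS) + r")", re.IGNORECASE
)


def is_suspected_cultivated(locality, original_flag=None):
    """Flag a specimen as possibly cultivated.

    An original cultivated flag from the data provider, if set, is
    honoured first; otherwise the locality text is scanned for key words.
    """
    if original_flag:
        return True
    if not locality:
        return False
    return _KEYWORD_PATTERN.search(locality) is not None
```

For example, `is_suspected_cultivated("Botanical Gardens, Kew")` flags the record, while a locality such as `"Primary forest, 200 m"` is left unflagged.
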
- Designation of major higher taxon for each species
- All nodes of the NCBI Taxonomy phylogenetic tree are included in BIEN 2
- Each taxon is indexed using modified pre-order tree traversal, allowing rapid lookup of all descendants or all ancestors of a given node.
- Each observation in BIEN 2 is joined to the NCBI phylogenetic backbone by family and genus
- Ancestor lookups of NCBI nodes are used to populate column `higherPlantGroup`, which provides a convenient shortcut for major higher taxa
- Values of `higherPlantGroup`: "bryophytes", "ferns and allies", "Flowering plants", "gymnosperms (conifers)", "gymnosperms (non-conifer)"
- Embryophytes (land plants) have a non-null value of `higherPlantGroup`; non-embryophytes have a null value.
- Scripts by Brad Boyle
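
Modified pre-order tree traversal, as used for the taxon index above, can be sketched as follows. Each node receives a left and a right number during a depth-first walk, after which ancestors and descendants are found by simple range comparisons. The function names and the tiny `toy_tree` (which skips most taxonomic ranks) are hypothetical and do not reflect the actual BIEN 2 schema or the full NCBI tree.

```python
def assign_mptt(tree, root):
    """Assign (left, right) numbers to every node via a depth-first walk.

    `tree` maps each parent to a list of its children. Recursion keeps
    the sketch short; a very deep tree would call for an iterative walk.
    """
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter          # number handed out on the way down
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        bounds[node] = (left, counter)  # right number handed out on the way up
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    """All nodes whose interval lies strictly inside the interval of `node`."""
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if l > left and r < right)


def ancestors(bounds, node):
    """All nodes whose interval strictly encloses `node`, root first."""
    left, right = bounds[node]
    found = [n for n, (l, r) in bounds.items() if l < left and r > right]
    return sorted(found, key=lambda n: bounds[n][0])


# Hypothetical toy tree; real NCBI lineages contain many more ranks
toy_tree = {
    "Embryophyta": ["Bryophyta", "Tracheophyta"],
    "Tracheophyta": ["Pinaceae", "Poaceae"],
    "Pinaceae": ["Pinus"],
    "Poaceae": ["Poa"],
}
bounds = assign_mptt(toy_tree, "Embryophyta")
```

With this index, `ancestors(bounds, "Pinus")` returns `["Embryophyta", "Tracheophyta", "Pinaceae"]` without walking the tree. A lookup of this kind could be used to find which major higher taxon sits above a given family or genus. In a relational database, the same comparisons become range conditions on indexed left and right columns.
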
- Normalization and indexing of data sources
- Metadata pertaining to data sources and data ownership were scattered throughout the original BIEN database
- These have been collated into a single table, with sources linked to the observations they provided
- Querying this table allows proper attribution of all data used
- Scripts by Brad Boyle
- Improved plot methodology metadata
- Variable, unconstrained descriptions of plot methodology have been standardized in the columns `plotAreaHa`, `plotMinDbh` and `plotMethod`, allowing more reliable recognition of plots that use a similar methodology (for example, "0.1 ha transect", "1 ha plot").