{"id":1453,"date":"2012-11-22T09:08:36","date_gmt":"2012-11-22T17:08:36","guid":{"rendered":"http:\/\/bien.nceas.ucsb.edu\/bien\/?page_id=1453"},"modified":"2012-11-27T08:47:51","modified_gmt":"2012-11-27T16:47:51","slug":"validation","status":"publish","type":"page","link":"https:\/\/bien.nceas.ucsb.edu\/bien\/biendata\/previous-bien-versions\/bien-3\/validation\/","title":{"rendered":"Validations"},"content":{"rendered":"<p>As with\u00a0<a title=\"BIEN 2\" href=\"http:\/\/bien.nceas.ucsb.edu\/bien\/bien-database-summary-statistics\/bien-2\/validations\/\">BIEN 2<\/a>, data within the BIEN 3 database are subjected to a number of validations and standardizations, including taxonomic name resolution, geographic name resolution, geovalidation and application of a standard higher taxonomic classification.\u00a0<a href=\"http:\/\/bien.nceas.ucsb.edu\/bien\/bien-database-summary-statistics\/bien-2\/validations\/\">All validations and standardizations used in BIEN 2<\/a>\u00a0have been applied to BIEN 3, with major algorithmic improvements, particularly for geographic name resolution and geovalidation.<\/p>\n<ul>\n<li><strong>Taxonomic name resolution<\/strong>\n<ul>\n<li>Taxon names standardized according to\u00a0<a title=\"Tropicos\" href=\"http:\/\/www.tropicos.org\">Tropicos<\/a>\u00a0using the external Taxonomic Name Resolution Service (<a title=\"TNRS\" href=\"http:\/\/tnrs.iplantcollaborative.org\/\" target=\"_blank\">TNRS<\/a>)<\/li>\n<li>Family classification standardized according to\u00a0<a title=\"APG III\" href=\"http:\/\/en.wikipedia.org\/wiki\/APG_III_system\">APG III<\/a><\/li>\n<\/ul>\n<\/li>\n<li><strong>Geographic name resolution<\/strong>\n<ul>\n<li>Required for geovalidation<\/li>\n<li>Asserted names of the three political divisions (country, state\/province and county\/parish) are translated to standard <a title=\"GADM database of Global Administrative Areas\" href=\"http:\/\/www.gadm.org\/\" target=\"_blank\">GADM<\/a> names<\/li>\n<li>Country names 
are standardized first, then states within countries, then counties within states<\/li>\n<li>Standardization consists of various steps, including:\n<ul>\n<li>Converting unconverted UTF-8 and extended ASCII codes<\/li>\n<li>Converting alternative codes (for example, two-character ISO codes for countries) to actual names<\/li>\n<li>Matching against both accented and plain ASCII versions of names<\/li>\n<li>Lookups against tables of synonyms and alternative names in multiple languages from the <a title=\"GeoNames\" href=\"http:\/\/www.geonames.org\/\" target=\"_blank\">GeoNames database<\/a><\/li>\n<\/ul>\n<\/li>\n<li>Scripts by Jim Regetz<\/li>\n<\/ul>\n<\/li>\n<li><strong>Geovalidation<\/strong>\n<ul>\n<li>\"Geovalidation\" as used here means checking that the latitude and longitude of a taxon observation fall within its declared political divisions.<\/li>\n<li>BIEN 3 uses a completely new geovalidation pipeline, developed by Jim Regetz<\/li>\n<li>The pipeline runs in PostGIS\/PostgreSQL, thereby taking advantage of Postgres's ability to natively execute spatial joins<\/li>\n<li>Political division spatial data from the\u00a0<a title=\"GADM\" href=\"http:\/\/www.gadm.org\/\" target=\"_blank\">GADM database of Global Administrative Areas<\/a>.<\/li>\n<li>Optimizations include simplification of political division boundaries using the\u00a0PostGIS implementation of the\u00a0<a href=\"http:\/\/en.wikipedia.org\/wiki\/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm\">Douglas-Peucker algorithm<\/a><\/li>\n<li>Geovalidation of all 1,707,970 unique localities within the entire BIEN 3 database to the level of county\/parish takes about 2 hours (compared with several weeks in BIEN 2 to validate to state level only).<\/li>\n<\/ul>\n<\/li>\n<li><strong>Counts of individuals per species per plot<\/strong>\n<ul>\n<li>Aggregation of individuals and counts of abundance per species for individuals-based\u00a0plots<\/li>\n<\/ul>\n<\/li>\n<li><strong>Combining of plots and 
specimens<\/strong>\n<ul>\n<li>Observations from both plots and specimens are combined into a single table of georeferenced taxon occurrences<\/li>\n<\/ul>\n<\/li>\n<li><strong>Detection and flagging of suspected cultivated specimens<\/strong>\n<ul>\n<li>Uses original cultivated flags, if any, plus algorithms based on (a) keywords in locality description (e.g., \"cultivated\", \"planted\", \"garden\"), (b) known distributions of specific higher taxa (e.g., no pines south of Nicaragua), and (c) proximity to locations of herbaria and botanical gardens.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Designation of major higher taxon for each species<\/strong>\n<ul>\n<li>All nodes of the\u00a0<a title=\"NCBI Taxonomy\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/taxonomy\" target=\"_blank\">NCBI Taxonomy<\/a>\u00a0phylogenetic tree are included in BIEN 3<\/li>\n<li>Each observation in BIEN 3 is joined to the NCBI phylogenetic backbone by family (using APG III families returned by the TNRS during name resolution; see <strong>Taxonomic name resolution<\/strong>, above)<\/li>\n<li>Ancestor lookups are used to populate the column `higherPlantGroup`, which provides convenient categories of major higher taxa<\/li>\n<li>Values of higherPlantGroup: \"bryophytes\", \"ferns and allies\", \"Flowering plants\", \"gymnosperms (conifers)\", \"gymnosperms (non-conifer)\"<\/li>\n<li>Embryophytes (land plants) have a non-null value of higherPlantGroup; non-embryophytes are null.<\/li>\n<li>Other more detailed or custom breakdowns are possible by querying the phylogenetic backbone directly<\/li>\n<\/ul>\n<\/li>\n<li><strong>Normalization and indexing of data sources<\/strong>\n<ul>\n<li>Metadata pertaining to data sources and data ownership are\u00a0linked to the observations they provide<\/li>\n<li>Enables dataset-level application of access rules and proper attribution of sources<\/li>\n<\/ul>\n<\/li>\n<li><strong>Standardization of plot methodology metadata<\/strong>\n<ul>\n<li>Standardization of 
unconstrained vocabulary pertaining to plot methodology enables more reliable selection of inventories collected using standard methodologies (for example, \"0.1 ha transect, &gt;=2.5 cm dbh\", \"1 ha plot, &gt;=10 cm dbh\").<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>As with\u00a0BIEN 2, data within the BIEN 3 database are subjected to a number of validations and standardizations, including taxonomic name resolution, geographic name resolution, geovalidation and application of a standard higher taxonomic classification.\u00a0All validations and standardizations used in BIEN 2\u00a0have been applied to BIEN 3, with major algorithmic improvements, particularly for geographic name resolution [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":1385,"menu_order":10,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"class_list":["post-1453","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/bien.nceas.ucsb.edu\/bien\/wp-json\/wp\/v2\/pages\/1453"}],"collection":[{"href":"https:\/\/bien.nceas.ucsb.edu\/bien\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/bien.nceas.ucsb.edu\/bien\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/bien.nceas.ucsb.edu\/bien\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/bien.nceas.ucsb.edu\/bien\/wp-json\/wp\/v2\/comments?post=1453"}],"version-history":[{"count":8,"href":"https:\/\/bien.nceas.ucsb.edu\/bien\/wp-json\/wp\/v2\/pages\/1453\/revisions"}],"predecessor-version":[{"id":1755,"href":"https:\/\/bien.nceas.ucsb.edu\/bien\/wp-json\/wp\/v2\/pages\/1453\/revisions\/1755"}],"up":[{"embeddable":true,"href":"https:\/\/bien.nceas.ucsb.edu\/bien\/wp-json\/wp\/v2\/pages\/1385"}],"wp:attachment":[{"href":"https:\/\/bien.nceas.ucsb.edu\/bien\/wp-json\/wp\/v2\/media?parent=1453"}],"curies":[{"name":"wp","href":"https:\
/\/api.w.org\/{rel}","templated":true}]}}