Using the Native Status Resolver (NSR) bulk processing application
To use the NSR in batch mode, you must have an account on nimoy and be authorized to shell in remotely.
The NSR uses published regional checklists to determine if an observation of a species within a region represents a native or introduced individual organism. "Region" is defined as one or more of the standard hierarchy of political divisions: country, state-province and county-parish. The minimum required input for the NSR is therefore a species name and a country. The NSR assumes that both species names and political division names have been standardized prior to processing. Please ensure that all taxon and place names are correct prior to upload, otherwise they will not be resolved.
The NSR works as both a web service (processing one observation at a time) and a batch application (processing multiple observations in a single file). Until I have time to develop a web interface for uploading to the NSR batch application, you will need to upload files directly to the NSR working directory on the NCEAS server nimoy.
STEP-BY-STEP PROCEDURE
- Prepare the input table
- From your data, extract a table or file consisting of all unique taxon + political division combinations within the database.
- Select ONLY records identified at least to species that also have a value for country. All other values are optional.
- If taxon is identified to a variety or subspecies, use only the species part of the name (e.g., for "Poa annua var. supina" use "Poa annua").
- Do not include authors with species names
- Taxon names should be previously standardized using TNRS
- Spelling of political division names should match the plain ASCII (unaccented) English-language format used in the GeoNames database. Among other rules, omit categories from state and county/parish names. For example, use "Guerrero", not "Estado de Guerrero" or "State of Guerrero", for the Mexican state.
- Use blanks (empty string) for unknown or missing values, NOT "NULL", "null" or "NA"
- Fields must be in the following order. Include all fields, even if a field is all blanks.
- Include the headers
Field name | Meaning | Required? |
taxon | Genus + specific epithet, separated by space, no authority* | Yes |
country | Country of observation | Yes |
state_province | State/province of observation | No |
county-parish | Next lower political division of observation (county/parish, etc.). Currently no NSR sources resolve to this level, so any values are ignored. The field must still be present; just leave it blank. | No |
user_id | User-assigned record identifier. MUST be an integer. Leave blank if not used | No |
* You may also enter a family or a genus (without specific epithet) in the taxon column
- Save the input as a UTF-8 plain text file
- File must contain column header as first line
- File type can be CSV (default) or tab-delimited
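The preparation and saving rules above can be sketched in Python roughly as follows. This is only an illustration and not part of the NSR: the function names (species_part, clean, build_rows, write_nsr_input) are hypothetical, and it assumes your source records are available as dictionaries keyed by the field names in the table above. It does not handle hybrid markers or other unusual name formats.

```python
import csv

# Field order required by the NSR input file (from the table above).
FIELDS = ["taxon", "country", "state_province", "county-parish", "user_id"]
# Values that must be written as blanks rather than literal strings.
MISSING = {None, "NULL", "null", "NA"}


def species_part(taxon):
    """Keep only genus + specific epithet, e.g. drop 'var. supina'."""
    return " ".join(taxon.split()[:2])


def clean(value):
    """Turn missing-value markers into an empty string and trim whitespace."""
    return "" if value in MISSING else str(value).strip()


def build_rows(records):
    """Return unique taxon + political division rows, in first-seen order."""
    seen, rows = set(), []
    for rec in records:
        taxon = species_part(clean(rec.get("taxon")))
        country = clean(rec.get("country"))
        # Skip records without a species-level name or a country.
        if len(taxon.split()) < 2 or not country:
            continue
        # county-parish and user_id are left blank in this sketch.
        row = (taxon, country, clean(rec.get("state_province")), "", "")
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


def write_nsr_input(records, path, delimiter=","):
    """Write a UTF-8 file with a header line (pass a tab as delimiter for tab-delimited)."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerow(FIELDS)
        writer.writerows(build_rows(records))
```

A file written this way has Unix line endings, so it would be run with -l=unix and, with the default delimiter, -t=csv.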
- Move input file to NSR data directory
- Server: nimoy.nceas.ucsb.edu
- NSR data directory: /var/www/bien/apps/nsr/data/
- Run the NSR on the input file
- Navigate to the main application directory: /var/www/bien/apps/nsr/
- Execute the following command, where $INPUTFILENAME is the name of your input file. Other options are explained below.
php nsr_batch.php -e=false -i=false -f=$INPUTFILENAME -l=$LINE_ENDINGS -t=$FILE_TYPE -r=false
- Options are as follows (default in bold):
-e: terminal echo on [true,false]
-i: interactive mode [true,false]
-f: input file name [default name = 'nsr_input.csv']
-l: line-endings [unix,mac,win]
-t: file type [csv,tab]
-r: replace the cache [true,false]
- Please use -r=false to retain all previously cached results. Option -r=true is used only when the NSR reference database has changed and previous results may no longer be valid.
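If you call the NSR from a script, the command and its options can be assembled as in the sketch below. The helper names nsr_command and run_nsr are hypothetical. The option letters and allowed values are taken from the list above. The sketch also assumes that nsr_batch.php returns a non-zero exit code on failure, which I have not verified.

```python
import subprocess

LINE_ENDINGS = ("unix", "mac", "win")
FILE_TYPES = ("csv", "tab")


def nsr_command(input_file, line_endings, file_type,
                echo=False, interactive=False, replace_cache=False):
    """Build the argument list for nsr_batch.php from the options above."""
    if line_endings not in LINE_ENDINGS:
        raise ValueError(f"line_endings must be one of {LINE_ENDINGS}")
    if file_type not in FILE_TYPES:
        raise ValueError(f"file_type must be one of {FILE_TYPES}")

    def flag(value):
        # The command line uses lowercase 'true' / 'false'.
        return "true" if value else "false"

    return [
        "php", "nsr_batch.php",
        f"-e={flag(echo)}",
        f"-i={flag(interactive)}",
        f"-f={input_file}",
        f"-l={line_endings}",
        f"-t={file_type}",
        f"-r={flag(replace_cache)}",
    ]


def run_nsr(input_file, line_endings, file_type,
            app_dir="/var/www/bien/apps/nsr/"):
    """Run the batch application from the main application directory."""
    command = nsr_command(input_file, line_endings, file_type)
    # check=True raises CalledProcessError on a non-zero exit code (assumed behaviour).
    return subprocess.run(command, cwd=app_dir, check=True)
```

Keeping replace_cache at its default of False matches the advice to use -r=false unless the reference database has changed.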
- When the NSR has finished running, the results file will be saved to the NSR data directory
- Will have the same base name as your input file, plus the suffix "_nsr_results.txt"
- Is always tab-delimited, regardless of the format of the input file
- Contains all the fields in your original file, plus the following NSR results fields:
Column | Meaning (values) |
native_status_country | Native status in country (see native status values, below) |
native_status_state_province | Native status in state_province, if any (see native status values, below) |
native_status_county_parish | Native status in county_parish, if any (see native status values, below) |
native_status | Overall native status in lowest declared political division (see native status values, below) |
native_status_reason | Reason native status was assigned |
native_status_sources | Checklists used to determine native status |
is_cultivated_in_region | Species is known to be cultivated in declared region (1=cultivated; 0=wild, not cultivated; blank=status unknown) |
- Note that is_cultivated_in_region is NOT the same as the original BIEN2 and BIEN3 isCultivated. The latter =1 if an individual plant is likely human planted, as indicated by the original specimen label or inferred from proximity to a botanical garden. In contrast, is_cultivated_in_region=1 if a given species within a given checklist region has been flagged as commonly cultivated (for example, Capsicum annuum [chile] in Mexico). Although it is likely that an observation with a value of is_cultivated_in_region=1 represents a planted individual, it is possible that it is wild (again, Capsicum annuum in Mexico).
Native status values
Native status code | Meaning |
P | Present in checklist for region of observation but no explicit status assigned |
N | Native to region of observation |
Ne | Native and endemic to region of observation |
A | Absent from all checklists for region of observation |
I | Introduced, as declared in checklist for region of observation |
Ie | Endemic to other region and therefore introduced in region of observation |
UNK | Unknown; no checklists available for region of observation and taxon not endemic elsewhere |
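As a rough illustration of working with the results file, the Python sketch below locates it, reads it, and tallies the overall native status codes. The function names are hypothetical. It assumes that "base name" means the input file name without its extension, and that the results file is UTF-8 with a header line. The short meanings are abbreviated from the table above.

```python
import csv
from collections import Counter
from pathlib import Path

# Abbreviated meanings of the native status codes listed above.
STATUS_MEANINGS = {
    "P": "Present in checklist, no explicit status",
    "N": "Native",
    "Ne": "Native and endemic",
    "A": "Absent from all checklists",
    "I": "Introduced",
    "Ie": "Endemic elsewhere, therefore introduced",
    "UNK": "Unknown",
}


def results_path(input_path):
    """Assumed naming: input name without its extension + '_nsr_results.txt'."""
    path = Path(input_path)
    return path.with_name(path.stem + "_nsr_results.txt")


def read_results(path):
    """Read the tab-delimited results file into a list of dictionaries."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def summarize(rows):
    """Count rows per overall native status, labelled with its meaning."""
    counts = Counter(row["native_status"] for row in rows)
    return {
        f"{code} ({STATUS_MEANINGS.get(code, 'unrecognized code')})": n
        for code, n in counts.items()
    }
```

A quick tally like this is a cheap sanity check before copying results back: a file that is almost entirely UNK, for example, may point to unstandardized country names.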
- Copy the results file back to the source server and update the source database
- I recommend copying the contents of the following columns to your source database:
native_status
native_status_reason
native_status_sources
- I suggest adding prefix "nsr_" to make it very clear where these results come from and to avoid confusion with existing columns:
nsr_native_status
nsr_native_status_reason
nsr_native_status_sources
RECOMMENDATIONS:
- Run a small test file in interactive mode
- This will give you a better idea of how the application works, and will help you to troubleshoot potential issues, such as problems with file format or line endings
- Set interactive mode and echo to on, as follows:
php nsr_batch.php -e=true -i=true -f=$INPUTFILENAME -l=$LINE_ENDINGS -t=$FILE_TYPE -r=false
- Tip: if your results file is empty, you probably (a) used the wrong type of line ending, (b) declared the wrong file type, or (c) used -t=csv with a non-standard CSV format.
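If you are not sure which line endings your file uses, a quick check like the following Python sketch can help you choose the -l value. The function names are hypothetical, and the sketch assumes that "mac" refers to the old Mac convention of a bare carriage return. It only inspects the start of the file, so treat the answer as a guess.

```python
def classify_line_endings(sample):
    """Return 'win', 'mac' or 'unix' for a bytes sample, or None if no line ending is found."""
    crlf = sample.count(b"\r\n")        # Windows: carriage return + line feed
    cr = sample.count(b"\r") - crlf     # bare carriage returns (assumed 'mac')
    lf = sample.count(b"\n") - crlf     # bare line feeds (Unix)
    counts = {"win": crlf, "mac": cr, "unix": lf}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def detect_line_endings(path, sample_size=65536):
    """Guess the -l value from the first bytes of a file."""
    with open(path, "rb") as handle:
        return classify_line_endings(handle.read(sample_size))
```

The file is opened in binary mode on purpose: text mode would translate the line endings before they could be counted.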
- How to ensure a faithful join back to the original records
- NULLs in the original database present challenges for joining results back to the relevant records. This is because you cannot join on a NULL. To circumvent this problem, I recommend the following.
- Prior to completing step 1 above ("Prepare the input table"), extract from the source database a table consisting only of family, genus, species, country, state_province, AND the original primary key (PK) of the observation. This new "linking table" is not a select distinct; it should have exactly as many records as the original observation table.
- The query to construct this table should include a WHERE clause that filters out unwanted records (i.e., no species, no country, etc.)
- Set all NULL values to empty string in the linking table.
- Extract the "input table" (described in step 1) from the linking table, using a SELECT DISTINCT and omitting the PK of the linking table.
- After importing the NSR results back to the database, join them back to the original observation table via the linking table as follows:
- Join the results table to the linking table on country, state_province and species
- Join the linking table to the observation table on the PK
- If you have a better way of doing this, go for it. This is how I do it.
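The linking-table approach described above can be tried out on a toy example with Python's built-in sqlite3 module. Everything in this sketch is illustrative: the table names (observation, linking, nsr_input, nsr_results), the obs_id key, and the status values are invented for the demo, the linking table is reduced to species, country and state_province, and your own database will need its own SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table; NULL marks an unknown state_province.
    CREATE TABLE observation (
        obs_id INTEGER PRIMARY KEY, species TEXT,
        country TEXT, state_province TEXT, nsr_native_status TEXT
    );
    INSERT INTO observation (obs_id, species, country, state_province) VALUES
        (1, 'Poa annua', 'Mexico', 'Guerrero'),
        (2, 'Poa annua', 'Mexico', NULL),
        (3, 'Poa annua', 'Mexico', NULL),
        (4, NULL, 'Mexico', NULL);

    -- Linking table: one row per retained observation, NULLs set to ''.
    CREATE TABLE linking AS
    SELECT obs_id, species, country,
           COALESCE(state_province, '') AS state_province
    FROM observation
    WHERE species IS NOT NULL AND country IS NOT NULL;

    -- Input table: unique combinations, without the primary key.
    CREATE TABLE nsr_input AS
    SELECT DISTINCT species AS taxon, country, state_province FROM linking;

    -- Stand-in for imported NSR results (status values invented for the demo).
    CREATE TABLE nsr_results AS
    SELECT taxon, country, state_province,
           CASE WHEN state_province = '' THEN 'I' ELSE 'N' END AS native_status
    FROM nsr_input;
""")

# Join results -> linking table -> observation table via the primary key.
conn.execute("""
    UPDATE observation
    SET nsr_native_status = (
        SELECT r.native_status
        FROM linking AS l
        JOIN nsr_results AS r
          ON r.taxon = l.species
         AND r.country = l.country
         AND r.state_province = l.state_province
        WHERE l.obs_id = observation.obs_id
    )
""")

rows = conn.execute(
    "SELECT obs_id, nsr_native_status FROM observation ORDER BY obs_id"
).fetchall()
```

Observations 2 and 3 have a NULL state_province in the source table but still receive a result, because the join runs on the empty strings in the linking table. Observation 4 was filtered out of the linking table and keeps a NULL status.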