NSR Batch Mode – Botanical Information and Ecology Network

Using the Native Status Resolver (NSR) bulk processing application

To use the NSR in batch mode, you must have an account on nimoy and be authorized to shell in remotely.

The NSR uses published regional checklists to determine whether an observation of a species within a region represents a native or an introduced individual. "Region" is defined as one or more of the standard hierarchy of political divisions: country, state-province and county-parish. The minimum required input for the NSR is therefore a species name and a country. The NSR assumes that both species names and political division names have been standardized prior to processing. Please ensure that all taxon and place names are correct prior to upload; otherwise they will not be resolved.

The NSR works as both a web service (processing one observation at a time) and a batch application (processing multiple observations in a single file). Until I have time to develop a web interface for uploading to the NSR batch application, you will need to upload files directly to the NSR working directory on the NCEAS server nimoy.

STEP-BY-STEP PROCEDURE

Prepare the input

- From your data, extract a table or file consisting of all unique taxon + political division combinations within the database.
- Select ONLY records identified at least to species that also have a value for country. All other values are optional.
- If a taxon is identified to variety or subspecies, use only the species part of the name (e.g., for "Poa annua var. supina" use "Poa annua").
- Do not include authors with species names
- Taxon names should be previously standardized using TNRS
- Spelling of political division names should match the plain ASCII (unaccented) English-language format used in the GeoNames database. Among other rules, omit category words from state and county/parish names. For example, use "Guerrero", not "Estado de Guerrero" or "State of Guerrero", for the Mexican state.
- Use blanks (empty string) for unknown or missing values, NOT "NULL", "null" or "NA"
- Fields must be in the following order. Include all fields, even if a field is all blanks.
- Include the headers

Field name	Meaning	Required?
taxon	Genus + specific epithet, separated by space, no authority*	Yes
country	Country of observation	Yes
state_province	State/province of observation	No
county-parish	Next lower political division of observation (county/parish, etc.). Currently no NSR sources resolve to this level, so any values are ignored. The field must still be present; just leave it blank.	No
user_id	User-assigned record identifier. MUST be an integer. Leave blank if not used	No

* You may also enter a family or a genus (without specific epithet) in the taxon column
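The preparation rules above can be sketched in Python. This is a minimal illustration, not part of the NSR itself: the helper names (`species_part`, `blank_if_missing`, `prepare_rows`) and the dictionary keys are assumptions, and `species_part` simply keeps the first two words of a name, so it does not handle hybrids or a genus followed by an author.

```python
# Literal strings that must be written as blanks in the NSR input.
NULL_TOKENS = {"NULL", "null", "NA"}


def blank_if_missing(value):
    """Return '' for None or a NULL-like token, else the stripped value."""
    if value is None:
        return ""
    value = value.strip()
    return "" if value in NULL_TOKENS else value


def species_part(name):
    """Keep at most the first two words (genus + specific epithet)."""
    return " ".join(name.split()[:2])


def prepare_rows(records):
    """Build unique (taxon, country, state_province, county-parish, user_id) rows.

    `records` is an iterable of dicts with the hypothetical keys
    'taxon', 'country' and 'state_province'.
    """
    seen = set()
    rows = []
    for rec in records:
        taxon = species_part(blank_if_missing(rec.get("taxon")))
        country = blank_if_missing(rec.get("country"))
        if not taxon or not country:
            continue  # taxon and country are both required
        state = blank_if_missing(rec.get("state_province"))
        row = (taxon, country, state, "", "")  # last two fields left blank
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows
```

Rows are returned in first-seen order, with all five fields present even when blank.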

Save the input as a UTF-8 plain text file

File must contain column header as first line
File type can be CSV (default) or tab-delimited
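As one possible way to produce such a file, the sketch below writes a UTF-8 CSV with the header line first, using Python's standard `csv` module. The function name `write_nsr_input` is hypothetical; the header names are taken from the field table above, and Unix line endings are assumed (matching `-l=unix`).

```python
import csv

# Field names in the required order, as listed in the field table.
HEADER = ["taxon", "country", "state_province", "county-parish", "user_id"]


def write_nsr_input(path, rows):
    """Write rows (sequences of five strings) as a UTF-8 CSV with a header."""
    # newline="" hands control of line endings to the csv writer.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(rows)
```

For a tab-delimited file, passing `delimiter="\t"` to `csv.writer` would be the corresponding change.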

Move input file to NSR data directory

Server: nimoy.nceas.ucsb.edu
NSR data directory: /var/www/bien/apps/nsr/data/

Run the NSR on the input file
- Navigate to the main application directory: /var/www/bien/apps/nsr/
- Execute the following command, where $INPUTFILENAME is the name of your input file. Other options are explained below.

php nsr_batch.php -e=false -i=false -f=$INPUTFILENAME -l=$LINE_ENDINGS -t=$FILE_TYPE -r=false

Options are as follows:

-e: terminal echo on [true,false]

-i: interactive mode [true,false]

-f: input file name [default name = 'nsr_input.csv']

-l: line-endings [unix,mac,win]

-t: file type [csv,tab]

-r: replace the cache [true,false]

Please use -r=false to retain all previously cached results. Option -r=true is used only when the NSR reference database has changed and previous results may no longer be valid.
When the NSR has finished running, the results file will be saved to the NSR data directory.
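If you launch the NSR from a script, a small helper can assemble the command and reject option values outside those listed above. This is only a sketch: `build_nsr_command` is a hypothetical name, and its keyword defaults simply mirror the example command (`-e=false -i=false -r=false`).

```python
import shlex

LINE_ENDINGS = {"unix", "mac", "win"}
FILE_TYPES = {"csv", "tab"}


def build_nsr_command(input_file, line_endings, file_type,
                      echo=False, interactive=False, replace_cache=False):
    """Return the nsr_batch.php command as an argument list."""
    if line_endings not in LINE_ENDINGS:
        raise ValueError(f"line_endings must be one of {sorted(LINE_ENDINGS)}")
    if file_type not in FILE_TYPES:
        raise ValueError(f"file_type must be one of {sorted(FILE_TYPES)}")

    def flag(value):
        return "true" if value else "false"

    return [
        "php", "nsr_batch.php",
        f"-e={flag(echo)}",
        f"-i={flag(interactive)}",
        f"-f={input_file}",
        f"-l={line_endings}",
        f"-t={file_type}",
        f"-r={flag(replace_cache)}",
    ]


cmd = build_nsr_command("nsr_input.csv", "unix", "csv")
print(shlex.join(cmd))
# prints: php nsr_batch.php -e=false -i=false -f=nsr_input.csv -l=unix -t=csv -r=false
```

On the server, the list could then be passed to `subprocess.run(cmd, cwd="/var/www/bien/apps/nsr/", check=True)`.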

Results file

Will have the same base name as your input file, plus the suffix "_nsr_results.txt"
Is always tab-delimited, regardless of the format of the input file
Contains all the fields in your original file, plus the following NSR results fields:

Column	Meaning (values)
native_status_country	Native status in country (see native status values, below)
native_status_state_province	Native status in state_province, if any (see native status values, below)
native_status_county_parish	Native status in county_parish, if any (see native status values, below)
native_status	Overall native status in lowest declared political division (see native status values, below)
native_status_reason	Reason native status was assigned
native_status_sources	Checklists used to determine native status
is_cultivated_in_region	Species is known to be cultivated in declared region (1=cultivated; 0=wild, not cultivated; blank=status unknown)

Note that is_cultivated_in_region is NOT the same as the original BIEN2 and BIEN3 isCultivated. The latter is set to 1 if an individual plant was likely planted by humans, as indicated by the original specimen label or inferred from proximity to a botanical garden. In contrast, is_cultivated_in_region=1 if a given species within a given checklist region has been flagged as commonly cultivated (for example, Capsicum annuum [chile] in Mexico). Although it is likely that an observation with is_cultivated_in_region=1 represents a planted individual, it is possible that it is wild (again, Capsicum annuum in Mexico).

Native status values

Native status code	Meaning
P	Present in checklist for region of observation but no explicit status assigned
N	Native to region of observation
Ne	Native and endemic to region of observation
A	Absent from all checklists for region of observation
I	Introduced, as declared in checklist for region of observation
Ie	Endemic to other region and therefore introduced in region of observation
UNK	Unknown; no checklists available for region of observation and taxon not endemic elsewhere
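A quick way to check a results file is to tally the native_status column and translate each code using the table above. The sketch below assumes only that the file is tab-delimited with a header line containing native_status; the function name and the sample rows (placeholder taxa, not real NSR output) are made up for illustration.

```python
import csv
import io
from collections import Counter

# Short meanings paraphrased from the native status table.
STATUS_MEANING = {
    "P": "present, no explicit status",
    "N": "native",
    "Ne": "native and endemic",
    "A": "absent from all checklists",
    "I": "introduced",
    "Ie": "endemic elsewhere, therefore introduced",
    "UNK": "unknown",
}


def summarize_native_status(handle):
    """Map each status code found to (meaning, number of rows)."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row["native_status"] for row in reader)
    return {
        code: (STATUS_MEANING.get(code, "unrecognized code"), n)
        for code, n in counts.items()
    }


# Made-up rows standing in for a results file.
sample = io.StringIO(
    "taxon\tcountry\tnative_status\n"
    "Aus bus\tMexico\tN\n"
    "Aus cus\tMexico\tI\n"
    "Aus dus\tMexico\tN\n"
)
summary = summarize_native_status(sample)
print(summary["N"])  # ('native', 2)
```

With a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the handle to the function.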

Copy the results file back to the source server and update the source database

I recommend copying the contents of the following columns to your source database:

native_status

native_status_reason

native_status_sources

I suggest adding prefix "nsr_" to make it very clear where these results come from and to avoid confusion with existing columns:

nsr_native_status

nsr_native_status_reason

nsr_native_status_sources
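If the results are handled in a script before loading, the renaming can be done in one step. A minimal sketch, assuming each result row is available as a dictionary keyed by column name (the function name `prefixed_result` is hypothetical):

```python
# Result columns recommended for copying back to the source database.
KEEP = ("native_status", "native_status_reason", "native_status_sources")


def prefixed_result(row):
    """Keep only the recommended columns, renamed with an 'nsr_' prefix."""
    return {"nsr_" + name: row.get(name, "") for name in KEEP}
```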

RECOMMENDATIONS:

Run a small test file in interactive mode

This will give you a better idea of how the application works and will help you troubleshoot potential issues, such as problems with file format or line endings.
Turn on both interactive mode and echo, as follows:

php nsr_batch.php -e=true -i=true -f=$INPUTFILENAME -l=$LINE_ENDINGS -t=$FILE_TYPE -r=false

Tip: if your results file is empty, you probably (a) used the wrong type of line ending, (b) declared the wrong file type, or (c) used -t=csv with a non-standard CSV format.

How to ensure a faithful join back to the original records

NULLs in the original database present challenges for joining results back to the relevant records. This is because you cannot join on a NULL. To circumvent this problem, I recommend the following.
Prior to completing step 1 above ("Prepare the input"), extract from the source database a table consisting only of family, genus, species, country, state_province, AND the original primary key (PK) of the observation. This new "linking table" is not a SELECT DISTINCT; it should have exactly one record for each observation you keep.
The query to construct this table should include a WHERE clause that filters out unwanted records (e.g., those with no species or no country).
Set all NULL values to empty string in the linking table.
Extract the "input table" (described in step 1) from the linking table, using a SELECT DISTINCT and omitting the PK of the linking table.
After importing the NSR results back to the database, join it back to the original observation table via the linking table as follows:
- Join the results table to the linking table on country, state_province and species
- Join the linking table to the observation table on the PK
If you have a better way of doing this, go for it. This is how I do it.
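The linking-table procedure can be tried end to end with Python's built-in `sqlite3` module. Everything in this sketch is illustrative: the table and column names are hypothetical, family and genus are omitted for brevity, the taxa are placeholders, and the `nsr_result` rows are made-up stand-ins for imported NSR output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE observation (
    id INTEGER PRIMARY KEY, species TEXT, country TEXT, state_province TEXT);
INSERT INTO observation VALUES
    (1, 'Aus bus', 'Mexico', 'Guerrero'),
    (2, 'Aus bus', 'Mexico', NULL),
    (3, NULL,      'Mexico', NULL);

-- Linking table: one row per kept observation, NULLs set to empty string.
CREATE TABLE linking AS
SELECT id AS obs_id, species, country,
       COALESCE(state_province, '') AS state_province
FROM observation
WHERE species IS NOT NULL AND country IS NOT NULL;

-- Stand-in for the NSR results after import (made-up status values).
CREATE TABLE nsr_result (
    species TEXT, country TEXT, state_province TEXT, native_status TEXT);
INSERT INTO nsr_result VALUES
    ('Aus bus', 'Mexico', 'Guerrero', 'N'),
    ('Aus bus', 'Mexico', '',         'P');
""")

# Input table for the NSR: distinct combinations, without the PK.
input_rows = con.execute("""
    SELECT DISTINCT species, country, state_province
    FROM linking
    ORDER BY state_province
""").fetchall()

# Join results back to observations through the linking table.
joined = con.execute("""
    SELECT o.id, r.native_status
    FROM observation AS o
    JOIN linking AS l ON l.obs_id = o.id
    JOIN nsr_result AS r
      ON r.species = l.species
     AND r.country = l.country
     AND r.state_province = l.state_province
    ORDER BY o.id
""").fetchall()
print(joined)  # [(1, 'N'), (2, 'P')]
```

Observation 2 has a NULL state_province in the source table, yet it still receives a result because the linking table stores an empty string, which can be matched in a join.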