An R package for importing and exporting vegetation data using the Veg-X standard
By Miquel De Cáceres, Sebastian Schmidtlein and Susan Wiser
There is currently an enormous amount of data about the vegetation found around the world. In many cases these are data from vegetation plots, established for specific studies or as part of large-scale collective observation efforts, such as floristic or forest inventories (Wiser, 2016). As of March 2018, the Global Index of Vegetation-Plot Databases (GIVD; www.givd.info) (Dengler et al., 2011) contained information about 260 vegetation databases. However, most of these (and others for personal use not included in GIVD) have their own vocabularies and definitions of fields and data, and hence their own ‘idiosyncratic structures’. This is not an issue for studies drawing data solely from large vegetation-plot databases (e.g. VegBank, EVA, sPlot), as these require researchers to understand only a single data structure. Furthermore, some international plot networks have alleviated data exchange problems by sharing the same field sampling protocols and database structures (Condit et al., 2014). However, in the many cases where a researcher wants to integrate a range of different data sources for new analyses, disparate structures and field protocols have to be unified, which is a time-consuming, error-prone and frustrating task. For biodiversity observations (e.g. from herbarium records), the Darwin Core data exchange standard (http://rs.tdwg.org/dwc/) has been key to successful data harmonization and reuse in combination with GBIF (www.gbif.org). Clearly, analyses requiring the integration of vegetation data from several sources are currently hampered because there is neither an internationally recognized data exchange standard (analogous to Darwin Core) nor corresponding tools to easily use such a standard.
In 2003, the IAVS Ecoinformatics Working Group (EWG) decided to promote the development of a standard to exchange vegetation-plot data. An international collaboration produced a draft data exchange standard, Veg-X, implemented as an XML (Extensible Markup Language) schema. Veg-X was designed to be compatible with the most commonly used vegetation databases (Wiser et al., 2011). The standard distinguishes between the observed entity (e.g. a tree or taxon) and the act of observation (i.e. the measurements), and can accommodate both data from individual plants (e.g. tree diameters and heights) and abundance data characterizing a taxon (e.g. the percent cover of a species), including the position of plants in vertical layers. The standard also supports the repeated sampling of organisms and plots, and the grouping of observations by non-temporal criteria (e.g. by defining sub-plots). All elements of the XML schema are clearly defined to facilitate interoperability by mapping fields from source data structures to Veg-X.
Fig 1. Main Veg-X (ver. 2.0) elements and their logical relationships. Arrows indicate that an identifier of the origin element is referenced in the destination element. Accompanying numbers indicate the number of instances of the origin element that are allowed to be referenced in the destination element. Observations are in tinted boxes.
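The separation between observed entities and acts of observation, and the idea of mapping source fields onto a common structure, can be illustrated with a small language-neutral sketch. The Python code below is only an illustration under stated assumptions: the class `PlotDocument`, its fields, the column names and the example rows are all hypothetical, and they reproduce neither the Veg-X schema elements nor the functions of the VegX R package. Each plot and taxon is registered once and receives an identifier, and every observation then refers to those identifiers, so a repeated visit to a plot adds observations without duplicating entities.

```python
from dataclasses import dataclass, field


@dataclass
class PlotDocument:
    """Hypothetical container (not the Veg-X schema or the VegX package API)."""
    plots: dict = field(default_factory=dict)          # plot name -> identifier
    taxa: dict = field(default_factory=dict)           # taxon name -> identifier
    observations: list = field(default_factory=list)   # one dict per act of observation

    @staticmethod
    def _identifier(registry, name, prefix):
        # Register an observed entity the first time it appears,
        # then reuse the same identifier on later appearances.
        if name not in registry:
            registry[name] = f"{prefix}{len(registry) + 1}"
        return registry[name]

    def add_cover_row(self, row, mapping):
        # `mapping` says which source column holds each piece of information,
        # so differently structured source tables can feed the same structure.
        self.observations.append({
            "plot_id": self._identifier(self.plots, row[mapping["plot"]], "plot"),
            "taxon_id": self._identifier(self.taxa, row[mapping["taxon"]], "taxon"),
            "date": row[mapping["date"]],
            "cover_percent": float(row[mapping["cover"]]),
        })


# Invented example rows: one plot visited twice, with percent cover per species.
source_rows = [
    {"PlotCode": "A1", "Species": "Quercus ilex", "Visit": "2016-06-01", "Cov": "40"},
    {"PlotCode": "A1", "Species": "Pinus halepensis", "Visit": "2016-06-01", "Cov": "15"},
    {"PlotCode": "A1", "Species": "Quercus ilex", "Visit": "2018-06-01", "Cov": "45"},
]
column_mapping = {"plot": "PlotCode", "taxon": "Species", "date": "Visit", "cover": "Cov"}

doc = PlotDocument()
for row in source_rows:
    doc.add_cover_row(row, column_mapping)

# Counts of plots, taxa and observations held in the document.
print(len(doc.plots), len(doc.taxa), len(doc.observations))
```

A source table with different column names would only need a different `column_mapping`; the resulting structure would stay the same, which is the kind of interoperability the mapping of fields is meant to provide.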
One of the impediments to the adoption of Veg-X has been its complexity, which is required to accommodate the wide variety of ways in which vegetation plot data are collected and stored. An analogous situation exists for the Ecological Metadata Language (https://knb.ecoinformatics.org/), whose sophistication has been a barrier to use. To ensure that the Veg-X standard is adopted by a large community of users, it is important to develop tools that facilitate interoperability, the integration of documents, the harmonization of units, etc. Thanks to support from IAVS, in 2018 we began to develop an R package to perform these tasks. It contains functions to import, integrate, harmonize and export vegetation data using the Veg-X standard. The development of the package, inventively also called VegX, has been carried out in parallel with a major revision of the standard itself to simplify it. The files that make up the Veg-X XML schema (version 2.0) and the R package VegX can be downloaded from a GitHub repository (https://github.com/miquelcaceres/VegX). A detailed description of the schema and a user manual for the package can be found at https://miquelcaceres.github.io/VegX/.
Although it is already functional, the VegX package is at a development stage where it is necessary to confirm its usefulness for importing and integrating different types of data sources. We therefore appeal to all vegetation scientists who are interested in testing the functions of the package to try importing your own data (e.g. from Excel spreadsheets or ASCII text files). If you would like to help, you can send us example datasets and R scripts, as well as descriptions of any problems or doubts that arose while testing the data import, either directly via email or by posting issues on the GitHub site. Example datasets can be very small, since we are only interested in their data structure. This information will greatly help us improve these tools.
Veg-X is mainly intended to ease direct data exchange, but it could also facilitate data access on public repositories such as figshare (https://figshare.com/) or DRYAD (https://datadryad.org/) as a complement to more centralized structures such as the large vegetation plot databases VegBank, EVA, etc. (which could then be fed by the repositories). Together, the ‘grassroots’ public repositories and the more administered databases could further the long-term objective of the Ecoinformatics Working Group: to ensure that vegetation data from scientific studies are archived and available for reuse. To promote the use of public repositories, we are also considering developing interactive web applications in R Shiny (https://shiny.rstudio.com/), which would use the VegX package internally. Furthermore, software tools for collecting field data can include Veg-X as an export format. One such tool, an Android app called Vegapp, is about to be published on Google Play. Such applications facilitate the transformation of data into the Veg-X standard for users who are unfamiliar with R. We believe that widespread adoption of Veg-X and the tools mentioned above (the XML schema, R package, documentation, web-based applications and data collection tools with Veg-X export) could contribute substantially to the exchange, harmonization and reuse of vegetation plot data.
References
Condit, R., Lao, S., Singh, A., Esufali, S., Dolins, S. 2014. Data and database standards for permanent forest plots in a global network. Forest Ecology and Management 316: 21-31.
Dengler, J., Jansen, F., Glöckler, F., Peet, R.K., De Cáceres, M., Chytrý, M., Ewald, J. et al. 2011. The Global Index of Vegetation-Plot Databases (GIVD): a new resource for vegetation science. Journal of Vegetation Science 22: 582-597.
Wiser, S.K. 2016. Achievements and challenges in the integration, reuse and synthesis of vegetation plot data. Journal of Vegetation Science 27: 868-879.
Wiser, S.K., Spencer, N., De Cáceres, M., Kleikamp, M., Boyle, B., Peet, R.K. 2011. Veg-X – an exchange standard for plot-based vegetation data. Journal of Vegetation Science 22: 598-609.