New opportunity for publication of species sampling and monitoring data

Changes to the Darwin Core vocabulary standard support improved discovery of and access to sample-based data on GBIF.org.

Basking survey
Ashley Ballou, biologist with the Florida Fish and Wildlife Conservation Commission, holds up Sternotherus odoratus, a common musk turtle, during a field survey. Photo licensed under CC BY-ND.

The international body responsible for maintaining standards for the exchange of biological data has ratified changes that improve support for mobilizing and accessing sample-based species data through the GBIF network.

By adding to the rich set of terms already available in the Darwin Core (DwC) standard, this action by Biodiversity Information Standards (also known as TDWG) will help GBIF-mediated data move beyond “presence only” data. It will support the discovery and application of richer, more quantitative information used in other areas of scientific discovery and research, particularly ecological monitoring and assessment.

Sample-based data come from thousands of different kinds of environmental, ecological, and natural resource investigations. These events range from one-off surveys to ongoing monitoring and include activities like freshwater and marine sampling, plant cover and vegetation plots, and citizen science bird counts, among others.

In addition to bringing in new datasets, these changes could also improve the quality and utility of many datasets already published through GBIF, which derive from the more complex sources required to understand how species populations change across space and time.

“The Darwin Core extension for sample data is a major advancement for the global biodiversity community,” said Henrique Pereira, chair of GEO BON. “Monitoring biodiversity change often requires repeated measures at the same place. This extension will enable data holders publishing through the GBIF network to share population abundance data (including time series population data) or presence/absence data, and also to document the sampling protocol.”

Because of their quantitative and calibrated nature and precisely described methods, sample-based data are better at detecting changes and trends in populations than the ‘presence-only’ observation and collection occurrences that make up much of today’s open-access biodiversity data. As a result, sample-based data are critical to understanding the prospect and pace of widespread global change.

The challenge in sharing sample-based data has been that the underlying data are often complex and difficult to encode in a relatively simplified data model. But over the past two years, the GBIF Secretariat has been working with EU BON partners and the wider community to identify additional terms for the Darwin Core vocabulary and enable support of sample-based data in the Integrated Publishing Toolkit (IPT). The aim is to demonstrate a new means of exposing datasets to maximize their discoverability and reuse, rather than prescribing methods for the capture or modeling of data. The newly ratified Darwin Core terms will support mobilization of data for GEO BON’s Essential Biodiversity Variables (EBVs), in particular the ‘species populations’ class. EBVs function as an intermediate layer between ‘raw’ data and the indicators established by governments under the CBD, and may also support assessments carried out for IPBES.
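
As a rough illustration of what sample-based data can look like, the sketch below links a single sampling event to the occurrences recorded during it. The field names (eventID, samplingProtocol, sampleSizeValue, sampleSizeUnit, organismQuantity, organismQuantityType, occurrenceStatus) are Darwin Core terms, but this article does not list which terms were ratified, so their selection here is an assumption. All values, the second species, and the helper function quantity_per_unit_effort are hypothetical and not taken from any published dataset or from the IPT.

```python
# Hypothetical sampling event; every value here is invented for illustration.
event = {
    "eventID": "survey-001",
    "samplingProtocol": "basking survey",
    "sampleSizeValue": 2.0,        # amount of sampling effort
    "sampleSizeUnit": "kilometre", # unit in which that effort is measured
}

# Occurrences point back to the event through a shared eventID.
occurrences = [
    {
        "eventID": "survey-001",
        "scientificName": "Sternotherus odoratus",
        "organismQuantity": 6,
        "organismQuantityType": "individuals",
        "occurrenceStatus": "present",
    },
    {
        "eventID": "survey-001",
        "scientificName": "Apalone ferox",
        "organismQuantity": 0,
        "organismQuantityType": "individuals",
        "occurrenceStatus": "absent",
    },
]


def quantity_per_unit_effort(event, occurrences):
    """Return organism quantity per unit of sample size for one event.

    Hypothetical helper: it only shows why recording sample size alongside
    counts makes surveys comparable across places and dates.
    """
    effort = event["sampleSizeValue"]
    return {
        occ["scientificName"]: occ["organismQuantity"] / effort
        for occ in occurrences
        if occ["eventID"] == event["eventID"]
    }


densities = quantity_per_unit_effort(event, occurrences)
```

Because the event carries the protocol and the sample size, a recorded absence becomes informative and counts from repeated visits can be compared as a time series, which a presence-only record cannot offer.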

Interested users can learn more about these efforts by reading the primer “Publishing sample-based data with the IPT” and by testing a prototype installation of IPT v2.3 that supports sample-based datasets. Upon completion of testing, GBIF.org will begin supporting basic registration and search of sample-based datasets, with enhanced indexing and discovery of datasets expected during the second half of 2015.