Call for proposals to help mature and test how specimens are handled in GBIF’s emerging unified data model

Selected data publishers, nodes and developers of collections management systems will secure small contracts to participate in a pilot that will explore and refine the display of richer specimen data
Deadline: 2 October 2022

The GBIF Secretariat is seeking proposals from data publishers, Participant nodes and developers of collection management systems to participate in a pilot project showcasing how an emerging unified data model can improve how data derived from specimens in scientific collections are represented in GBIF. The holders and managers of specimen data chosen to participate in the pilot will help advance the data model’s ability to support broader scientific applications of specimen data in the future and help enable the discovery of specimen records within the Global Registry of Scientific Collections (GRSciColl).

GBIF is developing a new data model in response to community feedback and to recommendations in CODATA’s Twenty-year Review of GBIF to enable the publication of richer, more complex types of biodiversity data in the future. The initiative has taken a case-study approach, starting with narratives developed by groups of data publishers who currently face challenges in sharing their data via GBIF. Analysis of these case studies has led to an emerging single unified data model that aims to satisfy the requirements detailed in a growing collection of case studies.

The success of this work will depend on engaging a broad spectrum of individuals who publish biodiversity data. Those selected to further test and refine the unified model through this call for proposals will prepare datasets that exemplify the complexities that specimen-data publishers wish to capture and share through GBIF in the future. GBIF’s ongoing series of community webinars will continue to explore, case study by case study, how the proposed unified model can support data structuring and publishing.

A July webinar focused on collections management systems, using example data from Arctos to demonstrate a proof of concept for the data model, while other GBIF network members who work with data relating to physical collections and specimens were invited to prepare interventions and offer feedback. These inputs will all contribute to the deployment of a material catalogue that explores richer aspects of specimen data that go beyond the simple physical spatiotemporal evidence of taxa currently discoverable in specimen-based occurrences on GBIF.org.

A selection panel will review applications and communicate outcomes before the end of October 2022. Work on the pilot will commence in November 2022.

Pilot project scope

Applicants selected through this call for proposals will work with the Secretariat to populate a pilot database used to demonstrate a queryable online specimen catalogue. The objective is to faithfully represent these data from their source within an aggregated environment that supports broader scientific research and inquiry than is currently possible. The pilot project will showcase data from the selected applicants on specimen pages (see example).

Successful applicants will be required to complete the following tasks over the course of approximately six to eight weeks:

  1. Provide one or more datasets reformatted into the emerging unified data model as CSV files suitable for the pilot database
  2. Participate in group calls with other participants
  3. Provide feedback and suggestions on the display of the data on material catalogue demo pages, including topics such as:
    1. How the information should be structured and displayed
    2. How GBIF should apply enrichments, such as higher taxa or geographic information, and the inferred links to other items through our clustering algorithms
    3. How the specimen should be cited, e.g., as material examined in a paper
    4. Which identifiers should be used to refer to the specimen and the compiled specimen page (potentially including DOIs issued through GBIF)
    5. How annotations should be displayed, e.g. automatic flags from processing, future use of community-generated annotations
  4. Record a presentation and attend a community webinar to share and showcase their own data in the results
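
As a rough illustration of task 1, the sketch below splits flat, Darwin Core-style rows into two linked CSV tables, one for specimens and one for their identifications. The table and column names (specimen_id, identification_id and so on) and the sample values are hypothetical and are not taken from the unified data model itself; the actual structure would be agreed with the Secretariat during the pilot.

```python
import csv
import io

# Flat, Darwin Core-style rows: one row per identification, so specimen
# details repeat. All values are made up for illustration.
flat_rows = [
    {"catalogNumber": "A-001", "scientificName": "Puma concolor", "identifiedBy": "Smith"},
    {"catalogNumber": "A-001", "scientificName": "Felis concolor", "identifiedBy": "Jones"},
    {"catalogNumber": "A-002", "scientificName": "Lynx rufus", "identifiedBy": "Smith"},
]


def split_rows(rows):
    """Split flat rows into a specimen table and a linked identification table.

    Table and column names are hypothetical, not the real unified data model.
    """
    specimens = {}
    identifications = []
    for row in rows:
        key = row["catalogNumber"]
        # Keep one specimen row per catalogue number.
        specimens.setdefault(key, {"specimen_id": key})
        # Each identification points back to its specimen.
        identifications.append({
            "identification_id": len(identifications) + 1,
            "specimen_id": key,
            "scientific_name": row["scientificName"],
            "identified_by": row["identifiedBy"],
        })
    return list(specimens.values()), identifications


def to_csv(rows, fieldnames):
    """Serialise a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


specimens, identifications = split_rows(flat_rows)
specimen_csv = to_csv(specimens, ["specimen_id"])
identification_csv = to_csv(
    identifications,
    ["identification_id", "specimen_id", "scientific_name", "identified_by"],
)
```

Separating the repeated specimen details from the identifications is what lets a single specimen carry several identifications without duplicating its own record.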

Available funding and eligibility to apply

Applicants can seek up to €3,000 to support their participation in this pilot project in the form of a small contract. We also welcome applications from groups wishing to self-fund their participation in this pilot project.

Applications are welcome from representatives, including staff or individual/freelance contractors, of:

  • Institutions located in a country participating in GBIF (see list of voting and associate Participant countries)
  • Vendors of collection management systems used by institutions within countries participating in GBIF
  • GBIF Participant nodes with experience of shaping collections data and who have access to collections data to use in this pilot project
  • Partnerships among any of the above

In all cases, applicants should demonstrate in their proposal that they have permission to use the data that they intend to work with during this pilot project.

All applicants must fulfill one of the following criteria:

a) be actively publishing data via GBIF
b) commit to publishing new datasets through GBIF as a result of this work
c) actively develop or support a collection management system and have a dataset that can comply with criterion a or b

Applicants should have experience and knowledge of the following:

  • Current GBIF data publishing practices
  • Structure of the data they will use in the pilot work
  • Familiarity with the goals of the Diversifying the GBIF Data Model project
  • Sufficient technical skill to be able to understand the Unified Data Model and deliver one or more datasets using that model. Technical support on the Unified Model will be available during the period of the contract, but applicants should be familiar with the basics of using a database such as PostgreSQL, schemas, constraints and foreign keys.
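
For readers gauging the expected level of database familiarity, the following minimal sketch shows a schema with a NOT NULL constraint and a foreign key, and how the database rejects a row that points at a missing parent. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the two tables are hypothetical examples rather than part of the Unified Data Model. The PRAGMA line is specific to SQLite; PostgreSQL enforces declared foreign keys without it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical linked tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE specimen (
            specimen_id TEXT PRIMARY KEY,
            preparation TEXT NOT NULL
        )
        """
    )
    conn.execute(
        """
        CREATE TABLE identification (
            identification_id INTEGER PRIMARY KEY,
            specimen_id TEXT NOT NULL REFERENCES specimen(specimen_id),
            scientific_name TEXT NOT NULL
        )
        """
    )
    return conn


def try_insert_identification(conn, specimen_id, scientific_name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO identification (specimen_id, scientific_name) VALUES (?, ?)",
            (specimen_id, scientific_name),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
conn.execute("INSERT INTO specimen VALUES ('A-001', 'skin')")
# Accepted: the referenced specimen exists.
linked = try_insert_identification(conn, "A-001", "Puma concolor")
# Rejected: no specimen 'A-999' exists, so the foreign key fails.
orphan = try_insert_identification(conn, "A-999", "Lynx rufus")
```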

Selection process and criteria

Applications will be assessed by a selection panel, convened by the GBIF Secretariat, including external experts, using the following criteria:

  • History of regular data publishing with GBIF and updating of published datasets
  • Maintenance of the associated GRSciColl entries
  • Volume of data currently published in GBIF that would be affected, or the potential to fill known data gaps in the future, as a result of this work
  • Demonstrated previous experience with publishing more complex data through GBIF (in ABCD or using DwC-A extensions)
  • Extent of the draft data model that will be covered in the data provided for this pilot project, for example (but not limited to):
    • Measurements
    • Media
    • DNA-related identifications
    • Multiple taxa within identifications
    • Complex specimens
    • Preparations of material
    • Cross-specimen relationships
    • Multiple occurrences of specific biological individuals
    • Cross-collection linkages
  • Potential for the solution to be reused by other publishers; e.g., reusable scripts or improvements to a tool that many people use
  • Potential of project to increase the diversity of collections covered in the GBIF Data Model project and the geographic regions of the contributors
  • Clear rationale for expected costs associated with this work

Application process

Applicants must submit proposals in English via email by Sunday, 2 October 2022.

For all applications, please make it clear that the data custodian(s) concerned support the application. Provide letters of support if needed.

Applicants should provide a proposal document (two pages maximum) including:

  • The primary contact person for the application
  • The person or institution that would receive the contract
  • Any other partners that would be involved in the pilot project work supported by the contract
  • A description of:
    • Which datasets are you, or your partners, currently publishing via GBIF, and what challenges have you encountered in shaping the data to the current data model?
    • Which dataset(s) would you use in this pilot project, and which areas of the new data model do the data cover?
    • Which GRSciColl records relate to the application?
    • How could the work you undertake in this pilot project be reused by others?
    • How would the funding you request be used to support this work and do you plan to contribute any other resources in kind?

Please note that successful applicants will be required to sign a service contract for this work. Payment in euros will follow successful completion of the tasks and a final narrative report.