Release notes

Updates of the GBIF.org software and infrastructure

Red-tailed mason bee (Osmia bicolor), Namur, Wallonia, Belgium. Photo 2019 Gilles San Martin via iNaturalist Research-grade Observations, licensed under CC BY-SA 4.0.

This page details the main updates to GBIF.org and related infrastructure. Further details can be found in the GitHub repositories.

15 August 2024

Reference database updated in the Sequence ID tool
  • COI: (Animalia) database updated to International Barcode of Life v2024-07-19

29 February 2024

Darwin Core extension support for downloads

Occurrence extension data published to GBIF can now be downloaded as part of a Darwin Core Archive download. See the documentation for the supported extensions and how to download them.

This feature is currently only available through the API.

19 February 2024

New Darwin Core terms supported

The following Darwin Core terms, added to the standard in 2023, are now available for publishers to use in the IPT. Term values will be shown on GBIF.org and in the API.

  • verbatimLabel
  • caste
  • vitality
  • eventType
  • superfamily, tribe and subtribe

12 February 2024

IPT version 3 released

IPT version 3 has been released. Please see the release announcement.

30 January 2024

New occurrence search and download filters available
  • earliestEonOrLowestEonothem
  • latestEonOrHighestEonothem
  • earliestEraOrLowestErathem
  • latestEraOrHighestErathem
  • earliestPeriodOrLowestSystem
  • latestPeriodOrHighestSystem
  • earliestEpochOrLowestSeries
  • latestEpochOrHighestSeries
  • earliestAgeOrLowestStage
  • latestAgeOrHighestStage
  • lowestBiostratigraphicZone
  • highestBiostratigraphicZone
  • group
  • formation
  • member
  • bed
  • gbifRegion
  • publishedByGbifRegion
  • fieldNumber
  • preparations
  • sex
  • startDayOfYear
  • endDayOfYear
  • higherGeography
  • island
  • islandGroup
  • georeferencedBy
  • previousIdentifications
  • datasetName
  • datasetID
  • otherCatalogNumbers
  • taxonConceptID
  • isSequenced
  • associatedSequences
eventDate term now supports intervals

Dates in ISO 8601 interval formats like 2024-01-10/2024-01-20, 2024 or 2023-12/2024-01 are now supported and used in searching and downloads.
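
The interval formats above can be illustrated with a small sketch that expands an eventDate string into an inclusive start and end date. This is not GBIF's implementation; the function names are hypothetical, and the sketch assumes date-only values (no times or time zones) in the three shapes listed above.

```python
import calendar
from datetime import date


def _bounds(part: str) -> tuple[date, date]:
    """Return the first and last day covered by YYYY, YYYY-MM or YYYY-MM-DD."""
    fields = [int(x) for x in part.split("-")]
    if len(fields) == 1:  # year only, e.g. "2024"
        return date(fields[0], 1, 1), date(fields[0], 12, 31)
    if len(fields) == 2:  # year and month, e.g. "2023-12"
        year, month = fields
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    year, month, day = fields  # full date, e.g. "2024-01-10"
    return date(year, month, day), date(year, month, day)


def event_date_range(value: str) -> tuple[date, date]:
    """Expand an eventDate string into an inclusive (start, end) pair."""
    start_text, _, end_text = value.partition("/")
    start = _bounds(start_text)[0]
    # Without a "/" the single value is both the start and the end.
    end = _bounds(end_text or start_text)[1]
    return start, end
```

With this sketch, `event_date_range("2023-12/2024-01")` covers 1 December 2023 to 31 January 2024, and `event_date_range("2024")` covers the whole year.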

10 January 2024

Reference databases updated in the Sequence ID tool
  • 16S: (Bacteria and Archaea) Genome Taxonomy Database r214
  • COI: (Animalia) database updated to International Barcode of Life v2024-01-06
  • 12S: MitoFish - Mitochondrial Genome Database of Fish V3.97
  • ITS: Unite v9.0 (2023-07-25)

26 October 2023

  • The rewrite of the workflows that generate the precalculated maps on GBIF.org was deployed.

20 September 2023

  • GBIF.org now processes scientificNameID, taxonID and taxonConceptID for configured identifier schemes; initially WoRMS LSID. The details of this are discussed in this issue.

28 August 2023

  • New GBIF backbone taxonomy, with three new sources and other updates. Name matching to species aggregates has been improved. Refer to the backbone build log for additional details.
  • The field http://purl.org/dc/terms/identifier (identifier) has been removed from interpreted data and downloads, as it had been introduced unintentionally and contained only internal identifiers.
  • 43 unused Dublin Core terms — which have never been used in Darwin Core, and were always empty — have also been removed from new Darwin Core archive downloads.
  • Searching using gbif_id is now supported, in the API, website and downloads.

24 August 2023

Two new reference databases added to the Sequence ID tool
  • 12S: MitoFish - Mitochondrial Genome Database of Fish
  • 18S: PR2 18S rRNA database

12 January 2023

Occurrence interpretation
  • Continent interpretation now considers occurrence coordinates. All georeferenced, terrestrial records now have a continent value, and issues are applied where the publisher's value is unexpected.
  • A new field distanceFromCentroidInMeters is present for occurrences within 5000 m of a known country centroid. Particularly for preserved specimens, this can highlight imprecise georeferencing. See the data blog for the motivation for this field.
  • Coordinate uncertainty, where provided by the publisher, is now taken into account when verifying the country/countryCode values.
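
The idea behind distanceFromCentroidInMeters can be sketched with a great-circle distance check. This is only an illustration under stated assumptions: the release notes do not say how GBIF computes the distance, so the haversine formula, the mean Earth radius and the function names here are all hypothetical choices, and the centroid is passed in by the caller rather than looked up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_from_centroid(lat, lon, centroid, threshold_m=5000):
    """Return the distance to the centroid if within the threshold, else None.

    `centroid` is a (latitude, longitude) pair; the 5000 m default mirrors
    the threshold mentioned in the release note.
    """
    distance = haversine_m(lat, lon, centroid[0], centroid[1])
    return distance if distance <= threshold_m else None
```

A record exactly on the centroid yields 0.0, while a record well outside the threshold yields None, so the field would simply be absent for it.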
Registry
  • A new check prevents derived datasets from being created without a related dataset.
Data ingestion
  • The Camera Trap Data Package (Camtrap DP) format is now supported.

26 August 2022

  • Mechanisms have been deployed to detect large-scale changes in occurrence record IDs before indexing. When such changes are detected, data managers can intervene and confirm or correct mistakes to better ensure GBIF ID stability.

23 June 2022

  • Clustering rules relaxed for iBOL and EMBL (INSDC) datasets to accommodate sparser data. When a record comes from one of these sources, an overlap in accepted scientific name and identifier is now sufficient to connect the records.

22 June 2022

19 May 2022

  • GBIF data now on Google BigQuery as a public table, updated monthly

2 May 2022

Sequence ID tool
  • 16S (Bacteria and Archaea) Genome Taxonomy Database r207
  • COI (Animalia) database updated to International Barcode of Life v2022-02-22

28 April 2022

  • The following fields, which can contain multiple values, can now be searched using individual values: datasetID, datasetName, otherCatalogNumbers, typeStatus, recordedBy, identifiedBy, preparations and samplingProtocol. For example, searching for records collected by "L. Richardson" now also returns records where they were part of a group of people making the observation. (pipelines/665 and pipelines/283)

  • Occurrences are now searchable using the Darwin Core datasetName and datasetID fields. This search respects the value stated on each record, so values may differ within a dataset registered on GBIF, which supports searching within aggregated datasets. (pipelines/662)

  • The datasetName is now included in the occurrence download (pipelines/270)

  • Occurrence records are now searchable using the Darwin Core term otherCatalogNumbers in the API (pipelines/664)

  • The preparations field is now correctly populated (pipelines/667)
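
Searching multi-valued fields by individual value, as described in this release, can be pictured with the following minimal sketch. It is not GBIF's indexing code. It assumes values are separated by a vertical bar, which is the separator Darwin Core recommends for lists, and it assumes case-insensitive matching; the function names and the second collector name are hypothetical.

```python
def split_values(raw, delimiter="|"):
    """Split a multi-valued field into its individual, trimmed values."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


def matches_value(raw, wanted, delimiter="|"):
    """True if `wanted` equals one of the individual values in `raw`.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    wanted_key = wanted.strip().casefold()
    return any(value.casefold() == wanted_key
               for value in split_values(raw, delimiter))
```

Under these assumptions, a recordedBy value of "L. Richardson | A. Smith" matches a search for "L. Richardson", but not a search for the bare surname "Richardson", because whole values are compared.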

11 March 2022

  • Rules for clustering tightened, to avoid over-eagerly clustering records that have the same species and catalogue number but nothing else to support the link

1 March 2022

28 February 2022

  • Datasets can now declare which country should be attributed as publishing the data on a record-by-record basis. Previously, only eBird had this capability; now others, such as iNaturalist, can use it too.

7 February 2022

  • Changes to the map server have been deployed to provide higher-resolution maps to all hosted portals, such as GBIF.us.

17 February 2022

  • Changes to the content model have been deployed, giving the communications team more control of the GBIF.org homepage, along with the first changes to the styling.

3 February 2022

  • The data validator has been updated to be consistent with GBIF.org indexing. Within the tool, logged-in users can now find their historical validation reports.

31 January 2022

  • The occurrence index has been updated to support the latest version of Darwin Core, released last year.
    • A new basis of record, Material Citation, replaces the obsolete Literature basis of record. Additionally, the "Unknown" basis of record will no longer be used; instead, records will be shown with an "Occurrence" basis of record.
    • The existing term Establishment Means now has a vocabulary in Darwin Core. This replaces the GBIF enumeration used for this term.
    • New terms Degree of Establishment and Pathway are now available, and have their own vocabularies.
    • The new terms Vertical Datum, Verbatim Identification, Subfamily, Infrageneric Epithet and Cultivar Epithet may be used on occurrences, although the taxonomic terms are not yet supported by the GBIF Backbone Taxonomy.

29 January 2022

  • Changes to data ingestion have been applied that abort the process if >5% of record IDs are seen to change, allowing data managers to verify before proceeding.
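
The threshold check described here could look roughly like the sketch below. The release note does not say how a "changed" ID is measured, so this sketch assumes it means a previously indexed ID that is missing from the incoming version of the dataset. The function names are hypothetical, and the 5% default mirrors the figure in the note.

```python
def fraction_of_ids_changed(previous_ids, incoming_ids):
    """Share of previously indexed record IDs absent from the incoming data."""
    previous = set(previous_ids)
    if not previous:
        return 0.0  # nothing indexed yet, so nothing can have changed
    missing = previous - set(incoming_ids)
    return len(missing) / len(previous)


def should_abort(previous_ids, incoming_ids, threshold=0.05):
    """True if more than `threshold` of the record IDs changed."""
    return fraction_of_ids_changed(previous_ids, incoming_ids) > threshold
```

With 100 previously indexed records, losing 5 IDs sits exactly at the threshold and ingestion would continue, while losing 6 would stop the process for a data manager to review.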

13 January 2022

  • GRSciColl now supports the ability to select a GBIF dataset or publishing organisation as the "master" source of information for a GRSciColl collection or institution record. Changes made in the organisation's registration or dataset metadata will automatically be reflected on the GRSciColl entity.

2021

3 December 2021

  • A new IPT release (2.5.2) addresses 26 issues. Most improvements are for the new user interface (including bug fixes) and for deployment and server administration.

  • A new backbone is live, with a new WCVP Fabaceae source and additional Plazi publications. For further details, see the build log.

28 October 2021

  • New filters and facets for the literature service (gbifTaxonKey, gbifOccurrenceKey, gbifHigherTaxonKey, citationType)
  • New geo distance filter/predicate for occurrence search and download (geoDistance)
GRSciColl
  • New model for GRSciColl contacts that replaces the current staff members (#379)
  • Number of specimens in institutions made optional (#389)
  • Taxonomic coverage added to the collections search (#390)
  • Lookup now accepts alternative codes + ID matches as exact (#381)

17 September 2021

New backbone live

Refer to the backbone build log for additional details.

31 August 2021

Integrated Publishing Toolkit (IPT)

A new version of the IPT has been released (2.5.0), addressing 81 issues. New/improved features include:

  • A fresher-looking user interface, which should still be familiar to existing users
  • The user manual has been converted from the GitHub Wiki to AsciiDoctor/Antora
  • Source data files can now be downloaded by a resource manager
  • Auto-publishing can now be set to specific, future dates
  • Archive mode can be limited to a set number of old archives to retain
  • A new health/troubleshooting page reports common system problems, like running out of disk space or incorrect filesystem permissions
  • The administration contact (for forgotten passwords) is now configurable
  • Database (JDBC) drivers have been updated
  • A URL can now be used as a data source

2 July 2021

Occurrence images
  • Occurrence records with an IIIF manifest given in the Audubon Core extension or in Dynamic Properties now display a draggable IIIF icon with a link to a viewer (example)

11 June 2021

GRSciColl

31 May 2021

Dataset filters
  • Dataset search API supports filters and facets by networkKey, hostingCountry and endorsingNodeKey
Dataset export services
Download statistics
Miscellaneous

21 May 2021

Features
  • Search occurrences using modification date stated by publisher #219
  • Download filters support search “field has a value” using the isNull predicate #244
  • Registry console supports user filtering by roles and editor scopes #330
  • The API response for dataset citation now includes authors as objects if they are also contacts, and an indication of whether the citation was provided or generated #351
  • Dataset search API supports filters and facets by installationKey and endpointType #148
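
As an illustration of the "field has a value" download filter mentioned in this list, the sketch below builds a predicate as a plain dictionary and serialises it to JSON. The `type`, `parameter` and `predicate` keys follow the download predicate format as I understand it; the exact spelling, the `not` wrapper, the `SIMPLE_CSV` format and the `CATALOG_NUMBER` parameter are assumptions to verify against the API documentation. The helper names are hypothetical.

```python
import json


def has_no_value(parameter):
    """Predicate for records where the field is empty (isNull)."""
    return {"type": "isNull", "parameter": parameter}


def has_value(parameter):
    """Predicate for records where the field has a value: NOT isNull."""
    return {"type": "not", "predicate": has_no_value(parameter)}


# Partial request body for illustration only; a real download request
# needs further fields (such as the requesting user) that are omitted here.
request_body = {
    "format": "SIMPLE_CSV",
    "predicate": has_value("CATALOG_NUMBER"),
}

print(json.dumps(request_body, indent=2))
```

Keeping the predicate as a small composable dictionary makes it easy to nest inside other predicates before sending the request.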
Bug fixes
  • Creating a network constituent for a non-existent network no longer throws an error #349
  • Network suggest no longer includes deleted entities #308
  • Consistent behaviour on GBIF.org and Registry management console for publisher search #198

17 May 2021

  • First GBIF Parquet export added to the Amazon Public Data Catalog, with data available on 5 continents

5 May 2021

API and processing
Derived datasets

20 April 2021

  • New Parquet download format added to the API
  • First GBIF Parquet export added to the Microsoft Planetary Computer data catalogue.

22 March 2021

Sequence ID tool
  • Classification of Bacteria and Archaea by 16S sequences matched against the Genome Taxonomy Database r95
  • ITS (Fungi) database updated to UNITE v8.2
  • COI (Animalia) database updated to International Barcode of Life v2021-02-08

11 March 2021

New backbone live
  • Data source replacements, primarily for Fabaceae family and the prokaryotic kingdoms Bacteria and Archaea
  • Improvements for stable identifiers, especially relating to OTUs
  • Algorithm improvements (misplaced taxa)
  • Removal of names / terms on a denylist
  • Please refer to the backbone build log for additional details

23 February 2021

  • Support for registering dataset endpoints in Catalogue of Life Data Package format
  • Flagging of potential duplicates added to assist editors in deduplicating entries in the GRSciColl catalogue, e.g. reuse of the code PCU
  • Ability to restrict permissions for GRSciColl editors to institution or collection, allowing more people to participate
  • Schema.org metadata tags revised on the dataset and taxon pages to improve search engine discoverability

11 February 2021

26 January 2021

  • Improvements to the handling of networks (groupings of datasets) including
    • Listing in search e.g. searching for Arctos
    • Listing the publishers, and the datasets in the summary e.g. OBIS network
    • Ability to control if they are visible on a dataset page
    • Ability to assign editorial control to trusted users in the registry
  • Support for DOIs for ad hoc data exports by GBIFS staff (example https://doi.org/10.15468/dd.jskxae)
  • Bug fix for BioCASe protocol metadata synchronisation
  • Added the literature vocabularies type, topic and relevance to the API to support analyses by external data scientists
  • Added an experimental API categorisation of the griddedness of datasets (e.g. this example)
  • Added capability to associate ROR and GRID ids to organisations in the GBIF registry

2020

17 December 2020

15 December 2020

14 December 2020

  • API deployed to support Literature search by DOI. This API is documented in GitHub but documentation will be moved to the GBIF API documentation shortly

8 December 2020

  • The new Catalogue of Life website is live. This is the first deployment that is powered by GBIF and hosted on GBIF infrastructure. In addition to the public website, there is a common repository known as the checklistbank and a new API, which is supported in the rOpenSci client.

2 December 2020

  • Extension data now shown on all occurrence pages e.g. measurements example
  • Specimen-related occurrence records now link to the collection catalogue entries in addition to the dataset they originate from e.g. this record from SAIAB. Matching uses a variety of fields including collectionCode, institutionCode, collectionID and institutionID. See the FAQ on how to improve matching
  • New API to improve searching against the Collection Catalogue, e.g. searching for "K"
  • Elasticsearch updated to version 7.10.0

9 November 2020