This tool queries taxonomy reference databases for Fungi, Animals, Bacteria, Archaea and other Eukaryotes.
ITS sequences for Fungi will be queried against the UNITE Species Hypothesis General FASTA release, a reference database for DNA barcoding of Fungi.
COI sequences for Animals will be queried against a 99% clustered version of the International Barcode of Life project (iBOL) public data (COI-5P sequences). This approach is similar to the way the preconfigured reference databases for Chordates and Arthropods are assembled in AMPtk (AMPlicon Tool Kit).
16S sequences for Bacteria and Archaea will be queried against the Genome Taxonomy Database 16S rRNA gene sequences identified within the set of representative genomes.
18S sequences for Eukaryotes will be queried against the PR2 18S rRNA database.
12S sequences for fish will be queried against the MitoFish database.
All matches returned are compared with Operational Taxonomic Units (OTUs) from UNITE, the International Barcode of Life project (iBOL) and the Genome Taxonomy Database that are in the GBIF backbone taxonomy. Users can download the matched results as a CSV file that includes the OTU identifiers (e.g. SH1528411.08FU or BOLD:AAI8076), which can then be used to publish sequence-based occurrence records to GBIF.
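When working with the downloaded results, it can help to tell the two kinds of OTU identifier apart. The sketch below is illustrative only: the patterns are inferred from the two example identifiers above (SH1528411.08FU and BOLD:AAI8076), not from an official specification, and the function name `classify_otu_id` is hypothetical.

```python
import re

# Assumed patterns, inferred only from the example identifiers in the text.
# They are not taken from UNITE or iBOL documentation.
SH_PATTERN = re.compile(r"SH\d+\.\d+FU")        # e.g. SH1528411.08FU
BIN_PATTERN = re.compile(r"BOLD:[A-Z]{3}\d{4}")  # e.g. BOLD:AAI8076


def classify_otu_id(identifier: str) -> str:
    """Guess which reference source an OTU identifier comes from."""
    identifier = identifier.strip()
    if SH_PATTERN.fullmatch(identifier):
        return "UNITE"
    if BIN_PATTERN.fullmatch(identifier):
        return "iBOL"
    return "unknown"


if __name__ == "__main__":
    for otu in ("SH1528411.08FU", "BOLD:AAI8076", "not-an-otu"):
        print(otu, "->", classify_otu_id(otu))
```

Identifiers that match neither pattern are reported as "unknown" rather than rejected, since the patterns here are only a guess at the format.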
Users may upload files in either CSV or FASTA format. Sequence ID expects a CSV file to have a column named 'sequence' and either an 'id' or an 'occurrenceId' column. Users can also paste FASTA-formatted sequences into a text field.
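Before uploading, a CSV file can be checked locally against the column requirements described above. The following is a minimal sketch using only the Python standard library. The helper names `check_csv_columns` and `csv_to_fasta` are hypothetical, and the sketch assumes that column names are matched exactly (case-sensitive), which the tool itself may or may not require.

```python
import csv
import io


def check_csv_columns(csv_text: str) -> str:
    """Return the name of the ID column, or raise ValueError if the header is unusable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    if "sequence" not in fields:
        raise ValueError("missing required column 'sequence'")
    # Assumption: 'id' is preferred when both ID columns are present.
    for id_col in ("id", "occurrenceId"):
        if id_col in fields:
            return id_col
    raise ValueError("missing an 'id' or 'occurrenceId' column")


def csv_to_fasta(csv_text: str) -> str:
    """Convert a valid CSV into FASTA text, e.g. for pasting into the text field."""
    id_col = check_csv_columns(csv_text)
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.append(f">{row[id_col]}")
        lines.append(row["sequence"].strip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "occurrenceId,sequence\nocc1,ACGT\nocc2,TTGA\n"
    print(csv_to_fasta(sample), end="")
```

The same check works for a file on disk by reading its contents into a string first. The sample sequences are placeholders, not real barcode data.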