Register List Endorsement and Validation Guide
Contributors: Maria Chuvochina, Marike Palmer, Miguel Rodriguez-R
Workflow for Registration
The workflow for registration is presented below only for reference and to
standardize the terminology used throughout:
- Initial name(s) registration: The names are entered into the Registry by
the submitter or via automatic search by the Excubia Bot. At this stage,
names can be in the status of “Automatically discovered” (publicly visible)
or “draft” (private).
- Submitter claim: The names are claimed by the submitter (unless
registered by them). Names are in the status of “draft” (private).
- Name(s) in register list: The submitter adds the name(s) to the register
list using the “propose name” button.
- Name submission (through a Register List submission): The submitter
completes a Register List submission, which automatically changes the status
for the list and all the names to “submitted”. For Path 2, this leads
directly to step 6 (without preliminary curation).
- Name endorsement (through a Register List endorsement): The curators
endorse the Register List, which automatically changes the status for the
list and all the names to “endorsed”. This step is bypassed in Path 2.
- SeqCode notification: The submitter notifies the SeqCode Registry of the
publication of the manuscript (the typeset version of the effective
publication). The list changes to the status of “notified” (names remain
either “submitted” for Path 2 or “endorsed” for Path 1).
- Name validation (through a Register List validation): The curators
validate the Register List, thereby rendering all contained names “validly
published”.
- Register List publication: The DataCite format is reviewed for
consistency and a DOI is obtained for the Register List.
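As a reading aid, the status changes described above can be sketched as a small transition table. This is only an illustration of the workflow, not the Registry's implementation: the names `ListStatus`, `TRANSITIONS`, `next_status`, and `name_status_when_notified` are hypothetical, and the "draft" and "validated" statuses for the list itself are assumptions (the steps above only state them for names).

```python
from enum import Enum


class ListStatus(Enum):
    DRAFT = "draft"          # assumption: list status before submission
    SUBMITTED = "submitted"
    ENDORSED = "endorsed"
    NOTIFIED = "notified"
    VALIDATED = "validated"  # assumption: label for a validated list


# Register list transitions per path, following the steps above.
# Path 1 goes through endorsement; Path 2 bypasses it.
TRANSITIONS = {
    1: {
        ListStatus.DRAFT: ListStatus.SUBMITTED,
        ListStatus.SUBMITTED: ListStatus.ENDORSED,
        ListStatus.ENDORSED: ListStatus.NOTIFIED,
        ListStatus.NOTIFIED: ListStatus.VALIDATED,
    },
    2: {
        ListStatus.DRAFT: ListStatus.SUBMITTED,
        ListStatus.SUBMITTED: ListStatus.NOTIFIED,
        ListStatus.NOTIFIED: ListStatus.VALIDATED,
    },
}


def next_status(path: int, current: ListStatus) -> ListStatus:
    """Return the next register list status for the given path (1 or 2)."""
    try:
        return TRANSITIONS[path][current]
    except KeyError:
        raise ValueError(
            f"No transition from {current.value!r} on Path {path}"
        ) from None


def name_status_when_notified(path: int) -> str:
    """Names keep their previous status while the list is 'notified'."""
    return "endorsed" if path == 1 else "submitted"
```

For example, `next_status(2, ListStatus.SUBMITTED)` returns `ListStatus.NOTIFIED`, reflecting that Path 2 skips endorsement.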
General Procedures on Register List Curation for Endorsement and Validation
- All detected issues are to be addressed before validation, but some can be
ignored for endorsement (and are explicitly indicated in the SeqCode Registry
as Quality Check warnings). Changes can be introduced after endorsement, but
ideally should be addressed by then. No changes are to be introduced after
validation.
- When a list is submitted, curators might identify individual issues and
record them in the SeqCode Registry using one of the following spaces:
- Name Correspondence with Curators: If the issues refer to a specific
name (and not the entire list), the preferred space to document issues is
in the name’s page itself.
- Register List Correspondence with Curators: If the issues refer to the
entire list (and not just a specific entry), the preferred space to
document issues is in the register list’s page. For single-entry lists,
both spaces can be treated as equivalent.
- Notes on “Return to submitter”: If only minor issues remain that can be
quickly addressed and don’t need permanent documentation, they may be
communicated to authors in the notes when returning the register list to
submitters. However, note that these notes do NOT remain in the permanent
record, and will be purged upon validation. In any case, this space should
also be used to highlight to the authors the actions that must be taken
before endorsement, even if this is elaborated on in the Correspondence
with Curators space.
- When a curator is satisfied with the quality of all names in a register list,
they should document their endorsement first in the register list, indicating
the type of review they have performed (nomenclature or genomics). The system
allows for the endorsement of individual names, but this procedure should be
avoided, preferring instead actions at the level of the register lists. Note
that progress in individual names can be documented through the Expert
Curation checks in each name. Similarly, individual names can be returned to
submitters, but this too should be avoided as it does not generate an email
alert.
- When an entry has been reviewed for nomenclature and genomic data quality, it
can be endorsed and/or validated by a curator.
Most potential issues covered by the rules and recommendations of the SeqCode
are now implemented as Expert Checks. Nomenclature Review and
Genomics Review can be accomplished by addressing the expert checks one by
one. Failed checks associated with recommendations are identified as warnings,
whereas errors corresponding to rules can prevent the endorsement or validation
of the names in a register list. Although the Expert Checks include a brief
description and the corresponding excerpt of the SeqCode, see the
recommendations below for a standardized curation process.
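The distinction between warnings and errors can be sketched as follows. This is a minimal illustration under stated assumptions, not the Registry's logic: `ExpertCheck` and `summarize` are hypothetical names, and whether a given check derives from a rule or a recommendation must be taken from the SeqCode excerpt shown with that check. The sketch also treats every failed rule-based check as blocking, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ExpertCheck:
    label: str    # check name as displayed, e.g. "Incorrect spelling"
    basis: str    # "rule" or "recommendation" (taken from the SeqCode excerpt)
    passed: bool  # green mark (True) or red mark (False)


def summarize(checks):
    """Split failed checks into blocking errors and non-blocking warnings."""
    errors = [c.label for c in checks
              if not c.passed and c.basis == "rule"]
    warnings = [c.label for c in checks
                if not c.passed and c.basis == "recommendation"]
    # Simplifying assumption: any failed rule-based check blocks the list.
    return {"errors": errors, "warnings": warnings, "blocked": bool(errors)}
```

A curator-side script could use the `blocked` flag to decide whether a list is ready for endorsement, while still reporting the warnings to the submitter.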
Nomenclature Review
To evaluate the nomenclature of a genus and species, follow these steps:
- Confirm that the correct rank for the taxon is indicated. For species,
information on the specific epithet should be provided on the name page, with
the binomial indicated in the name section, while the information on the
genus name is required on a separate genus page.
- Confirm that the species is linked to a genus as its parent. If the genus is
a validly published genus under the SeqCode, ICNP, or ICN, only page/s for
the species name/s, containing the binomial, are necessary; move to point 7.
If a new genus is proposed, ensure that both the species name and the genus
name pages are completely filled in and linked to the same register list if
proposed in the same effective publication. If the earliest species in the
genus is the designated type, go to the Expert Curation section of the genus
name, and check the green mark for “Later species as genus type”.
- Evaluate the genus name first to ensure that the first part of the species
name is formed correctly and the specific epithet is of a gender consistent
with the genus name when appropriate.
- If the genus name is formed from a personal name, check that the name is well
formed (check green mark for “Inapt personal name” entry), is feminine in
gender (check green mark for “Personal genus must be feminine”), and is not
contentious (check green mark for “Contentious name”).
- If the genus name is a compound name (i.e., consists of two or more word
stems), ensure that i) the name is formed by combining the stems of the first
or preceding words [to identify the stem, find the genitive case of the word
and drop its ending]; ii) -i- is used as the connecting vowel, except where
established usage for words of Greek origin calls for -o- as the connecting
vowel, or where the following word starts with a vowel; iii) the genus
gender is determined by the gender of the last word in the compound name.
Confirm the appropriate gender is assigned to the genus and check the green
mark for “Grammatical gender varies from source”. Confirm that roots from
existing languages are used, where words consisting of suffixes alone do not
count as roots, as these are modifiers of a root word. If roots are from
existing or extinct languages, check the green mark for the entry “Missing
roots from existing languages”. If mnemonic cues are present for the name,
check the green mark for “Lacking mnemonic cues”.
- From all of these, ensure the genus name is spelled correctly, and check the
green mark for the “Incorrect spelling” entry. Confirm that no significant
emendations of the taxon name have been missed for names that are already in
use in the literature (but are not yet validly published).
- Proceed to specific epithet evaluation. Determine whether the epithet is an
adjective, a nominative noun in apposition, or a noun in genitive case. If
the epithet is an adjective, confirm that it is of the same gender as the
genus, and check the green mark for “Inconsistent epithet relationship to
genus”. If it is a noun in genitive case or noun in apposition, no gender
agreement is necessary.
- Confirm that names are formed: i) preferably from Latin (check green mark
for “Latin should be preferred”); ii) in a non-contentious way (check green
mark for
“Contentious name”); and iii) with mnemonic cues and roots from existing
languages (check green marks for “Lacking mnemonic cues” and “Missing roots
from existing languages”). Morphemes consisting of Latin suffixes do not
count as roots, as suffixes are modifiers of a root word, thus roots from
existing or extinct languages are required.
- Confirm spelling of name and that no emendations for the names have been
missed (check green marks for “Missing publication of emendation” and
“Incorrect spelling”).
- If no revisions to the register list are required and all names on the
register list can be endorsed or validated, tick the Nomenclature review
checkmark in the “Curator team tracking” section of the register list page.
If the Genomic review is also checked, endorse or validate all names in the
list. If revisions to the register list or to names on it are required,
clearly indicate the required revisions in the “correspondence with
curators” space on each name page, write a short description of the necessary
revisions in the register list “correspondence with curators” space, and send
the list back to the contributor.
For register lists exclusively composed of names of genera and higher taxa,
identify the type genus and species, and confirm whether these have been
validated, have been endorsed, or have been submitted for evaluation. Higher
taxon names cannot be validated if the subordinate types have not been
validated. Higher taxon names may be submitted for evaluation on a separate
register list from that of their subordinate types when they were proposed in
a different effective publication than the nomenclatural types for these taxa
(e.g., a genus and species are proposed as incertae sedis, or without
intermediate taxonomic ranks between genus and domain, in one publication,
and a subsequent publication proposes the intermediate ranks of family,
order, class, and/or phylum).
To evaluate the nomenclature of a family, order, class, and/or phylum, follow
these steps:
- Confirm that the correct rank for the taxon is indicated. Current
recommendations are that families should end in the suffix -aceae, orders
should end in the suffix -ales, classes should end in the suffix -ia, and
phyla should end in the suffix -ota.
- Check that the name is formed from the inferred stem of the type genus name.
The “Nomenclature” section includes the inferred stem to aid with this
process. If the name is properly formed according to recommendations, not
contentious, and correctly spelled, check the entries as passed (green check
marks) for “Inapt personal name”, “Contentious name”, “Lacking mnemonic
cues”, “Missing roots from existing languages”, and “Incorrect spelling”.
- Confirm under the etymology section that the information (language, grammar,
particle, and description or translation) of the first morpheme is that of
the full word of the type genus.
- Confirm that the designated type genus is the first legitimate genus in the
taxon. Under the SeqCode, higher taxonomic ranks should be formed from the
first legitimate genus name belonging to the taxon. Confirm that no earlier
correct, preferred, and legitimate genera under the SeqCode, ICNP, or ICN
exist in this taxon. If there are no earlier legitimate genera to serve as
type, select the green check mark for “Later taxon as type” entry. However,
note that there could be contention on what the earliest legitimate genus is,
depending on taxonomic opinion. In any case, defer to the taxonomy being
used or proposed by the name’s author to determine the earliest legitimate
genus in the taxon.
- For the second morpheme, no language or grammar information is necessary,
with the particle indicated as the appropriate suffix (e.g., -aceae,
-ales, -ia, or -ota) and the description being “ending to denote a
[replace with taxonomic rank]”.
- The etymology of the full word should be as follows:
Language: “N.L.” (for Neo-Latin)
Grammar: “fem. pl. n.” (for family and order) or “neut. pl. n.” (for class and phylum)
- Confirm that the taxon is assigned a parent taxon and at least one child
taxon (that which serves as nomenclatural type), and ideally classification
to domain-level.
- Confirm that any taxonomic emendations (for taxa published prior to
registration) are linked under the publications section.
- If no revisions to the register list are required and all names on the
register list can be endorsed or validated, tick the Nomenclature review
checkmark in the “Curator team tracking” section of the register list page.
If the Genomic review is also checked, endorse or validate all names in the
list. If revisions to the register list or to names on it are required,
clearly indicate the required revisions in the “correspondence with curators”
space on each name page, write a short description of the necessary revisions
in the register list “correspondence with curators” space, and send the list
back to the contributor.
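The suffix and etymology conventions for higher taxa listed above can be sketched as a quick consistency check. This is only an illustration: the function and table names are hypothetical, and the stem is assumed to be the inferred stem shown in the “Nomenclature” section (the sketch does not derive it from the genus name).

```python
# Suffix and grammar per rank, as listed in the steps above.
RANK_SUFFIX = {"family": "aceae", "order": "ales", "class": "ia", "phylum": "ota"}
RANK_GRAMMAR = {
    "family": "fem. pl. n.",
    "order": "fem. pl. n.",
    "class": "neut. pl. n.",
    "phylum": "neut. pl. n.",
}


def expected_name(type_genus_stem: str, rank: str) -> str:
    """Form the expected higher taxon name from the type genus stem."""
    return type_genus_stem + RANK_SUFFIX[rank]


def check_higher_taxon(name: str, rank: str, type_genus_stem: str) -> list:
    """Return a list of detected issues (empty if none)."""
    suffix = RANK_SUFFIX[rank]
    if not name.endswith(suffix):
        return [f"{rank} names should end in -{suffix}"]
    if name[: -len(suffix)] != type_genus_stem:
        return [f"name is not formed from the stem {type_genus_stem!r}"]
    return []


def expected_etymology(rank: str) -> dict:
    """Expected etymology entries for the suffix and the full word."""
    return {
        "suffix_particle": "-" + RANK_SUFFIX[rank],
        "suffix_description": f"ending to denote a {rank}",
        "full_word_language": "N.L.",
        "full_word_grammar": RANK_GRAMMAR[rank],
    }
```

For example, with the stem `"Bacill"` (from Bacillus), `expected_name("Bacill", "family")` gives `"Bacillaceae"`, and `check_higher_taxon("Bacillales", "family", "Bacill")` reports a suffix mismatch because -ales denotes an order.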
Genomics Review
For register lists exclusively composed of names of genera and higher ranks,
genome quality is not to be reviewed. Simply mark the list as Genomics Review
complete in the “Curator team tracking” section of the list.
To evaluate genome quality of species and subspecies types, follow these steps
for each of the species and subspecies names:
- Visit the name page, and confirm that the automated checks have not
identified any suspicious results. If the system flags inconsistent
annotations, request the methods be filled in the Submitter Comments
(unless already provided).
a. In the case of inconsistent number and fraction of the 16S rRNA genes,
note that oftentimes 16S genes may be fragmented, and the authors could
report 1 copy at 100% even though it’s split in two contigs. MiGA will
find 2 partial copies instead in such cases, but it’s not truly
inconsistent information.
- Open the “Expert Curation” box and scroll down to the Genomics subsection.
- Confirm that the genome entry is properly linked. The SeqCode Registry will
automatically flag missing entries, but you can also confirm the availability
of the entry in INSDC by clicking the first link/s in the Genomics
subsection. In general, assembly entries are preferred over Nucleotide
accessions, but both can be processed.
- Open the genome page, the link with format “Genome sc|XXXXXXX”. Scroll
down to the section “Sample Metadata”, and confirm that the source sample has
been properly linked. If the Sample Metadata section doesn’t exist in the
genome page, make sure that the link/s in the “Source” entry of the
“Genomics” section is/are correctly formed and point to existing pages. Once
confirmed, you could also click on “Update external metadata” in the “Curator
Actions” section of the genome page, wait a couple of seconds, and reload the
page. Consider the following:
a. BioSamples are generally preferred over SRA Experiments or Runs, except in
cases of ambiguity; for example, when a single BioSample has multiple SRA
Experiments associated and only one was used to obtain the genome.
b. In some cases, the BioSample explicitly references another BioSample;
e.g., when separate entries were created for the binning process and the
metagenome itself. In such cases, capture all BioSamples (separated by
commas) in the genome information and notify the submitter via
Correspondence with Curators so they can check the changes for
correctness. The system can often identify these automatically, and you
can simply click on the “Link” button.
c. Note that for names effectively published before January 1, 2023, the data
source is recommended but not required.
- From the BioSample, review the metadata retrieved by the Registry. At
minimum, the BioSample should report: date, location (coordinates), toponym
(location name), environment (or biome), and at least one other field
(e.g., temperature, pH, or host). The system aids with this check by
automatically detecting frequent fields as part of each of the categories.
However, consider reviewing the values, as sometimes they’re non-informative
(e.g., a value of “NA”, “missing”, or “unreported”). The system also
detects the metadata package, as this can sometimes help with this step.
Finally, if any data field is empty (marked by the system as “Not detected”),
you can also examine the full metadata retrieved by the Registry by clicking
on the BioSample link under “All retrieved samples”. Once you check
everything from steps 4 and 5, go back to the name page and mark the “Missing
metadata in databases” as appropriate (green if passed, red if failed).
- Back in the Genomics subsection, click on the MiGA link, and review the 16S
rRNA classification (if any) for consistency with the genome classification
(“Ribosomal and transfer RNA” section in MiGA). If the taxonomic
classification is not consistent, flag the entry as fail “Inconsistent 16S
assignment” (red mark), or mark it as pass otherwise (green mark).
- Check the ANI/AAI against both TypeMat genomes (“Taxonomy” section in MiGA)
and other genomes of the SeqCode Check project (“Distances” section in MiGA).
In the cases covered by issues on classification 1-4 (see below), request
additional justification of novelty. Make sure that this justification is
also included in the taxon description, since the correspondence with
curators will remain confidential and won’t be part of the public version of
the name.
- If minimum requirements of quality are not met, or in extreme cases of point
7 above (e.g., ANI >> 95%), flag the entry as fail “Ambiguous type
material” (red mark), or mark it as pass otherwise (green mark).
Once the process is finished for all relevant names in a given register list,
mark the Genomics Review complete.
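The minimum metadata requirement in the BioSample step above can be sketched as a simple completeness check. This is a hedged illustration, not the Registry's detection logic: the dictionary keys are hypothetical category names (real BioSample attribute names vary by metadata package), and the set of non-informative values only contains the examples given above plus the empty string.

```python
# Categories every BioSample should report, per the step above.
REQUIRED = ("date", "location", "toponym", "environment")

# Values treated as non-informative (examples from the text, plus empty).
NON_INFORMATIVE = {"", "na", "missing", "unreported"}


def is_informative(value) -> bool:
    """True if the value is present and not a placeholder such as 'NA'."""
    return value is not None and str(value).strip().lower() not in NON_INFORMATIVE


def missing_metadata(sample: dict) -> list:
    """Return the required categories that lack an informative value.

    'other' is reported when no informative field exists beyond the four
    required categories.
    """
    informative = {key for key, value in sample.items() if is_informative(value)}
    missing = [key for key in REQUIRED if key not in informative]
    if not (informative - set(REQUIRED)):
        missing.append("other")
    return missing
```

An empty result would correspond to a green mark for “Missing metadata in databases”; any reported category suggests a closer look at the full metadata under “All retrieved samples” before marking the check as failed.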
Issues on Classification
Freedom of taxonomic opinion is a guiding principle of the SeqCode. However,
recommendations can be made in the Correspondence with Curators when any of
the following are observed in the MiGA check page of the genome (linked in the
Genomics section).
These checks are only meant to identify potentially erroneous data being
submitted (e.g., a mistaken genome accession or a missing taxonomic
evaluation), and do not constitute an endorsement of techniques such as ANI or
AAI. Therefore, any issues should be brought up to the authors as a courtesy,
but ultimately the taxonomic classification defers to that registered by the
submitters:
- For novel species: The ANI against any genome from the TypeMat database in
MiGA is greater than 95% (“Taxonomy” section in MiGA) and no justification
has been explicitly given in the description of the species.
- For novel species: The ANI against any genome from the SeqCode Check project
(“Distances” section in MiGA) is greater than 95% and the associated name is
in a protected status (i.e., proposed less than a year prior, or already
validly published). To view the SeqCode genome page of a MiGA match, visit
seqco.de/g:XXXX, where XXXX should be replaced by the genome ID. In
particular, note that high ANI between genomes of different species from the
same submission is often detected, because authors might check novelty
against databases (e.g., GTDB or MiGA) but not among their own collection.
- For novel genera or above: The AAI against any genome from TypeMat
(“Taxonomy” section in MiGA) or the SeqCode Check project (“Distances”
section in MiGA) is greater than 70% and no justification has been explicitly
given in the description of the genus.
Additionally, the following classification issues should
block the process and be addressed before endorsement or validation:
- The classification lineage does not extend up to domain. Note that this
extension can be achieved either by means of a complete lineage or through
the use of incertae sedis.
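The non-blocking ANI/AAI observations above can be sketched as a small screening function. It is only an aid for spotting cases worth raising with the authors, not an endorsement of these thresholds as taxonomic criteria. The names `Match`, `novelty_flags`, and `genome_page` are hypothetical, and the values are assumed to be copied by hand from the MiGA “Taxonomy” and “Distances” sections.

```python
from dataclasses import dataclass

ANI_SPECIES_THRESHOLD = 95.0  # percent, for novel species
AAI_GENUS_THRESHOLD = 70.0    # percent, for novel genera or above
HIGHER_RANKS = {"genus", "family", "order", "class", "phylum"}


@dataclass
class Match:
    genome_id: str
    value: float             # ANI (species) or AAI (genus or above), in percent
    source: str              # "TypeMat" or "SeqCode Check"
    protected: bool = False  # only meaningful for SeqCode Check matches


def genome_page(genome_id: str) -> str:
    """Address of the SeqCode genome page of a MiGA match."""
    return f"seqco.de/g:{genome_id}"


def novelty_flags(rank: str, matches: list, justified: bool) -> list:
    """Return the matches that warrant a request for justification."""
    flagged = []
    for m in matches:
        if rank == "species" and m.value > ANI_SPECIES_THRESHOLD:
            if m.source == "TypeMat" and not justified:
                flagged.append(m)
            elif m.source == "SeqCode Check" and m.protected:
                flagged.append(m)
        elif rank in HIGHER_RANKS and m.value > AAI_GENUS_THRESHOLD:
            if not justified:
                flagged.append(m)
    return flagged
```

Each flagged match could then be cited in the Correspondence with Curators together with its `genome_page` address, leaving the final classification to the submitters.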
Other Issues
Any of the issues below should be followed up via
Correspondence with Curators to maintain a traceable record:
- If a species name is proposed within a genus that is not validly published,
the genus name should be proposed first (or together) with the species name
in the same register list. However, note that this is not a SeqCode rule (or
even recommendation), so it’s only a preferred practice. The same applies to
the proposal of subspecies names in a species without a validly published name.
- Whenever possible, capture previously used names other than Latin or
Latinized names (e.g., alphanumeric strings or so-called placeholder names)
in the name’s notes.
- The description must not consist of a single reference to the effective
publication, but should instead include the actual description of the taxon.
We do not enforce a specific format or minimum information, but it is
preferred that the description includes at least the properties of the taxon
(observed or inferred) and the proposed circumscription (e.g., RED, AAI,
ANI, phylogenetic reconstruction, or any other methods).
- If a description is emended, the description should indicate the original
publication first (in bold), and all emendations in separate paragraphs with
the emending publication(s) clearly indicated (also in bold). All referenced
citations should be linked to the name page.
- If a name is proposed in a publication on the basis of 16S rRNA, any other
marker gene, a low-quality genome, or any other basis, the manuscript should
be linked as a citing publication, but that is not sufficient basis to be
considered the effective publication. If the type material (a genome
sequence) is determined in a subsequent (different) publication, the latter
should be considered the effective publication.
- If the spelling of a name is corrected in a subsequent publication, the
publication should be identified as a corrigendum (not as effective
publication). Similarly, if the description of a taxon is modified by a
publication, the publication should be identified as emendavit (not as
effective publication). The exception to this rule is in cases in which the
emendation consists of establishing a new type and the name was not
previously considered as validly published (see issue 3 above).