NCBI ID: CP182352
Our MAG was assembled as a single circularized contig from a long-read metagenomic assembly. It was then refined through iterative assemblies and contig clustering using Trycycler v0.5.4 to generate a consensus, circularized, single-contig long-read assembly, with read correction performed using medaka v1.7.2 and Polypolish v0.5.0. The final assembly has a depth of 179.3x, and BUSCO estimates its completeness as [C:99.0% S:98.7% D:0.3% F:0.3% M:0.6% n:623], with very few duplicated markers (D:0.3%), suggesting low contamination.
We examined in depth the markers reported in multiple copies, i.e.:
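As a quick sanity check on the BUSCO summary quoted above, the string can be parsed and tested for internal consistency (C should equal S + D, and C + F + M should sum to roughly 100% once rounding is allowed for). This is only a minimal sketch: the function names and the 0.2-point rounding tolerance are assumptions made for illustration, not part of BUSCO itself.

```python
import re


def parse_busco_summary(summary):
    """Parse a BUSCO one-line summary into percentages (float) and n (int)."""
    pairs = re.findall(r"([CSDFM]):(\d+(?:\.\d+)?)%", summary)
    scores = {key: float(value) for key, value in pairs}
    n_match = re.search(r"n:(\d+)", summary)
    if n_match is None:
        raise ValueError("no 'n:' field found in BUSCO summary")
    scores["n"] = int(n_match.group(1))
    return scores


def check_busco_consistency(scores, tolerance=0.2):
    """Return True if C = S + D and C + F + M = 100, within a rounding tolerance.

    The tolerance is a hypothetical choice that absorbs one-decimal rounding.
    """
    complete_ok = abs(scores["C"] - (scores["S"] + scores["D"])) <= tolerance
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tolerance
    return complete_ok and total_ok


scores = parse_busco_summary("[C:99.0% S:98.7% D:0.3% F:0.3% M:0.6% n:623]")
print(scores["D"], scores["n"], check_busco_consistency(scores))
```

The check only confirms that the reported figures are arithmetically coherent; it says nothing about whether a duplicated marker reflects contamination or a genuine gene duplication.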
2 PGK: Phosphoglycerate kinase.
3 tRNA-synt_1d: tRNA synthetases class I (R).
3 Methyltransf_5: MraW methylase family.
2 TIGR00420: trmU: tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase.
2 TIGR00436: era: GTP-binding protein Era.
We downloaded from RefSeq Select the genomes closest to our Halarcobacter azotofixans according to blastp (5 best hits across all multi-copy markers) and submitted them to MIGA.
In all 13 Arcobacteraceae RefSeq Select genomes checked, 3 of the 5 flagged markers also appear in multiple copies, suggesting that these duplications are a genuine biological feature of Arcobacteraceae rather than contamination:
3 tRNA-synt_1d: tRNA synthetases class I (R): 2-4 copies
2 TIGR00420: trmU: tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase: 2 copies
2 TIGR00436: era: GTP-binding protein Era: 2 copies
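The comparison across reference genomes can be summarized with a small tally of copy numbers per marker. The sketch below assumes the per-genome marker counts have already been extracted from the MIGA reports into a dictionary; the function name, the input layout, and the toy values are hypothetical placeholders and are not the actual results.

```python
def summarize_copy_numbers(copies_by_genome, markers):
    """For each marker, count genomes with >1 copy and report the copy range.

    copies_by_genome: {genome_name: {marker_name: copy_count}} (assumed layout).
    A marker missing from a genome is counted as 0 copies.
    """
    summary = {}
    for marker in markers:
        counts = [genome.get(marker, 0) for genome in copies_by_genome.values()]
        summary[marker] = {
            "genomes_multicopy": sum(1 for c in counts if c > 1),
            "genomes_total": len(counts),
            "min_copies": min(counts),
            "max_copies": max(counts),
        }
    return summary


# Toy placeholder input for illustration only, NOT the real MIGA output.
toy_counts = {
    "ref_genome_1": {"tRNA-synt_1d": 2, "TIGR00420": 2, "TIGR00436": 2, "PGK": 1},
    "ref_genome_2": {"tRNA-synt_1d": 4, "TIGR00420": 2, "TIGR00436": 2, "PGK": 1},
}
toy_summary = summarize_copy_numbers(
    toy_counts, ["tRNA-synt_1d", "TIGR00420", "TIGR00436", "PGK"]
)
for marker, stats in toy_summary.items():
    print(marker, stats)
```

A marker that is multi-copy in every reference genome of the family is a poor indicator of contamination for that family, which is the argument made above.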
Concerning the PGK marker, both copies found in our genome align to Arcobacteraceae/Campylobacterales hits in blastp searches against RefSeq Select. Moreover, the best hit of the more distant variant is in the Campylobacterales species Sulfurimonas aquatica, in which MIGA also reports PGK in multiple copies.
Concerning the Methyltransf_5 marker, we found only 1 occurrence of the sequence in our genome, i.e., XPV70243.1 MAG: 16S rRNA (cytosine(1402)-N(4))-methyltransferase RsmH [Halarcobacter sp.], both with blastp and by parsing the NCBI annotation using the marker sequence provided by MIGA.
We also note the following for the Arcobacteraceae-level duplications:
For tRNA-synt_1d, we found only 1 occurrence of the sequence in our genome, i.e., XPV68203.1 MAG: arginine--tRNA ligase [Halarcobacter sp.], both with blast and by parsing the NCBI annotation.
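One way to confirm that a marker occurs only once is to count the distinct proteins it hits in BLAST tabular output (the standard 12-column `-outfmt 6` layout), so that several alignments to the same protein are not mistaken for several copies. The following is a minimal sketch under stated assumptions: the function name, the e-value and identity cutoffs, and the toy hit lines are all hypothetical and do not reproduce the searches actually run.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def count_marker_copies(blast6_text, max_evalue=1e-5, min_identity=30.0):
    """Count distinct subject proteins hit by each query marker.

    Thresholds are illustrative assumptions; multiple alignments (HSPs)
    to the same subject protein are counted once.
    """
    subjects_by_query = {}
    reader = csv.DictReader(
        io.StringIO(blast6_text), fieldnames=BLAST6_COLUMNS, delimiter="\t"
    )
    for row in reader:
        if float(row["evalue"]) <= max_evalue and float(row["pident"]) >= min_identity:
            subjects_by_query.setdefault(row["qseqid"], set()).add(row["sseqid"])
    return {query: len(subjects) for query, subjects in subjects_by_query.items()}


# Toy placeholder hits for illustration only, NOT real search results.
toy_hits = "\n".join([
    "marker_X\tprotein_1\t85.0\t540\t81\t0\t1\t540\t1\t540\t0.0\t900",
    "marker_X\tprotein_1\t40.0\t60\t36\t0\t10\t70\t300\t360\t1e-8\t55",  # second HSP, same protein
    "marker_X\tprotein_2\t25.0\t50\t37\t1\t5\t55\t20\t70\t2.0\t20",      # weak hit, filtered out
])
copies = count_marker_copies(toy_hits)
print(copies)
```

In the toy input, the marker is counted once because two of the three lines point to the same protein and the third fails the cutoffs.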
After this thorough investigation, we believe that our genome has a lower contamination percentage than reported, if any. We therefore believe that the reported contamination estimate overstates the actual contamination of the assembly (i.e., our assembly is actually less contaminated than reported).
We provide as supporting evidence:
- the list of all confirmed multiple copies for PGK, TIGR00420 and TIGR00436 (the tRNA-synt_1d and Methyltransf_5 markers being found in single copy only after investigation)
- blastp results of all multi-copy markers against RefSeq Select
- MIGA assessment for 13 Arcobacteraceae RefSeq Select genomes showing the widely occurring multiple copies of 3 of the 5 markers, and MIGA assessment of 1 Campylobacterales RefSeq Select genome
The above analyses were led by Julie Boisard.