The completeness and contamination estimates provided by CheckM are based on the presence of predefined lineage-specific marker sets [17]. These marker sets can be refined further using checkm qa --out_format 4 (list of marker genes and their counts), which allows a lineage, in our case Candidatus Poseidoniales (MG II), to be screened for markers that are absent from all MAGs in the lineage. Using this workflow, we excluded 11 marker genes (TIGR00537, TIGR02237, TIGR00422, PF01287.15, TIGR03677, PF09249.6, PF13685.1, TIGR00162, PF02649.9, PF03684.8, PF01849.13) that were absent from all 270 Ca. Poseidoniales MAGs, leaving 177 marker genes for the completeness and contamination estimates.
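
The screening step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used: it assumes the marker counts have already been parsed from the CheckM output into a dictionary that maps each MAG to its per-marker counts, and the function name, the MAG names, and the marker names (marker_A to marker_C) are hypothetical placeholders.

```python
from typing import Dict, Set


def markers_absent_from_all(
    counts_per_mag: Dict[str, Dict[str, int]],
    marker_set: Set[str],
) -> Set[str]:
    """Return the markers in marker_set that have a count of zero in every MAG.

    counts_per_mag is an assumed, already-parsed structure:
    {MAG name: {marker ID: number of copies found}}.
    A marker missing from a MAG's dictionary is treated as absent in that MAG.
    """
    present: Set[str] = set()
    for counts in counts_per_mag.values():
        # Collect every marker seen at least once in this MAG.
        present.update(marker for marker, n in counts.items() if n > 0)
    return marker_set - present


# Toy data (hypothetical): two MAGs and three markers.
toy_marker_set = {"marker_A", "marker_B", "marker_C"}
toy_counts = {
    "MAG_1": {"marker_A": 1, "marker_B": 0, "marker_C": 0},
    "MAG_2": {"marker_A": 2, "marker_B": 1, "marker_C": 0},
}

absent = markers_absent_from_all(toy_counts, toy_marker_set)
retained = toy_marker_set - absent

print("excluded:", sorted(absent))
print("retained:", sorted(retained))
```

In this toy example only marker_C is excluded, because marker_B is found in one of the two MAGs. Applied to the real data, the same set difference would yield the markers to drop before recalculating completeness and contamination.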