1st International and 10th National Iranian Conference on Bioinformatics
Towards an accurate avian tree of life
Paper ID : 1186-ICB10
Authors:
Sina Majidian *
Postdoctoral Fellow at the University of Lausanne
Abstract:
Species tree reconstruction was long based on morphological characters. In the 1990s, a few universal orthologous genes (gene markers), including the small- and large-subunit ribosomal RNAs, were used for this purpose, laying the foundation of molecular phylogenetics [1]. Thanks to advances in DNA sequencing technologies, sequence data are now widely used for species tree inference, so hundreds of gene markers can be used [2]. However, inferring these marker loci is computationally demanding and requires complex pipelines to extract single-copy orthologous groups (OGs). Recently, a new phylogeny for a dataset of 363 bird genomes has been proposed [3]. Here, we discuss the speed and accuracy of their approach, comparing the resulting tree with the NCBI taxonomy using the Robinson–Foulds metric. In addition, we present a fast method for inferring species trees for arbitrary sets of species. In this method, the OMAmer software is used to place the proteins of the species of interest onto a database of hierarchical orthologous groups (HOGs) from OMA [4]. Each protein is then mapped to an OG where possible. Next, the most informative OGs are selected, and a multiple sequence alignment (MSA) is computed for each of them. Finally, the species tree is inferred with IQ-TREE from the super-matrix of concatenated alignments. For the avian dataset, the super-matrix of orthologous characters was computed within 50 CPU hours, and IQ-TREE needed 11 hours on 48 CPUs to infer the tree.
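The comparison with the NCBI taxonomy relies on the Robinson–Foulds (RF) metric, which counts the bipartitions (splits) present in one tree but not the other. As a minimal illustration of what the metric counts, and not the tooling used in this work, the Python sketch below computes the unnormalised RF distance between two unrooted trees. The nested-tuple tree format is a hypothetical choice made for brevity; a real pipeline would parse Newick files with a phylogenetics library. Multifurcating nodes, such as those found in the NCBI taxonomy, are allowed.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical input format: a tree as nested tuples of leaf names,
# e.g. (("a", "b"), ("c", "d")).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial bipartitions induced by internal edges.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so different rootings of one tree give the same set.
    """
    all_taxa = leaves(tree)
    ref = min(all_taxa)
    result: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else all_taxa - below
        # Skip trivial splits, where one side has fewer than two taxa.
        if 2 <= len(side) <= len(all_taxa) - 2:
            result.add(side)
        return below

    walk(tree)
    return result


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Unnormalised RF distance: size of the symmetric difference of splits."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same taxon set")
    return len(splits(t1) ^ splits(t2))


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(robinson_foulds(t1, t2))  # the ab|cde and ac|bde splits differ
```

Dividing the distance by the total number of splits in both trees gives a normalised RF value. When one tree is a multifurcating taxonomy, splits that the inferred tree resolves but the taxonomy leaves unresolved also contribute to the distance, so a nonzero RF value does not necessarily indicate a genuine conflict.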
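The step "each protein is then mapped to an OG where possible" can be pictured as grouping placement results by OG while keeping the groups single-copy. The sketch below assumes a hypothetical, simplified placement record `(species, protein_id, og_id)` with `og_id = None` for unplaced proteins. It does not reproduce OMAmer's actual output format. The duplicate-handling policy (dropping a species that places several proteins into one OG) is likewise an illustrative assumption, not the method's documented behaviour.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical placement record: (species, protein_id, og_id), where og_id is
# None when the protein could not be assigned to any OG.
Placement = Tuple[str, str, Optional[str]]


def group_by_og(placements: Iterable[Placement]) -> Dict[str, Dict[str, str]]:
    """Group placed proteins into OGs, keeping at most one protein per species.

    Unplaced proteins are skipped. If a species places several proteins into
    the same OG, that species is left out of the OG (one possible policy for
    keeping OGs single-copy; picking a representative would be another).
    """
    members: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for species, protein, og in placements:
        if og is None:
            continue  # protein could not be mapped to an OG
        members[og][species].append(protein)

    result: Dict[str, Dict[str, str]] = {}
    for og, by_species in members.items():
        single = {sp: prots[0] for sp, prots in by_species.items() if len(prots) == 1}
        if single:
            result[og] = single
    return result


placements = [
    ("a", "p1", "OG1"), ("b", "p2", "OG1"), ("b", "p3", "OG1"),
    ("c", "p4", "OG1"), ("a", "p5", None), ("c", "p6", "OG2"),
]
print(group_by_og(placements))
```

Here species `b` is dropped from `OG1` because two of its proteins were placed there, and protein `p5` is ignored because it was not placed.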
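Selecting OGs and building the super-matrix for IQ-TREE can be sketched as follows, assuming each selected OG has already been aligned. Species coverage is used here as a stand-in for "informativeness". This criterion is an assumption for illustration; the abstract does not specify the criterion actually used. Species absent from an OG are padded with gap characters so that every row of the concatenated matrix has the same length. The result is written as FASTA, an alignment format IQ-TREE accepts.

```python
from typing import Dict, List

# Hypothetical input: for each OG, the aligned sequence of every species
# that has a member in that OG.
Alignment = Dict[str, str]


def select_ogs(ogs: Dict[str, Alignment], species: List[str],
               min_coverage: float) -> List[str]:
    """Keep OGs present in at least `min_coverage` of the species (assumed criterion)."""
    selected = []
    for og_id, aln in sorted(ogs.items()):
        covered = sum(1 for sp in species if sp in aln)
        if covered / len(species) >= min_coverage:
            selected.append(og_id)
    return selected


def build_supermatrix(ogs: Dict[str, Alignment], selected: List[str],
                      species: List[str]) -> Dict[str, str]:
    """Concatenate per-OG alignments; species missing from an OG get gaps."""
    rows: Dict[str, List[str]] = {sp: [] for sp in species}
    for og_id in selected:
        aln = ogs[og_id]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{og_id}: sequences are not aligned")
        width = lengths.pop()
        for sp in species:
            rows[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


def to_fasta(matrix: Dict[str, str]) -> str:
    """Render the super-matrix as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in matrix.items())


ogs = {
    "OG1": {"a": "MK-L", "b": "MKAL", "c": "M-AL"},
    "OG2": {"a": "QQ", "b": "QR"},
    "OG3": {"c": "W"},
}
species = ["a", "b", "c"]
selected = select_ogs(ogs, species, min_coverage=0.6)
print(to_fasta(build_supermatrix(ogs, selected, species)))
```

In this toy example `OG3` is discarded because it covers only one of three species. The FASTA file could then be passed to IQ-TREE through its `-s` alignment option.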

[1] Kapli, Paschalia, et al. "Phylogenetic tree building in the genomic age." Nature Reviews Genetics 21.7 (2020): 428-444.
[2] Dylus, David, et al. "How to build phylogenetic species trees with OMA." F1000Research 9 (2020): 511.
[3] Feng, Shaohong, et al. "Dense sampling of bird diversity increases power of comparative genomics." Nature 587.7833 (2020): 252-257.
[4] Rossier, Victor, et al. "OMAmer: tree-driven and alignment-free protein assignment to subfamilies outperforms closest sequence approaches." Bioinformatics 37.18 (2021): 2866-2873.
Keywords:
Comparative genomics, Species tree, NCBI taxonomy, Orthologous groups.
Status : Paper Accepted (Poster Presentation)