Issue: ISSN 1000-7083; CN 51-1193/Q
Director: Sichuan Association for Science and Technology
Sponsored by: Sichuan Society of Zoologists; Chengdu Giant Panda Breeding Research Foundation; Sichuan Association of Wildlife Conservation; Sichuan University
Address: College of Life Sciences, Sichuan University, No.29, Wangjiang Road, Chengdu, Sichuan Province, 610064, China
Fax: +86-28-85410485

Clustering Mitochondrial DNA Sequences Experienced Tandem Duplication Based on Alignment-free Comparison in Quasipaa boulengeri
Authors: CAO Yue1,2, XIA Yun1, ZHENG Yuchi1*
Affiliations: 1. Department of Herpetology, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Key Words:Quasipaa boulengeri; mitochondrial DNA; alignment-free comparison; clustering; duplication region; Robinson-Foulds distance; protein-coding gene; Maximum Likelihood tree
Abstract: Animal mitochondrial genome regions that have experienced tandem duplication followed by random loss are often hypervariable and hence challenging for alignment algorithms. In theory, alignment-free comparison methods (AFM) can be used to summarize and visually present the relationships and similarities among such sequences. To our knowledge, however, relevant evaluations and applications are lacking. We evaluated 3 types of commonly used k-mer-based AFM using a system of intraspecific sequence variation in one such region, located around the origin of light-strand replication. From the frog species Quasipaa boulengeri, 19 sequences ranging from 583 bp to 695 bp were clustered using 28 AFM. For each method, substring lengths of k = 4, 6, 8, 10, 12, 14, 16, 18, and 20 bp were tried. From the same individuals, mitochondrial protein-coding sequences 1,518 bp in length were used to reconstruct a Maximum Likelihood tree as the reference topology. The Robinson-Foulds distance between the reference topology and each AFM topology was calculated, and the major topological differences were recorded. With a k value of typically 8, half of the methods produced a tree that differed from the reference by only 2 nodes (11.8%). However, some methods consistently performed poorly. A small k value of 4 was found to be suitable for inferring the relationships among sequence groups. These findings support the successful application of AFM to animal mitochondrial tandem duplication regions. The combinations of methods and k values that performed well here may be applied to similar systems. For different systems, similar evaluations will be helpful.
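The two ingredients named in the abstract, a k-mer-based distance between unaligned sequences and a Robinson-Foulds comparison of tree topologies, can be illustrated with a minimal Python sketch. It is not the authors' pipeline and reproduces none of the 28 methods evaluated. As an assumption, it uses the Euclidean distance between k-mer frequency profiles as one simple stand-in measure, and it represents trees as nested tuples of leaf names. All function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_frequencies(seq, k):
    """Relative frequency of every overlapping substring of length k."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_distance(f1, f2):
    """Euclidean distance between two k-mer frequency profiles.

    Assumption: chosen only as a simple example of a k-mer-based measure.
    """
    kmers = set(f1) | set(f2)
    return math.sqrt(sum((f1.get(x, 0.0) - f2.get(x, 0.0)) ** 2 for x in kmers))


def distance_matrix(seqs, k):
    """Pairwise distances for a dict {name: sequence}, keyed by name pairs."""
    profiles = {name: kmer_frequencies(s, k) for name, s in seqs.items()}
    return {
        (a, b): euclidean_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(seqs), 2)
    }


def _leaf_sets(tree, out):
    """Return the leaves under `tree`; record each internal node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial leaf bipartitions (splits) induced by the internal edges."""
    out = []
    all_leaves = _leaf_sets(tree, out)
    splits = set()
    for side in out:
        other = all_leaves - side
        if len(side) >= 2 and len(other) >= 2:
            splits.add(frozenset([side, other]))
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical toy data, not sequences or trees from the study.
toy_seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TTGACCATGG"}
toy_dist = distance_matrix(toy_seqs, k=4)

reference = ((("A", "B"), "C"), ("D", "E"))
candidate = ((("A", "C"), "B"), ("D", "E"))
rf = robinson_foulds(reference, candidate)  # the two trees differ in one split each
```

In an evaluation like the one described, a distance matrix such as `toy_dist` would be passed to a clustering or tree-building method for each value of k, and each resulting topology compared with the reference tree. Established libraries are preferable for real data, because this sketch ignores ambiguous bases and assumes both trees share the same leaf set.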
2018, 37(3): 261-267. Received: 2018-01-18
Copyright © 2020 Editorial Office of Sichuan Journal of Zoology