Abstract

Orthologs and paralogs are two fundamentally different types of homologous genes that evolved, respectively, by vertical descent from a single ancestral gene and by duplication. Orthology and paralogy are key concepts of evolutionary genomics. A clear distinction between orthologs and paralogs is critical for the construction of a robust evolutionary classification of genes and reliable functional annotation of newly sequenced genomes. Genome comparisons show that orthologous relationships with genes from taxonomically distant species can be established for the majority of the genes from each sequenced genome. This review examines in depth the definitions and subtypes of orthologs and paralogs, outlines the principal methodological approaches employed for identification of orthology and paralogy, and considers evolutionary and functional implications of these concepts.
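The distinction drawn above can be illustrated with a minimal sketch. It assumes a gene tree whose internal nodes are already labeled as speciation or duplication events; in practice such labels have to be inferred, and that inference is not shown here. The toy tree, the gene names, and the function names are all hypothetical and serve only to show the rule that a pair of genes is orthologous when its last common ancestor is a speciation and paralogous when it is a duplication.

```python
# Hypothetical toy gene tree. Each internal node is (event, left, right),
# where event is "speciation" or "duplication"; leaves are gene names.
# An ancestral duplication produced subfamilies A and B before the
# human/mouse split (an assumption made purely for illustration).
TOY_TREE = (
    "duplication",
    ("speciation", "human_A", "mouse_A"),
    ("speciation", "human_B", "mouse_B"),
)


def leaves(node):
    """Return the gene names found under a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label each gene pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    label = "orthologs" if event == "speciation" else "paralogs"
    # Genes on opposite sides of this node have it as their last common ancestor.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


if __name__ == "__main__":
    for pair, label in sorted(classify_pairs(TOY_TREE).items(), key=lambda kv: sorted(kv[0])):
        print(" / ".join(sorted(pair)), "->", label)
```

In this toy tree, human_A and mouse_B come out as paralogs even though they belong to different species, because the duplication that separates them predates the speciation.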

/content/journals/10.1146/annurev.genet.39.073003.114725
2005-12-15
2024-03-28
  • Article Type: Review Article