Symposium proposal
Organizer: Qiqing Tao (Temple University)
Co-organizer: Beatriz Mello (Federal University of Rio de Janeiro)
Reliable divergence time estimates for different species, genes, and strains are crucial for deciphering micro- and macro-evolutionary temporal patterns. Ongoing advances in sequencing technologies have led to a rapid expansion in the size of molecular datasets. This expansion necessitates the development of innovative and efficient methods to infer divergence times from genome-scale datasets that often span hundreds of species. However, the power and potential of these methods are not fully recognized. In this symposium, we will bring together the community to highlight studies of efficient dating methods, including the theoretical foundations of emerging computationally efficient dating methods and tools, extensive evaluations of these methods using simulated and empirical datasets, and their new practical applications. The focus will be on molecular dating analysis of phylogenomic data. This symposium will promote an open discussion of each method's strengths and weaknesses and provide practical guidelines for using these methods effectively.
S18-1
Dating the tree of life using big data
Sudhir Kumar1
1Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
2Department of Biology, Temple University, Philadelphia, PA 19122, USA
Molecular sequence datasets have become key to adding a temporal dimension to the Tree of Life. Many relaxed clock methods now enable us to estimate species divergence times despite extensive evolutionary rate variation among lineages. Molecular datasets are now growing quickly thanks to inexpensive sequencing technology, and we have developed the RelTime method to analyze them. In this presentation, I will present theoretical and practical features of the RelTime method and introduce its implementation in the MEGA software for both node- and tip-dating. I will demonstrate that RelTime is fast, accurate, and flexible. RelTime does not require many of the priors needed by Bayesian approaches, yet it permits variable rates among lineages, accommodates multiple calibrations and their probability densities, and generates realistic confidence intervals around the estimated times. The computational speed and reliability of RelTime enhance scientific rigor and reproducibility in biological research for both large and small datasets.
S18-2
Future directions for molecular dating: incorporating a wider range of evidence
Lindell Bromham1, Xia Hua2
1Research School of Biology, Australian National University
2Mathematical Sciences Institute, Australian National University
Molecular data have an ever-expanding reach in biology. They are used not only to understand the evolutionary past and evolutionary processes, but also in a wide range of practical applications, including epidemiology and conservation biology. These applications rely on the accuracy of DNA-based estimates of time, genetic distance, and rates of genomic change, which rest on assumptions about molecular evolutionary processes. The assumptions underlying molecular dating analyses proliferate as the methods become more sophisticated. First-generation molecular dating methods assumed uniform rates of change in all lineages. Second-generation methods placed lineages in distinct rate classes. Third-generation “relaxed clock” methods allow rates to differ across all lineages, relying on stochastic models of rate change to assign rates to lineages. What should the fourth generation of molecular clocks be like? We will discuss possible future directions that use a wider range of information to “calibrate” rates of change by incorporating more forms of temporal information and empirically based models of rate variation.
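To make the first-generation, uniform-rate assumption concrete, the following is a minimal sketch of a strict-clock calculation. It is an illustration only and is not taken from the talk: the function names and the numbers (a calibration pair with pairwise distance 0.2 substitutions per site that diverged 10 million years ago, and a query pair with distance 0.5) are hypothetical. It assumes one rate shared by all lineages and that pairwise distance accumulates along both lineages since the split.

```python
def strict_clock_rate(calibration_distance: float, calibration_age: float) -> float:
    """Per-lineage rate implied by one calibrated pair.

    The pairwise distance accumulates along two lineages since the
    split, hence the factor of 2.
    """
    return calibration_distance / (2.0 * calibration_age)


def strict_clock_age(distance: float, rate: float) -> float:
    """Divergence time of another pair under the same uniform rate."""
    return distance / (2.0 * rate)


# Hypothetical numbers, chosen only for illustration.
rate = strict_clock_rate(calibration_distance=0.2, calibration_age=10.0)
age = strict_clock_age(distance=0.5, rate=rate)
print(f"rate per lineage: {rate:.4f} substitutions/site/Myr")
print(f"estimated divergence time: {age:.1f} Myr")
```

Later generations of methods replace the single shared `rate` with rate classes or lineage-specific rates, which is where the additional modelling assumptions enter.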
S18-3
Divergence time estimation with Hamiltonian Monte Carlo sampling and ratio transform
Xiang Ji1
1Tulane University
Since the first molecular clock model was proposed in the 1960s, researchers have sought more reliable divergence time estimates. Because evolutionary rate and time are confounded, progress in improving the accuracy and efficiency of divergence time inference has been limited. This limitation is partly due to the constraints that the tree structure imposes on node heights: a parent node is always closer to the root node than its descendant nodes.
Inspired by the pioneering work of Kishino, Thorne and Bruno 2001 (PMID: 11230536), we apply a ratio transformation to the internal node heights that can be used with both serially sampled and concurrently sampled data. The ratio transformation serves as a reparameterization that works with any existing phylogenetic model. We illustrate the ratio transformation and combine it with the recent linear-time $O(N)$-dimensional gradient algorithm developed in Ji et al. 2020 (PMID: 32458974), where $N$ is the number of tips, so that computing the gradient with respect to the ratio space remains linear in time. We apply the Hamiltonian Monte Carlo (HMC) method in the transformed ratio space for divergence time estimation. In this talk, we illustrate the performance of divergence time estimation with HMC through the ratio transformation using three viral datasets. We show that the proposed HMC method is more accurate and efficient than the classic univariate Metropolis-Hastings (UMH) samplers implemented in BEAST.
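The idea of a ratio reparameterization can be sketched in a few lines for the simplest case. The code below is a minimal illustration under stated assumptions, not the implementation used in BEAST or in the talk: it assumes all tips are sampled at the same time (height 0), stores only internal nodes, and uses hypothetical names (`heights_to_ratios`, `ratios_to_heights`, the `parent` and `height` dictionaries). Each non-root internal node height is replaced by its ratio to the parent's height, so the ordering constraint "child below parent" becomes the independent constraint that every ratio lies in (0, 1). The serially sampled case needs an additional offset for tip ages and is not shown.

```python
from typing import Dict, Optional, Tuple


def heights_to_ratios(
    parent: Dict[str, Optional[str]], height: Dict[str, float]
) -> Tuple[float, Dict[str, float]]:
    """Return the root height and, for every other internal node,
    the ratio of its height to its parent's height."""
    root_height = None
    ratios: Dict[str, float] = {}
    for node, h in height.items():
        p = parent[node]
        if p is None:
            root_height = h          # the root height is kept as is
        else:
            ratios[node] = h / height[p]
    if root_height is None:
        raise ValueError("no root found (a node whose parent is None)")
    return root_height, ratios


def ratios_to_heights(
    parent: Dict[str, Optional[str]], root_height: float, ratios: Dict[str, float]
) -> Dict[str, float]:
    """Invert the transform by multiplying ratios down from the root."""
    root = next(n for n, p in parent.items() if p is None)
    heights = {root: root_height}

    def resolve(node: str) -> float:
        if node not in heights:
            heights[node] = ratios[node] * resolve(parent[node])
        return heights[node]

    for node in ratios:
        resolve(node)
    return heights


# Hypothetical four-node internal topology: R is the root, A and C are
# children of R, and B is a child of A.
parent = {"R": None, "A": "R", "B": "A", "C": "R"}
height = {"R": 10.0, "A": 6.0, "B": 3.0, "C": 2.0}

root_height, ratios = heights_to_ratios(parent, height)
recovered = ratios_to_heights(parent, root_height, ratios)
print(ratios)
print(recovered)
```

Because any vector of ratios in (0, 1) maps back to a valid set of node heights, a sampler can propose moves for all ratios jointly without having to check the parent-child ordering after each move.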
S18-4
A simple and robust method for dating divergence times with relaxed clocks
Koichiro Tamura1,2, Qiqing Tao3,4, Sudhir Kumar3,4,5
1Department of Biological Sciences, Tokyo Metropolitan University
2Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University
3Institute for Genomics and Evolutionary Medicine, Temple University
4Department of Biology, Temple University
5Center for Excellence in Genome Medicine and Research, King Abdulaziz University
Bayesian approaches are widely used to infer divergence times under a relaxed molecular clock. These methods require specifying a probability distribution of evolutionary rates and assuming the presence or absence of rate autocorrelation among branches. They are also computationally demanding because they need extensive MCMC sampling from the posterior distribution, which makes them challenging to apply to very large datasets. However, the lengths of individual branches in a phylogenetic tree contain information on their evolutionary rates relative to nearby branches. By extracting the relative rates of evolution for individual branches from the tree, we can relax the strict molecular clock assumption and estimate divergence times without such distributional assumptions or a heavy computational burden. This presentation will show how our method, RelTime, works using a simple theoretical framework. RelTime is usually as accurate as Bayesian methods, whether or not rates are autocorrelated among branches, without assuming a probability distribution and with much shorter computation times. When the assumptions of the Bayesian methods do not hold, RelTime performs better than the Bayesian methods.
S18-5
Assessing the relative performance of fast molecular dating methods for phylogenomic data
Beatriz Mello1
1Federal University of Rio de Janeiro
Molecular dating has become an essential tool for evolutionary studies. Over the last decades, major advances in sequencing technologies have allowed the assembly of large molecular datasets to estimate divergence times between species. Such huge datasets pose a computational burden for molecular dating methods, hindering the proposal and testing of evolutionary hypotheses. This has prompted the development of faster methodologies, which have been increasingly used to infer biological timescales. However, a large-scale comparison of fast methods against the standard, computationally demanding Bayesian framework has not yet been performed. This evaluation is important to cross-validate the evolutionary scenarios inferred using rapid methods and to understand their relative performance under several diversification scenarios. Therefore, we compared two widely used rapid dating methods, penalized likelihood (implemented in TreePL) and RelTime, to Bayesian approaches by analyzing 23 empirical phylogenomic datasets. Our results showed that rapid dating methodologies may be an efficient alternative to Bayesian methods, since they generally exhibited similar performance while requiring significantly less computational time. However, the RelTime method generally produced time estimates that were closer to the Bayesian estimates than those of TreePL. Additionally, RelTime confidence intervals performed better than those from the bootstrap procedure used in penalized likelihood dating.