Phylogenetics and Population Genetics (10/24/2014)


10/01/14

Wed, 10/01/2014 at 2:21 PM

Customer: We’re interested in phylogenetic analysis of a couple of data sets (nucleotide sequence files) derived from related mammalian taxa. The first samples several loci from single individuals representing numerous species, and the second samples a single gene from a subset of those species. The tasks include: (1) Infer an effective population size (Ne) for each node of the multigene-derived phylogenetic tree. (2) Estimate species divergence times associated with the single gene-derived phylogenetic tree. (3) Infer a reconstructed ancestral sequence for the basal node of the single gene tree. (4) Perform a codon-by-codon Ka/Ks analysis of the extant sequences. (5) Determine the mutation rate for each branch of the single gene tree. (6) Infer a phylogenetic tree for the single gene data set using small indels as “derived characters”. Relevant software might include PAML, BEAST, & Lazarus (http://markov.uoregon.edu/software/lazarus/).

Thu, 10/09/2014 at 8:29 AM

AccuraScience LB: Our general plan is as follows. We will apply various molecular clock models under a likelihood or Bayesian framework to infer the species divergence times and the mutation rate for each branch. Based on the best reconstructed species tree (chosen using likelihood tests of multiple alternative topologies), we will use maximum likelihood or maximum parsimony (MP), depending on the divergence of the species, to estimate the ancestral sequences at the internal nodes. PAML will be used to estimate site-specific Ka/Ks. For the phylogenetic tree based on indels, a distance-based method would be adopted to infer the phylogeny.

A few technical questions/comments: For task 1, since it’s multiple species data, are you trying to estimate the ancestral population size of these species?

For task 2, if you already have confirmed divergence times (from the fossil record, etc.) for certain species, could you provide them to us?

For task 3, if only the root (basal) sequence is to be estimated, it would be better for us to have one extra sequence from an outgroup species, or for you to let us know the name(s) of outgroup species that we may use.
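
As a rough illustration of what "the mutation rate for each branch" in the plan above amounts to once a dated tree is available, the minimal Python sketch below divides a branch length by the time that branch spans. The function name, the units (substitutions per site, node ages in millions of years), and the example numbers are assumptions made for illustration; they are not taken from the customer's data or from the output of any specific program.

```python
def branch_rate(branch_length, parent_age_mya, child_age_mya):
    """Average rate on one branch, in substitutions per site per million years.

    branch_length  -- branch length in substitutions per site (hypothetical input)
    parent_age_mya -- estimated age of the older (parent) node, in Mya
    child_age_mya  -- estimated age of the younger (child) node, in Mya (0 for a tip)
    """
    duration = parent_age_mya - child_age_mya
    if duration <= 0:
        raise ValueError("parent node must be older than child node")
    return branch_length / duration


# Hypothetical example: 0.05 substitutions/site accumulated between 30 and 20 Mya.
example_rate = branch_rate(0.05, parent_age_mya=30.0, child_age_mya=20.0)
print(f"{example_rate:.4f} substitutions/site/Myr")
```

In a relaxed-clock analysis the node ages and branch lengths are themselves estimates with uncertainty, so a per-branch rate computed this way is best reported with an interval rather than as a single number.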

Thu, 10/09/2014 at 12:54 PM

Customer: (1) Yes, we would like to estimate the ancestral population size of the species at each node of the phylogenetic tree. Do you think this can be done in the absence of data concerning allelic diversity?

(2) Yes, we can provide fossil dates to calibrate the molecular clock.

(3) Yes, we can suggest a suitable out-group sequence (or sequences) to provide temporal orientation to the tree.

Our main interest concerns the estimation of ancestral effective population sizes occurring 20-40 MYA. As mentioned previously, we have access to a data set of single samples (i.e., no alleles) of multiple loci from numerous species. Do you believe population size estimation can be achieved with that type of sequence information at that time scale?

Fri, 10/24/2014 at 11:34 AM

AccuraScience LB: Indeed, the relatively long divergence times (phylogenetic depth) of the species of interest could be one factor contributing to the uncertainty in the estimated ancestral effective population size (inferred through the parameter theta). However, many other factors in the models and in the implementation of the Bayesian MCMC would profoundly influence the estimates, too. The complexity of the estimation is highly dependent on the data, and we would need to examine and explore the data in order to see how it will pan out. If the estimate of ancestral effective population size (theta) at a node of interest is not robust (low posterior probability, failure to converge even when the MCMC chain is long enough, incongruence across models, or strong dependence on the selection of loci or on priors, such as the prior on theta, in the Bayesian estimation), we will provide an objective description of the results (with some related discussion) and recommend the particular results that could be used for publication.
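
Because the reply above treats theta and Ne together, a short sketch of how one is converted to the other may help. It assumes the common diploid definition theta = 4 * Ne * mu, with mu the mutation rate per site per generation; the function names, the per-year rate, the generation time, and the theta value are all hypothetical and chosen only to make the arithmetic visible.

```python
def per_generation_rate(rate_per_site_per_year, generation_time_years):
    """Convert a per-year mutation rate to a per-generation rate."""
    return rate_per_site_per_year * generation_time_years


def ne_from_theta(theta, mu_per_site_per_generation):
    """Effective population size under theta = 4 * Ne * mu (diploid, autosomal)."""
    if mu_per_site_per_generation <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (4.0 * mu_per_site_per_generation)


# Hypothetical inputs: theta = 0.004, 1e-9 mutations/site/year, 10-year generations.
mu = per_generation_rate(1e-9, 10.0)
ne = ne_from_theta(0.004, mu)
print(f"Ne is approximately {ne:,.0f}")
```

The conversion makes clear that any Ne reported for a node depends on the assumed mutation rate and generation time as much as on theta itself, so those assumptions would need to be stated alongside the estimates.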

Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and AccuraScience data analysis team at specified dates and times. The editing was made to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.