Nonhomogeneous models

Most P-symbiont studies have used phylogenetic methods that implement standard models of molecular evolution. These models are based on stochastic processes with two main implicit assumptions: homogeneous base composition and constant substitution rates. Under these assumptions, the evolutionary process can be modeled and analyzed using a time-reversible Markov chain as the methodological basis. Consequently, if any force directs the substitution process, the assumption of time-reversibility is violated. A typical, well-known example of such selection-driven change is the compositional difference between the 16S rRNA genes of thermophilic and mesophilic bacteria. When analyzed together with thermophiles, the mesophilic bacteria Deinococcus and Bacillus cluster as sister groups, in contrast to strong evidence for their polyphyly. This conflict has been repeatedly attributed to convergent, selection-driven evolution of thermophiles toward a GC-rich genome (Mooers and Holmes, 2000; Foster, 2004). A similar effect can be seen in the AT-rich sequences of symbiotic bacteria. This phenomenon can play a crucial role particularly in 16S rDNA analyses, because long stretches within the transcribed rRNA loops can accommodate an enrichment of AT residues.
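Before choosing a model, it can help to check how strongly base composition varies among the aligned sequences. The following Python sketch computes the G + C fraction of each sequence and a Pearson chi-square statistic over the taxa-by-bases table. It is a rough screening aid only, not a method from the studies cited above: the statistic ignores the phylogenetic non-independence of the sequences, and the function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def base_counts(seq):
    """Count A, C, G, T in one sequence; gaps and ambiguity codes are ignored."""
    counts = Counter(seq.upper().replace("U", "T"))
    return [counts[b] for b in BASES]


def gc_content(seq):
    """Fraction of G + C among the unambiguous bases of a sequence."""
    a, c, g, t = base_counts(seq)
    total = a + c + g + t
    return (g + c) / total if total else 0.0


def composition_chi2(seqs):
    """Pearson chi-square statistic for a taxa x bases contingency table.

    Returns (statistic, degrees_of_freedom). The degrees of freedom assume
    that all four bases occur somewhere in the data set.
    """
    table = [base_counts(s) for s in seqs]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        raise ValueError("no unambiguous bases in the input")
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(BASES) - 1)
    return chi2, dof
```

A large statistic relative to the degrees of freedom suggests that a single stationary composition describes the data poorly, which is the situation in which the homogeneous models discussed here may mislead.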

This problem has no simple solution. Initial attempts to cope with it relied on distance calculations that eliminate the effect of compositional heterogeneity, particularly the paralinear (LogDet) method (Lake, 1994; Lockhart et al., 1994) or the alternative distance formula suggested by Galtier and Gouy (1995). However, distance methods are generally considered inferior phylogenetic tools compared to maximum parsimony (MP), maximum likelihood (ML), or Bayesian analysis. It is therefore understandable that the nonhomogeneous approach was soon introduced into the maximum likelihood framework.
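As an illustration of the distance-based approach, the sketch below computes a paralinear-type distance for one pair of aligned sequences, d = -ln(det J / sqrt(det D_x det D_y)), where J is the 4 x 4 matrix of joint base-pair frequencies and D_x, D_y are diagonal matrices of the two marginal compositions. This is a minimal sketch under stated assumptions: published implementations differ in scaling (some divide the value by 4) and in how they treat gaps and invariant sites, and all function names here are hypothetical.

```python
import math

BASES = "ACGT"


def joint_matrix(x, y):
    """4x4 matrix of relative frequencies of aligned base pairs (x_i, y_i)."""
    index = {b: i for i, b in enumerate(BASES)}
    joint = [[0.0] * 4 for _ in range(4)]
    n_sites = 0
    for a, b in zip(x.upper(), y.upper()):
        if a in index and b in index:  # skip gaps and ambiguity codes
            joint[index[a]][index[b]] += 1
            n_sites += 1
    if n_sites == 0:
        raise ValueError("no comparable sites")
    return [[v / n_sites for v in row] for row in joint]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[pivot][i]) < 1e-15:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            result = -result
        result *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= factor * m[i][c]
    return result


def paralinear_distance(x, y):
    """-ln(det J / sqrt(det D_x * det D_y)) for two aligned sequences."""
    joint = joint_matrix(x, y)
    freq_x = [sum(row) for row in joint]        # marginal composition of x
    freq_y = [sum(col) for col in zip(*joint)]  # marginal composition of y
    det_joint = det(joint)
    denom = math.sqrt(math.prod(freq_x) * math.prod(freq_y))
    if det_joint <= 0 or denom == 0:
        raise ValueError("distance undefined for this pair")
    return -math.log(det_joint / denom)
```

Because the marginal compositions of both sequences enter the normalization, the distance does not require the two sequences to share one base composition, which is the property that made this family of distances attractive for heterogeneous data.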

The model developed by Yang and Roberts (1995) extended the well-known HKY85 substitution model (Hasegawa et al., 1985) by introducing different compositional parameters for each tree branch. Although this approach is in principle capable of dealing with nonhomogeneous sequences, the model is too parameter-rich and thus computationally demanding. Moreover, the necessity to estimate the parameters from the data is a potential source of topological distortions. To overcome these difficulties, Galtier and Gouy (1998) simplified the model by replacing the HKY basis with T92, which uses a single parameter for G + C content (Tamura, 1992). It was only this new version of the nonhomogeneous model that was subsequently used to test the monophyly/polyphyly of the P-symbiotic lineages (Herbeck et al., 2005). This study brought the first strong evidence favoring P-symbiont polyphyly. However, it has not settled the issue. On the contrary, several authors have disagreed with the polyphyletic view and tried to demonstrate the opposite.
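To make the T92-based simplification concrete, the following sketch builds a T92 rate matrix from a G + C parameter (here called theta) and a transition/transversion rate ratio (kappa), then checks that the process is stationary and time-reversible for that theta. In a nonhomogeneous model of this kind, each branch would receive its own theta, so the process is reversible within a branch but not across the tree. The matrix is left unnormalized, and the function names and parameter values are illustrative assumptions, not those of any published implementation.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def t92_frequencies(theta):
    """Equilibrium base frequencies under T92 for G + C content theta."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    at, gc = (1.0 - theta) / 2.0, theta / 2.0
    return {"A": at, "C": gc, "G": gc, "T": at}


def t92_rate_matrix(theta, kappa):
    """Unnormalized T92 rate matrix Q (nested dicts) and its frequencies."""
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    pi = t92_frequencies(theta)
    q = {a: {} for a in BASES}
    for a in BASES:
        off_diagonal = 0.0
        for b in BASES:
            if a == b:
                continue
            # Rate a -> b is proportional to the target frequency,
            # multiplied by kappa for transitions.
            rate = pi[b] * (kappa if (a, b) in TRANSITIONS else 1.0)
            q[a][b] = rate
            off_diagonal += rate
        q[a][a] = -off_diagonal  # rows of a rate matrix sum to zero
    return q, pi


def is_reversible(q, pi, tol=1e-12):
    """Detailed balance: pi_a * q_ab == pi_b * q_ba for every pair."""
    return all(abs(pi[a] * q[a][b] - pi[b] * q[b][a]) <= tol
               for a in BASES for b in BASES)


def stationary_residual(q, pi):
    """Largest absolute entry of pi * Q; zero when pi is stationary."""
    return max(abs(sum(pi[a] * q[a][b] for a in BASES)) for b in BASES)


# Two branches with different (hypothetical) equilibrium G + C contents.
q_gc_rich, pi_gc_rich = t92_rate_matrix(theta=0.65, kappa=2.0)
q_at_rich, pi_at_rich = t92_rate_matrix(theta=0.30, kappa=2.0)
```

Each matrix satisfies detailed balance on its own, yet a lineage passing from the first branch to the second moves toward a different equilibrium composition, which is exactly the behavior a single homogeneous matrix cannot represent.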

The main problem is that, while there is no doubt about the superior performance of nonhomogeneous models in some particular cases, it may be extremely difficult to predict their behavior for various matrices and datasets. Indeed, selection of a proper model is one of the central issues of ML methodology. A well-known property of evolutionary models is that their predictive power decreases with additional parameters (Posada and Buckley, 2004; Steel, 2005). The nonhomogeneous model applied to the P-symbiont phylogeny uses free compositional parameters on each branch, which may rapidly lead to over-parameterization of the analysis as the number of branches in the tree increases. Ultimately, this property is a reason why the nonhomogeneous technique may not be particularly suitable for solving the P-symbiont issue.

To decrease the complexity of the nonhomogeneous model, Foster (2004) suggested that, instead of introducing many free parameters along a tree, only a few composition vectors are sufficient to handle compositional changes. He used the aforementioned thermophile problem to test this method and showed that it can indeed be solved by introducing only two vectors. To find the optimal solution, he employed Bayesian analysis to test the fit of the nonhomogeneous model to the data. More recently, nonhomogeneous models have been developed further in several different directions (Blanquart and Lartillot, 2006; Gowri-Shankar and Rattray, 2007). None of these new Bayesian methods, which optimize the number of parameters, has so far been used to address the P-symbiont issue. However, the rapid development of techniques for extracting phylogenetic signal from heterogeneous sequences indicates that it would be premature to draw any conclusion on P-symbiont monophyly/polyphyly from the analyses that have been reported.
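The growth in parameter number that motivates these simplifications can be illustrated with simple bookkeeping. The sketch below counts only the free compositional parameters on a fully bifurcating rooted tree, which has 2n - 2 branches for n taxa, under four schemes: one shared composition, three free base frequencies per branch, one G + C parameter per branch, and a small fixed number of composition vectors. The counts are illustrative assumptions: they ignore branch lengths, rate ratios, and root composition, and the exact parameterizations of the published models differ in detail. The scheme names are hypothetical.

```python
def n_branches_rooted(n_taxa):
    """Number of branches in a fully bifurcating rooted tree."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    return 2 * n_taxa - 2


def composition_params(n_taxa, scheme, n_vectors=2):
    """Free compositional parameters under a few illustrative schemes."""
    branches = n_branches_rooted(n_taxa)
    if scheme == "homogeneous":
        return 3                 # one composition vector (4 frequencies, sum 1)
    if scheme == "per_branch_frequencies":
        return 3 * branches      # three free frequencies on every branch
    if scheme == "per_branch_gc":
        return branches          # a single G + C parameter on every branch
    if scheme == "few_vectors":
        return 3 * n_vectors     # a fixed number of composition vectors
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n,
              composition_params(n, "per_branch_frequencies"),
              composition_params(n, "per_branch_gc"),
              composition_params(n, "few_vectors"))
```

Under these assumptions the per-branch schemes grow linearly with the number of taxa, whereas the few-vector scheme stays constant, although that approach must additionally decide which vector applies to which part of the tree.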
