Phylogenetic methods

The various methods that attempt to recover the pattern produced by evolutionary history rely on observations on living and fossil organisms. Historically, relationships of insect taxa were based on estimates of overall (phenetic) similarity derived principally from morphology, and taxonomic rank was governed by degree of difference. It is recognized now that analyses of overall similarity are unlikely to recover the pattern of evolution and thus phenetic classifications are considered artificial. Their use in phylogeny largely has been abandoned except perhaps for organisms, such as viruses and bacteria, which exhibit reticulate evolution. However, phenetic methods are useful in DNA barcoding (section 17.3.3) in which identification of an unknown species is often possible based on comparing the nucleotide sequences of one of its genes with those in a database of identified species of the group, but the process has its problems. Alternative methods in phylogenetics are based on the premise that the pattern produced by evolutionary processes can be estimated, and, furthermore, ought to be reflected in the classification. Popular methods in current use for all types of data include cladistics (maximum parsimony), maximum likelihood, and Bayesian inference. A detailed discussion of the methods of phylogenetic inference is beyond the scope of an entomology textbook and readers should consult other sources for in-depth information. Here the basic principles and terms are explained, followed by a section on molecular phylogenetics that considers the use of genetic markers and problems with estimating relationships of insects using nucleotide sequence data.

Phylogenetic trees, also called dendrograms, are the branching diagrams that depict purported relationships or resemblances among taxa. Different kinds of trees emphasize different components of the evolutionary process. Cladograms depict only the branching pattern of ancestor/descendant relationships, and branch lengths have no meaning. Phylograms show the branching pattern and the number of character-state changes represented by differences in the branch lengths. Chronograms explicitly represent time through branch lengths. Evolution is directional (through time) and thus phylogenetic trees usually are drawn with a root; unrooted trees must be interpreted with extreme care. In modern systematics, phylogenetic trees are the basis for erecting new or revised classifications, although there are no universally accepted rules for how to convert the topology of a tree into a ranked classification. Here we use cladograms to explain some terms that also apply more widely to phylogenetic trees derived by other analytical methods.

The cladistic method (cladistics) seeks patterns of special similarity based only on shared, evolutionarily novel features (synapomorphies). Synapomorphies are contrasted with shared ancestral features (plesiomorphies or symplesiomorphies), which do not indicate closeness of relationship. The terms apomorphy and plesiomorphy are relative because an apomorphy at one level in the taxonomic hierarchy, for example the ordinal apomorphy of all beetles having fore wings in the form of protective sheaths (the elytra), becomes a plesiomorphy if we consider wing characters among families of beetles. Furthermore, features that are unique to a particular group (autapomorphies) but unknown outside the group do not indicate inter-group relationships, although they are very useful for diagnosing the group. Construction of a cladogram (Fig. 7.1), a tree-like diagram portraying the phylogenetic branching pattern, is fundamental to cladistics. Cladograms are constructed so that character-state changes across the tree are minimized, based on the principle of parsimony (i.e. a simple explanation is preferred to a more complex one). The tree with the fewest changes (the 'shortest' tree) is considered to have the optimal topology (tree shape). It is important to remember that parsimony is a rule for evaluating hypotheses, not a description of evolution.

Fig. 7.1 A cladogram showing the relationships of four species, A, B, C, and D, and examples of (a) the three monophyletic groups, (b) two of the four possible (ABC, ABD, ACD, BCD) paraphyletic groups, and (c) one of the four possible (AC, AD, BC, and BD) polyphyletic groups that could be recognized based on this cladogram.
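The 'shortest tree' criterion can be illustrated with a minimal sketch. The Python code below counts character-state changes on a rooted, fully bifurcating tree using Fitch's algorithm, which is one standard way of scoring a tree under parsimony (it is not named in the text above). The four taxa, the two candidate topologies, and the small binary character matrix are all hypothetical and serve only as an illustration.

```python
# Trees are nested tuples; a leaf is a taxon name (string).
# Both topologies and the character matrix are invented for illustration.
TREE_1 = (("A", "B"), ("C", "D"))
TREE_2 = (("A", "C"), ("B", "D"))

# Rows are taxa, columns are characters (states coded 0 or 1).
MATRIX = {
    "A": "1100",
    "B": "1100",
    "C": "0011",
    "D": "0010",
}


def fitch(node, states):
    """Return (candidate ancestral states, number of changes) for one character."""
    if isinstance(node, str):          # leaf: its observed state, no changes
        return {states[node]}, 0
    left, right = node
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:                         # descendants can agree: no extra change
        return shared, left_changes + right_changes
    # descendants disagree: one change is required on this part of the tree
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, matrix):
    """Sum the minimum number of changes over all characters (equal weights)."""
    n_characters = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_characters):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch(tree, column)[1]
    return total


if __name__ == "__main__":
    for name, tree in (("TREE_1", TREE_1), ("TREE_2", TREE_2)):
        print(name, tree_length(tree, MATRIX))
```

With this invented matrix the first topology requires four changes and the second seven, so the first would be preferred under parsimony. A real analysis would search among many more topologies and would normally rely on dedicated software.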

From a cladogram, monophyletic groups, or clades, their relationships to each other, and a classification, can be inferred directly. Only the branching pattern of relationships is considered. Sister groups are taxa that are each other's closest relatives; they arise from the same node (branching point) on a tree. A monophyletic group contains a hypothetical ancestor and all of its descendants (Fig. 7.1a). Further groupings can be identified from Fig. 7.1: a paraphyletic group lacks a clade from amongst the descendants of a common ancestor, and often is created by the recognition (and removal) of a derived subgroup; polyphyletic groups fail to include two or more clades from amongst the descendants of a common ancestor (e.g. A and D in Fig. 7.1c). Thus, when we recognize the monophyletic Pterygota (winged or secondarily apterous insects), two other extant orders (Archaeognatha and Zygentoma) form a grade of primitively wingless insects (see Fig. 7.3, below) and, if treated as a named group ("Apterygota", as in some older books), it would be paraphyletic. If we were to recognize a group of flying insects with fully developed wings restricted to the mesothorax (true flies, male scale insects, and a few mayflies), this would be a polyphyletic grouping. Paraphyletic groups should be avoided if possible because their only defining features are ancestral ones shared with other indirect relatives. Thus, the absence of wings in the paraphyletic apterygotes is an ancestral feature shared by many other invertebrates. The mixed ancestry of polyphyletic groups means that they are biologically uninformative and such artificial taxa should never be included in any classification.
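The three kinds of group can be told apart mechanically from a rooted tree, as the following Python sketch suggests. It applies the definitions given above: it finds the most recent common ancestor of the proposed group and counts how many whole clades descended from that ancestor are left out (none, one, or two or more). The topology ((A,B),(C,D)) is an assumption; it is not stated in the text, but it is consistent with the numbers of groups listed in the caption to Fig. 7.1. The function names are hypothetical.

```python
from itertools import combinations

# Assumed topology for the four species of Fig. 7.1 (nested tuples, leaves are names).
TREE = (("A", "B"), ("C", "D"))


def leaves(node):
    """Return the set of taxon names at the tips below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def mrca(node, group):
    """Return the smallest subtree that contains every member of the group."""
    if not isinstance(node, str):
        for child in node:
            if group <= leaves(child):
                return mrca(child, group)
    return node


def excluded_clades(node, group):
    """Count the maximal clades below a node that contain no group member."""
    tips = leaves(node)
    if tips <= group:          # everything below this node is in the group
        return 0
    if not (tips & group):     # nothing below this node is in the group
        return 1
    return sum(excluded_clades(child, group) for child in node)


def classify(tree, group):
    """Label a group as monophyletic, paraphyletic, or polyphyletic."""
    group = set(group)
    missing = excluded_clades(mrca(tree, group), group)
    if missing == 0:
        return "monophyletic"
    return "paraphyletic" if missing == 1 else "polyphyletic"


if __name__ == "__main__":
    for size in (2, 3, 4):
        for group in combinations("ABCD", size):
            print("".join(group), classify(TREE, group))
```

Under the assumed topology the sketch labels AB, CD, and ABCD as monophyletic, the four three-species groups as paraphyletic, and AC, AD, BC, and BD as polyphyletic, matching the caption. It considers only tree shape; it takes no account of the characters used to define a group.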

"Tree thinking" (interpreting and using phylogen-ies) is fundamental to all of biology and yet misinterpretations of trees abound. The most common error is to read trees as ladders of evolutionary progress with the often species-poor sister group erroneously referred to as the "basal" taxon and misinterpreted as displaying characteristics found in the common ancestor. All extant species are mixes of ancestral and derived features and there is no a priori reason to assume that a species-poor lineage retains more plesiomorphies than its species-rich sister lineage. Trees have basal nodes, but they do not have basal taxa. Readers should refer to papers by Gregory (2008) and Omland et al. (2008) in this chapter's Further reading section for more information on this important topic.

Molecular phylogenetics

Pioneers in the field of molecular systematics used chromosome structure or the chemistry of molecules such as enzymes, carbohydrates, and proteins to access the genetic basis for evolution. Although these techniques are largely superseded, understanding the banding patterns in "giant" chromosomes remains important in locating gene function in some medically important flies, and comparative patterns of enzyme polymorphism (isoenzymes) remain valuable in population genetics. The field of phylogenetic systematics has been revolutionized by ever more automated and cheaper techniques that access the genetic code of life directly via the nucleotides (bases) of DNA and RNA, and the amino acids of proteins that are coded for by the genes. Obtaining genetic data requires extraction of DNA, use of the polymerase chain reaction (PCR) to amplify DNA, and a range of procedures to sequence (that is, determine the order of) the nucleotide bases that make up the selected section of DNA: the purines adenine and guanine, and the pyrimidines cytosine and thymine. Generally comparable sequences of 300-1000 nucleotides are sought, preferably from each of several genes, for comparisons across a range of taxa of interest.

Choice of genes for phylogenetic study involves selection of those with an appropriate substitution (mutation) rate of molecular evolution for the question at hand. Closely related (recently diverged) taxa may be near identical in slower-evolving genes that provide little or no phylogenetic information, but should differ more in fast-evolving genes. From an ever-increasing database, appropriate genes ("markers") can be selected with a successful previous history of use and a "cook book" method followed. Already tried-and-tested primers (nucleic acid strands that start DNA replication at a chosen place on a selected gene) can be selected appropriately. For insect molecular phylogenetics, a suite of genes typically includes one, some, or all of: the mitochondrial genes 16S, COI, and COII, the nuclear small subunit rRNA (18S), part of the large subunit rRNA (28S), and progressively more nuclear coding genes such as elongation factor 1α (EF-1α), histone 3 (H3), wingless (wg), and rudimentary CAD. Very slowly mutating (highly conserved) genes are needed to infer older branching patterns of 100 million years and more.

The particular sequence of nucleotide bases (a haplotype) produced after processing can be used to examine genetic variation between individuals in a population. However, in molecular phylogenetics the haplotype is used more as a characteristic of a species (or to represent a higher taxon) to be compared among other taxa. The first procedure in such an analysis is to "align" comparable (homologous) sequences in a species-by-nucleotide matrix, with rows being the sampled taxa and columns being the nucleotide identified at a particular position (site) in the gene, reading from where the primer commenced. This matrix is quite comparable with one scored for morphology in which each column is one character, and the different nucleotides are the various "states" at the site. Many characters will be invariant, unchanged amongst all taxa studied, whereas others will show variation due to substitution (mutation) at a site. Some sites are more prone to substitution than others, which are more constrained. For example, each amino acid is coded for by a triplet of nucleotides, but the state of the third nucleotide in each triplet is freer to vary (compared to the first and second positions) without affecting the resultant amino acid.
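A species-by-nucleotide matrix of this kind is easy to represent and inspect in code. The Python sketch below finds the variable columns (sites) of a small alignment and tallies them by codon position. The four sequences are invented, contain no gaps, and are assumed to start at the first position of a codon; the function names are hypothetical.

```python
# Hypothetical aligned sequences: rows are taxa, columns are sites.
ALIGNMENT = {
    "species_1": "GCTAAAGGA",
    "species_2": "GCCAAAGGT",
    "species_3": "GCAAAGGGC",
    "species_4": "GCTAGAGGA",
}


def variable_sites(alignment):
    """Return 0-based indices of columns with more than one nucleotide state."""
    sequences = list(alignment.values())
    return [
        index
        for index, column in enumerate(zip(*sequences))
        if len(set(column)) > 1
    ]


def by_codon_position(sites):
    """Tally sites by codon position (1, 2, or 3), assuming the alignment
    begins at the first position of a codon."""
    counts = {1: 0, 2: 0, 3: 0}
    for site in sites:
        counts[site % 3 + 1] += 1
    return counts


if __name__ == "__main__":
    sites = variable_sites(ALIGNMENT)
    print("variable sites:", sites)
    print("by codon position:", by_codon_position(sites))
```

In this invented example five of the nine sites are invariant, and three of the four variable sites fall at third codon positions. Under the standard genetic code those third-position differences leave the amino acid unchanged (for example GCT, GCC, and GCA all code for alanine), whereas the single second-position difference (AAA versus AGA) replaces lysine with arginine.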

The aligned species-by-nucleotide matrix consisting of the ingroup (of taxa under study) and one or more outgroups (more distantly removed taxa) can be analysed by one or more of a suite of methods. Some molecular phylogeneticists argue that parsimony, which minimises the number of character-state changes across the tree with the same weight allocated to each substitution observed in a column (character), is the only justifiable approach, given our uncertainty about how molecular evolution takes place. However, increasingly analyses are based on more complex models involving application of weights to different kinds of mutations; for example, a transition between one purine and the other (adenine ↔ guanine) or between two pyrimidines (cytosine ↔ thymine) is more "likely" to have occurred than a transversion between a purine and a chemically more dissimilar pyrimidine (and vice versa). Most systematists examine results (phylogenetic trees) based on both the simplest model (parsimony) and ever more complex "likelihood" models, including Bayesian statistical programs, requiring one or a suite of powerful computers running over days or weeks. Each model and analysis method should provide estimates of confidence (support) for the relationships portrayed, to allow critical assessment of all hypothesised relationships.
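The distinction between transitions and transversions can be made concrete with a short Python sketch that compares two aligned sequences site by site. The two sequences are hypothetical, and the function name is an assumption made for illustration; gaps and ambiguous bases are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_transitions_transversions(seq1, seq2):
    """Return (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1    # purine <-> pyrimidine
    return transitions, transversions


if __name__ == "__main__":
    # Hypothetical pair of aligned sequences.
    print(count_transitions_transversions("ACGTACGT", "GCATTCGA"))
```

Counts of this kind are what a weighted analysis draws on: a model that treats transitions as more likely than transversions would, in effect, penalize the transversions more heavily when comparing alternative trees.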

The procedures described briefly above are implemented in many research laboratories, covering much of the immense diversity of the hexapods, and generating results (or hypotheses of relationships) that are published in numerous journal articles each year. The reader may be excused for asking why, if this is the case, so many unresolved or contradictory ideas on insect relationships exist. Today we may seem to know less about many relationships than we did previously based on either morphology or early molecular work. Molecular data, which can provide many thousands of characters and more than sufficient variable character states (mutations), have yet to deliver their promise of a full understanding of the evolution of the insects.

There are many problems unforeseen by early practitioners.

1 It is relatively straightforward to assess the morphology of a group of insects, including from historical specimens, but it is much less easy to obtain the same diversity of appropriately preserved species for molecular study. Collection requires specialist knowledge and techniques, with increasing legal impediments for genetic collections.

2 Even with a good sample of the diversity of material, the procedures for DNA extraction and sequencing can be unsuccessful for some individual specimens or even broader groups, or for some genes, for a number of reasons including poor preservation that allows water to enhance DNA degradation by nucleases.

3 Given sequences, the procedure of alignment (constructing columns of homologous nucleotides) becomes non-trivial as more distantly related taxa are included, because the gene sections sequenced may differ in length. One or more sequences with respect to others may be longer or shorter due to insertions or deletions (indels) of some to many nucleotides in one or more places. It is problematic as to how to align such sequences by insertion of "gaps" (absences of nucleotides) and how to "weight" such inferred changes relative to regular nucleotide substitutions. Although nuclear ribosomal genes (18S and 28S) have been commonly chosen especially for deeper divergences, these genes may present major alignment problems compared to protein-encoding nuclear genes.

4 There are only four nucleotides amongst which substitutions can take place, so the higher the mutation rate, the more likely that a second (or further) mutation may take place that could revert a nucleotide either to its original or another condition (a so-called "multiple hit"). The existence and history of the substitutions that resulted in an identical end state cannot be recognized: an adenine is adenine whether or not it has mutated via a guanine and back.

5 There is variation in the phylogenetic information between different parts of a gene: there may be regions, said to be hypervariable, with a high concentration of substitutions and therefore many multiple hits may be interspersed (unrecognized) within sections that are near impossible to align.

6 There are differences in propensities of sites to mutate: not only the first and second nucleotides relative to third positions (above), but related to the general and specific (secondary and tertiary) structure of the protein or RNA molecule from which the sequence came.

7 The evolutionary signal derived from phylogenetic analysis of one gene provides an evolutionary insight into that gene, but this need not be the "true" history of the organisms. Perhaps there is no "true" history derivable from genes: many insects diversified very rapidly but speciation historically is a drawn-out process. Population-level processes of gene flow, genetic drift, selection, and mutation take place at much shorter time scales and can create multiple histories of genes. Genes can duplicate within a lineage, with each copy (paralog) subjected subsequently to different substitutions. Only similar (homologous) copies of genes should be compared in phylogenetic study of the organisms.

8 In contrast to population genetics, a major challenge to phylogenetics is to obtain primers that can amplify the targeted locus from a diversity of organisms. But primer sites also diverge with time, reducing the "cross-reactivity" of primers between distant taxa. Primers with better ability across distant taxa (incorporating degenerate bases) will lose specificity for the study group, increasing the propensity of non-specific priming of non-target sequences.
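The trade-off described in point 8 can be quantified in a simple way. A primer written with degenerate bases stands for a mixture of distinct sequences, and the size of that mixture grows multiplicatively with each degenerate position. The Python sketch below uses the standard IUPAC nucleotide ambiguity codes to count and list the sequences that a primer represents; the primer shown is invented for illustration and is not taken from any published protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Return how many distinct sequences a degenerate primer represents."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """List every non-degenerate sequence covered by the primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combination) for combination in product(*choices)]


if __name__ == "__main__":
    primer = "GGNTAYGCR"  # hypothetical primer with three degenerate positions
    print(degeneracy(primer))
    print(expand(primer)[:4])
```

Here the three degenerate positions (N, Y, and R) give 4 x 2 x 2 = 16 sequences in the mixture. Each added degenerate position widens the range of taxa a primer might match but also dilutes any one exact sequence, which is one way of seeing why specificity falls.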

These, and other problems with genetic data, do not imply that such studies cannot untangle insect evolutionary history, but they may explain the many divergent results that have arisen from molecular phylogenetic studies. Some problems can be addressed, for example, by better sampling or by improved models for alignment and for site-specific variation in substitution rates according to better understanding of molecular structures. As vastly more molecular data become available, including whole genomes, computation will be challenged by large matrices including those with missing data (lacking genes or taxa). Perhaps phylogenetic analyses will converge on consistent relationships among the groups of insects that interest us. At the time of writing, we have taken a conservative approach to portraying and discussing the evolutionary relationships in this chapter, showing strongly supported groupings from molecular studies where there is a congruent morphological basis.
