Use a clever algorithm

Principle

The coded matrix, together with the hypothesis on character transformation and possibly an outgroup, can be processed to build trees and find the most parsimonious one(s). We have seen how simple the principle is, and how time-consuming. Since it is impossible in practice to explore the entire tree space when more than ten taxa are present, many algorithms have been, and still are being, developed to approach the best solution. These heuristic methods can never guarantee that the absolute most parsimonious tree is found, but they are clever enough to get close to it. Most of them are very efficient for specific types of data, and relatively little has been done for quantitative continuous characters. I hope that astrocladistics will stir relevant research.
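To give an idea of why the tree space cannot be explored exhaustively, here is a small sketch that counts the distinct unrooted binary trees for n taxa. It relies on the standard combinatorial result (2n - 5)!!, which is not derived in this post, and the function names are only illustrative.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (taken as 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted binary trees on n_taxa labelled tips.

    Standard result: (2n - 5)!! for n >= 3; there is a single tree below that.
    """
    if n_taxa < 3:
        return 1
    return double_factorial(2 * n_taxa - 5)


for n in (5, 10, 20):
    print(n, count_unrooted_trees(n))
```

With 10 taxa there are already about two million trees, and with 20 taxa more than 10^20, which is why heuristics are needed.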

The most widely used method is to add taxa one by one, finding the best trees after each new taxon by swapping branches randomly. The process then starts again with a different first taxon. Of course, the higher the number of iterations, the higher the chance of finding the global minimum rather than being trapped in a local one.
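The idea can be sketched in a few lines. This is only a toy illustration, not what PAUP or any other package actually does: trees are nested tuples, characters are unordered and scored with the Fitch algorithm (my assumption for the example), each taxon is attached to the branch that gives the shortest tree, and the branch-swapping step is left out. All names and the small matrix are made up.

```python
import random


def _fitch(node, matrix, c):
    """Return (possible states, number of steps) for character c below node."""
    if isinstance(node, str):
        return {matrix[node][c]}, 0
    left_states, left_steps = _fitch(node[0], matrix, c)
    right_states, right_steps = _fitch(node[1], matrix, c)
    common = left_states & right_states
    if common:
        return common, left_steps + right_steps
    return left_states | right_states, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Total number of steps of the tree over all (unordered) characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(_fitch(tree, matrix, c)[1] for c in range(n_chars))


def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)  # attach on the branch just above this node
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def stepwise_addition(matrix, order):
    """Add the taxa in the given order, keeping the shortest tree each time."""
    tree = (order[0], order[1])
    for taxon in order[2:]:
        tree = min(insertions(tree, taxon), key=lambda t: tree_length(t, matrix))
    return tree


def random_addition_search(matrix, n_replicates=20, seed=0):
    """Repeat stepwise addition with random taxon orders; keep the best tree."""
    rng = random.Random(seed)
    best_tree, best_length = None, None
    for _ in range(n_replicates):
        order = sorted(matrix)
        rng.shuffle(order)
        tree = stepwise_addition(matrix, order)
        length = tree_length(tree, matrix)
        if best_length is None or length < best_length:
            best_tree, best_length = tree, length
    return best_tree, best_length


# Made-up matrix: 5 objects, 4 binary characters, no conflict between characters.
toy_matrix = {"A": "0000", "B": "0001", "C": "0011", "D": "0111", "E": "1111"}
best_tree, best_length = random_addition_search(toy_matrix)
print(best_tree, best_length)
```

On real data the result depends on the addition order, which is why the search is repeated many times and completed by branch swapping.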

A very efficient variant is the ratchet approach, in which the parameter space is deformed by assigning arbitrary weights to some characters. Once a minimum is found, the parameter space is restored to its original form and the minimum is compared with the previous ones. This strategy greatly reduces the risk of being trapped in a local minimum and converges quite quickly to highly parsimonious trees. It is then recommended to start a new, deeper search around these minima.
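Here is a minimal sketch of the ratchet loop, under several assumptions of mine: unordered characters scored with the Fitch algorithm, a very simple local search that prunes one taxon and reattaches it elsewhere, and a doubled weight for a random quarter of the characters. The scripts listed below drive PAUP instead and are far more elaborate; every name and value here is hypothetical.

```python
import random


def _fitch(node, matrix, c):
    """Return (possible states, number of steps) for character c below node."""
    if isinstance(node, str):
        return {matrix[node][c]}, 0
    ls, lsteps = _fitch(node[0], matrix, c)
    rs, rsteps = _fitch(node[1], matrix, c)
    common = ls & rs
    return (common, lsteps + rsteps) if common else (ls | rs, lsteps + rsteps + 1)


def weighted_length(tree, matrix, weights):
    """Tree length with one weight per character."""
    return sum(w * _fitch(tree, matrix, c)[1] for c, w in enumerate(weights))


def leaf_names(tree):
    return [tree] if isinstance(tree, str) else leaf_names(tree[0]) + leaf_names(tree[1])


def remove_leaf(tree, taxon):
    """Return the tree without taxon; its sibling replaces their parent."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == taxon:
        return right
    if right == taxon:
        return left
    return (remove_leaf(left, taxon), remove_leaf(right, taxon))


def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)
    if not isinstance(tree, str):
        for new_left in insertions(tree[0], taxon):
            yield (new_left, tree[1])
        for new_right in insertions(tree[1], taxon):
            yield (tree[0], new_right)


def hill_climb(tree, matrix, weights):
    """Move single taxa around until no move shortens the tree any further."""
    length = weighted_length(tree, matrix, weights)
    improved = True
    while improved:
        improved = False
        for taxon in leaf_names(tree):
            for candidate in insertions(remove_leaf(tree, taxon), taxon):
                cand_length = weighted_length(candidate, matrix, weights)
                if cand_length < length:
                    tree, length, improved = candidate, cand_length, True
                    break
            if improved:
                break
    return tree, length


def ratchet(matrix, start_tree, n_iterations=50, fraction=0.25, seed=0):
    """Alternate searches under perturbed weights and under the original ones."""
    rng = random.Random(seed)
    n_chars = len(next(iter(matrix.values())))
    original = [1] * n_chars
    best_tree, best_length = hill_climb(start_tree, matrix, original)
    current = best_tree
    for _iteration in range(n_iterations):
        # Deform the landscape: double the weight of a random subset of characters.
        perturbed = [2 if rng.random() < fraction else 1 for _c in range(n_chars)]
        current, _perturbed_length = hill_climb(current, matrix, perturbed)
        # Restore the original weights and search again from where we landed.
        current, length = hill_climb(current, matrix, original)
        if length < best_length:
            best_tree, best_length = current, length
    return best_tree, best_length


# Made-up matrix and a deliberately poor starting tree.
toy_matrix = {"A": "0000", "B": "0001", "C": "0011", "D": "0111", "E": "1111"}
start_tree = ((("A", "E"), ("B", "D")), "C")
ratchet_tree, ratchet_length = ratchet(toy_matrix, start_tree)
print(ratchet_tree, ratchet_length)
```

The perturbed search is allowed to end on a tree that is worse under the original weights; only the trees found after the weights are restored are compared with the best one so far.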

Software

I present here the software that I have used and found useful for astrocladistics. All of it runs under Linux, which is a good thing since we can take advantage of very powerful computers and grids. All are free except the first one, PAUP.

PAUP* (Phylogenetic Analysis Using Parsimony *and other methods)

It is a widely used and well-known phylogenetic package that is especially devoted to cladistics and well suited to morphometric analyses. I find it perfect for astrocladistics. It is not free, but not very expensive. It includes many extremely useful tools for all aspects of cladistic analysis. It can be run in batch mode, which is essential for big computations.

MBPR, PRAP, Pauprat

These are simply scripts that implement the ratchet strategy and are run with PAUP. I use MBPR (Multi-Batch Paup Ratchet), which I find very fast and efficient, but it seems to have disappeared.
The best bet is thus PRAP, which is still available (version 2007) and is also a very nice script. Pauprat is nice but older (2001).

PHYLIP (PHYLogeny Inference Package)

This is a well-known package for inferring phylogenies. However, it is not as developed as PAUP for cladistics. In particular, it is limited to 8 states per character instead of 32 for PAUP, which I find a handicap for continuous data. Nevertheless, the developer is a leading figure in phylogenetic inference, and the website contains a wealth of information, in particular a list of all the software that exists on the topic.

TNT (Tree analysis using New Technology)

This package implements new efficient algorithms, in particular to handle quantitative continuous data. This means that there is no need to code the characters. However, this is not entirely true, since the number of bins is limited to … 65 000. Well, OK, this is much, much more than 32. I use it in parallel with PAUP, but there are some assumptions behind the computation that I have not fully worked out yet, and I am not sure they are adequate for astrophysics. Anyhow, it is the only package I know of that uses continuous characters as such.

Clann

This software aims at constructing supertrees from a set of smaller trees. It builds an artificial matrix describing the input trees and then searches for the most parsimonious tree using PAUP.
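One common way to build such a matrix is the Baum-Ragan coding, in which each group of an input tree becomes one binary character. The sketch below illustrates that coding on nested tuples; whether it matches what Clann does internally is an assumption on my part, and the function names are hypothetical.

```python
def leaf_set(tree):
    """Set of the taxa found below a node (a taxon name or a tuple of subtrees)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def proper_clades(tree, is_root=True):
    """Yield the leaf set of every internal node except the root."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield leaf_set(tree)
    for child in tree:
        yield from proper_clades(child, False)


def tree_matrix(trees):
    """Code a list of rooted trees as one matrix, one column per group.

    '1' = taxon inside the group, '0' = taxon in that tree but outside the
    group, '?' = taxon absent from that tree.
    """
    all_taxa = sorted(set().union(*(leaf_set(t) for t in trees)))
    rows = {taxon: "" for taxon in all_taxa}
    for tree in trees:
        present = leaf_set(tree)
        for clade in proper_clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon] += "?"
                else:
                    rows[taxon] += "1" if taxon in clade else "0"
    return rows


# Two small made-up input trees with partly overlapping taxa.
input_trees = [(("A", "B"), ("C", "D")), (("A", "C"), "E")]
for taxon, row in tree_matrix(input_trees).items():
    print(taxon, row)
```

The resulting matrix can then be analysed like any other character matrix in a parsimony search.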

Mesquite

Mesquite is really a must. I use it a lot to visualize, manipulate and interpret the trees. But it can do much more than that: it can also run some parsimony searches, compute branch lengths, reconstruct ancestral states, project trees onto scatter plots, be linked with R, etc.

R

Yes, R has many tools for phylogeny. I don’t know whether it can perform parsimony searches, but I use it for statistics (of course) within and between the groups I have defined from the trees, either with Mesquite or with R. As mentioned above, it can be linked with Mesquite to transfer matrices and datasets. Note that R is included in some Linux distributions.

TreeView, FigTree, Treegraph, Treedyn

These visualization programs can be useful, even if I don’t use them much. They are listed roughly in order of capability, the first being particularly lightweight and the last being able to plot very large trees.
