The evolutionary stages have been defined by the coding. But what do they represent? How does evolution proceed through the stages? Does the coding correspond to the cost of evolution, that is, do two consecutive bins equal one evolutionary step? Answering these questions is the aim of the task we now describe: defining a transformation scenario for each character. Let’s assume that the coding goes from 1 to 32, as I usually do.
The simplest transformation scenario for a character is to go from one bin to another. There are, however, four different ways of doing so, which are called hypotheses:
- Wagner or ordered character: character stages are reversible and additive. The number of steps equals the absolute value of the bin difference: going from 1 to 2 or from 20 to 21 costs one step, going from 18 to 16 costs two steps, and so forth.
- Fitch or unordered character: stages are reversible and non-additive. In this case, going from 3 to 4, from 5 to 25, or from 18 to 9 always costs exactly one step.
- Camin-Sokal: stages are irreversible. You can go one way (1 towards 32) or the other (32 towards 1), but there must be a single direction for the entire evolutionary transformation of the character. Since reversals are forbidden, this is a very strong constraint that requires very strong arguments to be adopted.
- Dollo: each stage can only occur once. Parallel evolutions are thus forbidden, and so are convergences. This is also a very strong constraint.
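To make the step counting concrete, here is a minimal Python sketch of the first three hypotheses written as cost functions between two bins. The function names are mine, and the Camin-Sokal version assumes that changes in the allowed direction are additive, which the description above does not specify. Dollo is left out because, as described, it restricts how often a stage may arise over the whole tree rather than the cost of a single change.

```python
import math


def wagner_cost(a: int, b: int) -> float:
    """Ordered (Wagner): the cost is the absolute bin difference."""
    return abs(a - b)


def fitch_cost(a: int, b: int) -> float:
    """Unordered (Fitch): any change costs one step, no change costs zero."""
    return 0 if a == b else 1


def camin_sokal_cost(a: int, b: int, increasing: bool = True) -> float:
    """Irreversible (Camin-Sokal): changes against the allowed direction
    are forbidden (infinite cost). Assumption: allowed changes are additive."""
    step = (b - a) if increasing else (a - b)
    return step if step >= 0 else math.inf


print(wagner_cost(18, 16))       # two steps
print(fitch_cost(5, 25))         # one step
print(camin_sokal_cost(5, 3))    # forbidden when only increases are allowed
```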
In astrophysics, I have found the ordered (Wagner) hypothesis well suited. I think this is indeed the general default behavior of quantitative continuous characters that change through continuous physical processes.
The irreversible (Camin-Sokal) hypothesis is also quite useful in some cases. For instance, the mass of a galaxy is bound to increase since gravitation is an attractive force. Mass is thus an irreversible character. However, in some strong galaxy encounters, matter can be stripped away to form what we call tidal dwarf galaxies, but this is probably marginal.
The least constraining hypothesis is that of the unordered character (Fitch). It can be useful, but fewer constraints mean more allowed arrangements and hence a higher computation time. In practice, it often leads to less convergence and thus less resolved consensus trees.
The parsimony optimization looks for the simplest tree by minimizing the number of steps. Obviously, this depends on the hypothesis chosen. But there is not much freedom here since, in the end, this choice strongly influences the evolutionary scenario depicted on the tree, and this scenario should be physically meaningful. Recall that the cladogram is called a phylogenetic hypothesis.
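The dependence of the step count on the hypothesis can be illustrated with a small sketch. It uses Sankoff's dynamic programming, a standard way to count the minimum number of steps for one character on a fixed tree given any cost function. This is not necessarily what a given parsimony program does internally. The tree, the bin values, and the restriction to 8 bins are made up for the illustration.

```python
import math


def sankoff(tree, cost, states):
    """Return {state: minimal steps in the subtree if its root has that state}.

    A tree is either an int (a leaf with its observed bin) or a pair of subtrees.
    """
    if isinstance(tree, int):
        return {s: (0 if s == tree else math.inf) for s in states}
    left, right = (sankoff(child, cost, states) for child in tree)
    return {
        s: min(cost(s, t) + left[t] for t in states)
        + min(cost(s, t) + right[t] for t in states)
        for s in states
    }


def tree_length(tree, cost, states):
    """Minimum number of steps over all possible root states."""
    return min(sankoff(tree, cost, states).values())


states = range(1, 9)        # hypothetical: 8 bins instead of 32, for brevity
tree = ((1, 2), (5, 8))     # hypothetical tree with four objects


def wagner(a, b):
    return abs(a - b)


def fitch(a, b):
    return 0 if a == b else 1


print(tree_length(tree, wagner, states))  # 7 steps under the ordered hypothesis
print(tree_length(tree, fitch, states))   # 3 steps under the unordered hypothesis
```

The same tree and the same data thus have different lengths depending on the hypothesis, which is why the choice can change which tree comes out as the most parsimonious.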
Apart from these four general hypotheses, more complex character transformations can be given. For instance, there may be alternate paths (from stage 4 it might be possible to jump to 5 or 6, with equal cost or not). Trees or step matrices can describe these transformations. Step matrices are tables in which the cost between any two states is given, irreversibility being indicated by an infinite number or the letter “i”.
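As a rough illustration, a step matrix can be built as a plain table in Python. The sketch below is hypothetical: it uses 6 states instead of 32, starts from an irreversible character that can only increase, and then overrides one entry so that the jump from 4 to 6 costs the same as the jump from 4 to 5. The infinite value plays the role of the letter “i”.

```python
import math

INF = math.inf  # forbidden change, the "i" of a step matrix


def step_matrix(n_states, cost):
    """Table M where M[i][j] is the cost of going from state i+1 to state j+1."""
    return [
        [cost(a, b) for b in range(1, n_states + 1)]
        for a in range(1, n_states + 1)
    ]


def irreversible(a, b):
    """Only increases are allowed; decreases are forbidden."""
    return b - a if b >= a else INF


M = step_matrix(6, irreversible)
M[3][5] = 1  # hypothetical alternate path: 4 -> 6 now costs one step, like 4 -> 5

for row in M:
    print(" ".join("i" if c == INF else str(c) for c in row))
```

Overriding a single entry like this can leave the table inconsistent (here going from 1 to 6 still costs 5 although going from 1 to 4 and then to 6 costs 4), so a real analysis would need to check the whole table.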
All that said, astronomers will easily see that a physical model can be used to impose constraints on character transformations.