Select the sample

Cladistics analyzes taxa, which can be real individual entities or species. Do not forget that cladistics looks for a phylogeny, a kind of genealogy of species, not a genealogy of individuals. Hence, if individuals are taken, they are supposed to represent a species or a class of objects. In this case, the individuals are called exemplars.

Taxa can also be virtual. For instance, the representative of a species can be “democratic”, or a Common Equals Primitive, which takes the most common value of each character. This is obviously an artificial object. In the case of galaxies, it is easy to imagine totally artificial objects produced purely by modeling.
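As an illustration of the “democratic” representative, here is a minimal Python sketch. It assumes each individual is described by a list of discrete character states, all in the same order; the function name `democratic_taxon` and the toy data are hypothetical, not taken from any cladistics package.

```python
from collections import Counter

def democratic_taxon(individuals):
    """Build a virtual taxon from the most common state of each character.

    `individuals` is a list of equal-length lists of discrete states
    (an assumed data layout, chosen only for this sketch).
    """
    n_chars = len(individuals[0])
    if any(len(row) != n_chars for row in individuals):
        raise ValueError("all individuals must have the same number of characters")
    consensus = []
    for column in zip(*individuals):
        # most_common(1) returns the most frequent state; on a tie,
        # the state encountered first in the column is kept.
        consensus.append(Counter(column).most_common(1)[0][0])
    return consensus

# Three hypothetical individuals described by three characters.
sample = [[0, 1, 2],
          [0, 1, 0],
          [1, 1, 2]]
print(democratic_taxon(sample))  # [0, 1, 2]
```

The resulting object need not match any observed individual, which is exactly why such a taxon is called virtual.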

The notion of taxon is very useful because otherwise it would be impossible to study huge samples, such as the whole of biodiversity. The problem with cladistics is that it needs to build all possible trees from a given set of taxa, and then compute a number of steps for each tree. The tree with the fewest steps, the most parsimonious one, is chosen as the “best” hypothesis for the phylogeny.
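To make the “number of steps” concrete, here is a minimal sketch that counts steps on a given tree. It assumes unordered discrete characters and uses Fitch's counting rule, which is one common choice and not necessarily the one used in a given analysis. The nested-tuple tree format, the function names, and the four-taxon matrix are all hypothetical.

```python
def fitch_steps(tree, states):
    """Return (possible_states, steps) for one character on a rooted binary tree.

    `tree` is either a taxon name (str) or a pair (left, right);
    `states` maps each taxon name to its state for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        # The two subtrees can share a state: no extra step is needed here.
        return common, lsteps + rsteps
    # No shared state: one change must occur at this node.
    return lset | rset, lsteps + rsteps + 1

def tree_length(tree, matrix):
    """Total number of steps of `tree`, summed over all characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )

# Hypothetical matrix: four taxa, two binary characters.
matrix = {"A": [0, 0], "B": [0, 1], "C": [1, 1], "D": [1, 1]}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(tree_length(tree_1, matrix))  # 2
print(tree_length(tree_2, matrix))  # 3
```

With this toy matrix, the first tree needs fewer steps and would therefore be preferred as the more parsimonious hypothesis.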

But the number of possible rooted binary trees for $n$ taxa is $\frac{(2n-3)!}{2^{n-2}(n-2)!}$, which grows faster than exponentially with $n$. Finding the most parsimonious tree is an NP-hard problem, i.e. no known algorithm solves it exactly in a reasonable amount of time for large samples. So a few thousand taxa is generally the largest sample that can be considered. In most cases, expect several days of computation before getting any result.
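A short Python sketch shows how quickly this number grows. It simply evaluates the formula above with exact integer arithmetic; the function name `count_rooted_trees` is chosen here for illustration.

```python
from math import factorial

def count_rooted_trees(n: int) -> int:
    """Evaluate (2n-3)! / (2^(n-2) (n-2)!) exactly for n >= 2 taxa."""
    if n < 2:
        raise ValueError("need at least two taxa")
    # The division is exact, so integer division loses nothing.
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (3, 5, 10, 20, 50):
    print(n, count_rooted_trees(n))
```

Three taxa give 3 trees, five give 105, and ten already give 34459425, so exhaustive enumeration becomes impractical for anything beyond a small sample.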

But since you must use heuristic techniques, you are never certain of finding the truly most parsimonious trees. Hence it is a good idea to repeat the computation several times…

An interesting workaround is to use a divide-and-conquer approach and build a supertree. But I do not think it is a good idea at the present stage of galaxy classification, because the subsamples have to be chosen carefully to yield consistent and complementary subtrees.

Anyhow, I am currently focusing on obtaining a simple classification for galaxies, so I do not expect to find thousands of species. The question is thus not so much to analyze the millions of objects available in databases, but rather to identify a few clades into which these millions of galaxies can be placed.