Cladistics

Reconstructing Phylogenetic Trees

We describe how to reconstruct a phylogenetic tree for a set of taxa in the input language of lparse (version 1.0.13), following "Character-based cladistics and answer set programming" by D. R. Brooks, E. Erdem, J. W. Minett, and D. Ringe, in Proc. of PADL'05, pages 37-51, 2005.

The program for reconstructing a phylogeny for a set of taxa (phylogeny.lp) and the preprocessed datasets used in our experiments (Alcataenia, Chinese, Indo-European) are included in cladistics-basic.tar.gz.

With these files, we can generate phylogenies with at most n incompatible characters, for Chinese dialects, Indo-European languages, and Alcataenia species, using cmodels (with zchaff). For instance, a phylogeny for Chinese dialects, with at most 6 incompatible characters, can be generated by the command

lparse -d none -c n=6 Chinese phylogeny.lp | cmodels -zc
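To make the bound n more concrete, the Python sketch below counts how many characters are incompatible with one fixed rooted tree. It assumes the usual notion from character-based cladistics (the paper's formal definition may differ in detail): a character is compatible with a tree if its states can be assigned to the internal vertices so that, for every state, the vertices carrying that state form a connected subtree. For a fixed tree this holds exactly when the minimal subtrees spanning the leaves of each state are pairwise vertex-disjoint, which is what the code checks. The tree encoding (a child-to-parent dictionary) and all function names are hypothetical and are not taken from phylogeny.lp, which instead lets the answer set solver search over trees.

```python
from typing import Dict, Hashable, List, Set

Vertex = Hashable


def path_to_root(v: Vertex, parent: Dict[Vertex, Vertex]) -> List[Vertex]:
    """Vertices from v up to the root (the root is the only vertex without a parent)."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def spanning_vertices(leaves: List[Vertex], parent: Dict[Vertex, Vertex]) -> Set[Vertex]:
    """Vertices of the minimal subtree connecting the given leaves."""
    paths = [path_to_root(leaf, parent) for leaf in leaves]
    common = set(paths[0]).intersection(*paths[1:])
    # Paths run bottom-up, so the first shared vertex is the lowest common ancestor.
    lca = next(v for v in paths[0] if v in common)
    vertices: Set[Vertex] = set()
    for path in paths:
        for v in path:
            vertices.add(v)
            if v == lca:  # stop climbing once the common ancestor is reached
                break
    return vertices


def is_compatible(parent: Dict[Vertex, Vertex], states: Dict[Vertex, Hashable]) -> bool:
    """True if the character (leaf -> state) can be made convex on the tree."""
    by_state: Dict[Hashable, List[Vertex]] = {}
    for leaf, state in states.items():
        by_state.setdefault(state, []).append(leaf)
    covered: Set[Vertex] = set()
    for leaves in by_state.values():
        span = spanning_vertices(leaves, parent)
        if span & covered:  # two state classes would have to share a vertex
            return False
        covered |= span
    return True


def count_incompatible(parent: Dict[Vertex, Vertex],
                       characters: List[Dict[Vertex, Hashable]]) -> int:
    """Number of characters that are not compatible with the tree."""
    return sum(1 for states in characters if not is_compatible(parent, states))


# Hypothetical tree ((A,B),(C,D)) with internal vertices x, y and root r.
parent = {"A": "x", "B": "x", "C": "y", "D": "y", "x": "r", "y": "r"}
c1 = {"A": 0, "B": 0, "C": 1, "D": 1}  # groups A with B and C with D
c2 = {"A": 0, "B": 1, "C": 0, "D": 1}  # groups A with C and B with D
print(count_incompatible(parent, [c1, c2]))  # prints 1
```

Here c1 agrees with the tree, while c2 forces both of its state classes through the vertices x, r and y, so exactly one character is incompatible. In these terms, the bound n asks for a tree on which this count is at most n.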

For more efficient computation, we modify the phylogeny program above according to the heuristics described in Section 4 of [Brooks et al., 2005]; the modified program is in the file phylogeny-improved.lp. For instance, a phylogeny for Indo-European language groups, with at most 16 incompatible characters, can be generated by the command

lparse -d none -c n=9 Indo-European phylogeny-improved.lp | cmodels -zc

(We use n=9 rather than n=16 because of preprocessing; this is explained in Section 7 of [Brooks et al., 2005].)
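When trying several bounds or datasets, it can help to script the pipeline. The following is a minimal Python sketch, assuming lparse and cmodels are installed and on PATH and that it is run in the directory where cladistics-basic.tar.gz was unpacked. It reproduces the shell pipeline using only the options shown in the commands above and returns the cmodels output unparsed, since its exact format is not described here. The helper names build_commands and run_pipeline are hypothetical.

```python
import subprocess
from typing import List, Tuple


def build_commands(n: int, dataset: str,
                   program: str = "phylogeny.lp") -> Tuple[List[str], List[str]]:
    """Argument lists for: lparse -d none -c n=<n> <dataset> <program> | cmodels -zc"""
    lparse_cmd = ["lparse", "-d", "none", "-c", f"n={n}", dataset, program]
    cmodels_cmd = ["cmodels", "-zc"]
    return lparse_cmd, cmodels_cmd


def run_pipeline(n: int, dataset: str, program: str = "phylogeny.lp") -> str:
    """Run lparse piped into cmodels and return cmodels' raw standard output."""
    lparse_cmd, cmodels_cmd = build_commands(n, dataset, program)
    lparse = subprocess.Popen(lparse_cmd, stdout=subprocess.PIPE)
    cmodels = subprocess.Popen(cmodels_cmd, stdin=lparse.stdout,
                               stdout=subprocess.PIPE, text=True)
    lparse.stdout.close()  # lets lparse receive SIGPIPE if cmodels exits early
    output, _ = cmodels.communicate()
    lparse.wait()
    return output


# Example (requires lparse and cmodels on PATH):
#   print(run_pipeline(6, "Chinese"))
#   print(run_pipeline(9, "Indo-European", "phylogeny-improved.lp"))
```

Building the argument lists separately from running them keeps the invocation easy to inspect, and passing lists instead of a shell string avoids any shell quoting issues.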

The datasets before preprocessing are in the files Chinese-unprocessed, Alcataenia-unprocessed, and Indo-European-unprocessed.