Distribution and asymptotic behavior of the phylogenetic transfer distance

The transfer distance (TD) was introduced in the classification framework and studied in the context of phylogenetic tree matching. Recently, Lemoine et al. (2018) showed that TD can be a powerful tool to assess the branch support of phylogenies with large data sets, thus providing a relevant alternative to Felsenstein's bootstrap. This distance allows a reference branch β in a reference tree 𝒯 to be compared to a branch b from another tree T, both trees being on the same set of n taxa. The TD between these two branches is the number of taxa that must be transferred from one side of b to the other to obtain β. Taking the minimum TD between β and all branches of T defines the transfer index, denoted by ϕ(β,T), which measures the degree of agreement of T with β. Consider a reference branch β with p tips on its light side, and define the transfer support (TS) as 1 - ϕ(β,T)/(p-1). The aim of this article is to provide evidence that p-1 is a meaningful normalization constant in the definition of TS, and to measure the statistical significance of TS, assuming that β is compared to a tree T drawn according to a null model. We obtain several results that shed light on these questions in a number of settings. In particular, we study the asymptotic behavior of TS when n tends to ∞, and fully characterize the distribution of ϕ when T is a caterpillar tree.
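
The definitions of TD, the transfer index and TS can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each branch is encoded by the set of taxa on one of its sides, and that the number of taxa to transfer equals the smaller of |A Δ B| and n - |A Δ B|, where A and B are the chosen sides of the two branches. The function names and the six-taxon example tree are hypothetical and serve only as illustration.

```python
def transfer_distance(side_beta, side_b, n):
    """Number of taxa to move across branch b to obtain branch beta.

    Each branch is given by the set of taxa on ONE of its sides
    (assumed encoding); n is the total number of taxa.
    """
    d = len(frozenset(side_beta) ^ frozenset(side_b))  # symmetric difference
    # Either side of b may be matched to the chosen side of beta.
    return min(d, n - d)


def transfer_index(side_beta, tree_branches, n):
    """Minimum transfer distance from beta to any branch of the tree."""
    return min(transfer_distance(side_beta, b, n) for b in tree_branches)


def transfer_support(side_beta, tree_branches, n):
    """TS = 1 - phi(beta, T) / (p - 1), with p the light-side size of beta."""
    p = min(len(side_beta), n - len(side_beta))
    if p < 2:
        raise ValueError("TS is undefined when the light side has fewer than 2 tips")
    return 1 - transfer_index(side_beta, tree_branches, n) / (p - 1)


# Hypothetical example on six taxa.
taxa = ["a", "b", "c", "d", "e", "f"]
n = len(taxa)

# Reference branch beta separates {a, b, c} from {d, e, f}, so p = 3.
beta = {"a", "b", "c"}

# Branches of a hypothetical tree T = (((a,b),d),(c,(e,f))):
# three internal branches plus the six pendant (single-taxon) branches.
internal = [{"a", "b"}, {"e", "f"}, {"a", "b", "d"}]
pendant = [{t} for t in taxa]
branches_T = internal + pendant

phi = transfer_index(beta, branches_T, n)
ts = transfer_support(beta, branches_T, n)
print(phi, ts)
```

In this example the branch {a, b} | {c, d, e, f} differs from β only by the placement of c, so ϕ(β,T) = 1 and TS = 1 - 1/2 = 0.5. Including the pendant branches in the minimum guarantees that ϕ(β,T) never exceeds p-1, which keeps TS between 0 and 1.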