Estimation of trees on the basis of pairwise comparisons with random errors

Estimators of trees on the basis of multiple pairwise comparisons with random errors are proposed in the paper. The estimators are based on the idea of the nearest adjoining order (see Slater, 1961; Klukowski, 2011). Two kinds of trees are examined: non-directed and directed. The approach is similar to the estimation of the preference relation with incomparable elements on the basis of binary comparisons. The estimates are obtained as solutions of discrete optimization problems; their properties, especially accuracy, are similar to those for the preference relation. Such trees can be applied to modelling of many phenomena, e


Introduction
The problem of estimating the relations of preference (complete and with partial order), equivalence and tolerance on the basis of multiple pairwise comparisons with random errors has been examined in Klukowski (1994, 2011 Chap. 7-11, 2013, 2014a, b). The same approach can be applied to trees, non-directed and directed; they are more general objects than the relations mentioned. The non-directed tree can be defined as a graph that is non-directed, acyclic and complete; in other words, there is no path (sequence of edges) leading from a fixed node (element) back to this node, and each pair of nodes is connected by a path. The directed tree can be defined as a graph that is directed, acyclic and complete. The directed tree has a root (initial node), paths leading in one direction, and leaves (final nodes of the tree).
The problem of estimation of a tree, non-directed or directed, can be expressed as follows:
- a finite set of nodes (elements) is given, with unknown paths (system of edges);
- instead of the system of edges, a set of pairwise comparisons is known, which evaluate the unknown paths with random errors; any comparison states the existence or non-existence of a connection between two elements: in the case of the non-directed tree a connection means an edge, in the case of the directed tree a connection means a path and its direction;
- a random error means that the result of any comparison can be true or not, with a probability satisfying some weak assumptions; any pair is compared $N$ times ($N \geq 1$), and all comparisons are assumed to be stochastically independent;
- the form of a tree, i.e. the system of its paths, has to be determined (estimated) on the basis of the set of pairwise comparisons characterized above.
The idea of estimation consists in minimizing the differences between the form of a tree, expressed in an appropriate way, and a given set of pairwise comparisons with random errors (Slater 1961, Klukowski 2011, 2013). The estimates are obtained as the optimal solutions of the discrete programming problems defined below; the number of solutions can exceed one.
The approach, resting on the statistical paradigm, provides properties of the estimates and the possibility of verifying the results obtained. The main property is consistency as the number of comparisons $N$ (for each pair) converges to infinity, under non-restrictive assumptions about comparison errors. In general it is assumed that the probability of a correct comparison is greater than ½ and that multiple comparisons of each pair are independent random variables. The estimators can also be applied in the case of unknown distributions of comparison errors, provided these satisfy the assumptions made.
The idea of the estimators was first introduced by Slater (1961) for the case of single, binary comparisons and the complete preference relation; some other ideas in the area of pairwise comparisons have been presented in David (1988), Bradley (1984), Fligner and Verducci (1993), Gordon (1999), Klukowski (2011, 2013).
The paper consists of four sections. The second section presents the definitions, notations and assumptions about comparison errors. The next sections consider the form of the estimators, for both kinds of trees, and their properties. The last section summarizes the results. The Appendix presents proofs of some relationships determining properties of the estimators proposed.

Definitions, notations and assumptions about comparison errors

Definitions and notations
The problem of estimation of the non-directed tree on the basis of pairwise comparisons can be stated as follows.
We are given a finite set of elements $X = \{x_1, \ldots, x_m\}$ ($3 \leq m < \infty$). The elements of the set $X$ (nodes) are connected with edges generating a non-directed tree (non-directed, acyclic and complete). Each pair of elements $(x_i, x_j)$ can have an edge or not; thus the set of pairs of indices

$$R_m = \{\langle i, j \rangle \mid i = 1, \ldots, m-1;\ j = i+1, \ldots, m\} \qquad (1)$$

can be divided into two disjoint subsets: the first one, $I_o$, includes pairs connected with an edge, the second one, $I_v$, includes pairs not connected with an edge, and $R_m = I_o \cup I_v$. Any pair $\langle i, j \rangle$ is not ordered, i.e.
The (non-directed) tree can be expressed with the use of values $T_v(x_i, x_j)$ ($\langle i, j \rangle \in R_m$), indicating existence or non-existence of an edge:

$$T_v(x_i, x_j) = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ are connected with an edge,} \\ 0 & \text{if } x_i \text{ and } x_j \text{ are not connected with an edge.} \end{cases} \qquad (2)$$

The values $T_v(x_i, x_j)$ define the non-directed tree in a unique way.
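The encoding in (2) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: nodes are labelled 0, ..., m-1 instead of $x_1, \ldots, x_m$, "complete" is read as "every pair of nodes is connected by a path", and the function names `is_tree` and `tree_values_nondirected` are hypothetical, not taken from the paper.

```python
from itertools import combinations


def is_tree(m, edges):
    """Check that the non-directed graph on nodes 0..m-1 is acyclic and connected."""
    if len(edges) != m - 1:
        return False
    parent = list(range(m))  # union-find forest used to detect cycles

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            return False  # this edge would close a cycle
        parent[root_i] = root_j
    return True  # m-1 edges without a cycle imply connectedness


def tree_values_nondirected(m, edges):
    """Return the values T_v for every pair <i, j> with i < j: 1 for an edge, 0 otherwise."""
    edge_set = {frozenset(e) for e in edges}
    return {
        (i, j): int(frozenset((i, j)) in edge_set)
        for i, j in combinations(range(m), 2)
    }


if __name__ == "__main__":
    example_edges = [(0, 1), (1, 2), (1, 3)]  # a star centred at node 1
    print(is_tree(4, example_edges))
    print(tree_values_nondirected(4, example_edges))
```

Storing only the pairs with i < j mirrors the index set $R_m$ in (1), so the dictionary has exactly m(m-1)/2 entries.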
Similar considerations relate to the directed tree (directed, acyclic and complete). Such a tree can be expressed with the use of values $T_d(x_i, x_j)$ ($\langle i, j \rangle \in R_m$), indicating existence or non-existence of a path between elements (nodes) and the direction of the path:

$$T_d(x_i, x_j) = \begin{cases} -1 & \text{if there exists a path from } x_i \text{ to } x_j, \\ 1 & \text{if there exists a path from } x_j \text{ to } x_i, \\ 2 & \text{if there does not exist a path between } x_i \text{ and } x_j. \end{cases} \qquad (3)$$

The set of indices $R_m$ can be expressed as the union of the subsets $R_m = I_{\pm 1} \cup I_v$, where $I_{\pm 1}$ includes indices of pairs of elements connected with a path and $I_v$ includes indices of non-connected pairs; any pair of indices $\langle i, j \rangle \in I_{\pm 1}$ of connected elements is ordered, i.e. it shows the direction of the path between $x_i$ and $x_j$. It is clear that $T_d(x_i, x_j) = -T_d(x_j, x_i)$ for $\langle i, j \rangle \in I_{\pm 1}$.
The values $T_d(x_i, x_j)$ define the directed tree in a unique way.
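The values $T_d(x_i, x_j)$ can be computed from a list of arcs by checking reachability. The sketch below is a minimal illustration, assuming nodes labelled 0, ..., m-1, arcs given as (parent, child) pairs pointing away from the root, and the coding -1, 1, 2 as in the definition above; the function name `tree_values_directed` is hypothetical.

```python
from itertools import combinations


def tree_values_directed(m, arcs):
    """Return the values T_d for every pair <i, j> with i < j.

    -1: a path leads from node i to node j,
     1: a path leads from node j to node i,
     2: no path between the two nodes.
    """
    children = {k: [] for k in range(m)}
    for parent, child in arcs:
        children[parent].append(child)

    def descendants(k):
        # Depth-first search over the arcs leaving node k.
        seen, stack = set(), list(children[k])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(children[node])
        return seen

    reach = {k: descendants(k) for k in range(m)}
    values = {}
    for i, j in combinations(range(m), 2):
        if j in reach[i]:
            values[(i, j)] = -1
        elif i in reach[j]:
            values[(i, j)] = 1
        else:
            values[(i, j)] = 2
    return values


if __name__ == "__main__":
    # Root 0 with children 1 and 2; node 3 is a child of node 1.
    print(tree_values_directed(4, [(0, 1), (0, 2), (1, 3)]))
```

In this example the pairs (1, 2) and (2, 3) lie on different branches, so they receive the value 2, while every pair on a common root-to-leaf path receives -1 or 1.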

Examples
[Example omitted: table of the values $T_d(x_i, x_j)$.]

Assumptions about distributions of comparison errors
The form of both types of trees, expressed respectively by $T_v(x_i, x_j)$ or $T_d(x_i, x_j)$, has to be determined (estimated) on the basis of $N$ ($N \geq 1$) comparisons of each pair $(x_i, x_j)$ ($\langle i, j \rangle \in R_m$), which evaluate the values $T_v(x_i, x_j)$ or $T_d(x_i, x_j)$ and are disturbed by random errors.
The assumptions about comparison errors reflect the following facts. The probability of a correct comparison is greater than that of an incorrect one (assumptions (6), (7)). The comparison errors are stochastically independent. This assumption can be relaxed in such a way that (multiple) comparisons of the same pair are independent and comparisons of pairs comprising different elements are independent.
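A simple way to generate data consistent with these facts, for the non-directed tree, is sketched below in Python. The sketch assumes one error probability shared by all pairs and strictly below 1/2, which is stronger than the assumptions require; the function name `simulate_comparisons` and its parameters are hypothetical.

```python
import random


def simulate_comparisons(true_values, n_comparisons, error_prob, seed=None):
    """Simulate N independent binary comparisons of every pair.

    true_values maps each pair (i, j) to the true value T_v (0 or 1).
    Each comparison reports the wrong value with probability error_prob,
    so a correct answer is more likely than an incorrect one.
    """
    if not 0 <= error_prob < 0.5:
        raise ValueError("error_prob must lie in [0, 0.5)")
    rng = random.Random(seed)
    results = {}
    for pair, value in true_values.items():
        results[pair] = [
            1 - value if rng.random() < error_prob else value
            for _ in range(n_comparisons)
        ]
    return results


if __name__ == "__main__":
    # True non-directed tree on three nodes: edges (0, 1) and (1, 2).
    truth = {(0, 1): 1, (0, 2): 0, (1, 2): 1}
    print(simulate_comparisons(truth, n_comparisons=5, error_prob=0.2, seed=0))
```

Every draw uses a fresh call to the random generator, so the comparisons are independent both within a pair and across pairs.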

Estimation problems and properties of estimates
The idea of the nearest adjoining order estimators is to minimize the absolute differences between a set of comparisons and the tree, expressed by the values $T_v(x_i, x_j)$ or $T_d(x_i, x_j)$.
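For small $m$ this idea can be tried out by exhaustive search. The Python sketch below covers the non-directed case only and rests on assumptions not stated in the text: the criterion is taken to be the sum of absolute differences over all pairs and all $N$ comparisons, and the feasible set is enumerated through Prüfer sequences (a standard encoding of labelled trees) rather than through a discrete programming solver. The names `prufer_to_edges` and `nao_estimate_nondirected` are hypothetical.

```python
import heapq
from itertools import combinations, product


def prufer_to_edges(seq, m):
    """Decode a Pruefer sequence of length m-2 into the edges of a labelled tree."""
    degree = [1] * m
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(m) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)  # smallest current leaf
        edges.append((min(leaf, v), max(leaf, v)))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    u, w = heapq.heappop(leaves), heapq.heappop(leaves)
    edges.append((min(u, w), max(u, w)))
    return edges


def nao_estimate_nondirected(m, comparisons):
    """Return (minimal cost, list of optimal trees) by enumerating all labelled trees.

    comparisons maps each pair (i, j), i < j, to a list of 0/1 results.
    The cost of a tree is the number of comparisons that disagree with it,
    i.e. the sum of |comparison - T_v| over all pairs and repetitions.
    """
    pairs = list(combinations(range(m), 2))
    ones = {p: sum(comparisons[p]) for p in pairs}
    total = {p: len(comparisons[p]) for p in pairs}
    best_cost, best_trees = None, []
    for seq in product(range(m), repeat=m - 2):
        edges = set(prufer_to_edges(seq, m))
        cost = sum(
            total[p] - ones[p] if p in edges else ones[p]
            for p in pairs
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_trees = cost, [sorted(edges)]
        elif cost == best_cost:
            best_trees.append(sorted(edges))  # the optimum need not be unique
    return best_cost, best_trees


if __name__ == "__main__":
    data = {(0, 1): [1, 1, 0], (0, 2): [0, 0, 0], (1, 2): [1, 1, 1]}
    print(nao_estimate_nondirected(3, data))
```

The enumeration visits $m^{m-2}$ trees, so it is usable only for very small sets; it is meant to make the criterion concrete, not to replace a proper optimization method. Returning all minimizers reflects the fact that the number of optimal solutions can exceed one.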
Thus, the estimates $\hat{T}_v(x_i, x_j)$ or $\hat{T}_d(x_i, x_j)$ ($\langle i, j \rangle \in R_m$) are the optimal solutions of the discrete programming problems, respectively:
$F_X^{(\upsilon)}$ ($\upsilon \in \{v, d\}$): feasible set, i.e. the family of all trees (non-directed or directed) determined on the set $X$,