Kendall tau metric

From Encyclopedia of Mathematics

2020 Mathematics Subject Classification: Primary: 62H20

Kendall tau

The non-parametric correlation coefficient (or measure of association) known as Kendall's tau was first discussed by G.T. Fechner and others about 1900, and was rediscovered (independently) by M.G. Kendall in 1938 [a3], [a4]. In modern use, the term "correlation" refers to a measure of a linear relationship between variates (such as the Pearson product-moment correlation coefficient), while "measure of association" refers to a measure of a monotone relationship between variates (such as Kendall's tau and the Spearman rho metric). For a historical review of Kendall's tau and related coefficients, see [a5].

Underlying the definition of Kendall's tau is the notion of concordance. If $(x_j, y_j)$ and $(x_k, y_k)$ are two elements of a sample $\{(x_i, y_i)\}_{i=1}^{n}$ from a bivariate population, one says that $(x_j, y_j)$ and $(x_k, y_k)$ are concordant if $x_j < x_k$ and $y_j < y_k$ or if $x_j > x_k$ and $y_j > y_k$ (i.e., if $(x_j - x_k)(y_j - y_k) > 0$); and discordant if $x_j < x_k$ and $y_j > y_k$ or if $x_j > x_k$ and $y_j < y_k$ (i.e., if $(x_j - x_k)(y_j - y_k) < 0$). There are $\binom{n}{2}$ distinct pairs of observations in the sample, and each pair (barring ties) is either concordant or discordant. Let $c$ denote the number of concordant pairs, $d$ the number of discordant pairs, and $S = c - d$. Kendall's tau for the sample is then defined as

\begin{equation*} \tau_n = \frac{c - d}{c + d} = \frac{S}{\binom{n}{2}} = \frac{2S}{n(n-1)}. \end{equation*}
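As an illustration, the definition can be evaluated directly by enumerating all $\binom{n}{2}$ pairs. The following Python sketch assumes data without ties; the function name `kendall_tau` and the sample values are hypothetical and chosen only for this example.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Sample Kendall's tau, S / C(n, 2), for paired data without ties."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    c = d = 0  # numbers of concordant and discordant pairs
    for (xj, yj), (xk, yk) in combinations(zip(xs, ys), 2):
        prod = (xj - xk) * (yj - yk)
        if prod > 0:
            c += 1
        elif prod < 0:
            d += 1
    return (c - d) / (n * (n - 1) / 2)


# Illustrative data (an assumption, not taken from the article).
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 2, 5, 4]
tau = kendall_tau(xs, ys)
```

For this sample $c = 7$ and $d = 3$, so $S = 4$ and $\tau_n = 4/10 = 0.4$. The enumeration needs $n(n-1)/2$ comparisons, which is adequate for small samples.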

When ties exist in the data, the following adjusted formula is used:

\begin{equation*} \tau _ { n } = \frac { S } { \sqrt { n ( n - 1 ) / 2 - T } \sqrt { n ( n - 1 ) / 2 - U } }, \end{equation*}

where $T = \sum_t t(t-1)/2$, the sum running over the groups of tied $X$ observations with $t$ the number of observations in each group, and $U = \sum_u u(u-1)/2$ is defined in the same way for the $Y$ observations, with $u$ the number of $Y$ observations tied at a given rank. For details on the use of $\tau_n$ in hypothesis testing, and for large-sample theory, see [a2].
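The tie-adjusted formula can be sketched in the same way. In the Python example below, the function name `kendall_tau_b` and the data are assumptions made for illustration; pairs tied in either coordinate contribute nothing to $S$, and $T$ and $U$ are obtained from the multiplicities of the observed values.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Tie-adjusted sample tau: S / (sqrt(n0 - T) * sqrt(n0 - U))."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    s = 0  # concordant minus discordant pairs; tied pairs add 0
    for (xj, yj), (xk, yk) in combinations(zip(xs, ys), 2):
        prod = (xj - xk) * (yj - yk)
        s += (prod > 0) - (prod < 0)
    n0 = n * (n - 1) / 2
    # T and U: sum of t(t-1)/2 over groups of equal x (resp. y) values.
    t = sum(m * (m - 1) / 2 for m in Counter(xs).values())
    u = sum(m * (m - 1) / 2 for m in Counter(ys).values())
    denom = sqrt(n0 - t) * sqrt(n0 - u)
    if denom == 0:
        raise ValueError("undefined when all x values or all y values are tied")
    return s / denom


# Illustrative data with one tie in each coordinate (an assumption).
tau_b = kendall_tau_b([1, 2, 2, 3], [1, 3, 2, 3])
```

Here $S = 4$, $n(n-1)/2 = 6$ and $T = U = 1$, so $\tau_n = 4/5 = 0.8$. When there are no ties, $T = U = 0$ and the expression reduces to $2S/(n(n-1))$.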

Note that $\tau _ { n }$ is equal to the probability of concordance minus the probability of discordance for a pair of observations $( x_j, y_j )$ and $( x _ { k } , y _ { k } )$ chosen randomly from the sample $\{ ( x _ { i } , y _ { i } ) \} _ { i = 1 } ^ { n }$. The population version $\tau$ of Kendall's tau is defined similarly for random variables $X$ and $Y$ (cf. also Random variable). Let $( X _ { 1 } , Y _ { 1 } )$ and $( X _ { 2 } , Y _ { 2 } )$ be independent random vectors with the same distribution as $( X , Y )$. Then

\begin{equation*} \begin{aligned} \tau &= \mathsf{P} [ ( X_1 - X_2 ) ( Y_1 - Y_2 ) > 0 ] - \mathsf{P} [ ( X_1 - X_2 ) ( Y_1 - Y_2 ) < 0 ] \\ &= \operatorname{corr} [ \operatorname{sign} ( X_1 - X_2 ) , \operatorname{sign} ( Y_1 - Y_2 ) ]. \end{aligned} \end{equation*}

Since $\tau$ is the Pearson product-moment correlation coefficient of the random variables $\operatorname { sign } ( X _ { 1 } - X _ { 2 } )$ and $\operatorname { sign } ( Y _ { 1 } - Y _ { 2 } )$, $\tau$ is sometimes called the difference sign correlation coefficient.
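A sample analogue of this identity can be checked numerically. The Python sketch below computes the Pearson correlation of $\operatorname{sign}(x_j - x_k)$ and $\operatorname{sign}(y_j - y_k)$ over all ordered pairs $j \neq k$; the helper names `sign` and `sign_correlation` and the data are hypothetical.

```python
from itertools import permutations
from math import sqrt


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_correlation(xs, ys):
    """Pearson correlation of sign(xj - xk) and sign(yj - yk), j != k."""
    pairs = list(permutations(zip(xs, ys), 2))  # all ordered pairs
    a = [sign(xj - xk) for (xj, _), (xk, _) in pairs]
    b = [sign(yj - yk) for (_, yj), (_, yk) in pairs]
    m = len(pairs)
    mean_a = sum(a) / m  # zero by symmetry of the ordered pairs
    mean_b = sum(b) / m
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / m
    var_a = sum((p - mean_a) ** 2 for p in a) / m
    var_b = sum((q - mean_b) ** 2 for q in b) / m
    return cov / sqrt(var_a * var_b)


# Illustrative data without ties (an assumption).
r = sign_correlation([1, 2, 3, 4, 5], [3, 1, 2, 5, 4])
```

For this sample both sign sequences have mean $0$ and variance $1$, and the covariance is $2S/(n(n-1)) = 8/20$, so the result agrees with $\tau_n = 0.4$. On tied data the variances drop below $1$ and the same computation yields the tie-adjusted value of $\tau_n$.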

When $X$ and $Y$ are continuous,

\begin{equation*} \tau = 4 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } C _ { X , Y } ( u , v ) d C _ { X , Y } ( u , v ) - 1, \end{equation*}

where $C _ { X , Y }$ is the copula of $X$ and $Y$. Consequently, $\tau$ is invariant under strictly increasing transformations of $X$ and $Y$, a property $\tau$ shares with Spearman's rho, but not with the Pearson product-moment correlation coefficient. For a survey of copulas and their relationship with measures of association, see [a6].
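As a quick check of the copula formula, consider two standard copulas, chosen here only as illustrations. For the independence copula $\Pi(u,v) = uv$ one has $d\Pi(u,v) = du\,dv$, so that

\begin{equation*} \tau = 4 \int_0^1 \int_0^1 uv \, du \, dv - 1 = 4 \cdot \frac{1}{4} - 1 = 0. \end{equation*}

For the copula $M(u,v) = \min(u,v)$ of comonotone variables, the probability mass is spread uniformly along the diagonal $u = v$, where $M(u,u) = u$, so that

\begin{equation*} \tau = 4 \int_0^1 u \, du - 1 = 4 \cdot \frac{1}{2} - 1 = 1. \end{equation*}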

Besides Kendall's tau, there are other measures of association based on the notion of concordance, one of which is Blomqvist's coefficient [a1]. Let $\{ ( x _ { i } , y _ { i } ) \} _ { i = 1 } ^ { n }$ denote a sample from a continuous bivariate population, and let $\tilde{x}$ and $\tilde{y}$ denote sample medians (cf. also Median (in statistics)). Divide the $( x , y )$-plane into four quadrants with the lines $x = \tilde { x }$ and $y = \tilde { y }$; and let $n_ 1$ be the number of sample points belonging to the first or third quadrants, and $n_{2}$ the number of points belonging to the second or fourth quadrants. If the sample size $n$ is even, the calculation of $n_ 1$ and $n_{2}$ is evident. If $n$ is odd, then one or two of the sample points fall on the lines $x = \tilde { x }$ and $y = \tilde { y }$. In the first case one ignores the point; in the second case one assigns one point to the quadrant touched by both points and ignores the other. Then Blomqvist's $q$ is defined as

\begin{equation*} q = \frac { n_1 - n_2 } { n_1 + n_2 }. \end{equation*}

For details on the use of $q$ in hypothesis testing, and for large-sample theory, see [a1].
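The quadrant count behind $q$ can be sketched in Python as follows. The function name `blomqvist_q` and the data are hypothetical, and the sketch makes one simplifying assumption: every point lying on a median line is ignored, so the rule for two such points when $n$ is odd is not implemented.

```python
from statistics import median


def blomqvist_q(xs, ys):
    """Blomqvist's q = (n1 - n2) / (n1 + n2) from quadrant counts."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = median(xs), median(ys)
    n1 = n2 = 0
    for x, y in zip(xs, ys):
        prod = (x - mx) * (y - my)
        if prod > 0:
            n1 += 1  # first or third quadrant
        elif prod < 0:
            n2 += 1  # second or fourth quadrant
        # prod == 0: the point lies on a median line and is skipped
        # (a simplification of the odd-n rule described in the text).
    if n1 + n2 == 0:
        raise ValueError("no points fall strictly inside a quadrant")
    return (n1 - n2) / (n1 + n2)


# Illustrative data with even n, so no point lies on a median line.
q = blomqvist_q([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
```

Both sample medians equal $3.5$ here; four points fall in the first or third quadrants and two in the second or fourth, so $q = (4 - 2)/6 = 1/3$.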

The population parameter estimated by $q$, denoted by $\beta$, is defined analogously to the population version of Kendall's tau. Denoting by $\tilde{X}$ and $\tilde{Y}$ the population medians of $X$ and $Y$, one has

\begin{equation*} \begin{aligned} \beta &= \mathsf{P} [ ( X - \tilde{X} ) ( Y - \tilde{Y} ) > 0 ] - \mathsf{P} [ ( X - \tilde{X} ) ( Y - \tilde{Y} ) < 0 ] \\ &= 4 F_{X,Y} ( \tilde{X} , \tilde{Y} ) - 1, \end{aligned} \end{equation*}

where $F_{ X , Y}$ denotes the joint distribution function of $X$ and $Y$. Since $\beta$ depends only on the value of $F_{ X , Y}$ at the point whose coordinates are the population medians of $X$ and $Y$, it is sometimes called the medial correlation coefficient. When $X$ and $Y$ are continuous,

\begin{equation*} \beta = 4 C _ { X , Y } \left( \frac { 1 } { 2 } , \frac { 1 } { 2 } \right) - 1, \end{equation*}

where $C _ { X , Y }$ again denotes the copula of $X$ and $Y$. Thus $\beta$, like $\tau$, is invariant under strictly increasing transformations of $X$ and $Y$.
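For illustration, evaluating this expression at three standard copulas (chosen here only as examples) gives the extreme and central values of $\beta$. With $\Pi(u,v) = uv$, $M(u,v) = \min(u,v)$ and $W(u,v) = \max(u + v - 1, 0)$,

\begin{equation*} \beta_\Pi = 4 \cdot \frac{1}{4} - 1 = 0, \qquad \beta_M = 4 \cdot \frac{1}{2} - 1 = 1, \qquad \beta_W = 4 \cdot 0 - 1 = -1. \end{equation*}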

References

[a1] N. Blomqvist, "On a measure of dependence between two random variables", Ann. Math. Stat., 21 (1950) pp. 593–600
[a2] J.D. Gibbons, "Nonparametric methods for quantitative analysis", Holt, Rinehart & Winston (1976)
[a3] M.G. Kendall, "A new measure of rank correlation", Biometrika, 30 (1938) pp. 81–93
[a4] M.G. Kendall, "Rank correlation methods", Charles Griffin (1970) (Edition: Fourth)
[a5] W.H. Kruskal, "Ordinal measures of association", J. Amer. Statist. Assoc., 53 (1958) pp. 814–861
[a6] R.B. Nelsen, "An introduction to copulas", Springer (1999) Zbl 0909.62052
How to Cite This Entry:
Kendall tau metric. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Kendall_tau_metric&oldid=52761
This article was adapted from an original article by R.B. Nelsen (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.