Spearman rho metric

From Encyclopedia of Mathematics

Spearman rho

The non-parametric correlation coefficient (or measure of association) known as Spearman's rho was first discussed by the psychologist C. Spearman in 1904 [a4] as a coefficient of correlation on ranks (cf. also Correlation coefficient; Rank statistic). In modern use, the term "correlation" refers to a measure of a linear relationship between variates (such as the Pearson product-moment correlation coefficient), while "measure of association" refers to a measure of a monotone relationship between variates (such as the Kendall tau metric and Spearman's rho). For a historical review of Spearman's rho and related coefficients, see [a2].

Spearman's rho, denoted $r _ { S }$, is computed by applying the Pearson product-moment correlation coefficient procedure to the ranks associated with a sample $\{ ( x _ { i } , y _ { i } ) \} _ { i = 1 } ^ { n }$. Let $R _ { i } = \operatorname { rank } ( x _ { i } )$ and $S _ { i } = \operatorname { rank } ( y _ { i } )$; then computing the sample (Pearson) correlation coefficient $r$ for $\{ ( R _ { i } , S _ { i } ) \} _ { i = 1 } ^ { n }$ yields

\begin{equation*} r _{S} = \frac { \sum _ { i = 1 } ^ { n } ( R _ { i } - \overline { R } ) ( S _ { i } - \overline{S} ) } { \sqrt { \sum _ { i = 1 } ^ { n } ( R _ { i } - \overline { R } ) ^ { 2 }\cdot \sum _ { i = 1 } ^ { n } ( S _ { i } - \overline { S } ) ^ { 2 } } } = \end{equation*}

\begin{equation*} = 1 - \frac { 6 \sum _ { i = 1 } ^ { n } ( R _ { i } - S _ { i } ) ^ { 2 } } { n ( n ^ { 2 } - 1 ) }, \end{equation*}

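where $\overline { R } = \sum _ { i = 1 } ^ { n } R _ { i } / n = ( n + 1 ) / 2 = \sum _ { i = 1 } ^ { n } S _ { i } / n = \overline { S }$. The second expression for $r _ { S }$ holds only when there are no ties. When ties exist in the data, and tied observations are assigned the average of the ranks they span, the following adjusted formula for $r _ { S }$ is used: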

\begin{equation*} r_{S} = \frac { n ( n ^ { 2 } - 1 ) - 6 \sum _ { i = 1 } ^ { n } ( R _ { i } - S _ { i } ) ^ { 2 } - 6 ( T + U ) } { \sqrt { n ( n ^ { 2 } - 1 ) - 12 T } \sqrt { n ( n ^ { 2 } - 1 ) - 12 U } }, \end{equation*}

where $T = \sum _ { t } t ( t ^ { 2 } - 1 ) / 12$ for $t$ the number of $X$ observations that are tied at a given rank, and $U = \sum _ { u } u ( u ^ { 2 } - 1 ) / 12$ for $u$ the number of $Y$ observations that are tied at a given rank. For details on the use of $r _ { S }$ in hypothesis testing, and for large-sample theory, see [a1].
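The sample formulas above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not a reference implementation; the function names (`midranks`, `pearson`, `tie_term`, `spearman_rho`) are hypothetical, and it assumes tied observations receive the average of the ranks they span.

```python
from collections import Counter
from math import sqrt


def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def pearson(a, b):
    """Sample Pearson product-moment correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def tie_term(values):
    """Sum of t(t^2 - 1)/12 over groups of t equal observations (T or U in the text)."""
    return sum(t * (t * t - 1) for t in Counter(values).values()) / 12


def spearman_rho(x, y):
    """Tie-adjusted r_S; without ties it reduces to 1 - 6*sum(d^2)/(n(n^2 - 1))."""
    n = len(x)
    r, s = midranks(x), midranks(y)
    d2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
    big_t, big_u = tie_term(x), tie_term(y)
    m = n * (n * n - 1)
    return (m - 6 * d2 - 6 * (big_t + big_u)) / (
        sqrt(m - 12 * big_t) * sqrt(m - 12 * big_u)
    )
```

With these definitions, `spearman_rho(x, y)` and `pearson(midranks(x), midranks(y))` should agree up to rounding for any sample in which neither variable is constant, with or without ties.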

If $X$ and $Y$ are random variables (cf. Random variable) with respective distribution functions $F _ { X }$ and $F _{Y}$, then the population parameter estimated by $r _ { S }$, usually denoted $\rho_{ S}$, is defined to be the Pearson product-moment correlation coefficient of the random variables $F _ { X } ( X )$ and $F _ { Y } ( Y )$:

\begin{equation*} \rho _ { S } = \operatorname { corr } [ F _ { X } ( X ) , F _ { Y } ( Y ) ] = \end{equation*}

\begin{equation*} = 12 \mathsf{E} [ F_{ X} ( X ) F _ { Y } ( Y ) ] - 3. \end{equation*}

Spearman's $\rho_{ S}$ is occasionally referred to as the grade correlation coefficient, since $F _ { X } ( X )$ and $F _ { Y } ( Y )$ are sometimes called the "grades" of $X$ and $Y$.
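The second expression for $\rho_{ S}$ given above can be checked directly. The following short derivation assumes that $X$ and $Y$ are continuous, so that $F _ { X } ( X )$ and $F _ { Y } ( Y )$ are each uniformly distributed on $[ 0,1 ]$, with mean $1/2$ and variance $1/12$:

\begin{equation*} \operatorname { corr } [ F _ { X } ( X ) , F _ { Y } ( Y ) ] = \frac { \mathsf{E} [ F _ { X } ( X ) F _ { Y } ( Y ) ] - \frac { 1 } { 2 } \cdot \frac { 1 } { 2 } } { \sqrt { \frac { 1 } { 12 } \cdot \frac { 1 } { 12 } } } = 12 \mathsf{E} [ F _ { X } ( X ) F _ { Y } ( Y ) ] - 3. \end{equation*}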

Like Kendall's tau, $\rho_{ S}$ is a measure of association based on the notion of concordance. One says that two pairs $( x _ { 1 } , y _ { 1 } )$ and $( x _ { 2 } , y _ { 2 } )$ of real numbers are concordant if $x _ { 1 } < x _ { 2 }$ and $y _ { 1 } < y _ { 2 }$ or if $x _ { 1 } > x _ { 2 }$ and $y _ { 1 } > y _ { 2 }$ (i.e., if $( x _ { 1 } - x _ { 2 } ) ( y _ { 1 } - y _ { 2 } ) > 0$); and discordant if $x _ { 1 } < x _ { 2 }$ and $y _ { 1 } > y _ { 2 }$ or if $x _ { 1 } > x _ { 2 }$ and $y _ { 1 } < y _ { 2 }$ (i.e., if $( x _ { 1 } - x _ { 2 } ) ( y _ { 1 } - y _ { 2 } ) < 0$). Now, let $( X _ { 1 } , Y _ { 1 } )$, $( X _ { 2 } , Y _ { 2 } )$ and $( X _ { 3 } , Y _ { 3 } )$ be independent random vectors with the same distribution as $( X , Y )$. Then

\begin{equation*} \rho _ { S } = 3 \left( \mathsf{P} [ ( X _ { 1 } - X _ { 2 } ) ( Y _ { 1 } - Y _ { 3 } ) > 0 ] - \mathsf{P} [ ( X _ { 1 } - X _ { 2 } ) ( Y _ { 1 } - Y _ { 3 } ) < 0 ] \right), \end{equation*}

that is, $\rho_{ S}$ is proportional to the difference between the probabilities of concordance and discordance between the random vectors $( X _ { 1 } , Y _ { 1 } )$ and $( X _ { 2 } , Y _ { 3 } )$ (clearly, $( X _ { 2 } , Y _ { 3 } )$ can be replaced by $( X _ { 3 } , Y _ { 2 } )$).
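The concordance representation can be illustrated by simulation. The sketch below is a rough Monte Carlo estimate, not an exact computation; the function name `rho_s_monte_carlo` and the two example samplers are hypothetical. It draws three independent copies of $( X , Y )$ and compares $( X _ { 1 } , Y _ { 1 } )$ with $( X _ { 2 } , Y _ { 3 } )$.

```python
import random


def rho_s_monte_carlo(draw_pair, trials=100_000, seed=0):
    """Estimate 3 * (P[concordance] - P[discordance]) for (X1, Y1) versus (X2, Y3)."""
    rng = random.Random(seed)
    balance = 0
    for _ in range(trials):
        x1, y1 = draw_pair(rng)
        x2, _unused_y2 = draw_pair(rng)  # only the X-coordinate of the second copy is used
        _unused_x3, y3 = draw_pair(rng)  # only the Y-coordinate of the third copy is used
        product = (x1 - x2) * (y1 - y3)
        if product > 0:
            balance += 1
        elif product < 0:
            balance -= 1
    return 3 * balance / trials


def comonotone(rng):
    """Y = X: perfect monotone increasing dependence (illustrative example)."""
    x = rng.random()
    return x, x


def independent(rng):
    """X and Y independent and uniform on [0, 1] (illustrative example)."""
    return rng.random(), rng.random()
```

For the comonotone sampler the estimate should be close to $1$, since $( X _ { 1 } - X _ { 2 } ) ( X _ { 1 } - X _ { 3 } ) > 0$ exactly when $X _ { 1 }$ is the smallest or the largest of the three values, which has probability $2/3$. For the independent sampler it should be close to $0$.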

When $X$ and $Y$ are continuous,

\begin{equation*} \rho _ { S } = 12 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } u v d C _ { X , Y } ( u , v ) - 3 = \end{equation*}

\begin{equation*} = 12 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } [ C _ { X , Y } ( u , v ) - u v ] d u d v, \end{equation*}

where $C _ { X , Y }$ is the copula of $X$ and $Y$. Consequently, $\rho_{ S}$ is invariant under strictly increasing transformations of $X$ and $Y$, a property $\rho_{ S}$ shares with Kendall's tau but not with the Pearson product-moment correlation coefficient. Note that $\rho_{ S}$ is proportional to the signed volume between the graphs of the copula $C _ { X , Y } ( u , v )$ and the "product" copula $\Pi ( u , v ) = u v$, the copula of independent random variables. For a survey of copulas and their relationship with measures of association, see [a3].
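The signed-volume interpretation can be checked numerically. The following is a minimal sketch using a midpoint rule on a regular grid; the function name `rho_s_from_copula` is hypothetical, and the three test copulas (the product copula and the two extreme cases $\min ( u , v )$ and $\max ( u + v - 1,0 )$) are chosen here only as illustrations.

```python
def rho_s_from_copula(copula, grid=200):
    """Midpoint-rule approximation of 12 * integral of [C(u, v) - u*v] over the unit square."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        for j in range(grid):
            v = (j + 0.5) * h
            total += copula(u, v) - u * v
    return 12 * total * h * h


def product_copula(u, v):
    """Copula of independent random variables."""
    return u * v


def upper_bound_copula(u, v):
    """min(u, v): Y is almost surely an increasing function of X."""
    return min(u, v)


def lower_bound_copula(u, v):
    """max(u + v - 1, 0): Y is almost surely a decreasing function of X."""
    return max(u + v - 1.0, 0.0)
```

Under these assumptions the three copulas should give values near $0$, $1$ and $- 1$ respectively, up to the discretization error of the grid.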

Spearman [a5] also proposed an $L_1$ version of $r _ { S }$, known as Spearman's footrule, based on absolute differences $| R _ { i } - S _ { i } |$ in ranks rather than squared differences:

\begin{equation*} f _ { S } = 1 - \frac { 3 \sum _ { i = 1 } ^ { n } | R _ { i } - S _ { i } | } { n ^ {2} - 1 }. \end{equation*}
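As a small illustration, the footrule can be computed directly from two rank vectors. The function name `spearman_footrule` is hypothetical, and the sketch assumes the inputs are already ranks $1 , \dots , n$ without ties.

```python
def spearman_footrule(r, s):
    """f_S = 1 - 3 * sum(|R_i - S_i|) / (n^2 - 1) for rank vectors r and s."""
    n = len(r)
    return 1 - 3 * sum(abs(ri - si) for ri, si in zip(r, s)) / (n * n - 1)


identical = spearman_footrule([1, 2, 3], [1, 2, 3])  # no rank differences
reversed_ = spearman_footrule([1, 2, 3], [3, 2, 1])  # sum of |differences| is 4
```

Here `identical` equals $1$ and `reversed_` equals $1 - 12/8 = - 1/2$, which shows that, unlike $r _ { S }$, the footrule does not reach $- 1$ for completely reversed ranks.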

The population parameter $\phi_S$ estimated by $f _ { S }$ is given by

\begin{equation*} \phi _ { S } = 1 - 3 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } | u - v | d C _ { X , Y } ( u , v ) = \end{equation*}

\begin{equation*} = 6 \int _ { 0 } ^ { 1 } C _ { X , Y } ( u , u ) d u - 2. \end{equation*}
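The diagonal form of $\phi _ { S }$ is easy to evaluate numerically. The sketch below uses a one-dimensional midpoint rule; the function name `phi_s_from_copula` is hypothetical, and the three copulas passed to it are illustrative choices rather than examples taken from the references.

```python
def phi_s_from_copula(copula, grid=10_000):
    """Midpoint-rule approximation of 6 * integral of C(u, u) over [0, 1], minus 2."""
    h = 1.0 / grid
    diagonal = sum(copula((k + 0.5) * h, (k + 0.5) * h) for k in range(grid))
    return 6 * diagonal * h - 2


phi_independent = phi_s_from_copula(lambda u, v: u * v)               # C(u, u) = u^2
phi_increasing = phi_s_from_copula(lambda u, v: min(u, v))            # C(u, u) = u
phi_decreasing = phi_s_from_copula(lambda u, v: max(u + v - 1.0, 0.0))  # C(u, u) = max(2u - 1, 0)
```

Evaluating the integrals by hand gives $6 \cdot 1/3 - 2 = 0$, $6 \cdot 1/2 - 2 = 1$ and $6 \cdot 1/4 - 2 = - 1/2$, and the numerical values should match these closely.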

References

[a1] J.D. Gibbons, "Nonparametric methods for quantitative analysis" , Holt, Rinehart & Winston (1976)
[a2] W.H. Kruskal, "Ordinal measures of association" J. Amer. Statist. Assoc. , 53 (1958) pp. 814–861
[a3] R.B. Nelsen, "An introduction to copulas" , Springer (1999)
[a4] C. Spearman, "The proof and measurement of association between two things" Amer. J. Psychol. , 15 (1904) pp. 72–101
[a5] C. Spearman, "A footrule for measuring correlation" Brit. J. Psychol. , 2 (1906) pp. 89–108
How to Cite This Entry:
Spearman rho metric. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Spearman_rho_metric&oldid=55286
This article was adapted from an original article by R.B. Nelsen (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.