Spearman coefficient of rank correlation
Latest revision as of 09:15, 6 January 2024


A measure of the dependence of two random variables $ X $ and $ Y $, based on the rankings of the $ X _ {i} $'s and $ Y _ {i} $'s in independent pairs of observations $ ( X _ {1} , Y _ {1} ) , \dots, ( X _ {n} , Y _ {n} ) $. If $ R _ {i} $ is the rank of $ Y $ corresponding to that pair $ ( X , Y ) $ for which the rank of $ X $ is equal to $ i $, then the Spearman coefficient of rank correlation is defined by the formula

$$ r _ {s} = \frac{12}{n ( n ^ {2} - 1 ) } \sum _ {i=1} ^ { n } \left ( i - \frac{n + 1}{2} \right ) \left ( R _ {i} - \frac{n + 1}{2} \right ) , $$
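The defining formula can be sketched in plain Python (a minimal illustration, not a library implementation; it assumes no tied observations, so each rank sequence is a permutation of $ 1, \dots, n $, and the function names are made up for this example):

```python
def ranks(values):
    """Rank of each value, 1 = smallest (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman_rs(x, y):
    """r_s = 12 / (n(n^2-1)) * sum (rank_x - m)(rank_y - m), m = (n+1)/2."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    m = (n + 1) / 2  # mean rank
    return 12 / (n * (n * n - 1)) * sum(
        (rx[i] - m) * (ry[i] - m) for i in range(n)
    )
```

For strictly increasing paired data this gives $ +1 $, and for strictly opposite orderings $ -1 $, matching the extreme cases described below.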

or, equivalently, by

$$ r _ {s} = 1 - \frac{6}{n ( n ^ {2} - 1 ) } \sum _ {i=1} ^ { n } d _ {i} ^ {2} , $$

where $ d _ {i} $ is the difference between the ranks of $ X _ {i} $ and $ Y _ {i} $. The value of $ r _ {s} $ lies between $ -1 $ and $ +1 $; $ r _ {s} = +1 $ when the rank sequences completely coincide, i.e. $ R _ {i} = i $, $ i = 1, \dots, n $; and $ r _ {s} = -1 $ when the rank sequences are completely opposite, i.e. $ R _ {i} = ( n + 1 ) - i $, $ i = 1, \dots, n $. This coefficient, like any other rank statistic, is applied to test the hypothesis of independence of two variables. If the variables are independent, then $ {\mathsf E} r _ {s} = 0 $ and $ {\mathsf D} r _ {s} = 1 / ( n - 1 ) $. Thus, the amount of deviation of $ r _ {s} $ from zero gives information about the dependence or independence of the variables. To construct the corresponding test one computes the distribution of $ r _ {s} $ for independent variables $ X $ and $ Y $. When $ 4 \leq n \leq 10 $ one can use tables of the exact distribution (see [2], [4]), and when $ n > 10 $ one can use, for example, the fact that as $ n \rightarrow \infty $ the random variable $ \sqrt{n - 1} \, r _ {s} $ is asymptotically standard normally distributed. In the latter case the hypothesis of independence is rejected if $ | r _ {s} | > u _ {1 - \alpha / 2 } / \sqrt{n - 1} $, where $ u _ {1 - \alpha / 2 } $ is the root of the equation $ \Phi ( u ) = 1 - \alpha / 2 $ and $ \Phi ( u ) $ is the standard normal distribution function.

Under the assumption that $ X $ and $ Y $ have a joint normal distribution with (ordinary) correlation coefficient $ \rho $,

$$ {\mathsf E} r _ {s} \sim \frac{6}{\pi} \arcsin \frac{\rho}{2} $$

as $ n \rightarrow \infty $, and therefore the variable $ 2 \sin ( \pi r _ {s} / 6 ) $ can be used as an estimator for $ \rho $.
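Inverting the asymptotic relation $ {\mathsf E} r _ {s} \sim ( 6 / \pi ) \arcsin ( \rho / 2 ) $ gives the stated estimator directly (a one-line sketch in plain Python; the function name is illustrative):

```python
import math

def rho_from_rs(rs):
    """Estimate the normal correlation rho from Spearman's r_s,
    inverting E r_s ~ (6/pi) * arcsin(rho/2): rho = 2 * sin(pi * r_s / 6)."""
    return 2 * math.sin(math.pi * rs / 6)
```

The endpoints are preserved: $ r _ {s} = 0 $ maps to $ \rho = 0 $ and $ r _ {s} = \pm 1 $ to $ \rho = \pm 1 $.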

The Spearman coefficient of rank correlation was named in honour of the psychologist C. Spearman (1904), who used it in research on psychology in place of the ordinary correlation coefficient. The tests based on the Spearman coefficient of rank correlation and on the Kendall coefficient of rank correlation are asymptotically equivalent (when $ n = 2 $, the corresponding rank statistics coincide).

References

[1] C. Spearman, "The proof and measurement of association between two things" Amer. J. Psychol. , 15 (1904) pp. 72–101
[2] M.G. Kendall, "Rank correlation methods" , Griffin (1962)
[3] B.L. van der Waerden, "Mathematische Statistik" , Springer (1957)
[4] L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics" , Libr. math. tables , 46 , Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova)
[a1] J. Hájek, Z. Sidák, "Theory of rank tests" , Acad. Press (1967)
[a2] M. Hollander, D.A. Wolfe, "Nonparametric statistical methods" , Wiley (1973)
How to Cite This Entry:
Spearman coefficient of rank correlation. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Spearman_coefficient_of_rank_correlation&oldid=15078
This article was adapted from an original article by A.V. Prokhorov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article