Rank vector

From Encyclopedia of Mathematics

Revision as of 14:54, 7 June 2020


A vector statistic (cf. Statistics) $ R = ( R _ {1}, \dots, R _ {n} ) $ constructed from a random observation vector $ X = ( X _ {1}, \dots, X _ {n} ) $, with $ i $-th component $ R _ {i} = R _ {i} ( X) $, $ i = 1, \dots, n $, defined by

$$ R _ {i} = \sum _ {j=1} ^ {n} \delta ( X _ {i} - X _ {j} ) , $$

where $ \delta ( x) $ is the characteristic function (indicator function) of $ [ 0 , + \infty ) $, that is,

$$ \delta ( x) = \begin{cases} 1 & \text{if } x \geq 0 , \\ 0 & \text{if } x < 0 . \end{cases} $$
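As an illustrative sketch (not part of the original article), the rank vector can be computed directly from the defining sum, assuming all observations are distinct:

```python
def rank_vector(x):
    """Rank vector R of an observation vector x: R[i] counts the
    components x[j] with x[j] <= x[i], i.e. the sum of indicator
    values delta(x[i] - x[j]) from the definition above."""
    return [sum(1 for xj in x if xj <= xi) for xi in x]

# The smallest observation gets rank 1, the largest rank n.
print(rank_vector([2.5, 0.3, 1.7]))  # [3, 1, 2]
```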

The statistic $ R _ {i} $ is called the rank of the $ i $-th component $ X _ {i} $, $ i = 1, \dots, n $, of the random vector $ X $. This definition of a rank vector is well defined under the condition

$$ {\mathsf P} \{ X _ {i} = X _ {j} \} = 0 , \quad i \neq j , $$

which automatically holds if the probability distribution of $ X $ is defined by a density $ p ( x) = p ( x _ {1}, \dots, x _ {n} ) $. It follows from the definition of a rank vector that, under these conditions, $ R $ takes values in the space $ \mathfrak R = \{ r \} $ of all permutations $ r = ( r _ {1}, \dots, r _ {n} ) $ of $ 1, \dots, n $, and the realization $ r _ {i} $ of the rank $ R _ {i} $ is equal to the number of components of $ X $ whose observed values do not exceed the realization of the $ i $-th component $ X _ {i} $, $ i = 1, \dots, n $.

Let $ X ^ {( \cdot ) } = ( X _ {(n1)}, \dots, X _ {(nn)} ) $ be the vector of order statistics (cf. Order statistic) constructed from the observation vector $ X $. Then the pair $ ( R , X ^ {( \cdot ) } ) $ is a sufficient statistic for the distribution of $ X $, and $ X $ itself can be uniquely recovered from $ ( R , X ^ {( \cdot ) } ) $. Moreover, under the additional assumption that the density $ p ( x) $ of $ X $ is symmetric with respect to permutations of its arguments, the components $ R $ and $ X ^ {( \cdot ) } $ of the sufficient statistic $ ( R , X ^ {( \cdot ) } ) $ are independent and

$$ {\mathsf P} \{ R = r \} = \frac{1}{n !} , \quad r \in \mathfrak R . $$

In particular, if

$$ \tag{1} p ( x) = p ( x _ {1}, \dots, x _ {n} ) = \prod _ {i=1} ^ {n} f ( x _ {i} ) , $$

that is, the components $ X _ {1}, \dots, X _ {n} $ are independent, identically distributed random variables ($ f ( x _ {i} ) $ stands for the density of $ X _ {i} $), then

$$ \tag{2} \left . \begin{array}{c} {\mathsf P} \{ R _ {i} = k \} = \frac{1}{n} , \quad i = 1, \dots, n , \\ {\mathsf P} \{ R _ {i} = k , R _ {j} = m \} = \frac{1}{n ( n - 1 ) } , \quad i \neq j , \ k \neq m , \\ {\mathsf E} \{ R _ {i} \} = \frac{n + 1}{2} , \quad {\mathsf D} \{ R _ {i} \} = \frac{n ^ {2} - 1 }{12} , \quad i = 1, \dots, n , \end{array} \right \} $$

for any $ k = 1 \dots n $.
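Since under (1) the rank vector is uniform on the $ n! $ permutations, the moments in (2) can be verified by direct enumeration. A small sketch (added here for illustration, not from the article), using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import permutations

n = 4
perms = list(permutations(range(1, n + 1)))   # all n! equally likely rank vectors

# Exact moments of R_1 under the uniform distribution on permutations:
mean = Fraction(sum(r[0] for r in perms), len(perms))
var = Fraction(sum(r[0] ** 2 for r in perms), len(perms)) - mean ** 2

print(mean == Fraction(n + 1, 2))        # matches (n+1)/2    -> True
print(var == Fraction(n ** 2 - 1, 12))   # matches (n^2-1)/12 -> True
```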

If (1) holds, there is a joint density $ q ( x _ {i} , k ) $, $ k = 1, \dots, n $, of $ X _ {i} $ and $ R _ {i} $, defined by the formula

$$ \tag{3} q ( x _ {i} , k ) = \frac{( n - 1 ) !}{( k - 1 ) ! ( n - k ) !} [ F ( x _ {i} ) ] ^ {k-1} [ 1 - F ( x _ {i} ) ] ^ {n-k} f ( x _ {i} ) , $$

where $ F ( x _ {i} ) $ is the distribution function of $ X _ {i} $. It follows from (2) and (3) that the conditional density $ q ( x _ {i} \mid R _ {i} = k ) $ of $ X _ {i} $ given $ R _ {i} = k $ ($ k = 1, \dots, n $) is expressed by the formula

$$ \tag{4} q ( x _ {i} \mid R _ {i} = k ) = \frac{n !}{( k - 1 ) ! ( n - k ) !} [ F ( x _ {i} ) ] ^ {k-1} [ 1 - F ( x _ {i} ) ] ^ {n-k} f ( x _ {i} ) . $$

The latter formula allows one to trace the internal connection between the observation vector $ X $, the rank vector $ R $ and the vector $ X ^ {( \cdot ) } $ of order statistics, since (4) is just the probability density of the $ k $-th order statistic $ X _ {(nk)} $, $ k = 1, \dots, n $. Moreover, it follows from (3) that the conditional distribution of the rank $ R _ {i} $ given $ X _ {i} $ is expressed by the formula

$$ {\mathsf P} \{ R _ {i} = k \mid X _ {i} \} = \frac{( n - 1 ) !}{( k - 1 ) ! ( n - k ) !} [ F ( X _ {i} ) ] ^ {k-1} [ 1 - F ( X _ {i} ) ] ^ {n-k} . $$
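As a numerical illustration (a sketch added here, not part of the original article): for uniform observations on $ [0, 1] $ one has $ F(x) = x $ and $ f(x) = 1 $, so formula (4) reduces to the Beta$(k, n-k+1)$ density, and one can check that it integrates to one:

```python
from math import factorial

def cond_density(x, k, n):
    """Formula (4) for uniform observations on [0,1]: F(x) = x, f(x) = 1.
    This is the Beta(k, n-k+1) density of the k-th order statistic."""
    c = factorial(n) // (factorial(k - 1) * factorial(n - k))
    return c * x ** (k - 1) * (1 - x) ** (n - k)

# Midpoint-rule check that the conditional density integrates to 1.
n, k, m = 5, 2, 100_000
total = sum(cond_density((j + 0.5) / m, k, n) for j in range(m)) / m
print(round(total, 6))  # 1.0
```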

Finally, under the assumption that the moments $ {\mathsf E} \{ X _ {i} \} $ and $ {\mathsf D} \{ X _ {i} \} $ exist and that (1) holds, (2) and (3) imply that the correlation coefficient $ \rho ( X _ {i} , R _ {i} ) $ between $ X _ {i} $ and $ R _ {i} $ is equal to

$$ \rho ( X _ {i} , R _ {i} ) = \sqrt {\frac{12 ( n - 1 ) }{( n + 1 ) {\mathsf D} \{ X _ {i} \} }} \int _ {- \infty } ^ {\infty} x _ {i} \left [ F ( x _ {i} ) - \frac{1}{2} \right ] d F ( x _ {i} ) . $$

In particular, if $ X _ {i} $ is uniformly distributed on $ [ 0 , 1 ] $, then

$$ \rho ( X _ {i} , R _ {i} ) = \sqrt {\frac{n - 1}{n + 1}} . $$
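This closed form for the uniform case can be checked by Monte Carlo simulation (an illustrative sketch; the sample size and number of trials are chosen arbitrarily):

```python
import random
from math import sqrt

random.seed(0)
n, trials = 10, 20_000
exact = sqrt((n - 1) / (n + 1))                  # closed form above

# Simulate the pair (X_1, R_1) for uniform samples of size n.
xs, rs = [], []
for _ in range(trials):
    x = [random.random() for _ in range(n)]
    xs.append(x[0])
    rs.append(sum(1 for xj in x if xj <= x[0]))  # rank of X_1

# Sample correlation coefficient of X_1 and R_1.
mx, mr = sum(xs) / trials, sum(rs) / trials
cov = sum((a - mx) * (b - mr) for a, b in zip(xs, rs)) / trials
sx = sqrt(sum((a - mx) ** 2 for a in xs) / trials)
sr = sqrt(sum((b - mr) ** 2 for b in rs) / trials)
est = cov / (sx * sr)
print(abs(est - exact) < 0.02)  # simulated value is close to the exact one
```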

If $ X $ has the normal distribution $ N ( a , \sigma ^ {2} ) $, then

$$ \rho ( X _ {i} , R _ {i} ) = \sqrt {\frac{3 ( n - 1 ) }{\pi ( n + 1 ) }} , $$

and $ \rho ( X _ {i} , R _ {i} ) $ does not depend on the parameters of the normal distribution.
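For the normal case the correlation increases with $ n $ toward the limit $ \sqrt{3/\pi} \approx 0.977 $; a short numerical sketch (added for illustration, not from the article):

```python
from math import pi, sqrt

def rho_normal(n):
    """Correlation between X_i and its rank R_i for normal samples
    of size n, per the formula above; parameter-free."""
    return sqrt(3 * (n - 1) / (pi * (n + 1)))

# The value approaches sqrt(3/pi) from below as n grows.
print(round(rho_normal(10), 3), round(sqrt(3 / pi), 3))
```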

References

[1] W. Hoeffding, " "Optimum" nonparametric tests", Proc. 2nd Berkeley Symp. Math. Stat. Probab. (1950), Univ. California Press (1951), pp. 83–92
[2] J. Hájek, Z. Sidák, "Theory of rank tests", Acad. Press (1967)
[3] F.P. Tarasenko, "Non-parametric statistics", Tomsk (1976) (In Russian)
How to Cite This Entry:
Rank vector. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_vector&oldid=49547
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article