Rao-Blackwell-Kolmogorov theorem

From Encyclopedia of Mathematics

Latest revision as of 19:47, 16 January 2024


A proposition from the theory of statistical estimation on which a method for the improvement of unbiased statistical estimators is based.

Let $X$ be a random variable with values in a sample space $( \mathfrak X , {\mathcal B} , {\mathsf P} _ \theta )$, $\theta \in \Theta$, such that the family of probability distributions $\{ {\mathsf P} _ \theta : \theta \in \Theta \}$ has a sufficient statistic $T = T(X)$, and let $\phi = \phi(X)$ be a vector statistic with a finite matrix of second moments. Then the mean ${\mathsf E} _ \theta \{ \phi \}$ of $\phi$ exists and, moreover, the conditional mean $\phi ^ {*} = {\mathsf E} \{ \phi \mid T \}$ (which, by the sufficiency of $T$, does not depend on $\theta$ and is therefore a statistic) is an unbiased estimator for ${\mathsf E} _ \theta \{ \phi \}$, that is,

$$ {\mathsf E} _ \theta \{ \phi ^ {*} \} = {\mathsf E} _ \theta \{ {\mathsf E} \{ \phi \mid T \} \} = {\mathsf E} _ \theta \{ \phi \} . $$

The Rao–Blackwell–Kolmogorov theorem states that under these conditions the quadratic risk of $ \phi ^ {*} $ does not exceed the quadratic risk of $ \phi $, uniformly in $ \theta \in \Theta $, i.e. for any vector $ z $ of the same dimension as $ \phi $, the inequality

$$ z \, {\mathsf E} _ \theta \{ ( \phi - {\mathsf E} _ \theta \{ \phi \} ) ^ {T} ( \phi - {\mathsf E} _ \theta \{ \phi \} ) \} \, z ^ {T} \geq z \, {\mathsf E} _ \theta \{ ( \phi ^ {*} - {\mathsf E} _ \theta \{ \phi ^ {*} \} ) ^ {T} ( \phi ^ {*} - {\mathsf E} _ \theta \{ \phi ^ {*} \} ) \} \, z ^ {T} $$

holds for any $ \theta \in \Theta $. In particular, if $ \phi $ is a one-dimensional statistic, then for any $ \theta \in \Theta $ the variance $ {\mathsf D} _ \theta \phi ^ {*} $ of $ \phi ^ {*} $ does not exceed the variance $ {\mathsf D} _ \theta \phi $ of $ \phi $.
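The variance reduction can be seen in a small simulation (an illustration added here, not part of the original article; all names are hypothetical). For i.i.d. Poisson observations $X_1, \dots, X_n$ with mean $\lambda$, the sum $T = \sum_i X_i$ is sufficient, the indicator $\phi = 1\{X_1 = 0\}$ is a trivial unbiased estimator of $e^{-\lambda}$, and conditioning on $T$ gives the closed form $\phi^* = {\mathsf E}\{\phi \mid T\} = ((n-1)/n)^T$:

```python
import numpy as np

# Illustration of the theorem: Rao-Blackwellizing a naive estimator.
# X_1,...,X_n i.i.d. Poisson(lam); T = sum X_i is sufficient, and
# E[1{X_1 = 0} | T] = ((n-1)/n)**T  (binomial thinning of X_1 given the sum).
rng = np.random.default_rng(0)
lam, n, reps = 2.0, 10, 20000

samples = rng.poisson(lam, size=(reps, n))
phi = (samples[:, 0] == 0).astype(float)   # naive unbiased estimator of exp(-lam)
t = samples.sum(axis=1)                    # sufficient statistic
phi_star = ((n - 1) / n) ** t              # conditional mean E[phi | T]

# Both are unbiased for exp(-lam); phi_star has much smaller variance.
print(phi.mean(), phi_star.mean(), np.exp(-lam))
print(phi.var(), phi_star.var())
```

Both sample means come out close to $e^{-\lambda}$, while the sample variance of `phi_star` is far below that of `phi`, matching the inequality above.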

In the most general situation the Rao–Blackwell–Kolmogorov theorem states that averaging over a sufficient statistic does not lead to an increase of the risk with respect to any convex loss function. This implies that good statistical estimators should be looked for only in terms of sufficient statistics, that is, in the class of functions of sufficient statistics.

If the family $\{ {\mathsf P} _ \theta T ^ {-1} \}$ is complete, that is, if the only unbiased estimator of zero based on $T$ is the function of $T$ that is almost-everywhere equal to zero, then the unbiased estimator with uniformly minimal risk provided by the Rao–Blackwell–Kolmogorov theorem is unique. Thus, the Rao–Blackwell–Kolmogorov theorem gives a recipe for constructing best unbiased estimators: take any unbiased estimator and average it over a sufficient statistic. That is how the best unbiased estimator for the distribution function of the normal law is constructed in the following example, which is due to A.N. Kolmogorov.

Example. Given a realization of a random vector $X = ( X _ {1} , \dots , X _ {n} )$ whose components $X _ {i}$, $i = 1 , \dots , n$, $n \geq 3$, are independent random variables subject to the same normal law $N _ {1} ( \xi , \sigma ^ {2} )$, it is required to estimate the distribution function

$$ \Phi \left ( \frac{x - \xi }{\sigma } \right ) = \frac{1}{\sqrt {2 \pi } \, \sigma } \int\limits _ {- \infty } ^ { x } e ^ {- ( u - \xi ) ^ {2} / 2 \sigma ^ {2} } \, d u , \qquad | \xi | < \infty ,\ \sigma > 0 . $$

The parameters $ \xi $ and $ \sigma ^ {2} $ are supposed to be unknown. Since the family

$$ \left \{ \Phi \left ( \frac{x - \xi }{\sigma } \right ) : | \xi | < \infty ,\ \sigma > 0 \right \} $$

of normal laws has a complete sufficient statistic $T = ( \overline{X} , S ^ {2} )$, where

$$ \overline{X} = \frac{X _ {1} + \dots + X _ {n} }{n} $$

and

$$ S ^ {2} = \frac{1}{n} \sum _ {i=1} ^ {n} ( X _ {i} - \overline{X} ) ^ {2} , $$

the Rao–Blackwell–Kolmogorov theorem can be used for the construction of the best unbiased estimator for the distribution function $ \Phi ( ( x - \xi ) / \sigma ) $. As an initial statistic $ \phi $ one may use, e.g., the empirical distribution function constructed from an arbitrary component $ X _ {1} $ of $ X $:

$$ \phi = \begin{cases} 0 & \textrm{ if } x < X _ {1} , \\ 1 & \textrm{ if } x \geq X _ {1} . \end{cases} $$

This is a trivial unbiased estimator for $ \Phi ( ( x - \xi ) / \sigma ) $, since

$$ {\mathsf E} \{ \phi \} = {\mathsf P} \{ X _ {1} \leq x \} = \Phi \left ( \frac{x - \xi }{\sigma } \right ) . $$

Averaging of $ \phi $ over the sufficient statistic $ T $ gives the estimator

$$ \tag{1} \phi ^ {*} = {\mathsf E} \{ \phi \mid T \} = {\mathsf P} \{ X _ {1} \leq x \mid \overline{X} , S ^ {2} \} = {\mathsf P} \left \{ \frac{X _ {1} - \overline{X} }{S} \leq \frac{x - \overline{X} }{S} \Bigm| \overline{X} , S ^ {2} \right \} . $$

Since the statistic

$$ V = \left ( \frac{X _ {1} - \overline{X} }{S} , \dots , \frac{X _ {n} - \overline{X} }{S} \right ) , $$

which is complementary to $T$, has a uniform distribution on the $( n - 2 )$-dimensional sphere of radius $\sqrt n$ (indeed, $\sum _ {i=1} ^ {n} ( ( X _ {i} - \overline{X} ) / S ) ^ {2} = n$ by the definition of $S ^ {2}$) and, therefore, depends neither on the unknown parameters $\xi$ and $\sigma ^ {2}$ nor on $T$, the same is true for $( X _ {1} - \overline{X} ) / S$ and

$$ \tag{2} {\mathsf P} \left \{ \frac{X _ {1} - \overline{X} }{S} \leq u \right \} = T _ {n-2} ( u ) , \qquad | u | < \sqrt {n - 1} , $$

where

$$ \tag{3} T _ {f} ( u ) = \frac{1}{\sqrt {\pi ( f + 1 ) } } \, \frac{\Gamma ( ( f + 1 ) / 2 ) }{\Gamma ( f / 2 ) } \int\limits _ {- \sqrt {f + 1 } } ^ { u } \left ( 1 - \frac{t ^ {2} }{f + 1} \right ) ^ {( f - 2 ) / 2 } \, d t $$

is the Thompson distribution with $ f $ degrees of freedom. Thus, (1)–(3) imply that the best unbiased estimator for $ \Phi ( ( x - \xi ) / \sigma ) $ obtained from $ n $ independent observations $ X _ {1} \dots X _ {n} $ is

$$ \phi ^ {*} = T _ {n-2} \left ( \frac{x - \overline{X} }{S} \right ) = S _ {n-2} \left ( \frac{x - \overline{X} }{S} \sqrt { \frac{n - 2}{n - 1 - ( ( x - \overline{X} ) / S ) ^ {2} } } \right ) , $$

where $ S _ {f} ( \cdot ) $ is the Student distribution with $ f $ degrees of freedom.
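As a numerical sketch (an illustration added here, not part of the original article; the function names are hypothetical), the Thompson distribution function $T _ {f}$ in (3) can be evaluated by direct numerical integration, which gives a way to compute $\phi ^ {*}$ from data and to check equation (2) by simulation. The sketch assumes $n \geq 4$, so that the integrand in (3) stays bounded:

```python
import numpy as np
from math import gamma, pi, sqrt

def thompson_cdf(u, f, grid=20001):
    # CDF of the Thompson distribution with f degrees of freedom (assumes f >= 2,
    # i.e. n >= 4, so the density in (3) is bounded); trapezoidal integration of (3).
    a = sqrt(f + 1.0)
    u = min(max(u, -a), a)                      # support is [-sqrt(f+1), sqrt(f+1)]
    t = np.linspace(-a, u, grid)
    dens = (1.0 - t ** 2 / (f + 1.0)) ** ((f - 2.0) / 2.0)
    c = gamma((f + 1.0) / 2.0) / (sqrt(pi * (f + 1.0)) * gamma(f / 2.0))
    dt = t[1] - t[0]
    return c * float((0.5 * (dens[0] + dens[-1]) + dens[1:-1].sum()) * dt)

def phi_star(x, sample):
    # Best unbiased estimate T_{n-2}((x - Xbar)/S), with S^2 = (1/n) sum (X_i - Xbar)^2.
    sample = np.asarray(sample, dtype=float)
    xbar = sample.mean()
    s = sqrt(((sample - xbar) ** 2).mean())
    return thompson_cdf((x - xbar) / s, sample.size - 2)

# Monte Carlo check of (2): (X_1 - Xbar)/S should follow T_{n-2}.
rng = np.random.default_rng(1)
n, reps = 6, 40000
xs = rng.normal(1.0, 2.0, size=(reps, n))
xbar = xs.mean(axis=1)
s = np.sqrt(((xs - xbar[:, None]) ** 2).mean(axis=1))
u1 = (xs[:, 0] - xbar) / s
empirical = (u1 <= 1.0).mean()
print(empirical, thompson_cdf(1.0, n - 2))
```

By the symmetry of the density, `thompson_cdf(0.0, f)` is $1/2$, and the empirical frequency in the simulation should agree with $T _ {n-2}(1)$ up to Monte Carlo error.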

References

[1] A.N. Kolmogorov, "Unbiased estimates" Izv. Akad. Nauk SSSR Ser. Mat. , 14 : 4 (1950) pp. 303–326 (In Russian)
[2] C.R. Rao, "Linear statistical inference and its applications" , Wiley (1965)
[3] B.L. van der Waerden, "Mathematische Statistik" , Springer (1957)
[4] D. Blackwell, "Conditional expectation and unbiased sequential estimation" Ann. Math. Stat. , 18 (1947) pp. 105–110

Comments

In the Western literature this theorem is mostly referred to as the Rao–Blackwell theorem.

How to Cite This Entry:
Rao-Blackwell-Kolmogorov theorem. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rao-Blackwell-Kolmogorov_theorem&oldid=15384
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.