
Difference between revisions of "Rao-Blackwell-Kolmogorov theorem"

From Encyclopedia of Mathematics
 
{{TEX|done}}
 
 
 
A proposition from the theory of statistical estimation on which a method for the improvement of unbiased statistical estimators is based.
 
  
Let $X$ be a random variable with values in a sample space $(\mathfrak X, \mathcal B, \mathsf P_\theta)$, $\theta \in \Theta$, such that the family of probability distributions $\{ \mathsf P_\theta : \theta \in \Theta \}$ has a [[Sufficient statistic|sufficient statistic]] $T = T(X)$, and let $\phi = \phi(X)$ be a vector statistic with finite matrix of second moments. Then the mean $\mathsf E_\theta \{ \phi \}$ of $\phi$ exists and, moreover, the conditional mean $\phi^* = \mathsf E_\theta \{ \phi \mid T \}$ is an [[Unbiased estimator|unbiased estimator]] for $\mathsf E_\theta \{ \phi \}$, that is,
 
  
$$
\mathsf E_\theta \{ \phi^* \} = \mathsf E_\theta \{ \mathsf E_\theta \{ \phi \mid T \} \} = \mathsf E_\theta \{ \phi \} .
$$
 
  
The Rao–Blackwell–Kolmogorov theorem states that under these conditions the quadratic risk of $\phi^*$ does not exceed the quadratic risk of $\phi$, uniformly in $\theta \in \Theta$, i.e. for any vector $z$ of the same dimension as $\phi$, the inequality
 
  
$$
z \, \mathsf E_\theta \{ ( \phi - \mathsf E_\theta \{ \phi \} )^T ( \phi - \mathsf E_\theta \{ \phi \} ) \} \, z^T \geq z \, \mathsf E_\theta \{ ( \phi^* - \mathsf E_\theta \{ \phi^* \} )^T ( \phi^* - \mathsf E_\theta \{ \phi^* \} ) \} \, z^T
$$
 
  
holds for any $\theta \in \Theta$. In particular, if $\phi$ is a one-dimensional statistic, then for any $\theta \in \Theta$ the variance $\mathsf D_\theta \phi^*$ of $\phi^*$ does not exceed the variance $\mathsf D_\theta \phi$ of $\phi$.
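
As a small illustration of the one-dimensional statement, the following sketch checks it exactly in a Bernoulli model. The model and the names `exact_moments`, `phi` and `phi_star` are assumptions made for this illustration only and are not part of the article. For $n$ independent 0/1 observations with success probability $p$, the sum $T = X_1 + \dots + X_n$ is a sufficient statistic, $\phi = X_1$ is an unbiased estimator for $p$, and by symmetry $\mathsf E \{ X_1 \mid T \} = T / n$.

```python
from fractions import Fraction
from itertools import product


def exact_moments(estimator, n, p):
    """Exact mean and variance of estimator(x) over all 0/1 samples x of length n."""
    mean = Fraction(0)
    second = Fraction(0)
    for x in product((0, 1), repeat=n):
        k = sum(x)
        prob = p**k * (1 - p) ** (n - k)  # probability of this particular sample
        value = Fraction(estimator(x))
        mean += prob * value
        second += prob * value * value
    return mean, second - mean * mean


n, p = 5, Fraction(3, 10)  # illustrative values, chosen arbitrarily


def phi(x):
    """Crude unbiased estimator of p: the first observation."""
    return x[0]


def phi_star(x):
    """Conditional mean E{phi | T} with T = sum(x), which equals T / n."""
    return Fraction(sum(x), len(x))


mean_phi, var_phi = exact_moments(phi, n, p)
mean_star, var_star = exact_moments(phi_star, n, p)
print(mean_phi, var_phi)
print(mean_star, var_star)
```

Both estimators have mean $p$, while the variance drops from $p(1-p)$ to $p(1-p)/n$; exact rational arithmetic is used so that the comparison is not blurred by rounding.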
 
  
 
In the most general situation the Rao–Blackwell–Kolmogorov theorem states that averaging over a sufficient statistic does not lead to an increase of the risk with respect to any convex loss function. This implies that good statistical estimators should be looked for only in terms of sufficient statistics, that is, in the class of functions of sufficient statistics.
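
The step behind this statement is the conditional form of Jensen's inequality. As a sketch, write $L(\theta, \cdot)$ for a loss function that is convex in its second argument (the symbol $L$ is introduced here only for illustration) and assume that the expectations involved exist. Then, for every $\theta \in \Theta$,

$$
\mathsf E_\theta \{ L ( \theta , \phi^* ) \} = \mathsf E_\theta \{ L ( \theta , \mathsf E_\theta \{ \phi \mid T \} ) \} \leq \mathsf E_\theta \{ \mathsf E_\theta \{ L ( \theta , \phi ) \mid T \} \} = \mathsf E_\theta \{ L ( \theta , \phi ) \} .
$$

Sufficiency of $T$ enters through the fact that the conditional mean $\mathsf E_\theta \{ \phi \mid T \}$ does not depend on $\theta$, so that $\phi^*$ is itself a statistic.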
 
  
In case the family $\{ \mathsf P_\theta T^{-1} \}$ is complete, that is, when the function of $T$ that is almost-everywhere equal to zero is the only unbiased estimator based on $T$ for zero, the unbiased estimator with uniformly minimal risk provided by the Rao–Blackwell–Kolmogorov theorem is unique. Thus, the Rao–Blackwell–Kolmogorov theorem gives a recipe for constructing best unbiased estimators: one has to take some unbiased estimator and then average it over a sufficient statistic. That is how the best unbiased estimator for the distribution function of the normal law is constructed in the following example, which is due to A.N. Kolmogorov.
 
 
 
Example. Given a realization of a random vector $X = ( X_1, \dots, X_n )$ whose components $X_i$, $i = 1, \dots, n$, $n \geq 3$, are independent random variables subject to the same normal law $N_1 ( \xi , \sigma^2 )$, it is required to estimate the distribution function

$$
\Phi \left( \frac{x - \xi}{\sigma} \right) = \frac{1}{\sqrt{2 \pi} \, \sigma} \int\limits_{-\infty}^{x} e^{- ( u - \xi )^2 / 2 \sigma^2 } \, du , \quad | \xi | < \infty , \quad \sigma > 0 .
$$
 
  
The parameters $\xi$ and $\sigma^2$ are supposed to be unknown. Since the family
 
  
$$
\left\{ \Phi \left( \frac{x - \xi}{\sigma} \right) : | \xi | < \infty , \ \sigma > 0 \right\}
$$
 
  
of normal laws has a complete sufficient statistic $T = ( \overline{X} , S^2 )$, where
 
  
$$
\overline{X} = \frac{X_1 + \dots + X_n}{n}
$$

and

$$
S^2 = \frac{1}{n} \sum_{i=1}^{n} ( X_i - \overline{X} )^2 ,
$$
 
  
the Rao–Blackwell–Kolmogorov theorem can be used for the construction of the best unbiased estimator for the distribution function $\Phi ( ( x - \xi ) / \sigma )$. As an initial statistic $\phi$ one may use, e.g., the empirical distribution function constructed from an arbitrary component $X_1$ of $X$:
 
  
$$
\phi = \left\{ \begin{array}{ll} 1 & \textrm{if } X_1 \leq x , \\ 0 & \textrm{if } X_1 > x . \end{array} \right.
$$
 
  
This is a trivial unbiased estimator for $\Phi ( ( x - \xi ) / \sigma )$, since
 
  
$$
\mathsf E \{ \phi \} = \mathsf P \{ X_1 \leq x \} = \Phi \left( \frac{x - \xi}{\sigma} \right) .
$$
 
  
Averaging of $\phi$ over the sufficient statistic $T$ gives the estimator
 
  
$$ \tag{1}
\phi^* = \mathsf E \{ \phi \mid T \} = \mathsf P \{ X_1 \leq x \mid \overline{X} , S^2 \} = \mathsf P \left\{ \frac{X_1 - \overline{X}}{S} \leq \frac{x - \overline{X}}{S} \mid \overline{X} , S^2 \right\} .
$$
 
  
 
Since the statistic
 
  
$$
\left( \frac{X_1 - \overline{X}}{S} , \dots , \frac{X_n - \overline{X}}{S} \right) ,
$$
 
 
 
which is complementary to $T$, has a uniform distribution on the $( n - 2 )$-dimensional sphere of radius $\sqrt{n}$ and, therefore, depends neither on the unknown parameters $\xi$ and $\sigma^2$ nor on $T$, the same is true for $( X_1 - \overline{X} ) / S$ and
 
  
$$ \tag{2}
\mathsf P \left\{ \frac{X_1 - \overline{X}}{S} \leq u \right\} = T_{n-2} ( u ) , \quad | u | < \sqrt{n - 1} ,
$$
 
  
 
where
 
  
$$ \tag{3}
T_f ( u ) = \frac{1}{\sqrt{\pi ( f + 1 )}} \, \frac{\Gamma ( ( f + 1 ) / 2 )}{\Gamma ( f / 2 )} \int\limits_{- \sqrt{f + 1}}^{u} \left( 1 - \frac{t^2}{f + 1} \right)^{( f - 2 ) / 2} dt
$$
 
  
is the Thompson distribution with $f$ degrees of freedom. Thus, (1)–(3) imply that the best unbiased estimator for $\Phi ( ( x - \xi ) / \sigma )$ obtained from $n$ independent observations $X_1, \dots, X_n$ is
 
  
$$
\phi^* = T_{n-2} \left( \frac{x - \overline{X}}{S} \right) = S_{n-2} \left( \frac{x - \overline{X}}{S} \sqrt{ \frac{n - 2}{n - 1 - ( ( x - \overline{X} ) / S )^2 } } \right) ,
$$
 
  
where $S_f ( \cdot )$ is the Student distribution with $f$ degrees of freedom.
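
The estimator above can be evaluated numerically. The following is a minimal sketch, not a reference implementation: the function names `thompson_cdf` and `kolmogorov_estimate`, the parameter values and the Monte Carlo settings are all assumptions made for illustration. The integral in (3) is computed with Simpson's rule after the substitution $t = \sqrt{f + 1} \sin \vartheta$, which turns the integrand into $\cos^{f-1} \vartheta$ and avoids the endpoint singularity that occurs for $f = 1$. Unbiasedness is then checked approximately by averaging the estimator over simulated normal samples.

```python
import math
import random
from statistics import NormalDist


def thompson_cdf(u, f, steps=200):
    """Thompson distribution function T_f(u) for integer f >= 1 (Simpson's rule)."""
    a = math.sqrt(f + 1)
    if u <= -a:
        return 0.0
    if u >= a:
        return 1.0
    # Normalising constant after the substitution t = a * sin(theta).
    const = math.exp(math.lgamma((f + 1) / 2) - math.lgamma(f / 2)) / math.sqrt(math.pi)
    lo, hi = -math.pi / 2, math.asin(u / a)
    h = (hi - lo) / steps  # steps must be even for Simpson's rule

    def g(theta):
        return math.cos(theta) ** (f - 1)

    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return const * total * h / 3


def kolmogorov_estimate(sample, x):
    """Estimate of Phi((x - xi) / sigma) from the sample: T_{n-2}((x - mean) / S)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in sample) / n)  # S uses the divisor n
    return thompson_cdf((x - mean) / s, n - 2)


# Monte Carlo check of unbiasedness with illustrative parameter values.
rng = random.Random(0)
xi, sigma, n, x = 1.0, 2.0, 5, 0.5
reps = 4000
average = sum(
    kolmogorov_estimate([rng.gauss(xi, sigma) for _ in range(n)], x)
    for _ in range(reps)
) / reps
target = NormalDist(xi, sigma).cdf(x)
print(average, target)
```

The simulated average should agree with the target value only up to Monte Carlo error, so any comparison between the two has to allow a tolerance of a few standard errors.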
 
  
====References====
 
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  A.N. Kolmogorov,  "Unbiased estimates"  ''Izv. Akad. Nauk SSSR Ser. Mat.'' , '''14''' :  4  (1950)  pp. 303–326  (In Russian)</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  C.R. Rao,  "Linear statistical inference and its applications" , Wiley  (1965)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  B.L. van der Waerden,  "Mathematische Statistik" , Springer  (1957)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  D. Blackwell,  "Conditional expectation and unbiased sequential estimation"  ''Ann. Math. Stat.'' , '''18'''  (1947)  pp. 105–110</TD></TR></table>
 
  
 
====Comments====
 
 
In the Western literature this theorem is mostly referred to as the Rao–Blackwell theorem.
 

Revision as of 14:53, 7 June 2020


How to Cite This Entry:
Rao-Blackwell-Kolmogorov theorem. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rao-Blackwell-Kolmogorov_theorem&oldid=49389
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.