Bernstein-von Mises theorem

Let $ \{ {X _ {j} } : {j \geq 1 } \} $ be independent identically distributed random variables with a probability density depending on a parameter $ \theta $ (cf. Random variable; Probability distribution). Suppose that an a priori distribution for $ \theta $ is chosen. One of the fundamental theorems in the asymptotic theory of Bayesian inference (cf. Bayesian approach) is concerned with the convergence of the a posteriori density of $ \theta $, given $ X _ {1} \dots X _ {n} $, to the normal density. In other words, the a posteriori distribution tends to look like a normal distribution asymptotically. This phenomenon was first noted in the case of independent and identically distributed observations by P.S. Laplace. A related, but different, result was proved by S.N. Bernstein [a2], who considered the a posteriori distribution of $ \theta $ given the average $ n ^ {-1} ( X _ {1} + \dots + X _ {n} ) $. R. von Mises [a12] extended the result to a posteriori distributions conditioned by a finite number of differentiable functionals of the empirical distribution function. L. Le Cam [a5] studied the problem in his work on asymptotic properties of maximum likelihood and related Bayesian estimates. The Bernstein–von Mises theorem about convergence in the $ L _ {1} $-mean for the case of independent and identically distributed random variables reads as follows; see [a3].
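The phenomenon can be seen directly in the simplest conjugate example (given here purely as an illustration; it is not part of the theorem below). If $ X _ {1} \dots X _ {n} $ are independent $ N ( \theta , 1 ) $ variables and the a priori distribution of $ \theta $ is $ N ( 0, \tau ^ {2} ) $, then the a posteriori distribution is exactly normal,

$$ \theta \mid X _ {1} \dots X _ {n} \sim N \left ( \frac{n {\bar X } _ {n} }{n + \tau ^ {-2} } , \frac{1}{n + \tau ^ {-2} } \right ) , $$

so the a posteriori distribution of $ n ^ {1/2} ( \theta - {\bar X } _ {n} ) $ tends to $ N ( 0, 1 ) $ for every $ \tau > 0 $: the influence of the a priori distribution vanishes in the limit, in agreement with the theorem below, since here $ i ( \theta ) = 1 $ and $ {\widehat \theta } _ {n} = {\bar X } _ {n} $.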

Let $ X _ {i} $, $ 1 \leq i \leq n $, be independent identically distributed random variables with probability density $ f ( x, \theta ) $, $ \theta \in \Theta \subset \mathbf R $. Suppose $ \Theta $ is open and $ \lambda $ is an a priori probability density on $ \Theta $ which is continuous and positive in an open neighbourhood of the true parameter $ \theta _ {0} $. Let $ h ( x, \theta ) = { \mathop{\rm log} } f ( x, \theta ) $. Suppose that $ { {\partial h } / {\partial \theta } } $ and $ { {\partial ^ {2} h } / {\partial \theta ^ {2} } } $ exist and are continuous in $ \theta $. Further, suppose that $ i ( \theta ) = - {\mathsf E} _ \theta [ { {\partial ^ {2} h } / {\partial \theta ^ {2} } } ] $ is continuous, with $ 0 < i ( \theta ) < \infty $. Let $ K ( \cdot ) $ be a non-negative function satisfying

$$ \int\limits _ {- \infty } ^ \infty {K ( t ) { \mathop{\rm exp} } \left [ - { \frac{( i ( \theta _ {0} ) - \epsilon ) t ^ {2} }{2} } \right ] } {d t } < \infty $$

for some $ 0 < \epsilon < i ( \theta _ {0} ) $. Let $ {\widehat \theta } _ {n} $ be a maximum-likelihood estimator of $ \theta $ based on $ X _ {1} \dots X _ {n} $ (cf. Maximum-likelihood method) and let $ L _ {n} ( \theta ) $ be the corresponding likelihood function. It is known that under certain regularity conditions there exists a compact neighbourhood $ U _ {\theta _ {0} } $ of $ \theta _ {0} $ such that:

$ {\widehat \theta } _ {n} \rightarrow \theta _ {0} $ almost surely;

$ ( { {\partial { \mathop{\rm log} } L _ {n} ( \theta ) } / {\partial \theta } } ) \mid _ {\theta = {\widehat \theta } _ {n} } = 0 $ for large $ n $;

$ n ^ {1/2 } ( {\widehat \theta } _ {n} - \theta _ {0} ) $ converges in distribution (cf. Convergence in distribution) to the normal distribution with mean $ 0 $ and variance $ {1 / {i ( \theta _ {0} ) } } $ as $ n \rightarrow \infty $.
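As an illustrative check of these conditions (not part of the statement), consider the normal location model $ f ( x, \theta ) = ( 2 \pi ) ^ {-1/2} e ^ {- ( x - \theta ) ^ {2} /2 } $, $ \theta \in \mathbf R $. Then $ h ( x, \theta ) = - ( x - \theta ) ^ {2} /2 - ( 1/2 ) { \mathop{\rm log} } ( 2 \pi ) $ and

$$ \frac{\partial ^ {2} h }{\partial \theta ^ {2} } = - 1, \qquad i ( \theta ) = 1, \qquad {\widehat \theta } _ {n} = {\bar X } _ {n} . $$

Indeed, $ {\bar X } _ {n} \rightarrow \theta _ {0} $ almost surely by the strong law of large numbers, the likelihood equation holds exactly for every $ n $, and $ n ^ {1/2} ( {\bar X } _ {n} - \theta _ {0} ) $ has the $ N ( 0, 1 ) = N ( 0, {1 / {i ( \theta _ {0} ) } } ) $ distribution for every $ n $, hence also in the limit.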

Let $ f _ {n} ( \theta \mid x _ {1} \dots x _ {n} ) $ denote the a posteriori density of $ \theta $ given the observation $ ( x _ {1} \dots x _ {n} ) $ and the a priori probability density $ \lambda ( \theta ) $, that is,

$$ f _ {n} ( \theta \mid x _ {1} \dots x _ {n} ) = { \frac{\prod _ {i = 1 } ^ { n } f ( x _ {i} , \theta ) \lambda ( \theta ) }{\int\limits _ \Theta {\prod _ {i = 1 } ^ { n } f ( x _ {i} , \phi ) \lambda ( \phi ) } {d \phi } } } . $$

Let $ f _ {n} ^ {*} ( t \mid x _ {1} \dots x _ {n} ) = n ^ {- 1/2 } f _ {n} ( {\widehat \theta } _ {n} + tn ^ {- 1/2 } \mid x _ {1} \dots x _ {n} ) $. Then $ f _ {n} ^ {*} ( t \mid x _ {1} \dots x _ {n} ) $ is the a posteriori density of $ t = n ^ {1/2 } ( \theta - {\widehat \theta } _ {n} ) $.
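The factor $ n ^ {-1/2} $ is simply the Jacobian of the change of variables $ \theta = {\widehat \theta } _ {n} + t n ^ {-1/2} $ (a routine verification, recorded here for convenience): since $ d \theta = n ^ {-1/2} \, d t $,

$$ \int\limits _ {- \infty } ^ \infty f _ {n} ^ {*} ( t \mid x _ {1} \dots x _ {n} ) \, d t = \int\limits _ \Theta f _ {n} ( \theta \mid x _ {1} \dots x _ {n} ) \, d \theta = 1 $$

(with $ f _ {n} $ taken to be zero outside $ \Theta $), so $ f _ {n} ^ {*} $ is again a probability density in $ t $.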

A generalized version of the Bernstein–von Mises theorem, under the assumptions stated above and some additional technical conditions, is as follows.

If, for every $ h > 0 $ and $ \delta > 0 $,

$$ e ^ {- n \delta } \int\limits _ {\left | t \right | > h } {K ( n ^ {1/2 } t ) \lambda ( {\widehat \theta } _ {n} + t ) } {d t } \rightarrow 0 \textrm{ a.s. } [ {\mathsf P} _ {\theta _ {0} } ] , $$

then

$$ {\lim\limits } _ {n \rightarrow \infty } \int\limits _ {- \infty } ^ \infty K ( t ) \left | f _ {n} ^ {*} ( t \mid X _ {1} \dots X _ {n} ) - \left ( \frac{i ( \theta _ {0} ) }{2 \pi } \right ) ^ {1/2} e ^ {- \frac{1}{2} i ( \theta _ {0} ) t ^ {2} } \right | \, d t = 0 \textrm{ a.s. } [ {\mathsf P} _ {\theta _ {0} } ] . $$
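The hypothesis is easily verified in the basic case (a simple check, not part of the original formulation): for $ K ( t ) \equiv 1 $ one has $ \int _ {\left | t \right | > h } \lambda ( {\widehat \theta } _ {n} + t ) \, d t \leq 1 $, because $ \lambda $ is a probability density, and therefore

$$ e ^ {- n \delta } \int\limits _ {\left | t \right | > h } K ( n ^ {1/2} t ) \lambda ( {\widehat \theta } _ {n} + t ) \, d t \leq e ^ {- n \delta } \rightarrow 0 $$

for every $ h > 0 $ and $ \delta > 0 $.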

For $ K ( t ) \equiv 1 $ one finds that the a posteriori density converges to the normal density in $ L _ {1} $-mean (made explicit below). The result can be extended to a multi-dimensional parameter. As an application of the above theorem, it can be shown that the Bayesian estimator is strongly consistent and asymptotically efficient for a suitable class of loss functions (cf. [a11]). For rates of convergence see [a4], [a7], [a8].
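Explicitly, for $ K ( t ) \equiv 1 $ the conclusion reads

$$ \int\limits _ {- \infty } ^ \infty \left | f _ {n} ^ {*} ( t \mid X _ {1} \dots X _ {n} ) - \left ( \frac{i ( \theta _ {0} ) }{2 \pi } \right ) ^ {1/2} e ^ {- \frac{1}{2} i ( \theta _ {0} ) t ^ {2} } \right | \, d t \rightarrow 0 \textrm{ a.s. } [ {\mathsf P} _ {\theta _ {0} } ] , $$

which, by the standard identity relating the $ L _ {1} $-distance between densities to the total variation distance (a remark added here for orientation), means that the a posteriori distribution of $ t = n ^ {1/2} ( \theta - {\widehat \theta } _ {n} ) $ converges in total variation to the $ N ( 0, {1 / {i ( \theta _ {0} ) } } ) $ distribution, almost surely under $ {\mathsf P} _ {\theta _ {0} } $.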

B.L.S. Prakasa Rao [a6] has generalized the result to arbitrary discrete-time stochastic processes (cf. [a1]); for extensions to diffusion processes and diffusion fields, see [a9], [a10].

References

[a1] I.V. Basawa, B.L.S. Prakasa Rao, "Statistical inference for stochastic processes" , Acad. Press (1980)
[a2] S.N. Bernstein, "Theory of probability" (1917) (In Russian)
[a3] J.D. Borwanker, G. Kallianpur, B.L.S. Prakasa Rao, "The Bernstein–von Mises theorem for Markov processes" Ann. Math. Stat. , 43 (1971) pp. 1241–1253
[a4] C. Hipp, R. Michael, "On the Bernstein–von Mises approximation of posterior distribution" Ann. Stat. , 4 (1976) pp. 972–980
[a5] L. Le Cam, "On some asymptotic properties of maximum likelihood estimates and related Bayes estimates" Univ. California Publ. Stat. , 1 (1953) pp. 277–330
[a6] B.L.S. Prakasa Rao, "Statistical inference for stochastic processes" G. Sankaranarayanan (ed.) , Proc. Advanced Symp. on Probability and its Applications , Annamalai Univ. (1976) pp. 43–150
[a7] B.L.S. Prakasa Rao, "Rate of convergence of Bernstein–von Mises approximation for Markov processes" Serdica , 4 (1978) pp. 36–42
[a8] B.L.S. Prakasa Rao, "The equivalence between (modified) Bayes estimator and maximum likelihood estimator for Markov processes" Ann. Inst. Statist. Math. , 31 (1979) pp. 499–513
[a9] B.L.S. Prakasa Rao, "The Bernstein–von Mises theorem for a class of diffusion processes" Teor. Sluch. Prots. , 9 (1981) pp. 95–104 (In Russian)
[a10] B.L.S. Prakasa Rao, "On Bayes estimation for diffusion fields" J.K. Ghosh (ed.) J. Roy (ed.) , Statistics: Applications and New Directions , Statistical Publishing Soc. (1984) pp. 504–511
[a11] B.L.S. Prakasa Rao, "Asymptotic theory of statistical inference" , Wiley (1987)
[a12] R. von Mises, "Wahrscheinlichkeitsrechnung" , Springer (1931)
How to Cite This Entry:
Bernstein-von Mises theorem. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bernstein-von_Mises_theorem&oldid=16482
This article was adapted from an original article by B.L.S. Prakasa-Rao (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.