
Exponential family of probability distributions

From Encyclopedia of Mathematics
 
A certain model (i.e., a set of probability distributions on the same measurable space) in statistics which is widely used and studied for two reasons:
 
 
The definitions found in the literature can be rather inelegant or lacking rigour. A mathematically satisfactory definition is obtained by first defining a significant particular case, namely the natural exponential family, and then using it to define general exponential families.
 
Given a finite-dimensional real linear space $E$, denote by $E ^ { * }$ the space of linear forms $\theta$ from $E$ to $\mathbf{R}$. One writes $\langle  \theta , x \rangle $ instead of $\theta ( x )$. Let $\mu$ be a positive [[Measure|measure]] on $E$ (equipped with Borel sets), and assume that $\mu$ is not concentrated on an affine hyperplane of $E$. Denote by
  
\begin{equation*} L _ { \mu } ( \theta ) = \int _ { E } \operatorname { exp } \langle \theta , x \rangle \mu ( d x ) \end{equation*}
  
its [[Laplace transform|Laplace transform]] and by $D ( \mu )$ the subset of $E ^ { * }$ on which $L _ { \mu } ( \theta )$ is finite. It is easily seen that $D ( \mu )$ is convex. Assume that the interior $\Theta ( \mu )$ of $D ( \mu )$ is not empty. The set of probability measures (cf. also [[Probability measure|Probability measure]]) on $E$:
  
\begin{equation*} F = F ( \mu ) = \{ \mathsf{P} ( \theta , \mu ) : \theta \in \Theta ( \mu ) \}, \end{equation*}
  
 
where
  
\begin{equation*} \mathsf{P} ( \theta , \mu ) ( d x ) = \frac { 1 } { L _ { \mu } ( \theta ) } \operatorname { exp } \langle \theta , x \rangle \mu ( d x ), \end{equation*}
  
is called the natural exponential family (abbreviated NEF) generated by $\mu$. The mapping
  
\begin{equation*} \Theta ( \mu ) \rightarrow F ( \mu ), \end{equation*}
  
\begin{equation*} \theta \mapsto \mathsf{P} ( \theta , \mu ), \end{equation*}
  
is called the canonical parametrization of $F ( \mu )$. A simple example of a natural exponential family is given by the family of binomial distributions $B ( n , p )$, $0 < p < 1$, with fixed parameter $n$, generated by the measure
  
\begin{equation*} \mu ( d x ) = \sum _ { k = 0 } ^ { n } \left( \begin{array} { l } { n } \\ { k } \end{array} \right) \delta _ { k } ( d x ), \end{equation*}
  
where $\delta _ { k }$ is the Dirac measure (cf. [[Measure|Measure]]) on $k$ (cf. also [[Binomial distribution|Binomial distribution]]). Here, with $p = e ^ { \theta } / ( 1 + e ^ { \theta } )$ and $q = 1 - p$ one has
  
\begin{equation*} \mathsf{P} ( \theta , \mu ) ( d x ) = \sum _ { k = 0 } ^ { n } \left( \begin{array} { l } { n } \\ { k } \end{array} \right) p ^ { k } q ^ { n - k } \delta _ { k } ( d x ). \end{equation*}
  
Note that the canonical parametrization by $\theta$ generally differs from a more familiar parametrization if the natural exponential family is a classical family. This is illustrated by the above example, where the parametrization by $p$ is traditional.
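The relation between the two parametrizations is easy to verify numerically. The following sketch (Python, standard library only; the function names are illustrative and not part of the article) computes $L _ { \mu } ( \theta ) = ( 1 + e ^ { \theta } ) ^ { n }$ and checks that the tilted measure $\mathsf{P} ( \theta , \mu )$ coincides with $B ( n , p )$ under $p = e ^ { \theta } / ( 1 + e ^ { \theta } )$.

```python
import math

def binomial_nef_pmf(theta, n):
    """P(theta, mu)({k}) for mu = sum_k C(n,k) delta_k: exponential tilting
    of mu, normalized by the Laplace transform L_mu(theta) = (1+e^theta)^n."""
    L = (1.0 + math.exp(theta)) ** n
    return [math.comb(n, k) * math.exp(theta * k) / L for k in range(n + 1)]

def binomial_pmf(p, n):
    """Classical parametrization B(n, p)."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

theta, n = 0.7, 5
p = math.exp(theta) / (1.0 + math.exp(theta))   # canonical -> classical parameter
nef, cls = binomial_nef_pmf(theta, n), binomial_pmf(p, n)
assert all(abs(a - b) < 1e-12 for a, b in zip(nef, cls))
assert abs(sum(nef) - 1.0) < 1e-12              # P(theta, mu) is a probability
```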
  
A general exponential family (abbreviated GEF) is defined on an abstract [[Measure space|measure space]] $( \Omega , \mathcal{A} , \nu )$ (the measure $\nu$ is not necessarily bounded) by a measurable mapping $t$ from $\Omega$ to a finite-dimensional real linear space $E$. This mapping $t$ must have the following property: the image $\mu$ of $\nu$ by $t$ must be such that $\mu$ is not concentrated on an affine hyperplane of $E$, and such that $\Theta ( \mu )$ is not empty. Under these circumstances, the general exponential family on $\Omega$ generated by $( t , \nu )$ is:
  
\begin{equation*} F ( t , \nu ) = \{ \mathsf{P} ( \theta , t , \nu ) : \theta \in \Theta ( \mu ) \}, \end{equation*}
  
 
where
  
\begin{equation*} \mathsf{P} ( \theta , t , \nu ) ( d \omega ) = \frac { 1 } { L _ { \mu } ( \theta ) } \operatorname { exp } \langle \theta , t ( \omega ) \rangle \nu ( d \omega ). \end{equation*}
  
In this case, the NEF $F ( \mu )$ on $E$ is said to be associated to the GEF $F ( t , \nu )$. In a sense, all results about GEFs are actually results about their associated NEF. The dimension of $E$ is called the order of the general exponential family.
  
The most celebrated example of a general exponential family is the family of the normal distributions $N ( m , \sigma ^ { 2 } )$ on $\Omega = \mathbf{R}$, where the mean $m$ and the variance $\sigma ^ { 2 }$ are both unknown parameters (cf. also [[Normal distribution|Normal distribution]]). Here, $\nu ( d \omega ) = d x / \sqrt { 2 \pi }$, the space $E$ is $\mathbf{R} ^ { 2 }$ and $t ( \omega )$ is $( \omega , \omega ^ { 2 } / 2 )$. Here, again, the canonical parametrization is not the classical one but is related to it by $\theta _ { 1 } = m / \sigma ^ { 2 }$ and $\theta _ { 2 } = - 1 / \sigma ^ { 2 }$. The associated NEF is concentrated on a parabola in $\mathbf{R} ^ { 2 }$.
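For $\theta _ { 2 } < 0$, a Gaussian integral gives the closed form $L _ { \mu } ( \theta ) = e ^ { - \theta _ { 1 } ^ { 2 } / ( 2 \theta _ { 2 } ) } / \sqrt { - \theta _ { 2 } }$ (this formula is derived here, not stated in the article). The sketch below (illustrative Python) uses it to check that the canonical parametrization recovers the $N ( m , \sigma ^ { 2 } )$ density.

```python
import math

def gef_normal_pdf(omega, theta1, theta2):
    """Density of P(theta, t, nu) for t(w) = (w, w^2/2), nu(dw) = dw/sqrt(2 pi).
    Uses L_mu(theta) = exp(-theta1^2/(2 theta2)) / sqrt(-theta2), theta2 < 0."""
    L = math.exp(-theta1**2 / (2.0 * theta2)) / math.sqrt(-theta2)
    return math.exp(theta1 * omega + theta2 * omega**2 / 2.0) / (L * math.sqrt(2.0 * math.pi))

def normal_pdf(omega, m, sigma2):
    """Classical N(m, sigma^2) density."""
    return math.exp(-(omega - m)**2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

m, sigma2 = 1.3, 2.0
theta1, theta2 = m / sigma2, -1.0 / sigma2      # canonical parameters
for w in (-2.0, 0.0, 0.5, 3.1):
    assert abs(gef_normal_pdf(w, theta1, theta2) - normal_pdf(w, m, sigma2)) < 1e-12
```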
  
A common incorrect statement about such a model says that it belongs to  "the"  exponential family. Such a statement is induced by a confusion between a definite probability distribution and a family of them. When a NEF is concentrated on the set of non-negative integers, its elements are sometimes called  "power series"  distributions, since the Laplace transform is more conveniently written $L _ { \mu } ( \theta ) = f ( e ^ { \theta } )$, where $f$ is analytic around $0$. The same confusion arises here.
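As an illustration of the power-series point of view (the Poisson case is not treated above and is used here purely as an example): for $\mu ( \{ k \} ) = 1 / k !$ one has $L _ { \mu } ( \theta ) = e ^ { e ^ { \theta } } = f ( e ^ { \theta } )$ with $f ( z ) = e ^ { z }$, and the NEF is the Poisson family with mean $\lambda = e ^ { \theta }$. A quick numerical check:

```python
import math

def poisson_nef_pmf(theta, kmax):
    """NEF generated by mu({k}) = 1/k!: here L_mu(theta) = f(e^theta)
    with f(z) = e^z, so the normalizing constant is exp(e^theta)."""
    lam = math.exp(theta)                       # power-series variable z = e^theta
    L = math.exp(lam)                           # f(z) = e^z
    return [math.exp(theta * k) / (math.factorial(k) * L) for k in range(kmax + 1)]

theta = 0.4
lam = math.exp(theta)
pmf = poisson_nef_pmf(theta, 30)
# The tilted pmf coincides with Poisson(lambda) for lambda = e^theta:
for k in range(10):
    assert abs(pmf[k] - lam**k * math.exp(-lam) / math.factorial(k)) < 1e-12
assert abs(sum(pmf) - 1.0) < 1e-9
```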
  
There are several variations of the above definition of a GEF: mostly, the parameter $\theta$ is taken to belong to $D ( \mu )$ and not only to $\Theta ( \mu )$, thus obtaining what one may call a full-NEF. A full-GEF is similarly obtained. However, many results are no longer true for such an extension: for instance, this is the case for the NEF on $\mathbf{R}$ generated by a positive stable distribution $\mu$ with parameter $1/2$: this NEF is a family of inverse Gaussian distributions, with exponential moments, while $\mu$ has no expectation and belongs to the full-NEF. A more genuine extension gives curved exponential families (abbreviated CEF). In this case, the set of parameters is restricted to a non-affine subset of $\Theta ( \mu )$, generally a [[Manifold|manifold]]. However, this extension is in a sense too general, since most models in statistics can be regarded as a CEF. The reason is the following: starting from a statistical model of the form $F = \{ f d \nu : f \in S \}$, where $S$ is a subset of $L ^ { 1 } ( \nu )$, $F$ is a CEF if and only if the linear subspace of the space $L ^ { 0 } ( \nu )$ generated by the set $\{ \operatorname { log } f : f \in S \}$ is finite-dimensional. This is also why exponential families constructed on infinite-dimensional spaces are uninteresting (at least without further structure). For these CEFs, there are no really general results available concerning the application of the [[Maximum-likelihood method|maximum-likelihood method]]. General references are [[#References|[a2]]] and [[#References|[a5]]].
  
The exponential dispersion model (abbreviated EDP) is a concept related to natural exponential families as follows: starting from the NEF $F ( \mu )$ on $E$, the Jorgensen set $\Lambda ( \mu )$ is the set of positive $p$ such that there exists a positive measure $\mu _ { p }$ on $E$ whose Laplace transform is $( L _ { \mu } ) ^ { p }$ (see [[#References|[a4]]]). Trivially, it contains all positive integers. The model
  
\begin{equation*} \{ \mathsf{P} ( \theta , \mu _ { p } ) : \theta \in \Theta ( \mu ) , p \in \Lambda ( \mu ) \} \end{equation*}
  
is the exponential dispersion model generated by $\mu$. It has the following striking property: Let $\theta$ be fixed in $\Theta ( \mu )$, let $p _ { 1 } , \dots , p _ { n }$ be in $\Lambda ( \mu )$ and let $X _ { 1 } , \ldots , X _ { n }$ be independent random variables with respective distributions $\mathsf{P} ( \theta , \mu _ { p _ { j } } )$, with $j = 1 , \ldots , n$. Then the distribution of $( X _ { 1 } , \ldots , X _ { n } )$ conditioned by $S = X _ { 1 } + \ldots + X _ { n }$ does not depend on $\theta$. The distribution of $S$ is obviously $\mathsf{P} ( \theta , \mu _ { p } )$ with $p = p _ { 1 } + \ldots + p _ { n }$. Furthermore, if the parameters $p _ { 1 } , \dots , p _ { n }$ are known, and if $\theta$ is unknown, then the maximum-likelihood method to estimate $\theta$ from the knowledge of the observations $( X _ { 1 } , \ldots , X _ { n } )$ is the one obtained from the knowledge of $S$. For instance, if the NEF is the Bernoulli family of distributions $q \delta _ { 0 } + p \delta _ { 1 }$ on $0$ and $1$, if $X _ { 1 } , \ldots , X _ { n }$ are independent Bernoulli random variables with the same unknown $p$, then in order to estimate $p$ it is useless to keep track of the individual values of the $X _ { 1 } , \ldots , X _ { n }$. All necessary information about $p$ is contained in $S$, which has a binomial distribution $B ( n , p )$.
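The Bernoulli illustration can be made concrete. The sketch below (illustrative Python) compares the log-likelihood based on the individual observations with the one based on $S$ alone over a grid of values of $p$; the two differ only by the constant $\operatorname { log } \binom { n } { S }$, so both are maximized at the same point $\hat { p } = S / n$.

```python
import math

def loglik_individual(p, xs):
    """Log-likelihood of independent Bernoulli observations x_1, ..., x_n."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

def loglik_sum(p, s, n):
    """Log-likelihood of S = x_1 + ... + x_n alone, with S ~ B(n, p)."""
    return math.log(math.comb(n, s)) + s * math.log(p) + (n - s) * math.log(1 - p)

xs = [1, 0, 1, 1, 0, 1, 0, 1]                   # illustrative data, S = 5, n = 8
s, n = sum(xs), len(xs)
grid = [i / 1000.0 for i in range(1, 1000)]
mle_ind = max(grid, key=lambda p: loglik_individual(p, xs))
mle_sum = max(grid, key=lambda p: loglik_sum(p, s, n))
assert mle_ind == mle_sum                       # S carries all information about p
assert abs(mle_ind - s / n) < 1e-3              # both agree with p-hat = S/n
```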
  
Thus, the problem of estimating the canonical parameter $\theta$, given $n$ independent observations $X _ { 1 } , \ldots , X _ { n }$, for a NEF model is reduced to the problem of estimating with only one observation $S$, whose distribution is in the NEF $F ( \mu _ { n } )$. See [[Natural exponential family of probability distributions|Natural exponential family of probability distributions]] for details about estimation by the maximum-likelihood method. When dealing with a GEF, the problem is reduced to the associated NEF.
  
Bayesian theory (cf. also [[Bayesian approach|Bayesian approach]]) also constitutes a successful domain of application of exponential families. Given a NEF <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260128.png" /> and a positive measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260129.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260130.png" />, consider the set of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260131.png" /> such that
+
Bayesian theory (cf. also [[Bayesian approach|Bayesian approach]]) also constitutes a successful domain of application of exponential families. Given a NEF $F ( \mu )$ and a positive measure $\alpha ( d \theta )$ on $\Theta ( \mu )$, consider the set of $( v , p ) \in E \times \mathbf{R}$ such that
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260132.png" /></td> </tr></table>
+
\begin{equation*} \pi _ { v , p } ( d \theta ) = A ( m , p ) ( L _ { \mu } ( \theta ) ) ^ { - p } \operatorname { exp } \langle \theta , v \rangle \alpha ( d \theta ) \end{equation*}
  
is a probability for some number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260133.png" />, and assume that this set is not empty. This set of a priori distributions on the parameter space is an example of a conjugate family. This means that if the [[Random variable|random variable]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260134.png" /> has distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260135.png" />, then the distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260136.png" /> conditioned by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260137.png" /> (a posteriori distribution) is <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260138.png" /> for some <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260139.png" /> depending on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260140.png" />. See [[#References|[a1]]] for a complete study; however, [[#References|[a3]]] is devoted to the case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e120/e120260/e120260141.png" />, which has special properties and has, for many years, been the only serious study of the subject.
+
is a probability for some number $A ( v , p )$, and assume that this set is not empty. This set of a priori distributions on the parameter space is an example of a conjugate family. This means that if the [[Random variable|random variable]] $( \theta , X )$ has distribution $\pi _ { v , p } ( d \theta ) \mathsf{P} ( \theta , \mu ) ( d x )$, then the distribution of $\theta$ conditioned by $X = x$ (a posteriori distribution) is $\pi _ { v ^ { \prime } , p ^ { \prime } }$ for some $( v ^ { \prime } , p ^ { \prime } )$ depending on $v , p , x$. See [[#References|[a1]]] for a complete study; however, [[#References|[a3]]] is devoted to the case $\alpha ( d \theta ) = d \theta$, which has special properties and has, for many years, been the only serious study of the subject.
  
 
====References====
 
====References====
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  S. Bar-Lev,  P. Enis,  G. Letac,  "Sampling models which admit a given general exponential family as a conjugate family of priors"  ''Ann. Statist.'' , '''22'''  (1994)  pp. 1555–1586</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  O. Barndorff-Nielsen,  "Information and exponential families in statistical theory" , Wiley  (1978)</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  P. Diaconis,  D. Ylvizaker,  "Conjugate priors for exponential families"  ''Ann. Statist.'' , '''7'''  (1979)  pp. 269–281</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  B. Jorgensen,  "Exponential dispersion models"  ''J. R. Statist. Soc. Ser. B'' , '''49'''  (1987)  pp. 127–162</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top">  G. Letac,  "Lectures on natural exponential families and their variance functions" , ''Monogr. Mat.'' , '''50''' , Inst. Mat. Pura Aplic. Rio  (1992)</TD></TR></table>
+
<table><tr><td valign="top">[a1]</td> <td valign="top">  S. Bar-Lev,  P. Enis,  G. Letac,  "Sampling models which admit a given general exponential family as a conjugate family of priors"  ''Ann. Statist.'' , '''22'''  (1994)  pp. 1555–1586</td></tr><tr><td valign="top">[a2]</td> <td valign="top">  O. Barndorff-Nielsen,  "Information and exponential families in statistical theory" , Wiley  (1978)</td></tr><tr><td valign="top">[a3]</td> <td valign="top">  P. Diaconis,  D. Ylvizaker,  "Conjugate priors for exponential families"  ''Ann. Statist.'' , '''7'''  (1979)  pp. 269–281</td></tr><tr><td valign="top">[a4]</td> <td valign="top">  B. Jorgensen,  "Exponential dispersion models"  ''J. R. Statist. Soc. Ser. B'' , '''49'''  (1987)  pp. 127–162</td></tr><tr><td valign="top">[a5]</td> <td valign="top">  G. Letac,  "Lectures on natural exponential families and their variance functions" , ''Monogr. Mat.'' , '''50''' , Inst. Mat. Pura Aplic. Rio  (1992)</td></tr></table>


A certain model (i.e., a set of probability distributions on the same measurable space) in statistics which is widely used and studied for two reasons:

i) many classical models are actually exponential families;

ii) most of the classical methods of estimation of parameters and testing work successfully when the model is an exponential family.

The definitions found in the literature can be rather inelegant or lacking rigour. A mathematically satisfactory definition is obtained by first defining a significant particular case, namely the natural exponential family, and then using it to define general exponential families.

Given a finite-dimensional real linear space $E$, denote by $E ^ { * }$ the space of linear forms $\theta$ from $E$ to $\mathbf{R}$. One writes $\langle \theta , x \rangle $ instead of $\theta ( x )$. Let $\mu$ be a positive measure on $E$ (equipped with Borel sets), and assume that $\mu$ is not concentrated on an affine hyperplane of $E$. Denote by

\begin{equation*} L _ { \mu } ( \theta ) = \int _ { E } \operatorname { exp } \langle \theta , x \rangle \mu ( d x ) \end{equation*}

its Laplace transform and by $D ( \mu )$ the subset of $E ^ { * }$ on which $L _ { \mu } ( \theta )$ is finite. It is easily seen that $D ( \mu )$ is convex. Assume that the interior $\Theta ( \mu )$ of $D ( \mu )$ is not empty. The set of probability measures (cf. also Probability measure) on $E$:

\begin{equation*} F = F ( \mu ) = \{ \mathsf{P} ( \theta , \mu ) : \theta \in \Theta ( \mu ) \}, \end{equation*}

where

\begin{equation*} \mathsf{P} ( \theta , \mu ) ( d x ) = \frac { 1 } { L _ { \mu } ( \theta ) } \operatorname { exp } \langle \theta , x \rangle \mu ( d x ), \end{equation*}

is called the natural exponential family (abbreviated NEF) generated by $\mu$. The mapping

\begin{equation*} \theta \mapsto \mathsf{P} ( \theta , \mu ), \end{equation*}

is called the canonical parametrization of $F ( \mu )$. A simple example of a natural exponential family is given by the family of binomial distributions $B ( n , p )$, $0 < p < 1$, with fixed parameter $n$, generated by the measure

\begin{equation*} \mu ( d x ) = \sum _ { k = 0 } ^ { n } \left( \begin{array} { l } { n } \\ { k } \end{array} \right) \delta _ { k } ( d x ), \end{equation*}

where $\delta _ { k }$ is the Dirac measure (cf. Measure) on $k$ (cf. also Binomial distribution). Here, with $p = e ^ { \theta } / ( 1 + e ^ { \theta } )$ and $q = 1 - p$ one has

\begin{equation*} \mathsf{P} ( \theta , \mu ) ( d x ) = \sum _ { k = 0 } ^ { n } \left( \begin{array} { l } { n } \\ { k } \end{array} \right) p ^ { k } q ^ { n - k } \delta _ { k } ( d x ). \end{equation*}

Note that the canonical parametrization by $\theta$ generally differs from a more familiar parametrization if the natural exponential family is a classical family. This is illustrated by the above example, where the parametrization by $p$ is traditional.
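The relation between the canonical and the classical parametrization in the binomial example can be checked numerically. The following Python sketch (an illustration under the article's conventions, with $L _ { \mu } ( \theta ) = ( 1 + e ^ { \theta } ) ^ { n }$) verifies that weighting the base measure by $e ^ { \theta k } / L _ { \mu } ( \theta )$ recovers the classical $B ( n , p )$ probabilities with $p = e ^ { \theta } / ( 1 + e ^ { \theta } )$:

```python
from math import comb, exp

# Sketch (assumes the article's binomial setup): the base measure has
# mu({k}) = C(n, k), so its Laplace transform is L_mu(theta) = (1 + e^theta)^n.
def nef_binom_pmf(theta, n):
    L = (1.0 + exp(theta)) ** n                     # Laplace transform of mu
    return [comb(n, k) * exp(theta * k) / L for k in range(n + 1)]

def classical_binom_pmf(p, n):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n, theta = 5, 0.7
p = exp(theta) / (1.0 + exp(theta))                 # canonical -> classical parameter
assert all(abs(a - b) < 1e-12
           for a, b in zip(nef_binom_pmf(theta, n), classical_binom_pmf(p, n)))
```

The two parametrizations describe the same family of probability measures; only the coordinates on the family differ.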

A general exponential family (abbreviated GEF) is defined on an abstract measure space $( \Omega , \mathcal{A} , \nu )$ (the measure $\nu$ is not necessarily bounded) by a measurable mapping $t$ from $\Omega$ to a finite-dimensional real linear space $E$. This mapping $t$ must have the following property: the image $\mu$ of $\nu$ by $t$ must be such that $\mu$ is not concentrated on an affine hyperplane of $E$, and such that $\Theta ( \mu )$ is not empty. Under these circumstances, the general exponential family on $\Omega$ generated by $( t , \nu )$ is:

\begin{equation*} F ( t , \nu ) = \{ \mathsf{P} ( \theta , t , \nu ) : \theta \in \Theta ( \mu ) \}, \end{equation*}

where

\begin{equation*} \mathsf{P} ( \theta , t , \nu ) ( d \omega ) = \frac { 1 } { L _ { \mu } ( \theta ) } \operatorname { exp } \langle \theta , t ( \omega ) \rangle \nu ( d \omega ). \end{equation*}

In this case, the NEF $F ( \mu )$ on $E$ is said to be associated to the GEF $F ( t , \nu )$. In a sense, all results about GEFs are actually results about their associated NEF. The dimension of $E$ is called the order of the general exponential family.

The most celebrated example of a general exponential family is the family of the normal distributions $N ( m , \sigma ^ { 2 } )$ on $\Omega = \mathbf{R}$, where the mean $m$ and the variance $\sigma ^ { 2 }$ are both unknown parameters (cf. also Normal distribution). Here, $\nu ( d \omega ) = d x / \sqrt { 2 \pi }$, the space $E$ is $\mathbf{R} ^ { 2 }$ and $t ( \omega )$ is $( \omega , \omega ^ { 2 } / 2 )$. Here, again, the canonical parametrization is not the classical one but is related to it by $\theta _ { 1 } = m / \sigma ^ { 2 }$ and $\theta _ { 2 } = - 1 / \sigma ^ { 2 }$. The associated NEF is concentrated on a parabola in $\mathbf{R} ^ { 2 }$.
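For this example the Laplace transform has a closed form: under the conventions above, $L _ { \mu } ( \theta ) = \operatorname { exp } ( - \theta _ { 1 } ^ { 2 } / ( 2 \theta _ { 2 } ) ) / \sqrt { - \theta _ { 2 } }$ for $\theta _ { 2 } < 0$ (a derived formula, not stated in the article). A Python sketch checks that the resulting GEF density coincides with the classical normal density:

```python
from math import exp, pi, sqrt

# Sketch (assumes the article's setup): t(x) = (x, x^2/2),
# theta1 = m/sigma^2, theta2 = -1/sigma^2, and for theta2 < 0
# L_mu(theta) = exp(-theta1^2 / (2*theta2)) / sqrt(-theta2).
def gef_density(x, theta1, theta2):
    L = exp(-theta1**2 / (2.0 * theta2)) / sqrt(-theta2)
    return exp(theta1 * x + theta2 * x**2 / 2.0) / (L * sqrt(2.0 * pi))

def normal_density(x, m, sigma2):
    return exp(-(x - m)**2 / (2.0 * sigma2)) / sqrt(2.0 * pi * sigma2)

m, sigma2 = 1.5, 0.8
theta1, theta2 = m / sigma2, -1.0 / sigma2
for x in (-2.0, 0.0, 1.5, 3.0):
    assert abs(gef_density(x, theta1, theta2) - normal_density(x, m, sigma2)) < 1e-12
```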

A common incorrect statement about such a model says that it belongs to "the" exponential family. Such a statement is induced by a confusion between a definite probability distribution and a family of them. When a NEF is concentrated on the set of non-negative integers, its elements are sometimes called "power series" distributions, since the Laplace transform is more conveniently written $L _ { \mu } ( \theta ) = f ( e ^ { \theta } )$, where $f$ is analytic around $0$. The same confusion arises here.
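The "power series" terminology can be illustrated with the Poisson family (a hypothetical worked example, not from the article): taking $\mu ( \{ k \} ) = 1 / k !$ gives $L _ { \mu } ( \theta ) = f ( e ^ { \theta } )$ with $f ( z ) = e ^ { z }$, and the NEF member $\mathsf{P} ( \theta , \mu )$ is the Poisson distribution with mean $\lambda = e ^ { \theta }$:

```python
from math import exp, factorial

# Sketch: mu({k}) = 1/k! has Laplace transform L_mu(theta) = f(e^theta)
# with f(z) = e^z, so P(theta, mu) is Poisson with mean lam = e^theta.
def nef_poisson_pmf(theta, k):
    lam = exp(theta)            # z = e^theta
    L = exp(lam)                # f(z) = e^z
    return exp(theta * k) / (factorial(k) * L)

def poisson_pmf(lam, k):
    return lam**k * exp(-lam) / factorial(k)

theta = 0.4
lam = exp(theta)
for k in range(10):
    assert abs(nef_poisson_pmf(theta, k) - poisson_pmf(lam, k)) < 1e-15
```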

There are several variations of the above definition of a GEF: mostly, the parameter $\theta$ is taken to belong to $D ( \mu )$ and not only to $\Theta ( \mu )$, thus obtaining what one may call a full-NEF. A full-GEF is similarly obtained. However, many results no longer hold for such an extension: for instance, this is the case for the NEF on $\mathbf{R}$ generated by a positive stable distribution $\mu$ with parameter $1/2$: this NEF is a family of inverse Gaussian distributions, which have exponential moments, while $\mu$ itself has no expectation although it belongs to the full-NEF. A more genuine extension gives curved exponential families (abbreviated CEF). In this case, the set of parameters is restricted to a non-affine subset of $\Theta ( \mu )$, generally a manifold. However, this extension is in a sense too general, since most of the models in statistics can be regarded as a CEF. The reason is the following: starting from a statistical model of the form $F = \{ f d \nu : f \in S \}$, where $S$ is a subset of $L ^ { 1 } ( \nu )$, $F$ is a CEF if and only if the linear subspace of $L ^ { 0 } ( \nu )$ generated by the set $\{ \operatorname { log } f : f \in S \}$ is finite-dimensional. This is also why exponential families constructed on infinite-dimensional spaces are uninteresting (at least without further structure). For these CEFs, there are no really general results available concerning the application of the maximum-likelihood method. General references are [a2] and [a5].

The exponential dispersion model (abbreviated EDM) is a concept related to natural exponential families as follows: starting from the NEF $F ( \mu )$ on $E$, the Jorgensen set $\Lambda ( \mu )$ is the set of positive $p$ such that there exists a positive measure $\mu _ { p }$ on $E$ whose Laplace transform is $( L _ { \mu } ) ^ { p }$ (see [a4]). Trivially, it contains all positive integers. The model

\begin{equation*} \{ \mathsf{P} ( \theta , \mu _ { p } ) : \theta \in \Theta ( \mu ) , p \in \Lambda ( \mu ) \} \end{equation*}

is the exponential dispersion model generated by $\mu$. It has the following striking property: Let $\theta$ be fixed in $\Theta ( \mu )$, let $p _ { 1 } , \dots , p _ { n }$ be in $\Lambda ( \mu )$ and let $X _ { 1 } , \ldots , X _ { n }$ be independent random variables with respective distributions $\mathsf{P} ( \theta , \mu _ { p _ { j } } )$, $j = 1 , \ldots , n$. Then the distribution of $( X _ { 1 } , \ldots , X _ { n } )$ conditioned by $S = X _ { 1 } + \ldots + X _ { n }$ does not depend on $\theta$. The distribution of $S$ is obviously $\mathsf{P} ( \theta , \mu _ { p } )$ with $p = p _ { 1 } + \ldots + p _ { n }$. Furthermore, if the parameters $p _ { 1 } , \dots , p _ { n }$ are known and $\theta$ is unknown, then the maximum-likelihood estimator of $\theta$ based on the observations $( X _ { 1 } , \ldots , X _ { n } )$ coincides with the one based on $S$ alone. For instance, if the NEF is the Bernoulli family of distributions $q \delta _ { 0 } + p \delta _ { 1 }$ on $0$ and $1$, and if $X _ { 1 } , \ldots , X _ { n }$ are independent Bernoulli random variables with the same unknown $p$, then in order to estimate $p$ it is useless to keep track of the individual values of $X _ { 1 } , \ldots , X _ { n }$. All necessary information about $p$ is contained in $S$, which has a binomial distribution $B ( n , p )$.
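In the Bernoulli case the sufficiency claim can be verified by a direct computation. The following Python sketch (a hypothetical illustration) checks that $\mathsf{P} ( ( X _ { 1 } , \ldots , X _ { n } ) = x \, | \, S = s ) = 1 / \binom{n}{s}$, independently of $p$:

```python
from itertools import product
from math import comb

# Sketch (Bernoulli case from the article): the conditional law of
# (X_1, ..., X_n) given S = s is uniform over the C(n, s) arrangements
# with s ones, whatever the value of p.
def conditional_prob(x, p):
    n, s = len(x), sum(x)
    joint = (p ** s) * ((1 - p) ** (n - s))             # P(X = x)
    p_s = comb(n, s) * (p ** s) * ((1 - p) ** (n - s))  # P(S = s), binomial
    return joint / p_s                                   # = 1 / C(n, s)

n = 4
for x in product((0, 1), repeat=n):
    vals = {round(conditional_prob(x, p), 12) for p in (0.2, 0.5, 0.9)}
    assert len(vals) == 1                                # independent of p
    assert abs(conditional_prob(x, 0.3) - 1.0 / comb(n, sum(x))) < 1e-12
```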

Thus, for a NEF model, the problem of estimating the canonical parameter $\theta$ from $n$ independent observations $X _ { 1 } , \ldots , X _ { n }$ reduces to the problem of estimating $\theta$ from the single observation $S$, whose distribution is in the NEF $F ( \mu _ { n } )$ generated by the $n$-fold convolution $\mu _ { n } = \mu ^ { * n }$. See Natural exponential family of probability distributions for details about estimation by the maximum-likelihood method. When dealing with a GEF, the problem is reduced to the associated NEF.

Bayesian theory (cf. also Bayesian approach) also constitutes a successful domain of application of exponential families. Given a NEF $F ( \mu )$ and a positive measure $\alpha ( d \theta )$ on $\Theta ( \mu )$, consider the set of $( v , p ) \in E \times \mathbf{R}$ such that

\begin{equation*} \pi _ { v , p } ( d \theta ) = A ( v , p ) ( L _ { \mu } ( \theta ) ) ^ { - p } \operatorname { exp } \langle \theta , v \rangle \alpha ( d \theta ) \end{equation*}

is a probability measure for some normalizing constant $A ( v , p )$, and assume that this set is not empty. This set of a priori distributions on the parameter space is an example of a conjugate family. This means that if the random variable $( \theta , X )$ has distribution $\pi _ { v , p } ( d \theta ) \mathsf{P} ( \theta , \mu ) ( d x )$, then the distribution of $\theta$ conditioned by $X = x$ (the a posteriori distribution) is $\pi _ { v ^ { \prime } , p ^ { \prime } }$ for some $( v ^ { \prime } , p ^ { \prime } )$ depending on $v , p , x$. See [a1] for a complete study; [a3] is devoted to the case $\alpha ( d \theta ) = d \theta$, which has special properties and was, for many years, the only serious study of the subject.
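In the Bernoulli case this conjugate family is the familiar Beta family (a derived worked example, not spelled out in the article): with $\alpha ( d \theta ) = d \theta$ and $\mu = \delta _ { 0 } + \delta _ { 1 }$, the change of variables $q = e ^ { \theta } / ( 1 + e ^ { \theta } )$ turns $\pi _ { v , p }$ into the Beta distribution with parameters $( v , p - v )$, and observing $X = x$ updates $( v , p )$ to $( v + x , p + 1 )$. A Python sketch checks the update on the unnormalized densities:

```python
# Sketch (Bernoulli NEF, alpha(d theta) = d theta): in the mean-type
# coordinate q, pi_{v,p} has unnormalized density q^(v-1) (1-q)^(p-v-1),
# i.e. Beta(v, p - v); multiplying by the Bernoulli likelihood
# q^x (1-q)^(1-x) lands back in the family with (v, p) -> (v + x, p + 1).
def beta_kernel(q, a, b):
    return q ** (a - 1) * (1 - q) ** (b - 1)   # unnormalized Beta(a, b)

v, p = 2.0, 5.0
for x in (0, 1):
    for q in (0.1, 0.3, 0.6, 0.9):
        post = beta_kernel(q, v, p - v) * q**x * (1 - q)**(1 - x)
        conj = beta_kernel(q, v + x, (p + 1) - (v + x))
        assert abs(post - conj) < 1e-12        # same conjugate family
```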

References

[a1] S. Bar-Lev, P. Enis, G. Letac, "Sampling models which admit a given general exponential family as a conjugate family of priors" Ann. Statist. , 22 (1994) pp. 1555–1586
[a2] O. Barndorff-Nielsen, "Information and exponential families in statistical theory" , Wiley (1978)
[a3] P. Diaconis, D. Ylvisaker, "Conjugate priors for exponential families" Ann. Statist. , 7 (1979) pp. 269–281
[a4] B. Jorgensen, "Exponential dispersion models" J. R. Statist. Soc. Ser. B , 49 (1987) pp. 127–162
[a5] G. Letac, "Lectures on natural exponential families and their variance functions" , Monogr. Mat. , 50 , Inst. Mat. Pura Aplic. Rio (1992)
How to Cite This Entry:
Exponential family of probability distributions. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Exponential_family_of_probability_distributions&oldid=17972
This article was adapted from an original article by Gérard Letac (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article