# Bayesian approach

*to statistical problems*

An approach based on the assumption that a definite probability distribution can be assigned to any parameter in a statistical problem. Any general statistical decision problem is determined by the following elements: a space $ (X,\ {\mathcal B} _ {X} ) $ of (potential) samples $ x $; a space $ ( \Theta ,\ {\mathcal B} _ \Theta ) $ of values of the unknown parameter $ \theta $; a family of probability distributions $ \{ { {\mathsf P} _ \theta } : {\theta \in \Theta } \} $ on $ (X,\ {\mathcal B} _ {X} ) $; a space of decisions $ (D,\ {\mathcal B} _ {D} ) $; and a function $ L( \theta ,\ d) $, which characterizes the loss caused by accepting the decision $ d $ when the true value of the parameter is $ \theta $. The objective of decision making is to find a rule (decision function) $ \delta = \delta (x) $ that is optimal in a certain sense, assigning to each result of an observation $ x \in X $ a decision $ \delta (x) \in D $. In the Bayesian approach, where the unknown parameter $ \theta $ is assumed to be a random variable with a given (a priori) distribution $ \pi = \pi (d \theta ) $ on $ ( \Theta ,\ {\mathcal B} _ \Theta ) $, the best decision function (the Bayesian decision function) $ {\delta ^ {*} } = {\delta ^ {*} } (x) $ is defined as the function at which the minimum expected loss $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $, where

$$ \rho ( \pi ,\ \delta ) \ = \ \int\limits _ \Theta \rho ( \theta ,\ \delta ) \ \pi (d \theta ) , $$

and

$$ \rho ( \theta ,\ \delta ) \ = \ \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) $$

is attained. Thus,

$$ \rho ( \pi ,\ \delta ^ {*} ) \ = \ \inf _ \delta \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) \ \pi ( d \theta ) . $$
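When $ \Theta $, $ X $ and $ D $ are finite, the integrals become sums and the definition can be applied literally. The sketch below is a minimal illustration under assumed numbers (two parameter values, three observations, two decisions; the prior, the probabilities $ {\mathsf P} _ \theta (x) $ and the losses are hypothetical, not taken from the text). It computes $ \rho ( \theta ,\ \delta ) $ and $ \rho ( \pi ,\ \delta ) $ and finds $ \delta ^ {*} $ by exhaustive search over every map $ X \rightarrow D $.

```python
from itertools import product

# Hypothetical finite decision problem; every number below is an
# illustrative assumption, not taken from the text.
THETA = [0, 1]                               # parameter values theta
X = [0, 1, 2]                                # possible observations x
D = ["a", "b"]                               # decisions d
prior = {0: 0.6, 1: 0.4}                     # pi(theta)
lik = {0: {0: 0.7, 1: 0.2, 2: 0.1},          # P_theta(x) for theta = 0
       1: {0: 0.1, 1: 0.3, 2: 0.6}}          # P_theta(x) for theta = 1
loss = {(0, "a"): 0.0, (0, "b"): 1.0,        # L(theta, d)
        (1, "a"): 2.0, (1, "b"): 0.0}


def risk(theta, delta):
    """rho(theta, delta): expected loss of the rule delta when theta is true."""
    return sum(loss[theta, delta[x]] * lik[theta][x] for x in X)


def bayes_risk(delta):
    """rho(pi, delta): the risk averaged over the prior pi."""
    return sum(risk(theta, delta) * prior[theta] for theta in THETA)


# A decision function is any map X -> D; there are |D| ** |X| of them.
rules = [dict(zip(X, choice)) for choice in product(D, repeat=len(X))]
best = min(rules, key=bayes_risk)            # Bayesian decision function
print(best, bayes_risk(best))
```

The exhaustive search grows like $ |D| ^ {|X|} $; the remark that follows in the text avoids it by minimizing separately for each observation $ x $.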

In searching for the Bayesian decision function $ \delta ^ {*} = \delta ^ {*} (x) $, the following remark is useful. Let $ {\mathsf P} _ \theta (dx) = p (x \mid \theta ) \ d \mu (x) $ and $ \pi (d \theta ) = \pi ( \theta ) \ d \nu ( \theta ) $, where $ \mu $ and $ \nu $ are $ \sigma $-finite measures. Assuming that the order of integration may be interchanged, one finds

$$ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx ) \ \pi ( d \theta )\ = $$

$$ = \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) p ( x \mid \theta ) \pi ( \theta ) \ d \mu (x) \ d \nu ( \theta )\ = $$

$$ = \ \int\limits _ { X } \ d \mu (x) \left [ \int\limits _ \Theta L ( \theta ,\ \delta (x)) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) \right ] . $$

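Since the measure $ \mu $ is nonnegative, this integral is minimized by minimizing the expression in square brackets separately for each $ x $. Hence, for a given $ x \in X $, $ \delta ^ {*} (x) $ is the value $ d ^ {*} $ at which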

$$ \inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) $$

is attained or, equivalently (provided $ p(x) > 0 $), at which

$$ \inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) , $$

is attained, where

$$ p (x) \ = \ \int\limits _ \Theta p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) . $$

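By the Bayes formula, $ p (x \mid \theta ) \pi ( \theta ) / p (x) $ is the a posteriori density of $ \theta $ given $ x $ with respect to $ \nu $, so that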

$$ \int\limits _ \Theta L( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) \ = \ {\mathsf E} [L ( \theta ,\ d) \mid x]. $$

Thus, for a given $ x $, $ \delta ^ {*} (x) $ is the value $ d ^ {*} $ at which the conditional (a posteriori) expected loss $ {\mathsf E} [L ( \theta ,\ d) \mid x] $ attains its minimum.
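The pointwise recipe can be checked on a small finite problem, where $ \mu $ and $ \nu $ are counting measures and the integrals are sums. This is a minimal sketch under assumed numbers: three parameter values, four observations, three decisions, and the hypothetical loss $ L( \theta ,\ d) = | \theta - d | $; none of these come from the text. The code obtains the a posteriori distribution by the Bayes formula, chooses $ \delta ^ {*} (x) $ by minimizing $ {\mathsf E} [L ( \theta ,\ d) \mid x] $ for each $ x $, and compares the resulting expected loss with an exhaustive search over all decision functions.

```python
from itertools import product

# Hypothetical problem (all numbers are illustrative assumptions):
# three parameter values, four observations, three decisions.
THETA = [0, 1, 2]
X = [0, 1, 2, 3]
D = [0, 1, 2]
prior = [0.5, 0.3, 0.2]                      # pi(theta)
lik = [[0.4, 0.3, 0.2, 0.1],                 # p(x | theta = 0)
       [0.2, 0.3, 0.3, 0.2],                 # p(x | theta = 1)
       [0.1, 0.2, 0.3, 0.4]]                 # p(x | theta = 2)


def loss(theta, d):
    """Assumed absolute-error loss L(theta, d) = |theta - d|."""
    return abs(theta - d)


def posterior(x):
    """Bayes formula: p(x | theta) pi(theta) / p(x) for each theta."""
    joint = [lik[t][x] * prior[t] for t in THETA]
    p_x = sum(joint)                         # marginal p(x)
    return [j / p_x for j in joint]


def posterior_loss(x, d):
    """Conditional expected loss E[L(theta, d) | x]."""
    return sum(loss(t, d) * w for t, w in zip(THETA, posterior(x)))


# Bayesian decision function: minimize the conditional loss for each x.
delta_star = {x: min(D, key=lambda d: posterior_loss(x, d)) for x in X}


def bayes_risk(delta):
    """rho(pi, delta) as a double sum over theta and x."""
    return sum(prior[t] * lik[t][x] * loss(t, delta[x])
               for t in THETA for x in X)


# Cross-check against exhaustive search over all |D| ** |X| = 81 rules.
brute = min(bayes_risk(dict(zip(X, c))) for c in product(D, repeat=len(X)))
print(delta_star, bayes_risk(delta_star), brute)
```

With this particular loss the minimizing decision is a median of the a posteriori distribution; another loss function changes only `loss`, not the procedure.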

Example. (The Bayesian approach applied to distinguishing between two simple hypotheses.) Let $ \Theta = \{ \theta _ {1} ,\ \theta _ {2} \} $, $ D = \{ d _ {1} ,\ d _ {2} \} $, $ L _ {ij} = L ( \theta _ {i} ,\ d _ {j} ) $, $ i,\ j = 1,\ 2 $; $ \pi ( \theta _ {1} ) = \pi _ {1} $, $ \pi ( \theta _ {2} ) = \pi _ {2} $, $ \pi _ {1} + \pi _ {2} = 1 $. If the decision $ d _ {i} $ is identified with accepting the hypothesis $ H _ {i} $: $ \theta = \theta _ {i} $, it is natural to assume that $ L _ {11} < L _ {12} $, $ L _ {22} < L _ {21} $. Then

$$ \rho ( \pi ,\ \delta ) \ = \ \int\limits _ { X } [ \pi _ {1} p (x \mid \theta _ {1} ) L ( \theta _ {1} ,\ \delta ( x)) + $$

$$ + {} \pi _ {2} p (x \mid \theta _ {2} ) L ( \theta _ {2} ,\ \delta (x))] \ d \mu (x) $$

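and, minimizing the integrand separately for each $ x $ (that is, choosing $ d _ {1} $ whenever $ \pi _ {2} p (x \mid \theta _ {2} ) (L _ {21} - L _ {22} ) \leq \pi _ {1} p (x \mid \theta _ {1} ) (L _ {12} - L _ {11} ) $), one finds that $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $ is attained for the function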

$$ \delta ^ {*} (x) \ = \ \left \{ \begin{array}{l} d _ {1} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \leq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } , \\ d _ {2} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \geq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } . \\ \end{array} \right . $$
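A numerical sketch of this rule, under assumptions that are not in the text: $ p (x \mid \theta _ {1} ) $ and $ p (x \mid \theta _ {2} ) $ are taken to be the normal densities $ N(0,\ 1) $ and $ N(1,\ 1) $, with $ \pi _ {1} = 0.7 $, $ \pi _ {2} = 0.3 $, $ L _ {11} = L _ {22} = 0 $, $ L _ {12} = 1 $, $ L _ {21} = 2 $. The function `delta_star` applies the likelihood-ratio form above, while `delta_posterior` compares the two (unnormalized) conditional expected losses directly; for these densities the likelihood ratio equals $ e ^ {x - 1/2} $, so both reduce to a cutoff on $ x $.

```python
import math

# Hypothetical instance (assumptions, not from the text):
# p(x | theta_1) = N(0, 1), p(x | theta_2) = N(1, 1).
PI1, PI2 = 0.7, 0.3
L11, L12, L21, L22 = 0.0, 1.0, 2.0, 0.0      # L_ij = L(theta_i, d_j)


def normal_pdf(x, mean, sd=1.0):
    """Density of the normal distribution N(mean, sd**2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Right-hand side of the likelihood-ratio inequality.
THRESHOLD = (PI1 / PI2) * (L12 - L11) / (L21 - L22)


def delta_star(x):
    """Likelihood-ratio form of the Bayes rule (ties sent to d1)."""
    ratio = normal_pdf(x, 1.0) / normal_pdf(x, 0.0)
    return "d1" if ratio <= THRESHOLD else "d2"


def delta_posterior(x):
    """Same rule, obtained by comparing E[L(theta, d) | x] for d1 and d2."""
    w1 = PI1 * normal_pdf(x, 0.0)            # pi_1 p(x | theta_1)
    w2 = PI2 * normal_pdf(x, 1.0)            # pi_2 p(x | theta_2)
    loss_d1 = w1 * L11 + w2 * L21            # dividing both by p(x)
    loss_d2 = w1 * L12 + w2 * L22            # does not change the comparison
    return "d1" if loss_d1 <= loss_d2 else "d2"


# Here p2/p1 = exp(x - 1/2), so the rule is "d1 iff x <= 1/2 + ln(THRESHOLD)".
cutoff = 0.5 + math.log(THRESHOLD)
print(round(cutoff, 4))
```

With these numbers the threshold is $ 7/6 $, so the hypothesis $ H _ {1} $ is accepted exactly when $ x \leq 1/2 + \ln (7/6) $.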

The advantage of the Bayesian approach is that, unlike the risks $ \rho ( \theta ,\ \delta ) $, the expected losses $ \rho ( \pi ,\ \delta ) $ are numbers that do not depend on the unknown parameter $ \theta $. Consequently (provided $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $ is finite), for every $ \epsilon > 0 $ there certainly exist decision functions $ \delta _ \epsilon ^ {*} $ for which

$$ \rho ( \pi ,\ \delta _ \epsilon ^ {*} ) \ \leq \ \inf _ \delta \ \rho ( \pi ,\ \delta ) + \epsilon , $$

that is, decision functions which are, if not optimal, at least $ \epsilon $-optimal. The disadvantage of the Bayesian approach is the necessity of postulating both the existence of an a priori distribution of the unknown parameter and its precise form (the latter disadvantage can be overcome to a certain extent by adopting an empirical Bayesian approach, cf. Bayesian approach, empirical).


**How to Cite This Entry:**

Bayesian approach. *Encyclopedia of Mathematics.* URL: http://www.encyclopediaofmath.org/index.php?title=Bayesian_approach&oldid=44399