Bayes, Thomas

From Encyclopedia of Mathematics
Copyright notice
This article Thomas Bayes was adapted from an original article by D.V. Lindley, which appeared in StatProb: The Encyclopedia Sponsored by Statistics and Probability Societies. The original article is copyrighted by the author(s); the article has been donated to Encyclopedia of Mathematics, and its further issues are under the Creative Commons Attribution Share-Alike License. All pages from StatProb are contained in the Category StatProb.

Thomas BAYES

b. c. 1701 - d. 7 April 1761

Summary. The problem of passing from a population to the properties of a sample was one of the first studied in probability. Thomas Bayes, a nonconformist minister, was the first to solve the inverse problem of passage from sample to population, using ideas that are widely used today.

Thomas Bayes, born in London, the son of a nonconformist minister, spent most of his adult life in a similar position in Tunbridge Wells, England. He was educated at Edinburgh University and was elected a fellow of the Royal Society in 1742. During his lifetime he published a few mathematical papers, of which the best known is a 1736 defence of Newton's ideas against an attack by Bishop Berkeley. He is today remembered for a paper that his friend Richard Price claimed to have found amongst his possessions after his death. It appeared in the Royal Society's Philosophical Transactions in 1763 and has often been republished. Apart from these bare facts, surprisingly little is known of Bayes's life.

By the middle of the 18th century it was well understood that if, to use modern terminology, the chance of success had the same value, $\theta$ say, in each of $n$ independent trials, then the probability of exactly $r$ successes was given by the binomial distribution $$P(r|\theta,n) = {n \choose r}\theta^r(1-\theta)^{n-r}.$$
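
As a minimal sketch, the binomial formula above can be evaluated directly. The function name `binomial_pmf` and the illustrative values ($n = 10$, $\theta = 0.5$) are assumptions made for this example and do not come from the original article.

```python
from math import comb


def binomial_pmf(r: int, n: int, theta: float) -> float:
    """Probability of exactly r successes in n independent trials,
    each with the same chance of success theta."""
    return comb(n, r) * theta**r * (1 - theta) ** (n - r)


# Illustrative values only: 10 trials with chance 0.5.
probabilities = [binomial_pmf(r, 10, 0.5) for r in range(11)]
```

The list `probabilities` holds $P(r|\theta,n)$ for $r = 0, \dots, 10$; since these are the probabilities of mutually exclusive and exhaustive outcomes, they sum to one.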

Jacob Bernoulli (q.v.) had established the weak law of large numbers and de Moivre (q.v.) had found the normal approximation to the binomial. The passage from a known value of $\theta$ to the empirical observation of $r$ was therefore extensively appreciated. Bayes studied the inverse problem: what did the data $(r,n)$ say about the chance $\theta$? There already existed partial answers. For example, Arbuthnot had observed $r$ male, and $n-r$ female, births with $r$ considerably in excess of $n/2$. He argued that, on the basis of the binomial with $\theta = 1/2$, a value of $r$ as high as this was so improbable that $\theta$ could not be $1/2$. That idea has been much extended into the modern form of a significance test and its associated $P$-value or significance level.
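
The style of argument described above can be sketched as an upper-tail calculation under the binomial with $\theta = 1/2$. The counts used here (60 male births out of 100) are hypothetical and chosen only for illustration; they are not Arbuthnot's data, and the function name is an assumption of this sketch.

```python
from math import comb


def upper_tail_probability(r: int, n: int) -> float:
    """P(at least r successes in n trials) when the chance is exactly 1/2."""
    # With theta = 1/2 every sequence of outcomes has probability 2**-n,
    # so it is enough to count the sequences with r or more successes.
    favourable = sum(comb(n, k) for k in range(r, n + 1))
    return favourable / 2**n


# Hypothetical counts, for illustration only.
p_value = upper_tail_probability(60, 100)
```

A small value of `p_value` is what a significance test would report as evidence against $\theta = 1/2$; note that it is a probability of data given $\theta$, not a probability of $\theta$.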

Bayes proceeded differently using the theorem that nowadays always bears his name, though it does not appear explicitly in the 1763 paper, $$P(A|B) = P(B|A)P(A)/P(B)$$

for events $A, B$ with $P(B)\neq 0$. The theorem permits the inversion of the events in $P(B|A)$ into $P(A|B)$. Applied when $A$ refers to $\theta$ and $B$ to the empirical $r$, we have $$P(\theta|r,n) \propto P(r|\theta,n)P(\theta|n).$$

(The missing constant of proportionality does not depend on $\theta$. It is $P(r|n)^{-1}$, but is most easily found by choosing the constant that makes the product integrate to one.) The result effects the passage from the binomial, on the right, to a probability statement about the chance, on the left. It therefore becomes possible to pass from the data to a statement about what are probable, and what are improbable, values of the chance.
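
The normalisation step can be sketched numerically: evaluate the product $P(r|\theta,n)P(\theta|n)$ on a grid of values of $\theta$ and divide by its integral. The grid, the trapezoidal rule, the function names and the data ($r = 7$, $n = 10$, with a uniform prior) are all assumptions made for this illustration.

```python
from math import comb


def grid_posterior(r, n, prior, grid_size=1001):
    """Approximate the posterior density of theta on an evenly spaced grid.

    Returns the grid, the normalised density, and the normalising
    integral, which plays the role of P(r|n).
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    step = thetas[1] - thetas[0]
    # Likelihood times prior, before normalisation.
    unnormalised = [
        comb(n, r) * t**r * (1 - t) ** (n - r) * prior(t) for t in thetas
    ]
    # Trapezoidal rule for the integral over (0, 1).
    area = step * (sum(unnormalised) - 0.5 * (unnormalised[0] + unnormalised[-1]))
    density = [u / area for u in unnormalised]
    return thetas, density, area


def uniform_prior(theta):
    """Hypothetical prior for the example: uniform on (0, 1)."""
    return 1.0


# Hypothetical data: 7 successes in 10 trials.
thetas, density, evidence = grid_posterior(7, 10, uniform_prior)
most_probable = thetas[density.index(max(density))]
```

Here `density` approximates $P(\theta|r,n)$, so probable and improbable values of the chance can be read off directly, and `most_probable` is the grid point at which the density peaks.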

This elegantly and simply solves the problem, except for one difficulty. It requires a value for $P(\theta|n)$, a probability distribution for the chance before the result of the trials has been observed. It is usual to describe this as the prior distribution (prior, that is, to $r$) and the final result as the posterior distribution. Thus the theorem describes how your views of $\theta$ change, from prior to posterior, as a result of data $r$. Bayes discussed the choice of prior but his approach is ambiguous. He is usually supposed to have taken $P(\theta|n)$ uniform in (0,1) - the so-called Bayes's postulate - but an alternative reading suggests he took $P(r|n)$ to be uniform. Mathematically these lead to the same result.
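
One direction of this equivalence can be checked by a short calculation, given here as a sketch using the standard Beta integral rather than Bayes's own argument. If $P(\theta|n)$ is uniform on $(0,1)$, then $$P(r|n) = \int_0^1 {n \choose r}\theta^r(1-\theta)^{n-r}\,d\theta = {n \choose r}\frac{r!\,(n-r)!}{(n+1)!} = \frac{1}{n+1},$$ which is the same for every $r = 0, 1, \dots, n$, so that $P(r|n)$ is uniform. The posterior is then $$P(\theta|r,n) = (n+1){n \choose r}\theta^r(1-\theta)^{n-r},$$ a Beta distribution with parameters $r+1$ and $n-r+1$.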

Little notice was taken of the 1763 paper at the time. It was first appreciated by Laplace (q.v.), in the early years of the next century, who used the ideas in his eclectic approach to probability. The theorem is of basic importance because it provides a solution to the general problem of inference or induction. Let $H$ be a universal hypothesis and $E$ empirical evidence bearing on $H$. A simple example might be $H$, all swans are white, and $E$ the observation of the colour of a swan. A more sophisticated one would have $H$ as Newton's laws and $E$ observation of the motions of the planets. In either case, $P(E|H)$ can be calculated. Bayes's theorem says $$P(H|E) \propto P(E|H)P(H),$$

expressing a view about the hypothesis, given the evidence, in terms of the known probability of the evidence, given the hypothesis, and the prior view about $H$. As more evidence supporting $H$ accrues, having large probability on $H$, so even the sceptic, with low $P(H)$, will become convinced: $P(H|E)$ will approach one and the hypothesis will be accepted. Many people, following Jeffreys (q.v.), who extensively developed these ideas into a practicable scientific tool, hold that this provides a description of the scientific method. This view differs from that of Popper, who only admits refutation of a hypothesis and whose attitude to probability is regarded as unsound by supporters of Jeffreys's ideas.
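
The way accumulating evidence persuades a sceptic can be sketched by applying the theorem repeatedly, each posterior serving as the prior for the next observation. The numbers are hypothetical: a prior $P(H) = 0.01$, and twenty independent observations, each with probability 0.9 if $H$ is true and 0.5 otherwise. The sketch assumes a single alternative to $H$ and evidence that is more probable under $H$ than under that alternative.

```python
def update(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) from P(H), P(E|H) and P(E|not H) by Bayes's theorem."""
    numerator = p_e_given_h * prior_h
    # The denominator is P(E), found by summing over H and its negation.
    return numerator / (numerator + p_e_given_not_h * (1 - prior_h))


belief = 0.01  # hypothetical sceptical prior P(H)
history = [belief]
for _ in range(20):
    # Each observation is assumed independent given H (and given not H).
    belief = update(belief, 0.9, 0.5)
    history.append(belief)
```

Each entry of `history` is larger than the one before, and the final value is close to one, although the starting value was only 0.01.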

Recently Bayes's theorem has been used as a means of processing evidence in a court of law. Let $G$ be the hypothesis that the defendant is truly guilty of the offence with which he or she has been charged and $E$ a new piece of evidence. Then applying the theorem both to $G$ and to $\bar G$, denoting innocence, $$\frac{P(G|E)}{P(\bar G|E)} = \frac{P(E|G)}{P(E|\bar G)}\;\frac{P(G)}{P(\bar G)}.$$

The expression on the left is the odds on guilt, given $E$; the final factor on the right is the same odds without $E$. The remaining term is the likelihood ratio, being a comparison of the probability of the evidence, supposing guilt, to the same probability supposing innocence. Forensic scientists present evidence to the court in the form of a likelihood ratio. The court can then multiply their former (prior) odds by the likelihood ratio to obtain new (posterior) odds as a result of hearing the evidence.
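
The multiplication of prior odds by a likelihood ratio can be sketched with exact fractions. The figures are hypothetical and carry no legal meaning: prior odds on guilt of 1 to 1000 and a likelihood ratio of 10000. The function names are assumptions of this example.

```python
from fractions import Fraction


def posterior_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Odds on guilt after hearing evidence with the given likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: Fraction) -> Fraction:
    """Convert odds on an event into the probability of that event."""
    return odds / (1 + odds)


# Hypothetical figures, for illustration only.
prior = Fraction(1, 1000)            # P(G) / P(not G) before the evidence
ratio = Fraction(10000, 1)           # P(E|G) / P(E|not G)
odds_after = posterior_odds(prior, ratio)
probability_after = odds_to_probability(odds_after)
```

With these figures the posterior odds are 10 to 1, corresponding to a probability of guilt of 10/11. A second piece of evidence would be handled by multiplying `odds_after` by its own likelihood ratio, provided the two pieces of evidence are independent given guilt and given innocence.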

Bayes's result has attained increased importance following work by Ramsey, de Finetti and Savage between 1925 and 1955, which demonstrated that our knowledge had to be based on probability and that our beliefs must obey the rules of the probability calculus, of which Bayes's theorem is essentially the multiplication rule. In this view, the significance levels of the classical school are unsound because they do not express opinions about hypotheses like $H$, or parameters like $\theta$, in terms of direct probabilities of $H$ or $\theta$. The resulting methodology is called Bayesian statistics and has rather different procedures and results from those of the classical school. Bayesians regard probability as a measure of a person's belief, whereas the classical school only admits probability as a frequency concept.

Bayes's result is also central to modern ideas on decision-making under uncertainty. Suppose there is a choice to be made amongst a set $\{d\}$ of decisions in the presence of uncertainty about a parameter $\theta$. The work of Ramsey and others leads to the introduction of a utility function $u(d, \theta)$, describing the worth of decision $d$ when the parameter has the value $\theta$, and the choice of that $d$ which maximizes the expected utility $\sum_{\theta}u(d,\theta)P(\theta)$. Additional evidence $E$ updates $P(\theta)$ to $P(\theta|E)$, by the theorem, and improves the decision-making. All this is a long way from Bayes's original problem and its resolution. He would doubtless be astonished were he to realize how his wonderful idea has been extended and his name used.
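
The rule of maximizing expected utility, and the effect of updating $P(\theta)$ by evidence, can be sketched for a parameter taking finitely many values. Everything in the example is hypothetical: the two values of $\theta$ ("rain", "dry"), the two decisions, the utilities and the probabilities are invented for illustration.

```python
def expected_utility(decision, utility, belief):
    """Sum of u(d, theta) * P(theta) over the possible values of theta."""
    return sum(utility[(decision, theta)] * p for theta, p in belief.items())


def best_decision(decisions, utility, belief):
    """The decision that maximizes expected utility under the given belief."""
    return max(decisions, key=lambda d: expected_utility(d, utility, belief))


def update_belief(belief, likelihood):
    """Bayes's theorem: P(theta|E) is proportional to P(E|theta) * P(theta)."""
    product = {theta: likelihood[theta] * p for theta, p in belief.items()}
    total = sum(product.values())  # this is P(E)
    return {theta: value / total for theta, value in product.items()}


# Hypothetical utilities u(d, theta).
utility = {
    ("umbrella", "rain"): 0.0, ("umbrella", "dry"): -1.0,
    ("none", "rain"): -3.0, ("none", "dry"): 0.0,
}
decisions = ["umbrella", "none"]
prior = {"rain": 0.2, "dry": 0.8}           # hypothetical P(theta)
likelihood = {"rain": 0.8, "dry": 0.2}      # hypothetical P(E|theta)

choice_before = best_decision(decisions, utility, prior)
posterior = update_belief(prior, likelihood)
choice_after = best_decision(decisions, utility, posterior)
```

With these invented numbers the preferred decision is "none" under the prior and "umbrella" once the evidence has raised the probability of rain, which illustrates how evidence can change the decision as well as the belief.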


[1] The original paper appeared in The Philosophical Transactions of the Royal Society of London (1763) 53, 370-418. There is a reprint in Biometrika (1958) 45, 296-315. An illuminating commentary on it is provided by S.M. Stigler (1982) Thomas Bayes's Bayesian Inference. Journal of the Royal Statistical Society, Series A, 145, 250-258. The most complete biography is provided by A.W.F. Edwards in the latest edition of The Dictionary of National Biography.
[2] Two recent books on modern Bayesian methods are A. O'Hagan (1994) Bayesian Inference. Vol. 2B of Kendall's Advanced Theory of Statistics. Edward Arnold, London; John Wiley, New York. J.M. Bernardo & A.F.M. Smith (1994) Bayesian Theory. John Wiley, Chichester. The latter is part of a forthcoming 3-volume work and has an extensive bibliography. The modern 'classic' is B. de Finetti (1974/5) Theory of Probability. John Wiley, London, in 2 volumes, translated from the Italian.
[3] C.G.G. Aitken (1995) Statistics and the Evaluation of Evidence for Forensic Scientists. John Wiley, Chichester, deals with legal applications. D.V. Lindley (1985) Making Decisions. John Wiley, London, extends Bayesian ideas to decision-making.

Reprinted with permission from Christopher Charles Heyde and Eugene William Seneta (Editors), Statisticians of the Centuries, Springer-Verlag Inc., New York, USA.

How to Cite This Entry:
Bayes, Thomas. Encyclopedia of Mathematics. URL:,_Thomas&oldid=39173