\documentclass[12pt]{article}
\pagestyle{empty}
\setlength{\paperwidth}{8.5in}
\setlength{\paperheight}{11in}
\setlength{\topmargin}{0.00in}
\setlength{\headsep}{0.00in}
\setlength{\headheight}{0.00in}
\setlength{\evensidemargin}{0.00in}
\setlength{\oddsidemargin}{0.00in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.00in}
\setlength{\voffset}{0.00in}
\setlength{\hoffset}{0.00in}
\setlength{\marginparwidth}{0.00in}
\setlength{\marginparsep}{0.00in}
\setlength{\parindent}{0.00in}
\setlength{\parskip}{0.15in}
% this is the default PlanetMath preamble. as your knowledge
% of TeX increases, you will probably want to edit this, but
% it should be fine as is for beginners.
% almost certainly you want these
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{amsfonts}
% used for TeXing text within eps files
%\usepackage{psfrag}
% need this for including graphics (\includegraphics)
%\usepackage{graphicx}
% for neatly defining theorems and propositions
%\usepackage{amsthm}
% making logically defined graphics
%\usepackage{xypic}
% there are many more packages, add them here as you need them
% define commands here
\begin{document}
\noindent{\bf William Sealy GOSSET}\\
b.~13 June 1876 -- d.~16 October 1937

\vspace{.5 cm}
\noindent{\bf Summary.} Gosset, better known by his pseudonym `Student',
is associated with the discovery of the {\em t}-distribution and its
use, and he had a profound effect on the practice of statistics in
industry and agriculture.

\vspace{.5 cm}
William Sealy Gosset was born in Canterbury, England. He received a
degree in chemistry from Oxford University and in 1899 went to work as
a ``brewer'' at Arthur Guinness, Son and Co.\ Ltd.\ in Dublin, Ireland.
He died in Beaconsfield, England, at the age of 61, still in the employ
of Guinness.

The nature of his work prompted Gosset, early in his career at
Guinness, to examine the relationship between the raw materials for
beer and the finished product, and this activity naturally led him to
learn the tools of statistical analysis. In 1905, Gosset sought out
the advice of Karl Pearson (q.v.) and subsequently spent the better
part of a year, in 1906--1907, in Pearson's Biometric Laboratory at
University College London, where he worked on small-sample problems in
statistics.
Gosset then produced a pair of papers that were published in {\em
Biometrika} in 1908 under the nom de plume `Student'. The first of
these derived what we now know as `Student's' {\em t}-distribution,
and the second dealt with the small-sample distribution of Pearson's
correlation coefficient. These contributions placed Gosset among the
leading figures of the newly emerging field of statistical methodology.
Indeed, the {\em t}-test based on his 1908 paper is perhaps the single
most widely used statistical tool in applications.

In the years that followed, Gosset worked on a variety of statistical
problems in agriculture, including the design of field experiments. He
was in active correspondence with the leading English statisticians of
his day, including Karl Pearson, Egon Pearson (q.v.), and R. A. Fisher
(q.v.).
Gosset's correspondence with Fisher dealt with highly varied topics and
was, as Plackett and Barnard note, ``interspersed with friendly advice
on both sides.'' In his later years, he had a number of public
disagreements with Fisher over the role of randomisation in
experimentation. Gosset was a strong advocate of experimental control,
a point that came through quite vividly in his proposal in connection
with the Lanarkshire milk experiment in `Student' (1931); in the same
paper he was also critical of an evaluation of the study carried out
by Bartlett and Fisher (1931). More generally, Gosset was enamoured of
systematic experimental plans and opposed the use of randomisation.
This controversy led Gosset to prepare his final paper (`Student',
1937), published a few months after his death.

In the next section, we comment on some of the technical details of
Gosset's seminal 1908 contributions. For further details on Gosset's
life and work, see Plackett and Barnard (1990). Gosset's
writings are collected in `Student' (1942).
\par\vspace{.2in}
\noindent
{\bf Gosset on the Mean and the Correlation Coefficient}\\
\noindent{\it Small Sample Theory of the Mean}

In 1908, Gosset's work at the Guinness brewery led him to
publish the results that would become associated with his
name in future generations. In an article entitled
``The probable error of a mean'' (`Student', 1908a), he established
the sampling distributions of $s^2$ and $s$ for an
independent and identically distributed sample
of size $n$ from a normal population. He then showed that
the mean and the standard deviation of such a sample are uncorrelated
and derived what we now know as `Student's' $t$-distribution.\par
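In modern notation (the symbols $s'$, $\nu$ and $f_\nu$ below are our
labels, not Gosset's), the quantities involved can be sketched as
follows. Gosset's 1908 paper measured the error of the mean in units of
the sample standard deviation computed with divisor $n$:
\[
s^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2,\qquad
z=\frac{\bar{x}-\mu}{s}.
\]
The statistic used today rescales $z$ and uses the divisor $n-1$:
\[
t=z\sqrt{n-1}=\frac{\bar{x}-\mu}{s'/\sqrt{n}},\qquad
s'^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2.
\]
For a normal sample, $t$ has Student's distribution with $\nu=n-1$
degrees of freedom, whose density is
\[
f_\nu(t)=\frac{\Gamma\bigl(\frac{\nu+1}{2}\bigr)}
{\sqrt{\nu\pi}\,\Gamma\bigl(\frac{\nu}{2}\bigr)}
\Bigl(1+\frac{t^2}{\nu}\Bigr)^{-(\nu+1)/2},\qquad -\infty<t<\infty.
\]
As $\nu$ grows this density approaches the standard normal curve, which
is why the distinction matters chiefly for small samples.\par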
At the time of publication, the importance of these results was
not fully recognised. Most contemporary statisticians focused on
large-sample theory, and Gosset's emphasis on small samples,
arising from his work at the brewery, set him somewhat apart. Indeed,
it was not until Fisher generalised `Student's' $t$-distribution
that it came into widespread use outside the Guinness brewery itself.

Aside from the derivations mentioned above, the 1908 manuscript has
a number of other noteworthy features. The first is its break from
the tradition of the Biometric School, which used the same symbol for
both the population parameter and the sample statistic. Gosset's paper,
by contrast, uses $s^2$ for the sample variance and $\sigma^2$ for the
population variance. Work with large samples had obscured the need
for this distinction, which became clearer once the focus shifted
to small samples (Pearson, 1939).

Another noteworthy aspect of this paper is Gosset's use of a sampling
experiment to check his theoretical results empirically, rather than
relying on the analytic derivation alone. The essence of the simulation
was as follows. Using data on the height and left middle finger
measurements of 3000 criminals, he formed 750 random samples of
size 4. Gosset then calculated the mean, standard deviation and
correlation coefficient of each sample, as well as its $t$-statistic.
He plotted the empirical distributions of these statistics
and compared them to the theoretical ones he had derived. Using
$\chi^2$ tests for goodness of fit, he judged the agreement to be
satisfactory. In connection with this empirical study, Gosset noted
that ``...if the distribution is approximately normal our theory
gives us a satisfactory measure of the certainty to be derived from
a small sample...'' (`Student', 1908a, p.~19). Furthermore, ``[i]f
the distribution is not normal, the mean and the standard deviation
of a sample will be positively correlated, so that although both will
have greater variability, yet they will tend to counteract each other,
a mean deviating largely from the general mean tending to be divided
by a larger standard deviation. Consequently, I believe that the
table given... below may be used in estimating the degree of certainty
arrived at by the mean of a few experiments, in the case of most
laboratory or biological work where the distributions are as a rule...
sufficiently nearly normal'' (ibid.). Gosset's intuition that the
$t$-test would be robust against small departures from normality,
while not proven here, was later confirmed (Pearson, 1929;
Geary (q.v.), 1936, 1947).\par
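For the samples of size $n=4$ in this experiment, the theoretical curve
can be written out explicitly. In the modern $t$ form (our
reconstruction; Gosset himself tabulated the equivalent ratio
$z=t/\sqrt{n-1}$), Student's density with $\nu=n-1=3$ degrees of
freedom is
\[
f_3(t)=\frac{\Gamma(2)}{\sqrt{3\pi}\,\Gamma\bigl(\frac32\bigr)}
\Bigl(1+\frac{t^2}{3}\Bigr)^{-2}
=\frac{2}{\pi\sqrt{3}}\Bigl(1+\frac{t^2}{3}\Bigr)^{-2},
\]
using $\Gamma(2)=1$ and $\Gamma\bigl(\frac32\bigr)=\sqrt{\pi}/2$. Its
tails decay like $9/t^4$, far more slowly than those of the normal
curve: the two-sided 95\% point is about 3.18, against 1.96 for the
normal, which illustrates why large-sample limits are too narrow when
$n$ is this small.\par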
This paper implicitly takes an inverse probability approach, although
there is no discussion of prior distributions. We encounter, for
instance, statements such as ``Thus, to take the tables for samples
of 6, the probability of the mean of the population lying between
$-\infty$ and once the standard deviation of the sample is 0.9622...''
(`Student', 1908a, p.~20). Jeffreys (q.v.) (1937) was later to point out
exactly how Gosset's result coincided with his own derivation based on
inverse probability. Interestingly, a treatment of the small-sample
theory of the mean from the inverse probability perspective
had appeared earlier, in a paper by Edgeworth (q.v.) (1883), who also
derived the $t$-distribution. Edgeworth's derivation, however, relied
heavily on the form of the prior distribution for $\mu$ and $\sigma$,
which he took to be $C\sigma^{-2}\,d\mu\,d\sigma$. Gosset
appears to have been unaware of this contribution of
Edgeworth. Welch (1958) provides a thorough discussion of
Edgeworth's 1883 paper and its connection to `Student's' own work
(see also Stigler, 1978). More recently, Pfanzagl and Sheynin (1996)
present evidence of an even earlier derivation, in 1876, of a
generalisation of the $t$-distribution by the German mathematician
Jakob L\"{u}roth, also from what we would call a Bayesian
perspective.\par
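The dependence on the prior, and Gosset's figure of 0.9622, can both be
checked with a short calculation (our own sketch, not taken from the
papers cited). Write $S=\sum_{i=1}^{n}(x_i-\bar{x})^2$ and take a prior
proportional to $\sigma^{-k}\,d\mu\,d\sigma$. Integrating $\sigma$ out
of the normal likelihood gives
\[
p(\mu\mid x)\propto\int_0^\infty\sigma^{-(n+k)}
\exp\Bigl\{-\frac{S+n(\mu-\bar{x})^2}{2\sigma^2}\Bigr\}\,d\sigma
\propto\Bigl(1+\frac{n(\mu-\bar{x})^2}{S}\Bigr)^{-(n+k-1)/2},
\]
a Student form with $\nu=n+k-2$ degrees of freedom. The prior
$d\mu\,d\sigma/\sigma$ ($k=1$), usually associated with Jeffreys, gives
$\nu=n-1$, matching Gosset's sampling result, whereas the prior
attributed to Edgeworth above ($k=2$) gives $\nu=n$. For Gosset's
example with $n=6$, a distance of one sample standard deviation
(divisor $n$) corresponds to $t=\sqrt{5}$ on $\nu=5$ degrees of
freedom. For odd $\nu$ the distribution function has a closed form;
with $\theta=\arctan(t/\sqrt{5})$,
\[
P(T_5\le t)=\frac12+\frac1\pi\Bigl[\theta+\sin\theta\cos\theta
\Bigl(1+\frac23\cos^2\theta\Bigr)\Bigr].
\]
At $t=\sqrt5$ we have $\theta=\pi/4$ and
$\sin\theta\cos\theta=\cos^2\theta=\frac12$, so
\[
P(T_5\le\sqrt5)=\frac12+\frac1\pi\Bigl(\frac\pi4+\frac23\Bigr)
=\frac34+\frac{2}{3\pi}\approx0.9622,
\]
in agreement with the value Gosset quotes.\par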
\par\vspace{.1in}
\noindent{\it The Correlation Coefficient}

In addition to the famous article establishing the $t$-distribution,
there appeared in 1908 another effort by `Student', this time dealing
with the correlation coefficient (`Student', 1908b). Gosset did not
succeed in deriving the sampling distribution of $r$; instead he
relied on the empirical method of the first paper to establish
properties of the distribution of $r$ when $\rho=0$. We note
that here the inverse probability approach is stated much more
explicitly than in Gosset's article on the sample mean; in
particular, he suggests various ``priors'' for $\rho$, the population
correlation. Since he could not give an explicit expression for
the sampling distribution $f(r\mid\rho)\,dr$, Gosset was also unable to
write down the posterior distribution of $\rho$. Subsequently, in
his first major contribution to mathematical statistics, R. A. Fisher
(1915) derived the sampling distribution of $r$ using a geometrical
argument, and this work led to the famous Fisher--Gosset
correspondence. In the same paper, Fisher also established that,
for normal data, the sample mean and standard deviation are
not only uncorrelated but independent.\par
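For reference, in modern notation (our summary, not a quotation from
either paper), Fisher's distribution of $r$ for a sample of $n$ pairs
from a bivariate normal population reduces, when $\rho=0$, to
\[
f(r\mid\rho=0)=\frac{(1-r^2)^{(n-4)/2}}
{B\bigl(\frac12,\frac{n-2}{2}\bigr)},\qquad -1<r<1,
\]
where $B$ is the beta function. Equivalently, $r\sqrt{(n-2)/(1-r^2)}$
has Student's distribution with $n-2$ degrees of freedom, so the null
case leads back to the distribution of the first 1908 paper. For
$n=4$, the sample size of Gosset's sampling experiment, the density is
simply uniform, equal to $\frac12$ on $(-1,1)$.\par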
\par\vspace{.2in}
\noindent
{\bf Gosset on Experimental Design}

R. A. Fisher's correspondence with Gosset began in 1912, when Fisher
sent Gosset a copy of his paper applying maximum likelihood (as it
would later come to be known) to estimate the mean and variance of a
normal population. They did not meet until a decade later, however,
when Gosset visited Rothamsted and presented Fisher with a copy of his
statistical tables. They continued to correspond on a variety of
topics and, in 1923, there was an exchange of letters between the two
on Fisher's work with Mackenzie on the design of experiments, in which
Gosset advocated the use of systematic field arrangements, in essence
rejecting Fisher's proposal for randomisation. Their disagreement on
the use of randomisation continued in private correspondence (see
various excerpts in Plackett and Barnard, 1990, Chapter 5), but it
could hardly be detected in Gosset's only public criticism of Fisher up
to that time, a published comment on the infamous Lanarkshire milk
experiment (`Student', 1931).

In 1936, however, the debate became public in the discussion of a paper
read before the Royal Statistical Society on `co-operation in
large-scale experiments'. Gosset led off the discussion by extolling
the virtues of Beaven's half-drill strip systematic design, and
Fisher, who spoke next, expressed his opposition to such systematic
designs. This was followed by a paper by Fisher and Barbacki
criticising `the supposed precision of systematic designs'
and an exchange of letters between Fisher and Gosset in {\em Nature}.
At the time of his death, Gosset was working on a detailed response to
Fisher in which he once again put forth his support of systematic
experimentation and expressed doubts about the role of randomisation.
After so many years, they had not resolved their differences on this
fundamental statistical issue. The paper appeared posthumously in
1938, and when he read it Fisher observed in a letter to Harold
Jeffreys:
\begin{quote}
``So far as I can judge, `Student' and I would have differed
quite inappreciably
on randomisation if we had seen enough of each other to
know exactly what
the other meant, and if he had not felt in duty bound,
not only to extol
the merits, but also to deny the defects of Beaven's half drill
strip system. \\
...
``I fancy also that Gosset never realised that a fertility
gradient when, as in
my experience is not very frequent,
it is important enough to bother about,
can easily be eliminated from a randomised experiment. It is,
I think, my fault
that I have not made this clear earlier, but until the last two
years I had really
thought that `Student' accepted all that I had put forward on
behalf of randomisation.'' (Bennett, 1990, pp.~271--272.)
\end{quote}
\vspace{.5cm}
\noindent {\bf References}

\noindent Bartlett, S. and Fisher, R. A. (1931). Pasteurized and raw milk.
{\em Nature}, {\bf 127}, 591--592.

\noindent Bennett, J. H. (1990). {\em Statistical Inference and Analysis:
Selected Correspondence of R. A. Fisher.} Oxford, pp.~271--272.

\noindent Fisher, R. A. (1915). Frequency distribution of the values of the
correlation coefficient in samples from an indefinitely large population.
{\em Biometrika}, {\bf 10}, 507--521.

\noindent Geary, R. C. (1936). The distribution of `Student's' ratio for
non-normal samples. {\em Journal of the Royal Statistical Society
Supplement} (superseded by {\em Series B}), {\bf 3}, 178--184.

\noindent Geary, R. C. (1947). Testing for normality. {\em Biometrika},
{\bf 34}, 209--242.

\noindent Jeffreys, H. (1937). On the relation between direct and inverse
methods in statistics. {\em Proceedings of the Royal Society of London,
Series A}, {\bf 160}, 325--348.

\noindent Pearson, E. S. (1929). The distribution of frequency constants
in small samples from non-normal symmetrical and skew populations.
{\em Biometrika}, {\bf 21}, 259--286.

\noindent Pearson, E. S. (1939). `Student' as statistician. {\em
Biometrika}, {\bf 30}, 210--250.

\noindent Pfanzagl, J. and Sheynin, O. (1996). Studies in the history
of probability and statistics XLIV: A forerunner of the
$t$-distribution. {\em Biometrika}, {\bf 83}, 891--898.

\noindent Plackett, R. L. and Barnard, G. A. (1990). {\em `Student': A
Statistical Biography of William Sealy Gosset. Based on the
writings of E. S. Pearson.} Oxford: Clarendon Press.

\noindent Stigler, S. (1978). Francis Ysidro Edgeworth, Statistician (with
Discussion). {\em Journal of the Royal Statistical Society, Series A},
{\bf 141}, 287--322.

\noindent `Student' (1908a). The probable error of a mean. {\em Biometrika},
{\bf 6}, 1--25.

\noindent `Student' (1908b). Probable error of a correlation coefficient.
{\em Biometrika}, {\bf 6}, 302--310.

\noindent `Student' (1931). The Lanarkshire milk experiment. {\em
Biometrika}, {\bf 23}, 398--406.

\noindent `Student' (1937). Random and balanced arrangements. {\em
Biometrika}, {\bf 29}, 363--379.

\noindent `Student' (1942). {\em `Student's' Collected Papers.} (ed. by
E. S. Pearson and J. Wishart), with a foreword by L. McMullen.
Biometrika Office, University College.

\noindent Welch, B. L. (1958). `Student' and small sample theory. {\em
Journal of the American Statistical Association}, {\bf 53},
777--788.

\vspace{1 cm}
\hfill{Stephen E. Fienberg and Nicole Lazar}
\end{document}