ANOVA

From Encyclopedia of Mathematics
 
''analysis of variance''
Here, ANOVA will be understood in the wide sense, i.e., equated to the univariate linear model whose model equation is
\begin{equation} \tag{a1} \bf y = X \beta + e, \end{equation}
  
in which $\mathbf{y}$ is an $n \times 1$ observable random vector, $\mathbf{X}$ is a known $( n \times m )$-matrix (the "design matrix"), $\beta$ is an $( m \times 1 )$-vector of unknown parameters, and $\mathbf{e}$ is an $( n \times 1 )$-vector of unobservable random variables $e _ { i }$ (the "errors") that are assumed to be independent and to have a [[Normal distribution|normal distribution]] with mean $0$ and unknown variance $\sigma ^ { 2 }$ (i.e., the $e _ { i }$ are independent identically distributed $N ( 0 , \sigma ^ { 2 } )$). It is assumed throughout that $n > m$. Inference is desired on $\beta$ and $\sigma ^ { 2 }$. The $e _ { i }$ may represent measurement error and/or inherent variability in the experiment. The model equation (a1) can also be expressed in words by: $\mathbf{y}$ has independent normal elements $y _ { i }$ with common, unknown variance and expectation $\mathsf E ( \mathbf y ) = \mathbf X \beta$, in which $\mathbf{X}$ is known and $\beta$ is unknown. In most experimental situations the assumptions made on $\mathbf{e}$ should be regarded as an approximation, though often a good one. Studies on some of the effects of deviations from these assumptions can be found in [[#References|[a48]]], Chap. 10, and [[#References|[a51]]] discusses diagnostics and remedies for lack of fit in linear regression models. To a certain extent the ANOVA ideas have been carried over to discrete data, then called the log-linear model; see [[#References|[a6]]], and [[#References|[a10]]].
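A minimal numerical sketch of (a1): least squares gives the usual estimates of $\beta$ and $\sigma ^ { 2 }$. The data below are simulated; the sizes and the "true" values of $\beta$ and $\sigma$ are arbitrary choices for illustration, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model (a1): y = X beta + e, with the e_i i.i.d. N(0, sigma^2).
n, m = 20, 3
X = rng.standard_normal((n, m))        # an illustrative full-rank design matrix
beta = np.array([1.0, -2.0, 0.5])      # "true" parameters (simulated)
sigma = 0.1
y = X @ beta + sigma * rng.standard_normal(n)

# Least-squares estimate of beta and the usual unbiased estimate of sigma^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - m)   # divide by n - m, not n

print(beta_hat, sigma2_hat)
```

With $n > m$ and $\mathbf{X}$ of full column rank, `beta_hat` recovers $\beta$ up to noise of order $\sigma / \sqrt { n }$.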
  
MANOVA (multivariate analysis of variance) is the multivariate generalization of ANOVA. Its model equation is obtained from (a1) by replacing the column vectors $\mathbf{y} , \beta , \mathbf{e}$ by matrices $\mathbf{Y} , \mathbf{B} , \mathbf{E}$ to obtain
  
\begin{equation} \tag{a2} \bf Y = X B + E, \end{equation}
  
where $\mathbf{Y}$ and $\mathbf{E}$ are $n \times p$, $\mathbf{B}$ is $m \times p$, and $\mathbf{X}$ is as in (a1). The assumption on $\mathbf{E}$ is that its $n$ rows are independent identically distributed $N ( 0 , \Sigma )$, i.e., the common distribution of the independent rows is $p$-variate normal with $0$ mean and $p \times p$ non-singular covariance matrix $\Sigma$.
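The standard least-squares estimate of $\mathbf{B}$ in (a2) acts column by column, and $\Sigma$ is estimated from the residual cross-product matrix. A sketch on simulated data (all sizes and parameter values here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Model (a2): Y = X B + E, rows of E i.i.d. p-variate N(0, Sigma).
n, m, p = 30, 2, 3
X = rng.standard_normal((n, m))
B = rng.standard_normal((m, p))                      # "true" B (simulated)
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])
E = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y = X @ B + E

# Columnwise least squares: B_hat = (X'X)^{-1} X'Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
# Usual estimate of Sigma from the (n x p) residual matrix
R = Y - X @ B_hat
Sigma_hat = R.T @ R / (n - m)

print(B_hat.shape, Sigma_hat.shape)
```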
  
 
GMANOVA (generalized multivariate analysis of variance) generalizes the model equation (a2) of MANOVA to
  
\begin{equation} \tag{a3} \mathbf{Y} = \mathbf{X} _ { 1 } \mathbf{BX} _ { 2 } + \mathbf{E}, \end{equation}
  
in which $\mathbf{E}$ is as in (a2), $\mathbf{X} _ { 1 }$ plays the role of $\mathbf{X}$ in (a2), $\mathbf{B}$ is $m \times s$, and $\mathbf{X} _ { 2 }$ is an $( s \times p )$ second design matrix.
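The dimensional bookkeeping in (a3) can be checked with a short simulation (all sizes here are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Model (a3): Y = X1 B X2 + E, dimensions as in the text.
n, m, s, p = 15, 2, 3, 4
X1 = rng.standard_normal((n, m))     # first design matrix, as X in (a2)
B = rng.standard_normal((m, s))      # (m x s) parameter matrix
X2 = rng.standard_normal((s, p))     # second design matrix, (s x p)
E = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
Y = X1 @ B @ X2 + E                  # (n x m)(m x s)(s x p) -> (n x p)

print(Y.shape)
```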
  
 
Logically, it would seem that it suffices to deal only with (a3), since (a2) is a special case of (a3), and (a1) of (a2). This turns out to be impossible and it is necessary to treat the three topics in their own right. This will be done below. For unexplained terms in the fields of estimation and testing hypotheses, see [[#References|[a30]]], [[#References|[a31]]] (and also [[Statistical hypotheses, verification of|Statistical hypotheses, verification of]]; [[Statistical estimation|Statistical estimation]]).
 
This field is very large, well-developed, and well-documented. Only a brief outline is given here; see the references for more detail. An excellent introduction to the essential elements of the field is [[#References|[a48]]] and a short history is given in [[#References|[a47]]], Sect. 2. Brief descriptions are also given in [[#References|[a56]]], headings Anova; General Linear Model. Other references are [[#References|[a49]]], [[#References|[a50]]], [[#References|[a43]]], [[#References|[a26]]], and [[#References|[a15]]]. A collection of survey articles on many aspects of ANOVA (and of MANOVA and GMANOVA) can be found in [[#References|[a14]]].
  
In (a1) it is assumed that the parameter vector $\beta$ is fixed (even though unknown). This is called a fixed effects model, or Model I. In some experimental situations it is more appropriate to consider $\beta$ random and inference is then about parameters in the distribution of $\beta$. This is called a random effects model, or Model II. It is called a mixed model if some elements of $\beta$ are fixed, others random. There are also various randomization models that are not described by (a1). For reasons of space limitation, only the fixed effects model will be treated here. For the other models see [[#References|[a48]]], Chaps. 7, 8, 9.
  
The name "analysis of variance" was coined by R.A. Fisher, who developed statistical techniques for dealing with agricultural experiments; see [[#References|[a48]]], Sect. 1.1: references to Fisher. As a typical example, consider the two-way layout for the simultaneous study of two different factors, for convenience denoted by $\mathbf{A}$ and $\operatorname{B}$, on the measurement of a certain quantity. Let $\mathbf{A}$ have levels $i = 1 , \ldots , I$, and let $\operatorname{B}$ have levels $j = 1 , \ldots , J$. For each $( i , j )$ combination, measurements $y _ { i j k }$, $k = 1 , \ldots , K$, are made. For instance, in a study of the effects of different varieties and different fertilizers on the yield of tomatoes, let $y _ { i j k }$ be the weight of ripe tomatoes from plant $k$ of variety $i$ using fertilizer $j$. The model equation is
  
\begin{equation} \tag{a4} y _ { i j k } = \mu + \alpha _ { i } + \beta _ { j } + \gamma _ { i j } + e _ { i j k }, \end{equation}
  
and it is assumed that the $e _ {i j k }$ are independent identically distributed $N ( 0 , \sigma ^ { 2 } )$. This is of the form (a1) after the $y _ { i j k }$ and $e _ {i j k }$ are strung out to form the column vectors $\mathbf{y}$ and $\mathbf{e}$ of (a1) with $n = I J K$; similarly, the parameters on the right-hand side of (a4) form an $( m \times 1 )$-vector $\beta$, with $m = 1 + I + J + I J$; finally, $\mathbf{X}$ in (a1) has one column for each of the $m$ parameters, and in row $( i , j , k )$ of $\mathbf{X}$ there is a $1$ in the columns for $\mu$, $\alpha_i$, $\beta_j$, and $\gamma _ { i j }$, and $0$s elsewhere. Some of the customary terminology is as follows. Each $( i , j )$ combination is a cell. In the example (a4), each cell has the same number $K$ of observations (balanced design); in general, the cell numbers need not be equal. The parameters on the right-hand side of (a4) are called the effects: $\mu$ is the general mean, the $\alpha$s are the main effects for factor $\mathbf{A}$, the $\beta$s for $\operatorname{B}$, and the $\gamma$s are the interactions.
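The construction of $\mathbf{X}$ for the two-way layout (a4) can be sketched directly, with small illustrative values of $I$, $J$, $K$. Note that this parametrization makes $\mathbf{X}$ rank deficient (its rank is the number of cells, $I J$, not $m$), which is why side conditions or estimable functions are used in practice.

```python
import numpy as np

# Design matrix of (a1) for the two-way layout (a4): one column for mu,
# for each alpha_i, each beta_j, and each gamma_ij; each row (i, j, k)
# has a 1 in the columns for mu, alpha_i, beta_j, gamma_ij, 0s elsewhere.
I, J, K = 2, 3, 2                       # small illustrative sizes
n, m = I * J * K, 1 + I + J + I * J     # n = IJK, m = 1 + I + J + IJ

X = np.zeros((n, m))
row = 0
for i in range(I):
    for j in range(J):
        for k in range(K):
            X[row, 0] = 1                       # mu
            X[row, 1 + i] = 1                   # alpha_i
            X[row, 1 + I + j] = 1               # beta_j
            X[row, 1 + I + J + i * J + j] = 1   # gamma_ij
            row += 1

print(X.sum(axis=1))              # every row has exactly four 1s
print(np.linalg.matrix_rank(X))   # IJ = 6, less than m = 12
```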
  
The extension to more than two factors is immediate. There are then potentially more types of interactions; e.g., in a three-way layout there are three types of two-factor interactions and one type of three-factor interactions. Layouts of this type are called factorial, and completely crossed if there is at least one observation in each cell. The latter may not always be feasible for practical reasons if the number of cells is large. In that case it may be necessary to restrict observations to only a fraction of the cells and assume certain interactions to be $0$. The judicious choice of this is the subject of design of experiments; see [[#References|[a26]]], [[#References|[a15]]].
  
A different type of experiment involves regression. In the simplest case the measurement $y$ of a certain quantity may be modelled as $y = \alpha + \beta t + \text{error}$, where $\alpha$ and $\beta$ are unknown real-valued parameters and $t$ is the value of some continuously measurable quantity such as time, temperature, distance, etc. This is called linear regression (i.e., linear in $t$). More generally, there could be an arbitrary polynomial in $t$ on the right-hand side. As an example, assume quadratic regression and suppose $t$ denotes time. Let $y _ { i }$ be the measurement on $y$ at time $t_i$, $i = 1 , \dots , n$. The model equation is $y _ { i } = \alpha + \beta t _ { i } + \gamma t_{i} ^ { 2 } + e _ { i }$, which is of the form (a1) with $( \alpha , \beta , \gamma ) ^ { \prime } = \beta$ of (a1). The matrix $\mathbf{X}$ of (a1) has three columns corresponding to $\alpha$, $\beta$, and $\gamma$; the $i$th row of $\mathbf{X}$ is $( 1 , t _ { i } , t _ { i } ^ { 2 } )$. Functions of $t$ other than polynomials are sometimes appropriate. Frequently, $t$ is referred to as a regressor variable or independent variable, and $y$ as the dependent variable. Instead of one regressor variable there may be several (multiple regression).
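The quadratic-regression design matrix just described, with $i$th row $( 1 , t _ { i } , t _ { i } ^ { 2 } )$, can be assembled and fitted as follows; the time points and parameter values are invented for illustration.

```python
import numpy as np

# Quadratic regression as a case of (a1): the i-th row of X is (1, t_i, t_i^2).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # illustrative time points
X = np.column_stack([np.ones_like(t), t, t ** 2])

# Simulate responses from y_i = alpha + beta t_i + gamma t_i^2 + e_i
rng = np.random.default_rng(2)
alpha, beta, gamma = 1.0, 0.5, -0.2            # "true" values (simulated)
y = alpha + beta * t + gamma * t ** 2 + 0.01 * rng.standard_normal(t.size)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # estimates (alpha, beta, gamma)
print(coef)
```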
  
Factors such as $t$ above whose values can be measured on a continuous scale are called quantitative. In contrast, categorical variables (e.g., variety of tomato) are called qualitative. A quantitative factor $t$ may be treated qualitatively if the experiment is conducted at several values, say $t _ { 1 } , t _ { 2 } , \ldots$, but these are only regarded as levels $i = 1,2 , \dots$ of the factor whereas the actual values $t _ { 1 } , t _ { 2 } , \ldots$ are ignored. The name analysis of variance is often reserved for models that have only factors that are qualitative or treated qualitatively. In contrast, regression analysis has only quantitative factors. Analysis of covariance covers models that have both kinds of factors. See [[#References|[a48]]], Chap. 6, for more detail.
  
Another important distinction involving factors is between the notions of crossing and nesting. Two factors $\mathbf{A}$ and $\operatorname{B}$ are crossed if each level of $\mathbf{A}$ can occur with each level of $\operatorname{B}$ (completely crossed if there is at least one observation for each combination of levels, otherwise incompletely or partly crossed). For instance, in the tomato example of the two-way layout (a4), the two factors are crossed since each variety $i$ can be grown with any fertilizer $j$. In contrast, factor $\operatorname{B}$ is said to be nested within factor $\mathbf{A}$ if every level of $\operatorname{B}$ can only occur with one level of $\mathbf{A}$. For instance, suppose two different manufacturing processes (factor $\mathbf{A}$) for the production of cords have to be compared. From each of the two processes several cords are chosen (factor $\operatorname{B}$), each cord cut into several pieces and the breaking strength of each piece measured. Here each cord goes only with one of the processes so that $\operatorname{B}$ is nested within $\mathbf{A}$. Nested factors should be treated more realistically as random. However, for the analysis it is necessary to analyze the corresponding fixed effects model first. See [[#References|[a48]]], Sect. 5.3, for more examples and detail.
  
 
===Estimation and testing hypotheses.===
 
The main interest is in inference on linear functions of the parameter vector $\beta$ of (a1), called parametric functions, i.e., functions of the form $\psi = \mathbf{c} ^ { \prime } \beta$, with $\mathbf{c}$ of order $m \times 1$. Usually one requires point estimators (cf. also [[Point estimator|Point estimator]]) of such $\psi$s to be unbiased (cf. also [[Unbiased estimator|Unbiased estimator]]). Of particular interest are the elements of the vector $\beta$. However, there is a complication arising from the fact that the design matrix $\mathbf{X}$ in (a1) may be of less than maximal rank (the columns can be linearly dependent). This happens typically in analysis of variance models (but not usually in regression models). For instance, in the two-way layout (a4) the sum of the columns for the $\alpha_i$ equals the column for $\mu$. If $\mathbf{X}$ is of less than full rank, then the elements of $\beta$ are not identifiable in the sense that even if the error vector $\mathbf{e}$ in (a1) were $0$, so that $\mathbf{X} \beta$ is known, there is no unique solution for $\beta$. A fortiori the elements of $\beta$ do not possess unbiased estimators. Yet, there are parametric functions that do have an unbiased estimator; they are called estimable. It is easily shown that $\mathbf{c} ^ { \prime } \beta$ is estimable if and only if $\mathbf{c} ^ { \prime }$ is in the row space of $\mathbf{X}$ (see [[#References|[a48]]], Sect. 1.4). In particular, if one sets $\mathsf E ( y _ { i } ) = \eta _ { i }$ and takes $\mathbf{c} ^ { \prime }$ to be the $i$th row of $\mathbf{X}$, then $\mathbf{c} ^ { \prime } \beta = \eta_{i}$ is estimable. Thus, $\psi$ is estimable if and only if it is a linear combination of the elements of $\eta = \mathsf E ( \mathbf y )$.
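The row-space criterion for estimability can be checked numerically: $\mathbf{c}'\beta$ is estimable if and only if appending $\mathbf{c}'$ as a row does not increase the rank of $\mathbf{X}$. A sketch (using a hypothetical rank-deficient one-way layout, not a design from the article):

```python
import numpy as np

def is_estimable(X, c):
    """c'beta is estimable iff c' lies in the row space of X,
    i.e. appending c' as an extra row does not raise the rank."""
    r = np.linalg.matrix_rank(X)
    return np.linalg.matrix_rank(np.vstack([X, c])) == r

# One-way layout with overall mean mu and two group effects a1, a2:
# columns (mu, a1, a2); column 1 = column 2 + column 3, so rank(X) = 2 < 3.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)

print(is_estimable(X, [0, 1, 0]))   # a1 alone: not estimable
print(is_estimable(X, [1, 1, 0]))   # mu + a1 = E(y in group 1): estimable
print(is_estimable(X, [0, 1, -1]))  # contrast a1 - a2: estimable
```

The estimable functions here are exactly the linear combinations of the two cell expectations $\mu+\alpha_1$ and $\mu+\alpha_2$, in line with the characterization via $\eta = \mathsf E(\mathbf y)$.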
  
The complication presented by a design matrix $\mathbf{X}$ that is not of full rank may be handled in several ways. First, a re-parametrization with fewer parameters and fewer columns of $\mathbf{X}$ is possible. Second, a popular way is to impose side conditions on the parameters that make them unique. For instance, in the two-way layout (a4) often-used side conditions are: $\sum \alpha _ { i } = 0$, or, equivalently, $\alpha_{.} = 0$ (where dotting on a subscript means averaging over that subscript); similarly, $\beta_{.} = 0$, and $\gamma _ { i . } = 0$ for all $i$, $\gamma _ { . j } = 0$ for all $j$. Then all parameters are estimable and (for instance) the hypothesis $\mathcal{H} _ { \text{A} }$ that all main effects of factor $\mathbf{A}$ are $0$ can be expressed by: All $\alpha_i$ are equal to zero. A third way of dealing with an $\mathbf{X}$ of less than full rank is to express all questions of inference in terms of estimable parametric functions. For instance, if in (a4) one writes $\eta _ { i j } = \mu + \alpha _ { i } + \beta _ { j } + \gamma _ { i j }$ ($= \mathsf{E} ( y _ { i j k } )$), then all $\eta_{ij}$ are estimable and $\mathcal{H} _ { \text{A} }$ can be expressed by stating that all $\eta_{ i .}$ are equal, or, equivalently, that all $\eta _ { i . } - \eta _ { . . }$ are equal to zero.
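The sum-to-zero side conditions determine the parameters uniquely from the cell expectations. A small sketch (the $2\times3$ table of cell means below is hypothetical, and happens to be additive, so the interactions come out zero):

```python
import numpy as np

# Hypothetical cell means eta_ij = E(y_ijk) for a 2 x 3 two-way layout.
eta = np.array([[4.0,  6.0, 5.0],
                [8.0, 10.0, 9.0]])

mu = eta.mean()                       # eta_..
alpha = eta.mean(axis=1) - mu         # eta_i. - eta_..
beta = eta.mean(axis=0) - mu          # eta_.j - eta_..
gamma = eta - mu - alpha[:, None] - beta[None, :]

# The side conditions hold: every effect averages to zero ...
print(np.isclose(alpha.sum(), 0.0), np.isclose(beta.sum(), 0.0))
print(np.allclose(gamma.sum(axis=0), 0.0), np.allclose(gamma.sum(axis=1), 0.0))
# ... and the decomposition reconstructs the cell means exactly.
print(np.allclose(mu + alpha[:, None] + beta[None, :] + gamma, eta))
```

Under these constraints $\mathcal{H}_{\text{A}}$ ("all $\alpha_i=0$") is equivalent to equality of the row means $\eta_{i.}$.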
  
Another type of estimator that always exists is a least-squares estimator (LSE; cf. also [[Least squares, method of|Least squares, method of]]). A least-squares estimator of $\beta$ is any vector $\mathbf{b}$ minimizing $\| \mathbf{y} - \mathbf{Xb} \| ^ { 2 }$. A minimizing $\mathbf{b}$ (unique if and only if $\mathbf{X}$ is of full rank) is denoted by $\hat{\beta}$ and satisfies the normal equations
  
\begin{equation} \tag{a5} \mathbf{X} ^ { \prime } \mathbf{X} \widehat { \beta } = \mathbf{X} ^ { \prime } \mathbf{y} . \end{equation}
  
If $\psi = \mathbf{c} ^ { \prime } \beta$ is estimable, then $\hat { \psi } = \mathbf{c} ^ { \prime } \hat { \beta }$ is unique (even when $\hat{\beta}$ is not) and is called the least-squares estimator of $\psi$. By the Gauss–Markov theorem (cf. also [[Least squares, method of|Least squares, method of]]), $\widehat { \psi }$ is the minimum variance unbiased estimator of $\psi$. See [[#References|[a48]]], Sect. 1.4.
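The uniqueness of $\hat\psi$ can be seen numerically: with a rank-deficient $\mathbf{X}$ the normal equations (a5) have many solutions, yet every solution gives the same value of $\mathbf{c}'\hat\beta$ for an estimable $\mathbf{c}'\beta$. A sketch with hypothetical data on the one-way layout:

```python
import numpy as np

# Rank-deficient design: columns (mu, a1, a2); column 1 = column 2 + column 3.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
y = np.array([3.0, 5.0, 10.0, 12.0])  # group means 4 and 11

# One solution of the normal equations X'X b = X'y (the minimum-norm one).
b1 = np.linalg.pinv(X) @ y
# Adding a null-space vector of X gives another solution: X @ (1,-1,-1)' = 0.
b2 = b1 + np.array([1.0, -1.0, -1.0])

# Both satisfy the normal equations ...
print(np.allclose(X.T @ X @ b1, X.T @ y), np.allclose(X.T @ X @ b2, X.T @ y))
# ... and the LSE of the estimable contrast a1 - a2 is the same for both:
c = np.array([0.0, 1.0, -1.0])
print(c @ b1, c @ b2)  # both equal 4 - 11 = -7
```

By contrast, the non-estimable coordinate $\alpha_1$ itself differs between the two solutions.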
  
A linear hypothesis $\mathcal{H}$ consists of one or more linear restrictions on $\beta$:
  
\begin{equation} \tag{a6} \mathcal{H} : \mathbf{X} _ { 3 } \beta = 0 \end{equation}
  
with $\mathbf{X} _ { 3 }$ of order $q \times m$ and rank $q$. Then $\mathcal{H}$ is to be tested against the alternative $\mathbf{X} _ { 3 } \beta \neq 0$. Let $\operatorname{rank} ( \mathbf{X} ) = r$. The model (a1) together with $\mathcal{H}$ of (a6) can be expressed in geometric language as follows: The mean vector $\eta = \mathsf E ( \mathbf y )$ lies in a linear subspace $\Omega$ of $n$-dimensional space, spanned by the columns of $\mathbf{X}$, and $\mathcal{H}$ restricts $ \eta $ to a further subspace $\omega$ of $\Omega$, where $\operatorname { dim } ( \Omega ) = r$ and $\operatorname { dim } ( \omega ) = r - q$. Further analysis is simplified by a transformation to the canonical system, below.
  
 
===Canonical form.===
 
There is a transformation $\mathbf z = \Gamma \mathbf y $, with $\Gamma$ of order $n \times n$ and orthogonal, so that the model (a1) together with the hypothesis (a6) can be put in the following form (in which $z_1 , \dots ,z_n$ are the elements of $\mathbf z$ and $\zeta _ { i } = \mathsf{E} ( z _ { i } )$): $z_1 , \dots ,z_n$ are independent, normal, with common variance $\sigma ^ { 2 }$; $\zeta _ { r + 1 } = \ldots = \zeta _ { n } = 0$, and, additionally, $\mathcal{H}$ specifies $\zeta _ { 1 } = \ldots = \zeta _ { q } = 0$. Note that $\zeta _ { q  + 1} , \dots , \zeta _ { r }$ are unrestricted throughout. Any estimable parametric function can be expressed in the form $\psi = \sum _ { i = 1 } ^ { r } d _ { i } \zeta _ { i }$, with constants $d_{i}$, and the least-squares estimator of $\psi$ is $\hat { \psi } = \sum _ { i = 1 } ^ { r } d _ { i } z _ { i }$. To estimate $\sigma ^ { 2 }$ one forms the sum of squares for error $\operatorname{SS} _ { e } = \sum _ { i = r + 1 } ^ { n } z _ { i } ^ { 2 }$, and divides by $n - r$ ($=$ degrees of freedom for the error) to form the mean square $\operatorname{MS} _ { e } = \operatorname{SS} _ { e } / ( n - r )$. Then $ \operatorname{MS} _ { e }$ is an unbiased estimator of $\sigma ^ { 2 }$. A test of the hypothesis $\mathcal{H}$ can be obtained by forming $\operatorname{SS} _ { \mathcal{H} } = \sum _ { i = 1 } ^ { q } z _ { i } ^ { 2 }$, with degrees of freedom $q$, and $ \operatorname { MS } _{\mathcal{H}}=\operatorname {SS} _{\mathcal{H}} / q$. Then, if $\mathcal{H}$ is true, the test statistic $\mathcal{F} = \operatorname {MS} _ { \mathcal{H} } / \operatorname {MS}_{e}$ has an $F$-distribution with degrees of freedom $( q , n - r )$. For a test of $\mathcal{H}$ of level of significance $\alpha$ one rejects $\mathcal{H}$ if $\mathcal{F} > F _ { \alpha ; q , n - r}$ ($=$ the upper $\alpha$-point of the $F$-distribution with degrees of freedom $( q , n - r )$).
This is "the" $F$-test; it can be derived as a likelihood-ratio test (LR test) or as a uniformly most powerful invariant test (UMP invariant test) and has several other optimum properties; see [[#References|[a48]]], Sect. 2.10. For the power of the $F$-test, see [[#References|[a48]]], Sect. 2.8.
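Equivalently, without passing explicitly to the canonical coordinates, $\operatorname{SS}_e$ and $\operatorname{SS}_{\mathcal H}$ can be computed from the orthogonal projections onto $\Omega$ and $\omega$. A sketch (with hypothetical data, testing equality of three group means in a one-way layout, so $q = 2$, $r = 3$):

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.pinv(A)

# Hypothetical one-way layout: 3 groups of 4 observations each (n = 12).
rng = np.random.default_rng(0)
g = np.repeat([0, 1, 2], 4)
y = np.array([5.0] * 4 + [6.0] * 4 + [9.0] * 4) + rng.normal(0.0, 1.0, 12)

# Full model Omega: one mean per group; under H (omega): a single common mean.
X_full = np.equal.outer(g, [0, 1, 2]).astype(float)  # rank r = 3
X_red = np.ones((12, 1))                             # rank r - q = 1, so q = 2

n, r, q = 12, 3, 2
SS_e = y @ (np.eye(n) - proj(X_full)) @ y    # sum of squares for error
SS_H = y @ (proj(X_full) - proj(X_red)) @ y  # extra sum of squares due to H
F = (SS_H / q) / (SS_e / (n - r))
print(F)  # reject H at level alpha if F exceeds F_{alpha; 2, 9}
```

$\operatorname{SS}_{\mathcal H}$ computed this way coincides with the familiar between-groups sum of squares $\sum_i n_i(\bar y_i - \bar y)^2$ of elementary one-way ANOVA.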
  
 
===Simultaneous confidence intervals.===
 
Let $L$ be the linear space of all parametric functions of the form $\psi = \sum _ { i = 1 } ^ { q } d _ { i } \zeta _ { i }$, i.e., all $\psi$ that are $0$ if $\mathcal{H}$ is true. The $F$-test provides a way to obtain simultaneous confidence intervals for all $\psi \in L$ with confidence level $1 - \alpha$ (cf. also [[Confidence interval|Confidence interval]]). This is useful, for instance, in cases where $\mathcal{H}$ is rejected. Then any $\psi \in L$ whose confidence interval does not include $0$ is said to be "significantly different from 0" and can be held responsible for the rejection of $\mathcal{H}$. Observe that $q ^ { - 1 } \sum _ { i = 1 } ^ { q } ( z _ { i } - \zeta _ { i } ) ^ { 2 } / \operatorname{MS} _ { e }$ has an $F$-distribution with degrees of freedom $( q , n - r )$ (whether or not $\mathcal{H}$ is true) so that this quantity is $\leq F _ { \alpha ; q , n - r }$ with probability $1 - \alpha$. This inequality can be converted into a family of double inequalities and leads to the simultaneous confidence intervals
  
\begin{equation} \tag{a7} \mathsf{P} ( \widehat { \psi } - S \widehat { \sigma } _ { \widehat { \psi } } \leq \psi \leq \widehat { \psi } + S \widehat { \sigma } _ { \widehat { \psi } } , \forall \psi \in L ) = 1 - \alpha, \end{equation}
  
in which $S = ( q F _ { \alpha ; q , n - r } ) ^ { 1 / 2 }$ and $\hat { \sigma }_{ \hat { \psi }} = \| \mathbf{d} \| ( \text{MS} _ { e } ) ^ { 1 / 2 }$ is the square root of the unbiased estimator of the variance $\| \mathbf{d} \| ^ { 2 } \sigma ^ { 2 }$ of $\widehat { \psi } = \sum _ { i = 1 } ^ { q } d _ { i } z _ { i }$. Thus, the confidence interval for $\psi$ has endpoints $\hat { \psi } \pm S \ \hat { \sigma }_{ \hat { \psi }}$, and all $\psi \in L$ are covered by their confidence intervals simultaneously with probability $1 - \alpha$. Note that (a7) is stated without needing the canonical system so that the confidence intervals can be evaluated directly in the original system.
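The endpoints $\hat { \psi } \pm S \ \hat { \sigma }_{ \hat { \psi }}$ of (a7) are straightforward to compute. The following is an illustrative sketch, not part of the article; the function name is invented and `scipy` is assumed to be available:

```python
import numpy as np
from scipy.stats import f as f_dist

def scheffe_interval(z, d, ms_e, q, n_minus_r, alpha=0.05):
    """Scheffe-type simultaneous confidence interval for psi = sum d_i z_i.

    z         : estimates z_1, ..., z_q from the canonical system
    d         : coefficient vector defining the parametric function psi
    ms_e      : error mean square MS_e with n - r degrees of freedom
    """
    psi_hat = float(np.dot(d, z))
    # S = (q * F_{alpha; q, n-r})^{1/2}, the half-width factor in (a7)
    S = np.sqrt(q * f_dist.ppf(1.0 - alpha, q, n_minus_r))
    # estimated standard deviation of psi_hat: ||d|| * sqrt(MS_e)
    sigma_psi_hat = np.linalg.norm(d) * np.sqrt(ms_e)
    return psi_hat - S * sigma_psi_hat, psi_hat + S * sigma_psi_hat
```

All $\psi \in L$ may be run through this at once; by (a7) the resulting intervals cover their targets simultaneously with probability $1 - \alpha$.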
  
With help of (a7) the $F$-test can also be expressed as follows: $\mathcal{H}$ is accepted if and only if all confidence intervals with endpoints $\hat { \psi } \pm S \ \hat { \sigma }_{ \hat { \psi }}$ cover the value $0$. More generally, it is convenient to make the following definition: a test of a hypothesis $\mathcal{H}$ is exact with respect to a family of simultaneous confidence intervals for a family of parametric functions if $\mathcal{H}$ is accepted if and only if the confidence interval of every $\psi$ in the family includes the value of $\psi$ specified by $\mathcal{H}$; see [[#References|[a52]]], [[#References|[a53]]]. Thus, the $F$-test is exact with respect to the simultaneous confidence intervals (a7).
  
The confidence intervals obtained in (a7) are called Scheffé-type simultaneous confidence intervals. Shorter confidence intervals of Tukey-type within a smaller class of parametric functions are possible in some designs. This is applicable, for instance, in the two-way layout of (a4) with equal cell numbers if only differences between the $\alpha_i$ are considered important rather than all parametric functions that are $0$ under $\mathcal{H} _ { \text{A} }$ (so-called contrasts). See [[#References|[a48]]], Sect. 3.6.
  
The canonical system is very useful to derive formulas and prove properties in a unified way, but it is usually not advisable in any given linear model to carry out the transformation $\mathbf z = \Gamma \mathbf y $ explicitly. Instead, the necessary expressions can be derived in the original system. For instance, if $\hat { \eta } _ { \Omega }$ and $\hat { \eta } _ { \omega }$ are the orthogonal projections of $\mathbf{y}$ on $\Omega$ and on $\omega$, respectively, then $\operatorname {SS} _ { e } = \| \mathbf{y} - \hat { \eta } _ { \Omega } \| ^ { 2 }$ and $\operatorname {SS} _ { \mathcal H } = \| \hat { \eta } _ { \Omega } - \hat { \eta } _ { \omega } \| ^ { 2 }$. These projections can be found by solving the normal equations (a5) (and one gets, for instance, $\hat { \eta } _ { \Omega } = \mathbf{X} \hat { \beta }$), or by minimizing quadratic forms. As an example of the latter: In the two-way layout (a4), minimize $\sum _ { i j k } ( y _ { i j k } - \eta _ { i j } ) ^ { 2 }$ over the $\eta_{ij}$. This yields $\hat { \eta } _ { i j } = y _ { i j . }$, so that $\operatorname{SS} _ { e } = \sum _ { i j k } ( y _ { i j k } - y _ { i j .} ) ^ { 2 }$. If desired, formulas can be expressed in vector and matrix form. As an example, if $\mathbf{X}$ is of maximal rank, then (a5) yields $\hat { \beta } = ( \mathbf{X} ^ { \prime } \mathbf{X} ) ^ { - 1 } \mathbf{X} ^ { \prime } \mathbf{y}$ and $\operatorname {SS} _ { e } = \mathbf{y} ^ { \prime } ( \mathbf{I} _ { n } - \mathbf{X} ( \mathbf{X} ^ { \prime } \mathbf{X} ) ^ { - 1 } \mathbf{X} ^ { \prime } ) \mathbf{y}$. Similar expressions hold under $\mathcal{H}$ after replacing $\mathbf{X}$ by a matrix whose columns span $\omega$. If $\mathbf{X}$ is not of maximal rank, then a generalized inverse may be employed. See [[#References|[a43]]], Sect. 4a.3, and [[#References|[a45]]].
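The matrix formulas of the full-rank case can be checked numerically; the following is a minimal sketch with arbitrary simulated data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 3
X = rng.standard_normal((n, r))           # full-rank design matrix
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(n)

# Normal equations (X of maximal rank): beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
eta_hat = X @ beta_hat                    # orthogonal projection of y on Omega
ss_e = float(np.sum((y - eta_hat) ** 2))  # SS_e = ||y - eta_hat||^2

# The same SS_e via the projection form y'(I_n - X(X'X)^{-1}X')y
P = X @ np.linalg.solve(X.T @ X, X.T)
ss_e_proj = float(y @ (np.eye(n) - P) @ y)
```

The two values of $\operatorname{SS}_e$ agree, illustrating that the residual-sum and projection forms are the same quantity.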
  
 
==MANOVA.==
 
There are several good textbooks on multivariate analysis that treat various aspects of MANOVA. Among the major ones are [[#References|[a1]]], [[#References|[a8]]], [[#References|[a19]]], [[#References|[a29]]], [[#References|[a36]]], [[#References|[a41]]], and [[#References|[a43]]], Chap. 8. See also [[#References|[a56]]], headings Multivariate Analysis; Multivariate Analysis Of Variance, and [[#References|[a14]]]. The ideas involved in MANOVA are essentially the same as in ANOVA, but there is an added dimension in that the observations are now multivariate. For instance, if measurements are made on $p$ different features of the same individual, then this should be regarded as one observation on a $p$-variate distribution. The MANOVA model is given by (a2). A linear hypothesis on $\mathbf{B}$ analogous to (a6) is
  
\begin{equation} \tag{a8} \mathcal{H} : \mathbf{X} _ { 3 } \mathbf{B} = 0, \end{equation}
  
with $\mathbf{X} _ { 3 }$ as in (a6). Any ANOVA testing problem defined by the choice of $\mathbf{X}$ in (a1) and $\mathbf{X} _ { 3 }$ in (a6) carries over to the same kind of problem given by (a2) and (a8). However, since $\mathbf{B}$ is a matrix, there are other ways than (a8) of formulating a linear hypothesis. The most obvious extension of (a8) is
  
\begin{equation} \tag{a9} \mathcal {H} : {\bf X} _ { 3 } {\bf B X} _ { 4 } = 0, \end{equation}
  
in which $\mathbf{X}_{4}$ is a known $( p \times p _ { 1 } )$-matrix of rank $p _ { 1 }$. However, (a9) can be reduced to (a8) by making the transformation $\mathbf{Z} = \mathbf{Y X}_4$, of order $n \times p _ { 1 }$, $\Gamma = \mathbf{B} \mathbf{X}_4$, $\mathbf{F} = \mathbf{EX}_4$; then the model is ${\bf Z = X} \Gamma + \bf F$, with the rows of $\mathbf{F}$ independent identically distributed $N ( 0 , \Sigma _ { 1 } )$, $\Sigma _ { 1 } = \mathbf{X} _ { 4 } ^ { \prime } \Sigma \mathbf{X} _ { 4 }$, and $\mathcal{H} : \mathbf{X} _ { 3 } \Gamma = 0$. Thus, the transformed problem is as (a2), (a8), with $\mathbf{Z} , \Gamma , \mathbf{F}$ replacing $\mathbf{Y} , \mathbf{B} , \mathbf{E}$. This can be applied, for instance, to profile analysis; see [[#References|[a29]]], Sect. 5.4 (A5), [[#References|[a36]]], Sects. 4.6, 5.6.
  
There is a canonical form of the MANOVA testing problem (a2), (a8) analogous to the ANOVA problem (a1), (a6), the difference being that the real-valued random variables $z_i$ of ANOVA are replaced by $1 \times p$ random vectors. These vectors form the rows of three random matrices, $\mathbf{Z} _ { 1 }$ of order $q \times p$, $\mathbf{Z}_{2}$ of order $( r - q ) \times p$, and $\mathbf{Z}_{3}$ of order $( n - r ) \times p$, all of whose rows are assumed independent and $p$-variate normal with common non-singular covariance matrix $\Sigma$; furthermore, $\mathsf{E} ( \mathbf{Z} _ { 3 } ) = 0$, $\mathsf{E} ( \mathbf Z _ { 2 } )$ is unspecified, and $\mathcal{H}$ specifies $\mathsf{E} ( {\bf Z} _ { 1 } ) = 0$. It is assumed that $n - r \geq p$. Put $\mathsf E ( \mathbf Z _ { 1 } ) = \Theta$, so that $\mathbf{Z} _ { 1 }$ is an unbiased estimator of $\Theta$. For testing $\mathcal{H} : \Theta = 0$, $\mathbf{Z}_{2}$ is ignored and the sums of squares $\text{SS} _ { \mathcal{H} }$ and $\text{SS} _ { e }$ of ANOVA are replaced by the $( p \times p )$-matrices $\mathbf{M} _ { \mathcal{H} } = \mathbf{Z} _ { 1 } ^ { \prime }\mathbf{ Z} _ { 1 }$ and $\mathbf{M} _ { \mathsf{E} } = \mathbf{Z} _ { 3 } ^ { \prime } \mathbf{Z} _ { 3 }$, respectively. An application of sufficiency plus the principle of invariance restricts tests of $\mathcal{H}$ to those that depend only on the positive characteristic roots of $\mathbf{M} _ { \mathcal{H} } \mathbf{M} _ { \mathsf{E} } ^ { - 1 }$ ($=$ the positive characteristic roots of $\mathbf{Z} _ { 1 } \mathbf{M} _ { \mathsf{E} } ^ { - 1 } \mathbf{Z} _ { 1 } ^ { \prime }$). The case $q = 1$, when $\mathbf{Z} _ { 1 }$ is a row vector, deserves special attention. It arises, for instance, when testing for zero mean in a single multivariate population or testing the equality of means in two such populations. 
Then $F = \mathbf{Z} _ { 1 } \mathbf{M} _ {  \mathsf{E} } ^ { - 1 } \mathbf{Z} _ { 1 } ^ { \prime }$ is the only positive characteristic root; $( n - r ) F$ is called Hotelling's $T ^ { 2 }$, and $p ^ { - 1 } ( n - r - p + 1 ) F$ has an $F$-distribution with degrees of freedom $( p , n - r - p + 1 )$, central or non-central according as $\mathcal{H}$ is true or false. Rejecting $\mathcal{H}$ for large values of $F$ is uniformly most powerful invariant. If $q \geq 2$ there is no best way of combining the $q$ characteristic roots, so that there is no uniformly most powerful invariant test (unlike there is in ANOVA). The following tests have been proposed:
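The computation of Hotelling's $T ^ { 2 }$ for $q = 1$ can be sketched numerically as follows (the data are arbitrary and merely illustrate the formulas; `scipy` is assumed available):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(1)
p, n_minus_r = 4, 30                       # dimension p and error degrees of freedom n - r
Z1 = rng.standard_normal((1, p))           # canonical row vector (q = 1)
Z3 = rng.standard_normal((n_minus_r, p))   # error part of the canonical system
M_E = Z3.T @ Z3                            # (p x p) error matrix

# F = Z1 M_E^{-1} Z1' is the only positive characteristic root
F_root = (Z1 @ np.linalg.solve(M_E, Z1.T)).item()
T2 = n_minus_r * F_root                    # Hotelling's T^2
# p^{-1}(n - r - p + 1) F has an F-distribution with (p, n - r - p + 1) d.f. under H
F_stat = (n_minus_r - p + 1) / p * F_root
p_value = f_dist.sf(F_stat, p, n_minus_r - p + 1)
```

Rejecting for large `F_stat` (small `p_value`) is the uniformly most powerful invariant test described above.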
  
reject $\mathcal{H}$ if $\operatorname { det } \mathbf{M} _ { \mathsf{E} } / \operatorname { det } ( \mathbf{M} _ { \mathcal{H} } + \mathbf{M} _ { \mathsf{E} } ) < \text{const}$ (Wilks LR test);
  
reject $\mathcal{H}$ if the largest characteristic root of $\mathbf{M} _ { \mathcal{H} } \mathbf{M} _ { \mathsf{E} } ^ { - 1 }$ exceeds a constant (Roy's test);
  
reject $\mathcal{H}$ if $\operatorname{tr}( \mathbf{M} _ { \mathcal{H} } \mathbf{M} _ { \mathsf{E} } ^ { - 1 } ) > \text{const}$ (Lawley–Hotelling test);
  
reject $\mathcal{H}$ if $\operatorname { tr } ( \mathbf{M} _ { \mathcal{H} } ( \mathbf{M} _ { \mathcal{H} } + \mathbf{M} _ { \mathsf{E} } ) ^ { - 1 } ) > \text{const}$ (Bartlett–Nanda–Pillai test). For references, see [[#References|[a1]]], Sects. 8.3, 8.6, or [[#References|[a36]]], Chap. 5. For distribution theory, see [[#References|[a1]]], Sects. 8.4, 8.6, [[#References|[a41]]], Sects. 10.4–10.6, [[#References|[a55]]], Sect. 10.3. Tables and charts can be found in [[#References|[a1]]], Appendix, and [[#References|[a36]]], Appendix.
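All four statistics are functions of the positive characteristic roots of $\mathbf{M} _ { \mathcal{H} } \mathbf{M} _ { \mathsf{E} } ^ { - 1 }$, so they can be computed together; the following is an illustrative sketch (the function name is invented):

```python
import numpy as np

def manova_statistics(M_H, M_E):
    """The four classical MANOVA test statistics from the (p x p) matrices
    M_H and M_E, all expressed through the characteristic roots of M_H M_E^{-1}."""
    roots = np.linalg.eigvals(M_H @ np.linalg.inv(M_E)).real
    roots = roots[roots > 1e-12]                   # keep the positive roots
    return {
        # Wilks: det(M_E)/det(M_H + M_E) = prod 1/(1 + root); reject for small values
        "wilks": float(np.prod(1.0 / (1.0 + roots))),
        # Roy: largest characteristic root; reject for large values
        "roy": float(roots.max()),
        # Lawley-Hotelling: tr(M_H M_E^{-1}) = sum of the roots
        "lawley_hotelling": float(roots.sum()),
        # Bartlett-Nanda-Pillai: tr(M_H (M_H + M_E)^{-1}) = sum root/(1 + root)
        "pillai": float(np.sum(roots / (1.0 + roots))),
    }
```

For $q = 1$ all four reduce to monotone functions of the single root, so the tests then coincide; for $q \geq 2$ they genuinely differ.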
  
The problem of expressing the matrices $\mathbf{M} _ { \mathcal{H} }$ and ${\bf M} _ { \mathsf{E} }$ in terms of the original model given by (a2), (a8) is very similar to the situation in ANOVA. One way is to express $\mathbf{M} _ { \mathcal{H} }$ and ${\bf M} _ { \mathsf{E} }$ explicitly in terms of $\mathbf{X}$ and $\mathbf{X} _ { 3 }$. Another is to consider the ANOVA problem with the same $\mathbf{X}$ and $\mathbf{X} _ { 3 }$; if explicit formulas exist for $\text{SS} _ { \mathcal{H} }$ and $\text{SS} _ { e }$, they can be converted to $\mathbf{M} _ { \mathcal{H} }$ and ${\bf M} _ { \mathsf{E} }$. For instance, $\operatorname{SS} _ { e } = \sum _ { i j k } ( y _ { i j k } - y _ { i j . } ) ^ { 2 }$ in the ANOVA two-way layout (a4) converts to $\mathbf{M} _ { \mathsf{E} } = \sum _ { i j k } ( \mathbf{y} _ { i j k } - \mathbf{y} _ { i j . } ) ^ { \prime } ( \mathbf{y} _ { i j k } - \mathbf{y} _ { i j . } )$ in the corresponding MANOVA problem, where now the $\mathbf{y} _ { i j k }$ are $( 1 \times p )$-vectors.
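The conversion just described can be checked numerically. The following sketch (Python/numpy; the array layout is an assumption of the example) computes $\mathbf{M} _ { \mathsf{E} }$ for a balanced two-way layout, with each diagonal entry reducing to the univariate $\operatorname{SS} _ { e }$ of one variate:

```python
import numpy as np

def manova_error_matrix(y):
    """Error matrix M_E for a balanced two-way layout with replication.
    y has shape (a, b, K, p): a row levels, b column levels, K replicates,
    p response variates.  The univariate SS_e = sum_ijk (y_ijk - y_ij.)^2
    becomes the p x p matrix sum_ijk (y_ijk - y_ij.)'(y_ijk - y_ij.)."""
    cell_means = y.mean(axis=2, keepdims=True)  # y_ij. for each cell
    resid = y - cell_means                      # y_ijk - y_ij.
    r = resid.reshape(-1, y.shape[-1])          # stack all (1 x p) residuals
    return r.T @ r                              # sum of outer products
```

For $p = 1$ the returned $1 \times 1$ matrix is exactly $\operatorname{SS} _ { e }$; the off-diagonal entries are the corresponding sums of cross-products between variates.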
  
 
===Point estimation.===
In the canonical system $\mathbf{Z} _ { 1 }$ is an unbiased estimator and the maximum-likelihood estimator of $\Theta$ (cf. also [[Maximum-likelihood method|Maximum-likelihood method]]). If $f$ is a linear function of $\Theta$, then $f ( \mathbf{Z} _ { 1 } )$ is both an unbiased estimator and a maximum-likelihood estimator of $f ( \Theta )$. An unbiased estimator of $\Sigma$ is $( n - r ) ^ { - 1 } \mathbf{M} _ { \mathsf{E} }$, whereas its maximum-likelihood estimator is $n ^ { - 1 } \mathbf{M} _ { \mathsf{E} }$.
  
 
===Confidence intervals and sets.===
There are several kinds of linear functions of $\Theta$ that are of interest. The direct analogue of a linear function of $\zeta _ { 1 } , \ldots , \zeta _ { q }$ in ANOVA is a function of the form $\mathbf{a} ^ { \prime } \Theta$ (with $\mathbf{a}$ of order $q \times 1$), which is a $( 1 \times p )$-vector. This leads to a confidence set in $p$-space for $\mathbf{a} ^ { \prime } \Theta$, rather than an interval. Simultaneous confidence sets for all $\mathbf{a} ^ { \prime } \Theta$ can be derived from any of the proposed tests for $\mathcal{H}$, but it turns out that only Roy's maximum root test is exact with respect to these confidence sets (and not, for instance, the LR test of Wilks); see [[#References|[a52]]], [[#References|[a53]]]. The same is true for simultaneous confidence sets for all $\Theta \mathbf{b}$, and confidence intervals for all $\mathbf{a} ^ { \prime } \Theta \mathbf b $. Simultaneous confidence sets for all $\mathbf{a} ^ { \prime } \Theta$ were given in [[#References|[a18]]]. In [[#References|[a46]]] simultaneous confidence intervals for all $\mathbf{a} ^ { \prime } \Theta \mathbf b $ are derived (called "double linear compounds" ). These are special cases of the (possibly matrix-valued) functions of the form $\mathbf{A} \Theta \mathbf{B}$ treated in [[#References|[a11]]]. The most general linear functions of $\Theta$ are of the form $\operatorname { tr } ( \mathbf{N} \Theta )$. Simultaneous confidence intervals for all such functions as $\mathbf{N}$ runs through all $( p \times q )$-matrices are given in [[#References|[a37]]]. These are derived from a test defined in terms of a symmetric gauge function rather than from Roy's maximum root test. In [[#References|[a52]]], [[#References|[a53]]] a generalization of this is given in which $\mathbf{N}$ has its rank restricted; for $\operatorname{rank}( \mathbf{N}) \leq 1$ this reproduces the confidence intervals of [[#References|[a46]]].
  
 
===Step-down procedures.===
Partition $\mathbf{B}$ into its columns $\beta _ { 1 } , \ldots , \beta _ { p }$; then $\mathcal{H}$ of (a8) is the intersection of the component hypotheses $\mathcal{H} _ { j } : \mathbf{X} _ { 3 } \beta _ { j } = 0$. Also partition $\mathbf{Y}$ into its columns ${\bf y} _ { 1 } , \dots , {\bf y} _ { p }$. Then for each $j = 1 , \ldots , p$, the hypothesis ${\cal H} _ { j }$ is tested with a univariate ANOVA $F$-test that depends only on ${\bf y} _ { 1 } , \dots , {\bf y} _ { j }$. If any ${\cal H} _ { j }$ is rejected, then $\mathcal{H}$ is rejected. The tests are independent, which permits easy determination of the overall level of significance in terms of the individual ones. For details, history of the subject and references, see [[#References|[a38]]] and [[#References|[a39]]], Sect. 3. A variation, based on $P$-values, is presented in [[#References|[a40]]]. Step-down procedures are convenient, but it is shown in [[#References|[a34]]] that even in the simplest case when $q = 1$, a step-down test is not admissible. Furthermore, a step-down test is not exact with respect to simultaneous confidence intervals or confidence sets derived from the test for various linear functions of $\mathbf{B}$; see [[#References|[a53]]], Sect. 4.4. A generalization of step-down procedures is proposed in [[#References|[a38]]] by grouping the column vectors of $\mathbf{Y}$ and $\mathbf{B}$ into blocks.
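The easy determination of the overall level mentioned above follows from the independence of the component tests: $\mathcal{H}$ is accepted exactly when every $\mathcal{H} _ { j }$ is accepted, so if the $j$th test has level $\alpha _ { j }$, the overall level is $1 - \prod _ { j } ( 1 - \alpha _ { j } )$. A minimal sketch (Python; the function name is illustrative):

```python
def stepdown_overall_level(alphas):
    """Overall significance level of a step-down procedure whose p
    independent component tests have levels alpha_1, ..., alpha_p.
    H is rejected iff some H_j is rejected, so under H the acceptance
    probability is the product of the component acceptance probabilities."""
    accept = 1.0
    for a in alphas:
        accept *= 1.0 - a
    return 1.0 - accept
```

For example, five component tests at level $0.01$ each give an overall level of $1 - 0.99 ^ { 5 } \approx 0.049$, slightly below the naive sum $0.05$.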
  
 
===Random effects models.===
  
 
===Missing data.===
Statistical experiments involving multivariate observations bring in an element that is not present with univariate observations, such as in ANOVA. Above, it has been taken for granted that of every individual in a sample all $p$ variates are observed. In practice this is not always true, for various reasons, in which case some of the observations have missing data. (This is not to be confused with the notion of empty cells in ANOVA.) If that happens, one can group all observations with complete data together as the complete sample and call the remaining observations an incomplete sample. From a slightly different point of view, the incomplete sample is sometimes considered extra data on some of the variates. The analysis of MANOVA problems is more complicated when there are missing data. In the simplest case, all missing data are on the same variates. This is a special case of nested missing data patterns. In the latter case explicit expressions of maximum-likelihood estimators are possible; see [[#References|[a3]]] and the references therein. For more complicated missing data patterns explicit maximum-likelihood estimators are usually not available unless certain assumptions are made on the structure of the unknown covariance matrix $\Sigma$; see [[#References|[a3]]], [[#References|[a4]]] and [[#References|[a5]]]. The situation is even worse for testing. For instance, even in the simplest case of testing the hypothesis that the mean of a multivariate population is $0$, if in addition to a complete sample there is an incomplete one taken on a subset of the variates, then there is no locally (let alone uniformly) [[Most-powerful test|most-powerful test]]; see [[#References|[a9]]]. Several aspects of estimation and testing in the presence of various patterns of missing data can be found in [[#References|[a25]]], wherein also appear many references to other papers in the field.
  
 
==GMANOVA.==
 
This topic has not been recognized as a distinct entity within multivariate analysis until relatively recently. Consequently, most of today's (2000) knowledge of the subject is found in the research literature, rather than in textbooks. (There is an introduction to GMANOVA in [[#References|[a41]]], Problem 10.18, and a little can be found in [[#References|[a8]]], Sect. 9.6, second part.) A good exposition of testing aspects of GMANOVA, pointing to applications in various experimental settings, is given in [[#References|[a21]]].
  
The general GMANOVA model was first stated in [[#References|[a42]]], where the motivation was the modelling of experiments on the comparison of growth curves in different populations. Suppose such a growth curve can be represented by a polynomial in the time $t$, say $f ( t ) = \beta _ { 0 } + \beta _ { 1 } t + \ldots + \beta _ { k } t ^ { k }$. If measurements are made on an individual at times $t _ { 1 } , \ldots , t _ { p }$, then these $p$ data are thought of as one observation on a $p$-variate population with population mean $( f ( t _ { 1 } ) , \ldots , f ( t _ { p } ) )$ and covariance matrix $\Sigma$, where the $\beta$s and $\Sigma$ are unknown parameters. Suppose $m$ populations are to be compared and a sample of size $n_i$ is taken from the $i$th population, $i = 1 , \ldots , m$. In order to model this by (a3), let the $i$th column of $\mathbf{X} _ { 1 }$ (corresponding to the $i$th population) have $n_i$ $1$s, and $0$s otherwise. Specifically, the first column has a $1$ in positions $1 , \ldots , n _ { 1 }$, the second in positions $n _ { 1 } + 1 , \ldots , n _ { 1 } + n _ { 2 }$, etc.; then $n = \sum n_{i}$. Let the growth curve in the $i$th population be $\beta _ { i 0 } + \beta _ { i 1 } t + \ldots + \beta _ { i k } t ^ { k }$; then the matrix $\mathbf{B}$ has $m$ rows, the $i$th row being $( \beta _ { i 0 } , \ldots , \beta _ { i k } )$, so that $s = k + 1$ in (a3); and $\mathbf{X} _ { 2 }$ has $p$ columns, the $j$th one being $( 1 , t _ { j } , \ldots , t _ { j } ^ { k } ) ^ { \prime }$. (In the example given in [[#References|[a42]]], measurements were taken at ages 8, 10, 12, and 14 in a group of girls and a group of boys; each measurement was of a certain distance between two points inside the head (with help of an X-ray picture) that is of interest in orthodontistry to monitor growth.)
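The construction of $\mathbf{X} _ { 1 }$ and $\mathbf{X} _ { 2 }$ described above can be sketched as follows (Python/numpy; the function name and interface are illustrative, not part of the model):

```python
import numpy as np

def gmanova_design(sample_sizes, times, degree):
    """Design matrices for the growth-curve (GMANOVA) model E[Y] = X1 B X2.
    X1 (n x m) indicates population membership: column i has n_i ones in the
    rows of the i-th sample and zeros otherwise.  X2 ((degree+1) x p) holds
    the polynomial time basis, its j-th column being (1, t_j, ..., t_j^k)'."""
    n, m = sum(sample_sizes), len(sample_sizes)
    X1 = np.zeros((n, m))
    row = 0
    for i, ni in enumerate(sample_sizes):
        X1[row:row + ni, i] = 1.0   # n_i ones in column i
        row += ni
    t = np.asarray(times, dtype=float)
    X2 = np.vstack([t ** k for k in range(degree + 1)])
    return X1, X2
```

With $\mathbf{B}$ the $m \times ( k + 1 )$ matrix of polynomial coefficients, the $( i , j )$ entry of $\mathbf{X} _ { 1 } \mathbf{B} \mathbf{X} _ { 2 }$ is the growth-curve value of individual $i$'s population at time $t _ { j }$.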
  
Linear hypotheses are in general of the form (a9). For instance, suppose two growth curves are to be compared, both assumed to be straight lines ($k = 1$) so that $m = 2$, $s = 2$. Suppose the hypothesis is $\beta _ { 11 } = \beta _ { 21 }$ (equal slope in the two populations). Then in (a9) one can take $\mathbf{X} _ { 3 } = ( 1 , - 1 )$ and $\mathbf{X} _ { 4 } = ( 0,1 ) ^ { \prime }$. Other examples of GMANOVA may be found in [[#References|[a21]]].
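For this two-population example the choice of $\mathbf{X} _ { 3 }$ and $\mathbf{X} _ { 4 }$ can be checked directly: $\mathbf{X} _ { 3 } \mathbf{B} \mathbf{X} _ { 4 }$ picks out $\beta _ { 11 } - \beta _ { 21 }$, so the hypothesis of equal slopes is $\mathbf{X} _ { 3 } \mathbf{B} \mathbf{X} _ { 4 } = 0$. A small illustrative sketch (Python/numpy; the numbers in $\mathbf{B}$ are made up):

```python
import numpy as np

# Row i of B is (beta_i0, beta_i1): intercept and slope of population i.
# Here both populations are given slope 2, so the hypothesis holds.
B = np.array([[1.0, 2.0],    # population 1
              [3.0, 2.0]])   # population 2
X3 = np.array([[1.0, -1.0]])       # contrast between the two populations
X4 = np.array([[0.0], [1.0]])      # selects the slope coefficient
slope_difference = X3 @ B @ X4     # the 1 x 1 matrix (beta_11 - beta_21)
```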
  
A canonical form for the GMANOVA model was derived in [[#References|[a13]]]; it can also be found in [[#References|[a21]]], Sect. 3.2. It can be obtained from the canonical form of MANOVA by partitioning the matrices $\mathbf{Z}_{i}$ columnwise into three blocks, resulting in $9$ matrices $\mathbf{Z} _ { i j }$, $i, j = 1,2,3$. Invariance reduction eliminates all $\mathbf{Z} _ { i j }$ except $[ \mathbf{Z} _ { 12 } , \mathbf{Z} _ { 13 } ]$ and $[\mathbf{Z} _ { 32 } , \mathbf{Z} _ { 33 }]$ (the latter is used for estimating the relevant portion of the unknown covariance matrix $\Sigma$). It is given that $\mathsf{E} ( \mathbf{Z} _ { 13 } ) = 0$ and $\mathsf{E} [ \mathbf{Z} _ { 32 } , \mathbf{Z} _ { 33 } ] = 0$; inference is desired on $\Theta = \mathsf{E} ( \mathbf{Z} _ { 12 } )$, e.g., to test the hypothesis $\mathcal{H} : \Theta = 0$. Further sufficiency reduction leads to two matrix-valued statistics $\mathbf{T} _ { 1 }$ and $\mathbf{T} _ { 2 }$ ([[#References|[a20]]], [[#References|[a21]]]), of which $\mathbf{T} _ { 1 }$ is the most important and is built up from the following statistic:
  
\begin{equation} \tag{a10} \mathbf{Z} _ { 0 } = \mathbf{Z} _ { 12 } - \mathbf{Z} _ { 13 } \mathbf{R}, \end{equation}
  
in which $\mathbf{R} = \mathbf{V} _ { 33 } ^ { - 1 } \mathbf{V} _ { 32 }$ (with $\mathbf{V} _ { j j ^ { \prime } } = \mathbf{Z} _ { 3 j } ^ { \prime } \mathbf{Z} _ { 3 j^{\prime} }$) is the estimated regression of $\mathbf{Z} _ { 12 }$ on $\mathbf{Z} _ { 13 }$, the true regression being $\Sigma _ { 33 } ^ { - 1 } \Sigma _ { 32 }$. That inference on $\Theta$ should be centred on $\mathbf{Z}_{0}$ can be understood intuitively by realizing that if $\Sigma$ were known, then $\mathbf{Z} _ { 12 } - \mathbf{Z} _ { 13 } \Sigma _ { 33 } ^ { - 1 } \Sigma _ { 32 }$ minimizes the variances among all linear combinations of $\mathbf{Z} _ { 12 }$ and $\mathbf{Z} _ { 13 }$ whose mean is $\Theta$, and therefore provides better inference than using only $\mathbf{Z} _ { 12 }$. The unknown regression is then estimated by $\mathbf{R}$, leading to $\mathbf{Z}_{0}$ of (a10).
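A minimal numerical sketch of this construction, under invented dimensions and random data (the article fixes neither in this passage): partition a canonical data matrix into the $9$ blocks $\mathbf{Z}_{ij}$, form the cross-products $\mathbf{V}_{jj'}$, and compute $\mathbf{Z}_0$ of (a10).

```python
import numpy as np

# Hypothetical dimensions for illustration only.
rng = np.random.default_rng(0)
row_sizes = [2, 3, 7]   # rows of Z1, Z2, Z3
col_sizes = [4, 2, 3]   # columns of the three column-blocks
Z = rng.standard_normal((sum(row_sizes), sum(col_sizes)))

# Partition Z row-wise into Z1, Z2, Z3 and each Zi column-wise into
# three blocks, giving the 9 matrices Z_ij, i, j = 1, 2, 3.
blocks = [np.hsplit(part, np.cumsum(col_sizes)[:-1])
          for part in np.vsplit(Z, np.cumsum(row_sizes)[:-1])]
Z12, Z13 = blocks[0][1], blocks[0][2]
Z32, Z33 = blocks[2][1], blocks[2][2]

# V_{jj'} = Z_{3j}' Z_{3j'}: cross-products of the third row-block
V33 = Z33.T @ Z33
V32 = Z33.T @ Z32

# R = V33^{-1} V32, the estimated regression of Z12 on Z13;
# np.linalg.solve avoids forming the inverse explicitly.
R = np.linalg.solve(V33, V32)

# Z0 = Z12 - Z13 R, the covariance-adjusted statistic of (a10)
Z0 = Z12 - Z13 @ R
print(Z0.shape)  # (2, 2)
```

With $\Sigma$ unknown, this replaces the true regression $\Sigma_{33}^{-1}\Sigma_{32}$ by its estimate $\mathbf{R}$, exactly as described above.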
  
The essential difference between GMANOVA and MANOVA lies in the presence of $\mathbf{Z} _ { 13 }$, which is correlated with $\mathbf{Z} _ { 12 }$ and has zero mean. Then $\mathbf{Z} _ { 13 }$ is used as a covariate for $\mathbf{Z} _ { 12 }$; see, e.g., [[#References|[a33]]]. However, not all models that appear to be GMANOVA produce such a covariate. More precisely, if in (a3) $\operatorname{rank} (\mathbf{X} _ { 2 } ) = p$, then it turns out that in the canonical form there are no matrices ${\bf Z} _ { i3 }$ and the model reduces essentially to MANOVA. This situation was encountered previously when it was pointed out that the MANOVA model (a2) together with the GMANOVA-type hypothesis (a9) was immediately reducible to straight MANOVA. The same conclusion would have been reached after treating (a2), (a9) as a special case of GMANOVA and inspecting the canonical form. For a "true" GMANOVA the existence of $\mathbf{Z} _ { 13 }$ is essential. A typical example of true GMANOVA, where the covariate data are built into the experiment, was given in [[#References|[a7]]].
  
Inference on $\Theta$ can proceed using only $\mathbf{T} _ { 1 }$ (e.g., [[#References|[a27]]] and [[#References|[a13]]]), but is not necessarily the best possible. For testing $\mathcal{H}$, an essentially complete class of tests includes those that also involve $\mathbf{T} _ { 2 }$ explicitly. One such test is the locally most-powerful test derived in [[#References|[a20]]]. For the distribution theory of $( \mathbf{T} _ { 1 } , \mathbf{T} _ { 2 } )$ see [[#References|[a21]]], Sect. 3.6, and [[#References|[a54]]], Sect. 6.5. Admissibility and inadmissibility results were obtained in [[#References|[a32]]]; comparisons of various tests can also be found there. A natural estimator of $\Theta$ is $\mathbf{Z}_{0}$ of (a10); it is unbiased and in [[#References|[a22]]] it is shown to be best equivariant. Other kinds of estimators have also been considered, e.g., in [[#References|[a24]]], where several references to earlier work can be found. Simultaneous confidence intervals and sets have been treated in [[#References|[a16]]], [[#References|[a17]]], [[#References|[a27]]], and [[#References|[a28]]]. Special structures of the covariance matrix $\Sigma$ have been studied in [[#References|[a44]]], where references to earlier work on related topics can also be found.
  
 
===Generalizations.===
 
A natural generalization of the GMANOVA model is indicated in [[#References|[a13]]] by having a further partitioning of the blocks of $Z$s in the canonical form. This is called extended GMANOVA in [[#References|[a21]]] and examples are given there. Another generalization involves some relaxation of the usual assumptions of multivariate normality, etc. See [[#References|[a23]]], [[#References|[a12]]], [[#References|[a17]]].
  
 
====References====
 
<table><tr><td valign="top">[a1]</td> <td valign="top"> T.W. Anderson, "An introduction to multivariate statistical analysis" , Wiley (1984) (Edition: Second) {{MR|0771294}} {{ZBL|0651.62041}} </td></tr><tr><td valign="top">[a2]</td> <td valign="top"> T.W. Anderson, "The asymptotic distribution of characteristic roots and vectors in multivariate components of variance" L.J. Gleser (ed.) M.D. Perlman (ed.) S.J. Press (ed.) A.R. Sampson (ed.) , ''Contributions to Probability and Statistics; Essays in Honor of Ingram Olkin'' , Springer (1989) pp. 177–196 {{MR|1024331}} {{ZBL|}} </td></tr><tr><td valign="top">[a3]</td> <td valign="top"> S.A. Andersson, M.D. Perlman, "Lattice-ordered conditional independence models for missing data" ''Statist. Prob. Lett.'' , '''12''' (1991) pp. 465–486 {{MR|1143745}} {{ZBL|0751.62026}} </td></tr><tr><td valign="top">[a4]</td> <td valign="top"> S.A. Andersson, M.D. Perlman, "Lattice models for conditional independence in a multivariate normal distribution" ''Ann. Statist.'' , '''21''' (1993) pp. 1318–1358 {{MR|1241268}} {{ZBL|0803.62042}} </td></tr><tr><td valign="top">[a5]</td> <td valign="top"> S.A. Andersson, J.I. Marden, M.D. Perlman, "Totally ordered multivariate linear models" ''Sankhyā A'' , '''55''' (1993) pp. 370–394 {{MR|1323395}} {{ZBL|}} </td></tr><tr><td valign="top">[a6]</td> <td valign="top"> Y.M.M. Bishop, S.E. Fienberg, P.W. Holland, "Discrete multivariate analysis: Theory and practice" , MIT (1975) {{MR|0381130}} {{ZBL|0332.62039}} </td></tr><tr><td valign="top">[a7]</td> <td valign="top"> W.G. Cochran, C.I. Bliss, "Discrimination functions with covariance" ''Ann. Statist.'' , '''19''' (1948) pp. 151–176</td></tr><tr><td valign="top">[a8]</td> <td valign="top"> M.L. Eaton, "Multivariate statistics, a vector space approach" , Wiley (1983) {{MR|}} {{ZBL|0587.62097}} </td></tr><tr><td valign="top">[a9]</td> <td valign="top"> M.L. Eaton, T. Kariya, "Multivariate tests with incomplete data" ''Ann. 
Statist.'' , '''11''' (1983) pp. 654–665 {{MR|0696076}} {{ZBL|0524.62051}} </td></tr><tr><td valign="top">[a10]</td> <td valign="top"> S.E. Fienberg, "The analysis of cross-classified categorical data" , MIT (1980) (Edition: Second) {{MR|0623082}} {{ZBL|0499.62049}} </td></tr><tr><td valign="top">[a11]</td> <td valign="top"> K.R. Gabriel, "Simultaneous test procedures in multivariate analysis of variance" ''Biometrika'' , '''55''' (1968) pp. 489–504 {{MR|0235667}} {{ZBL|}} </td></tr><tr><td valign="top">[a12]</td> <td valign="top"> N. Giri, K. Das, "On a robust test of the extended MANOVA problem in elliptically symmetric distributions" ''Sankhyā A'' , '''50''' (1988) pp. 234–248</td></tr><tr><td valign="top">[a13]</td> <td valign="top"> L.J. Gleser, I. Olkin, "Linear models in multivariate analysis" R.C. Bose (ed.) , ''Essays in Probability and Statistics: In memory of S.N. Roy'' , Univ. North Carolina Press (1970) pp. 267–292 {{MR|0267693}} {{ZBL|}} </td></tr><tr><td valign="top">[a14]</td> <td valign="top"> "Analysis of Variance" P.R. Krishnaiah (ed.) , ''Handbook of Statistics'' , '''1''' , North-Holland (1980) {{MR|0600318}} {{ZBL|0447.00013}} </td></tr><tr><td valign="top">[a15]</td> <td valign="top"> K. Hinkelmann, O. Kempthorne, "Design and analysis of experiments" , '''I: Introduction to experimental design''' , Wiley (1994) {{MR|1265939}} {{ZBL|0805.62071}} </td></tr><tr><td valign="top">[a16]</td> <td valign="top"> P.M. Hooper, "Simultaneous interval estimation in the general multivariate analysis of variance model" ''Ann. Statist.'' , '''11''' (1983) pp. 666–673 (Correction in: 12 (1984), 785) {{MR|0696077}} {{MR|0740934}} {{ZBL|0526.62032}} </td></tr><tr><td valign="top">[a17]</td> <td valign="top"> P.M. Hooper, W.K. Yau, "Optimal confidence regions in GMANOVA" ''Canad. J. Statist.'' , '''14''' (1986) pp. 315–322 {{MR|0876757}} {{ZBL|0625.62021}} </td></tr><tr><td valign="top">[a18]</td> <td valign="top"> D.R. Jensen, L.S. 
Mayer, "Some variational results and their applications in multiple inference" ''Ann. Statist.'' , '''5''' (1977) pp. 922–931 {{MR|0448707}} {{ZBL|0368.62007}} </td></tr><tr><td valign="top">[a19]</td> <td valign="top"> R.A. Johnson, D.W. Wichern, "Applied multivariate statistical analysis" , Prentice-Hall (1988) (Edition: Second) {{MR|2372475}} {{MR|1168210}} {{MR|0653327}} {{ZBL|0663.62061}} </td></tr><tr><td valign="top">[a20]</td> <td valign="top"> T. Kariya, "The general MANOVA problem" ''Ann. Statist.'' , '''6''' (1978) pp. 200–214 {{MR|0474629}} {{ZBL|0382.62042}} </td></tr><tr><td valign="top">[a21]</td> <td valign="top"> T. Kariya, "Testing in the multivariate general linear model" , Kinokuniya (1985)</td></tr><tr><td valign="top">[a22]</td> <td valign="top"> T. Kariya, "Equivariant estimation in a model with an ancillary statistic" ''Ann. Statist.'' , '''17''' (1989) pp. 920–928 {{MR|0994276}} {{ZBL|0697.62020}} </td></tr><tr><td valign="top">[a23]</td> <td valign="top"> T. Kariya, B.K. Sinha, "Robustness of statistical tests" , Acad. Press (1989) {{MR|0996634}} {{ZBL|0699.62033}} </td></tr><tr><td valign="top">[a24]</td> <td valign="top"> T. Kariya, Y. Konno, W.E. Strawderman, "Double shrinkage estimators in the GMANOVA model" ''J. Multivar. Anal.'' , '''56''' (1996) pp. 245–258 {{MR|1379529}} {{ZBL|0863.62055}} </td></tr><tr><td valign="top">[a25]</td> <td valign="top"> T. Kariya, P.R. Krishnaiah, C.R. Rao, "Statistical inference from multivariate normal populations when some data is missing" P.R. Krishnaiah (ed.) , ''Developm. in Statist.'' , '''4''' , Acad. Press (1983) pp. 137–148</td></tr><tr><td valign="top">[a26]</td> <td valign="top"> O. Kempthorne, "The design and analysis of experiments" , Wiley (1952) {{MR|1528291}} {{MR|0045368}} {{ZBL|0049.09901}} </td></tr><tr><td valign="top">[a27]</td> <td valign="top"> C.G. Khatri, "A note on a MANOVA model applied to problems in growth curves" ''Ann. Inst. Statist. Math.'' , '''18''' (1966) pp. 
75–86 {{MR|0219181}} {{ZBL|}} </td></tr><tr><td valign="top">[a28]</td> <td valign="top"> P.R. Krishnaiah, "Simultaneous test procedures under general MANOVA models" P.R. Krishnaiah (ed.) , ''Multivariate Analysis II'' , Acad. Press (1969) pp. 121–143 {{MR|254975}} {{ZBL|}} </td></tr><tr><td valign="top">[a29]</td> <td valign="top"> A.M. Kshirsagar, "Multivariate analysis" , M. Dekker (1972) {{MR|0343478}} {{ZBL|0246.62064}} </td></tr><tr><td valign="top">[a30]</td> <td valign="top"> E.L. Lehmann, "Theory of point estimation" , Wiley (1983) {{MR|0702834}} {{ZBL|0522.62020}} </td></tr><tr><td valign="top">[a31]</td> <td valign="top"> E L. Lehmann, "Testing statistical hypotheses" , Wiley (1986) (Edition: Second) {{MR|0852406}} {{ZBL|0608.62020}} </td></tr><tr><td valign="top">[a32]</td> <td valign="top"> J.I. Marden, "Admissibility of invariant tests in the general multivariate analysis of variance problem" ''Ann. Statist.'' , '''11''' (1983) pp. 1086–1099 {{MR|0720255}} {{ZBL|0598.62006}} </td></tr><tr><td valign="top">[a33]</td> <td valign="top"> J.I. Marden, M.D. Perlman, "Invariant tests for means with covariates" ''Ann. Statist.'' , '''8''' (1980) pp. 25–63 {{MR|0557553}} {{ZBL|0454.62049}} </td></tr><tr><td valign="top">[a34]</td> <td valign="top"> J.I. Marden, M.D. Perlman, "On the inadmissibility of step-down procedures for the Hotelling ${\bf T} ^ { 2 }$ problem" ''Ann. Statist.'' , '''18''' (1990) pp. 172–190 {{MR|1041390}} {{ZBL|0712.62052}} </td></tr><tr><td valign="top">[a35]</td> <td valign="top"> T. Mathew, A. Niyogi, B.K. Sinha, "Improved nonnegative estimation of variance components in balanced multivariate mixed models" ''J. Multivar. Anal.'' , '''51''' (1994) pp. 83–101 {{MR|1309370}} {{ZBL|0806.62057}} </td></tr><tr><td valign="top">[a36]</td> <td valign="top"> D.F. Morrison, "Multivariate statistical methods" , McGraw-Hill (1976) (Edition: Second) {{MR|0408108}} {{ZBL|0355.62049}} </td></tr><tr><td valign="top">[a37]</td> <td valign="top"> G.S. 
Mudholkar, "On confidence bounds associated with multivariate analysis of variance and non-independence between two sets of variates" ''Ann. Math. Statist.'' , '''37''' (1966) pp. 1736–1746 {{MR|0214204}} {{ZBL|0146.40403}} </td></tr><tr><td valign="top">[a38]</td> <td valign="top"> G.S. Mudholkar, P. Subbaiah, "A review of step-down procedures for multivariate analysis of variance" R.P. Gupta (ed.) , ''Multivariate Statistical Analysis'' , North-Holland (1980) pp. 161–178 {{MR|0600149}} {{ZBL|0445.62079}} </td></tr><tr><td valign="top">[a39]</td> <td valign="top"> G.S. Mudholkar, P. Subbaiah, "Some simple optimum tests in multivariate analysis" A.K. Gupta (ed.) , ''Advances in Multivariate Statistical Analysis'' , Reidel (1987) pp. 253–275</td></tr><tr><td valign="top">[a40]</td> <td valign="top"> G.S. Mudholkar, P. Subbaiah, "On a Fisherian detour of the step-down procedure for MANOVA" ''Commun. Statist. Theory and Methods'' , '''17''' (1988) pp. 599–611 {{MR|0939669}} {{ZBL|0665.62056}} </td></tr><tr><td valign="top">[a41]</td> <td valign="top"> R.J. Muirhead, "Aspects of multivariate statistical theory" , Wiley (1982) {{MR|0652932}} {{ZBL|0556.62028}} {{ZBL|0678.62065}} </td></tr><tr><td valign="top">[a42]</td> <td valign="top"> R.F. Potthoff, S.N. Roy, "A generalized multivariate analysis of variance model useful especially for growth curve models" ''Biometrika'' , '''51''' (1964) pp. 313–326</td></tr><tr><td valign="top">[a43]</td> <td valign="top"> C.R. Rao, "Linear statistical inference and its applications" , Wiley (1973) (Edition: Second) {{MR|0346957}} {{ZBL|0256.62002}} </td></tr><tr><td valign="top">[a44]</td> <td valign="top"> C.R. Rao, "Least squares theory using an estimated dispersion matrix and its application to measurement of signals" L.M. Le Cam (ed.) J. Neyman (ed.) , ''Fifth Berkeley Symp. Math. Statist. Probab.'' , '''1''' , Univ. California Press (1967) pp. 
355–372 {{MR|0212930}} {{ZBL|0189.18503}} </td></tr><tr><td valign="top">[a45]</td> <td valign="top"> C.R. Rao, S.K. Mitra, "Generalized inverses of matrices and its applications" , Wiley (1971) {{MR|0338013}} {{MR|0321249}} {{ZBL|}} </td></tr><tr><td valign="top">[a46]</td> <td valign="top"> S.N. Roy, R.C. Bose, "Simultaneous confidence interval estimation" ''Ann. Math. Statist.'' , '''24''' (1953) pp. 513–536 {{MR|0060781}} {{ZBL|0052.15403}} </td></tr><tr><td valign="top">[a47]</td> <td valign="top"> H. Scheffé, "Alternative models for the analysis of variance" ''Ann. Math. Statist.'' , '''27''' (1956) pp. 251–271 {{MR|0082249}} {{ZBL|0072.36602}} </td></tr><tr><td valign="top">[a48]</td> <td valign="top"> H. Scheffé, "The analysis of variance" , Wiley (1959) {{MR|0116429}} {{ZBL|0086.34603}} </td></tr><tr><td valign="top">[a49]</td> <td valign="top"> S.R. Searle, "Linear models" , Wiley (1971) {{MR|0293792}} {{ZBL|0218.62071}} </td></tr><tr><td valign="top">[a50]</td> <td valign="top"> S.R. Searle, "Linear models for unbalanced data" , Wiley (1987) {{MR|0907471}} {{ZBL|1095.62080}} </td></tr><tr><td valign="top">[a51]</td> <td valign="top"> S. Weisberg, "Applied linear regression" , Wiley (1985) (Edition: Second) {{MR|2112740}} {{MR|0591462}} {{ZBL|0646.62058}} </td></tr><tr><td valign="top">[a52]</td> <td valign="top"> R.A. Wijsman, "Constructing all smallest simultaneous confidence sets in a given class, with applications to MANOVA" ''Ann. Statist.'' , '''7''' (1979) pp. 1003–1018 {{MR|0536503}} {{ZBL|0416.62030}} </td></tr><tr><td valign="top">[a53]</td> <td valign="top"> R.A. Wijsman, "Smallest simultaneous confidence sets with applications in multivariate analysis" P.R. Krishnaiah (ed.) , ''Multivariate Analysis V'' , North-Holland (1980) pp. 483–498 {{MR|0566358}} {{ZBL|0431.62031}} </td></tr><tr><td valign="top">[a54]</td> <td valign="top"> R.A. 
Wijsman, "Global cross sections as a tool for factorization of measures and distribution of maximal invariants" ''Sankhyā A'' , '''48''' (1986) pp. 1–42 {{MR|0883948}} {{ZBL|0618.62006}} </td></tr><tr><td valign="top">[a55]</td> <td valign="top"> R.A. Wijsman, "Invariant measures on groups and their use in statistics" , ''Lecture Notes Monograph Ser.'' , '''14''' , Inst. Math. Statist. (1990) {{MR|1218397}} {{ZBL|0803.62001}} </td></tr><tr><td valign="top">[a56]</td> <td valign="top"> "Encyclopedia of Statistical Sciences" S. Kotz (ed.) N.L. Johnson (ed.) , Wiley (1982/88) {{MR|1679440}} {{MR|1605063}} {{MR|1469744}} {{MR|1044999}} {{MR|0976457}} {{MR|0976456}} {{MR|0892738}} {{MR|0873585}} {{MR|0793593}} {{MR|0719029}} {{MR|0719028}} {{MR|0670950}} {{MR|0646617}} {{ZBL|1136.62001}} {{ZBL|0919.62001}} {{ZBL|0897.62002}} {{ZBL|0897.62001}} {{ZBL|0727.62001}} {{ZBL|0706.62002}} {{ZBL|0657.62003}} {{ZBL|0657.62002}} {{ZBL|0657.62001}} {{ZBL|0585.62002}} {{ZBL|0585.62001}} {{ZBL|0552.62001}} </td></tr></table>

Latest revision as of 17:45, 1 July 2020

analysis of variance

Here, ANOVA will be understood in the wide sense, i.e., equated to the univariate linear model whose model equation is

\begin{equation} \tag{a1} \bf y = X \beta + e, \end{equation}

in which $\mathbf{y}$ is an $n \times 1$ observable random vector, $\mathbf{X}$ is a known $( n \times m )$-matrix (the "design matrix" ), $\beta$ is an $( m \times 1 )$-vector of unknown parameters, and $\mathbf{e}$ is an $( n \times 1 )$-vector of unobservable random variables $e _ { i }$ (the "errors" ) that are assumed to be independent and to have a normal distribution with mean $0$ and unknown variance $\sigma ^ { 2 }$ (i.e., the $e _ { i }$ are independent identically distributed $N ( 0 , \sigma ^ { 2 } )$). It is assumed throughout that $n > m$. Inference is desired on $\beta$ and $\sigma ^ { 2 }$. The $e _ { i }$ may represent measurement error and/or inherent variability in the experiment. The model equation (a1) can also be expressed in words by: $\mathbf{y}$ has independent normal elements $y _ { i }$ with common, unknown variance and expectation $\mathsf E ( \mathbf y ) = \mathbf X \beta$, in which $\mathbf{X}$ is known and $\beta$ is unknown. In most experimental situations the assumptions made on $\mathbf{e}$ should be regarded as an approximation, though often a good one. Studies on some of the effects of deviations from these assumptions can be found in [a48], Chap. 10, and [a51] discusses diagnostics and remedies for lack of fit in linear regression models. To a certain extent the ANOVA ideas have been carried over to discrete data, then called the log-linear model; see [a6] and [a10].

MANOVA (multivariate analysis of variance) is the multivariate generalization of ANOVA. Its model equation is obtained from (a1) by replacing the column vectors $\mathbf{y} , \beta , \mathbf{e}$ by matrices $\mathbf{Y} , \mathbf{B} , \mathbf{E}$ to obtain

\begin{equation} \tag{a2} \bf Y = X B + E, \end{equation}

where $\mathbf{Y}$ and $\mathbf{E}$ are $n \times p$, $\mathbf{B}$ is $m \times p$, and $\mathbf{X}$ is as in (a1). The assumption on $\mathbf{E}$ is that its $n$ rows are independent identically distributed $N ( 0 , \Sigma )$, i.e., the common distribution of the independent rows is $p$-variate normal with $0$ mean and $p \times p$ non-singular covariance matrix $\Sigma$.

GMANOVA (generalized multivariate analysis of variance) generalizes the model equation (a2) of MANOVA to

\begin{equation} \tag{a3} \mathbf{Y} = \mathbf{X} _ { 1 } \mathbf{BX} _ { 2 } + \mathbf{E}, \end{equation}

in which $\mathbf{E}$ is as in (a2), $\mathbf{X} _ { 1 }$ is as $\mathbf{X}$ in (a2), $\mathbf{B}$ is $m \times s$, and $\mathbf{X} _ { 2 }$ is an $s \times p$ second design matrix.

Logically, it would seem that it suffices to deal only with (a3), since (a2) is a special case of (a3), and (a1) of (a2). This turns out to be impossible and it is necessary to treat the three topics in their own right. This will be done below. For unexplained terms in the fields of estimation and testing hypotheses, see [a30], [a31] (and also Statistical hypotheses, verification of; Statistical estimation).

ANOVA.

This field is very large, well-developed, and well-documented. Only a brief outline is given here; see the references for more detail. An excellent introduction to the essential elements of the field is [a48] and a short history is given in [a47], Sect. 2. Brief descriptions are also given in [a56], headings Anova; General Linear Model. Other references are [a49], [a50], [a43], [a26], and [a15]. A collection of survey articles on many aspects of ANOVA (and of MANOVA and GMANOVA) can be found in [a14].

In (a1) it is assumed that the parameter vector $\beta$ is fixed (even though unknown). This is called a fixed effects model, or Model I. In some experimental situations it is more appropriate to consider $\beta$ random and inference is then about parameters in the distribution of $\beta$. This is called a random effects model, or Model II. It is called a mixed model if some elements of $\beta$ are fixed, others random. There are also various randomization models that are not described by (a1). For reasons of space limitation, only the fixed effects model will be treated here. For the other models see [a48], Chaps. 7, 8, 9.

The name "analysis of variance" was coined by R.A. Fisher, who developed statistical techniques for dealing with agricultural experiments; see [a48], Sect. 1.1: references to Fisher. As a typical example, consider the two-way layout for the simultaneous study of two different factors, for convenience denoted by $\mathbf{A}$ and $\operatorname{B}$, on the measurement of a certain quantity. Let $\mathbf{A}$ have levels $i = 1 , \ldots , I$, and let $\operatorname{B}$ have levels $j = 1 , \ldots , J$. For each $( i , j )$ combination, measurements $y _ { i j k }$, $k = 1 , \ldots , K$, are made. For instance, in a study of the effects of different varieties and different fertilizers on the yield of tomatoes, let $y _ { i j k }$ be the weight of ripe tomatoes from plant $k$ of variety $i$ using fertilizer $j$. The model equation is

\begin{equation} \tag{a4} y _ { i j k } = \mu + \alpha _ { i } + \beta _ { j } + \gamma _ { i j } + e _ { i j k }, \end{equation}

and it is assumed that the $e _ {i j k }$ are independent identically distributed $N ( 0 , \sigma ^ { 2 } )$. This is of the form (a1) after the $y _ { i j k }$ and $e _ {i j k }$ are strung out to form the column vectors $\mathbf{y}$ and $\mathbf{e}$ of (a1) with $n = I J K$; similarly, the parameters on the right-hand side of (a4) form an $( m \times 1 )$-vector $\beta$, with $m = 1 + I + J + I J$; finally, $\mathbf{X}$ in (a1) has one column for each of the $m$ parameters, and in row $( i , j , k )$ of $\mathbf{X}$ there is a $1$ in the columns for $\mu$, $\alpha_i$, $\beta_j$, and $\gamma _ { i j }$, and $0$s elsewhere. Some of the customary terminology is as follows. Each $( i , j )$ combination is a cell. In the example (a4), each cell has the same number $K$ of observations (balanced design); in general, the cell numbers need not be equal. The parameters on the right-hand side of (a4) are called the effects: $\mu$ is the general mean, the $\alpha$s are the main effects for factor $\mathbf{A}$, the $\beta$s for $\operatorname{B}$, and the $\gamma$s are the interactions.
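The mapping from the parameters of (a4) to the columns of $\mathbf{X}$ can be sketched numerically. The following is a minimal illustration (NumPy and the specific dimensions $I = 3$, $J = 2$, $K = 4$ are assumptions, not part of the article):

```python
import numpy as np

def two_way_design(I, J, K):
    """Design matrix X for the two-way layout (a4): one column for mu,
    I columns for the alpha_i, J for the beta_j and I*J for the gamma_ij."""
    m = 1 + I + J + I * J
    rows = []
    for i in range(I):
        for j in range(J):
            for _ in range(K):                    # K observations per cell (balanced)
                row = np.zeros(m)
                row[0] = 1.0                      # mu
                row[1 + i] = 1.0                  # alpha_i
                row[1 + I + j] = 1.0              # beta_j
                row[1 + I + J + i * J + j] = 1.0  # gamma_ij
                rows.append(row)
    return np.array(rows)

X = two_way_design(I=3, J=2, K=4)
print(X.shape)                   # (24, 12): n = IJK rows, m = 1 + I + J + IJ columns
print(np.linalg.matrix_rank(X))  # 6 = IJ: X is not of full column rank
```

Note that the $\alpha$-columns sum to the $\mu$-column, which is exactly the rank deficiency discussed under estimation below.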

The extension to more than two factors is immediate. There are then potentially more types of interactions; e.g., in a three-way layout there are three types of two-factor interactions and one type of three-factor interactions. Layouts of this type are called factorial, and completely crossed if there is at least one observation in each cell. The latter may not always be feasible for practical reasons if the number of cells is large. In that case it may be necessary to restrict observations to only a fraction of the cells and assume certain interactions to be $0$. The judicious choice of this is the subject of design of experiments; see [a26], [a15].

A different type of experiment involves regression. In the simplest case the measurement $y$ of a certain quantity may be modelled as $y = \alpha + \beta t +\text{error}$, where $\alpha$ and $\beta$ are unknown real-valued parameters and $t$ is the value of some continuously measurable quantity such as time, temperature, distance, etc. This is called linear regression (i.e., linear in $t$). More generally, there could be an arbitrary polynomial in $t$ on the right-hand side. As an example, assume quadratic regression and suppose $t$ denotes time. Let $y _ { i }$ be the measurement on $y$ at time $t_i$, $i = 1 , \dots , n$. The model equation is $y _ { i } = \alpha + \beta t _ { i } + \gamma t_{i} ^ { 2 } + e _ { i }$, which is of the form (a1) with $( \alpha , \beta , \gamma ) ^ { \prime }$ as the $\beta$ of (a1). The matrix $\mathbf{X}$ of (a1) has three columns corresponding to $\alpha$, $\beta$, and $\gamma$; the $i$th row of $\mathbf{X}$ is $( 1 , t _ { i } , t _ { i } ^ { 2 } )$. Functions of $t$ other than polynomials are sometimes appropriate. Frequently, $t$ is referred to as a regressor variable or independent variable, and $y$ as the dependent variable. Instead of one regressor variable there may be several (multiple regression).
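A hedged sketch of the quadratic-regression design matrix just described (the time points and coefficients are invented for illustration, and the error term is set to zero so the fit is exact):

```python
import numpy as np

# Quadratic regression y_i = alpha + beta t_i + gamma t_i^2 + e_i,
# with invented time points and coefficients and no noise.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
coef_true = np.array([1.0, -2.0, 0.5])             # (alpha, beta, gamma)
X = np.column_stack([np.ones_like(t), t, t ** 2])  # i-th row is (1, t_i, t_i^2)
y = X @ coef_true                                  # error term omitted here

coef_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef_hat)   # recovers (1, -2, 0.5) since there is no error term
```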

Factors such as $t$ above whose values can be measured on a continuous scale are called quantitative. In contrast, categorical variables (e.g., variety of tomato) are called qualitative. A quantitative factor $t$ may be treated qualitatively if the experiment is conducted at several values, say $t _ { 1 } , t _ { 2 } , \ldots$, but these are only regarded as levels $i = 1,2 , \dots$ of the factor whereas the actual values $t _ { 1 } , t _ { 2 } , \ldots$ are ignored. The name analysis of variance is often reserved for models that have only factors that are qualitative or treated qualitatively. In contrast, regression analysis has only quantitative factors. Analysis of covariance covers models that have both kinds of factors. See [a48], Chap. 6, for more detail.

Another important distinction involving factors is between the notions of crossing and nesting. Two factors $\mathbf{A}$ and $\operatorname{B}$ are crossed if each level of $\mathbf{A}$ can occur with each level of $\operatorname{B}$ (completely crossed if there is at least one observation for each combination of levels, otherwise incompletely or partly crossed). For instance, in the tomato example of the two-way layout (a4), the two factors are crossed since each variety $i$ can be grown with any fertilizer $j$. In contrast, factor $\operatorname{B}$ is said to be nested within factor $\mathbf{A}$ if every level of $\operatorname{B}$ can only occur with one level of $\mathbf{A}$. For instance, suppose two different manufacturing processes (factor $\mathbf{A}$) for the production of cords have to be compared. From each of the two processes several cords are chosen (factor $\operatorname{B}$), each cord cut into several pieces and the breaking strength of each piece measured. Here each cord goes only with one of the processes so that $\operatorname{B}$ is nested within $\mathbf{A}$. Nested factors should be treated more realistically as random. However, for the analysis it is necessary to analyze the corresponding fixed effects model first. See [a48], Sect. 5.3, for more examples and detail.

Estimation and testing hypotheses.

The main interest is in inference on linear functions of the parameter vector $\beta$ of (a1), called parametric functions, i.e., functions of the form $\psi = \mathbf{c} ^ { \prime } \beta$, with $\mathbf{c}$ of order $m \times 1$. Usually one requires point estimators (cf. also Point estimator) of such $\psi$s to be unbiased (cf. also Unbiased estimator). Of particular interest are the elements of the vector $\beta$. However, there is a complication arising from the fact that the design matrix $\mathbf{X}$ in (a1) may be of less than maximal rank (the columns can be linearly dependent). This happens typically in analysis of variance models (but not usually in regression models). For instance, in the two-way layout (a4) the sum of the columns for the $\alpha_i$ equals the column for $\mu$. If $\mathbf{X}$ is of less than full rank, then the elements of $\beta$ are not identifiable in the sense that even if the error vector in (a1) were $0$, so that $\mathbf{X} \beta$ is known, there is no unique solution for $\beta$. A fortiori the elements of $\beta$ do not possess unbiased estimators. Yet, there are parametric functions that do have an unbiased estimator; they are called estimable. It is easily shown that $\mathbf{c} ^ { \prime } \beta$ is estimable if and only if $\mathbf{c} ^ { \prime }$ is in the row space of $\mathbf{X}$ (see [a48], Sect. 1.4). In particular, if one sets $\mathsf E ( y _ { i } ) = \eta _ { i }$ and takes $\mathbf{c} ^ { \prime }$ to be the $i$th row of $\mathbf{X}$, then $\mathbf{c} ^ { \prime } \beta = \eta_{i}$ is estimable. Thus, $\psi$ is estimable if and only if it is a linear combination of the elements of $\eta = \mathsf E ( \mathbf y )$.
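The row-space criterion for estimability can be checked numerically: $\mathbf{c} ^ { \prime } \beta$ is estimable exactly when appending $\mathbf{c} ^ { \prime }$ as an extra row does not increase the rank of $\mathbf{X}$. A sketch (the one-way layout and the rank tolerance are assumptions for illustration):

```python
import numpy as np

def is_estimable(c, X, tol=1e-10):
    """psi = c'beta is estimable iff c' lies in the row space of X, i.e.
    appending c' as an extra row does not increase the rank of X."""
    r = np.linalg.matrix_rank(X, tol=tol)
    return np.linalg.matrix_rank(np.vstack([X, c]), tol=tol) == r

# One-way layout with two groups, columns (mu, alpha_1, alpha_2): rank 2 < 3.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
print(is_estimable(np.array([0.0, 1.0, -1.0]), X))  # True: the contrast alpha_1 - alpha_2
print(is_estimable(np.array([0.0, 1.0, 0.0]), X))   # False: alpha_1 by itself
```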

The complication presented by a design matrix $\mathbf{X}$ that is not of full rank may be handled in several ways. First, a re-parametrization with fewer parameters and fewer columns of $\mathbf{X}$ is possible. Second, a popular way is to impose side conditions on the parameters that make them unique. For instance, in the two-way layout (a4) often-used side conditions are: $\sum \alpha _ { i } = 0$, or, equivalently, $\alpha_{.} = 0$ (where dotting on a subscript means averaging over that subscript); similarly, $\beta _ { . } = 0$, and $\gamma _ { i . } = 0$ for all $i$, $\gamma _ { . j } = 0$ for all $j$. Then all parameters are estimable and (for instance) the hypothesis $\mathcal{H} _ { \text{A} }$ that all main effects of factor $\mathbf{A}$ are $0$ can be expressed by: All $\alpha_i$ are equal to zero. A third way of dealing with an $\mathbf{X}$ of less than full rank is to express all questions of inference in terms of estimable parametric functions. For instance, if in (a4) one writes $\eta _ { i j } = \mu + \alpha _ { i } + \beta _ { j } + \gamma _ { i j }$ ($= \mathsf{E} ( y _ { i j k } )$), then all $\eta_{ij}$ are estimable and $\mathcal{H} _ { \text{A} }$ can be expressed by stating that all $\eta _ { i . }$ are equal, or, equivalently, that all $\eta _ { i . } - \eta _ { . . }$ are equal to zero.

Another type of estimator that always exists is a least-squares estimator (LSE; cf. also Least squares, method of). A least-squares estimator of $\beta$ is any vector $\mathbf{b}$ minimizing $\| \mathbf{y} - \mathbf{Xb} \| ^ { 2 }$. A minimizing $\mathbf{b}$ (unique if and only if $\mathbf{X}$ is of full rank) is denoted by $\hat{\beta}$ and satisfies the normal equations

\begin{equation} \tag{a5} \mathbf{X} ^ { \prime } \mathbf{X} \widehat { \beta } = \mathbf{X} ^ { \prime } \mathbf{y} . \end{equation}

If $\psi = \mathbf{c} ^ { \prime } \beta$ is estimable, then $\hat { \psi } = \mathbf{c} ^ { \prime } \hat { \beta }$ is unique (even when $\hat{\beta}$ is not) and is called the least-squares estimator of $\psi$. By the Gauss–Markov theorem (cf. also Least squares, method of), $\widehat { \psi }$ is the minimum variance unbiased estimator of $\psi$. See [a48], Sect. 1.4.
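A small numerical sketch (with invented data) of the normal equations (a5) in the rank-deficient case: $\hat{\beta}$ is not unique, but the least-squares estimator of an estimable $\psi = \mathbf{c} ^ { \prime } \beta$ is:

```python
import numpy as np

# Rank-deficient one-way layout (invented data): beta_hat solving the normal
# equations X'X b = X'y is not unique, but c'beta_hat is, for estimable c'beta.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
y = np.array([2.0, 4.0, 1.0, 3.0])

beta_hat = np.linalg.pinv(X) @ y       # one solution of the normal equations
null = np.array([1.0, -1.0, -1.0])     # X @ null = 0: beta_hat + null is another
assert np.allclose(X.T @ X @ beta_hat, X.T @ y)
assert np.allclose(X.T @ X @ (beta_hat + null), X.T @ y)

c = np.array([0.0, 1.0, -1.0])         # estimable contrast alpha_1 - alpha_2
# both values agree (here 1.0, the difference of the two group means):
print(np.round(c @ beta_hat, 10), np.round(c @ (beta_hat + null), 10))
```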

A linear hypothesis $\mathcal{H}$ consists of one or more linear restrictions on $\beta$:

\begin{equation} \tag{a6} \mathcal{H} : \mathbf{X} _ { 3 } \beta = 0 \end{equation}

with $\mathbf{X} _ { 3 }$ of order $q \times m$ and rank $q$. Then $\mathcal{H}$ is to be tested against the alternative $\mathbf{X} _ { 3 } \beta \neq 0$. Let $\operatorname{rank} ( \mathbf{X} ) = r$. The model (a1) together with $\mathcal{H}$ of (a6) can be expressed in geometric language as follows: The mean vector $\eta = \mathsf E ( \mathbf y )$ lies in a linear subspace $\Omega$ of $n$-dimensional space, spanned by the columns of $\mathbf{X}$, and $\mathcal{H}$ restricts $ \eta $ to a further subspace $\omega$ of $\Omega$, where $\operatorname { dim } ( \Omega ) = r$ and $\operatorname { dim } ( \omega ) = r - q$. Further analysis is simplified by a transformation to the canonical system, below.

Canonical form.

There is a transformation $\mathbf z = \Gamma \mathbf y $, with $\Gamma$ of order $n \times n$ and orthogonal, so that the model (a1) together with the hypothesis (a6) can be put in the following form (in which $z_1 , \dots ,z_n$ are the elements of $z$ and $\zeta _ { i } = \mathsf{E} ( z _ { i } )$): $z_1 , \dots ,z_n$ are independent, normal, with common variance $\sigma ^ { 2 }$; $\zeta _ { r + 1 } = \ldots = \zeta _ { n } = 0$, and, additionally, $\mathcal{H}$ specifies $\zeta _ { 1 } = \ldots = \zeta _ { q } = 0$. Note that $\zeta _ { q + 1} , \dots , \zeta _ { r }$ are unrestricted throughout. Any estimable parametric function can be expressed in the form $\psi = \sum _ { i = 1 } ^ { r } d _ { i } \zeta _ { i }$, with constants $d_{i}$, and the least-squares estimator of $\psi$ is $\hat { \psi } = \sum _ { i = 1 } ^ { r } d _ { i } z _ { i }$. To estimate $\sigma ^ { 2 }$ one forms the sum of squares for error $\operatorname{SS} _ { e } = \sum _ { i = r + 1 } ^ { n } z _ { i } ^ { 2 }$, and divides by $n - r$ ($=$ degrees of freedom for the error) to form the mean square $\operatorname{MS} _ { e } = \operatorname{SS} _ { e } / ( n - r )$. Then $ \operatorname{MS} _ { e }$ is an unbiased estimator of $\sigma ^ { 2 }$. A test of the hypothesis $\mathcal{H}$ can be obtained by forming $\text{SS} _ { \mathcal{H} } = \sum _ { i = 1 } ^ { q } z _ { i } ^ { 2 }$, with degrees of freedom $q$, and $ \operatorname { MS } _{\mathcal{H}}=\operatorname {SS} _{\mathcal{H}} / q$. Then, if $\mathcal{H}$ is true, the test statistic $\mathcal{F} = \operatorname {MS} _ { \mathcal{H} } / \operatorname {MS}_{\text{e}}$ has an $F$-distribution with degrees of freedom $( q , n - r )$. For a test of $\mathcal{H}$ of level of significance $\alpha$ one rejects $\mathcal{H}$ if $\mathcal{F} > F _ { \alpha ; q , n - r}$ ($=$ the upper $\alpha$-point of the $F$-distribution with degrees of freedom $( q , n - r )$). 
This is "the" $F$-test; it can be derived as a likelihood-ratio test (LR test) or as a uniformly most powerful invariant test (UMP invariant test) and has several other optimum properties; see [a48], Sect. 2.10. For the power of the $F$-test, see [a48], Sect. 2.8.
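The quantities $\operatorname{SS} _ { e }$, $\operatorname{SS} _ { \mathcal H }$ and $\mathcal{F}$ can be sketched for a one-way layout (the data are simulated, and SciPy's `f_oneway` is used only as a cross-check):

```python
import numpy as np
from scipy import stats

# F-test via subspace projections for a one-way layout with three groups
# (simulated data); H: all group means equal.
rng = np.random.default_rng(0)
groups = [rng.normal(0.0, 1.0, 8), rng.normal(0.5, 1.0, 8), rng.normal(1.0, 1.0, 8)]
y = np.concatenate(groups)
n, r, q = y.size, 3, 2                  # dim(Omega) = 3, dim(omega) = 1, q = r - 1

eta_Omega = np.concatenate([np.full(g.size, g.mean()) for g in groups])
eta_omega = np.full(n, y.mean())        # projection under H: the grand mean
SS_e = np.sum((y - eta_Omega) ** 2)
SS_H = np.sum((eta_Omega - eta_omega) ** 2)
F = (SS_H / q) / (SS_e / (n - r))

F_ref, p_ref = stats.f_oneway(*groups)  # cross-check against SciPy's one-way ANOVA
assert np.isclose(F, F_ref)
print(round(F, 3), round(1 - stats.f.cdf(F, q, n - r), 3))
```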

Simultaneous confidence intervals.

Let $L$ be the linear space of all parametric functions of the form $\psi = \sum _ { i = 1 } ^ { q } d _ { i } \zeta _ { i }$, i.e., all $\psi$ that are $0$ if $\mathcal{H}$ is true. The $F$-test provides a way to obtain simultaneous confidence intervals for all $\psi \in L$ with confidence level $1 - \alpha$ (cf. also Confidence interval). This is useful, for instance, in cases where $\mathcal{H}$ is rejected. Then any $\psi \in L$ whose confidence interval does not include $0$ is said to be "significantly different from 0" and can be held responsible for the rejection of $\mathcal{H}$. Observe that $q ^ { - 1 } \sum _ { i = 1 } ^ { q } ( z _ { i } - \zeta _ { i } ) ^ { 2 } / \operatorname{MS} _ { e }$ has an $F$-distribution with degrees of freedom $( q , n - r )$ (whether or not $\mathcal{H}$ is true) so that this quantity is $\leq F _ { \alpha ; q , n - r }$ with probability $1 - \alpha$. This inequality can be converted into a family of double inequalities and leads to the simultaneous confidence intervals

\begin{equation} \tag{a7} \mathsf{P} ( \widehat { \psi } - S \widehat { \sigma } _ { \widehat { \psi } } \leq \psi \leq \widehat { \psi } + S \widehat { \sigma } _ { \widehat { \psi } } , \forall \psi \in L ) = 1 - \alpha, \end{equation}

in which $S = ( q F _ { \alpha ; q , n - r } ) ^ { 1 / 2 }$ and $\hat { \sigma }_{ \hat { \psi }} = \| \mathbf{d} \| ( \text{MS} _ { e } ) ^ { 1 / 2 }$ is the square root of the unbiased estimator of the variance $\| \mathbf{d} \| ^ { 2 } \sigma ^ { 2 }$ of $\widehat { \psi } = \sum _ { i = 1 } ^ { q } d _ { i } z _ { i }$. Thus, the confidence interval for $\psi$ has endpoints $\hat { \psi } \pm S \ \hat { \sigma }_{ \hat { \psi }}$, and all $\psi \in L$ are covered by their confidence intervals simultaneously with probability $1 - \alpha$. Note that (a7) is stated without needing the canonical system so that the confidence intervals can be evaluated directly in the original system.
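A minimal sketch of evaluating (a7) for a single contrast; all numerical inputs ($q$, $n$, $r$, $\hat{\psi}$, $\operatorname{MS}_e$, $\| \mathbf{d} \| ^ 2$) are assumed values for illustration:

```python
import numpy as np
from scipy import stats

# Scheffe-type interval (a7): S = (q * F_{alpha; q, n-r})^(1/2),
# endpoints psi_hat +/- S * sigma_hat_psi.  All numbers are assumed.
alpha, q, n, r = 0.05, 2, 24, 3
S = np.sqrt(q * stats.f.ppf(1 - alpha, q, n - r))

psi_hat = 1.3              # assumed estimate of a contrast of two cell means
MS_e = 0.9                 # assumed mean square for error
norm_d_sq = 2.0 / 8.0      # ||d||^2 for a difference of two means of 8 obs each
sigma_hat_psi = np.sqrt(norm_d_sq * MS_e)
lo, hi = psi_hat - S * sigma_hat_psi, psi_hat + S * sigma_hat_psi
print(round(S, 3), (round(lo, 3), round(hi, 3)))
```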

With help of (a7) the $F$-test can also be expressed as follows: $\mathcal{H}$ is accepted if and only if all confidence intervals with endpoints $\hat { \psi } \pm S \ \hat { \sigma }_{ \hat { \psi }}$ cover the value $0$. More generally, it is convenient to make the following definition: a test of a hypothesis $\mathcal{H}$ is exact with respect to a family of simultaneous confidence intervals for a family of parametric functions if $\mathcal{H}$ is accepted if and only if the confidence interval of every $\psi$ in the family includes the value of $\psi$ specified by $\mathcal{H}$; see [a52], [a53]. Thus, the $F$-test is exact with respect to the simultaneous confidence intervals (a7).

The confidence intervals obtained in (a7) are called Scheffé-type simultaneous confidence intervals. Shorter confidence intervals of Tukey-type within a smaller class of parametric functions are possible in some designs. This is applicable, for instance, in the two-way layout of (a4) with equal cell numbers if only differences between the $\alpha_i$ are considered important rather than all parametric functions that are $0$ under $\mathcal{H} _ { \text{A} }$ (so-called contrasts). See [a48], Sect. 3.6.

The canonical system is very useful to derive formulas and prove properties in a unified way, but it is usually not advisable in any given linear model to carry out the transformation $\mathbf z = \Gamma \mathbf y $ explicitly. Instead, the necessary expressions can be derived in the original system. For instance, if $\hat { \eta } _ { \Omega }$ and $\widehat { \eta } _ { \omega }$ are the orthogonal projections of $\mathbf{y}$ on $\Omega$ and on $\omega$, respectively, then $\operatorname {SS} _ { e } = \| \mathbf{y} - \hat { \eta } _ { \Omega } \| ^ { 2 }$ and $\operatorname {SS} _ { \mathcal H } = \| \widehat { \eta } _ { \Omega } - \widehat { \eta } _ { \omega } \| ^ { 2 }$. These projections can be found by solving the normal equations (a5) (and one gets, for instance, $\hat { \eta } _ { \Omega } = \mathbf{X} \hat { \beta }$), or by minimizing quadratic forms. As an example of the latter: In the two-way layout (a4), minimize $\sum _ { i j k } ( y _ { i j k } - \eta _ { i j } ) ^ { 2 }$ over the $\eta_{ij}$. This yields $\hat { \eta } _ { i j } = y _ { i j . }$, so that $\operatorname{SS} _ { e } = \sum _ { i j k } ( y _ { i j k } - y _ { i j . } ) ^ { 2 }$. If desired, formulas can be expressed in vector and matrix form. As an example, if $\mathbf{X}$ is of maximal rank, then (a5) yields $\hat { \beta } = ( \mathbf{X} ^ { \prime } \mathbf{X} ) ^ { - 1 } \mathbf{X} ^ { \prime } \mathbf{y}$ and $\operatorname {SS} _ { e } = \mathbf{y} ^ { \prime } ( \mathbf{I} _ { n } - \mathbf{X} ( \mathbf{X} ^ { \prime } \mathbf{X} ) ^ { - 1 } \mathbf{X} ^ { \prime } ) \mathbf{y}$. Similar expressions hold under $\mathcal{H}$ after replacing $\mathbf{X}$ by a matrix whose columns span $\omega$. If $\mathbf{X}$ is not of maximal rank, then a generalized inverse may be employed. See [a43], Sect. 4a.3, and [a45].
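The closed-form expressions for the full-rank case can be checked directly; a sketch with simulated data:

```python
import numpy as np

# Closed forms for a full-rank X (simulated data):
# beta_hat = (X'X)^{-1} X'y and SS_e = y'(I_n - X (X'X)^{-1} X') y.
rng = np.random.default_rng(1)
n, m = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, m - 1))])
y = rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
P = X @ XtX_inv @ X.T              # orthogonal projection onto Omega = col(X)
SS_e = y @ (np.eye(n) - P) @ y

assert np.allclose(P @ P, P)       # idempotent, as a projection must be
assert np.isclose(SS_e, np.sum((y - X @ beta_hat) ** 2))
print(round(SS_e, 3))
```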

MANOVA.

There are several good textbooks on multivariate analysis that treat various aspects of MANOVA. Among the major ones are [a1], [a8], [a19], [a29], [a36], [a41], and [a43], Chap. 8. See also [a56], headings Multivariate Analysis; Multivariate Analysis Of Variance, and [a14]. The ideas involved in MANOVA are essentially the same as in ANOVA, but there is an added dimension in that the observations are now multivariate. For instance, if measurements are made on $p$ different features of the same individual, then this should be regarded as one observation on a $p$-variate distribution. The MANOVA model is given by (a2). A linear hypothesis on $\mathbf{B}$ analogous to (a6) is

\begin{equation} \tag{a8} \mathcal{H} : \mathbf{X} _ { 3 } \mathbf{B} = 0, \end{equation}

with $\mathbf{X} _ { 3 }$ as in (a6). Any ANOVA testing problem defined by the choice of $\mathbf{X}$ in (a1) and $\mathbf{X} _ { 3 }$ in (a6) carries over to the same kind of problem given by (a2) and (a8). However, since $\mathbf{B}$ is a matrix, there are other ways than (a8) of formulating a linear hypothesis. The most obvious extension of (a8) is

\begin{equation} \tag{a9} \mathcal {H} : {\bf X} _ { 3 } {\bf B X} _ { 4 } = 0, \end{equation}

in which $\mathbf{X}_{4}$ is a known $( p \times p _ { 1 } )$-matrix of rank $p _ { 1 }$. However, (a9) can be reduced to (a8) by making the transformation $\mathbf{Z} = \mathbf{Y X}_4$, of order $n \times p _ { 1 }$, $\Gamma = \mathbf{B} \mathbf{X}_4$, $\mathbf{F} = \mathbf{EX}_4$; then the model is ${\bf Z = X} \Gamma + \bf F$, with the rows of $\mathbf{F}$ independent identically distributed $N ( 0 , \Sigma _ { 1 } )$, $\Sigma _ { 1 } = \mathbf{X} _ { 4 } ^ { \prime } \Sigma \mathbf{X} _ { 4 }$, and $\mathcal{H} : \mathbf{X} _ { 3 } \Gamma = 0$. Thus, the transformed problem is as (a2), (a8), with $\mathbf{Z} , \Gamma , \mathbf{F}$ replacing $\mathbf{Y} , \mathbf{B} , \mathbf{E}$. This can be applied, for instance, to profile analysis; see [a29], Sect. 5.4 (A5), [a36], Sects. 4.6, 5.6.

There is a canonical form of the MANOVA testing problem (a2), (a8) analogous to the ANOVA problem (a1), (a6), the difference being that the real-valued random variables $z_i$ of ANOVA are replaced by $1 \times p$ random vectors. These vectors form the rows of three random matrices, $\mathbf{Z} _ { 1 }$ of order $q \times p$, $\mathbf{Z}_{2}$ of order $( r - q ) \times p$, and $\mathbf{Z}_{3}$ of order $( n - r ) \times p$, all of whose rows are assumed independent and $p$-variate normal with common non-singular covariance matrix $\Sigma$; furthermore, $\mathsf{E} ( \mathbf{Z} _ { 3 } ) = 0$, $\mathsf{E} ( \mathbf Z _ { 2 } )$ is unspecified, and $\mathcal{H}$ specifies $\mathsf{E} ( {\bf Z} _ { 1 } ) = 0$. It is assumed that $n - r \geq p$. Put $\mathsf E ( \mathbf Z _ { 1 } ) = \Theta$, so that $\mathbf{Z} _ { 1 }$ is an unbiased estimator of $\Theta$. For testing $\mathcal{H} : \Theta = 0$, $\mathbf{Z}_{2}$ is ignored and the sums of squares $\text{SS} _ { \mathcal{H} }$ and $\text{SS} _ { e }$ of ANOVA are replaced by the $( p \times p )$-matrices $\mathbf{M} _ { \mathcal{H} } = \mathbf{Z} _ { 1 } ^ { \prime }\mathbf{ Z} _ { 1 }$ and $\mathbf{M} _ { \mathsf{E} } = \mathbf{Z} _ { 3 } ^ { \prime } \mathbf{Z} _ { 3 }$, respectively. An application of sufficiency plus the principle of invariance restricts tests of $\mathcal{H}$ to those that depend only on the positive characteristic roots of $\mathbf{M} _ { \mathcal{H} } \mathbf{M} _ { \mathsf{E} } ^ { - 1 }$ ($=$ the positive characteristic roots of $\mathbf{Z} _ { 1 } \mathbf{M} _ { \mathsf{E} } ^ { - 1 } \mathbf{Z} _ { 1 } ^ { \prime }$). The case $q = 1$, when $\mathbf{Z} _ { 1 }$ is a row vector, deserves special attention. It arises, for instance, when testing for zero mean in a single multivariate population or testing the equality of means in two such populations. 
Then $F = \mathbf{Z} _ { 1 } \mathbf{M} _ { \mathsf{E} } ^ { - 1 } \mathbf{Z} _ { 1 } ^ { \prime }$ is the only positive characteristic root; $( n - r ) F$ is called Hotelling's $T ^ { 2 }$, and $p ^ { - 1 } ( n - r - p + 1 ) F$ has an $F$-distribution with degrees of freedom $( p , n - r - p + 1 )$, central or non-central according as $\mathcal{H}$ is true or false. Rejecting $\mathcal{H}$ for large values of $F$ is uniformly most powerful invariant. If $q \geq 2$ there is no best way of combining the $q$ characteristic roots, so that there is no uniformly most powerful invariant test (unlike in ANOVA). The following tests have been proposed:

reject $\mathcal{H}$ if $\operatorname { det } ( \mathbf{M} _ { \mathsf{E} } ) / \operatorname { det } ( \mathbf{M} _ { \mathcal{H} } + \mathbf{M} _ { \mathsf{E} } )$ is smaller than a constant (Wilks LR test);

reject $\mathcal{H}$ if the largest characteristic root of $\mathbf{M} _ { \mathcal{H} } \mathbf{M} _ { \mathsf{E} } ^ { - 1 }$ exceeds a constant (Roy's test);

reject $\mathcal{H}$ if $\operatorname{tr}( \mathbf{M} _ { \mathcal{H} } \mathbf{M} _ { \mathsf{E} } ^ { - 1 } ) > \text{const}$ (Lawley–Hotelling test);

reject $\mathcal{H}$ if $\operatorname { tr } ( \mathbf{M} _ { \mathcal{H} } ( \mathbf{M} _ { \mathcal{H} } + \mathbf{M} _ { \mathsf{E} } ) ^ { - 1 } ) > \text{const}$ (Bartlett–Nanda–Pillai test). For references, see [a1], Sects. 8.3, 8.6, or [a36], Chap. 5. For distribution theory, see [a1], Sects. 8.4, 8.6, [a41], Sects. 10.4–10.6, [a55], Sect. 10.3. Tables and charts can be found in [a1], Appendix, and [a36], Appendix.
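All four statistics are functions of the positive characteristic roots of $\mathbf{M} _ { \mathcal{H} } \mathbf{M} _ { \mathsf{E} } ^ { - 1 }$; a sketch with simulated matrices (the dimensions are assumed for illustration):

```python
import numpy as np

# The four classical statistics as functions of the positive characteristic
# roots l_1, ..., l_q of M_H M_E^{-1} (simulated matrices).
rng = np.random.default_rng(3)
p, q, ne = 3, 2, 20                     # ne plays the role of n - r
Z1 = rng.normal(size=(q, p))
Z3 = rng.normal(size=(ne, p))
M_H, M_E = Z1.T @ Z1, Z3.T @ Z3

# keep the q largest (the positive) roots; the remaining p - q are ~0
roots = np.sort(np.linalg.eigvals(M_H @ np.linalg.inv(M_E)).real)[::-1][:q]

wilks  = np.prod(1.0 / (1.0 + roots))   # = det(M_E)/det(M_H + M_E); reject if small
roy    = roots.max()                    # Roy's largest root; reject if large
lawley = roots.sum()                    # = tr(M_H M_E^{-1})
pillai = np.sum(roots / (1.0 + roots))  # = tr(M_H (M_H + M_E)^{-1})

assert np.isclose(wilks, np.linalg.det(M_E) / np.linalg.det(M_H + M_E))
assert np.isclose(lawley, np.trace(M_H @ np.linalg.inv(M_E)))
```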

The problem of expressing the matrices $\mathbf{M} _ { \mathcal{H} }$ and ${\bf M} _ { \mathsf{E} }$ in terms of the original model given by (a2), (a8) is very similar to the situation in ANOVA. One way is to express $\mathbf{M} _ { \mathcal{H} }$ and ${\bf M} _ { \mathsf{E} }$ explicitly in terms of $\mathbf{X}$ and $\mathbf{X} _ { 3 }$. Another is to consider the ANOVA problem with the same $\mathbf{X}$ and $\mathbf{X} _ { 3 }$; if explicit formulas exist for $\text{SS} _ { \mathcal{H} }$ and $\text{SS} _ { e }$, they can be converted to $\mathbf{M} _ { \mathcal{H} }$ and ${\bf M} _ { \mathsf{E} }$. For instance, $\operatorname{SS} _ { e } = \sum _ { i j k } ( y _ { i j k } - y _ { i j .} ) ^ { 2 }$ in the ANOVA two-way layout (a4) converts to $\mathbf{M} _ { \mathsf{E} } = \sum _ { i j k } ( \mathbf{y} _ { i j k } - \mathbf{y} _ { i j }. ) ^ { \prime } ( \mathbf{y} _ { i j k } - \mathbf{y} _ { i j }. )$ in the corresponding MANOVA problem, where now the $\mathbf{y} _ { i j k }$ are $( 1 \times p )$-vectors.
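Returning to the $q = 1$ case discussed above, Hotelling's $T ^ { 2 }$ for testing zero mean in a single $p$-variate population can be sketched as follows (the data are simulated under $\mathcal{H}$; here $r = 1$ and the role of $\mathbf{Z} _ { 1 }$ is played by $\sqrt { n }\, \bar{\mathbf y}$):

```python
import numpy as np
from scipy import stats

# Hotelling's T^2 (the q = 1 case): test of zero mean in a single
# p-variate sample, data simulated under the hypothesis.
rng = np.random.default_rng(2)
n, p = 30, 4
Y = rng.normal(size=(n, p))
ybar = Y.mean(axis=0)
M_E = (Y - ybar).T @ (Y - ybar)        # error matrix with n - r = n - 1 d.f.

F_root = n * ybar @ np.linalg.inv(M_E) @ ybar  # the single positive root
T2 = (n - 1) * F_root                          # Hotelling's T^2 = (n - r) F
F_stat = (n - 1 - p + 1) / p * F_root          # ~ F(p, n - r - p + 1) under H
pval = 1 - stats.f.cdf(F_stat, p, n - 1 - p + 1)
print(round(T2, 3), round(pval, 3))
```

The last two lines reproduce the textbook identity $T ^ 2 = n \bar{\mathbf y} ^ { \prime } \mathbf{S} ^ { - 1 } \bar{\mathbf y}$ with $\mathbf{S} = \mathbf{M} _ { \mathsf{E} } / ( n - 1 )$.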

Point estimation.

In the canonical system $\mathbf{Z} _ { 1 }$ is an unbiased estimator and the maximum-likelihood estimator of $\Theta$ (cf. also Maximum-likelihood method). If $f$ is a linear function of $\Theta$, then $f ( \mathbf{Z} _ { 1 } )$ is both an unbiased estimator and a maximum-likelihood estimator of $f ( \Theta )$. An unbiased estimator of $\Sigma$ is $( n - r ) ^ { - 1 } \mathbf{M} _ { \mathsf{E} }$, whereas its maximum-likelihood estimator is $n ^ { - 1 } \mathbf{M} _ { \mathsf{E} }$.

Confidence intervals and sets.

There are several kinds of linear functions of $\Theta$ that are of interest. The direct analogue of a linear function of $\zeta _ { 1 } , \ldots , \zeta _ { q }$ in ANOVA is a function of the form $\mathbf{a} ^ { \prime } \Theta$ (with $\mathbf{a}$ of order $q \times 1$), which is a $( 1 \times p )$-vector. This leads to a confidence set in $p$-space for $\mathbf{a} ^ { \prime } \Theta$, rather than an interval. Simultaneous confidence sets for all $\mathbf{a} ^ { \prime } \Theta$ can be derived from any of the proposed tests for $\mathcal{H}$, but it turns out that only Roy's maximum root test is exact with respect to these confidence sets (and not, for instance, the LR test of Wilks); see [a52], [a53]. The same is true for simultaneous confidence sets for all $\Theta \mathbf{b}$, and confidence intervals for all $\mathbf{a} ^ { \prime } \Theta \mathbf b $. Simultaneous confidence sets for all $\mathbf{a} ^ { \prime } \Theta$ were given in [a18]. In [a46] simultaneous confidence intervals for all $\mathbf{a} ^ { \prime } \Theta \mathbf b $ (called "double linear compounds" ) are derived. These are special cases of the (possibly matrix-valued) functions of the form $\mathbf{A} \Theta \mathbf{B}$ treated in [a11]. The most general linear functions of $\Theta$ are of the form $\operatorname { tr } ( \mathbf{N} \Theta )$. Simultaneous confidence intervals for all such functions as $\mathbf{N}$ runs through all $( p \times q )$-matrices are given in [a37]. These are derived from a test defined in terms of a symmetric gauge function rather than from Roy's maximum root test. In [a52], [a53] a generalization of this is given if $\mathbf{N}$ has its rank restricted; for $\operatorname{rank}( \mathbf{N}) \leq 1$ this reproduces the confidence intervals of [a46].

Step-down procedures.

Partition $\mathbf{B}$ into its columns $\beta _ { 1 } , \ldots , \beta _ { p }$; then $\mathcal{H}$ of (a8) is the intersection of the component hypotheses $\mathcal{H} _ { j } : \mathbf{X} _ { 3 } \beta _ { j } = 0$. Also partition $\mathbf{Y}$ into its columns ${\bf y} _ { 1 } , \dots , {\bf y} _ { p }$. Then for each $j = 1 , \ldots , p$, the hypothesis ${\cal H} _ { j }$ is tested with a univariate ANOVA $F$-test that depends only on ${\bf y} _ { 1 } , \dots , {\bf y} _ { j }$. If any ${\cal H} _ { j }$ is rejected, then $\mathcal{H}$ is rejected. The tests are independent, which permits easy determination of the overall level of significance in terms of the individual ones. For details, history of the subject and references, see [a38] and [a39], Sect. 3. A variation, based on $P$-values, is presented in [a40]. Step-down procedures are convenient, but it is shown in [a34] that even in the simplest case when $q = 1$, a step-down test is not admissible. Furthermore, a step-down test is not exact with respect to simultaneous confidence intervals or confidence sets derived from the test for various linear functions of $\mathbf{B}$; see [a53], Sect. 4.4. A generalization of step-down procedures is proposed in [a38] by grouping the column vectors of $\mathbf{Y}$ and $\mathbf{B}$ into blocks.
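Since the component tests are independent, the overall significance level $\alpha$ of a step-down procedure satisfies $1 - \alpha = \prod _ { j } ( 1 - \alpha _ { j } )$, where $\alpha _ { j }$ is the level of the $j$th $F$-test. A minimal numerical sketch (the individual levels below are hypothetical):

```python
# Overall level of a step-down procedure: H is rejected iff some H_j is
# rejected, and the component tests are independent, so under H
#   1 - alpha = prod_j (1 - alpha_j).
alphas = [0.02, 0.02, 0.01]   # hypothetical individual levels alpha_j

accept_prob = 1.0
for a in alphas:
    accept_prob *= 1.0 - a

alpha_overall = 1.0 - accept_prob
print(alpha_overall)          # slightly less than sum(alphas) = 0.05
```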

Random effects models.

Some references on this topic in MANOVA are [a2] and [a35]; see also references quoted therein.

Missing data.

Statistical experiments involving multivariate observations bring in an element that is not present in univariate settings such as ANOVA. Above, it has been taken for granted that all $p$ variates are observed on every individual in a sample. In practice this is not always true, for various reasons, in which case some of the observations have missing data. (This is not to be confused with the notion of empty cells in ANOVA.) If that happens, one can group all observations with complete data together as the complete sample and call the remaining observations an incomplete sample. From a slightly different point of view, the incomplete sample is sometimes considered extra data on some of the variates. The analysis of MANOVA problems is more complicated when there are missing data. In the simplest case, all missing data are on the same variates. This is a special case of nested missing data patterns. In the latter case explicit expressions for maximum-likelihood estimators are possible; see [a3] and the references therein. For more complicated missing data patterns explicit maximum-likelihood estimators are usually not available unless certain assumptions are made on the structure of the unknown covariance matrix $\Sigma$; see [a3], [a4] and [a5]. The situation is even worse for testing. For instance, even in the simplest case of testing the hypothesis that the mean of a multivariate population is $0$, if in addition to a complete sample there is an incomplete one taken on a subset of the variates, then there is no locally (let alone uniformly) most-powerful test; see [a9]. Several aspects of estimation and testing in the presence of various patterns of missing data can be found in [a25], wherein also appear many references to other papers in the field.

GMANOVA.

This topic was not recognized as a distinct entity within multivariate analysis until relatively recently. Consequently, most of today's (2000) knowledge of the subject is found in the research literature rather than in textbooks. (There is an introduction to GMANOVA in [a41], Problem 10.18, and a little can be found in [a8], Sect. 9.6, second part.) A good exposition of the testing aspects of GMANOVA, pointing to applications in various experimental settings, is given in [a21].

The general GMANOVA model was first stated in [a42], where the motivation was the modelling of experiments on the comparison of growth curves in different populations. Suppose such a growth curve can be represented by a polynomial in the time $t$, say $f ( t ) = \beta _ { 0 } + \beta _ { 1 } t + \ldots + \beta _ { k } t ^ { k }$. If measurements are made on an individual at times $t _ { 1 } , \ldots , t _ { p }$, then these $p$ data are thought of as one observation on a $p$-variate population with population mean $( f ( t _ { 1 } ) , \ldots , f ( t _ { p } ) )$ and covariance matrix $\Sigma$, where the $\beta$s and $\Sigma$ are unknown parameters. Suppose $m$ populations are to be compared and a sample of size $n_i$ is taken from the $i$th population, $i = 1 , \ldots , m$. In order to model this by (a3), let the $i$th column of $\mathbf{X} _ { 1 }$ (corresponding to the $i$th population) have $n_i$ $1$s, and $0$s otherwise. Specifically, the first column has a $1$ in positions $1 , \ldots , n _ { 1 }$, the second in positions $n _ { 1 } + 1 , \ldots , n _ { 1 } + n _ { 2 }$, etc.; then $n = \sum n_{i}$. Let the growth curve in the $i$th population be $\beta _ { i 0 } + \beta _ { i 1 } t + \ldots + \beta _ { i k } t ^ { k }$; then the matrix $\mathbf{B}$ has $m$ rows, the $i$th row being $( \beta _ { i 0 } , \ldots , \beta _ { i k } )$, so that $s = k + 1$ in (a3); and $\mathbf{X} _ { 2 }$ has $p$ columns, the $j$th one being $( 1 , t _ { j } , \ldots , t _ { j } ^ { k } ) ^ { \prime }$. (In the example given in [a42], measurements were taken at ages 8, 10, 12, and 14 in a group of girls and a group of boys; each measurement was of a certain distance between two points inside the head (obtained with the help of an X-ray picture) that is of interest in orthodontics for monitoring growth.)
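The construction of the design matrices $\mathbf{X} _ { 1 }$ and $\mathbf{X} _ { 2 }$ can be sketched as follows (a minimal illustration; the sample sizes, times and degree are hypothetical, loosely modelled on the example of [a42]):

```python
import numpy as np

n_sizes = [3, 2]                        # hypothetical sample sizes n_1, n_2 (m = 2)
t = np.array([8.0, 10.0, 12.0, 14.0])   # measurement times t_1, ..., t_p (p = 4)
k = 1                                   # straight-line growth curves, s = k + 1 = 2

# X_1: (n x m) indicator matrix; the i-th column has n_i ones.
X1 = np.repeat(np.eye(len(n_sizes)), n_sizes, axis=0)

# X_2: (s x p) matrix whose j-th column is (1, t_j, ..., t_j^k)'.
X2 = np.vstack([t ** d for d in range(k + 1)])

# The mean of the (n x p) data matrix is then X_1 B X_2,
# where B is the (m x s) matrix of growth-curve coefficients.
```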

Linear hypotheses are in general of the form (a9). For instance, suppose two growth curves are to be compared, both assumed to be straight lines ($k = 1$) so that $m = 2$, $s = 2$. Suppose the hypothesis is $\beta _ { 11 } = \beta _ { 21 }$ (equal slope in the two populations). Then in (a9) one can take $\mathbf{X} _ { 3 } = ( 1 , - 1 )$ and $\mathbf{X} _ { 4 } = ( 0,1 ) ^ { \prime }$. Other examples of GMANOVA may be found in [a21].
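The equal-slope hypothesis above can be checked numerically (a sketch with a hypothetical coefficient matrix $\mathbf{B}$, not from the original article):

```python
import numpy as np

# Hypothetical coefficient matrix B for two straight-line growth curves:
# row i is (beta_{i0}, beta_{i1}); here the slopes are equal, beta_11 = beta_21.
B = np.array([[1.0, 0.5],
              [2.0, 0.5]])

X3 = np.array([[1.0, -1.0]])   # contrast between the two populations
X4 = np.array([[0.0], [1.0]])  # picks out the slope coefficient

# The hypothesis X_3 B X_4 = 0 states beta_11 - beta_21 = 0.
print(X3 @ B @ X4)             # prints [[0.]] when the slopes are equal
```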

A canonical form for the GMANOVA model was derived in [a13]; it can also be found in [a21], Sect. 3.2. It can be obtained from the canonical form of MANOVA by partitioning the matrices $\mathbf{Z}_{i}$ columnwise into three blocks, resulting in $9$ matrices ${\bf Z} _ { i j }$, $i, j = 1,2,3$. Invariance reduction eliminates all ${\bf Z} _ { i j }$ except $[ \mathbf{Z} _ { 12 } , \mathbf{Z} _ { 13 } ]$ and $[\mathbf{Z} _ { 32 } , \mathbf{Z} _ { 33 }]$ (the latter is used for estimating the relevant portion of the unknown covariance matrix $\Sigma$). It is given that $\mathsf{E} ( {\bf Z} _ { 13 } ) = 0$ and $\mathsf E [ \mathbf Z _ { 32 } , \mathbf Z _ { 33 } ] = 0$; inference is desired on $\Theta = \textsf{E} ( \mathbf{Z} _ { 12 } )$, e.g., to test the hypothesis $\mathcal{H} : \Theta = 0$. Further sufficiency reduction leads to two matrix-valued statistics $\mathbf{T} _ { 1 }$ and $\mathbf{T} _ { 2 }$ ([a20], [a21]), of which $\mathbf{T} _ { 1 }$ is the most important and is built up from the following statistic:

\begin{equation} \tag{a10} \mathbf{Z} _ { 0 } = \mathbf{Z} _ { 12 } - \mathbf{Z} _ { 13 } \mathbf{R}, \end{equation}

in which $\mathbf{R} = \mathbf{V} _ { 33 } ^ { - 1 } \mathbf{V} _ { 32 }$ (with ${\bf V} _ { j j ^ { \prime } } = {\bf Z} _ { 3 j } ^ { \prime } {\bf Z} _ { 3 j^{\prime} }$) is the estimated regression of $\mathbf{Z} _ { 12 }$ on $\mathbf{Z} _ { 13 }$, the true regression being $\Sigma _ { 33 } ^ { - 1 } \Sigma _ { 32 }$. That inference on $\Theta$ should be centred on $\mathbf{Z}_{0}$ can be understood intuitively by realizing that if $\Sigma$ were known, then $\mathbf{Z} _ { 12 } - \mathbf{Z} _ { 13 } \Sigma _ { 33 } ^ { - 1 } \Sigma _ { 32 }$ would minimize the variances among all linear combinations of $\mathbf{Z} _ { 12 }$ and $\mathbf{Z} _ { 13 }$ whose mean is $\Theta$, and would therefore provide better inference than $\mathbf{Z} _ { 12 }$ alone. The unknown regression is then estimated by $\mathbf{R}$, leading to $\mathbf{Z}_{0}$ of (a10).
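The computation of $\mathbf{Z}_{0}$ in (a10) can be sketched as follows (hypothetical canonical-form blocks with arbitrary dimensions, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical canonical-form blocks (dimensions chosen for illustration).
Z12 = rng.normal(size=(4, 3))
Z13 = rng.normal(size=(4, 2))
Z32 = rng.normal(size=(10, 3))
Z33 = rng.normal(size=(10, 2))

# V_{jj'} = Z_{3j}' Z_{3j'}; R = V_33^{-1} V_32 is the estimated regression
# of Z_12 on Z_13 (the true regression being Sigma_33^{-1} Sigma_32).
V33 = Z33.T @ Z33
V32 = Z33.T @ Z32
R = np.linalg.solve(V33, V32)

# The covariate-adjusted statistic (a10) on which inference on Theta is based.
Z0 = Z12 - Z13 @ R
```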

The essential difference between GMANOVA and MANOVA lies in the presence of $\mathbf{Z} _ { 13 }$, which is correlated with $\mathbf{Z} _ { 12 }$ and has zero mean. Then $\mathbf{Z} _ { 13 }$ is used as a covariate for $\mathbf{Z} _ { 12 }$; see, e.g., [a33]. However, not all models that appear to be GMANOVA produce such a covariate. More precisely, if in (a3) $\operatorname{rank} (\mathbf{X} _ { 2 } ) = p$, then it turns out that in the canonical form there are no matrices ${\bf Z} _ { i3 }$ and the model reduces essentially to MANOVA. This situation was encountered previously when it was pointed out that the MANOVA model (a2) together with the GMANOVA-type hypothesis (a9) was immediately reducible to straight MANOVA. The same conclusion would have been reached after treating (a2), (a9) as a special case of GMANOVA and inspecting the canonical form. For a "true" GMANOVA the existence of $\mathbf{Z} _ { 13 }$ is essential. A typical example of true GMANOVA, where the covariate data are built into the experiment, was given in [a7].

Inference on $\Theta$ can proceed using only $\mathbf{T} _ { 1 }$ (e.g., [a27] and [a13]), but such inference is not necessarily optimal. For testing $\mathcal{H}$, an essentially complete class of tests includes those that also involve $\mathbf{T} _ { 2 }$ explicitly. One such test is the locally most-powerful test derived in [a20]. For the distribution theory of $( \mathbf{T} _ { 1 } , \mathbf{T} _ { 2 } )$ see [a21], Sect. 3.6, and [a54], Sect. 6.5. Admissibility and inadmissibility results were obtained in [a32]; comparisons of various tests can also be found there. A natural estimator of $\Theta$ is $\mathbf{Z}_{0}$ of (a10); it is an unbiased estimator and in [a22] it is shown to be best equivariant. Other kinds of estimators have also been considered, e.g., in [a24], in which several references to earlier work can be found. Simultaneous confidence intervals and sets have been treated in [a16], [a17], [a27], and [a28]. Special structures of the covariance matrix $\Sigma$ have been studied in [a44], where also references to earlier work on related topics can be found.

Generalizations.

A natural generalization of the GMANOVA model is indicated in [a13] by having a further partitioning of the blocks of $Z$s in the canonical form. This is called extended GMANOVA in [a21] and examples are given there. Another generalization involves some relaxation of the usual assumptions of multivariate normality, etc. See [a23], [a12], [a17].

References

[a1] T.W. Anderson, "An introduction to multivariate statistical analysis" , Wiley (1984) (Edition: Second) MR0771294 Zbl 0651.62041
[a2] T.W. Anderson, "The asymptotic distribution of characteristic roots and vectors in multivariate components of variance" L.J. Gleser (ed.) M.D. Perlman (ed.) S.J. Press (ed.) A.R. Sampson (ed.) , Contributions to Probability and Statistics; Essays in Honor of Ingram Olkin , Springer (1989) pp. 177–196 MR1024331
[a3] S.A. Andersson, M.D. Perlman, "Lattice-ordered conditional independence models for missing data" Statist. Prob. Lett. , 12 (1991) pp. 465–486 MR1143745 Zbl 0751.62026
[a4] S.A. Andersson, M.D. Perlman, "Lattice models for conditional independence in a multivariate normal distribution" Ann. Statist. , 21 (1993) pp. 1318–1358 MR1241268 Zbl 0803.62042
[a5] S.A. Andersson, J.I. Marden, M.D. Perlman, "Totally ordered multivariate linear models" Sankhyā A , 55 (1993) pp. 370–394 MR1323395
[a6] Y.M.M. Bishop, S.E. Fienberg, P.W. Holland, "Discrete multivariate analysis: Theory and practice" , MIT (1975) MR0381130 Zbl 0332.62039
[a7] W.G. Cochran, C.I. Bliss, "Discriminant functions with covariance" Ann. Math. Statist. , 19 (1948) pp. 151–176
[a8] M.L. Eaton, "Multivariate statistics, a vector space approach" , Wiley (1983) Zbl 0587.62097
[a9] M.L. Eaton, T. Kariya, "Multivariate tests with incomplete data" Ann. Statist. , 11 (1983) pp. 654–665 MR0696076 Zbl 0524.62051
[a10] S.E. Fienberg, "The analysis of cross-classified categorical data" , MIT (1980) (Edition: Second) MR0623082 Zbl 0499.62049
[a11] K.R. Gabriel, "Simultaneous test procedures in multivariate analysis of variance" Biometrika , 55 (1968) pp. 489–504 MR0235667
[a12] N. Giri, K. Das, "On a robust test of the extended MANOVA problem in elliptically symmetric distributions" Sankhyā A , 50 (1988) pp. 234–248
[a13] L.J. Gleser, I. Olkin, "Linear models in multivariate analysis" R.C. Bose (ed.) , Essays in Probability and Statistics: In memory of S.N. Roy , Univ. North Carolina Press (1970) pp. 267–292 MR0267693
[a14] "Analysis of Variance" P.R. Krishnaiah (ed.) , Handbook of Statistics , 1 , North-Holland (1980) MR0600318 Zbl 0447.00013
[a15] K. Hinkelmann, O. Kempthorne, "Design and analysis of experiments" , I: Introduction to experimental design , Wiley (1994) MR1265939 Zbl 0805.62071
[a16] P.M. Hooper, "Simultaneous interval estimation in the general multivariate analysis of variance model" Ann. Statist. , 11 (1983) pp. 666–673 (Correction in: 12 (1984), 785) MR0696077 MR0740934 Zbl 0526.62032
[a17] P.M. Hooper, W.K. Yau, "Optimal confidence regions in GMANOVA" Canad. J. Statist. , 14 (1986) pp. 315–322 MR0876757 Zbl 0625.62021
[a18] D.R. Jensen, L.S. Mayer, "Some variational results and their applications in multiple inference" Ann. Statist. , 5 (1977) pp. 922–931 MR0448707 Zbl 0368.62007
[a19] R.A. Johnson, D.W. Wichern, "Applied multivariate statistical analysis" , Prentice-Hall (1988) (Edition: Second) MR2372475 MR1168210 MR0653327 Zbl 0663.62061
[a20] T. Kariya, "The general MANOVA problem" Ann. Statist. , 6 (1978) pp. 200–214 MR0474629 Zbl 0382.62042
[a21] T. Kariya, "Testing in the multivariate general linear model" , Kinokuniya (1985)
[a22] T. Kariya, "Equivariant estimation in a model with an ancillary statistic" Ann. Statist. , 17 (1989) pp. 920–928 MR0994276 Zbl 0697.62020
[a23] T. Kariya, B.K. Sinha, "Robustness of statistical tests" , Acad. Press (1989) MR0996634 Zbl 0699.62033
[a24] T. Kariya, Y. Konno, W.E. Strawderman, "Double shrinkage estimators in the GMANOVA model" J. Multivar. Anal. , 56 (1996) pp. 245–258 MR1379529 Zbl 0863.62055
[a25] T. Kariya, P.R. Krishnaiah, C.R. Rao, "Statistical inference from multivariate normal populations when some data is missing" P.R. Krishnaiah (ed.) , Developm. in Statist. , 4 , Acad. Press (1983) pp. 137–148
[a26] O. Kempthorne, "The design and analysis of experiments" , Wiley (1952) MR1528291 MR0045368 Zbl 0049.09901
[a27] C.G. Khatri, "A note on a MANOVA model applied to problems in growth curves" Ann. Inst. Statist. Math. , 18 (1966) pp. 75–86 MR0219181
[a28] P.R. Krishnaiah, "Simultaneous test procedures under general MANOVA models" P.R. Krishnaiah (ed.) , Multivariate Analysis II , Acad. Press (1969) pp. 121–143 MR254975
[a29] A.M. Kshirsagar, "Multivariate analysis" , M. Dekker (1972) MR0343478 Zbl 0246.62064
[a30] E.L. Lehmann, "Theory of point estimation" , Wiley (1983) MR0702834 Zbl 0522.62020
[a31] E.L. Lehmann, "Testing statistical hypotheses" , Wiley (1986) (Edition: Second) MR0852406 Zbl 0608.62020
[a32] J.I. Marden, "Admissibility of invariant tests in the general multivariate analysis of variance problem" Ann. Statist. , 11 (1983) pp. 1086–1099 MR0720255 Zbl 0598.62006
[a33] J.I. Marden, M.D. Perlman, "Invariant tests for means with covariates" Ann. Statist. , 8 (1980) pp. 25–63 MR0557553 Zbl 0454.62049
[a34] J.I. Marden, M.D. Perlman, "On the inadmissibility of step-down procedures for the Hotelling ${\bf T} ^ { 2 }$ problem" Ann. Statist. , 18 (1990) pp. 172–190 MR1041390 Zbl 0712.62052
[a35] T. Mathew, A. Niyogi, B.K. Sinha, "Improved nonnegative estimation of variance components in balanced multivariate mixed models" J. Multivar. Anal. , 51 (1994) pp. 83–101 MR1309370 Zbl 0806.62057
[a36] D.F. Morrison, "Multivariate statistical methods" , McGraw-Hill (1976) (Edition: Second) MR0408108 Zbl 0355.62049
[a37] G.S. Mudholkar, "On confidence bounds associated with multivariate analysis of variance and non-independence between two sets of variates" Ann. Math. Statist. , 37 (1966) pp. 1736–1746 MR0214204 Zbl 0146.40403
[a38] G.S. Mudholkar, P. Subbaiah, "A review of step-down procedures for multivariate analysis of variance" R.P. Gupta (ed.) , Multivariate Statistical Analysis , North-Holland (1980) pp. 161–178 MR0600149 Zbl 0445.62079
[a39] G.S. Mudholkar, P. Subbaiah, "Some simple optimum tests in multivariate analysis" A.K. Gupta (ed.) , Advances in Multivariate Statistical Analysis , Reidel (1987) pp. 253–275
[a40] G.S. Mudholkar, P. Subbaiah, "On a Fisherian detour of the step-down procedure for MANOVA" Commun. Statist. Theory and Methods , 17 (1988) pp. 599–611 MR0939669 Zbl 0665.62056
[a41] R.J. Muirhead, "Aspects of multivariate statistical theory" , Wiley (1982) MR0652932 Zbl 0556.62028 Zbl 0678.62065
[a42] R.F. Potthoff, S.N. Roy, "A generalized multivariate analysis of variance model useful especially for growth curve models" Biometrika , 51 (1964) pp. 313–326
[a43] C.R. Rao, "Linear statistical inference and its applications" , Wiley (1973) (Edition: Second) MR0346957 Zbl 0256.62002
[a44] C.R. Rao, "Least squares theory using an estimated dispersion matrix and its application to measurement of signals" L.M. Le Cam (ed.) J. Neyman (ed.) , Fifth Berkeley Symp. Math. Statist. Probab. , 1 , Univ. California Press (1967) pp. 355–372 MR0212930 Zbl 0189.18503
[a45] C.R. Rao, S.K. Mitra, "Generalized inverse of matrices and its applications" , Wiley (1971) MR0338013 MR0321249
[a46] S.N. Roy, R.C. Bose, "Simultaneous confidence interval estimation" Ann. Math. Statist. , 24 (1953) pp. 513–536 MR0060781 Zbl 0052.15403
[a47] H. Scheffé, "Alternative models for the analysis of variance" Ann. Math. Statist. , 27 (1956) pp. 251–271 MR0082249 Zbl 0072.36602
[a48] H. Scheffé, "The analysis of variance" , Wiley (1959) MR0116429 Zbl 0086.34603
[a49] S.R. Searle, "Linear models" , Wiley (1971) MR0293792 Zbl 0218.62071
[a50] S.R. Searle, "Linear models for unbalanced data" , Wiley (1987) MR0907471 Zbl 1095.62080
[a51] S. Weisberg, "Applied linear regression" , Wiley (1985) (Edition: Second) MR2112740 MR0591462 Zbl 0646.62058
[a52] R.A. Wijsman, "Constructing all smallest simultaneous confidence sets in a given class, with applications to MANOVA" Ann. Statist. , 7 (1979) pp. 1003–1018 MR0536503 Zbl 0416.62030
[a53] R.A. Wijsman, "Smallest simultaneous confidence sets with applications in multivariate analysis" P.R. Krishnaiah (ed.) , Multivariate Analysis V , North-Holland (1980) pp. 483–498 MR0566358 Zbl 0431.62031
[a54] R.A. Wijsman, "Global cross sections as a tool for factorization of measures and distribution of maximal invariants" Sankhyā A , 48 (1986) pp. 1–42 MR0883948 Zbl 0618.62006
[a55] R.A. Wijsman, "Invariant measures on groups and their use in statistics" , Lecture Notes Monograph Ser. , 14 , Inst. Math. Statist. (1990) MR1218397 Zbl 0803.62001
[a56] "Encyclopedia of Statistical Sciences" S. Kotz (ed.) N.L. Johnson (ed.) , Wiley (1982/88) MR1679440 MR1605063 MR1469744 MR1044999 MR0976457 MR0976456 MR0892738 MR0873585 MR0793593 MR0719029 MR0719028 MR0670950 MR0646617 Zbl 1136.62001 Zbl 0919.62001 Zbl 0897.62002 Zbl 0897.62001 Zbl 0727.62001 Zbl 0706.62002 Zbl 0657.62003 Zbl 0657.62002 Zbl 0657.62001 Zbl 0585.62002 Zbl 0585.62001 Zbl 0552.62001
How to Cite This Entry:
ANOVA. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=ANOVA&oldid=14171
This article was adapted from an original article by Robert A. Wijsman (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article