Difference between revisions of "Statistical hypotheses, verification of"

From Encyclopedia of Mathematics
m (Undo revision 48814 by Ulf Rehmann (talk))
Tag: Undo
m (tex encoded by computer)
 
 
''statistical hypotheses testing''

One of the basic parts of mathematical statistics, expounding ideas and methods for the statistical testing of correspondences between experimental data on the one hand and hypotheses on their probability characteristics on the other.
  
Let a random vector $X = (X_1, \dots, X_n)$ be observed, taking values $x = (x_1, \dots, x_n)$ in a measurable space $(\mathfrak X_n, {\mathcal B}_n)$, and suppose it is known that the probability distribution of $X$ belongs to a given set of probability distributions $H = \{ {\mathsf P}_\theta : \theta \in \Theta \}$, where $\Theta$ is a certain parametric set. The set $H$ is called the set of admissible hypotheses, and any non-empty subset $H_i$ of it is called a statistical hypothesis, or simply a hypothesis. If $H_i$ contains precisely one element, then the hypothesis is said to be simple, otherwise it is said to be compound. Moreover, if there are two so-called competing hypotheses distinguished in $H$:
  
$$
H_0 = \{ {\mathsf P}_\theta : \theta \in \Theta_0 \subset \Theta \}
$$
  
 
and
  
$$
H_1 = H \setminus H_0 = \{ {\mathsf P}_\theta : \theta \in \Theta_1 = \Theta \setminus \Theta_0 \},
$$
  
then one of them, for example $H_0$, is called the null hypothesis, and the other the alternative hypothesis. In terms of $H_0$ and $H_1$, the basic problem in the theory of statistical hypotheses testing can be conveniently formulated using the Neyman–Pearson model (see , [[#References|[2]]]). Namely, find an optimal method that makes it possible, on the basis of an observed realization of $X$, to test whether the hypothesis $H_0$: $\theta \in \Theta_0$ is correct, according to which the probability distribution of $X$ belongs to the set $H_0 = \{ {\mathsf P}_\theta : \theta \in \Theta_0 \}$, or whether the alternative hypothesis $H_1$: $\theta \in \Theta_1$ is correct, according to which the probability distribution of $X$ belongs to the set
  
$$
H_1 = \{ {\mathsf P}_\theta : \theta \in \Theta_1 = \Theta \setminus \Theta_0 \}.
$$
  
 
===Example 1.===
Let a random vector $X = (X_1, \dots, X_n)$ be observed, with components $X_1, \dots, X_n$ that are independent identically-distributed random variables subject to the normal law $N_1(\theta, 1)$, with unknown mathematical expectation $\theta = {\mathsf E} X_i$ $(|\theta| < \infty)$, while the variance is equal to 1, i.e. for any real number $x$,
  
$$
{\mathsf P} \{ X_i < x \mid \theta \} = \Phi(x - \theta) = \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^{x} e^{-(t - \theta)^2/2} \, dt, \quad i = 1, \dots, n.
$$
  
Under these conditions it is possible to examine the problem of testing $H_0$: $\theta = \theta_0$ against $H_1$: $\theta \neq \theta_0$, where $\theta_0$ is a given number. In this example, $H_0$ is a simple hypothesis, while $H_1$ is a compound hypothesis.
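The article does not prescribe a particular decision rule for Example 1. As an illustration only, the sketch below uses the familiar two-sided z-test, which rejects $H_0$: $\theta = \theta_0$ when $\sqrt{n}\,|\bar X - \theta_0|$ exceeds a standard normal quantile. The function name `two_sided_z_test`, the parameter `alpha` and the sample values are assumptions made for this sketch, not part of the original text.

```python
from math import sqrt
from statistics import NormalDist, fmean


def two_sided_z_test(x, theta0, alpha=0.05):
    """Return 1 to reject H0: theta = theta0, and 0 otherwise.

    Assumes the model of Example 1: independent N(theta, 1) observations
    with known unit variance. The two-sided z-test is an assumed choice.
    """
    n = len(x)
    # Under H0 this statistic has the standard normal distribution.
    z = sqrt(n) * (fmean(x) - theta0)
    # Two-sided critical value: the (1 - alpha/2) quantile of N(0, 1).
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 if abs(z) > critical_value else 0


# Hypothetical data, invented for illustration only.
sample = [0.3, -0.1, 0.8, 1.2, 0.4, 0.9, -0.2, 0.7]
decision_at_zero = two_sided_z_test(sample, theta0=0.0)
decision_at_minus_one = two_sided_z_test(sample, theta0=-1.0)
```

With this hypothetical sample (mean 0.5, $n = 8$) the rule does not reject $\theta_0 = 0$ but does reject $\theta_0 = -1$.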
  
Formally, the competing hypotheses $H_0$ and $H_1$ are equivalent in the problem of choosing between them, and the question of which of these two non-intersecting and mutually-complementary sets from $H$ should be called the null hypothesis is not vital, and does not affect the construction of the theory of statistical hypotheses testing itself. However, as a rule, the researcher's attitude to the problem itself affects the choice of the null hypothesis, with the result that the null hypothesis is often taken to be that subset $H_0$ of the set $H$ of all admissible hypotheses that in the researcher's opinion, bearing in mind the nature of the phenomenon in question, or in the light of any physical considerations, will best fit in with the expected experimental data. For this very reason, $H_0$ is often called the hypothesis to be tested. On the theoretical level, the difference between $H_0$ and $H_1$ is often explained by the fact that, as a rule, $H_0$ has a simpler structure than $H_1$, as reflected in the researcher's preference for the simpler model.
  
In the theory of statistical hypotheses testing, the decision on the correctness of $H_0$ or $H_1$ is taken on the basis of an observed realization of the random vector $X$; the decision principle used in taking the decision "the hypothesis $H_i$ is correct" $(i = 0, 1)$ is called a [[Statistical test|statistical test]]. The structure of any statistical test is completely defined by its so-called [[Critical function|critical function]] $\phi_n(\cdot) : \mathfrak X_n \rightarrow [0, 1]$. According to the statistical test with critical function $\phi_n(\cdot)$, the hypothesis $H_0$ to be tested is rejected with probability $\phi_n(X)$ in favour of the alternative $H_1$, while $H_1$ is rejected with probability $1 - \phi_n(X)$ in favour of $H_0$. From a practical point of view, the most interesting are the so-called non-randomized tests, whose critical functions take only two values: 0 and 1. Whichever test is used in choosing between $H_0$ and $H_1$, it may lead either to a correct or to a false decision being taken. In the theory of statistical hypotheses testing, wrong inferences are classified in the following way.
  
If the test rejects the hypothesis $H_0$ to be tested when in reality it is correct, then one says that an error of the first kind has been committed. Conversely, if the test does not reject $H_0$ (and, in this test, $H_0$ is therefore accepted) when it is in fact incorrect, then one says that an error of the second kind has been committed. The problem of testing $H_0$ against $H_1$ should ideally be approached in such a way as to minimize the probabilities of these errors. Unfortunately, it is impossible, given the fixed dimension $n$ of the vector of observations $X$, to control both error probabilities simultaneously: as a rule, as one decreases, so the other increases. The probabilities of these errors are expressed numerically in terms of the so-called power function $\beta_n(\cdot)$ of the statistical test, defined on the set $\Theta = \Theta_0 \cup \Theta_1$ by means of the rule:
  
$$
\beta_n(\theta) = {\mathsf E}_\theta \phi_n(X) = \int\limits_{\mathfrak X_n} \phi_n(x) \, d{\mathsf P}_\theta(x), \quad \theta \in \Theta = \Theta_0 \cup \Theta_1.
$$
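Because the power function is an expectation of the critical function, it can be approximated by simulation. The following is a minimal sketch under the model of Example 1, assuming a two-sided z-test as the critical function; the names `phi_n` and `estimate_power`, the default parameters and the fixed seed are illustrative choices, not taken from the article.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def phi_n(x, theta0, critical_value):
    """Non-randomized critical function of a two-sided z-test (assumed choice)."""
    z = sqrt(len(x)) * (fmean(x) - theta0)
    return 1.0 if abs(z) > critical_value else 0.0


def estimate_power(theta, theta0=0.0, alpha=0.05, n=10, reps=5000, seed=0):
    """Monte Carlo estimate of beta_n(theta) = E_theta phi_n(X).

    Draws `reps` samples of n independent N(theta, 1) observations and
    averages the values of the critical function over them.
    """
    rng = random.Random(seed)
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(theta, 1.0) for _ in range(n)]
        total += phi_n(x, theta0, critical_value)
    return total / reps


power_under_h0 = estimate_power(theta=0.0)   # should be close to alpha
power_far_away = estimate_power(theta=1.5)   # should be close to 1
```

The estimate at $\theta = \theta_0$ approximates the probability of an error of the first kind, while estimates at $\theta \neq \theta_0$ approximate the power at those points; both carry Monte Carlo error of order $1/\sqrt{\text{reps}}$.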
  
It follows from the definition of the power function $\beta_n(\cdot)$ that if the random vector $X$ is subject to the law ${\mathsf P}_\theta$, $\theta \in \Theta = \Theta_0 \cup \Theta_1$, then the statistical test based on the critical function $\phi_n(\cdot)$ will reject the hypothesis $H_0$ to be tested with probability $\beta_n(\theta)$. Thus, the restriction of the power function $\beta_n(\cdot)$ from $\Theta$ to $\Theta_0$ gives the probability of an error of the first kind, i.e. the probability of wrongly rejecting $H_0$. Conversely, the restriction of $\beta_n(\cdot)$ from $\Theta$ to $\Theta_1$, called the power of the statistical test, gives another important characteristic of the test: the probability of rejecting the hypothesis $H_0$ to be tested when in reality the competing hypothesis $H_1$ is correct. The power of the statistical test is sometimes defined as the number
  
$$
\beta = \inf_{\theta \in \Theta_1} \beta_n(\theta) = \inf_{\theta \in \Theta_1} {\mathsf E}_\theta \phi_n(X).
$$
  
By complementation, i.e. by use of the function $1 - \beta_n(\cdot)$, defined on the set $\Theta_1$, the probability of an error of the second kind can be calculated.
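For the model of Example 1 the two error probabilities can be written in closed form once a test is fixed. The sketch below assumes the two-sided z-test, for which $\sqrt{n}(\bar X - \theta_0)$ is normal with mean $\sqrt{n}(\theta - \theta_0)$ and unit variance under ${\mathsf P}_\theta$; the function names and the numerical values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_function(theta, theta0, n, alpha=0.05):
    """Closed-form beta_n(theta) for the two-sided z-test (assumed test).

    Under P_theta the statistic sqrt(n) * (mean - theta0) is N(shift, 1),
    so the rejection probability is the mass outside [-c, c].
    """
    std_normal = NormalDist()
    c = std_normal.inv_cdf(1 - alpha / 2)
    shift = sqrt(n) * (theta - theta0)
    return std_normal.cdf(-c - shift) + 1.0 - std_normal.cdf(c - shift)


def second_kind_error(theta, theta0, n, alpha=0.05):
    """Probability of not rejecting H0 when the true parameter is theta != theta0."""
    return 1.0 - power_function(theta, theta0, n, alpha)


# Error of the first kind: the power function evaluated on Theta_0 = {theta0}.
first_kind = power_function(theta=0.0, theta0=0.0, n=10)
# Error of the second kind at one hypothetical alternative, theta = 1.
second_kind = second_kind_error(theta=1.0, theta0=0.0, n=10)
```

Here `first_kind` coincides with `alpha` up to rounding, and `second_kind` shrinks as $n$ or $|\theta - \theta_0|$ grows, which illustrates the trade-off between the two errors at fixed $n$.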
  
The problem of testing $H_0$ against $H_1$ using the classical Neyman–Pearson model begins with the choice of an upper bound $\alpha$ $(0 < \alpha < 1)$ for the probability of wrongly rejecting $H_0$, i.e. for the probability of an error of the first kind, and, given this bound $\alpha$, the test with the greatest power is then sought. Owing to the special role played by $H_0$ in the researcher's work, the number $\alpha$, called the significance level of the test, is taken to be sufficiently small, for example 0.01, 0.05 or 0.1. The choice of the significance level $\alpha$ means that the set of all statistical tests designed to test $H_0$ against $H_1$ is restricted to the set of those tests satisfying the condition
  
$$ \tag{1}
\sup_{\theta \in \Theta_0} \beta_n(\theta) = \sup_{\theta \in \Theta_0} {\mathsf E}_\theta \phi_n(X) = \alpha.
$$
  
(It is sometimes required that, instead of condition (1), $\sup_{\theta \in \Theta_0} \beta_n(\theta) \leq \alpha$, which makes no difference to the general theory of statistical hypotheses testing.) A statistical test that satisfies (1) is called a test at level $\alpha$. Thus, in the classical formulation, the problem of testing $H_0$ against $H_1$ reduces to the construction of a statistical test at level $\alpha$ whose power function satisfies the condition
  
$$ \tag{2}
\beta_n^\star(\theta) \geq \beta_n(\theta) \quad \textrm{for all } \theta \in \Theta_1,
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087400/s087400122.png" /></td> </tr></table>
+
where  $  \beta _ {n} ( \cdot ) $
 +
is the power function of an arbitrary test at level  $  \alpha $.
 +
If  $  H _ {0} $
 +
and  $  H _ {1} $
 +
are simple, an effective solution of this optimization problem is provided by the [[Likelihood-ratio test|likelihood-ratio test]]. If  $  H _ {1} $
 +
is compound, however, then it is rare for a statistical test to satisfy condition (2). However, if such a test does exist, then it is recognized as the best test of  $  H _ {0} $
 +
against  $  H _ {1} $,
 +
and is called the [[Uniformly most-powerful test|uniformly most-powerful test]] at level  $  \alpha $
 +
in the problem of choosing between  $  H _ {0} $
 +
and  $  H _ {1} $.  
 +
Since uniformly most-powerful tests exist only rarely, the class of statistical tests has to be restricted by means of certain extra requirements, such as unbiasedness, similarity, completeness, and others, and the best test in the sense of (2) has to be constructed in this narrower class. For example, the requirement that the test be unbiased means that its power function must satisfy the relation
 +
 
 +
$$
 +
\sup _ {\theta \in \Theta _ {0} }  \beta _ {n} ( \theta )  \leq  \inf _ {
 +
\theta \in \Theta _ {1} }  \beta _ {n} ( \theta ).
 +
$$
  
 

Latest revision as of 14:55, 7 June 2020


statistical hypotheses testing

One of the basic parts of mathematical statistics, expounding ideas and methods for the statistical testing of correspondences between experimental data on the one hand and hypotheses on their probability characteristics on the other.

Let a random vector $ X = ( X _ {1} , \dots, X _ {n} ) $ be observed, taking values $ x = ( x _ {1} , \dots, x _ {n} ) $ in a measurable space $ ( \mathfrak X _ {n} , {\mathcal B} _ {n} ) $, and suppose it is known that the probability distribution of $ X $ belongs to a given set of probability distributions $ H = \{ { {\mathsf P} _ \theta } : {\theta \in \Theta } \} $, where $ \Theta $ is a parameter set. The set $ H $ is called the set of admissible hypotheses, and any non-empty subset $ H _ {i} $ of it is called a statistical hypothesis, or simply a hypothesis. If $ H _ {i} $ contains precisely one element, then the hypothesis is said to be simple, otherwise it is said to be compound. Moreover, if there are two so-called competing hypotheses distinguished in $ H $:

$$ H _ {0} = \{ { {\mathsf P} _ \theta } : {\theta \in \Theta _ {0} \subset \Theta } \} $$

and

$$ H _ {1} = H \setminus H _ {0} = \ \{ { {\mathsf P} _ \theta } : {\theta \in \Theta _ {1} = \Theta \setminus \Theta _ {0} } \} , $$

then one of them, for example $ H _ {0} $, is called the null hypothesis, and the other the alternative hypothesis. In terms of $ H _ {0} $ and $ H _ {1} $, the basic problem in the theory of statistical hypotheses testing can be conveniently formulated using the Neyman–Pearson model (see [2]). Namely, find an optimal method that makes it possible, on the basis of an observed realization of $ X $, to test whether the hypothesis $ H _ {0} $: $ \theta \in \Theta _ {0} $ is correct, according to which the probability distribution of $ X $ belongs to the set $ H _ {0} = \{ { {\mathsf P} _ \theta } : {\theta \in \Theta _ {0} } \} $, or whether the alternative hypothesis $ H _ {1} $: $ \theta \in \Theta _ {1} $ is correct, according to which the probability distribution of $ X $ belongs to the set

$$ H _ {1} = \{ { {\mathsf P} _ \theta } : {\theta \in \Theta _ {1} = \Theta \setminus \Theta _ {0} } \} . $$

Example 1.

Let a random vector $ X = ( X _ {1} , \dots, X _ {n} ) $ be observed, with components $ X _ {1} , \dots, X _ {n} $ that are independent identically-distributed random variables subject to the normal law $ N _ {1} ( \theta , 1) $, with unknown mathematical expectation $ \theta = {\mathsf E} X _ {i} $ $ ( | \theta | < \infty ) $, while the variance is equal to 1, i.e. for any real number $ x $,

$$ {\mathsf P} \{ X _ {i} < x \mid \theta \} = \ \Phi ( x- \theta ) = \ \frac{1}{\sqrt {2 \pi } } \int\limits _ {- \infty } ^ { x } e ^ {-( t- \theta ) ^ {2} /2 } dt, $$

$$ i = 1, \dots, n. $$

Under these conditions it is possible to examine the problem of testing $ H _ {0} $: $ \theta = \theta _ {0} $ against $ H _ {1} $: $ \theta \neq \theta _ {0} $, where $ \theta _ {0} $ is a given number. In the given example, $ H _ {0} $ is a simple, while $ H _ {1} $ is a compound hypothesis.

Formally, the competing hypotheses $ H _ {0} $ and $ H _ {1} $ are equivalent in the problem of choosing between them, and the question of which of these two non-intersecting and mutually-complementary sets from $ H $ should be called the null hypothesis is not vital, and does not affect the construction of the theory of statistical hypotheses testing itself. However, as a rule, the researcher's attitude to the problem itself affects the choice of the null hypothesis, with the result that the null hypothesis is often taken to be that subset $ H _ {0} $ of the set $ H $ of all admissible hypotheses that, in the researcher's opinion, bearing in mind the nature of the phenomenon in question, or in the light of any physical considerations, will best fit in with the expected experimental data. For this very reason, $ H _ {0} $ is often called the hypothesis to be tested. On the theoretical level, the difference between $ H _ {0} $ and $ H _ {1} $ is often explained by the fact that, as a rule, $ H _ {0} $ has a simpler structure than $ H _ {1} $, which reflects the researcher's preference for the simpler model.

In the theory of statistical hypotheses testing, the decision on the correctness of $ H _ {0} $ or $ H _ {1} $ is taken on the basis of an observed realization of the random vector $ X $; the decision rule used in reaching the decision "the hypothesis $ H _ {i} $ is correct" $ ( i = 0, 1) $ is called a statistical test. The structure of any statistical test is completely defined by its so-called critical function $ \phi _ {n} ( \cdot ) : \mathfrak X _ {n} \rightarrow [ 0, 1] $. According to the statistical test with critical function $ \phi _ {n} ( \cdot ) $, the hypothesis $ H _ {0} $ to be tested is rejected with probability $ \phi _ {n} ( X) $ in favour of the alternative $ H _ {1} $, while $ H _ {1} $ is rejected with probability $ 1- \phi _ {n} ( X) $ in favour of $ H _ {0} $. From a practical point of view, the most interesting are the so-called non-randomized tests, whose critical functions take only two values: 0 and 1. Whichever test is used in choosing between $ H _ {0} $ and $ H _ {1} $, it may lead to either a correct or a false decision. In the theory of statistical hypotheses testing, wrong inferences are classified in the following way.

If the test rejects the hypothesis $ H _ {0} $ to be tested when in reality it is correct, then one says that an error of the first kind has been committed. Conversely, if the test does not reject $ H _ {0} $ (and, in this test, $ H _ {0} $ is therefore accepted) when it is in fact incorrect, then one says that an error of the second kind has been committed. The problem of testing $ H _ {0} $ against $ H _ {1} $ should ideally be approached in such a way as to minimize the probabilities of these errors. Unfortunately, it is impossible, given the fixed dimension $ n $ of the vector of observations $ X $, to control both error probabilities simultaneously: as a rule, as one decreases, the other increases. The probabilities of these errors are expressed numerically in terms of the so-called power function $ \beta _ {n} ( \cdot ) $ of the statistical test, defined on the set $ \Theta = \Theta _ {0} \cup \Theta _ {1} $ by means of the rule:

$$ \beta _ {n} ( \theta ) = \ {\mathsf E} _ \theta \phi _ {n} ( X) = \ \int\limits _ { \mathfrak X _ {n} } \phi _ {n} ( x) d {\mathsf P} _ \theta ( x),\ \ \theta \in \Theta = \Theta _ {0} \cup \Theta _ {1} . $$

It follows from the definition of the power function $ \beta _ {n} ( \cdot ) $ that if the random vector $ X $ is subject to the law $ {\mathsf P} _ \theta $, $ \theta \in \Theta = \Theta _ {0} \cup \Theta _ {1} $, then the statistical test based on the critical function $ \phi _ {n} ( \cdot ) $ will reject the hypothesis $ H _ {0} $ to be tested with probability $ \beta _ {n} ( \theta ) $. Thus, the restriction of the power function $ \beta _ {n} ( \cdot ) $ from $ \Theta $ to $ \Theta _ {0} $ gives the probability of an error of the first kind, i.e. the probability of wrongly rejecting $ H _ {0} $. Conversely, the restriction of $ \beta _ {n} ( \cdot ) $ from $ \Theta $ to $ \Theta _ {1} $, called the power of the statistical test, gives another important characteristic of the statistical test: the probability of rejecting the hypothesis $ H _ {0} $ to be tested when in reality the competing hypothesis $ H _ {1} $ is correct. The power of the statistical test is sometimes defined as the number

$$ \beta = \inf _ {\theta \in \Theta _ {1} } \beta _ {n} ( \theta ) = \ \inf _ {\theta \in \Theta _ {1} } {\mathsf E} _ \theta \phi _ {n} ( X). $$

By complementation, i.e. by use of the function $ 1- \beta _ {n} ( \cdot ) $, defined on the set $ \Theta _ {1} $, the probability of an error of the second kind can be calculated.
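
Because $ \beta _ {n} ( \theta ) = {\mathsf E} _ \theta \phi _ {n} ( X) $ is an expectation, both error probabilities can be approximated by simulation. The following Python sketch is only an illustration under stated assumptions: it uses the setting of example 1 (independent $ N _ {1} ( \theta , 1) $ observations), and the critical function `phi`, its cutoff 0.5, the sample size 16 and the helper `estimate_power` are hypothetical choices that do not come from the article.

```python
import random
from statistics import fmean


def phi(sample, theta0=0.0, cutoff=0.5):
    """Illustrative non-randomized critical function (hypothetical cutoff).

    Returns 1 (reject H0) when the sample mean is far from theta0, else 0.
    """
    return 1 if abs(fmean(sample) - theta0) > cutoff else 0


def estimate_power(theta, n, replications=10_000, seed=0):
    """Monte Carlo estimate of beta_n(theta) = E_theta phi_n(X)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rejections = 0
    for _ in range(replications):
        # Draw X = (X_1, ..., X_n) with independent N(theta, 1) components.
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        rejections += phi(sample)
    return rejections / replications


# theta = 0 lies in Theta_0: beta_n estimates the first-kind error probability.
first_kind = estimate_power(theta=0.0, n=16)
# theta = 1 lies in Theta_1: 1 - beta_n estimates the second-kind error probability.
second_kind = 1.0 - estimate_power(theta=1.0, n=16)
print(first_kind, second_kind)
```

With these illustrative settings the sample mean has standard deviation 1/4, so the first estimate should lie close to $ 2 \Phi ( - 2) \approx 0.046 $, up to simulation error.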

The problem of testing $ H _ {0} $ against $ H _ {1} $ using the classical Neyman–Pearson model begins with the choice of an upper bound $ \alpha $ $ ( 0 < \alpha < 1 ) $ for the probability of wrongly rejecting $ H _ {0} $, i.e. for the probability of an error of the first kind, and, given this bound $ \alpha $, the test with the greatest power is then sought. Owing to the special role played by $ H _ {0} $ in the researcher's work, the number $ \alpha $, called the significance level of the test, is taken to be sufficiently small, for example 0.01, 0.05 or 0.1. The choice of the significance level $ \alpha $ means that the set of all statistical tests designed to test $ H _ {0} $ against $ H _ {1} $ is restricted to the set of those tests satisfying the condition

$$ \tag{1 } \sup _ {\theta \in \Theta _ {0} } \beta _ {n} ( \theta ) = \ \sup _ {\theta \in \Theta _ {0} } {\mathsf E} _ \theta \phi _ {n} ( X) = \ \alpha . $$

(It is sometimes required that, instead of condition (1), $ \sup _ {\theta \in \Theta _ {0} } \beta _ {n} ( \theta ) \leq \alpha $, which makes no difference to the general theory of statistical hypotheses testing.) A statistical test that satisfies (1) is called a test at level $ \alpha $. Thus, in the classical formulation, the problem of testing $ H _ {0} $ against $ H _ {1} $ reduces to the construction of a statistical test at level $ \alpha $ whose power function satisfies the condition

$$ \tag{2 } \beta _ {n} ^ \star ( \theta ) \geq \beta _ {n} ( \theta ) \ \ \textrm{ for } \textrm{ all } \theta \in \Theta _ {1} , $$

where $ \beta _ {n} ^ \star ( \cdot ) $ is the power function of the test being constructed and $ \beta _ {n} ( \cdot ) $ is the power function of an arbitrary test at level $ \alpha $. If $ H _ {0} $ and $ H _ {1} $ are simple, an effective solution of this optimization problem is provided by the likelihood-ratio test. If $ H _ {1} $ is compound, however, then it is rare for a statistical test to satisfy condition (2). If such a test does exist, it is recognized as the best test of $ H _ {0} $ against $ H _ {1} $, and is called the uniformly most-powerful test at level $ \alpha $ in the problem of choosing between $ H _ {0} $ and $ H _ {1} $. Since uniformly most-powerful tests exist only rarely, the class of statistical tests has to be restricted by means of certain extra requirements, such as unbiasedness, similarity, completeness, and others, and the best test in the sense of (2) has to be constructed in this narrower class. For example, the requirement that the test be unbiased means that its power function must satisfy the relation

$$ \sup _ {\theta \in \Theta _ {0} } \beta _ {n} ( \theta ) \leq \inf _ { \theta \in \Theta _ {1} } \beta _ {n} ( \theta ). $$

Example 2.

Under the conditions of example 1, for any fixed significance level $ \alpha $, a non-randomized, uniformly most-powerful, unbiased test of level $ \alpha $ exists for testing $ H _ {0} $ against $ H _ {1} $, namely the likelihood-ratio test. The critical function of this best test is defined as:

$$ \phi _ {n} ( X) = \left \{ \begin{array}{ll} 1 & \textrm{ if } | \overline{X}\; - \theta _ {0} | > \frac{1}{\sqrt n } \Phi ^ {-1} \left ( 1- \frac \alpha {2} \right ) , \\ 0 & \textrm{ if } | \overline{X}\; - \theta _ {0} | \leq \frac{1}{\sqrt n } \Phi ^ {-1} \left ( 1- \frac \alpha {2} \right ) , \\ \end{array} \right . $$

where

$$ \overline{X}\; = \frac{X _ {1} + \dots + X _ {n} }{n} . $$
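
The critical function above translates directly into code. The sketch below is a minimal illustration, assuming unit variance as in example 1; the function name `critical_function` and the sample values are hypothetical, and $ \Phi ^ {-1} $ is taken from the Python standard library (`statistics.NormalDist.inv_cdf`).

```python
from math import sqrt
from statistics import NormalDist, fmean


def critical_function(sample, theta0, alpha):
    """Two-sided test of H0: theta = theta0 for N(theta, 1) observations.

    Returns 1 if H0 is rejected at level alpha and 0 otherwise.
    """
    n = len(sample)
    # Rejection threshold: Phi^{-1}(1 - alpha/2) / sqrt(n).
    threshold = NormalDist().inv_cdf(1 - alpha / 2) / sqrt(n)
    return 1 if abs(fmean(sample) - theta0) > threshold else 0


# Hypothetical data: the first sample is centred near 0, the second near 2.
print(critical_function([0.1, -0.2, 0.05, 0.3], theta0=0.0, alpha=0.05))
print(critical_function([2.1, 1.9, 2.4, 1.6], theta0=0.0, alpha=0.05))
```

For $ n = 4 $ and $ \alpha = 0.05 $ the threshold is about 0.98, so the first sample (mean 0.0625) leads to acceptance of $ H _ {0} $ and the second (mean 2.0) to its rejection.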

Owing to the fact that the statistic $ \overline{X}\; $, called the test statistic, is subject to the normal law $ N _ {1} ( \theta , 1/n) $ with parameters $ {\mathsf E} \overline{X}\; = \theta $ and $ {\mathsf D} \overline{X}\; = 1/n $, i.e. for any real number $ x $,

$$ {\mathsf P} \{ \overline{X}\; < x \mid \theta \} = \Phi [ \sqrt n ( x- \theta )], $$

the power function $ \beta _ {n} ( \cdot ) $ of the best test for testing $ H _ {0} $ against $ H _ {1} $ is expressed by the formula

$$ \beta _ {n} ( \theta ) = {\mathsf E} _ \theta \phi _ {n} ( X) = $$

$$ = \ {\mathsf P} \left \{ | \overline{X}\; - \theta _ {0} | > \frac{1}{\sqrt n } \Phi ^ {-1} \left ( 1- \frac \alpha {2} \right ) \mid \theta \right \} = $$

$$ = \ \Phi \left [ \Phi ^ {-1} \left ( \frac \alpha {2} \right ) + \sqrt n ( \theta _ {0} - \theta ) \right ] + \Phi \left [ \Phi ^ {-1} \left ( \frac \alpha {2} \right ) - \sqrt n ( \theta _ {0} - \theta ) \right ] , $$

where $ \beta _ {n} ( \theta ) \geq \beta _ {n} ( \theta _ {0} ) = \alpha $. The figure below gives a graphical representation of the behaviour of the power function $ \beta _ {n} ( \cdot ) $.
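
The closed form for $ \beta _ {n} ( \theta ) $ can be evaluated numerically. The following Python sketch is an illustration only; the function name `power` and the chosen values of $ \theta $, $ \theta _ {0} $, $ n $ and $ \alpha $ are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal law, provides Phi and Phi^{-1}


def power(theta, theta0, n, alpha):
    """Power function of the two-sided test of example 2 (unit variance)."""
    z = _STD_NORMAL.inv_cdf(alpha / 2)   # Phi^{-1}(alpha / 2)
    shift = sqrt(n) * (theta0 - theta)   # sqrt(n) * (theta_0 - theta)
    return _STD_NORMAL.cdf(z + shift) + _STD_NORMAL.cdf(z - shift)


print(power(0.0, 0.0, 25, 0.05))   # at theta = theta_0 the power equals alpha
print(power(0.5, 0.0, 25, 0.05))   # away from theta_0 the power is larger
print(power(-0.5, 0.0, 25, 0.05))  # the power is symmetric about theta_0
```

At $ \theta = \theta _ {0} $ both terms equal $ \alpha / 2 $, which recovers $ \beta _ {n} ( \theta _ {0} ) = \alpha $.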

Figure: s087400a

The function $ \beta _ {n} ( \cdot ) $ attains its lowest value, equal to the significance level $ \alpha $, at the point $ \theta = \theta _ {0} $; as $ \theta $ moves away from $ \theta _ {0} $ its values increase, approaching 1 as $ | \theta - \theta _ {0} | $ grows.

The theory of statistical hypotheses testing enables one to treat the different problems that arise in practice from the same point of view: the construction of interval estimators for unknown parameters, the estimation of the divergence between mean values of probability laws, the testing of hypotheses on the independence of observations, problems of statistical quality control, etc. Thus, in example 2, the acceptance region of $ H _ {0} $ is the best confidence interval with confidence coefficient $ 1 - \alpha $ for the unknown mathematical expectation $ \theta $.
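
The link with interval estimation can be made concrete: the values $ \theta _ {0} $ that the test of example 2 does not reject for a given sample are exactly those with $ | \overline{X}\; - \theta _ {0} | \leq \Phi ^ {-1} ( 1 - \alpha /2 ) / \sqrt n $. The Python sketch below computes this interval; it is a minimal illustration assuming unit variance, and the function name `confidence_interval` and the sample are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def confidence_interval(sample, alpha):
    """Interval of theta_0 values not rejected by the two-sided test.

    Assumes independent N(theta, 1) observations; the confidence
    coefficient is 1 - alpha.
    """
    n = len(sample)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) / sqrt(n)
    mean = fmean(sample)
    return mean - half_width, mean + half_width


lower, upper = confidence_interval([1.0, 2.0, 3.0, 2.0], alpha=0.05)
print(lower, upper)
```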

Apart from the classical Neyman–Pearson approach, there are other methods for solving the problem of choosing between hypotheses: the Bayesian approach, the minimax approach, the Wald method of sequential testing, and others. Moreover, the theory of statistical hypotheses testing also includes approximate methods based on the study of the asymptotic behaviour of a sequence $ \{ \beta _ {n} ( \cdot ) \} $ of power functions of statistical tests of $ H _ {0} $ against $ H _ {1} $, when the dimension $ n $ of the vector of observations $ X = ( X _ {1} , \dots, X _ {n} ) $ increases unboundedly. In this situation it is usually required that the constructed sequence of tests be consistent, i.e. that

$$ \lim\limits _ {n \rightarrow \infty } \beta _ {n} ( \theta ) = 1 \ \ \textrm{ for } \textrm{ any } \theta \in \Theta _ {1} , $$

which means that as $ n $ increases, the hypotheses $ H _ {0} $ and $ H _ {1} $ can be distinguished with a greater degree of certainty. The tests of example 2 form a consistent sequence as $ n \rightarrow \infty $.
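
Consistency of the tests in example 2 can be observed numerically by fixing an alternative $ \theta \neq \theta _ {0} $ and letting $ n $ grow. The Python sketch below reuses the closed-form power function of example 2; the alternative $ \theta = 0.1 $, the level $ \alpha = 0.05 $ and the sample sizes are arbitrary illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power(theta, theta0, n, alpha):
    """Closed-form power of the two-sided test of example 2 (unit variance)."""
    z = _STD_NORMAL.inv_cdf(alpha / 2)
    shift = sqrt(n) * (theta0 - theta)
    return _STD_NORMAL.cdf(z + shift) + _STD_NORMAL.cdf(z - shift)


# Fixed alternative theta = 0.1 against theta_0 = 0, growing sample size n.
sample_sizes = [10, 100, 1000, 10000]
powers = [power(0.1, 0.0, n, 0.05) for n in sample_sizes]
for n, p in zip(sample_sizes, powers):
    print(f"n = {n:>5}: power = {p:.4f}")
```

The printed values increase with $ n $ and approach 1, in agreement with the limit relation above.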

In any case, whatever the statistical test used, the acceptance of either hypothesis does not mean that it is necessarily the correct one, but simply that there is no evidence at this stage to contradict it. Precisely because of this agreement between theory and experience, the researcher has no reason not to believe that his choice is correct until such time as new observations appear that might force him to change his attitude towards the chosen hypothesis, and perhaps even towards the whole model.

References

[1a] J. Neyman, E.S. Pearson, "On the use and interpretation of certain test criteria for purposes of statistical inference I", Biometrika, 20A (1928) pp. 175–240
[1b] J. Neyman, E.S. Pearson, "On the use and interpretation of certain test criteria for purposes of statistical inference II", Biometrika, 20A (1928) pp. 263–294
[2] J. Neyman, E.S. Pearson, "On the problem of the most efficient tests of statistical hypotheses", Phil. Trans. Roy. Soc. London Ser. A, 231 (1933) pp. 289–337
[3] E.L. Lehmann, "Testing statistical hypotheses", Wiley (1988)
[4] H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)
[5] J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)
[6] M.S. Nikulin, "A result of Bol'shev's from the theory of the statistical testing of hypotheses", J. Soviet Math., 44 : 3 (1989) pp. 522–529; Zap. Nauchn. Sem. Mat. Inst. Steklov., 153 (1986) pp. 129–137
How to Cite This Entry:
Statistical hypotheses, verification of. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Statistical_hypotheses,_verification_of&oldid=49601
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.