Statistical hypotheses, verification of
statistical hypotheses testing
One of the basic parts of mathematical statistics, expounding ideas and methods for the statistical testing of correspondences between experimental data on the one hand and hypotheses on their probability characteristics on the other.
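Let a random vector $X = (X_1, \dots, X_n)$ be observed, taking values $x = (x_1, \dots, x_n)$ in a measurable space $(\mathfrak{X}_n, \mathcal{B}_n)$, and suppose it is known that the probability distribution of $X$ belongs to a given set of probability distributions $H = \{ \mathsf{P}_\theta : \theta \in \Theta \}$, where $\Theta$ is a certain parametric set. $H$ is called the set of admissible hypotheses, and any non-empty subset $H_i$ of it is called a statistical hypothesis, or simply a hypothesis. If $H_i$ contains precisely one element, then the hypothesis is said to be simple; otherwise it is said to be compound. Moreover, if there are two so-called competing hypotheses distinguished in $H$:

$$H_0 = \{ \mathsf{P}_\theta : \theta \in \Theta_0 \subset \Theta \}, \qquad H_1 = \{ \mathsf{P}_\theta : \theta \in \Theta_1 = \Theta \setminus \Theta_0 \},$$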
then one of them, for example $H_0$, is called the null hypothesis and the other the alternative hypothesis. In terms of $H_0$ and $H_1$, the basic problem in the theory of statistical hypotheses testing can be conveniently formulated using the Neyman–Pearson model (see the papers of J. Neyman and E.S. Pearson listed in the references). Namely, find an optimal method that makes it possible, on the basis of an observed realization of $X$, to test whether the hypothesis $H_0$: $\theta \in \Theta_0$ is correct, according to which the probability distribution of $X$ belongs to the set $H_0$, or whether the alternative hypothesis $H_1$: $\theta \in \Theta_1$ is correct, according to which the probability distribution of $X$ belongs to the set $H_1$.
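Example 1. Let a random vector $X = (X_1, \dots, X_n)$ be observed, whose components $X_1, \dots, X_n$ are independent identically-distributed random variables subject to the normal law $N_1(\theta, 1)$, with unknown mathematical expectation $\theta = \mathsf{E} X_i$ ($|\theta| < \infty$), while the variance is equal to 1, i.e. for any real number $x$,

$$\mathsf{P}\{ X_i < x \mid \theta \} = \Phi(x - \theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t - \theta)^2/2} \, dt, \qquad i = 1, \dots, n.$$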
Under these conditions one can consider the problem of testing $H_0$: $\theta = \theta_0$ against $H_1$: $\theta \neq \theta_0$, where $\theta_0$ is a given number. In this example, $H_0$ is a simple hypothesis, while $H_1$ is a compound one.
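To make the model of Example 1 concrete, here is a minimal simulation sketch, assuming Python 3.8+ and its standard `random` and `statistics` modules; the helper names `sample_example1` and `model_cdf` are hypothetical. It draws a realization of $X$ with $X_i \sim N_1(\theta, 1)$ and compares the empirical frequency of $\{X_i < x\}$ with $\Phi(x - \theta)$.

```python
import random
from statistics import NormalDist, fmean


def sample_example1(theta, n, rng):
    """Draw one realization of X = (X_1, ..., X_n) with X_i i.i.d. N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def model_cdf(x, theta):
    """P{X_i < x | theta} = Phi(x - theta) under the model of Example 1."""
    return NormalDist().cdf(x - theta)


rng = random.Random(0)
theta, x = 0.7, 1.0
xs = sample_example1(theta, 100_000, rng)
# Share of observations below x, to be compared with Phi(x - theta).
empirical = fmean(1.0 if xi < x else 0.0 for xi in xs)
print(f"empirical: {empirical:.4f}, Phi(x - theta): {model_cdf(x, theta):.4f}")
```

With a large $n$ the two printed numbers should agree to about two decimal places.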
Formally, the competing hypotheses $H_0$ and $H_1$ play symmetric roles in the problem of choosing between them: which of these two disjoint, mutually complementary subsets of $H$ is called the null hypothesis is not essential and does not affect the construction of the theory of statistical hypotheses testing itself. However, as a rule, the researcher's attitude to the problem affects the choice of the null hypothesis: $H_0$ is usually taken to be the subset of admissible hypotheses which, in the researcher's opinion (given the nature of the phenomenon in question, or in the light of physical considerations), best fits the expected experimental data. For this reason $H_0$ is often called the hypothesis to be tested. On the theoretical side, the difference between $H_0$ and $H_1$ is often explained by the fact that $H_0$ usually has a simpler structure than $H_1$, reflecting the researcher's preference for the simpler model.
In the theory of statistical hypotheses testing, the decision on the correctness of $H_0$ or $H_1$ is taken on the basis of an observed realization of the random vector $X$; the decision rule used in taking the decision "the hypothesis $H_i$ is correct" ($i = 0, 1$) is called a statistical test. The structure of any statistical test is completely determined by its so-called critical function $\varphi_n(\cdot) : \mathfrak{X}_n \to [0, 1]$. According to the statistical test with critical function $\varphi_n(\cdot)$, the hypothesis $H_0$ to be tested is rejected with probability $\varphi_n(X)$ in favour of the alternative $H_1$, while $H_1$ is rejected with probability $1 - \varphi_n(X)$ in favour of $H_0$. From a practical point of view, the most interesting tests are the so-called non-randomized tests, whose critical functions take only two values: 0 and 1. Whatever test is used in choosing between $H_0$ and $H_1$, it may lead to either a correct or a false decision. In the theory of statistical hypotheses testing, wrong inferences are classified as follows.
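The way a critical function turns into a decision can be sketched as follows, assuming Python's standard `random` module; `decide`, `phi_non_randomized` and `phi_randomized` are hypothetical names, and the two critical functions are arbitrary illustrations, not tests from the text. $H_0$ is rejected when a uniform draw falls below $\varphi_n(x)$, which happens with probability exactly $\varphi_n(x)$; for a non-randomized test the draw never changes the outcome.

```python
import random


def decide(phi, x, rng):
    """Apply the test with critical function phi to the observation x.

    H0 is rejected with probability phi(x) and accepted with probability
    1 - phi(x); for a non-randomized test phi(x) is always 0 or 1.
    """
    p = phi(x)
    if not 0.0 <= p <= 1.0:
        raise ValueError("a critical function must take values in [0, 1]")
    # rng.random() is uniform on [0, 1), so this event has probability p.
    return "reject H0" if rng.random() < p else "accept H0"


def phi_non_randomized(x):
    """Hypothetical test: reject H0 when the sample mean exceeds 0.5."""
    return 1.0 if sum(x) / len(x) > 0.5 else 0.0


def phi_randomized(x):
    """Hypothetical randomized test: reject H0 with probability 0.3 whatever x is."""
    return 0.3


rng = random.Random(1)
print(decide(phi_non_randomized, [0.9, 1.1], rng))  # reject H0
trials = 100_000
rejections = sum(decide(phi_randomized, [0.0], rng) == "reject H0" for _ in range(trials))
print(f"share of rejections: {rejections / trials:.3f}")
```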
If the test rejects the hypothesis $H_0$ to be tested when in reality it is correct, then one says that an error of the first kind has been committed. Conversely, if the test does not reject $H_0$ (and so, under this test, $H_0$ is accepted) when it is in fact incorrect, then one says that an error of the second kind has been committed. Ideally, the problem of testing $H_0$ against $H_1$ should be approached so as to minimize the probabilities of both errors. Unfortunately, for a fixed dimension $n$ of the vector of observations $X$, it is impossible to make both error probabilities arbitrarily small simultaneously: as a rule, as one decreases, the other increases. The probabilities of these errors are expressed numerically in terms of the so-called power function $\beta_n(\cdot)$ of the statistical test, defined on the set $\Theta$ by the rule

$$\beta_n(\theta) = \mathsf{E}_\theta \varphi_n(X) = \int_{\mathfrak{X}_n} \varphi_n(x) \, d\mathsf{P}_\theta(x), \qquad \theta \in \Theta.$$
It follows from the definition of the power function that if the random vector $X$ is subject to the law $\mathsf{P}_\theta$, $\theta \in \Theta$, then the statistical test based on the critical function $\varphi_n(\cdot)$ rejects the hypothesis $H_0$ with probability $\beta_n(\theta)$. Thus, the restriction of the power function $\beta_n(\theta)$ from $\Theta$ to $\Theta_0$ gives the probability of an error of the first kind, i.e. the probability of wrongly rejecting $H_0$. Conversely, the restriction of $\beta_n(\theta)$ from $\Theta$ to $\Theta_1$, called the power of the statistical test, gives another important characteristic of the test: the probability of rejecting $H_0$ when in reality the competing hypothesis $H_1$ is correct. The power of the statistical test is sometimes defined as the number

$$\inf_{\theta \in \Theta_1} \beta_n(\theta).$$
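Since $\beta_n(\theta) = \mathsf{E}_\theta \varphi_n(X)$ is an expectation, it can be estimated by averaging $\varphi_n$ over simulated samples. The sketch below assumes the normal model of Example 1 and a hypothetical non-randomized test that rejects $H_0$ when the sample mean exceeds $c = 0.5$; the names `estimate_power`, `phi_threshold` and `exact_power` are illustrative only. For this particular test the mean is $N_1(\theta, 1/n)$, so the estimate can be checked against $1 - \Phi(\sqrt{n}\,(c - \theta))$. At points of $\Theta_0$ the values are error probabilities of the first kind; at points of $\Theta_1$ they are the power.

```python
import random
from statistics import NormalDist


def estimate_power(phi, theta, n, n_rep, rng):
    """Monte Carlo estimate of beta_n(theta) = E_theta phi_n(X),
    assuming X_1, ..., X_n are i.i.d. N(theta, 1) as in Example 1."""
    total = 0.0
    for _ in range(n_rep):
        x = [rng.gauss(theta, 1.0) for _ in range(n)]
        total += phi(x)
    return total / n_rep


def phi_threshold(x, c=0.5):
    """Hypothetical non-randomized test: reject H0 when the sample mean exceeds c."""
    return 1.0 if sum(x) / len(x) > c else 0.0


def exact_power(theta, n, c=0.5):
    """For this test the mean is N(theta, 1/n), so beta_n(theta) = 1 - Phi(sqrt(n)(c - theta))."""
    return 1.0 - NormalDist().cdf(n ** 0.5 * (c - theta))


rng = random.Random(2)
n = 4
estimates = {theta: estimate_power(phi_threshold, theta, n, 20_000, rng) for theta in (0.0, 0.5, 1.0)}
for theta, est in estimates.items():
    print(f"theta={theta}: Monte Carlo {est:.3f}, exact {exact_power(theta, n):.3f}")
```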
By complementation, i.e. by using the function $1 - \beta_n(\theta)$ defined on the set $\Theta_1$, one obtains the probability of an error of the second kind.
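The trade-off between the two kinds of error for a fixed $n$ can be seen in a small computation. As an assumption for illustration only, take the model of Example 1 with the simple hypotheses $H_0$: $\theta = 0$ and $H_1$: $\theta = 1$, $n = 9$, and a hypothetical family of tests "reject $H_0$ when $\bar{X} > c$". The first-kind error is $\beta_n(0)$ and the second-kind error is $1 - \beta_n(1)$; raising $c$ lowers the first and raises the second.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal law N(0, 1)


def error_probabilities(c, theta0, theta1, n):
    """Error probabilities of the hypothetical test "reject H0 when mean(X) > c"
    for H0: theta = theta0 against H1: theta = theta1 > theta0, X_i i.i.d. N(theta, 1).

    first kind  = beta_n(theta0)     = P_theta0{mean(X) > c}
    second kind = 1 - beta_n(theta1) = P_theta1{mean(X) <= c}
    """
    root_n = n ** 0.5
    first_kind = 1.0 - PHI.cdf(root_n * (c - theta0))
    second_kind = PHI.cdf(root_n * (c - theta1))
    return first_kind, second_kind


thresholds = [0.2, 0.4, 0.6, 0.8]
errors = [error_probabilities(c, theta0=0.0, theta1=1.0, n=9) for c in thresholds]
for c, (first, second) in zip(thresholds, errors):
    print(f"c={c:.1f}  first kind={first:.4f}  second kind={second:.4f}")
```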
The problem of testing $H_0$ against $H_1$ within the classical Neyman–Pearson model begins with the choice of an upper bound $\alpha$ ($0 < \alpha < 1$) for the probability of wrongly rejecting $H_0$, i.e. for the probability of an error of the first kind; given this bound $\alpha$, one then seeks the test with the greatest power. Owing to the special role played by $H_0$ in the researcher's work, the number $\alpha$, called the significance level of the test, is taken to be sufficiently small, for example 0.01, 0.05 or 0.1. Choosing the significance level $\alpha$ restricts the set of all statistical tests designed to test $H_0$ against $H_1$ to those tests satisfying the condition

$$\sup_{\theta \in \Theta_0} \beta_n(\theta) = \alpha. \qquad (1)$$
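(It is sometimes required that, instead of condition (1), $\beta_n(\theta) \leq \alpha$ for all $\theta \in \Theta_0$, which makes no difference to the general theory of statistical hypotheses testing.) A statistical test that satisfies (1) is called a test at level $\alpha$. Thus, in the classical formulation, the problem of testing $H_0$ against $H_1$ reduces to the construction of a statistical test at level $\alpha$ whose power function $\beta_n^*(\theta)$ satisfies the condition

$$\beta_n^*(\theta) \geq \beta_n(\theta) \quad \text{for all } \theta \in \Theta_1, \qquad (2)$$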
where $\beta_n(\theta)$ is the power function of an arbitrary test at level $\alpha$. If $H_0$ and $H_1$ are simple, an effective solution of this optimization problem is provided by the likelihood-ratio test. If $H_1$ is compound, however, a statistical test satisfying condition (2) rarely exists. If such a test does exist, it is regarded as the best test of $H_0$ against $H_1$ and is called the uniformly most-powerful test at level $\alpha$ for choosing between $H_0$ and $H_1$. Since uniformly most-powerful tests exist only rarely, the class of statistical tests has to be restricted by certain extra requirements, such as unbiasedness, similarity, completeness and others, and the best test in the sense of (2) is then constructed within this narrower class. For example, the requirement that the test be unbiased means that its power function must satisfy the relation

$$\sup_{\theta \in \Theta_0} \beta_n(\theta) \leq \inf_{\theta \in \Theta_1} \beta_n(\theta).$$
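For simple hypotheses the likelihood-ratio test can be written out explicitly. The sketch below assumes the model of Example 1 with $H_0$: $\theta = \theta_0$ and $H_1$: $\theta = \theta_1$, and additionally assumes $\theta_1 > \theta_0$; the function names are hypothetical. In this model $\log[L(\theta_1; x)/L(\theta_0; x)] = n(\theta_1 - \theta_0)\bar{x} - n(\theta_1^2 - \theta_0^2)/2$ is increasing in $\bar{x}$, so thresholding the ratio is the same as thresholding $\bar{X}$ at $c = \theta_0 + \Phi^{-1}(1 - \alpha)/\sqrt{n}$, which gives level $\alpha$ and power $\Phi(\sqrt{n}\,(\theta_1 - \theta_0) - \Phi^{-1}(1 - \alpha))$.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal law N(0, 1)


def log_likelihood_ratio(xs, theta0, theta1):
    """log[L(theta1; x) / L(theta0; x)] for i.i.d. N(theta, 1) observations."""
    # The Gaussian normalizing constants cancel in the ratio.
    return sum(((x - theta0) ** 2 - (x - theta1) ** 2) / 2 for x in xs)


def lr_test(xs, theta0, theta1, alpha):
    """Critical function of the likelihood-ratio test at level alpha for the
    simple hypotheses H0: theta = theta0 and H1: theta = theta1, assuming theta1 > theta0."""
    n = len(xs)
    # Cut-off for the sample mean chosen so that P_theta0{mean > c} = alpha.
    c = theta0 + PHI.inv_cdf(1 - alpha) / n ** 0.5
    # log-LR = n(theta1 - theta0) mean - n(theta1^2 - theta0^2)/2 is increasing in
    # the mean, so {log-LR > k} is the same event as {mean > c}.
    k = n * (theta1 - theta0) * c - n * (theta1 ** 2 - theta0 ** 2) / 2
    return 1.0 if log_likelihood_ratio(xs, theta0, theta1) > k else 0.0


def lr_power(theta0, theta1, n, alpha):
    """beta_n(theta1) = Phi(sqrt(n)(theta1 - theta0) - Phi^{-1}(1 - alpha))."""
    return PHI.cdf(n ** 0.5 * (theta1 - theta0) - PHI.inv_cdf(1 - alpha))


print(lr_test([1.0] * 9, theta0=0.0, theta1=1.0, alpha=0.05))  # 1.0
print(lr_test([0.1] * 9, theta0=0.0, theta1=1.0, alpha=0.05))  # 0.0
print(f"power at theta1 = 1, n = 9: {lr_power(0.0, 1.0, 9, 0.05):.3f}")
```

Setting $\theta_1 = \theta_0$ in `lr_power` returns $\alpha$, which is a quick check that the cut-off gives the intended level.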
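Example 2. Under the conditions of example 1, for any fixed significance level $\alpha$ there exists a non-randomized, uniformly most-powerful unbiased test of level $\alpha$ for testing $H_0$ against $H_1$, namely the likelihood-ratio test. The critical function of this best test is

$$\varphi_n(X) = \begin{cases} 1, & |\bar{X} - \theta_0| > \dfrac{1}{\sqrt{n}} \Phi^{-1}\Big(1 - \dfrac{\alpha}{2}\Big), \\[2mm] 0, & |\bar{X} - \theta_0| \leq \dfrac{1}{\sqrt{n}} \Phi^{-1}\Big(1 - \dfrac{\alpha}{2}\Big), \end{cases} \qquad \bar{X} = \frac{X_1 + \dots + X_n}{n}.$$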
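The critical function of Example 2 translates directly into code. This is a minimal sketch assuming Python 3.8+ (`statistics.NormalDist().inv_cdf` supplies $\Phi^{-1}$); the name `phi_example2` is hypothetical. For $n = 25$ and $\alpha = 0.05$ the cut-off for $|\bar{X} - \theta_0|$ is about $1.96/5 \approx 0.392$.

```python
from statistics import NormalDist, fmean


def phi_example2(xs, theta0, alpha):
    """Critical function of the test in Example 2 (X_i i.i.d. N(theta, 1)):
    1.0 (reject H0: theta = theta0) when |mean - theta0| exceeds
    Phi^{-1}(1 - alpha/2) / sqrt(n), and 0.0 otherwise."""
    n = len(xs)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2) / n ** 0.5
    return 1.0 if abs(fmean(xs) - theta0) > cutoff else 0.0


print(phi_example2([0.5] * 25, theta0=0.0, alpha=0.05))   # 1.0
print(phi_example2([0.3] * 25, theta0=0.0, alpha=0.05))   # 0.0
print(phi_example2([-0.5] * 25, theta0=0.0, alpha=0.05))  # 1.0
```

The test is two-sided: large deviations of $\bar{X}$ from $\theta_0$ in either direction lead to rejection, matching the compound alternative $\theta \neq \theta_0$.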
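Since the statistic $\bar{X}$, called the test statistic, is subject to the normal law $N_1(\theta, 1/n)$ with parameters $\mathsf{E}\bar{X} = \theta$ and $\mathsf{D}\bar{X} = 1/n$, i.e. for any real number $x$,

$$\mathsf{P}\{ \bar{X} < x \mid \theta \} = \Phi\big(\sqrt{n}\,(x - \theta)\big),$$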
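the power function $\beta_n(\theta)$ of the best test for testing $H_0$ against $H_1$ is given by

$$\beta_n(\theta) = \mathsf{E}_\theta \varphi_n(X) = \mathsf{P}_\theta\Big\{ |\bar{X} - \theta_0| > \frac{1}{\sqrt{n}} \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big) \Big\} = \Phi\Big(\lambda_n - \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big)\Big) + \Phi\Big(-\lambda_n - \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big)\Big),$$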
where $\lambda_n = \sqrt{n}\,(\theta - \theta_0)$. The figure below gives a graphical representation of the behaviour of the power function $\beta_n(\theta)$.
The function $\beta_n(\theta)$ attains its lowest value, equal to the significance level $\alpha$, at the point $\theta = \theta_0 \in \Theta_0$; as $\theta$ moves away from $\theta_0$, its values increase and approach 1 as $|\theta - \theta_0|$ increases.
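The closed-form power function of Example 2 can be evaluated on a grid to confirm the behaviour just described. This sketch assumes Python 3.8+ `statistics.NormalDist`, with the illustrative choices $\theta_0 = 0$, $n = 16$, $\alpha = 0.05$; the name `power_example2` is hypothetical. At $\lambda_n = 0$ the formula reduces to $2\Phi(-\Phi^{-1}(1 - \alpha/2)) = \alpha$.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal law N(0, 1)


def power_example2(theta, theta0, n, alpha):
    """beta_n(theta) = Phi(lam - z) + Phi(-lam - z), where
    lam = sqrt(n)(theta - theta0) and z = Phi^{-1}(1 - alpha/2)."""
    z = PHI.inv_cdf(1 - alpha / 2)
    lam = n ** 0.5 * (theta - theta0)
    return PHI.cdf(lam - z) + PHI.cdf(-lam - z)


grid = [i / 100 for i in range(-100, 101)]  # theta from -1 to 1
values = [power_example2(t, 0.0, 16, 0.05) for t in grid]
lowest = min(values)
print(f"minimum {lowest:.4f} at theta = {grid[values.index(lowest)]}")
print(f"beta_16(1.0) = {power_example2(1.0, 0.0, 16, 0.05):.4f}")
```

The printed minimum equals the significance level and sits at $\theta_0$, while the value at $\theta = 1$ is already close to 1, in line with the unbiasedness relation above.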
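The theory of statistical hypotheses testing makes it possible to treat various problems arising in practice from a single point of view: the construction of interval estimators for unknown parameters, the estimation of the divergence between mean values of probability laws, the testing of hypotheses on the independence of observations, problems of statistical quality control, etc. Thus, in example 2, the acceptance region of $H_0$ is $|\bar{X} - \theta_0| \leq \frac{1}{\sqrt{n}} \Phi^{-1}(1 - \alpha/2)$; read as a set of parameter values, it is the interval

$$\bar{X} - \frac{1}{\sqrt{n}} \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big) \leq \theta \leq \bar{X} + \frac{1}{\sqrt{n}} \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big),$$

which is the best confidence interval with confidence coefficient $1 - \alpha$ for the unknown mathematical expectation $\theta$.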
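The duality between the test of Example 2 and the confidence interval can be checked by simulation. This is a sketch under the model of Example 1, assuming Python 3.8+ standard modules; `confidence_interval` is a hypothetical name, and $\theta = 0.4$, $n = 10$, $\alpha = 0.05$ are arbitrary illustrative values. The interval is exactly the set of $\theta_0$ that the test accepts, so its coverage of the true $\theta$ should be close to $1 - \alpha$.

```python
import random
from statistics import NormalDist, fmean


def confidence_interval(xs, alpha):
    """Values of theta0 accepted by the test of Example 2, i.e.
    [mean - z/sqrt(n), mean + z/sqrt(n)] with z = Phi^{-1}(1 - alpha/2)."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) / len(xs) ** 0.5
    m = fmean(xs)
    return m - half_width, m + half_width


# Coverage check: the interval should contain the true theta in about
# (1 - alpha) of repeated samples when X_i are i.i.d. N(theta, 1).
rng = random.Random(3)
theta, n, alpha, reps = 0.4, 10, 0.05, 20_000
covered = 0
for _ in range(reps):
    lo, hi = confidence_interval([rng.gauss(theta, 1.0) for _ in range(n)], alpha)
    covered += lo <= theta <= hi
print(f"empirical coverage: {covered / reps:.3f}")
```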
Apart from the classical Neyman–Pearson approach, there are other methods for choosing between hypotheses: the Bayesian approach, the minimax approach, the Wald method of sequential testing, and others. Moreover, the theory of statistical hypotheses testing also includes approximate methods based on the asymptotic behaviour of a sequence $\{\beta_n(\cdot)\}$ of power functions of statistical tests of $H_0$ against $H_1$ as the dimension $n$ of the vector of observations $X$ increases without bound. In this situation it is usually required that the constructed sequence of tests be consistent, i.e. that

$$\lim_{n \to \infty} \beta_n(\theta) = 1 \quad \text{for every } \theta \in \Theta_1,$$
which means that as $n$ increases, the hypotheses $H_0$ and $H_1$ can be distinguished with an increasing degree of certainty. In example 2, the tests form a consistent sequence as $n \to \infty$.
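Consistency of the tests of Example 2 follows from the power formula: for fixed $\theta \neq \theta_0$, $|\lambda_n| = \sqrt{n}\,|\theta - \theta_0| \to \infty$, so $\beta_n(\theta) \to 1$. The sketch below evaluates this numerically at the illustrative alternative $\theta = 0.2$ with $\theta_0 = 0$ and $\alpha = 0.05$, assuming Python 3.8+ `statistics.NormalDist`; `power_example2` is a hypothetical helper implementing the closed-form $\beta_n(\theta)$.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal law N(0, 1)


def power_example2(theta, theta0, n, alpha):
    """Power of the test of Example 2: Phi(lam - z) + Phi(-lam - z),
    with lam = sqrt(n)(theta - theta0) and z = Phi^{-1}(1 - alpha/2)."""
    z = PHI.inv_cdf(1 - alpha / 2)
    lam = n ** 0.5 * (theta - theta0)
    return PHI.cdf(lam - z) + PHI.cdf(-lam - z)


sizes = [1, 10, 100, 1000]
powers = [power_example2(0.2, 0.0, n, 0.05) for n in sizes]  # fixed alternative theta = 0.2
for n, p in zip(sizes, powers):
    print(f"n={n:5d}  beta_n(0.2) = {p:.4f}")
```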
In any case, whatever statistical test is used, the acceptance of a hypothesis does not mean that it is necessarily correct, but simply that at this stage there is no evidence contradicting it. Precisely because of this agreement between theory and experience, the researcher has no reason to doubt that his choice is correct until new observations appear that might force him to change his attitude towards the chosen hypothesis, and perhaps even towards the whole model.
[1a] J. Neyman, E.S. Pearson, "On the use and interpretation of certain test criteria for purposes of statistical inference I", Biometrika, 20A (1928) pp. 175–240
[1b] J. Neyman, E.S. Pearson, "On the use and interpretation of certain test criteria for purposes of statistical inference II", Biometrika, 20A (1928) pp. 263–294
J. Neyman, E.S. Pearson, "On the problem of the most efficient tests of statistical hypotheses", Phil. Trans. Roy. Soc. London Ser. A, 231 (1933) pp. 289–337
E.L. Lehmann, "Testing statistical hypotheses", Wiley (1988)
H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)
J. Hájek, Z. Sidák, "Theory of rank tests", Acad. Press (1967)
M.S. Nikulin, "A result of Bol'shev's from the theory of the statistical testing of hypotheses", J. Soviet Math., 44:3 (1989) pp. 522–529; Zap. Nauchn. Sem. Mat. Inst. Steklov., 153 (1986) pp. 129–137
Statistical hypotheses, verification of. M.S. Nikulin (originator), Encyclopedia of Mathematics. URL: http://www.encyclopediaofmath.org/index.php?title=Statistical_hypotheses,_verification_of&oldid=12498