{{TEX|done}}

''in mathematical statistics''

A statistical method for detecting the effect of individual factors on the results of an experiment, and for the subsequent planning of similar experiments. Dispersion analysis was originally proposed by R.A. Fisher [[#References|[1]]] for processing the results of agricultural trials aimed at establishing the conditions under which a given agricultural crop yields a maximal harvest. Modern applications of dispersion analysis cover a wide range of problems in economics, sociology, biology, and technology; they are usually treated in terms of the statistical theory of the detection of systematic differences between the results of direct measurements carried out under specified varying conditions.

Suppose that the values of unknown constants $a_{1} \dots a_{I}$ can be measured by certain methods or using certain instruments $M_{1} \dots M_{J}$, and that the systematic error $b_{ij}$ in each case may depend, in principle, both on the method $M_{j}$ chosen and on the unknown value $a_{i}$ to be measured. Then the results of such experiments are sums of the form

$$
x_{ijk} = a_{i} + b_{ij} + y_{ijk} ,
$$

$$
i = 1 \dots I ; \quad j = 1 \dots J ; \quad k = 1 \dots K ,
$$

where $K$ is the number of independent measurements of the unknown magnitude $a_{i}$ by the method $M_{j}$, and $y_{ijk}$ is the random error of the $k$-th measurement of $a_{i}$ by the method $M_{j}$ (it is assumed that all $y_{ijk}$ are independent identically-distributed random variables with [[Mathematical expectation|mathematical expectation]] zero: ${\mathsf E} y_{ijk} = 0$). Such a linear model is known as a two-factor scheme of dispersion analysis; the first factor is the true value of the magnitude being measured, the second is the method of measurement; moreover, in this case the same number $K$ of independent measurements is effected for any possible combination of values of the first and second factors (this assumption is immaterial for the purposes of dispersion analysis, and has only been introduced for the sake of clarity).
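
For instance, with $I = 2$ unknown constants, $J = 2$ methods and $K = 3$ repetitions (purely illustrative values) there are $IJK = 12$ observations, and the third measurement of $a_{2}$ by the method $M_{1}$ is represented as $x_{213} = a_{2} + b_{21} + y_{213}$.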

An example of such a situation is a competition between $I$ sportsmen, the performance in which is evaluated by $J$ referees, each participant in the competition appearing $K$ times (being allowed $K$ "attempts"). Here, $a_{i}$ is the true value of the performance index of sportsman number $i$; $b_{ij}$ is the systematic error introduced in the evaluation of the performance of the $i$-th sportsman by the $j$-th referee; $x_{ijk}$ is the evaluation of the performance of the $i$-th sportsman at the $k$-th attempt, given by the $j$-th referee; while $y_{ijk}$ is the respective random error. Such a setup is typical of the so-called subjective examination of the quality of a number of objects, effected by a group of independent experts. Another example is a statistical study of the productivity of an agricultural crop in its dependence on one of $I$ kinds of soil and $J$ methods of soil tillage, $K$ independent experiments being performed for each type of soil $i$ and each tillage method $j$. In this example, $b_{ij}$ is the true value of the productivity of the crop for the $i$-th type of soil tilled by the $j$-th method, $x_{ijk}$ is the respective observed productivity of the crop in the $k$-th trial, while $y_{ijk}$ is its random error caused by random factors; as regards the value of $a_{i}$, this may reasonably be equated to zero in agricultural experiments (see also [[#References|[5]]]).

Let $c_{ij} = a_{i} + b_{ij}$, and let $c_{i*}$, $c_{*j}$ and $c_{**}$ be the results of averaging $c_{ij}$ over the corresponding indices, i.e.

$$
c_{i*} = \frac{1}{J} \sum_{j} c_{ij} ,\quad
c_{*j} = \frac{1}{I} \sum_{i} c_{ij} ,
$$

$$
c_{**} = \frac{1}{IJ} \sum_{ij} c_{ij} =
\frac{1}{I} \sum_{i} c_{i*} =
\frac{1}{J} \sum_{j} c_{*j} .
$$
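
For instance, with the illustrative values $I = J = 2$ and $c_{11} = 1$, $c_{12} = 3$, $c_{21} = 2$, $c_{22} = 6$, one obtains

$$
c_{1*} = 2 ,\quad c_{2*} = 4 ,\quad c_{*1} = 1.5 ,\quad c_{*2} = 4.5 ,\quad c_{**} = 3 .
$$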
  
Also, let $\alpha = c_{**}$, $\beta_{i} = c_{i*} - c_{**}$, $\gamma_{j} = c_{*j} - c_{**}$, and $\delta_{ij} = c_{ij} - c_{i*} - c_{*j} + c_{**}$. The idea of dispersion analysis is based on the obvious identity

$$ \tag{1 }
c_{ij} = \alpha + \beta_{i} + \gamma_{j} + \delta_{ij} ,\quad
i = 1 \dots I ,\  j = 1 \dots J .
$$
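
In the illustrative numerical example above this gives $\alpha = 3$, $\beta_{1} = - 1$, $\beta_{2} = 1$, $\gamma_{1} = - 1.5$, $\gamma_{2} = 1.5$, $\delta_{11} = \delta_{22} = 0.5$, $\delta_{12} = \delta_{21} = - 0.5$, and, for example,

$$
c_{11} = \alpha + \beta_{1} + \gamma_{1} + \delta_{11} = 3 - 1 - 1.5 + 0.5 = 1 ,
$$

in accordance with (1).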
  
If the symbol $( c_{ij} )$ denotes a vector of dimension $IJ$, obtained from a matrix $\| c_{ij} \|$ of order $I \times J$ by some pre-set mode of ordering of its entries, then (1) may be written down as the equation

$$ \tag{2 }
( c_{ij} ) = ( \alpha_{ij} ) + ( \beta_{ij} ) + ( \gamma_{ij} ) + ( \delta_{ij} ) ,
$$

where all the vectors are of dimension $IJ$, and $\alpha_{ij} = \alpha$, $\beta_{ij} = \beta_{i}$, $\gamma_{ij} = \gamma_{j}$. Since the four vectors on the right-hand side of (2) are orthogonal, $\alpha_{ij} = \alpha$ is the best approximation of the function $c_{ij}$ in the arguments $i$ and $j$ by a constant magnitude (in the sense of the minimum sum of the square deviations $\sum_{ij} ( c_{ij} - \alpha )^{2}$). In the same sense $\alpha_{ij} + \beta_{ij} = \alpha + \beta_{i}$ is the best approximation of $c_{ij}$ by a function which depends only on $i$; $\alpha_{ij} + \gamma_{ij} = \alpha + \gamma_{j}$ is the best approximation of $c_{ij}$ by a function depending only on $j$, and $\alpha_{ij} + \beta_{ij} + \gamma_{ij} = \alpha + \beta_{i} + \gamma_{j}$ is the best approximation of $c_{ij}$ by a sum of functions, one of which (e.g. $\alpha + \beta_{i}$) depends on $i$ only, while the other depends on $j$ only. This fact, which had been established by Fisher [[#References|[1]]] in 1918, subsequently served as the foundation of the theory of quadratic approximation of functions.
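
The orthogonality is verified directly from the definitions: since $\sum_{i} \beta_{i} = 0$, $\sum_{j} \gamma_{j} = 0$ and $\sum_{i} \delta_{ij} = \sum_{j} \delta_{ij} = 0$, one has, for example,

$$
\sum_{ij} \beta_{ij} \gamma_{ij} = \Bigl( \sum_{i} \beta_{i} \Bigr) \Bigl( \sum_{j} \gamma_{j} \Bigr) = 0 ,\quad
\sum_{ij} \alpha_{ij} \delta_{ij} = \alpha \sum_{ij} \delta_{ij} = 0 ,
$$

and the remaining scalar products vanish in the same way.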
  
In the above example related to sports competition, the function $\delta_{ij}$ expresses the "interaction" of the $i$-th sportsman with the $j$-th referee (a positive value of $\delta_{ij}$ signifies an "overestimate", i.e. systematically high estimates by the $j$-th referee of the performances of the $i$-th sportsman, while a negative value of $\delta_{ij}$ signifies an "underestimate", i.e. estimates which are systematically too low). A necessary condition to be met by a group of experts is for all $\delta_{ij}$ to be equal to zero. In the case of agricultural experiments, such an equality is regarded as a hypothesis to be verified by experimental results, since the main objective is to find values of $i$ and $j$ such that the function (1) attains its maximum value. If this hypothesis is correct, then

$$
\max  c_{ij} = \alpha + \max  \beta_{i} + \max  \gamma_{j} ,
$$

which means that the detection of the best "soil" and "tillage" may be effected separately, with the result that the experimental work required is considerably reduced (for example, one may test out all $I$ types of "soil" for one specific mode of tillage, thus finding the best type of soil, after which one tests out all $J$ modes of "tillage" on that type of soil and finds the best mode; the total number of trials, including repetitions, will be $( I + J ) K$). If, on the other hand, the hypothesis $\{ \textrm{ all }  \delta_{ij} = 0 \}$ is false, $\max  c_{ij}$ can only be found by performing the "complete plan" described above, involving $IJK$ experiments ($K$ repetitions for each of the $IJ$ combinations of factors).
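
For instance, with the illustrative values $I = 10$, $J = 5$, $K = 3$, the two separate one-factor experiments require $( I + J ) K = 45$ trials, whereas the complete plan requires $IJK = 150$ trials.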
  
In the case of sports competitions the function $\gamma_{ij} = \gamma_{j}$ may be treated as the systematic error committed by the $j$-th referee in relation to all the sportsmen. Thus, $\gamma_{j}$ is a measure of the "rigour" or "mildness" of the $j$-th referee. Ideally, all $\gamma_{j}$ are zero, but under the conditions occurring in practice one has to deal with non-zero values of $\gamma_{j}$ and has to take this fact into account in summing up the results of evaluations (e.g. one may base the comparison of the performance of the individual sportsman not on the sequence of true values of $\alpha + \beta_{1} + \gamma_{j} \dots \alpha + \beta_{I} + \gamma_{j}$, but rather on the results of ordering these numbers by their values, since for all $j = 1 \dots J$ the ordering will be the same). Finally, the sum of the two remaining functions $\alpha_{ij} + \beta_{ij} = \alpha + \beta_{i}$ depends only on $i$, and may therefore be used as a measure of the performance of the $i$-th competitor. However, it must be borne in mind that here $\alpha + \beta_{i} = a_{i} + b_{i*} \neq a_{i}$, and for this reason ordering the competitors according to the values of $\alpha + \beta_{i}$ (or according to $\alpha + \beta_{i} + \gamma_{j}$ for any given $j$) may not be identical with the ordering according to the value of $a_{i}$. In the practical processing of expert evaluations this fact is neglected, since the above-mentioned "complete plan" does not provide for separate evaluations of $a_{i}$ and $b_{i*}$. Thus, the number $\alpha + \beta_{i} = a_{i} + b_{i*}$ is a characteristic not only of the performance of the $i$-th sportsman, but also, to a certain extent, of the attitude of the experts towards his performance. This is why the results of subjective expert evaluations made at different times (in particular, during different Olympic games) can hardly be regarded as comparable. In the case of agricultural trials, on the other hand, no such difficulties arise, since all $a_{i} = 0$, i.e. $\alpha + \beta_{i} = b_{i*}$.
  
The true values of the functions $\alpha$, $\beta_{i}$, $\gamma_{j}$, and $\delta_{ij}$ are not known and are expressed in terms of the unknown functions $c_{ij}$. Accordingly, the first stage in dispersion analysis is to find statistical estimators for $c_{ij}$ from the results $x_{ijk}$ of observations. An unbiased linear estimator for $c_{ij}$ with minimal [[Dispersion|dispersion]] is expressed by the formula

$$
\widehat{c}_{ij} = x_{ij*} = \frac{1}{K} \sum_{k} x_{ijk} .
$$
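
This estimator is indeed unbiased: $\widehat{c}_{ij} = c_{ij} + y_{ij*}$ and ${\mathsf E} y_{ij*} = 0$, so that ${\mathsf E} \widehat{c}_{ij} = c_{ij}$, while the averaging over the $K$ repetitions reduces the dispersion of the random error by the factor $K$.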
  
Since $\alpha$, $\beta_{i}$, $\gamma_{j}$, and $\delta_{ij}$ are linear functions of the entries of the matrix $\| c_{ij} \|$, the unbiased linear estimators for these functions with minimal dispersion are obtained by replacing the arguments $c_{ij}$ by the respective estimators $\widehat{c}_{ij}$, viz.

$$
\widehat\alpha = x_{***} ,\quad
\widehat\beta_{i} = x_{i**} - x_{***} ,\quad
\widehat\gamma_{j} = x_{*j*} - x_{***} ,
$$

$$
\widehat\delta_{ij} = x_{ij*} - x_{i**} - x_{*j*} + x_{***} ,
$$

and the random vectors $( \widehat\alpha_{ij} )$, $( \widehat\beta_{ij} )$, $( \widehat\gamma_{ij} )$, and $( \widehat\delta_{ij} )$, defined in the same way as $( \alpha_{ij} )$, $( \beta_{ij} )$, $( \gamma_{ij} )$, and $( \delta_{ij} )$ introduced above, are orthogonal, i.e. are uncorrelated random vectors (in other words, any two components belonging to different vectors have [[Correlation coefficient|correlation coefficient]] zero). In addition, any difference of the form

$$
x_{ijk} - x_{ij*} = x_{ijk} - \widehat{c}_{ij}
$$

is uncorrelated with any component of these four vectors. Consider the five sets of random variables $\{ x_{ijk} \}$, $\{ x_{ijk} - x_{ij*} \}$, $\{ \widehat\beta_{i} \}$, $\{ \widehat\gamma_{j} \}$, and $\{ \widehat\delta_{ij} \}$. Since

$$
x_{ijk} - x_{ij*} = y_{ijk} - y_{ij*} ,\quad
\widehat\beta_{i} = \beta_{i} + ( y_{i**} - y_{***} ) ,
$$

$$
\widehat\gamma_{j} = \gamma_{j} + ( y_{*j*} - y_{***} ) ,
$$

$$
\widehat\delta_{ij} = \delta_{ij} + ( y_{ij*} - y_{i**} - y_{*j*} + y_{***} ) ,
$$

the dispersions of the empirical distributions corresponding to these sets are expressed by the formulas

$$
S^{2} = \frac{1}{IJK} \sum_{ijk} ( x_{ijk} - x_{***} )^{2} ,
$$

$$
S_{0}^{2} = \frac{1}{IJK} \sum_{ijk} ( x_{ijk} - x_{ij*} )^{2} =
\frac{1}{IJK} \sum_{ijk} ( y_{ijk} - y_{ij*} )^{2} ,
$$

$$
S_{1}^{2} = \frac{1}{I} \sum_{i} \widehat\beta_{i}^{2} =
\frac{1}{I} \sum_{i} [ \beta_{i} + ( y_{i**} - y_{***} ) ]^{2} ,
$$

$$
S_{2}^{2} = \frac{1}{J} \sum_{j} \widehat\gamma_{j}^{2} =
\frac{1}{J} \sum_{j} [ \gamma_{j} + ( y_{*j*} - y_{***} ) ]^{2} ,
$$

$$
S_{3}^{2} = \frac{1}{IJ} \sum_{ij} \widehat\delta_{ij}^{2} =
\frac{1}{IJ} \sum_{ij} [ \delta_{ij} + ( y_{ij*} - y_{i**} - y_{*j*} + y_{***} ) ]^{2} .
$$

These empirical dispersions are sums of squares of random variables, any two of which are uncorrelated provided they belong to different sums; also, the identity

$$
S^{2} = S_{0}^{2} + S_{1}^{2} + S_{2}^{2} + S_{3}^{2} ,
$$

explaining the origin of the term "dispersion analysis", is valid for all $y_{ijk}$.
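
A minimal computational sketch of these estimators and of this identity, for illustration only (the array shape, the random example data and all variable names below are chosen arbitrarily for the example), may be written as follows:

<pre>
# Illustrative sketch: x[i, j, k] plays the role of x_{ijk} (the k-th measurement
# of a_i by the method M_j); the data here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 3, 4, 5
x = rng.normal(size=(I, J, K))

x_ij = x.mean(axis=2)                                   # x_{ij*} = \hat c_{ij}
x_i, x_j, x_all = x_ij.mean(axis=1), x_ij.mean(axis=0), x_ij.mean()

beta_hat = x_i - x_all                                  # \hat\beta_i
gamma_hat = x_j - x_all                                 # \hat\gamma_j
delta_hat = x_ij - x_i[:, None] - x_j[None, :] + x_all  # \hat\delta_{ij}

S_sq = ((x - x_all) ** 2).mean()                        # S^2
S0_sq = ((x - x_ij[:, :, None]) ** 2).mean()            # S_0^2
S1_sq = (beta_hat ** 2).mean()                          # S_1^2
S2_sq = (gamma_hat ** 2).mean()                         # S_2^2
S3_sq = (delta_hat ** 2).mean()                         # S_3^2

# The identity S^2 = S_0^2 + S_1^2 + S_2^2 + S_3^2 holds up to rounding error:
assert abs(S_sq - (S0_sq + S1_sq + S2_sq + S3_sq)) < 1e-12
</pre>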
  
Let $I, J, K \geq  2$ and let

$$
s_{0}^{2} = \frac{K}{K-1} S_{0}^{2} ,\quad
s_{1}^{2} = \frac{IJK}{I-1} S_{1}^{2} ,\quad
s_{2}^{2} = \frac{IJK}{J-1} S_{2}^{2} ,
$$

$$
s_{3}^{2} = \frac{IJK}{( I-1 ) ( J-1 ) } S_{3}^{2} ;
$$

then

$$
{\mathsf E} s_{0}^{2} = \sigma^{2} ,\quad
{\mathsf E} s_{1}^{2} = \sigma^{2} + \frac{JK}{I-1} \sum_{i} \beta_{i}^{2} ,\quad
{\mathsf E} s_{2}^{2} = \sigma^{2} + \frac{IK}{J-1} \sum_{j} \gamma_{j}^{2} ,
$$

$$
{\mathsf E} s_{3}^{2} = \sigma^{2} + \frac{K}{( I-1 )( J-1 ) } \sum_{ij} \delta_{ij}^{2} ,
$$

where $\sigma^{2}$ is the [[Dispersion|dispersion]] of the random errors $y_{ijk}$.
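
For instance, the formula for ${\mathsf E} s_{1}^{2}$ is checked directly: from $\widehat\beta_{i} = \beta_{i} + ( y_{i**} - y_{***} )$ and ${\mathsf E} ( y_{i**} - y_{***} )^{2} = \sigma^{2} ( I - 1 ) / IJK$ it follows that

$$
{\mathsf E} S_{1}^{2} = \frac{1}{I} \sum_{i} \beta_{i}^{2} +
\frac{( I - 1 ) \sigma^{2} }{IJK} ,\quad
{\mathsf E} s_{1}^{2} = \frac{IJK}{I - 1} {\mathsf E} S_{1}^{2} =
\sigma^{2} + \frac{JK}{I - 1} \sum_{i} \beta_{i}^{2} .
$$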
  
These formulas form the base of the second stage in dispersion analysis, namely the clarification of the effect of the first and of the second factor on the experimental results (in agricultural trials the first factor is the "soil" type, the second is the mode of "tillage"). For instance, in order to verify the hypothesis that the two factors are mutually "independent", i.e. that $\sum_{ij} \delta_{ij}^{2} = 0$, it is reasonable to compute the dispersion proportion $s_{3}^{2} / s_{0}^{2} = F_{3}$. If this ratio is significantly different from one, the hypothesis is rejected. In the same way, the hypothesis $\sum_{j} \gamma_{j}^{2} = 0$ is usefully verified by the proportion $s_{2}^{2} / s_{0}^{2} = F_{2}$, which should also be compared with one; if it is also known that $\sum_{ij} \delta_{ij}^{2} = 0$, the expression

$$
\frac{( IJK - I - J + 1 ) s_{2}^{2} }{IJ ( K-1 ) s_{0}^{2} + ( I-1 ) ( J-1 ) s_{3}^{2} }
 = F_{2}^{*} ,
$$
  
rather than $F_{2}$, should be compared with one. A statistic for the verification of the hypothesis $\sum_{i} \beta_{i}^{2} = 0$ can be constructed in a similar manner.
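
The proportion $F_{2}^{*}$ is preferable in this situation because, when $\sum_{ij} \delta_{ij}^{2} = 0$, one has ${\mathsf E} s_{3}^{2} = \sigma^{2}$ as well as ${\mathsf E} s_{0}^{2} = \sigma^{2}$, so that the combination

$$
\frac{IJ ( K - 1 ) s_{0}^{2} + ( I - 1 ) ( J - 1 ) s_{3}^{2} }{IJK - I - J + 1}
$$

again has mathematical expectation $\sigma^{2}$ but is based on a larger number of degrees of freedom than $s_{0}^{2}$ alone; $F_{2}^{*}$ compares $s_{2}^{2}$ with this pooled estimator of $\sigma^{2}$.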

The exact meaning of the concept of a significant difference of the above expressions from one may be defined only in terms of the distribution law of the random errors $y_{ijk}$. The situation most extensively studied in dispersion analysis is that of all $y_{ijk}$ being normally distributed. In such a case $( \widehat\alpha_{ij} )$, $( \widehat\beta_{ij} )$, $( \widehat\gamma_{ij} )$, $( \widehat\delta_{ij} )$ are independent random vectors, while $s_{0}^{2}$, $s_{1}^{2}$, $s_{2}^{2}$, $s_{3}^{2}$ are independent random variables, and the statistics

$$
IJ ( K-1 ) \frac{s_{0}^{2} }{\sigma^{2} } ,\quad
( I-1 ) \frac{s_{1}^{2} }{\sigma^{2} } ,\quad
( J-1 ) \frac{s_{2}^{2} }{\sigma^{2} } ,
$$

$$
( I-1 )( J-1 ) \frac{s_{3}^{2} }{\sigma^{2} }
$$

will have non-central chi-squared distributions with $f_{m}$ degrees of freedom and with non-centrality parameters $\lambda_{m}$, $m = 0 , 1 , 2 , 3$, where

$$
f_{0} = IJ ( K-1 ) ,\quad f_{1} = I-1 ,\quad f_{2} = J-1 ,\quad f_{3} = ( I-1 )( J-1 ) ;
$$

$$
\lambda_{0} = 0 ,\quad
\lambda_{1} = JK \sum_{i} \frac{\beta_{i}^{2} }{\sigma^{2} } ,\quad
\lambda_{2} = IK \sum_{j} \frac{\gamma_{j}^{2} }{\sigma^{2} } ,
$$

$$
\lambda_{3} = K \sum_{ij} \frac{\delta_{ij}^{2} }{\sigma^{2} } .
$$
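
These parameters agree with the expectation formulas given above: since the mathematical expectation of a non-central chi-squared variable with $f$ degrees of freedom and non-centrality parameter $\lambda$ equals $f + \lambda$, one has, for example,

$$
{\mathsf E} \left[ ( I - 1 ) \frac{s_{1}^{2} }{\sigma^{2} } \right] =
f_{1} + \lambda_{1} = ( I - 1 ) + JK \sum_{i} \frac{\beta_{i}^{2} }{\sigma^{2} } ,\quad
\textrm{ i.e. } \quad
{\mathsf E} s_{1}^{2} = \sigma^{2} + \frac{JK}{I - 1} \sum_{i} \beta_{i}^{2} .
$$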
  
If the non-centrality parameter is zero, the non-central chi-squared distribution becomes identical with the ordinary chi-squared distribution. Thus, if the hypothesis $\lambda_{3} = 0$ is true, the proportion $s_{3}^{2} / s_{0}^{2} = F_{3}$ has an $F$-distribution with parameters $f_{3}$ and $f_{0}$ (the distribution of the dispersion proportion). Let $x$ be the number for which the probability of the event $\{ F_{3} > x \}$ equals a pre-set value $\epsilon$ known as the significance level (tables of the function $x = x ( \epsilon ;  f_{3} , f_{0} )$ can be found in most textbooks on mathematical statistics). The verification criterion of the hypothesis $\lambda_{3} = 0$ is that if the observed value of $F_{3}$ is greater than $x$, the hypothesis is rejected; otherwise, the hypothesis is said not to be in contradiction with the experimental results. Criteria based on the statistics $F_{2}$ and $F_{2}^{*}$ are constructed in a similar manner.
  
The following stages in dispersion analysis materially depend not only on the nature of the problem to be solved, but also on the results of the statistical verification of the hypothesis during the second stage. Thus, as has been seen, the truth of the hypothesis <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340210.png" /> in agricultural trials permits a more economical design of subsequent experiments (if the hypotheses <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340211.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340212.png" /> are both true, the productivity depends only on the type of  "soil" , and subsequent experiments may be performed in the framework of one-factor dispersion analysis); if the hypothesis <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340213.png" /> is false, it is reasonable to look for a third, hitherto unrecognized, factor which is relevant to the problem. If the types of  "soil"  and  "tillage"  methods were varied not only locally but in different geographic zones, climatic or geographic conditions may act as such a third factor, and the processing of the observations must involve a three-factor dispersion analysis.
+
The following stages in dispersion analysis materially depend not only on the nature of the problem to be solved, but also on the results of the statistical verification of the hypothesis during the second stage. Thus, as has been seen, the truth of the hypothesis $  \lambda _ {3} = 0 $
 +
in agricultural trials permits a more economical design of subsequent experiments (if the hypotheses $  \lambda _ {3} = 0 $
 +
and $  \lambda _ {2} = 0 $
 +
are both true, the productivity depends only on the type of  "soil" , and subsequent experiments may be performed in the framework of one-factor dispersion analysis); if the hypothesis $  \lambda _ {3} = 0 $
 +
is false, it is reasonable to look for a third, hitherto unrecognized, factor which is relevant to the problem. If the types of  "soil"  and  "tillage"  methods were varied not only locally but in different geographic zones, climatic or geographic conditions may act as such a third factor, and the processing of the observations must involve a three-factor dispersion analysis.
  
In the case of expert evaluations, if the hypothesis <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340214.png" /> has been statistically confirmed, it is permissible to order the objects being compared (e.g. sportsmen) according to the values of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340215.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340216.png" />. If the hypothesis <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340217.png" /> is false (in the case of sports competition this indicates  "interaction"  between some some competitors and referees), the obvious course is to recalculate all results after discarding the values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340218.png" /> with pairs of indexes <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340219.png" /> for which the absolute values of the statistical estimators <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340220.png" /> exceed some pre-set permissible level. This means that certain entries of the matrix <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340221.png" /> are deleted, and the plan of dispersion analysis becomes incomplete.
+
In the case of expert evaluations, if the hypothesis $  \lambda _ {3} = 0 $
 +
has been statistically confirmed, it is permissible to order the objects being compared (e.g. sportsmen) according to the values of $  \widehat \alpha  + {\widehat \beta  } _ {i} $,  
 +
$  i= 1 \dots I $.  
 +
If the hypothesis $  \lambda _ {3} = 0 $
 +
is false (in the case of sports competition this indicates  "interaction"  between some some competitors and referees), the obvious course is to recalculate all results after discarding the values $  x _ {ijk} $
 +
with pairs of indexes $  ( i, j) $
 +
for which the absolute values of the statistical estimators $  \delta _ {ij} $
 +
exceed some pre-set permissible level. This means that certain entries of the matrix $  \| x _ {ij*} \| $
 +
are deleted, and the plan of dispersion analysis becomes incomplete.
  
Models of modern dispersion analysis comprise a wide circle of real experimental schemes (e.g. schemes of incomplete plans, with randomly or non-randomly selected elements <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340222.png" />). The respective statistical conclusions are often still in the stage of development. At the time of writing (1987) particular problems in which the results of the observations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340223.png" /> are not identically-distributed random variables are still far from being solved; even more difficult problems are those in which the values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340224.png" /> are dependent. The problem of factor selection has not been solved, even in the linear case. This problem may be formulated as follows. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340225.png" /> be a continuous function and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340226.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340227.png" /> be arbitrary linear functions in the variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340228.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340229.png" />. Given the values of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340230.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340231.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340232.png" /> may be determined for any given choice of the linear functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340233.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340234.png" /> by the formula
+
Models of modern dispersion analysis comprise a wide circle of real experimental schemes (e.g. schemes of incomplete plans, with randomly or non-randomly selected elements $  x _ {ij*} $).  
 +
The respective statistical conclusions are often still in the stage of development. At the time of writing (1987) particular problems in which the results of the observations $  x _ {ijk} = c _ {ij} + y _ {ijk} $
 +
are not identically-distributed random variables are still far from being solved; even more difficult problems are those in which the values $  x _ {ijk} $
 +
are dependent. The problem of factor selection has not been solved, even in the linear case. This problem may be formulated as follows. Let $  c = c( u , v) $
 +
be a continuous function and let $  u = u ( z, w) $
 +
and $  v = v( z, w) $
 +
be arbitrary linear functions in the variables $  z $
 +
and $  w $.  
 +
Given the values of $  z _ {1} \dots z _ {I} $
 +
and $  w _ {1} \dots w _ {J} $,  
 +
$  c _ {ij} $
 +
may be determined for any given choice of the linear functions $  u $
 +
and $  v $
 +
by the formula
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340235.png" /></td> </tr></table>
+
$$
 +
c _ {ij}  = c [ u ( z _ {i} , w _ {j} ), v ( z _ {i} , w _ {j} )] ,
 +
$$
  
and one can construct the dispersion analysis of these variables from the results of the respective observations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340236.png" />. The problem is to find the linear functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340237.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340238.png" /> for which the value of the sum of the squares <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340239.png" />, where
+
and one can construct the dispersion analysis of these variables from the results of the respective observations $  x _ {ijk} $.  
 +
The problem is to find the linear functions $  u $
 +
and $  v $
 +
for which the value of the sum of the squares $  \sum _ {ij} \delta _ {ij}  ^ {2} $,  
 +
where
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340240.png" /></td> </tr></table>
+
$$
 +
\delta _ {ij}  = c _ {ij} - c _ {i*} - c _ {*} j + c _ {**} ,
 +
$$
  
is minimal (on the assumption that the function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340241.png" /> is not known). In terms of dispersion analysis, the problem is reduced to a statistical determination of the factors <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340242.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/d/d033/d033340/d033340243.png" /> corresponding to  "least interaction" .
+
is minimal (on the assumption that the function $  c( u , v) $
 +
is not known). In terms of dispersion analysis, the problem is reduced to a statistical determination of the factors $  z = z( u , v) $
 +
and $  w = w( u , v) $
 +
corresponding to  "least interaction" .
  
 
====References====
 
====References====
 
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  R.A. Fisher,  "Statistical methods of research workers" , Oliver &amp; Boyd  (1925)</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  H. Scheffé,  "The analysis of variance" , Wiley  (1959)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  A. Hald,  "Statistical theory with engineering applications" , Wiley  (1952)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  G.W. Snedecor,  W.G. Cochran,  "Statistical methods: applied to experiments in agriculture and biology" , Iowa State College Collegiate Press  (1957)</TD></TR><TR><TD valign="top">[5]</TD> <TD valign="top">  M.S. Nikulin,  "Application of the model of two-factor analysis of variance without interaction"  ''J. Soviet Math.'' , '''25''' :  3  (1984)  pp. 1196–1207  ''Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. Stud. Mat. Stat.'' , '''108''' :  5  (1981)  pp. 134–153</TD></TR></table>
 
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  R.A. Fisher,  "Statistical methods of research workers" , Oliver &amp; Boyd  (1925)</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  H. Scheffé,  "The analysis of variance" , Wiley  (1959)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  A. Hald,  "Statistical theory with engineering applications" , Wiley  (1952)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  G.W. Snedecor,  W.G. Cochran,  "Statistical methods: applied to experiments in agriculture and biology" , Iowa State College Collegiate Press  (1957)</TD></TR><TR><TD valign="top">[5]</TD> <TD valign="top">  M.S. Nikulin,  "Application of the model of two-factor analysis of variance without interaction"  ''J. Soviet Math.'' , '''25''' :  3  (1984)  pp. 1196–1207  ''Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. Stud. Mat. Stat.'' , '''108''' :  5  (1981)  pp. 134–153</TD></TR></table>
 
 
  
 
====Comments====
 
====Comments====
 
The phrase  "dispersion analysis"  is out of use and has been replaced by analysis of variance.
 
The phrase  "dispersion analysis"  is out of use and has been replaced by analysis of variance.

Latest revision as of 19:36, 5 June 2020


in mathematical statistics

A statistical method for detecting the effect of individual factors on the results of an experiment, and for the subsequent planning of similar experiments. Dispersion analysis was originally proposed by R.A. Fisher [1] for the processing of the results of agricultural trials, aimed at establishing the conditions under which a given agricultural crop yields a maximal harvest. Modern applications of dispersion analysis embrace a wide scope of problems in economics, sociology, biology, and technology; they are usually treated in terms of the statistical theory of detection of systematic differences between the results of direct measurements carried out under specific varying conditions.

Suppose that the values of unknown constants $ a _ {1} \dots a _ {I} $ can be measured by certain methods or using certain instruments $ M _ {1} \dots M _ {J} $, and that the systematic error $ b _ {ij } $ in each case may depend, in principle, both on the method $ M _ {j} $ chosen and on the unknown value $ a _ {i} $ to be measured. Then the results of such experiments are sums of the form

$$ x _ {ijk} = a _ {i} + b _ {ij} + y _ {ijk} , $$

$$ i = 1 \dots I ; \ j = 1 \dots J ; \ k = 1 \dots K , $$

where $ K $ is the number of independent measurements of the unknown magnitude $ a _ {i} $ by the method $ M _ {j} $, and $ y _ {ijk} $ is the random error of the $ k $-th measurement of $ a _ {i} $ by the method $ M _ {j} $ (it is assumed that all $ y _ {ijk} $ are independent identically-distributed random variables with mathematical expectation zero: $ {\mathsf E} y _ {ijk} = 0 $). Such a linear model is known as a two-factor scheme of dispersion analysis; the first factor is the true value of the magnitude being measured, the second is the method of measurement; moreover, in this case the same number $ K $ of independent measurements is effected for any possible combination of values of the first and second factors (this assumption is immaterial for the purposes of dispersion analysis, and has only been introduced for the sake of clarity).
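
As a concrete illustration of this scheme (a minimal simulation sketch added here, not part of the original article; the sizes, the error level and the random values are arbitrary), observations $ x _ {ijk} $ may be generated as follows, e.g. in Python:

import numpy as np

rng = np.random.default_rng(0)

I, J, K = 4, 3, 5            # levels of the two factors and number of repetitions
sigma = 1.0                  # assumed standard deviation of the random errors y_ijk

a = rng.normal(size=I)       # unknown constants a_i (first factor)
b = rng.normal(size=(I, J))  # systematic errors b_ij (may depend on both factors)
y = sigma * rng.normal(size=(I, J, K))   # independent errors with E y_ijk = 0

# x_ijk = a_i + b_ij + y_ijk
x = a[:, None, None] + b[:, :, None] + y
print(x.shape)               # (4, 3, 5): an I x J x K array of observations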

An example of such a situation is a competition between $ I $ sportsmen, the performance in which is evaluated by $ J $ referees, each participant in the competition appearing $ K $ times (being allowed $ K $ "attempts"). Here, $ a _ {i} $ is the true value of the performance index of sportsman number $ i $; $ b _ {ij} $ is the systematic error introduced in the evaluation of the performance of the $ i $-th sportsman by the $ j $-th referee; $ x _ {ijk} $ is the evaluation of the performance of the $ i $-th sportsman at the $ k $-th attempt, given by the $ j $-th referee; while $ y _ {ijk} $ is the respective random error. Such a setup is typical of the so-called subjective examination of the quality of a number of objects, effected by a group of independent experts. Another example is a statistical study of the productivity of an agricultural crop as a function of one of $ I $ types of soil and one of $ J $ methods of soil tillage, $ K $ independent experiments being performed for each soil type $ i $ and each tillage method $ j $. In this example, $ b _ {ij} $ is the true value of the productivity of the crop for the $ i $-th type of soil tilled by the $ j $-th method, $ x _ {ijk} $ is the corresponding observed productivity of the crop in the $ k $-th trial, and $ y _ {ijk} $ is its random error caused by random factors; as regards the value of $ a _ {i} $, this may reasonably be equated to zero in agricultural experiments (see also [5]).

Let $ c _ {ij} = a _ {i} + b _ {ij} $, and let $ c _ {i*} $, $ c _ {*j} $ and $ c _ {**} $ be the results of averaging $ c _ {ij} $ over the corresponding indices, i.e.

$$ c _ {i*} = \frac{1}{J} \sum _ { j } c _ {ij} ,\ \ c _ {*j} = \frac{1}{I} \sum _ { i } c _ {ij} , $$

$$ c _ {**} = \frac{1}{IJ} \sum _ { ij } c _ {ij} = \frac{1}{I} \sum _ { i } c _ {i*} = \frac{1}{J} \sum _ { j } c _ {*j} . $$

Also, let $ \alpha = c _ {**} $, $ \beta _ {i} = c _ {i*} - c _ {**} $, $ \gamma _ {j} = c _ {*j} - c _ {**} $, and $ \delta _ {ij} = c _ {ij} - c _ {i*} - c _ {*j} + c _ {**} $. The idea of dispersion analysis is based on the obvious identity

$$ \tag{1 } c _ {ij} = \alpha + \beta _ {i} + \gamma _ {j} + \delta _ {ij} ,\ \ i = 1 \dots I ,\ j= 1 \dots J . $$

If the symbol $ ( c _ {ij} ) $ denotes a vector of dimension $ IJ $, obtained from a matrix $ \| c _ {ij} \| $ of order $ I \times J $ by some pre-set mode of ordering of its entries, then (1) may be written down as the equation

$$ \tag{2 } ( c _ {ij} ) = ( \alpha _ {ij} ) + ( \beta _ {ij} ) + ( \gamma _ {ij} ) + ( \delta _ {ij} ) , $$

where all the vectors are of dimension $ IJ $, and $ \alpha _ {ij} = \alpha $, $ \beta _ {ij} = \beta _ {i} $, $ \gamma _ {ij} = \gamma _ {j} $. Since the four vectors on the right-hand side of (2) are orthogonal, $ \alpha _ {ij} = \alpha $ is the best approximation of the function $ c _ {ij} $ in the arguments $ i $ and $ j $ by a constant magnitude (in the sense of the minimum sum of the square deviations $ \sum _ {ij} ( c _ {ij} - \alpha ) ^ {2} $). In the same sense $ \alpha _ {ij} + \beta _ {ij} = \alpha + \beta _ {i} $ is the best approximation of $ c _ {ij} $ by a function which depends only on $ i $; $ \alpha _ {ij} + \gamma _ {ij} = \alpha + \gamma _ {j} $ is the best approximation of $ c _ {ij} $ by a function depending only on $ j $, and $ \alpha _ {ij} + \beta _ {ij} + \gamma _ {ij} = \alpha + \beta _ {i} + \gamma _ {j} $ is the best approximation of $ c _ {ij} $ by a sum of functions, one of which (e.g. $ \alpha + \beta _ {i} $) depends on $ i $ only, while the other depends on $ j $ only. This fact, which had been established by Fisher [1] in 1918, subsequently served as the foundation of the theory of quadratic approximation of functions.
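
The identity (1) and the orthogonality of the component vectors are easy to verify numerically. The following sketch (an added illustration with an arbitrarily chosen matrix $ \| c _ {ij} \| $) computes $ \alpha $, $ \beta _ {i} $, $ \gamma _ {j} $ and $ \delta _ {ij} $ by averaging and checks both facts:

import numpy as np

c = np.array([[2.0, 3.0, 5.0],
              [1.0, 4.0, 4.0],
              [0.0, 2.0, 1.0],
              [3.0, 3.0, 6.0]])   # an arbitrary I x J matrix of values c_ij

alpha = c.mean()                                  # c_**
beta = c.mean(axis=1) - alpha                     # beta_i  = c_i* - c_**
gamma = c.mean(axis=0) - alpha                    # gamma_j = c_*j - c_**
delta = c - c.mean(axis=1, keepdims=True) - c.mean(axis=0, keepdims=True) + alpha

# identity (1): c_ij = alpha + beta_i + gamma_j + delta_ij
assert np.allclose(c, alpha + beta[:, None] + gamma[None, :] + delta)

# the four component vectors of dimension IJ are pairwise orthogonal
parts = [np.full(c.shape, alpha),
         np.broadcast_to(beta[:, None], c.shape),
         np.broadcast_to(gamma[None, :], c.shape),
         delta]
for m in range(4):
    for n in range(m + 1, 4):
        assert abs(np.sum(parts[m] * parts[n])) < 1e-9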

In the above example related to sports competition, the function $ \delta _ {ij} $ expresses the "interaction" of the $ i $-th sportsman with the $ j $-th referee (a positive value of $ \delta _ {ij} $ signifies an "overestimate", i.e. systematically high estimates by the $ j $-th referee of the performances of the $ i $-th sportsman, while a negative value of $ \delta _ {ij} $ signifies an "underestimate", i.e. estimates which are systematically too low). A necessary condition to be met by a group of experts is for all $ \delta _ {ij} $ to be equal to zero. In the case of agricultural experiments, such an equality is regarded as a hypothesis to be verified by experimental results, since the main objective is to find values of $ i $ and $ j $ such that the function (1) attains its maximum value. If this hypothesis is correct, then

$$ \max c _ {ij} = \alpha + \max \beta _ {i} + \max \gamma _ {j} , $$

which means that the detection of the best "soil" and "tillage" may be effected separately, with the result that the experimental work required is considerably reduced (for example, one may test out all $ I $ types of "soil" for one specific mode of tillage, thus finding the best type of soil, after which one tests out all $ J $ modes of "tillage" on that type of soil and finds out the best way; the total number of trials, including repetitions, will be $ ( I + J ) K $). If, on the other hand, the hypothesis $ \{ \textrm{ all } \delta _ {ij} = 0 \} $ is false, $ \max c _ {ij} $ can only be found by performing the "complete plan" described above, involving $ IJK $ experiments for $ K $ repetitions.
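
For instance (an illustrative count with arbitrarily chosen sizes), with $ I = 10 $ types of "soil", $ J = 5 $ modes of "tillage" and $ K = 3 $ repetitions, the complete plan requires $ IJK = 150 $ trials, whereas the two-stage search described above requires only $ ( I + J ) K = 45 $.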

In the case of sports competitions the function $ \gamma _ {ij} = \gamma _ {j} $ may be treated as the systematic error committed by the $ j $-th referee in relation to all the sportsmen. Thus, $ \gamma _ {j} $ is a measure of the "rigour" or "mildness" of the $ j $-th referee. Ideally, all $ \gamma _ {j} $ are zero, but under the conditions occurring in practice one has to deal with non-zero values of $ \gamma _ {j} $ and has to take this fact into account in summing up the results of the evaluations (e.g. one may base the comparison of the performances of the individual sportsmen not on the sequence of true values $ \alpha + \beta _ {1} + \gamma _ {j} \dots \alpha + \beta _ {I} + \gamma _ {j} $, but rather on the ordering of these numbers by their values, since for every $ j = 1 \dots J $ the ordering will be the same). Finally, the sum of the two remaining functions $ \alpha _ {ij} + \beta _ {ij} = \alpha + \beta _ {i} $ depends only on $ i $, and may therefore be used as a measure of the performance of the $ i $-th competitor. However, it must be borne in mind that here $ \alpha + \beta _ {i} = a _ {i} + b _ {i*} \neq a _ {i} $, and for this reason ordering the competitors according to the values of $ \alpha + \beta _ {i} $ (or according to $ \alpha + \beta _ {i} + \gamma _ {j} $ for any given $ j $) may not be identical with the ordering according to the values of $ a _ {i} $. In the practical processing of expert evaluations this fact is neglected, since the above-mentioned "complete plan" does not provide for separate evaluations of $ a _ {i} $ and $ b _ {i*} $. Thus, the number $ \alpha + \beta _ {i} = a _ {i} + b _ {i*} $ is a characteristic not only of the performance of the $ i $-th sportsman, but also, to a certain extent, of the attitude of the experts towards his performance. This is why the results of subjective expert evaluations made at different times (in particular, during different Olympic games) can hardly be regarded as comparable. In the case of agricultural trials, on the other hand, no such difficulties arise, since all $ a _ {i} = 0 $, i.e. $ \alpha + \beta _ {i} = b _ {i*} $.

The true values of the functions $ \alpha $, $ \beta _ {i} $, $ \gamma _ {j} $, and $ \delta _ {ij} $ are not known and are expressed in terms of the unknown values $ c _ {ij} $. Accordingly, the first stage in dispersion analysis is to find statistical estimators for $ c _ {ij} $ from the results $ x _ {ijk} $ of observations. An unbiased linear estimator for $ c _ {ij} $ with minimal dispersion is expressed by the formula

$$ {\widehat{c} } _ {ij} = x _ {ij*} = \frac{1}{K} \sum _ { k } x _ {ijk} . $$

Since $ \alpha $, $ \beta _ {i} $, $ \gamma _ {j} $, and $ \delta _ {ij} $ are linear functions of the entries of the matrix $ \| c _ {ij} \| $, the unbiased linear estimators for these functions with minimal dispersion are obtained by replacing the arguments $ c _ {ij} $ by the respective estimators $ {\widehat{c} } _ {ij} $, viz.

$$ \widehat \alpha = x _ {***} ,\ {\widehat \beta } _ {i} = x _ {i**} - x _ {***} , \ {\widehat \gamma } _ {j} = x _ {*j*} - x _ {***} , $$

$$ {\widehat \delta } _ {ij} = x _ {ij*} - x _ {i**} - x _ {*j*} + x _ {***} , $$

and the random vectors $ ( {\widehat \alpha } _ {ij} ) $, $ ( {\widehat \beta } _ {ij} ) $, $ ( {\widehat \gamma } _ {ij} ) $, and $ ( {\widehat \delta } _ {ij} ) $, defined in the same way as $ ( \alpha _ {ij} ) $, $ ( \beta _ {ij} ) $, $ ( \gamma _ {ij} ) $, and $ ( \delta _ {ij} ) $ introduced above, are orthogonal, i.e. are uncorrelated random vectors (in other words, any two components belonging to different vectors have correlation coefficient zero). In addition, any difference of the form

$$ x _ {ijk} - x _ {ij*} = x _ {ijk} - {\widehat{c} } _ {ij} $$

is uncorrelated with any component of these four vectors. Consider the five sets of random variables $ \{ x _ {ijk} \} $, $ \{ x _ {ijk} - x _ {ij*} \} $, $ \{ {\widehat \beta } _ {i} \} $, $ \{ {\widehat \gamma } _ {j} \} $, and $ \{ {\widehat \delta } _ {ij} \} $. Since

$$ x _ {ijk} - x _ {ij*} = y _ {ijk} - y _ {ij*} ,\ \ {\widehat \beta } _ {i} = \beta _ {i} + ( y _ {i**} - y _ {***} ) , $$

$$ {\widehat \gamma } _ {j} = \gamma _ {j} + ( y _ {*j*} - y _ {***} ) , $$

$$ {\widehat \delta } _ {ij} = \delta _ {ij} + ( y _ {ij*} - y _ {i**} - y _ {*j*} + y _ {***} ) , $$

the dispersions of the empirical distributions corresponding to these sets are expressed by the formulas

$$ S ^ {2} = \frac{1}{IJK} \sum _ { ijk } ( x _ {ijk} - x _ {***} ) ^ {2} , $$

$$ S _ {0} ^ {2} = \frac{1}{IJK} \sum _ { ijk } ( x _ {ijk} - x _ {ij*} ) ^ {2} = \frac{1}{IJK} \sum _ { ijk } ( y _ {ijk} - y _ {ij*} ) ^ {2} , $$

$$ S _ {1} ^ {2} = \frac{1}{I} \sum _ { i } {\widehat \beta } _ {i} ^ {2} = \frac{1}{I} \sum _ { i } [ \beta _ {i} + ( y _ {i**} - y _ {***} ) ] ^ {2} , $$

$$ S _ {2} ^ {2} = \frac{1}{J} \sum _ { j } {\widehat \gamma } {} _ {j} ^ {2} = \frac{1}{J} \sum _ { j } [ \gamma _ {j} + ( y _ {*j*} - y _ {***} ) ] ^ {2} , $$

$$ S _ {3} ^ {2} = \frac{1}{IJ} \sum _ { ij } {\widehat \delta } _ {ij} ^ {2} = \frac{1}{IJ} \sum _ { ij } [ \delta _ {ij} + ( y _ {ij*} - y _ {i**} - y _ {*j*} + y _ {***} ) ] ^ {2} . $$

These empirical dispersions are sums of squares of random variables, any two of which are uncorrelated provided they belong to different sums; also, the identity

$$ S ^ {2} = S _ {0} ^ {2} + S _ {1} ^ {2} + S _ {2} ^ {2} + S _ {3} ^ {2} , $$

explaining the origin of the term "dispersion analysis", is valid for all $ y _ {ijk} $.
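
This identity can likewise be checked numerically. The sketch below (an added illustration; the data are simulated from an arbitrary matrix $ \| c _ {ij} \| $ with normal errors) computes the five empirical dispersions and verifies that $ S ^ {2} = S _ {0} ^ {2} + S _ {1} ^ {2} + S _ {2} ^ {2} + S _ {3} ^ {2} $:

import numpy as np

rng = np.random.default_rng(1)
I, J, K = 4, 3, 5
c = rng.normal(size=(I, J))                       # c_ij = a_i + b_ij
x = c[:, :, None] + rng.normal(size=(I, J, K))    # observations x_ijk

x_ij = x.mean(axis=2)          # x_ij*
x_i = x.mean(axis=(1, 2))      # x_i**
x_j = x.mean(axis=(0, 2))      # x_*j*
x_g = x.mean()                 # x_***

S_tot = np.mean((x - x_g) ** 2)                      # S^2
S0 = np.mean((x - x_ij[:, :, None]) ** 2)            # S_0^2
S1 = np.mean((x_i - x_g) ** 2)                       # S_1^2 = (1/I) sum beta_hat_i^2
S2 = np.mean((x_j - x_g) ** 2)                       # S_2^2 = (1/J) sum gamma_hat_j^2
S3 = np.mean((x_ij - x_i[:, None] - x_j[None, :] + x_g) ** 2)   # S_3^2

assert np.isclose(S_tot, S0 + S1 + S2 + S3)          # S^2 = S_0^2 + S_1^2 + S_2^2 + S_3^2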

Let $ I, J, K \geq 2 $ and let

$$ s _ {0} ^ {2} = \frac{K}{K-1} S _ {0} ^ {2} ,\ s _ {1} ^ {2} = \frac{IJK}{I-1} S _ {1} ^ {2} ,\ s _ {2} ^ {2} = \frac{IJK}{J-1} S _ {2} ^ {2} , $$

$$ s _ {3} ^ {2} = \frac{IJK}{( I- 1 ) ( J- 1 ) } S _ {3} ^ {2} ; $$

then

$$ {\mathsf E} s _ {0} ^ {2} = \sigma ^ {2} ,\ \ {\mathsf E} s _ {1} ^ {2} = \ \sigma ^ {2} + \frac{JK}{I-1} \sum _ { i } \beta _ {i} ^ {2} ,\ {\mathsf E} s _ {2} ^ {2} = \sigma ^ {2} + \frac{IK}{J-1} \sum _ { j } \gamma _ {j} ^ {2} , $$

$$ {\mathsf E} s _ {3} ^ {2} = \sigma ^ {2} + \frac{K}{ ( I- 1 )( J- 1) } \sum _ { ij } \delta _ {ij} ^ {2} , $$

where $ \sigma ^ {2} $ is the dispersion of the random errors $ y _ {ijk} $.

These formulas form the basis of the second stage in dispersion analysis, namely the clarification of the effect of the first and of the second factor on the experimental results (in agricultural trials the first factor is the type of "soil", the second is the mode of "tillage"). For instance, in order to verify the hypothesis that the two factors are mutually "independent", i.e. that $ \sum _ {ij} \delta _ {ij} ^ {2} = 0 $, it is reasonable to compute the dispersion proportion $ s _ {3} ^ {2} / s _ {0} ^ {2} = F _ {3} $. If this ratio is significantly different from one, the hypothesis is rejected. In the same way, the hypothesis $ \sum _ {j} \gamma _ {j} ^ {2} = 0 $ is usefully verified by the proportion $ s _ {2} ^ {2} / s _ {0} ^ {2} = F _ {2} $, which should also be compared with one; if it is also known that $ \sum _ {ij} \delta _ {ij} ^ {2} = 0 $, the expression

$$ \frac{( IJK - I - J + 1 ) s _ {2} ^ {2} }{IJ ( K- 1 ) s _ {0} ^ {2} + ( I- 1 ) ( J- 1 ) s _ {3} ^ {2} } = F _ {2} ^ { * } , $$

rather than $ F _ {2} $, should be compared with one. A statistic for the verification of the hypothesis $ \sum _ {i} \beta _ {i} ^ {2} = 0 $ can be constructed in a similar manner.
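
(The form of $ F _ {2} ^ { * } $ can be explained as follows: if $ \sum _ {ij} \delta _ {ij} ^ {2} = 0 $, then, by the formulas above, both $ s _ {0} ^ {2} $ and $ s _ {3} ^ {2} $ are unbiased estimators of $ \sigma ^ {2} $, and the pooled estimator $ [ IJ ( K- 1 ) s _ {0} ^ {2} + ( I- 1 )( J- 1 ) s _ {3} ^ {2} ] / ( IJK - I - J + 1 ) $, based on $ IJ ( K- 1 ) + ( I- 1 )( J- 1 ) = IJK - I - J + 1 $ degrees of freedom, is more accurate than $ s _ {0} ^ {2} $ alone; $ F _ {2} ^ { * } $ is the ratio of $ s _ {2} ^ {2} $ to this pooled estimator.)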

The exact meaning of the concept of a significant difference of the above expressions from one may be defined only in terms of the distribution law of the random errors $ y _ {ijk} $. The situation most extensively studied in dispersion analysis is that of all $ y _ {ijk} $ being normally distributed. In such a case $ ( {\widehat \alpha } _ {ij} ) $, $ ( {\widehat \beta } _ {ij} ) $, $ ( {\widehat \gamma } _ {ij} ) $, $ ( {\widehat \delta } _ {ij} ) $ are independent random vectors, while $ s _ {0} ^ {2} $, $ s _ {1} ^ {2} $, $ s _ {2} ^ {2} $, $ s _ {3} ^ {2} $ are independent random variables, and the statistics

$$ IJ ( K- 1 ) \frac{s _ {0} ^ {2} }{\sigma ^ {2} } ,\ ( I- 1 ) \frac{s _ {1} ^ {2} }{\sigma ^ {2} } ,\ ( J- 1 ) \frac{s _ {2} ^ {2} }{\sigma ^ {2} } , $$

$$ ( I- 1)( J- 1 ) \frac{s _ {3} ^ {2} }{\sigma ^ {2} } $$

will have non-central chi-squared distributions with $ f _ {m} $ degrees of freedom and with non-centrality parameters $ \lambda _ {m} $, $ m= 0 , 1 , 2 , 3 $, where

$$ f _ {0} = IJ ( K- 1 ) ,\ f _ {1} = I- 1 ,\ f _ {2} = J- 1 ,\ f _ {3} = ( I- 1 )( J- 1 ) ; $$

$$ \lambda _ {0} = 0 ,\ \lambda _ {1} = JK \sum _ { i } \frac{\beta _ {i} ^ {2} }{\sigma ^ {2} } ,\ \lambda _ {2} = IK \sum _ { j } \frac{\gamma _ {j} ^ {2} }{\sigma ^ {2} } , $$

$$ \lambda _ {3} = K \sum _ { ij } \frac{\delta _ {ij} ^ {2} }{\sigma ^ {2} } . $$

If the non-centrality parameter is zero, the non-central chi-squared distribution becomes identical with the ordinary chi-squared distribution. Thus, if the hypothesis $ \lambda _ {3} = 0 $ is true, the proportion $ s _ {3} ^ {2} / s _ {0} ^ {2} = F _ {3} $ has an $ F $-distribution with parameters $ f _ {3} $ and $ f _ {0} $ (the distribution of the dispersion proportion). Let $ x $ be the number for which the probability of the event $ \{ F _ {3} > x \} $ equals a pre-set value $ \epsilon $, known as the significance level (tables of the function $ x = x ( \epsilon ; f _ {3} , f _ {0} ) $ can be found in most textbooks on mathematical statistics). The verification criterion of the hypothesis $ \lambda _ {3} = 0 $ is that if the observed value of $ F _ {3} $ is greater than $ x $, the hypothesis is rejected; otherwise, the hypothesis is said not to be in contradiction with the experimental results. Criteria based on the statistics $ F _ {2} $ and $ F _ {2} ^ { * } $ are constructed in a similar manner.
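
In practice the critical value $ x ( \epsilon ; f _ {3} , f _ {0} ) $ is obtained from tables or from statistical software. The following sketch (an added illustration on simulated data; it assumes the SciPy library is available) computes $ F _ {3} $ and compares it with the corresponding critical value of the $ F $-distribution:

import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(2)
I, J, K, eps = 4, 3, 5, 0.05

# simulated observations x_ijk = c_ij + y_ijk (here c_ij is additive, i.e. without interaction)
beta_true = np.array([0.0, 0.5, -0.5, 1.0])
gamma_true = np.array([0.0, 0.3, -0.3])
c = 2.0 + beta_true[:, None] + gamma_true[None, :]
x = c[:, :, None] + rng.normal(size=(I, J, K))

x_ij = x.mean(axis=2); x_i = x.mean(axis=(1, 2)); x_j = x.mean(axis=(0, 2)); x_g = x.mean()

S0 = np.mean((x - x_ij[:, :, None]) ** 2)                        # S_0^2
S3 = np.mean((x_ij - x_i[:, None] - x_j[None, :] + x_g) ** 2)    # S_3^2

s0 = K / (K - 1) * S0                        # s_0^2
s3 = I * J * K / ((I - 1) * (J - 1)) * S3    # s_3^2
F3 = s3 / s0

f0, f3 = I * J * (K - 1), (I - 1) * (J - 1)
x_crit = f_dist.ppf(1 - eps, f3, f0)         # x(eps; f_3, f_0)
print(F3, x_crit, "reject" if F3 > x_crit else "no contradiction")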

The following stages in dispersion analysis materially depend not only on the nature of the problem to be solved, but also on the results of the statistical verification of the hypothesis during the second stage. Thus, as has been seen, the truth of the hypothesis $ \lambda _ {3} = 0 $ in agricultural trials permits a more economical design of subsequent experiments (if the hypotheses $ \lambda _ {3} = 0 $ and $ \lambda _ {2} = 0 $ are both true, the productivity depends only on the type of "soil" , and subsequent experiments may be performed in the framework of one-factor dispersion analysis); if the hypothesis $ \lambda _ {3} = 0 $ is false, it is reasonable to look for a third, hitherto unrecognized, factor which is relevant to the problem. If the types of "soil" and "tillage" methods were varied not only locally but in different geographic zones, climatic or geographic conditions may act as such a third factor, and the processing of the observations must involve a three-factor dispersion analysis.

In the case of expert evaluations, if the hypothesis $ \lambda _ {3} = 0 $ has been statistically confirmed, it is permissible to order the objects being compared (e.g. sportsmen) according to the values of $ \widehat \alpha + {\widehat \beta } _ {i} $, $ i= 1 \dots I $. If the hypothesis $ \lambda _ {3} = 0 $ is false (in the case of sports competition this indicates "interaction" between some competitors and referees), the obvious course is to recalculate all results after discarding the values $ x _ {ijk} $ with pairs of indices $ ( i, j) $ for which the absolute values of the statistical estimators $ {\widehat \delta } _ {ij} $ exceed some pre-set permissible level. This means that certain entries of the matrix $ \| x _ {ij*} \| $ are deleted, and the plan of dispersion analysis becomes incomplete.

Models of modern dispersion analysis cover a wide range of real experimental schemes (e.g. schemes of incomplete plans, with randomly or non-randomly selected elements $ x _ {ij*} $). The corresponding statistical conclusions are often still under development. At the time of writing (1987), particular problems in which the results of the observations $ x _ {ijk} = c _ {ij} + y _ {ijk} $ are not identically-distributed random variables are still far from being solved; even more difficult are problems in which the values $ x _ {ijk} $ are dependent. The problem of factor selection has not been solved, even in the linear case. This problem may be formulated as follows. Let $ c = c( u , v) $ be a continuous function and let $ u = u ( z, w) $ and $ v = v( z, w) $ be arbitrary linear functions of the variables $ z $ and $ w $. Given the values of $ z _ {1} \dots z _ {I} $ and $ w _ {1} \dots w _ {J} $, $ c _ {ij} $ may be determined for any given choice of the linear functions $ u $ and $ v $ by the formula

$$ c _ {ij} = c [ u ( z _ {i} , w _ {j} ), v ( z _ {i} , w _ {j} )] , $$

and one can construct the dispersion analysis of these variables from the results of the respective observations $ x _ {ijk} $. The problem is to find the linear functions $ u $ and $ v $ for which the value of the sum of the squares $ \sum _ {ij} \delta _ {ij} ^ {2} $, where

$$ \delta _ {ij} = c _ {ij} - c _ {i*} - c _ {*j} + c _ {**} , $$

is minimal (on the assumption that the function $ c( u , v) $ is not known). In terms of dispersion analysis, the problem is reduced to a statistical determination of the factors $ z = z( u , v) $ and $ w = w( u , v) $ corresponding to "least interaction" .
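
As an added illustration of this search problem (not from the original article), the following sketch restricts the admissible linear functions $ u , v $ to rotations of the $ ( z , w ) $-plane, takes an arbitrary test function $ c ( u , v ) $, and locates the angle with the smallest interaction sum of squares by a grid search:

import numpy as np

def interaction_ss(c_mat):
    # sum over i, j of delta_ij^2 for a matrix c_ij
    d = c_mat - c_mat.mean(axis=1, keepdims=True) - c_mat.mean(axis=0, keepdims=True) + c_mat.mean()
    return float(np.sum(d ** 2))

def c_fun(u, v):
    # arbitrary smooth test function c(u, v); in the real problem c is unknown
    return np.exp(0.3 * u) * (1.0 + 0.5 * v)

z = np.linspace(0.0, 2.0, 6)   # z_1, ..., z_I
w = np.linspace(0.0, 1.0, 4)   # w_1, ..., w_J
Z, W = np.meshgrid(z, w, indexing="ij")

best = None
for theta in np.linspace(0.0, np.pi, 181):
    # linear functions u, v of z and w (here restricted to a rotation by angle theta)
    U = np.cos(theta) * Z + np.sin(theta) * W
    V = -np.sin(theta) * Z + np.cos(theta) * W
    ss = interaction_ss(c_fun(U, V))
    if best is None or ss < best[0]:
        best = (ss, theta)

print("least interaction sum of squares %.4g at theta = %.3f rad" % best)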

References

[1] R.A. Fisher, "Statistical methods for research workers", Oliver & Boyd (1925)
[2] H. Scheffé, "The analysis of variance", Wiley (1959)
[3] A. Hald, "Statistical theory with engineering applications", Wiley (1952)
[4] G.W. Snedecor, W.G. Cochran, "Statistical methods: applied to experiments in agriculture and biology", Iowa State College Collegiate Press (1957)
[5] M.S. Nikulin, "Application of the model of two-factor analysis of variance without interaction", J. Soviet Math., 25 : 3 (1984) pp. 1196–1207; Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. Stud. Mat. Stat., 108 : 5 (1981) pp. 134–153

Comments

The phrase "dispersion analysis" is out of use and has been replaced by analysis of variance.
