Multi-dimensional statistical analysis

From Encyclopedia of Mathematics

{{TEX|done}}

''multivariate statistical analysis''
 
The branch of [[Mathematical statistics|mathematical statistics]] devoted to mathematical methods for constructing optimal designs for the collection, systematization and processing of multivariate statistical data, directed towards clarifying the nature and the structure of the correlations between the components of the multivariate attribute in question, and intended for obtaining scientific and practical inferences. By a multivariate attribute is meant a $p$-dimensional vector $\mathbf x = ( x_1 \dots x_p )^\prime$ of components (features, variables) $x_1 \dots x_p$, each of which may be quantitative, that is, measuring in some fixed scale the degree of manifestation of the studied property of an object; ordinal, that is, allowing the objects being analyzed to be ordered by the degree of manifestation of the studied property in them; or nominal, that is, allowing the collection of objects under investigation, which does not lend itself to ordering, to be partitioned into classes that are homogeneous with respect to the analyzed property. The results of measuring these components,
  
$$ \tag{1 }
\{ \mathbf x_{\cdot i} \}_{1}^{n} = \{ ( x_{1i} \dots x_{pi} )^\prime \}_{1}^{n}
$$
  
for each of $n$ objects of a collection, forms a sequence of multivariate observations, or an initial ensemble of multivariate data, for conducting a multivariate statistical analysis. A significant part of multivariate statistical analysis involves the situation in which $\mathbf x$ is interpreted as a multivariate random variable, and the corresponding sequence of observations (1) is a population sample. In this case the choice of a method for processing the initial statistical data and the analysis of their properties is carried out on the basis of assumptions regarding the nature of the multivariate (joint) law of the probability distribution $\mathsf P ( \mathbf x )$.
  
 
The content of multivariate statistical analysis can be conventionally divided into three basic subdivisions: the multivariate statistical analysis of multivariate distributions and their basic characteristics; the multivariate statistical analysis of the nature and structure of the correlations between the components of the multivariate attribute being investigated; and the multivariate statistical analysis of the geometric structure of the set of multi-dimensional observations being investigated.
 
  
 
==Multivariate statistical analysis of multivariate distributions and their fundamental characteristics.==
 
This branch covers only situations in which the observations (1) being processed have a probabilistic nature, that is, can be interpreted as a sample from a corresponding population. The basic problems of this branch are: the statistical estimation, for the multivariate distributions in question, of their fundamental numerical characteristics and parameters; the investigation of the properties of the statistical estimators used; and the investigation of the probability distributions of a number of statistics used to construct statistical tests for the verification of various hypotheses on the nature of the multi-dimensional data being analyzed. The fundamental results are related to the particular case when the attribute in question, $\mathbf x$, is subject to a multivariate normal law $N_p ( \pmb\mu , \mathbf V )$, with density function $f ( \mathbf x \mid \pmb\mu , \mathbf V )$ given by

$$ \tag{2 }
f ( \mathbf x \mid \pmb\mu , \mathbf V ) =
\frac{1}{( 2 \pi )^{p/2} | \mathbf V |^{1/2}}
\mathop{\rm exp} \left \{ - \frac{1}{2} ( \mathbf x - \pmb\mu )^\prime \mathbf V^{-1} ( \mathbf x - \pmb\mu ) \right \} ,
$$
  
where $\pmb\mu = ( \mu_1 \dots \mu_p )^\prime$ is the vector of mathematical expectations (cf. [[Mathematical expectation|Mathematical expectation]]) of the components of $\mathbf x$, that is, $\mu_i = \mathsf E x_i$, $i = 1 \dots p$, and $\mathbf V = \| v_{ij} \|_{i,j=1}^{p}$ is the [[Covariance matrix|covariance matrix]] of $\mathbf x$, that is, $v_{ij} = \mathsf E ( x_i - \mu_i ) ( x_j - \mu_j )$ is the covariance of these components of $\mathbf x$ (the non-degenerate case $\mathop{\rm rank} \mathbf V = p$ is considered; in case $\mathop{\rm rank} \mathbf V = p^\prime < p$, all the results remain true, but in a subspace of smaller dimension $p^\prime$ on which the probability distribution of $\mathbf x$ is concentrated).
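As a quick numerical illustration, the density (2) can be evaluated directly with NumPy and checked against the closed form it reduces to when $\mathbf V = \sigma^2 \mathbf I$ (a minimal sketch; the function name is illustrative, not part of the article):

```python
import numpy as np

def mvn_density(x, mu, V):
    """Evaluate the multivariate normal density (2) directly."""
    p = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    # Quadratic form (x - mu)' V^{-1} (x - mu), via a linear solve
    quad = diff @ np.linalg.solve(V, diff)
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(V)))

# For V = sigma^2 * I the density factorizes into univariate normals
mu = np.array([1.0, -1.0, 0.5])
x = np.array([0.3, 0.2, -0.1])
sigma2 = 2.0
expected = (2 * np.pi * sigma2) ** (-1.5) * np.exp(-np.sum((x - mu) ** 2) / (2 * sigma2))
assert np.isclose(mvn_density(x, mu, sigma2 * np.eye(3)), expected)
```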
  
Thus, if (1) is a sequence of independent observations forming a random sample from $N_p ( \pmb\mu , \mathbf V )$, then the maximum-likelihood estimators for the parameters $\pmb\mu$ and $\mathbf V$ in (2) are, respectively, the statistics (see [[#References|[1]]], [[#References|[2]]])
  
$$ \tag{3 }
\widehat{\pmb\mu} = \frac{1}{n} \sum_{i=1}^{n} \mathbf x_{\cdot i}
$$
  
 
and
 
  
$$ \tag{4 }
\widehat{\mathbf V} = \frac{1}{n} \sum_{i=1}^{n} ( \mathbf x_{\cdot i} - \widehat{\pmb\mu} ) ( \mathbf x_{\cdot i} - \widehat{\pmb\mu} )^\prime ,
$$
  
where the random vector $\widehat{\pmb\mu}$ is subject to the $p$-dimensional normal law $N_p ( \pmb\mu , \mathbf V / n )$ and is statistically independent of $\widehat{\mathbf V}$, and the joint distribution of the elements of the matrix $\widehat{\mathbf Q} = n \widehat{\mathbf V}$ is described by the so-called [[Wishart distribution|Wishart distribution]] (see [[#References|[4]]]) with density
  
$$
w ( \widehat{\mathbf Q} \mid \mathbf V ; n ) =
\frac{| \widehat{\mathbf Q} |^{( n - p - 2 ) / 2} \mathop{\rm exp} \{ - \mathop{\rm tr} ( \mathbf V^{-1} \widehat{\mathbf Q} ) / 2 \}}{2^{( n - 2 ) p / 2} \pi^{p ( p - 1 ) / 4} | \mathbf V |^{( n - 1 ) / 2} \prod_{j=1}^{p} \Gamma ( ( n - j ) / 2 )}
$$

if $\widehat{\mathbf Q}$ is positive definite, and $0$ otherwise.
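In numerical work the estimators (3) and (4) are simply the sample mean and the biased (divisor-$n$) sample covariance; a minimal NumPy sketch (the data here are synthetic and the variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.standard_normal((n, p))  # n observations of a p-dimensional attribute

# Maximum-likelihood estimators (3) and (4)
mu_hat = X.mean(axis=0)
V_hat = (X - mu_hat).T @ (X - mu_hat) / n

# Cross-check (4) against NumPy's biased sample covariance (divisor n)
assert np.allclose(V_hat, np.cov(X, rowvar=False, bias=True))
```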
  
Within this scheme, the distribution and moments of sampling characteristics of multivariate random variables such as the coefficients of paired, partial and multiple correlations, the generalized variance (i.e., the statistic $| \widehat{\mathbf V} |$) and the generalized Hotelling $T^2$-statistic (cf. [[Hotelling-T^2-distribution|Hotelling $T^2$-distribution]] and [[#References|[5]]]) have been investigated. In particular (see [[#References|[1]]]), if the sample covariance matrix $\mathbf S_n$ is defined as the estimator $\widehat{\mathbf V}$ made "unbiased", namely:

$$ \tag{5 }
\mathbf S_n = \frac{n}{n-1} \widehat{\mathbf V} ,
$$

then the distribution of $\sqrt n \, ( | \mathbf S_n | / | \mathbf V | - 1 )$ tends to $N_1 ( 0 , 2p )$ as $n \rightarrow \infty$, and the random variables

$$ \tag{6 }
\frac{n - p}{p ( n - 1 )} T^2 = \frac{n - p}{p ( n - 1 )} \, n ( \widehat{\pmb\mu} - \pmb\mu )^\prime \mathbf S_n^{-1} ( \widehat{\pmb\mu} - \pmb\mu )
$$
  
 
and
 
  
$$ \tag{7 }
\frac{n_1 + n_2 - p - 1}{( n_1 + n_2 - 2 ) p} \widetilde T^{\,2} =
\frac{n_1 + n_2 - p - 1}{( n_1 + n_2 - 2 ) p} \cdot
\frac{n_1 n_2}{n_1 + n_2}
( \widehat{\pmb\mu}_{n_1} - \widehat{\pmb\mu}_{n_2} )^\prime \mathbf S_{n_1 + n_2}^{-1} ( \widehat{\pmb\mu}_{n_1} - \widehat{\pmb\mu}_{n_2} )
$$

have the [[Fisher-F-distribution|Fisher $F$-distribution]] with degrees of freedom $( p , n - p )$ and $( p , n_1 + n_2 - p - 1 )$, respectively. In (7), $n_1$ and $n_2$ are the sizes of two independent samples of the form (1) taken from the same population $N_p ( \pmb\mu , \mathbf V )$, $\widehat{\pmb\mu}_{n_i}$ and $\mathbf S_{n_i}$ being estimators of the form (3) and (4)–(5) constructed with respect to the $i$-th sample, and

$$
\mathbf S_{n_1 + n_2} = \frac{1}{n_1 + n_2 - 2} [ ( n_1 - 1 ) \mathbf S_{n_1} + ( n_2 - 1 ) \mathbf S_{n_2} ]
$$

is the common sample covariance matrix constructed with respect to the estimators $\mathbf S_{n_1}$ and $\mathbf S_{n_2}$.
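The one-sample statistic on the left-hand side of (6) is straightforward to compute; the sketch below implements it with NumPy (the function name is illustrative) and checks two elementary properties: the quadratic form vanishes at $\pmb\mu = \widehat{\pmb\mu}$ and is positive elsewhere.

```python
import numpy as np

def hotelling_t2_one_sample(X, mu0):
    """Left-hand side of (6): the F-distributed transform of Hotelling's T^2."""
    n, p = X.shape
    mu_hat = X.mean(axis=0)
    # Unbiased sample covariance S_n, as in (5)
    S = (X - mu_hat).T @ (X - mu_hat) / (n - 1)
    d = mu_hat - mu0
    t2 = n * d @ np.linalg.solve(S, d)
    return (n - p) / (p * (n - 1)) * t2

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))

# At mu0 = mu_hat the quadratic form vanishes; elsewhere it is positive.
assert abs(hotelling_t2_one_sample(X, X.mean(axis=0))) < 1e-12
assert hotelling_t2_one_sample(X, np.ones(4)) > 0
```

Under $N_p(\pmb\mu, \mathbf V)$ sampling, the returned value would be compared with quantiles of the $F$-distribution with $(p, n-p)$ degrees of freedom, as stated in the text.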
  
 
==Multivariate statistical analysis of the nature and structure of correlations between the components of the multivariate attribute in question.==
 
 
This branch unifies the ideas and results used in such methods and models of multivariate statistical analysis as multiple [[Regression|regression]]; multivariate [[Dispersion analysis|dispersion analysis]] and [[Covariance analysis|covariance analysis]]; [[Factor analysis|factor analysis]]; the method of principal components; and the analysis of canonical correlations. The results of this branch may be conventionally divided into two basic types.
 
  
1) The construction of best (in a specified sense) statistical estimators for the parameters of these models and the analysis of their properties (more precisely, and in a probabilistic formulation, of their distribution laws, confidence regions, etc.). Thus, let the multivariate attribute $\mathbf x$ be interpreted as a vector-valued random variable subject to the $p$-dimensional normal distribution $N_p ( \pmb\mu , \mathbf V )$, and let it be partitioned into two subvectors $\mathbf x^{(1)}$ and $\mathbf x^{(2)}$ of dimensions $q$ and $p - q$, respectively. This defines a corresponding partition of the expectation vector $\pmb\mu$ and of the theoretical and sample covariance matrices $\mathbf V$ and $\widehat{\mathbf V}$, namely:

$$
\pmb\mu = \left ( \begin{array}{c}
\pmb\mu^{(1)} \\
\pmb\mu^{(2)}
\end{array} \right ) ,\ \
\mathbf V = \left ( \begin{array}{cc}
\mathbf V_{11} & \mathbf V_{12} \\
\mathbf V_{21} & \mathbf V_{22}
\end{array} \right ) ,\ \
\widehat{\mathbf V} = \left ( \begin{array}{cc}
\widehat{\mathbf V}_{11} & \widehat{\mathbf V}_{12} \\
\widehat{\mathbf V}_{21} & \widehat{\mathbf V}_{22}
\end{array} \right ) .
$$
  
Then (see [[#References|[1]]], [[#References|[2]]]) the conditional distribution of the subvector $\mathbf x^{(1)}$ (under the condition that the second subvector $\mathbf x^{(2)}$ takes a fixed value) will also be normal, $N_q ( \pmb\mu^{(1)} + \mathbf B ( \mathbf x^{(2)} - \pmb\mu^{(2)} ) , \pmb\Sigma )$. Here the maximum-likelihood estimators $\widehat{\mathbf B}$ and $\widehat{\pmb\Sigma}$ of the matrix of regression coefficients $\mathbf B$ and of the covariance matrix $\pmb\Sigma$, in this classical multivariate model of multiple regression
  
$$ \tag{8 }
\mathsf E ( \mathbf x^{(1)} \mid \mathbf x^{(2)} ) = \pmb\mu^{(1)} + \mathbf B ( \mathbf x^{(2)} - \pmb\mu^{(2)} ) ,
$$
  
 
will be the mutually independent statistics
 
  
$$
\widehat{\mathbf B} = \widehat{\mathbf V}_{12} \widehat{\mathbf V}_{22}^{-1} \ \
\textrm{ and } \ \
\widehat{\pmb\Sigma} = \widehat{\mathbf V}_{11} - \widehat{\mathbf V}_{12} \widehat{\mathbf V}_{22}^{-1} \widehat{\mathbf V}_{21} ,
$$
  
respectively. Here the distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/m/m065/m065140/m06514082.png" /> is the normal law <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/m/m065/m065140/m06514083.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/m/m065/m065140/m06514084.png" /> has the Wishart distribution with parameters <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/m/m065/m065140/m06514085.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/m/m065/m065140/m06514086.png" /> (the elements of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/m/m065/m065140/m06514087.png" /> are given in terms of the elements of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/m/m065/m065140/m06514088.png" />).
+
respectively. Here the distribution of $ \widehat{\mathbf B} $ is the normal law $ N _ {q(p-q)} ( \mathbf B , \mathbf V _ {\mathbf B} ) $, and $ n \widehat{\pmb\Sigma} $ has the Wishart distribution with parameters $ \pmb\Sigma $ and $ n - (p-q) $ (the elements of $ \mathbf V _ {\mathbf B} $ are given in terms of the elements of $ \mathbf V $).
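The estimators above lend themselves to direct computation from the blocks of the sample covariance matrix. A minimal numerical sketch (not part of the original article; Python/NumPy on simulated data, with all parameter values chosen for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, p = 500, 1, 3                       # x^(1) is q-dimensional, x^(2) is (p-q)-dimensional
X2 = rng.normal(size=(n, p - q))          # observations of x^(2)
B_true = np.array([[2.0, -1.0]])          # illustrative regression matrix B (q x (p-q))
X1 = X2 @ B_true.T + 0.1 * rng.normal(size=(n, q))   # x^(1) = B x^(2) + noise
X = np.hstack([X1, X2])                   # rows: observations of the full p-vector x

V = np.cov(X, rowvar=False)               # sample covariance matrix V-hat
V11, V12 = V[:q, :q], V[:q, q:]
V21, V22 = V[q:, :q], V[q:, q:]

B_hat = V12 @ np.linalg.inv(V22)                      # estimator of B
Sigma_hat = V11 - V12 @ np.linalg.inv(V22) @ V21      # estimator of Sigma

print(B_hat)        # close to B_true
print(Sigma_hat)    # close to the noise variance 0.01
```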
  
 
The basic results on the construction of parameter estimators, and on the investigation of their properties, in models of factor analysis, of principal components and of canonical correlations relate to the analysis of the probabilistic-statistical properties of the eigenvalues (characteristic values) and eigenvectors of the various covariance matrices.
 
2) The construction of statistical tests for the verification of various hypotheses on the structure of the correlations being investigated. Within the limits of a multivariate normal model (sequences of observations of the form (1) are interpreted as random samples from the corresponding multivariate normal population) statistical tests have been constructed for testing, for example, the following hypotheses.
  
I) The hypothesis $ \pmb\mu = \pmb\mu ^ {*} $, i.e., that the expectation of the variables studied be equal to a specific vector $ \pmb\mu ^ {*} $; this is tested via the Hotelling $ T ^ {2} $-statistic by substituting $ \pmb\mu = \pmb\mu ^ {*} $ in (6).
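A sketch of the corresponding computation (illustrative Python/NumPy code on simulated data, not part of the original article; the $ F $-approximation used for the null distribution is the standard one for the Hotelling statistic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 3
mu_star = np.zeros(p)                 # hypothesized expectation vector mu*
X = rng.normal(size=(n, p))           # simulated sample for which H0 is true

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)           # sample covariance matrix
d = xbar - mu_star
T2 = n * d @ np.linalg.solve(S, d)    # Hotelling T^2 statistic

# under H0, (n - p) / (p * (n - 1)) * T2 has an F(p, n - p) distribution
F = (n - p) / (p * (n - 1)) * T2
p_value = stats.f.sf(F, p, n - p)
print(T2, p_value)
```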

II) The hypothesis $ \pmb\mu ^ {(1)} = \pmb\mu ^ {(2)} $ of equality of the expectation vectors in two populations (with identical but unknown covariance matrices), based on two samples; this is tested via the statistic $ \widetilde{T} {} ^ {2} $ (see [[#References|[7]]]).
  
III) The hypothesis $ \pmb\mu ^ {(1)} = \dots = \pmb\mu ^ {(k)} = \pmb\mu $ of equality of the expectation vectors in several populations (with identical but unknown covariance matrices), based on samples from them; this is tested via the statistic
  
$$
U _ {p, k-1, n-k} = \
\frac{\left | \sum _ {j=1} ^ {k} \sum _ {i=1} ^ {n _ j} ( \mathbf x _ {.i} ^ {(j)} - \widehat{\pmb\mu} {} ^ {(j)} ) ( \mathbf x _ {.i} ^ {(j)} - \widehat{\pmb\mu} {} ^ {(j)} ) ^ \prime \right | }{\left | \sum _ {j=1} ^ {k} \sum _ {i=1} ^ {n _ j} ( \mathbf x _ {.i} ^ {(j)} - \widehat{\pmb\mu} ) ( \mathbf x _ {.i} ^ {(j)} - \widehat{\pmb\mu} ) ^ \prime \right | } ,
$$
  
in which $ \mathbf x _ {.i} ^ {(j)} $ is the $ i $-th $ p $-dimensional observation in a sample of size $ n _ {j} $, representing the $ j $-th population, and $ \widehat{\pmb\mu} {} ^ {(j)} $ and $ \widehat{\pmb\mu} $ are estimators of the form (3), constructed separately with respect to each of the samples and with respect to the joint sample of size $ n = n _ {1} + \dots + n _ {k} $, respectively.
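The statistic $ U $ is a ratio of determinants of the within-groups and the total sums-of-products matrices, and is near 1 under the hypothesis and small when the expectation vectors differ. An illustrative sketch (Python/NumPy on simulated data with one shifted group; not part of the original article):

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 2, 3
# three samples of size 40; the third population has a shifted expectation vector
samples = [rng.normal(loc=m, size=(40, p)) for m in (0.0, 0.0, 2.0)]

W = np.zeros((p, p))            # within-groups sum of squares and products (numerator)
for Xj in samples:
    R = Xj - Xj.mean(axis=0)    # deviations from the per-sample mean estimate
    W += R.T @ R

Xall = np.vstack(samples)
Rt = Xall - Xall.mean(axis=0)   # deviations from the joint-sample mean estimate
T = Rt.T @ Rt                   # total sum of squares and products (denominator)

U = np.linalg.det(W) / np.linalg.det(T)
print(U)                        # near 1 under H0, clearly below 1 here
```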
  
IV) The hypotheses $ \pmb\mu ^ {(1)} = \dots = \pmb\mu ^ {(k)} = \pmb\mu $ and $ \mathbf V _ {1} = \dots = \mathbf V _ {k} = \mathbf V $ of equivalence of several normal populations, based on samples from them $ \{ \mathbf x _ {.i} ^ {(j)} \} _ {i=1} ^ {n _ {j}} $, $ j = 1 \dots k $; this is tested via the statistic
  
$$
\lambda = \
\frac{\prod _ {j=1} ^ {k} | n _ {j} \widehat{\mathbf V} _ {j} | ^ {( n _ {j} - 1)/2} }{\left | \sum _ {j=1} ^ {k} \sum _ {i=1} ^ {n _ j} ( \mathbf x _ {.i} ^ {(j)} - \widehat{\pmb\mu} ) ( \mathbf x _ {.i} ^ {(j)} - \widehat{\pmb\mu} ) ^ \prime \right | ^ {( n-k)/2} } ,
$$
  
in which the $ \widehat{\mathbf V} _ {j} $ are estimators of the form (4) constructed separately with respect to the observations from the $ j $-th sample, $ j = 1 \dots k $.
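Since the determinants involved grow quickly with the sample sizes, $ \lambda $ is conveniently evaluated on the logarithmic scale. An illustrative sketch (Python/NumPy on simulated data from identical populations; not part of the original article, and the data and seeds are chosen only for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
p, k = 2, 3
sizes = [30, 40, 50]
samples = [rng.normal(size=(nj, p)) for nj in sizes]   # all from the same N(0, I)
n = sum(sizes)

# log-numerator: sum_j ((n_j - 1)/2) * log | n_j V_j |, where n_j V_j = R'R
log_num = 0.0
for Xj, nj in zip(samples, sizes):
    R = Xj - Xj.mean(axis=0)
    log_num += (nj - 1) / 2 * np.linalg.slogdet(R.T @ R)[1]

# log-denominator: ((n - k)/2) * log | sum of squares about the joint mean |
Xall = np.vstack(samples)
Rt = Xall - Xall.mean(axis=0)
log_den = (n - k) / 2 * np.linalg.slogdet(Rt.T @ Rt)[1]

log_lambda = log_num - log_den    # log of the statistic lambda
print(log_lambda)
```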
  
V) The hypothesis of mutual independence of the subvectors $ \mathbf x ^ {(1)} \dots \mathbf x ^ {(m)} $ of dimensions $ p _ {1} \dots p _ {m} $, respectively, into which the initial $ p $-dimensional vector $ \mathbf x $ has been partitioned, $ p _ {1} + \dots + p _ {m} = p $; this is tested via the statistic
  
$$
\psi =
\frac{| n \widehat{\mathbf V} | }{\prod _ {i=1} ^ {m} | n \widehat{\mathbf V} _ {i} | } ,
$$

in which $ \widehat{\mathbf V} $ and $ \widehat{\mathbf V} _ {i} $ are sample covariance matrices of the form (4) for the vector $ \mathbf x $ and its subvectors $ \mathbf x ^ {(i)} $, respectively.
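An illustrative computation of $ \psi $ (Python/NumPy on simulated independent subvectors; not part of the original article — note that the common factor $ n $ cancels in the ratio of determinants, so the covariance matrix can be used directly):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
# subvectors x^(1) (first two coordinates) and x^(2) (last one) are independent
X = rng.normal(size=(n, 3))

S = np.cov(X, rowvar=False)     # V-hat; the factor n in |n V-hat| cancels below
S1 = S[:2, :2]                  # covariance block of the first subvector
S2 = S[2:, 2:]                  # covariance block of the second subvector

psi = np.linalg.det(S) / (np.linalg.det(S1) * np.linalg.det(S2))
print(psi)                      # close to 1 when the subvectors are independent
```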
  
 
==Multivariate statistical analysis of the geometric structure of the set of multi-dimensional observations being investigated.==
This branch unifies notions and results of models and schemes such as [[Discriminant analysis|discriminant analysis]], mixtures of probability distributions, cluster analysis, taxonomy, and multi-dimensional scaling. The key notion in all of these schemes is that of a distance (measure of proximity, measure of similarity) between the elements being analyzed. The elements being analyzed may be either real objects, in each of which the values of the components of $ \mathbf x $ are fixed — then in the geometrical representation the $ i $-th object will be the point $ \mathbf x _ {.i} = ( x _ {1i} \dots x _ {pi} ) ^ \prime $ in the corresponding $ p $-dimensional space — or the variables $ \mathbf x _ {l.} $, $ l = 1 \dots p $, themselves — then in the geometrical representation the $ l $-th variable will be the point $ \mathbf x _ {l.} = ( x _ {l1} \dots x _ {ln} ) $ in the corresponding $ n $-dimensional space.
  
The methods and results of discriminant analysis (see [[#References|[1]]], [[#References|[2]]], [[#References|[7]]]) are directed to the solution of the following problem. Suppose that the existence of a specific number $ k \geq 2 $ of populations is known and that a sample from each (a "training sample") is available. It is required to construct, on the basis of the training samples, the best, in a specified sense, classifying rule which allows one to attribute a new element (an observation $ \mathbf x $) to its population when the investigator does not know in advance to which population the element belongs. Usually, a classification rule means a sequence of actions: the calculation of a scalar function of the variables in question, on the basis of which a decision is taken on assigning the element to one of the classes (the construction of a discriminant function); an ordering of the variables themselves according to their degree of informativeness from the point of view of a proper assignment of elements to classes; and a calculation of the corresponding probabilities of classification errors.
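For two normal populations with a common covariance matrix the classical discriminant function is linear. A minimal sketch (Python/NumPy on simulated training samples; the midpoint threshold used here is one simple choice for the example, not a prescription of the article):

```python
import numpy as np

rng = np.random.default_rng(5)
# two training samples from populations with shifted expectation vectors
X1 = rng.normal(loc=0.0, size=(100, 2))
X2 = rng.normal(loc=2.0, size=(100, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
R1, R2 = X1 - m1, X2 - m2
# pooled within-class covariance estimate
S = (R1.T @ R1 + R2.T @ R2) / (len(X1) + len(X2) - 2)

w = np.linalg.solve(S, m1 - m2)   # coefficients of the linear discriminant function
c = w @ (m1 + m2) / 2             # threshold at the midpoint between the class means

def classify(x):
    """Assign a new observation to population 1 or 2."""
    return 1 if w @ x > c else 2

print(classify(np.array([0.0, 0.0])), classify(np.array([2.0, 2.0])))
```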
  
The problem of analysis of a mixture of probability distributions (see [[#References|[7]]]) most often (but not always) also arises in connection with the investigation of the "geometric structure" of some population. Here the idea of the $ r $-th homogeneous class is formalized with the help of a population described by some (as a rule, unimodal) distribution law $ {\mathsf P} ( \mathbf x \mid \pmb\theta _ {r} ) $, so that the distribution of the general population from which the sample (1) is extracted is described by a mixture of distributions of the form
  
$$
{\mathsf P} ( \mathbf x ) = \sum _ {r=1} ^ {k} \pi _ {r} {\mathsf P} ( \mathbf x \mid \pmb\theta _ {r} ) ,
$$
  
where $ \pi _ {r} $ is the a priori probability (the specific weight of the elements) of the $ r $-th class in the general population. The problem is to give a "good" statistical estimation (with respect to a sample $ \{ \mathbf x _ {.i} \} _ {1} ^ {n} $) of the unknown parameters $ \pmb\theta _ {r} $, $ \pi _ {r} $, and sometimes even $ k $. This, in particular, allows one to reduce the problem of the classification of the elements to a scheme of discriminant analysis, although in this case training samples are absent.
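For a two-component univariate normal mixture, the parameters $ \pi _ {r} $ and $ \pmb\theta _ {r} $ can be estimated by the EM iteration. An illustrative sketch (Python/NumPy on simulated data, not part of the original article; for brevity the component variances are assumed known and equal to one):

```python
import numpy as np

rng = np.random.default_rng(6)
# sample from a mixture: weight 0.4 around -3, weight 0.6 around +3
x = np.concatenate([rng.normal(-3, 1, 400), rng.normal(3, 1, 600)])

pi1, mu1, mu2 = 0.5, -1.0, 1.0        # starting values
for _ in range(50):
    # E-step: posterior probability that each point belongs to component 1
    d1 = pi1 * np.exp(-0.5 * (x - mu1) ** 2)
    d2 = (1 - pi1) * np.exp(-0.5 * (x - mu2) ** 2)
    g = d1 / (d1 + d2)
    # M-step: update the weight and the two means
    pi1 = g.mean()
    mu1 = (g * x).sum() / g.sum()
    mu2 = ((1 - g) * x).sum() / (1 - g).sum()

print(pi1, mu1, mu2)   # close to 0.4, -3, 3
```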
  
The methods and results of cluster analysis (classification, taxonomy, pattern recognition "without a teacher", see [[#References|[2]]], [[#References|[6]]], [[#References|[7]]]) are directed to the solution of the following problem. The geometric structure of the set of elements to be analyzed is given either by the coordinates of the corresponding points (that is, by the matrix $ \| x _ {ij} \| $, $ i = 1 \dots p $, $ j = 1 \dots n $), or by geometric characteristics of their mutual disposition, for example, by the matrix of pairwise distances $ \| \rho _ {ij} \| _ {i,j=1} ^ {n} $. It is required to partition the set of elements being investigated into a comparatively small (known in advance or not) number of classes, so that the elements of one class are at a small distance from each other, while different classes should, as far as possible, be sufficiently far from each other and should not be partitionable into other subsets equally far from each other.
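A basic variant of such a partitioning procedure is the $ k $-means iteration, alternating nearest-centre assignment with centre updates. An illustrative sketch (Python/NumPy on simulated data; not part of the original article, with a deterministic start chosen only to keep the example reproducible):

```python
import numpy as np

rng = np.random.default_rng(7)
# two well-separated groups of points in the plane
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])

centres = X[[0, 50]]                  # start from one point of each half of the data
for _ in range(20):
    # distance from every point to every current centre
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    labels = dists.argmin(axis=1)     # assign each point to its nearest centre
    centres = np.array([X[labels == j].mean(axis=0) for j in range(2)])

print(centres)                        # centres near (0, 0) and (5, 5)
```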
  
The problem of multi-dimensional scaling (see [[#References|[6]]]) is related to the situation when the set of elements being investigated is given via a matrix of mutual distances $ \| \rho _ {ij} \| _ {i,j=1} ^ {n} $, and consists of attributing to each of the elements a given number $ p $ of coordinates so that the structure of the mutual distances between the elements, measured using these auxiliary coordinates, would on the average differ least from that given. It should be noted that the basic results and methods of cluster analysis and multi-dimensional scaling have usually been developed without any assumptions regarding the probabilistic nature of the initial data.
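When the given distances are exactly Euclidean, the classical (Torgerson) scaling solution recovers coordinates from the eigendecomposition of the double-centred matrix of squared distances. An illustrative sketch (Python/NumPy; not part of the original article):

```python
import numpy as np

rng = np.random.default_rng(8)
# hidden 2-dimensional configuration; only the pairwise distances are "observed"
Y = rng.normal(size=(10, 2))
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)   # matrix of mutual distances

# double-centre the squared distances ...
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
# ... and take coordinates from the two leading eigenvectors
vals, vecs = np.linalg.eigh(B)
idx = vals.argsort()[::-1][:2]
Z = vecs[:, idx] * np.sqrt(vals[idx])     # recovered 2-dimensional coordinates

D_rec = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
print(np.abs(D - D_rec).max())            # distances reproduced up to rounding error
```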
  
 
==The merits of multivariate statistical analysis in practice.==
  
 
===The problem of statistical investigation of dependence between the variables being analyzed.===
Suppose that the set of recorded statistical variables $ \mathbf x $ partitions, according to the meaning of these variables and the final aim of the investigation, into a $ q $-dimensional subvector $ \mathbf x ^ {(1)} $ of (dependent) variables to be predicted and a $ (p-q) $-dimensional subvector $ \mathbf x ^ {(2)} $ of predicting (independent) variables. Then it can be said that the problem is to determine, on the basis of a sample (1), a $ q $-dimensional vector-valued function $ f ( \mathbf x ^ {(2)} ) $ from the class of acceptable decisions $ F $ which would give the best, in a specified sense, approximation of the behaviour of the subvector $ \mathbf x ^ {(1)} $. Depending on the concrete form of the functional measuring the quality of the approximation and on the nature of the variables being analyzed, one arrives at some scheme of multiple regression, variance, covariance, or confluence analysis [[#References|[8]]].
  
 
===The problem of classifying elements.===
This problem in a general (non-rigorous) formulation is that the whole set of elements (objects or variables) being analyzed, represented statistically as a matrix $ \| x _ {ij} \| $, $ i = 1 \dots p $, $ j = 1 \dots n $, or a matrix $ \| \rho _ {ij} \| $, $ i, j = 1 \dots n $, partitions into a comparatively small number of homogeneous (in a specified sense) groups [[#References|[7]]]. Depending on the nature of the a priori information and the concrete form of the functional giving the criterion for the quality of the classification, one arrives at some scheme of discriminant analysis, cluster analysis (taxonomy, pattern recognition "without a teacher"), or splitting of mixtures of distributions.
  
 
===The problem of lowering the dimension of the factor space being investigated and the selection of the most informative variables.===
This consists of defining a set of a comparatively small number $ m \ll p $ of variables $ \mathbf z = ( z _ {1} \dots z _ {m} ) ^ \prime $ in the class of admissible transformations $ Z ( \mathbf x ) $ of the initial variables $ \mathbf x = ( x _ {1} \dots x _ {p} ) $ for which some exogenously given measure of informativity for an $ m $-dimensional system of tests attains its least upper bound (see [[#References|[7]]]). Concretization of the functional giving the measure of self-informativity (that is, aimed at a maximal preservation of the information contained in the statistical ensemble (1) relative to the initial attributes themselves) results, in particular, in various schemes of factor analysis and principal components, and in the method of extremal grouping of tests. Functionals giving a measure of external informativity, that is, aimed at extracting from (1) maximum information relative to certain other variables or phenomena not directly contained in $ \mathbf x $, lead to different methods of selecting the most informative variables in schemes of statistical research into dependences and discriminant analysis.
  
 
==Fundamental mathematical tools in multivariate statistical analysis.==
 
==Fundamental mathematical tools in multivariate statistical analysis.==
Line 131: Line 397:
 
====References====
 
====References====
 
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  T.W. Anderson,  "An introduction to multivariate statistical analysis" , Wiley  (1958)</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  M.G. Kendall,  A. Stuart,  "The advanced theory of statistics" , '''3''' , Griffin  (1983)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  L.N. Bol'shev,  ''Bull. Int. Stat. Inst.'' , '''43'''  (1969)  pp. 425–441</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  J. Wishart,  ''Biometrika'' , '''20A'''  (1928)  pp. 32–52</TD></TR><TR><TD valign="top">[5]</TD> <TD valign="top">  H. Hotelling,  "The generalization of student's ratio"  ''Ann. Math. Statist.'' , '''2'''  (1931)  pp. 360–378</TD></TR><TR><TD valign="top">[6]</TD> <TD valign="top">  J.B. Kruskal,  "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis"  ''Psychometrika'' , '''29'''  (1964)  pp. 1–27</TD></TR><TR><TD valign="top">[7]</TD> <TD valign="top">  S.A. Aivazyan,  V.M. Bukhshtaber,  I.S. Yenyukov,  L.D. Meshalkin,  "Applied statistics: classification and reduction of dimensionality" , Moscow  (1989)  (In Russian)</TD></TR><TR><TD valign="top">[8]</TD> <TD valign="top">  S.A. Aivazyan,  I.S. Yenyukov,  L.D. Meshalkin,  "Applied statistics: study of relationships" , Moscow  (1985)  (In Russian)</TD></TR></table>
 
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  T.W. Anderson,  "An introduction to multivariate statistical analysis" , Wiley  (1958)</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  M.G. Kendall,  A. Stuart,  "The advanced theory of statistics" , '''3''' , Griffin  (1983)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  L.N. Bol'shev,  ''Bull. Int. Stat. Inst.'' , '''43'''  (1969)  pp. 425–441</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  J. Wishart,  ''Biometrika'' , '''20A'''  (1928)  pp. 32–52</TD></TR><TR><TD valign="top">[5]</TD> <TD valign="top">  H. Hotelling,  "The generalization of student's ratio"  ''Ann. Math. Statist.'' , '''2'''  (1931)  pp. 360–378</TD></TR><TR><TD valign="top">[6]</TD> <TD valign="top">  J.B. Kruskal,  "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis"  ''Psychometrika'' , '''29'''  (1964)  pp. 1–27</TD></TR><TR><TD valign="top">[7]</TD> <TD valign="top">  S.A. Aivazyan,  V.M. Bukhshtaber,  I.S. Yenyukov,  L.D. Meshalkin,  "Applied statistics: classification and reduction of dimensionality" , Moscow  (1989)  (In Russian)</TD></TR><TR><TD valign="top">[8]</TD> <TD valign="top">  S.A. Aivazyan,  I.S. Yenyukov,  L.D. Meshalkin,  "Applied statistics: study of relationships" , Moscow  (1985)  (In Russian)</TD></TR></table>
 
 
  
 
====Comments====
 
====Comments====
 
  
 
====References====
 
====References====
 
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  R. Gnanadesikan,  "Methods for statistical data analysis of multivariate observations" , Wiley  (1977)</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  M.J. Schervish,  ''Stat. Science'' , '''2'''  (1987)  pp. 396–433</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  R. Farrell,  "Techniques of multivariate calculation" , Springer  (1976)</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  M.L. Eaton,  "Multivariate statistics: A vector space approach" , Wiley  (1983)</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top">  R.J. Muirhead,  "Aspects of multivariate statistical theory" , Wiley  (1982)</TD></TR></table>
 
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  R. Gnanadesikan,  "Methods for statistical data analysis of multivariate observations" , Wiley  (1977)</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  M.J. Schervish,  ''Stat. Science'' , '''2'''  (1987)  pp. 396–433</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  R. Farrell,  "Techniques of multivariate calculation" , Springer  (1976)</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  M.L. Eaton,  "Multivariate statistics: A vector space approach" , Wiley  (1983)</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top">  R.J. Muirhead,  "Aspects of multivariate statistical theory" , Wiley  (1982)</TD></TR></table>

Revision as of 08:01, 6 June 2020


multivariate statistical analysis

The branch of mathematical statistics devoted to mathematical methods for constructing optimal designs for the collection, systematization and processing of multivariate statistical data, directed towards clarifying the nature and structure of the correlations between the components of the multivariate attribute in question, and intended for obtaining scientific and practical inferences. By a multivariate attribute is meant a $ p $-dimensional vector $ \mathbf x = ( x _ {1} \dots x _ {p} ) ^ \prime $ of components (characteristics, variables) $ x _ {1} \dots x _ {p} $, which may be quantitative (measuring in some fixed scale the degree of manifestation of the studied property of an object), ordinal (allowing the objects being analyzed to be ordered according to the degree of manifestation in them of the studied property), or nominal (allowing the collection of objects under investigation, which does not lend itself to ordering, to be separated into classes that are homogeneous with respect to the analyzed property). The results of measuring these components,

$$ \tag{1 } \{ \mathbf x _ {\cdot i } \} _ {1} ^ {n} = \ \{ ( x _ {1i} \dots x _ {p i } ) ^ \prime \} _ {1} ^ {n} $$

for each of $ n $ objects of a collection, forms a sequence of multivariate observations, or an initial ensemble of multivariate data, for conducting a multivariate statistical analysis. A significant part of multivariate statistical analysis involves the situation in which $ \mathbf x $ is interpreted as a multivariate random variable, and the corresponding sequence of observations (1) is a population sample. In this case the choice of a method for processing the initial statistical data and the analysis of their properties is carried out on the basis of assumptions regarding the nature of the multivariate (joint) law of the probability distribution $ {\mathsf P} ( \mathbf x ) $.

The content of multivariate statistical analysis can be conventionally divided into three basic subdivisions: the multivariate statistical analysis of multivariate distributions and their basic characteristics; the multivariate statistical analysis of the nature and structure of the correlations between the components of the multivariate attribute being investigated; and the multivariate statistical analysis of the geometric structure of the set of multi-dimensional observations being investigated.

Multivariate statistical analysis of multivariate distributions and their fundamental characteristics.

This branch covers only situations in which the observations (1) being processed have a probabilistic nature, that is, can be interpreted as a sample from a corresponding population. The basic problems of this branch are: the statistical estimation, for the multivariate distributions in question, of their fundamental numerical characteristics and parameters; the investigation of the properties of the statistical estimators used; and the investigation of the probability distributions of a number of statistics that are used to construct statistical tests for the verification of various hypotheses on the nature of the multi-dimensional data being analyzed. The fundamental results are related to the particular case when the attribute in question, $ \mathbf x $, is subject to a multivariate normal law $ N _ {p} ( \pmb\mu , \mathbf V ) $, with density function $ f ( \mathbf x \mid \pmb\mu , \mathbf V ) $ given by

$$ \tag{2 } f ( \mathbf x \mid \pmb\mu , \mathbf V ) = \ \frac{1}{( 2 \pi ) ^ {p/2} | \mathbf V | ^ {1/2} } \times $$

$$ \times \mathop{\rm exp} \left \{ - \frac{1}{2} ( \mathbf x - \pmb\mu ) ^ \prime \mathbf V ^ {-} 1 ( \mathbf x - \pmb\mu ) \right \} , $$

where $ \pmb\mu = ( \mu _ {1} \dots \mu _ {p} ) ^ \prime $ is the vector of mathematical expectations (cf. Mathematical expectation) of the components of $ \mathbf x $, that is, $ \mu _ {i} = {\mathsf E} x _ {i} $, $ i = 1 \dots p $, and $ \mathbf V = \| v _ {ij} \| _ {i , j = 1 } ^ {p} $ is the covariance matrix of $ \mathbf x $, that is, $ v _ {ij} = {\mathsf E} ( x _ {i} - \mu _ {i} ) ( x _ {j} - \mu _ {j} ) $ is the covariance of the components $ x _ {i} $ and $ x _ {j} $ (the non-degenerate case $ \mathop{\rm rank} \mathbf V = p $ is considered; in the case $ \mathop{\rm rank} \mathbf V = p ^ \prime < p $ all the results remain true, but in a subspace of smaller dimension $ p ^ \prime $ on which the probability distribution of $ \mathbf x $ is concentrated).

Thus, if (1) is a sequence of independent observations, forming a random sample from $ N _ {p} ( \pmb\mu , \mathbf V ) $, then the maximum-likelihood estimators for the parameters $ \pmb\mu $ and $ \mathbf V $ in (2) are, respectively, the statistics (see [1], [2])

$$ \tag{3 } \widehat{\pmb\mu} = \frac{1}{n} \sum _ {i=1} ^ {n} \mathbf x _ {\cdot i } $$

and

$$ \tag{4 } \widehat{\mathbf V} = \frac{1}{n} \sum _ {i=1} ^ {n} ( \mathbf x _ {\cdot i } - \widehat{\pmb\mu} ) ( \mathbf x _ {\cdot i } - \widehat{\pmb\mu} ) ^ \prime , $$

where the random vector $ \widehat{\pmb\mu} $ is subject to the $ p $-dimensional normal law $ N _ {p} ( \pmb\mu , \mathbf V / n ) $ and is statistically independent of $ \widehat{\mathbf V} $, and the joint distribution of the elements of the matrix $ \widehat{\mathbf Q} = n \widehat{\mathbf V} $ is described by the so-called Wishart distribution (see [4]), with density

$$ w ( \widehat{\mathbf Q} \mid \mathbf V ; n ) = \ \frac{| \widehat{\mathbf Q} | ^ {( n - p - 2 ) / 2 } \mathop{\rm exp} \{ - \mathop{\rm tr} ( \mathbf V ^ {-1} \widehat{\mathbf Q} ) / 2 \} }{2 ^ {( n - 1 ) p / 2 } \pi ^ {p ( p - 1 ) / 4 } | \mathbf V | ^ {( n - 1 ) / 2 } \prod _ {j=1} ^ {p} \Gamma ( ( n - j ) / 2 ) } $$

if $ \widehat{\mathbf Q} $ is positive definite, and 0 otherwise.
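The estimators (3)–(5) admit a direct numerical illustration. The following sketch (Python with NumPy; the parameter values and the sample are simulated for illustration and are not part of the article) computes $ \widehat{\pmb\mu} $, $ \widehat{\mathbf V} $ and $ \mathbf S _ {n} $ for a sample from $ N _ {p} ( \pmb\mu , \mathbf V ) $:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 500
mu = np.array([1.0, -2.0, 0.5])                 # hypothetical true mean vector
V = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 0.5]])                 # hypothetical true covariance matrix
X = rng.multivariate_normal(mu, V, size=n)      # rows are the observations x_{.i}

mu_hat = X.mean(axis=0)                         # maximum-likelihood estimator (3)
Xc = X - mu_hat
V_hat = Xc.T @ Xc / n                           # maximum-likelihood estimator (4)
S_n = n / (n - 1) * V_hat                       # "unbiased" sample covariance (5)
```

For large $ n $ both $ \widehat{\mathbf V} $ and $ \mathbf S _ {n} $ are close to $ \mathbf V $, and $ \widehat{\mathbf Q} = n \widehat{\mathbf V} $ is a realization of the Wishart distribution above.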

Within this scheme, the distributions and moments of such sampling characteristics of multivariate random variables as the coefficients of paired, partial and multiple correlation, the generalized variance (i.e., the statistic $ | \widehat{\mathbf V} | $) and the generalized Hotelling $ T ^ {2} $-statistic (cf. Hotelling $ T ^ {2} $-distribution and [5]) have been investigated. In particular (see [1]), if the sample covariance matrix $ \mathbf S _ {n} $ is defined as the estimator $ \widehat{\mathbf V} $ made "unbiased", namely:

$$ \tag{5 } \mathbf S _ {n} = \frac{n}{n - 1 } \widehat{\mathbf V} , $$

then the distribution of $ \sqrt n ( | \mathbf S _ {n} | / | \mathbf V | - 1 ) $ tends to $ N _ {1} ( 0 , 2 p ) $ as $ n \rightarrow \infty $, and the random variables

$$ \tag{6 } \frac{n - p }{p ( n - 1 ) } T ^ {2} = \ \frac{n - p }{p ( n - 1 ) } n ( \widehat{\pmb\mu} - \pmb\mu ) ^ \prime \mathbf S _ {n} ^ {-1} ( \widehat{\pmb\mu} - \pmb\mu ) $$

and

$$ \tag{7 } \frac{n _ {1} + n _ {2} - p - 1 }{( n _ {1} + n _ {2} - 2 ) p } \widetilde{T} {} ^ {2} = \ \frac{n _ {1} + n _ {2} - p - 1 }{( n _ {1} + n _ {2} - 2 ) p } \frac{n _ {1} n _ {2} }{n _ {1} + n _ {2} } ( \widehat{\pmb\mu} _ {n _ {1} } - \widehat{\pmb\mu} _ {n _ {2} } ) ^ \prime \mathbf S _ {n _ {1} + n _ {2} } ^ {-1} ( \widehat{\pmb\mu} _ {n _ {1} } - \widehat{\pmb\mu} _ {n _ {2} } ) $$

have the Fisher $ F $-distribution with degrees of freedom $ ( p , n - p ) $ and $ ( p , n _ {1} + n _ {2} - p - 1 ) $, respectively. In (7), $ n _ {1} $ and $ n _ {2} $ are the sizes of two independent samples of the form (1) taken from the same population $ N _ {p} ( \pmb\mu , \mathbf V ) $, $ \widehat{\pmb\mu} _ {n _ {i} } $ and $ \mathbf S _ {n _ {i} } $ being estimators of the form (3) and (4)–(5) constructed with respect to the $ i $-th sample, and

$$ \mathbf S _ {n _ {1} + n _ {2} } = \ \frac{1}{n _ {1} + n _ {2} - 2 } [ ( n _ {1} - 1 ) \mathbf S _ {n _ {1} } + ( n _ {2} - 1 ) \mathbf S _ {n _ {2} } ] $$

is the common sample covariance matrix constructed with respect to the estimators $ \mathbf S _ {n _ {1} } $ and $ \mathbf S _ {n _ {2} } $.
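The one-sample test based on (6) is easy to carry out numerically. A minimal sketch (Python with NumPy/SciPy; the data are simulated under the hypothesis $ \pmb\mu = \pmb\mu ^ {*} $, so the resulting significance level is what one would expect under the null hypothesis):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, n = 3, 60
mu_star = np.zeros(p)                        # hypothesized mean vector mu*
X = rng.multivariate_normal(mu_star, np.eye(p), size=n)

mu_hat = X.mean(axis=0)
S_n = np.cov(X, rowvar=False)                # sample covariance matrix (5)
d = mu_hat - mu_star
T2 = n * d @ np.linalg.solve(S_n, d)         # Hotelling T^2 with mu = mu*

F = (n - p) / (p * (n - 1)) * T2             # scaling as in (6)
p_value = stats.f.sf(F, p, n - p)            # Fisher F with (p, n-p) degrees of freedom
```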

Multivariate statistical analysis of the nature and structure of correlations between the components of the multivariate attribute in question.

This branch unifies the ideas and results used in such methods and models of multivariate statistical analysis as multiple regression; multivariate dispersion analysis and covariance analysis; factor analysis; the method of principal components; and the analysis of canonical correlations. The results of this branch may be conventionally divided into two basic types.

1) The construction of best (in a specified sense) statistical estimators for the parameters of these models and the analysis of their properties (more precisely, in a probabilistic formulation, of their distribution laws, confidence regions, etc.). Thus, let the multivariate attribute $ \mathbf x $ be interpreted as a vector-valued random variable subject to the $ p $-dimensional normal distribution $ N _ {p} ( \pmb\mu , \mathbf V ) $, and let it be partitioned into two subvectors $ \mathbf x ^ {(1)} $ and $ \mathbf x ^ {(2)} $ of dimensions $ q $ and $ p - q $, respectively. This defines a corresponding partition of the expectation vector $ \pmb\mu $ and of the theoretical and sample covariance matrices $ \mathbf V $ and $ \widehat{\mathbf V} $, namely:

$$ \pmb\mu = \ \left ( \begin{array}{c} \pmb\mu ^ {(1)} \\ \pmb\mu ^ {(2)} \end{array} \right ) ,\ \ \mathbf V = \ \left ( \begin{array}{cc} \mathbf V _ {11} &\mathbf V _ {12} \\ \mathbf V _ {21} &\mathbf V _ {22} \end{array} \right ) ,\ \ \widehat{\mathbf V} = \ \left ( \begin{array}{cc} \widehat{\mathbf V} _ {11} &\widehat{\mathbf V} _ {12} \\ \widehat{\mathbf V} _ {21} &\widehat{\mathbf V} _ {22} \end{array} \right ) . $$

Then (see [1], [2]) the conditional distribution of the subvector $ \mathbf x ^ {(1)} $ (under the condition that the second subvector $ \mathbf x ^ {(2)} $ takes a fixed value) will also be normal, $ N _ {q} ( \pmb\mu ^ {(1)} + \mathbf B ( \mathbf x ^ {(2)} - \pmb\mu ^ {(2)} ) , \pmb\Sigma ) $. The maximum-likelihood estimators $ \widehat{\mathbf B} $ and $ \widehat{\pmb\Sigma} $ of the matrix of regression coefficients $ \mathbf B $ and of the covariance matrix $ \pmb\Sigma $ in this classical multivariate model of multiple regression,

$$ \tag{8 } {\mathsf E} ( \mathbf x ^ {(1)} \mid \mathbf x ^ {(2)} ) = \pmb\mu ^ {(1)} + \mathbf B ( \mathbf x ^ {(2)} - \pmb\mu ^ {(2)} ) , $$

are the mutually independent statistics

$$ \widehat{\mathbf B} = \widehat{\mathbf V} _ {12} \widehat{\mathbf V} {} _ {22} ^ {-1} \ \ \textrm{ and } \ \ \widehat{\pmb\Sigma} = \widehat{\mathbf V} _ {11} - \widehat{\mathbf V} _ {12} \widehat{\mathbf V} {} _ {22} ^ {-1} \widehat{\mathbf V} _ {21} , $$

respectively. The distribution of $ \widehat{\mathbf B} $ is the normal law $ N _ {q ( p - q ) } ( \mathbf B , \mathbf V _ {\mathbf B } ) $, and $ n \widehat{\pmb\Sigma} $ has the Wishart distribution with parameters $ \pmb\Sigma $ and $ n - ( p - q ) $ (the elements of $ \mathbf V _ {\mathbf B } $ are expressed in terms of the elements of $ \mathbf V $). The basic results on the construction of estimators of parameters, and on the investigation of their properties, in the models of factor analysis, principal components and canonical correlations are related to the analysis of the probabilistic-statistical properties of the eigen values and eigen vectors of various sample covariance matrices.

In schemes that do not fall within the limits of the classical normal model, or even of any probabilistic model, the basic results concern the construction of algorithms (and the investigation of their properties) for calculating estimators of the parameters that are best from the point of view of some exogenously given functional of the quality (or adequacy) of the model.

2) The construction of statistical tests for the verification of various hypotheses on the structure of the correlations being investigated. Within the limits of a multivariate normal model (sequences of observations of the form (1) are interpreted as random samples from the corresponding multivariate normal populations) statistical tests have been constructed for testing, for example, the following hypotheses.

I) The hypothesis $ \pmb\mu = \pmb\mu ^ {*} $, i.e. that the expectation of the variables studied equals a specified vector $ \pmb\mu ^ {*} $; this is tested via the Hotelling $ T ^ {2} $-statistic, by substituting $ \pmb\mu = \pmb\mu ^ {*} $ in (6).

II) The hypothesis $ \pmb\mu ^ {(1)} = \pmb\mu ^ {(2)} $ of equality of the expectation vectors in two populations (with identical but unknown covariance matrices), based on two samples; this is tested via the statistic $ \widetilde{T} {} ^ {2} $ (see [7]).

III) The hypothesis $ \pmb\mu ^ {(1)} = \dots = \pmb\mu ^ {(k)} = \pmb\mu $ of equality of the expectation vectors in several populations (with identical but unknown covariance matrices), based on samples from them; this is tested via the statistic

$$ U _ {p , k - 1 , n - k } = \ \frac{\left | \sum _ {j=1} ^ {k} \sum _ {i=1} ^ {n _ {j} } ( \mathbf x _ {. i } ^ {(j)} - \widehat{\pmb\mu} {} ^ {(j)} ) ( \mathbf x _ {. i } ^ {(j)} - \widehat{\pmb\mu} {} ^ {(j)} ) ^ \prime \right | }{\left | \sum _ {j=1} ^ {k} \sum _ {i=1} ^ {n _ {j} } ( \mathbf x _ {. i } ^ {(j)} - \widehat{\pmb\mu} ) ( \mathbf x _ {. i } ^ {(j)} - \widehat{\pmb\mu} ) ^ \prime \right | } , $$

in which $ \mathbf x _ {. i } ^ {(j)} $ is the $ i $-th $ p $-dimensional observation in a sample of size $ n _ {j} $ representing the $ j $-th population, and $ \widehat{\pmb\mu} {} ^ {(j)} $ and $ \widehat{\pmb\mu} $ are estimators of the form (3), constructed with respect to each separate sample and with respect to the joint sample of size $ n = n _ {1} + \dots + n _ {k} $, respectively.

IV) The hypotheses $ \pmb\mu ^ {(1)} = \dots = \pmb\mu ^ {(k)} = \pmb\mu $ and $ \mathbf V _ {1} = \dots = \mathbf V _ {k} = \mathbf V $ of equivalence of several normal populations, based on samples $ \{ \mathbf x _ {. i } ^ {(j)} \} _ {i=1} ^ {n _ {j} } $, $ j = 1 \dots k $, from them; this is tested via the statistic

$$ \lambda = \ \frac{\prod _ {j=1} ^ {k} | n _ {j} \widehat{\mathbf V} _ {j} | ^ {( n _ {j} - 1 ) / 2 } }{\left | \sum _ {j=1} ^ {k} \sum _ {i=1} ^ {n _ {j} } ( \mathbf x _ {. i } ^ {(j)} - \widehat{\pmb\mu} ) ( \mathbf x _ {. i } ^ {(j)} - \widehat{\pmb\mu} ) ^ \prime \right | ^ {( n - k ) / 2 } } , $$

in which the $ \widehat{\mathbf V} _ {j} $ are estimators of the form (4), constructed separately with respect to the observations of the $ j $-th sample, $ j = 1 \dots k $.

V) The hypothesis of mutual independence of the subvectors $ \mathbf x ^ {(1)} \dots \mathbf x ^ {(m)} $ of dimensions $ p _ {1} \dots p _ {m} $, respectively, into which the initial $ p $-dimensional vector $ \mathbf x $ has been partitioned, $ p _ {1} + \dots + p _ {m} = p $; this is tested via the statistic

$$ \pmb\psi = \ \frac{| n \widehat{\mathbf V} | }{\prod _ {i=1} ^ {m} | n _ {i} \widehat{\mathbf V} _ {i} | } , $$

in which $ \widehat{\mathbf V} $ and $ \widehat{\mathbf V} _ {i} $ are sample covariance matrices of the form (4) for the vector $ \mathbf x $ and its subvectors $ \mathbf x ^ {(i)} $, respectively.

Multivariate statistical analysis of the geometric structure of the set of multi-dimensional observations being investigated.

This branch unifies notions and results of such models and schemes as discriminant analysis, mixtures of probability distributions, cluster analysis, taxonomy, and multi-dimensional scaling. The key notion in all these schemes is that of a distance (measure of proximity, measure of similarity) between the elements being analyzed. The objects being analyzed may be real objects, on each of which the values of the components of $ \mathbf x $ are recorded (in the geometrical representation the $ i $-th object is then the point $ \mathbf x _ {. i } = ( x _ {1i} \dots x _ {p i } ) ^ \prime $ in the corresponding $ p $-dimensional space), as well as the variables $ x _ {1} \dots x _ {p} $ themselves (in the geometrical representation the $ l $-th variable is then the point $ \mathbf x _ {l . } = ( x _ {l 1 } \dots x _ {l n } ) $ in the corresponding $ n $-dimensional space).

The methods and results of discriminant analysis (see [1], [2], [7]) are directed to the solution of the following problem. The existence of a specified number $ k \geq 2 $ of populations is known, and a sample from each of them (a "training sample") is available. It is required to construct, on the basis of the training samples, a classifying rule, best in a specified sense, which allows one to assign a new element (an observation $ \mathbf x $) to its population in a situation where the investigator does not know in advance to which population the element belongs.
Usually, a classification rule means a sequence of actions: the calculation of a scalar function of the variables in question, on the basis of which a decision on assigning the element to one of the classes is taken (the construction of a discriminant function); an ordering of the variables themselves according to their degree of informativity from the point of view of a correct assignment of elements to classes; and a calculation of the corresponding probabilities of classification errors.

The problem of analysis of a mixture of probability distributions (see [7]) most often (but not always) also arises in connection with the investigation of the "geometric structure" of some population. Here the notion of the $ r $-th homogeneous class is formalized with the help of a population described by some (as a rule, unimodal) distribution law $ {\mathsf P} ( \mathbf x \mid \pmb\theta _ {r} ) $, so that the distribution of the general population from which the sample (1) is extracted is described by the mixture of distributions

$$ {\mathsf P} ( \mathbf x ) = \ \sum _ {r=1} ^ {k} \pi _ {r} {\mathsf P} ( \mathbf x \mid \pmb\theta _ {r} ) , $$

where $ \pi _ {r} $ is the a priori probability (the specific weight of the elements) of the $ r $- th class in the general population. The problem is to give a "good" statistical estimation (with respect to a sample $ \{ \mathbf x _ {. i } \} _ {1} ^ {n} $) of the unknown parameters $ \pmb\theta _ {r} $, $ \pi _ {r} $, and sometimes even $ k $. This, in particular, allows one to reduce the problem of the classification of the elements to a scheme of discriminant analysis, although in this case training samples are absent.
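When the components are normal, the unknown parameters of such a mixture are commonly estimated by iterative (EM-type) maximization of the likelihood. A minimal one-dimensional sketch for $ k = 2 $ (Python with NumPy/SciPy; the sample and the starting values are chosen purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# sample from a two-component mixture with pi_1 = 0.4, pi_2 = 0.6
x = np.concatenate([rng.normal(-2.0, 1.0, 400), rng.normal(3.0, 1.0, 600)])

pi_ = np.array([0.5, 0.5])            # starting values for the weights pi_r
mean_ = np.array([-1.0, 1.0])         # and for the parameters theta_r = (mean, sd)
sd_ = np.array([1.0, 1.0])
for _ in range(200):
    # E-step: posterior probabilities of class membership for each observation
    dens = pi_ * stats.norm.pdf(x[:, None], mean_, sd_)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted re-estimation of pi_r and theta_r
    nk = resp.sum(axis=0)
    pi_ = nk / len(x)
    mean_ = (resp * x[:, None]).sum(axis=0) / nk
    sd_ = np.sqrt((resp * (x[:, None] - mean_) ** 2).sum(axis=0) / nk)
```

The fitted posterior probabilities then classify the observations, in line with the reduction to a scheme of discriminant analysis mentioned above.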

The methods and results of cluster analysis (classification, taxonomy, pattern recognition "without a teacher"; see [2], [6], [7]) are directed to the solution of the following problem. The geometric structure of the set of elements to be analyzed is given either by the coordinates of the corresponding points (that is, by the matrix $ \| x _ {ij} \| $, $ i = 1 \dots p $, $ j = 1 \dots n $) or by geometric characteristics of their mutual disposition, for example by the matrix of pairwise distances $ \| \rho _ {ij} \| _ {i , j = 1 } ^ {n} $. It is required to partition the set of elements being investigated into a comparatively small (known in advance or not) number of classes so that the elements of one class are at a small distance from each other, while different classes are, as far as possible, sufficiently far from each other and cannot themselves be partitioned into subsets that are equally far from each other.
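A widely used heuristic of this kind alternates two steps: assign each point to the nearest class centre, then move each centre to the mean of its class (the $ k $-means iteration). A minimal sketch (Python with NumPy; three well-separated synthetic groups, with one initial centre taken from each group to keep the sketch deterministic):

```python
import numpy as np

rng = np.random.default_rng(4)
# three well-separated groups of 50 points each in the plane
X = np.vstack([rng.normal(c, 0.3, size=(50, 2))
               for c in ([0.0, 0.0], [4.0, 0.0], [2.0, 4.0])])

k = 3
centers = X[[0, 50, 100]].copy()     # one starting centre per group (for stability)
for _ in range(50):
    # assignment step: nearest centre in squared Euclidean distance
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # update step: each centre moves to the mean of its class
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
```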

The problem of multi-dimensional scaling (see [6]) is related to the situation when the set of elements being investigated is given via a matrix of mutual distances $ \| \rho _ {ij} \| _ {i , j = 1 } ^ {n} $ and consists of attributing to each of the elements a given number ( $ p $) of coordinates so that the structure of the mutual distances between the elements, measured using these auxiliary coordinates, would on the average differ least from that given. It should be noted that the basic results and methods of cluster analysis and multi-dimensional scaling have usually been developed without any assumptions regarding the probabilistic nature of the initial data.
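When the matrix $ \| \rho _ {ij} \| $ consists of exact Euclidean distances, coordinates reproducing it can be recovered by classical (Torgerson) scaling: double-centre the squared distances and take the leading eigen vectors. A minimal sketch (Python with NumPy; the distance matrix is generated from a hypothetical planar configuration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 2
points = rng.normal(size=(n, p))                  # hidden configuration
D = np.linalg.norm(points[:, None] - points[None], axis=2)  # distances rho_ij

J = np.eye(n) - np.ones((n, n)) / n               # centring matrix
B = -0.5 * J @ (D ** 2) @ J                       # double-centred squared distances
eigval, eigvec = np.linalg.eigh(B)
idx = np.argsort(eigval)[::-1][:p]                # p largest eigen values
coords = eigvec[:, idx] * np.sqrt(eigval[idx])    # recovered coordinates
```

The recovered configuration reproduces the given distances exactly (up to rotation and reflection); for non-Euclidean or noisy proximity data one instead minimizes a "stress" criterion, as in [6].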

The merits of multivariate statistical analysis in practice.

These consist mainly in the treatment of the following three problems.

The problem of statistical investigation of dependence between the variables being analyzed.

Suppose that the set of recorded statistical variables $ \mathbf x $ is partitioned, according to the meaning of these variables and the final aim of the investigation, into a $ q $-dimensional subvector $ \mathbf x ^ {(1)} $ of (dependent) variables to be predicted and a $ ( p - q ) $-dimensional subvector $ \mathbf x ^ {(2)} $ of predicting (independent) variables. Then it can be said that the problem is to determine, on the basis of a sample (1), a $ q $-dimensional vector-valued function $ f ( \mathbf x ^ {(2)} ) $ from the class of acceptable decisions $ F $ which gives the best, in a specific sense, approximation of the behaviour of the subvector $ \mathbf x ^ {(1)} $. Depending on the concrete form of the functional measuring the quality of the approximation and on the nature of the variables being analyzed, one arrives at some scheme of multiple regression, variance, covariance, or confluence analysis [8].
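In the normal case this reduces to the model (8), and the predictor can be estimated directly from the blocks of the sample covariance matrix. A minimal sketch (Python with NumPy; one predicted variable, two predictors, simulated data):

```python
import numpy as np

rng = np.random.default_rng(7)
n, q = 400, 1
X2 = rng.normal(size=(n, 2))                          # predicting variables x^(2)
X1 = X2 @ np.array([[2.0], [-1.0]]) + 0.1 * rng.normal(size=(n, 1))  # x^(1)
X = np.hstack([X1, X2])                               # full vector x = (x^(1), x^(2))'

mu_hat = X.mean(axis=0)
V_hat = np.cov(X, rowvar=False)
B_hat = V_hat[:q, q:] @ np.linalg.inv(V_hat[q:, q:])  # estimated regression matrix

def predict(x2):
    """Estimated conditional expectation of x^(1) given x^(2), as in (8)."""
    return mu_hat[:q] + B_hat @ (x2 - mu_hat[q:])
```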

The problem of classifying elements.

This problem in a general (non-rigorous) formulation is that the whole set of elements (objects or variables) being analyzed, represented statistically as a matrix $ \| x _ {ij} \| $, $ i = 1 \dots p $, $ j = 1 \dots n $, or a matrix $ \| \rho _ {ij} \| $, $ i , j = 1 \dots n $, is to be partitioned into a comparatively small number of homogeneous (in a specified sense) groups [7]. Depending on the availability of a priori information and the concrete form of the functional giving the criterion for the quality of the classification, one arrives at some scheme of discriminant analysis, cluster analysis (taxonomy, pattern recognition "without a teacher") or the splitting of mixtures of distributions.

The problem of lowering the dimension of the factor space being investigated and the selection of the most informative variables.

This consists of determining, in the class of admissible transformations $ Z ( \mathbf x ) $ of the initial variables $ \mathbf x = ( x _ {1} \dots x _ {p} ) $, a set of a comparatively small number $ m \ll p $ of variables $ \mathbf z = ( z _ {1} \dots z _ {m} ) ^ \prime $ for which some exogenously given measure of informativity of an $ m $-dimensional system of tests attains its least upper bound (see [7]). Concretization of the functional giving the measure of self-informativity (that is, aimed at a maximal preservation of the information contained in the statistical ensemble (1) about the initial attributes themselves) results, in particular, in various schemes of factor analysis and principal components, and in the method of extremal grouping of tests. Functionals giving a measure of external informativity, that is, aimed at extracting from (1) maximal information about certain other variables or phenomena not directly contained in $ \mathbf x $, lead to various methods of selecting the most informative variables in schemes of statistical research into dependences and in discriminant analysis.
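In the self-informative setting with linear transformations $ Z ( \mathbf x ) $, the principal-component solution takes the $ m $ leading eigen vectors of the sample covariance matrix. A minimal sketch (Python with NumPy; five observed variables driven by two hypothetical latent factors):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, m = 300, 5, 2
F = rng.normal(size=(n, m))                     # latent factors
X = F @ rng.normal(size=(m, p)) + 0.1 * rng.normal(size=(n, p))  # observed data

Xc = X - X.mean(axis=0)
V_hat = Xc.T @ Xc / n                           # sample covariance matrix (4)
eigval, eigvec = np.linalg.eigh(V_hat)
order = np.argsort(eigval)[::-1]
W = eigvec[:, order[:m]]                        # m leading eigen vectors
Z = Xc @ W                                      # principal components z_1, ..., z_m

explained = eigval[order[:m]].sum() / eigval.sum()   # retained share of variance
```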

Fundamental mathematical tools in multivariate statistical analysis.

These consist of special methods of the theory of systems of linear equations and matrices (the method of solution of simple and generalized problems on eigen values and vectors; simple inversion and pseudo-inversion of matrices; a procedure for the diagonalization of matrices; etc.) and certain optimization algorithms (methods of coordinate-wise descent, conjugate gradients, branch-and-bound, various versions of random scanning and stochastic approximation, etc.).

References

[1] T.W. Anderson, "An introduction to multivariate statistical analysis" , Wiley (1958)
[2] M.G. Kendall, A. Stuart, "The advanced theory of statistics" , 3 , Griffin (1983)
[3] L.N. Bol'shev, Bull. Int. Stat. Inst. , 43 (1969) pp. 425–441
[4] J. Wishart, Biometrika , 20A (1928) pp. 32–52
[5] H. Hotelling, "The generalization of student's ratio" Ann. Math. Statist. , 2 (1931) pp. 360–378
[6] J.B. Kruskal, "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis" Psychometrika , 29 (1964) pp. 1–27
[7] S.A. Aivazyan, V.M. Bukhshtaber, I.S. Yenyukov, L.D. Meshalkin, "Applied statistics: classification and reduction of dimensionality" , Moscow (1989) (In Russian)
[8] S.A. Aivazyan, I.S. Yenyukov, L.D. Meshalkin, "Applied statistics: study of relationships" , Moscow (1985) (In Russian)

Comments

References

[a1] R. Gnanadesikan, "Methods for statistical data analysis of multivariate observations" , Wiley (1977)
[a2] M.J. Schervish, Stat. Science , 2 (1987) pp. 396–433
[a3] R. Farrell, "Techniques of multivariate calculation" , Springer (1976)
[a4] M.L. Eaton, "Multivariate statistics: A vector space approach" , Wiley (1983)
[a5] R.J. Muirhead, "Aspects of multivariate statistical theory" , Wiley (1982)
How to Cite This Entry:
Multi-dimensional statistical analysis. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Multi-dimensional_statistical_analysis&oldid=14286
This article was adapted from an original article by S.A. Aivazyan (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article