Linear regression
of one random variable $ \mathbf Y = ( Y ^ {(1)} \dots Y ^ {(m)} ) ^ \prime $ on another $ \mathbf X = ( X ^ {(1)} \dots X ^ {(p)} ) ^ \prime $

An $ m $-dimensional vector form, linear in $ \mathbf x $, supposed to be the conditional mean (given $ \mathbf X = \mathbf x $) of the random vector $ \mathbf Y $. The corresponding equations

$$ \tag{* } y ^ {(k)} ( \mathbf x , \mathbf b ) = {\mathsf E} ( Y ^ {(k)} \mid \mathbf X = \mathbf x ) = \ \sum_{j=0}^ { p } b _ {kj} x ^ {(j)} , $$

$$ x ^ {(0)} \equiv 1 ,\ k = 1 \dots m, $$

are called the linear regression equations of $ \mathbf Y $ on $ \mathbf X $, and the parameters $ b _ {kj} $ are called the regression coefficients (see also Regression); here $ \mathbf X $ is an observable parameter (not necessarily random) on which the mean of the resulting function (the response) $ \mathbf Y ( \mathbf X ) $ under investigation depends.
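
For instance, in the simplest case $ m = 1 $, $ p = 1 $ the equations (*) reduce to the ordinary straight-line regression

$$ y ( x , \mathbf b ) = {\mathsf E} ( Y \mid X = x ) = b _ {0} + b _ {1} x , $$

so that $ b _ {1} $ gives the change of the conditional mean of $ Y $ per unit change of $ x $, and $ b _ {0} $ is its value at $ x = 0 $.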

In addition, the linear regression of $ Y ^ {(k)} $ on $ \mathbf X $ is frequently also understood to be the "best" (in a well-defined sense) linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $, or even the result of the best (in a well-defined sense) smoothing of a system of experimental points ("observations") $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $, $ i = 1 \dots n $, by means of a hyperplane in the space $ ( Y ^ {(k)} , \mathbf X ) $, in situations when the interpretation of these points as samples from a corresponding general population need not be allowable. With such a definition one has to distinguish different versions of linear regression, depending on the choice of the method of computing the errors of the linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $ (or depending on the actual choice of a criterion for the amount of smoothing). The most widespread criteria for the quality of the approximation of $ Y ^ {(k)} $ by means of linear combinations of $ \mathbf X $ (linear smoothing of the points $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $) are:

$$ Q _ {1} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \left ( Y ^ {(k)} ( \mathbf X ) - \sum _ {j=0}^ { p } b _ {kj} X ^ {(j)} \right ) ^ {2} \right \} , $$

$$ \widetilde{Q} _ {1} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \left ( Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) ^ {2} , $$

$$ Q _ {2} ( \mathbf b ) = {\mathsf E} \left \{ \omega ( \mathbf X ) \left | Y ^ {(k)} ( \mathbf X ) - \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right | \right \} , $$

$$ \widetilde{Q} _ {2} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} \left | Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right | , $$

$$ Q _ {3} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \rho ^ {2} \left ( Y ^ {(k)} ( \mathbf X ) , \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right ) \right \} , $$

$$ \widetilde{Q} _ {3} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \cdot \rho ^ {2} \left ( Y _ {i} ^ {(k)} , \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) . $$

In these relations the choice of "weights" $ \omega ( \mathbf X ) $ or $ \omega _ {i} $ depends on the nature of the actual scheme under investigation. For example, if the $ Y ^ {(k)} ( \mathbf X ) $ are interpreted as random variables with known variances $ {\mathsf D} Y ^ {(k)} ( \mathbf X ) $ (or with known estimates of them), then $ \omega ^ {2} ( \mathbf X ) = [ {\mathsf D} Y ^ {(k)} ( \mathbf X ) ] ^ {-1} $. In the last two criteria the "discrepancies" of the approximation or the smoothing are measured by the distances $ \rho ( \cdot , \cdot ) $ from $ Y ^ {(k)} ( \mathbf X ) $ or $ Y _ {i} ^ {(k)} $ to the required hyperplane of regression. If the coefficients $ b _ {kj} $ are determined by minimizing the quantities $ Q _ {1} ( \mathbf b ) $ or $ \widetilde{Q} _ {1} ( \mathbf b ) $, then the linear regression is said to be least squares or $ L _ {2} $; if the criteria $ Q _ {2} ( \mathbf b ) $ and $ \widetilde{Q} _ {2} ( \mathbf b ) $ are used, the linear regression is said to be minimal absolute deviations or $ L _ {1} $; if the criteria $ Q _ {3} ( \mathbf b ) $ and $ \widetilde{Q} _ {3} ( \mathbf b ) $ are used, it is said to be minimum $ \rho $-distance.
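
For illustration, in the least-squares case the minimization of $ \widetilde{Q} _ {1} ( \mathbf b ) $ can be carried out explicitly: equating to zero the partial derivatives with respect to the $ b _ {kl} $ gives, for each fixed $ k $, the system of normal equations

$$ \sum_{i=1}^ { n } \omega _ {i} ^ {2} \left ( Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) X _ {i} ^ {(l)} = 0 , \ \ l = 0 \dots p . $$

Writing (only for this illustration) $ \mathbf X _ {n} $ for the $ n \times ( p + 1 ) $ matrix with rows $ ( X _ {i} ^ {(0)} \dots X _ {i} ^ {(p)} ) $, $ W = \mathop{\rm diag} ( \omega _ {1} ^ {2} \dots \omega _ {n} ^ {2} ) $ and $ \mathbf Y ^ {(k)} = ( Y _ {1} ^ {(k)} \dots Y _ {n} ^ {(k)} ) ^ \prime $, these equations take the form $ \mathbf X _ {n} ^ \prime W \mathbf X _ {n} \mathbf b _ {k} = \mathbf X _ {n} ^ \prime W \mathbf Y ^ {(k)} $, so that $ \mathbf b _ {k} = ( \mathbf X _ {n} ^ \prime W \mathbf X _ {n} ) ^ {-1} \mathbf X _ {n} ^ \prime W \mathbf Y ^ {(k)} $ whenever $ \mathbf X _ {n} ^ \prime W \mathbf X _ {n} $ is non-singular.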

In certain cases, linear regression in the classical sense (*) is the same as linear regression defined by using functionals of the type $ Q _ {i} $. Thus, if the vector $ ( \mathbf X ^ \prime , Y ^ {(k)} ) $ is subject to a multi-dimensional normal law, then the regression of $ Y ^ {(k)} $ on $ \mathbf X $ in the sense of (*) is linear and is the same as least squares or minimum mean squares linear regression (for $ \omega ( \mathbf X ) \equiv 1 $).
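
In that normal case the coefficients in (*) can be written down explicitly. Assuming (for illustration) that the covariance matrix $ \Sigma _ {XX} $ of $ \mathbf X $ is non-singular and writing $ \sigma _ {XY} $ for the vector with components $ \mathop{\rm cov} ( X ^ {(j)} , Y ^ {(k)} ) $, $ j = 1 \dots p $, one has

$$ {\mathsf E} ( Y ^ {(k)} \mid \mathbf X = \mathbf x ) = {\mathsf E} Y ^ {(k)} + \sigma _ {XY} ^ \prime \Sigma _ {XX} ^ {-1} ( \mathbf x - {\mathsf E} \mathbf X ) , $$

so that $ ( b _ {k1} \dots b _ {kp} ) ^ \prime = \Sigma _ {XX} ^ {-1} \sigma _ {XY} $ and $ b _ {k0} = {\mathsf E} Y ^ {(k)} - \sum_{j=1}^ { p } b _ {kj} {\mathsf E} X ^ {(j)} $.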
