Linear regression

of one random variable $ \mathbf Y = ( Y ^ {(1)}, \dots, Y ^ {(m)} ) ^ \prime $ on another $ \mathbf X = ( X ^ {(1)}, \dots, X ^ {(p)} ) ^ \prime $

An $ m $-dimensional vector form, linear in $ \mathbf x $, taken to be the conditional mean (given $ \mathbf X = \mathbf x $) of the random vector $ \mathbf Y $. The corresponding equations

$$ \tag{* } y ^ {(k)} ( \mathbf x , \mathbf b ) = {\mathsf E} ( Y ^ {(k)} \mid \mathbf X = \mathbf x ) = \sum_{j=0}^{p} b _ {kj} x ^ {(j)} , $$

$$ x ^ {(0)} \equiv 1 , \qquad k = 1, \dots, m , $$

are called the linear regression equations of $ \mathbf Y $ on $ \mathbf X $, and the parameters $ b _ {kj} $ are called the regression coefficients (see also Regression); here $ \mathbf X $ is an observable parameter (not necessarily random) on which the mean of the response $ \mathbf Y ( \mathbf X ) $ under investigation depends.
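
For orientation, equations (*) together with $ x ^ {(0)} \equiv 1 $ can be written in matrix form (this compact restatement is not part of the original article; $ B $ and $ \widetilde{\mathbf x} $ are introduced here only as notation):

$$ \mathbf y ( \mathbf x , \mathbf b ) = {\mathsf E} ( \mathbf Y \mid \mathbf X = \mathbf x ) = B \widetilde{\mathbf x} , \qquad \widetilde{\mathbf x} = ( 1 , x ^ {(1)}, \dots, x ^ {(p)} ) ^ \prime , $$

where $ B = ( b _ {kj} ) $, $ k = 1, \dots, m $, $ j = 0, \dots, p $, is the $ m \times ( p + 1 ) $ matrix of regression coefficients.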

In addition, the linear regression of $ Y ^ {(k)} $ on $ \mathbf X $ is frequently also understood to be the "best" (in a well-defined sense) linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $, or even the result of the best (in a well-defined sense) smoothing of a system of experimental points ("observations") $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $, $ i = 1, \dots, n $, by a hyperplane in the space $ ( Y ^ {(k)} , \mathbf X ) $, in situations where the interpretation of these points as samples from a corresponding general population may not be justified. With such a definition one has to distinguish different versions of linear regression, depending on how the errors of the linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $ are measured (or on the actual choice of a criterion for the quality of the smoothing). The most widespread criteria for the quality of the approximation of $ Y ^ {(k)} $ by linear combinations of $ \mathbf X $ (linear smoothing of the points $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $) are:

$$ Q _ {1} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \left ( Y ^ {(k)} ( \mathbf X ) - \sum _ {j=0}^ { p } b _ {kj} X ^ {(j)} \right ) ^ {2} \right \} , $$

$$ \widetilde{Q} _ {1} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \left ( Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) ^ {2} , $$

$$ Q _ {2} ( \mathbf b ) = {\mathsf E} \left \{ \omega ( \mathbf X ) \left | Y ^ {(k)} ( \mathbf X ) - \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right | \right \} , $$

$$ \widetilde{Q} _ {2} ( \mathbf b ) = \sum_{i=1}^{n} \omega _ {i} \left | Y _ {i} ^ {(k)} - \sum_{j=0}^{p} b _ {kj} X _ {i} ^ {(j)} \right | , $$

$$ Q _ {3} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \rho ^ {2} \left ( Y ^ {(k)} ( \mathbf X ) , \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right ) \right \} , $$

$$ \widetilde{Q} _ {3} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \cdot \rho ^ {2} \left ( Y _ {i} ^ {(k)} , \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) . $$

In these relations the choice of "weights" $ \omega ( \mathbf X ) $ or $ \omega _ {i} $ depends on the nature of the actual scheme under investigation. For example, if the $ Y ^ {(k)} ( \mathbf X ) $ are interpreted as random variables with known variances $ {\mathsf D} Y ^ {(k)} ( \mathbf X ) $ (or with known estimates of them), then $ \omega ^ {2} ( \mathbf X ) = [ {\mathsf D} Y ^ {(k)} ( \mathbf X ) ] ^ {-1} $. In the last two criteria the "discrepancies" of the approximation or the smoothing are measured by the distances $ \rho ( \cdot , \cdot ) $ from $ Y ^ {(k)} ( \mathbf X ) $ or $ Y _ {i} ^ {(k)} $ to the required regression hyperplane. If the coefficients $ b _ {kj} $ are determined by minimizing $ Q _ {1} ( \mathbf b ) $ or $ \widetilde{Q} _ {1} ( \mathbf b ) $, the linear regression is called least-squares or $ L _ {2} $ regression; if the criteria $ Q _ {2} ( \mathbf b ) $ and $ \widetilde{Q} _ {2} ( \mathbf b ) $ are used, it is called least-absolute-deviations or $ L _ {1} $ regression; if the criteria $ Q _ {3} ( \mathbf b ) $ and $ \widetilde{Q} _ {3} ( \mathbf b ) $ are used, it is called minimum $ \rho $-distance regression.
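
As an illustration (a minimal sketch, not part of the original article), the sample criteria $ \widetilde{Q} _ {1} $ and $ \widetilde{Q} _ {2} $ can be minimized numerically for one response component and one regressor as follows; the synthetic data, the unit weights $ \omega _ {i} $ and the iteratively reweighted scheme for the $ L _ {1} $ case are illustrative choices, not prescriptions of the article.

```python
# Minimal sketch: minimizing the sample criteria Q~_1 (weighted least squares)
# and Q~_2 (least absolute deviations) for a single response Y^(k) and p = 1.
# The data, weights and the IRLS scheme below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)   # "observations" (Y_i, X_i)

# Design matrix with the constant column x^(0) = 1 from equation (*).
X = np.column_stack([np.ones(n), x])

# Q~_1: weighted least squares.  Scaling each row by omega_i and solving an
# ordinary least-squares problem minimizes sum_i omega_i^2 (Y_i - b.X_i)^2.
omega = np.ones(n)        # e.g. omega_i^2 = 1 / D(Y_i) when the variances are known
b_l2, *_ = np.linalg.lstsq(omega[:, None] * X, omega * y, rcond=None)

# Q~_2: least absolute deviations via iteratively reweighted least squares
# (one of several possible methods; a linear-programming formulation also works).
b_l1 = b_l2.copy()
for _ in range(100):
    r = np.abs(y - X @ b_l1)
    w = 1.0 / np.sqrt(np.maximum(r, 1e-8))   # IRLS weights approximating the L1 loss
    b_l1, *_ = np.linalg.lstsq(w[:, None] * X, w * y, rcond=None)

print("L2 (least-squares) coefficients:            ", b_l2)
print("L1 (least-absolute-deviations) coefficients:", b_l1)
```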

In certain cases, linear regression in the classical sense (*) is the same as linear regression defined by using functionals of the type $ Q _ {i} $. Thus, if the vector $ ( \mathbf X ^ \prime , Y ^ {(k)} ) $ is subject to a multi-dimensional normal law, then the regression of $ Y ^ {(k)} $ on $ \mathbf X $ in the sense of (*) is linear and is the same as least squares or minimum mean squares linear regression (for $ \omega ( \mathbf X ) \equiv 1 $).
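
For completeness (the mean vectors $ \mu $ and covariance blocks $ \Sigma $ below are notation introduced here, not in the article): in the normal case the conditional mean has the standard closed form

$$ {\mathsf E} ( Y ^ {(k)} \mid \mathbf X = \mathbf x ) = \mu _ {Y ^ {(k)}} + \Sigma _ {Y ^ {(k)} \mathbf X } \Sigma _ {\mathbf X \mathbf X } ^ {-1} ( \mathbf x - \mu _ {\mathbf X } ) , $$

which is linear in $ \mathbf x $ and coincides with the minimizer of $ Q _ {1} ( \mathbf b ) $ when $ \omega ( \mathbf X ) \equiv 1 $.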

References

[1] Yu.V. Linnik, "Methode der kleinsten Quadrate in moderner Darstellung", Deutsch. Verlag Wissenschaft. (1961) (Translated from Russian)
[2] H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)
[3] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 2: Inference and relationship, Macmillan (1979)
[4] C.R. Rao, "Linear statistical inference and its applications", Wiley (1965)
This article was adapted from an original article by S.A. Aivazyan (originator), which appeared in Encyclopedia of Mathematics, ISBN 1402006098.