M-estimator

From Encyclopedia of Mathematics
A generalization of the maximum-likelihood estimator (MLE) in [[Mathematical statistics|mathematical statistics]] (cf. also [[Maximum-likelihood method|Maximum-likelihood method]]; [[Statistical estimator|Statistical estimator]]). Suppose one has univariate observations $x _ { 1 } , \ldots , x _ { n }$ which are independent and identically distributed according to a distribution $F _ { \theta }$ with univariate parameter $\theta$. Denote by $f _ { \theta } ( x )$ the likelihood of $F _ { \theta }$. The maximum-likelihood estimator is defined as the value $T _ { n } = T _ { n } ( x _ { 1 } , \ldots , x _ { n } )$ which maximizes $\prod _ { i = 1 } ^ { n } f _ { T _ { n } } ( x _ { i } )$. If $f _ { \theta } ( x ) > 0$ for all $x$ and $\theta$, then this is equivalent to minimizing $\sum _ { i = 1 } ^ { n } [ - \operatorname { ln } f _ { T _ { n } } ( x _ { i } ) ]$. P.J. Huber [[#References|[a1]]] has generalized this to M-estimators, which are defined by minimizing $\sum _ { i = 1 } ^ { n } \rho ( x _ { i } , T _ { n } )$, where $\rho$ is an arbitrary real function. When $\rho$ has a partial derivative $\Psi ( x , \theta ) = ( \partial / \partial \theta ) \rho ( x , \theta )$, then $T _ { n }$ satisfies the implicit equation
  
\begin{equation*} \sum _ { i = 1 } ^ { n } \Psi ( x _ { i } , T _ { n } ) = 0. \end{equation*}
  
Note that the maximum-likelihood estimator is an M-estimator, obtained by putting $\rho ( x , \theta ) = - \operatorname { ln } f _ { \theta } ( x )$.
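For instance, at the Gaussian location model $N ( \theta , 1 )$ one may take $\rho ( x , \theta ) = \frac { 1 } { 2 } ( x - \theta ) ^ { 2 } + \frac { 1 } { 2 } \operatorname { ln } ( 2 \pi )$, and minimizing $\sum _ { i = 1 } ^ { n } \rho ( x _ { i } , T _ { n } )$ recovers the sample mean. An illustrative numerical sketch (standard-library Python; the function names are ad hoc, not from any package):

```python
import math
import random

def rho_gauss(x, theta):
    # rho(x, theta) = -ln f_theta(x) for the N(theta, 1) density.
    return 0.5 * (x - theta) ** 2 + 0.5 * math.log(2 * math.pi)

def m_estimate(xs, rho, lo=-100.0, hi=100.0, iters=200):
    # Minimize sum_i rho(x_i, theta) over theta by ternary search
    # (valid here because the objective is convex in theta).
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if sum(rho(x, m1) for x in xs) < sum(rho(x, m2) for x in xs):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

random.seed(0)
data = [random.gauss(5.0, 1.0) for _ in range(200)]
t = m_estimate(data, rho_gauss)
# For this rho, the M-estimator coincides with the MLE, i.e. the sample mean.
print(t)
```

Ternary search suffices because this particular objective is convex in $\theta$; in general one solves the implicit equation above.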
  
The maximum-likelihood estimator can give arbitrarily bad results when the underlying assumptions (e.g., the form of the distribution generating the data) are not satisfied (e.g., because the data contain some outliers, cf. also [[Outlier|Outlier]]). M-estimators are particularly useful in [[Robust statistics|robust statistics]], which aims to construct methods that are relatively insensitive to deviations from the standard assumptions. M-estimators with bounded $\Psi$ are typically robust.
  
Apart from the finite-sample version $T _ { n } ( x _ { 1 } , \ldots , x _ { n } )$ of the M-estimator, there is also a functional version $T ( G )$ defined for any [[Probability distribution|probability distribution]] $G$ by
  
\begin{equation*} \int \Psi ( x , T ( G ) ) d G ( x ) = 0. \end{equation*}
  
Here, it is assumed that $T$ is Fisher-consistent, i.e. that $T ( F _ { \theta } ) = \theta$ for all $\theta$. The influence function of a functional $T$ in $G$ is defined, as in [[#References|[a2]]], by
  
\begin{equation*} \operatorname { IF } ( x ; T , G ) = \frac { \partial } { \partial \varepsilon } [ T ( ( 1 - \varepsilon ) G + \varepsilon \Delta _ { x } ) ]_{\varepsilon = 0 +}, \end{equation*}
  
where $\Delta _ { x }$ is the probability distribution which puts all its mass in the point $x$. Therefore $\operatorname{IF} ( x ; T , G )$ describes the effect of a single outlier in $x$ on the estimator $T$. For an M-estimator $T$ at $F _ { \theta }$,
  
\begin{equation*} \operatorname { IF } ( x ; T , F _ { \theta } ) = \frac { \Psi ( x , \theta ) } { \int \frac { \partial } { \partial \theta } \Psi ( y , \theta ) d F _ { \theta } ( y ) }. \end{equation*}
  
The influence function of an M-estimator is thus proportional to $\Psi ( x , \theta )$ itself. Under suitable conditions, [[#References|[a3]]], M-estimators are asymptotically normal with asymptotic variance $V ( T , F _ { \theta } ) = \int \operatorname { IF } ( x ; T , F _ { \theta } ) ^ { 2 } d F _ { \theta } ( x )$.
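The derivative in the definition of the influence function can be checked by a finite difference in $\varepsilon$. An illustrative sketch for the mean functional on a discrete distribution (ad-hoc standard-library Python, not from any package); for the mean one gets $\operatorname { IF } ( x ; T , G ) = x - T ( G )$, which is unbounded in $x$ and so reveals the non-robustness of the mean:

```python
def mean_functional(points, weights):
    # T(G) = sum of y * G({y}) for a discrete distribution G.
    return sum(w * p for p, w in zip(points, weights)) / sum(weights)

def influence(x, functional, points, eps=1e-6):
    # Finite-difference version of the definition:
    # IF(x; T, G) ~ [T((1 - eps) G + eps Delta_x) - T(G)] / eps.
    n = len(points)
    base = functional(points, [1.0 / n] * n)
    mixed = functional(points + [x], [(1.0 - eps) / n] * n + [eps])
    return (mixed - base) / eps

pts = [1.0, 2.0, 3.0, 4.0]
# For the mean, IF(x; T, G) = x - T(G); here T(G) = 2.5, so IF(10) = 7.5.
print(influence(10.0, mean_functional, pts))
```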
  
Optimal robust M-estimators can be obtained by solving Huber's minimax variance problem [[#References|[a1]]] or by minimizing the asymptotic variance $V ( T , F _ { \theta } )$ subject to an upper bound on the gross-error sensitivity $\gamma ^ { * } = \operatorname { sup } _ { x } | \operatorname { IF } ( x ; T , F _ { \theta } ) |$ as in [[#References|[a2]]].
  
When estimating a univariate location, it is natural to use $\Psi$-functions of the type $\Psi ( x , \theta ) = \psi ( x - \theta )$. The optimal robust M-estimator for univariate location at the Gaussian location model $F _ { \theta } ( x ) = \Phi ( x - \theta )$ (cf. also [[Gauss law|Gauss law]]) is given by $\psi _ { b } ( x ) = [ x ] _ { - b } ^ { b } = \operatorname { min } ( b , \operatorname { max } ( - b , x ) )$. This $\psi_b$ has come to be known as Huber's function. Note that when $b \downarrow 0$, this M-estimator tends to the median (cf. also [[Median (in statistics)|Median (in statistics)]]), and when $b \uparrow \infty$ it tends to the mean (cf. also [[Average|Average]]).
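A sketch of computing this location M-estimator by solving $\sum _ { i = 1 } ^ { n } \psi _ { b } ( x _ { i } - T _ { n } ) = 0$ with bisection (illustrative standard-library Python; the scale is taken as known here):

```python
import random

def psi_huber(x, b):
    # Huber's function: psi_b(x) = min(b, max(-b, x)).
    return min(b, max(-b, x))

def location_m_estimate(xs, b, iters=100):
    # Solve sum_i psi_b(x_i - T) = 0 by bisection: the left-hand side is
    # nonincreasing in T and changes sign on [min(xs), max(xs)].
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(psi_huber(x - mid, b) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(99)] + [50.0]  # one gross outlier
s = sorted(data)
n = len(data)
mid_med = (s[n // 2 - 1] + s[n // 2]) / 2  # sample median
# Small b approaches the median; very large b gives the outlier-sensitive mean.
t_small = location_m_estimate(data, 0.001)
t_large = location_m_estimate(data, 1e6)
print(t_small, t_large)
```

With $b$ small the estimate stays near the median (about $0$ here), while with $b$ huge it equals the mean, which the single outlier shifts by roughly $0.5$.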
  
The breakdown value $\varepsilon ^ { * } ( T )$ of an estimator $T$ is the largest fraction of arbitrary outliers it can tolerate without becoming unbounded (see [[#References|[a2]]]). Any M-estimator with a monotone and bounded $\psi$ function has breakdown value $\varepsilon ^ { * } ( T ) = 1 / 2$, the highest possible value.
  
Location M-estimators are not invariant with respect to scale. Therefore it is recommended to compute $T _ { n }$ from
  
\begin{equation} \tag{a1} \sum _ { i = 1 } ^ { n } \psi \Bigl( \frac { x _ { i } - T _ { n } } { S _ { n } }\Bigr ) = 0, \end{equation}
  
where $S _ { n }$ is a robust estimator of scale, e.g. the median absolute deviation
  
\begin{equation*} \operatorname { MAD } _ { i = 1 } ^ { n } ( x _ { i } ) = \operatorname { med } _ { i = 1 } ^ { n } \left| x _ { i } - \operatorname { med } _ { j = 1 } ^ { n } ( x _ { j } ) \right|, \end{equation*}
  
which has $\varepsilon ^ { * } ( \operatorname{MAD} ) = 1 / 2$.
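A direct computation of the median absolute deviation (illustrative standard-library Python; a consistency factor, about $1.4826$ at the Gaussian model, is often included but omitted here):

```python
def med(values):
    # Sample median (average of the two middle order statistics for even n).
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def mad(xs):
    # Median absolute deviation: med_i | x_i - med_j x_j |.
    m = med(xs)
    return med([abs(x - m) for x in xs])

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
print(mad(clean))             # med = 3, deviations {2, 1, 0, 1, 2} -> 1.0
print(mad(clean + [1000.0]))  # one huge outlier barely moves the estimate
```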
  
For univariate scale estimation one uses $\Psi$-functions of the type $\Psi ( x , \sigma ) = \chi ( x / \sigma )$. At the Gaussian scale model $F _ { \sigma } ( x ) = \Phi ( x / \sigma )$, the optimal robust M-estimators are given by $\widetilde { \chi } ( x ) = [ x ^ { 2 } - 1 - a ] _ { - b } ^ { b }$. For $b \downarrow 0$ one obtains the median absolute deviation and for $b \uparrow \infty$ the standard deviation. In the general case, where both location and scale are unknown, one first computes $\hat { \sigma } = S _ { n } = \operatorname {MAD} _ { i = 1 } ^ { n } ( x _ { i } )$ and then plugs it into (a1) for finding $\hat { \theta } = T _ { n }$.
  
For multivariate location and scatter matrices, M-estimators were defined by R.A. Maronna [[#References|[a4]]], who also gave their influence function and asymptotic covariance matrix. For $p$-dimensional data, the breakdown value of M-estimators is at most $1 / p$.
  
For [[Regression analysis|regression analysis]], one considers the linear model $y = \overset{\rightharpoonup}{ x } ^ { t } \overset{\rightharpoonup}{ \theta } + e$ where $\overset{\rightharpoonup}{ x }$ and $\overset{\rightharpoonup} { \theta }$ are column vectors, and $\overset{\rightharpoonup}{ x }$ and the error term $e$ are independent. Let $e$ have a distribution with location zero and scale $\sigma$. For simplicity, put $\sigma = 1$. Denote by $H _ { \overset{\rightharpoonup}{ \theta } }$ the joint distribution of $( \overset{\rightharpoonup} { x } , y )$, which implies the distribution of the error term $e = y - \overset{\rightharpoonup} { x } ^ { t } \overset{\rightharpoonup} { \theta }$. Based on a data set $\{ ( \overset{\rightharpoonup} { x } _ { 1 } , y _ { 1 } ) , \dots , ( \overset{\rightharpoonup}{x} _ { n } , y _ { n } ) \}$, M-estimators $T _ { n }$ for regression [[#References|[a3]]] are defined by
  
\begin{equation*} \sum _ { i = 1 } ^ { n } \psi ( r _ { i } ) \overset{\rightharpoonup} { x } _ { i } = \overset{\rightharpoonup} { 0 }, \end{equation*}
  
where $r _ { i } = y _ { i } - \overset{\rightharpoonup} { x } _ { i } ^ { t } T _ { n }$ are the residuals. If the Huber function $\psi_b$ is used, the influence function of $T$ at $H _ { \overset{\rightharpoonup}{ \theta } }$ equals
\begin{equation} \tag{a2} \operatorname { IF } ( ( \overset{\rightharpoonup} { x } _ { 0 } , y _ { 0 } ) ; T , H _ { \overset{\rightharpoonup}{ \theta } } ) = \psi _ { b } ( e _ { 0 } ) \left( \int \psi _ { b } ^ { \prime } ( e ) \overset{\rightharpoonup} { x } \overset{\rightharpoonup} { x } {} ^ { t } d H _ { \overset{\rightharpoonup}{ \theta } } ( \overset{\rightharpoonup} { x } , y ) \right) ^ { - 1 } \overset{\rightharpoonup} { x } _ { 0 }, \end{equation}
where $e _ { 0 } = y _ { 0 } - \overset{\rightharpoonup} { x } _ { 0 } ^ { t} \overset{\rightharpoonup} { \theta }$. The first factor of (a2) is the influence of the vertical error $e_0$. It is bounded, which makes this estimator more robust than least squares (cf. also [[Least squares, method of|Least squares, method of]]). The second factor is the influence of the position $\overset{\rightharpoonup} { x }_{0}$. Unfortunately, this factor is unbounded, hence a single outlying $\overset{\rightharpoonup} { x } _ { j }$ (i.e., a horizontal outlier) will almost completely determine the fit, as shown in [[#References|[a2]]]. Therefore the breakdown value $\varepsilon ^ { * } ( T ) = 0$.
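Regression M-estimates with a monotone $\psi$ are commonly computed by iteratively reweighted least squares, rewriting $\psi ( r ) = w ( r ) r$. A sketch for a straight-line fit with Huber's $\psi_b$ (illustrative standard-library Python; the residual scale is taken as $1$ here, whereas in practice one divides the residuals by a robust scale estimate):

```python
def huber_weight(r, b=1.345):
    # w(r) = psi_b(r) / r for r != 0, and w(0) = 1.
    return 1.0 if abs(r) <= b else b / abs(r)

def wls_line(xs, ys, w):
    # Weighted least-squares fit of y = t0 + t1 * x (2x2 normal equations).
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx * swx
    return (swy * swxx - swx * swxy) / det, (sw * swxy - swx * swy) / det

def irls_line(xs, ys, b=1.345, iters=50):
    # Each step solves a weighted least-squares problem; at the fixed
    # point, sum_i psi_b(r_i) (1, x_i)^t = 0.
    t0, t1 = 0.0, 0.0
    for _ in range(iters):
        w = [huber_weight(y - t0 - t1 * x, b) for x, y in zip(xs, ys)]
        t0, t1 = wls_line(xs, ys, w)
    return t0, t1

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.0, 2.1, 2.9, 4.0, 20.0]  # last response is a vertical outlier
ols0, ols1 = wls_line(xs, ys, [1.0] * 6)  # ordinary least squares
hub0, hub1 = irls_line(xs, ys)
print(ols1, hub1)
```

The Huber slope stays much closer to the uncontaminated slope (about $1$) than the least-squares slope does, illustrating the bounded influence of the vertical error; leverage points in $\overset{\rightharpoonup} { x }$ would still spoil the fit, as discussed above.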
  
 
To obtain a bounded influence function, generalized M-estimators [[#References|[a2]]] are defined by
  
\begin{equation*} \sum _ { i = 1 } ^ { n } \eta ( \overset{\rightharpoonup} { x } _ { i } , r _ { i } ) \overset{\rightharpoonup}{ x } _ { i } = \overset{\rightharpoonup}{ 0 }, \end{equation*}
  
for some real function $ \eta $. The influence function of $T$ at $H _ { \overset{\rightharpoonup}{ \theta } }$ now becomes
  
\begin{equation} \tag{a3} \operatorname { IF } ( ( \overset{\rightharpoonup} { x } _ { 0 } , y _ { 0 } ) ; T , H _ { \overset{\rightharpoonup}{ \theta } } ) = \eta ( \overset{\rightharpoonup} { x } _ { 0 } , e _ { 0 } ) M ^ { - 1 } \overset{\rightharpoonup} { x } _ { 0 }, \end{equation}
  
where $e _ { 0 } = y _ { 0 } - \overset{\rightharpoonup} { x } _ { 0 } ^ { t} \overset{\rightharpoonup} { \theta }$ and $M = \int ( \partial / \partial e ) \eta ( \overset{\rightharpoonup}  { x } , e ) \overset{\rightharpoonup}  { x } \overset{\rightharpoonup} {x } ^ { t } d H _ { \overset{\rightharpoonup} { \theta } } ( \overset{\rightharpoonup}  { x } , y )$. For an appropriate choice of the function $ \eta $, the influence function (a3) is bounded, but still the breakdown value $\varepsilon ^ { * } ( T )$ goes down to zero when the number of parameters $p$ increases.
  
To repair this, P.J. Rousseeuw and V.J. Yohai [[#References|[a5]]] have introduced S-estimators. An S-estimator $T _ { n }$ minimizes $s ( r _ { 1 } , \dots , r _ { n } )$, where $r _ { i } = y _ { i } - \overset{\rightharpoonup} { x } _ { i } ^ { t } T _ { n }$ are the residuals and $s ( r _ { 1 } , \dots , r _ { n } )$ is the robust scale estimator defined as the solution of
  
\begin{equation*} \frac { 1 } { n } \sum _ { i = 1 } ^ { n } \rho \left( \frac { r_i } { s } \right) = K, \end{equation*}
  
where $K$ is taken to be $\int \rho ( u ) d \Phi ( u )$. The function $\rho$ must satisfy $\rho ( - u ) = \rho ( u )$ and $\rho ( 0 ) = 0$ and be continuously differentiable, and there must be a constant $c &gt; 0$ such that $\rho$ is strictly increasing on $[ 0 , c ]$ and constant on $[ c , \infty )$. Any S-estimator has breakdown value $\varepsilon ^ { * } ( T ) = 1 / 2$ in all dimensions, and it is asymptotically normal with the same asymptotic covariance as the M-estimator with that function $\rho$. The S-estimators have also been generalized to multivariate location and scatter matrices, in [[#References|[a6]]], and they enjoy the same properties.
  
 
====References====
 
====References====
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  P.J. Huber,  "Robust estimation of a location parameter"  ''Ann. Math. Stat.'' , '''35'''  (1964)  pp. 73–101</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  F.R. Hampel,  E.M. Ronchetti,  P.J. Rousseeuw,  W.A. Stahel,  "Robust statistics: The approach based on influence functions" , Wiley  (1986)</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  P.J. Huber,  "Robust statistics" , Wiley  (1981)</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  R.A. Maronna,  "Robust M-estimators of multivariate location and scatter"  ''Ann. Statist.'' , '''4'''  (1976)  pp. 51–67</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top">  P.J. Rousseeuw,  V.J. Yohai,  "Robust regression by means of S-estimators"  J. Franke (ed.)  W. Härdle (ed.)  R.D. Martin (ed.) , ''Robust and Nonlinear Time Ser. Analysis'' , ''Lecture Notes Statistics'' , '''26''' , Springer  (1984)  pp. 256–272</TD></TR><TR><TD valign="top">[a6]</TD> <TD valign="top">  P.J. Rousseeuw,  A. Leroy,  "Robust regression and outlier detection" , Wiley  (1987)</TD></TR></table>
+
<table><tr><td valign="top">[a1]</td> <td valign="top">  P.J. Huber,  "Robust estimation of a location parameter"  ''Ann. Math. Stat.'' , '''35'''  (1964)  pp. 73–101</td></tr><tr><td valign="top">[a2]</td> <td valign="top">  F.R. Hampel,  E.M. Ronchetti,  P.J. Rousseeuw,  W.A. Stahel,  "Robust statistics: The approach based on influence functions" , Wiley  (1986)</td></tr><tr><td valign="top">[a3]</td> <td valign="top">  P.J. Huber,  "Robust statistics" , Wiley  (1981)</td></tr><tr><td valign="top">[a4]</td> <td valign="top">  R.A. Maronna,  "Robust M-estimators of multivariate location and scatter"  ''Ann. Statist.'' , '''4'''  (1976)  pp. 51–67</td></tr><tr><td valign="top">[a5]</td> <td valign="top">  P.J. Rousseeuw,  V.J. Yohai,  "Robust regression by means of S-estimators"  J. Franke (ed.)  W. Härdle (ed.)  R.D. Martin (ed.) , ''Robust and Nonlinear Time Ser. Analysis'' , ''Lecture Notes Statistics'' , '''26''' , Springer  (1984)  pp. 256–272</td></tr><tr><td valign="top">[a6]</td> <td valign="top">  P.J. Rousseeuw,  A. Leroy,  "Robust regression and outlier detection" , Wiley  (1987)</td></tr></table>

Revision as of 16:52, 1 July 2020

A generalization of the maximum-likelihood estimator (MLE) in mathematical statistics (cf. also Maximum-likelihood method; Statistical estimator). Suppose one has univariate observations $x _ { 1 } , \ldots , x _ { n }$ which are independent and identically distributed according to a distribution $F _ { \theta }$ with univariate parameter $\theta$. Denote by $f _ { \theta } ( x )$ the likelihood of $F _ { \theta }$. The maximum-likelihood estimator is defined as the value $T _ { n } = T _ { n } ( x _ { 1 } , \ldots , x _ { n } )$ which maximizes $\prod _ { i = 1 } ^ { n } f _ { T _ { n } } ( x _ { i } )$. If $f _ { \theta } ( x ) > 0$ for all $x$ and $\theta$, then this is equivalent to minimizing $\sum _ { i = 1 } ^ { n } [ - \operatorname { ln } f _ { T _ { n } } ( x _ { i } ) ]$. P.J. Huber [a1] has generalized this to M-estimators, which are defined by minimizing $\sum _ { i = 1 } ^ { n } \rho ( x _ { i } , T _ { n } )$, where $\rho$ is an arbitrary real function. When $\rho$ has a partial derivative $\Psi ( x , \theta ) = ( \partial / \partial \theta ) \rho ( x , \theta )$, then $T _ { n }$ satisfies the implicit equation

\begin{equation*} \sum _ { i = 1 } ^ { n } \Psi ( x _ { i } , T _ { n } ) = 0. \end{equation*}

Note that the maximum-likelihood estimator is an M-estimator, obtained by putting $\rho ( x , \theta ) = - \operatorname { ln } f _ { \theta } ( x )$.
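This equivalence is easy to verify numerically. The sketch below (an illustration with our own helper names, assuming the Gaussian location model $f _ { \theta } ( x ) = \varphi ( x - \theta )$) minimizes $\sum _ { i } [ - \operatorname { ln } f _ { T _ { n } } ( x _ { i } ) ]$ directly and recovers the sample mean, which is the Gaussian MLE:

```python
import math

# For f_theta(x) = phi(x - theta), rho(x, theta) = -ln f_theta(x) equals
# (x - theta)^2 / 2 up to a constant, so the M-estimator minimizing
# sum_i rho(x_i, T_n) is the sample mean.

def neg_log_likelihood(theta, xs):
    # sum_i -ln f_theta(x_i), with the constant ln sqrt(2 pi) dropped
    return sum(0.5 * (x - theta) ** 2 for x in xs)

def golden_section_min(f, lo, hi, tol=1e-10):
    # minimize a unimodal function f on [lo, hi]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)

xs = [1.2, 0.8, 1.5, 0.9, 1.1]
t_n = golden_section_min(lambda th: neg_log_likelihood(th, xs), -10.0, 10.0)
assert abs(t_n - sum(xs) / len(xs)) < 1e-6  # MLE coincides with the mean
```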

The maximum-likelihood estimator can give arbitrarily bad results when the underlying assumptions (e.g., the form of the distribution generating the data) are not satisfied (e.g., because the data contain some outliers, cf. also Outlier). M-estimators are particularly useful in robust statistics, which aims to construct methods that are relatively insensitive to deviations from the standard assumptions. M-estimators with bounded $\Psi$ are typically robust.

Apart from the finite-sample version $T _ { n } ( x _ { 1 } , \ldots , x _ { n } )$ of the M-estimator, there is also a functional version $T ( G )$ defined for any probability distribution $G$ by

\begin{equation*} \int \Psi ( x , T ( G ) ) d G ( x ) = 0. \end{equation*}

Here, it is assumed that $T$ is Fisher-consistent, i.e. that $T ( F _ { \theta } ) = \theta$ for all $\theta$. The influence function of a functional $T$ in $G$ is defined, as in [a2], by

\begin{equation*} \operatorname { IF } ( x ; T , G ) = \frac { \partial } { \partial \varepsilon } [ T ( ( 1 - \varepsilon ) G + \varepsilon \Delta _ { x } ) ]_{\varepsilon = 0 +}, \end{equation*}

where $\Delta _ { x }$ is the probability distribution which puts all its mass in the point $x$. Therefore $\operatorname{IF} ( x ; T , G )$ describes the effect of a single outlier in $x$ on the estimator $T$. For an M-estimator $T$ at $F _ { \theta }$,

\begin{equation*} \operatorname { IF } ( x ; T , F _ { \theta } ) = \frac { \Psi ( x , \theta ) } { \int \frac { \partial } { \partial \theta } \Psi ( y , \theta ) d F _ { \theta } ( y ) }. \end{equation*}

The influence function of an M-estimator is thus proportional to $\Psi ( x , \theta )$ itself. Under suitable conditions (see [a3]), M-estimators are asymptotically normal with asymptotic variance $V ( T , F _ { \theta } ) = \int \operatorname { IF } ( x ; T , F _ { \theta } ) ^ { 2 } d F _ { \theta } ( x )$.
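For instance, at the Gaussian location model $F _ { \theta } ( x ) = \Phi ( x - \theta )$ with $\Psi ( x , \theta ) = x - \theta$ (which yields the sample mean), one has $\int ( \partial / \partial \theta ) \Psi ( y , \theta ) d F _ { \theta } ( y ) = - 1$, so that

\begin{equation*} \operatorname { IF } ( x ; T , F _ { \theta } ) = - ( x - \theta ) \quad \text { and } \quad V ( T , F _ { \theta } ) = \int ( x - \theta ) ^ { 2 } d F _ { \theta } ( x ) = 1, \end{equation*}

the classical variance of the sample mean; since this influence function is unbounded, the mean is not robust.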

Optimal robust M-estimators can be obtained by solving Huber's minimax variance problem [a1] or by minimizing the asymptotic variance $V ( T , F _ { \theta } )$ subject to an upper bound on the gross-error sensitivity $\gamma ^ { * } = \operatorname { sup } _ { x } | \operatorname { IF } ( x ; T , F _ { \theta } ) |$ as in [a2].

When estimating a univariate location, it is natural to use $\Psi$-functions of the type $\Psi ( x , \theta ) = \psi ( x - \theta )$. The optimal robust M-estimator for univariate location at the Gaussian location model $F _ { \theta } ( x ) = \Phi ( x - \theta )$ (cf. also Gauss law) is given by $\psi _ { b } ( x ) = [ x ] _ { - b } ^ { b } = \operatorname { min } ( b , \operatorname { max } ( - b , x ) )$. This $\psi_b$ has come to be known as Huber's function. Note that when $b \downarrow 0$, this M-estimator tends to the median (cf. also Median (in statistics)), and when $b \uparrow \infty$ it tends to the mean (cf. also Average).
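Numerically, this location M-estimate can be found by solving $\sum _ { i } \psi _ { b } ( x _ { i } - T _ { n } ) = 0$, since the left-hand side is monotone in $T _ { n }$. A minimal Python sketch (our own illustration, not from the original article) that also exhibits the two limiting cases:

```python
def huber_psi(x, b):
    # Huber's function psi_b(x) = min(b, max(-b, x))
    return min(b, max(-b, x))

def m_location(xs, b, tol=1e-9):
    # Solve sum_i psi_b(x_i - T) = 0 by bisection: the left-hand side
    # is nonincreasing in T, so the root is bracketed by the data range.
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(huber_psi(x - mid, b) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

xs = [0.0, 1.0, 2.0, 3.0, 50.0]                    # one gross outlier
assert abs(m_location(xs, b=1e-4) - 2.0) < 1e-6    # b -> 0: the median
assert abs(m_location(xs, b=1e6) - 11.2) < 1e-6    # b -> infinity: the mean
```

With a moderate tuning constant (a common choice is $b = 1.345$) the estimate stays close to the bulk of the data, unlike the mean.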

The breakdown value $\varepsilon ^ { * } ( T )$ of an estimator $T$ is the largest fraction of arbitrary outliers it can tolerate without becoming unbounded (see [a2]). Any M-estimator with a monotone and bounded $\psi$ function has breakdown value $\varepsilon ^ { * } ( T ) = 1 / 2$, the highest possible value.

Location M-estimators are not invariant with respect to scale. Therefore it is recommended to compute $T _ { n }$ from

\begin{equation} \tag{a1} \sum _ { i = 1 } ^ { n } \psi \Bigl( \frac { x _ { i } - T _ { n } } { S _ { n } }\Bigr ) = 0, \end{equation}

where $S _ { n }$ is a robust estimator of scale, e.g. the median absolute deviation

\begin{equation*} \operatorname { MAD } _ { i = 1 } ^ { n } ( x _ { i } ) = \operatorname { med } _ { i } | x _ { i } - \operatorname { med } _ { j } ( x _ { j } ) |, \end{equation*}

which has $\varepsilon ^ { * } ( \operatorname{MAD} ) = 1 / 2$.

For univariate scale estimation one uses $\Psi$-functions of the type $\Psi ( x , \sigma ) = \chi ( x / \sigma )$. At the Gaussian scale model $F _ { \sigma } ( x ) = \Phi ( x / \sigma )$, the optimal robust M-estimators are given by $\widetilde { \chi } ( x ) = [ x ^ { 2 } - 1 - a ] _ { - b } ^ { b }$. For $b \downarrow 0$ one obtains the median absolute deviation and for $b \uparrow \infty$ the standard deviation. In the general case, where both location and scale are unknown, one first computes $\hat { \sigma } = S _ { n } = \operatorname {MAD} _ { i = 1 } ^ { n } ( x _ { i } )$ and then plugs it into (a1) for finding $\hat { \theta } = T _ { n }$.
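The two-step procedure, first $\hat { \sigma } = \operatorname {MAD}$ and then $T _ { n }$ from (a1), can be sketched as follows (an illustration with our own helper names; in practice the MAD is often multiplied by $1.483$ to make it consistent at the Gaussian model, which is omitted here):

```python
from statistics import median

def mad(xs):
    # median absolute deviation from the median
    m = median(xs)
    return median(abs(x - m) for x in xs)

def huber_psi(x, b=1.345):
    # Huber's function psi_b(x) = min(b, max(-b, x))
    return min(b, max(-b, x))

def m_location_scaled(xs, b=1.345, tol=1e-9):
    # Equation (a1): solve sum_i psi_b((x_i - T)/S_n) = 0 with S_n = MAD,
    # by bisection (the sum is nonincreasing in T).
    s = mad(xs)
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(huber_psi((x - mid) / s, b) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

xs = [0.0, 1.0, 2.0, 3.0, 100.0]
assert abs(m_location_scaled(xs) - 2.0) < 1e-6  # unaffected by the outlier
```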

For multivariate location and scatter matrices, M-estimators were defined by R.A. Maronna [a4], who also gave their influence function and asymptotic covariance matrix. For $p$-dimensional data, the breakdown value of M-estimators is at most $1 / p$.

For regression analysis, one considers the linear model $y = \overset{\rightharpoonup}{ x } ^ { t } \overset{\rightharpoonup}{ \theta } + e$ where $\overset{\rightharpoonup}{ x }$ and $\overset{\rightharpoonup} { \theta }$ are column vectors, and $\overset{\rightharpoonup}{ x }$ and the error term $e$ are independent. Let $e$ have a distribution with location zero and scale $\sigma$. For simplicity, put $\sigma = 1$. Denote by $H _ { \overset{\rightharpoonup}{ \theta } }$ the joint distribution of $( \overset{\rightharpoonup} { x } , y )$, which implies the distribution of the error term $e = y - \overset{\rightharpoonup} { x } ^ { t } \overset{\rightharpoonup} { \theta }$. Based on a data set $\{ ( \overset{\rightharpoonup} { x } _ { 1 } , y _ { 1 } ) , \dots , ( \overset{\rightharpoonup}{x} _ { n } , y _ { n } ) \}$, M-estimators $T _ { n }$ for regression [a3] are defined by

\begin{equation*} \sum _ { i = 1 } ^ { n } \psi ( r _ { i } ) \overset{\rightharpoonup} { x } _ { i } = \overset{\rightharpoonup} { 0 }, \end{equation*}

where $r _ { i } = y _ { i } - \overset{\rightharpoonup} { x } _ { i } ^ { t } T _ { n }$ are the residuals. If the Huber function $\psi_b$ is used, the influence function of $T$ at $H _ { \overset{\rightharpoonup}{ \theta } }$ equals

\begin{equation} \tag{a2} \operatorname { IF } ( ( \overset{\rightharpoonup} { x } _ { 0 } , y _ { 0 } ) ; T , H _ { \overset{\rightharpoonup}{ \theta } } ) = \psi _ { b } ( e _ { 0 } ) M ^ { - 1 } \overset{\rightharpoonup} { x } _ { 0 }, \end{equation}

where $e _ { 0 } = y _ { 0 } - \overset{\rightharpoonup} { x } _ { 0 } ^ { t} \overset{\rightharpoonup} { \theta }$ and $M = \int \psi _ { b } ^ { \prime } ( e ) \overset{\rightharpoonup} { x } \overset{\rightharpoonup} { x } ^ { t } d H _ { \overset{\rightharpoonup} { \theta } } ( \overset{\rightharpoonup} { x } , y )$. The first factor of (a2) is the influence of the vertical error $e_0$. It is bounded, which makes this estimator more robust than least squares (cf. also Least squares, method of). The second factor is the influence of the position $\overset{\rightharpoonup} { x }_{0}$. Unfortunately, this factor is unbounded, hence a single outlying $\overset{\rightharpoonup} { x } _ { j }$ (i.e., a horizontal outlier) will almost completely determine the fit, as shown in [a2]. Therefore the breakdown value $\varepsilon ^ { * } ( T ) = 0$.
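In practice, regression M-estimates are often computed by iteratively reweighted least squares, rewriting $\psi ( r _ { i } )$ as a weight $w _ { i } = \psi ( r _ { i } ) / r _ { i }$ applied to $r _ { i }$. The following sketch (our own illustration, with the error scale taken as known, $\sigma = 1$) fits a line in the presence of one vertical outlier:

```python
def huber_weight(r, b=1.345):
    # IRLS weight w(r) = psi_b(r) / r for Huber's function
    return 1.0 if abs(r) <= b else b / abs(r)

def solve(A, c):
    # tiny Gaussian elimination with partial pivoting for A x = c
    n = len(A)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(M[k][i]))
        M[i], M[p] = M[p], M[i]
        for k in range(i + 1, n):
            f = M[k][i] / M[i][i]
            for j in range(i, n + 1):
                M[k][j] -= f * M[i][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def huber_regression(X, y, b=1.345, iters=200):
    # iteratively reweighted least squares for sum_i psi_b(r_i) x_i = 0
    p = len(X[0])
    theta = [0.0] * p
    for _ in range(iters):
        r = [yi - sum(xi[j] * theta[j] for j in range(p)) for xi, yi in zip(X, y)]
        w = [huber_weight(ri, b) for ri in r]
        A = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
             for j in range(p)]
        c = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, X, y)) for j in range(p)]
        theta = solve(A, c)
    return theta

X = [[1.0, float(x)] for x in range(10)]   # intercept and slope columns
y = [1.0 + 2.0 * x for x in range(10)]
y[9] = 100.0                               # one vertical outlier
intercept, slope = huber_regression(X, y)
assert abs(slope - 2.0) < 0.2              # stays close to the true slope 2
```

A vertical outlier is downweighted (its residual is clipped at $b$), whereas ordinary least squares would be pulled far from the true line by the same point.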

To obtain a bounded influence function, generalized M-estimators [a2] are defined by

\begin{equation*} \sum _ { i = 1 } ^ { n } \eta ( \overset{\rightharpoonup} { x } _ { i } , r _ { i } ) \overset{\rightharpoonup}{ x } _ { i } = \overset{\rightharpoonup}{ 0 }, \end{equation*}

for some real function $ \eta $. The influence function of $T$ at $H _ { \overset{\rightharpoonup}{ \theta } }$ now becomes

\begin{equation} \tag{a3} \operatorname { IF } ( ( \overset{\rightharpoonup} { x } _ { 0 } , y _ { 0 } ) ; T , H _ { \overset{\rightharpoonup}{ \theta } } ) = \eta ( \overset{\rightharpoonup} { x } _ { 0 } , e _ { 0 } ) M ^ { - 1 } \overset{\rightharpoonup} { x } _ { 0 }, \end{equation}

where $e _ { 0 } = y _ { 0 } - \overset{\rightharpoonup} { x } _ { 0 } ^ { t} \overset{\rightharpoonup} { \theta }$ and $M = \int ( \partial / \partial e ) \eta ( \overset{\rightharpoonup} { x } , e ) \overset{\rightharpoonup} { x } \overset{\rightharpoonup} {x } ^ { t } d H _ { \overset{\rightharpoonup} { \theta } } ( \overset{\rightharpoonup} { x } , y )$. For an appropriate choice of the function $ \eta $, the influence function (a3) is bounded, but still the breakdown value $\varepsilon ^ { * } ( T )$ goes down to zero when the number of parameters $p$ increases.

To repair this, P.J. Rousseeuw and V.J. Yohai [a5] have introduced S-estimators. An S-estimator $T _ { n }$ minimizes $s ( r _ { 1 } , \dots , r _ { n } )$, where $r _ { i } = y _ { i } - \overset{\rightharpoonup} { x } _ { i } ^ { t } T _ { n }$ are the residuals and $s ( r _ { 1 } , \dots , r _ { n } )$ is the robust scale estimator defined as the solution of

\begin{equation*} \frac { 1 } { n } \sum _ { i = 1 } ^ { n } \rho \left( \frac { r_i } { s } \right) = K, \end{equation*}

where $K$ is taken to be $\int \rho ( u ) d \Phi ( u )$. The function $\rho$ must satisfy $\rho ( - u ) = \rho ( u )$ and $\rho ( 0 ) = 0$ and be continuously differentiable, and there must be a constant $c > 0$ such that $\rho$ is strictly increasing on $[ 0 , c ]$ and constant on $[ c , \infty )$. Any S-estimator has breakdown value $\varepsilon ^ { * } ( T ) = 1 / 2$ in all dimensions, and it is asymptotically normal with the same asymptotic covariance as the M-estimator with that function $\rho$. The S-estimators have also been generalized to multivariate location and scatter matrices, in [a6], and they enjoy the same properties.
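The defining equation of the S-scale can be solved for $s$ numerically, since its left-hand side is monotone in $s$. A sketch (our own illustration) using Tukey's biweight $\rho$ with $c \approx 1.547$, a standard choice, and $K = \int \rho ( u ) d \Phi ( u )$ computed by numerical integration:

```python
import math

def rho_biweight(u, c=1.547):
    # Tukey's biweight: even, rho(0) = 0, strictly increasing on [0, c],
    # constant (= c^2 / 6) on [c, infinity) -- as required of an S-estimator rho
    if abs(u) >= c:
        return c * c / 6.0
    return (c * c / 6.0) * (1.0 - (1.0 - (u / c) ** 2) ** 3)

def k_gauss(c=1.547, n=4000):
    # K = integral of rho(u) dPhi(u), trapezoid rule on [-8, 8]
    lo, hi = -8.0, 8.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * rho_biweight(u, c) * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total * h

def s_scale(residuals, c=1.547, tol=1e-10):
    # solve (1/n) sum_i rho(r_i / s) = K for s by bisection;
    # the left-hand side is nonincreasing in s
    K = k_gauss(c)
    lo, hi = 1e-12, 10.0 * max(abs(r) for r in residuals)
    while hi - lo > tol * hi:
        s = 0.5 * (lo + hi)
        if sum(rho_biweight(r / s, c) for r in residuals) / len(residuals) > K:
            lo = s
        else:
            hi = s
    return 0.5 * (lo + hi)

r = [0.1, -0.4, 0.3, -0.2, 0.5, -0.1, 0.2, -0.3]
# the resulting scale estimate is equivariant: doubling the residuals doubles s
assert abs(s_scale([2.0 * x for x in r]) - 2.0 * s_scale(r)) < 1e-6
```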

References

[a1] P.J. Huber, "Robust estimation of a location parameter" Ann. Math. Stat. , 35 (1964) pp. 73–101
[a2] F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, W.A. Stahel, "Robust statistics: The approach based on influence functions" , Wiley (1986)
[a3] P.J. Huber, "Robust statistics" , Wiley (1981)
[a4] R.A. Maronna, "Robust M-estimators of multivariate location and scatter" Ann. Statist. , 4 (1976) pp. 51–67
[a5] P.J. Rousseeuw, V.J. Yohai, "Robust regression by means of S-estimators" J. Franke (ed.) W. Härdle (ed.) R.D. Martin (ed.) , Robust and Nonlinear Time Ser. Analysis , Lecture Notes Statistics , 26 , Springer (1984) pp. 256–272
[a6] P.J. Rousseeuw, A. Leroy, "Robust regression and outlier detection" , Wiley (1987)
How to Cite This Entry:
M-estimator. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=M-estimator&oldid=50053
This article was adapted from an original article by P.J. Rousseeuw, S. Van Aelst (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article