A practical approximation algorithm for the LMS line estimator


Bibliographic details
Published in:	Computational Statistics & Data Analysis, Vol. 51, No. 5, pp. 2461-2486
Main authors:	Mount, David M.; Netanyahu, Nathan S.; Romanik, Kathleen; Silverman, Ruth; Wu, Angela Y.
Format:	Journal Article
Language:	English
Published:	Amsterdam: Elsevier B.V. (Elsevier Science), 1 February 2007
Series:	Computational Statistics & Data Analysis
Subjects:	Approximation algorithms; Calculus of variations and optimal control; Exact sciences and technology; General topics; Least median-of-squares regression; Line arrangements; Line fitting; Mathematical analysis; Mathematics; Multivariate analysis; Numerical analysis; Numerical analysis. Scientific computation; Numerical methods in probability and statistics; Probability and statistics; Randomized algorithms; Robust estimation; Sciences and techniques of general use; Statistics; Data analysis; Approximation; Error estimation; Fitting; Estimator robustness; Median; Statistical estimation; Approximation algorithm; Statistical regression; Statistical computation; Least squares method; Least square median; Distribution function; Quantile
ISSN:	0167-9473, 1872-7352

Description
Summary:	The problem of fitting a straight line to a finite collection of points in the plane is an important problem in statistical estimation. Robust estimators are widely used because of their lack of sensitivity to outlying data points. The least median-of-squares (LMS) regression line estimator is among the best known robust estimators. Given a set of $n$ points in the plane, it is defined to be the line that minimizes the median squared residual or, more generally, the line that minimizes the residual of any given quantile $q$, where $0 < q \le 1$. This problem is equivalent to finding the strip defined by two parallel lines of minimum vertical separation that encloses at least half of the points. The best known exact algorithm for this problem runs in $O(n^2)$ time. We consider two types of approximations, a residual approximation, which approximates the vertical height of the strip to within a given error bound $\varepsilon_r \ge 0$, and a quantile approximation, which approximates the fraction of points that lie within the strip to within a given error bound $\varepsilon_q \ge 0$. We present two randomized approximation algorithms for the LMS line estimator. The first is a conceptually simple quantile approximation algorithm, which given fixed $q$ and $\varepsilon_q > 0$ runs in $O(n \log n)$ time. The second is a practical algorithm, which can solve both types of approximation problems or be used as an exact algorithm. We prove that when used as a quantile approximation, this algorithm's expected running time is $O(n \log^2 n)$. We present empirical evidence that the latter algorithm is quite efficient for a wide variety of input distributions, even when used as an exact algorithm.
DOI:	10.1016/j.csda.2006.08.033
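
Illustration:	The strip formulation in the summary can be made concrete with a naive brute-force sketch in Python. This is not either of the paper's algorithms and is far slower than the $O(n^2)$ exact bound quoted above (roughly $O(n^3 \log n)$). It assumes the commonly cited property that the slope of an optimal strip is determined by some pair of data points, and it takes the number of points to cover as `ceil(q * n)`, which is one of several conventions. The function names are hypothetical.

```python
import math
from itertools import combinations


def min_strip_for_slope(points, slope, k):
    """Narrowest vertical strip of the given slope that covers at least k points.

    Returns (vertical height, intercept of the strip's center line).
    """
    # A point (x, y) lies on the line y = slope * x + b with b = y - slope * x,
    # so a strip of this slope corresponds to an interval of intercepts.
    intercepts = sorted(y - slope * x for x, y in points)
    best_height, best_mid = math.inf, 0.0
    # Slide a window over k consecutive sorted intercepts.
    for i in range(len(intercepts) - k + 1):
        lo, hi = intercepts[i], intercepts[i + k - 1]
        if hi - lo < best_height:
            best_height, best_mid = hi - lo, (lo + hi) / 2
    return best_height, best_mid


def lms_line_bruteforce(points, q=0.5):
    """Brute-force LMS-style fit for n >= 2 points: (height, slope, intercept).

    Assumption: the strip must cover k = ceil(q * n) points (at least 2).
    Assumption: only slopes through pairs of input points are tried.
    """
    n = len(points)
    k = max(2, math.ceil(q * n))
    best = (math.inf, 0.0, 0.0)
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical direction has no finite slope; skip it
        slope = (y2 - y1) / (x2 - x1)
        height, intercept = min_strip_for_slope(points, slope, k)
        if height < best[0]:
            best = (height, slope, intercept)
    return best
```

Calling `lms_line_bruteforce(points)` on a list of `(x, y)` tuples returns the strip height together with the slope and intercept of its center line. Because the strip only has to cover a fraction `q` of the points, the remaining points can lie arbitrarily far away without affecting the fit, which is the robustness property the summary refers to.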