Journal of Zhejiang University SCIENCE A
ISSN 1673-565X(Print), 1862-1775(Online), Monthly
2009 Vol. 10 No. 6 p. 909~921
On-line Access Date: June 2, 2009
Outlier detection by means of robust regression estimators for use in engineering science
Serif HEKIMOGLU†‡1, R. Cuneyt ERENOGLU†1, Jan KALINA2
(1Department of Geodesy and Photogrammetry Engineering, Yildiz Technical University, Istanbul 34349, Turkey)
(2Department of Probability and Mathematical Statistics, Charles University, Praha 18675, Czech Republic)
‡ Corresponding Author
†E-mail: hekim@yildiz.edu.tr, ceren@yildiz.edu.tr
Received Feb. 29, 2008; revision accepted Nov. 6, 2008; Crosschecked Dec. 29, 2008
Abstract: This study compares the ability of different robust regression estimators to detect and classify outliers. Well-known estimators with high breakdown points were compared using simulated data. Mean success rates (MSR) were computed and used as comparison criteria. The results showed that the least median of squares (LMS) and least trimmed squares (LTS) were the most successful methods for data containing leverage points or critical and concentrated outliers, and for data affected by masking and swamping. We recommend using LMS and LTS as diagnostic tools to classify outliers, because they remain robust even when applied to models that are heavily contaminated or that have a complicated structure of outliers.
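The comparison described in the abstract can be illustrated with a small simulation. The following is a minimal Python sketch, not the authors' code, for simple linear regression with one predictor and an intercept (p = 2). It rests on several assumptions: LMS is approximated by searching all elemental two-point fits for the one that minimizes the h-th smallest squared residual, with h = floor(n/2) + floor((p+1)/2); LTS is approximated by concentration steps (C-steps, the idea behind the FAST-LTS algorithm) started from each elemental fit; an observation is flagged when its residual exceeds 2.5 times the robust scale estimate of Rousseeuw and Leroy (1987); and the MSR is taken to be the share of simulated data sets in which the flagged set equals the planted outliers exactly. The function names, the cutoff, and the simulation settings are hypothetical illustrations, not the paper's experimental design.

```python
import itertools
import math
import random
import statistics

# Hypothetical sketch for simple regression y = a + b*x, so p = 2 parameters.
P = 2


def coverage(n, p=P):
    # h = floor(n/2) + floor((p+1)/2), the coverage that gives maximal breakdown
    return n // 2 + (p + 1) // 2


def residuals(x, y, fit):
    a, b = fit
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def line_through(x, y, i, j):
    # Elemental fit: the exact line through observations i and j (None if vertical).
    if x[i] == x[j]:
        return None
    b = (y[j] - y[i]) / (x[j] - x[i])
    return (y[i] - b * x[i], b)


def ols(xs, ys):
    # Ordinary least squares on a subset; None if all x values coincide.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    if sxx == 0:
        return None
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys)) / sxx
    return (my - b * mx, b)


def lms_line(x, y):
    """Approximate LMS: the elemental fit minimizing the h-th smallest squared residual."""
    h = coverage(len(x))
    best, best_crit = None, math.inf
    for i, j in itertools.combinations(range(len(x)), 2):
        fit = line_through(x, y, i, j)
        if fit is None:
            continue
        crit = sorted(r * r for r in residuals(x, y, fit))[h - 1]
        if crit < best_crit:
            best, best_crit = fit, crit
    return best


def lts_line(x, y, max_steps=20):
    """Approximate LTS: C-steps (refit OLS on the h best-fitting points) from each elemental start."""
    n = len(x)
    h = coverage(n)
    best, best_crit = None, math.inf
    for i, j in itertools.combinations(range(n), 2):
        fit = line_through(x, y, i, j)
        if fit is None:
            continue
        for _ in range(max_steps):
            r2 = [r * r for r in residuals(x, y, fit)]
            keep = sorted(range(n), key=r2.__getitem__)[:h]
            new = ols([x[k] for k in keep], [y[k] for k in keep])
            if new is None or new == fit:  # converged: same subset gives same fit
                break
            fit = new
        crit = sum(sorted(r * r for r in residuals(x, y, fit))[:h])  # trimmed sum of squares
        if crit < best_crit:
            best, best_crit = fit, crit
    return best


def flag_outliers(x, y, fit, cutoff=2.5):
    # Robust scale with finite-sample correction (Rousseeuw and Leroy, 1987).
    r = residuals(x, y, fit)
    med_r2 = statistics.median([ri * ri for ri in r])
    s0 = 1.4826 * (1 + 5 / (len(x) - P)) * math.sqrt(med_r2)
    return {i for i, ri in enumerate(r) if abs(ri) > cutoff * s0}


def mean_success_rate(estimator, trials=100, n=20, n_out=3, shift=10.0, seed=1):
    # Assumed MSR definition: share of data sets whose flagged set equals the planted outliers.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [float(k) for k in range(1, n + 1)]
        y = [1.0 + 0.5 * xi + rng.gauss(0.0, 0.1) for xi in x]
        planted = set(rng.sample(range(n), n_out))
        for k in planted:
            y[k] += shift
        if flag_outliers(x, y, estimator(x, y)) == planted:
            hits += 1
    return hits / trials
```

For example, `mean_success_rate(lms_line)` and `mean_success_rate(lts_line)` return the two success rates for the same seeded simulation, so both estimators are compared on identical data sets. The exhaustive search over point pairs grows roughly as n³ log n and is only practical for small n; larger problems would call for randomly drawn elemental subsets instead.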
Key words: Linear regression, Outlier, Mean success rate (MSR), Leverage point, Least median of squares (LMS), Least trimmed squares (LTS)
doi:10.1631/jzus.A0820140 CLC number: O21