IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY

Onur Toka, Meral Çetin

Abstract


Missing data and imputation methods are studied in many disciplines. However, these methods differ in their properties and are subject to constraints that depend on the missingness mechanism. In this paper, we examine how several deletion and imputation methods behave in the presence of outliers. In the first application, we estimate the mean vector and covariance matrix from missing and contaminated data and compare the imputation methods using mean square errors. In the second application, we use regression data and examine the effect of missingness on the parameters of the regression model. We compare the imputed values with the real values and discuss the results of classical and robust imputation methods.
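
The comparison of imputation methods by mean square error can be illustrated with a small simulation. The sketch below is not the procedure used in the paper: it assumes a single variable, values missing completely at random, outliers generated by a location shift, and only two simple imputations (mean and median) standing in for the classical and robust methods. The function names, sample size, missingness rate, contamination rate, and shift are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_once(rng, n=100, miss_rate=0.3, contam_rate=0.1, shift=10.0):
    """Return location estimates after mean and median imputation for one sample."""
    # Clean values come from N(0, 1); a fraction is replaced by shifted outliers.
    data = [
        rng.gauss(shift, 1.0) if rng.random() < contam_rate else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]
    # Missing completely at random: each value is dropped independently.
    observed = [x for x in data if rng.random() >= miss_rate]
    n_missing = n - len(observed)

    estimates = {}
    fills = {
        "mean": statistics.fmean(observed),     # classical fill-in value
        "median": statistics.median(observed),  # simple robust fill-in value
    }
    for name, fill in fills.items():
        completed = observed + [fill] * n_missing
        estimates[name] = statistics.fmean(completed)
    return estimates


def mse_by_method(reps=500, seed=0, true_mean=0.0):
    """Average squared error of each estimate around the clean-data mean."""
    rng = random.Random(seed)
    squared_error = {"mean": 0.0, "median": 0.0}
    for _ in range(reps):
        for name, estimate in simulate_once(rng).items():
            squared_error[name] += (estimate - true_mean) ** 2
    return {name: total / reps for name, total in squared_error.items()}


results = mse_by_method()
for name, mse in results.items():
    print(f"{name} imputation: MSE = {mse:.3f}")
```

Under these assumptions the outliers pull the observed mean away from the clean-data mean, so filling the gaps with the median yields a smaller mean square error than filling them with the mean. The multivariate and regression settings studied in the paper would need correspondingly richer estimators.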


Keywords


ER Algorithm, Missing data, Outliers, Robust imputation, Sequential imputation

