Multiple Imputation: An Iterative Regression Imputation

Bintou Traore; A. Ismaila Adeleke

Bintou Traore, Department of Mathematics, University of Lagos, Akoka, Nigeria.
A. Ismaila Adeleke, Department of Actuarial Science and Insurance, University of Lagos, Lagos, Nigeria.

Keywords: Missing data, multiple imputation, sum of squares of residues, algorithms

Abstract

Multiple imputation (MI) is a commonly applied method for statistically handling missing data. It involves imputing missing values repeatedly to account for the variability due to imputation. Different MI techniques have proven effective and are available in many statistical software packages. However, the main problem that arises when statistically handling missing data, namely bias, remains. Indeed, because multiple imputation techniques are simulation-based, estimates from the completed data may vary substantially from one application to the next, even with the same original data and the same implementation method. Therefore, the uncertainty is often under- or overestimated, resulting in poor predictive capability. A new MI approach based on the regression method is presented. The proposed approach consists of constructing a possible lower and upper bound around the sum of squares of residuals (SSE) that would have been obtained in the complete case (that is, if there were no missing data). Regression imputation (RI) is then implemented iteratively to replace the missing values, and a new SSE is computed with the fully completed data. If the new SSE does not fall within the constructed bounds, the RI method is repeated until the estimated SSE falls within those bounds. The SSEs of the prediction are used to assess the performance of the proposed approach compared with expectation-maximization (EM) imputation and multiple imputation by chained equations (MICE). The results indicate that the three methods work reasonably well in many situations, particularly when the amount of missingness is low and when data are missing at random (MAR) or missing completely at random (MCAR). However, when the proportion of missingness is severe and the data are missing not at random (MNAR), the proposed method performs better than the MICE and EM algorithms.
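
The accept/reject loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state how the SSE bounds are constructed, so the sketch assumes a hypothetical rule: scale the complete-case SSE up to the full sample size and allow a relative tolerance `tol` on either side. The function names, the `tol` and `max_iter` parameters, the use of stochastic regression imputation (prediction plus normal noise), and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def sse(X, y, beta):
    """Sum of squared residuals of y around the fitted regression."""
    A = np.column_stack([np.ones(len(X)), X])
    r = y - A @ beta
    return float(r @ r)

def iterative_regression_imputation(X, y, tol=0.10, max_iter=1000, seed=0):
    """Repeat regression imputation until the completed-data SSE is in bounds.

    X is a fully observed 2-D array; y uses NaN for missing values.
    ASSUMPTION: the bounds are the complete-case SSE scaled to the full
    sample size, plus or minus `tol`. The source does not give its rule.
    """
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    n, n_obs = len(y), int(obs.sum())

    # Regression fitted on the complete cases only.
    beta_cc = fit_ols(X[obs], y[obs])
    sse_cc = sse(X[obs], y[obs], beta_cc)
    sigma = np.sqrt(sse_cc / (n_obs - X.shape[1] - 1))

    # Hypothetical bounds around the SSE expected with no missing data.
    target = sse_cc * n / n_obs
    lower, upper = (1 - tol) * target, (1 + tol) * target

    A_mis = np.column_stack([np.ones(n - n_obs), X[~obs]])
    for it in range(1, max_iter + 1):
        y_imp = y.copy()
        # Stochastic regression imputation: prediction plus residual noise.
        y_imp[~obs] = A_mis @ beta_cc + rng.normal(0.0, sigma, n - n_obs)
        # Refit on the completed data and check the new SSE against the bounds.
        sse_new = sse(X, y_imp, fit_ols(X, y_imp))
        if lower <= sse_new <= upper:
            return y_imp, sse_new, (lower, upper), it
    raise RuntimeError("SSE did not fall within the bounds in max_iter draws")

# Simulated example: 200 cases, 40 responses deleted completely at random.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
y_full = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(size=200)
y_demo = y_full.copy()
y_demo[rng.choice(200, size=40, replace=False)] = np.nan
y_done, sse_done, bounds, n_iter = iterative_regression_imputation(X_demo, y_demo)
```

In this sketch the observed values are never altered; only the imputed entries are redrawn on each pass, and the loop stops at the first draw whose completed-data SSE lies inside the assumed band.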

