Penalized Cox Regression Models for High-Dimensional Survival Analysis: An Application to Breast Cancer Microarray Data
Abstract
Survival analysis provides a statistical framework for examining the relationship between time-to-event outcomes and explanatory variables. The Cox proportional hazards (Cox PH) model is widely used for this purpose, but its performance deteriorates in high-dimensional settings, where the number of predictors is large relative to the sample size and strong collinearity is present. Penalized Cox regression methods address these limitations by incorporating regularization into the estimation process. This study applies penalized Cox regression models to breast cancer microarray survival data from the Gene Expression Omnibus dataset GSE20685, which contains gene expression measurements for 54,682 genes across 327 patients. The aim is to fit Ridge, LASSO, and Elastic Net Cox models and compare them with the classical Cox PH model. Model performance is evaluated using root mean square error (RMSE), with emphasis on test set results to assess predictive generalization. The penalized Cox regression models outperform the Cox PH model on the test set at every predictor dimension evaluated. While the Cox PH model fails to produce reliable predictions when the number of predictors is large, the penalized models remain stable and effective. Among the penalized approaches, Ridge regression performs most robustly in ultra-high-dimensional settings, whereas LASSO and Elastic Net, which perform feature selection, are competitive at low and moderate dimensions. These findings highlight the importance of regularization for survival modelling with high-dimensional genomic data and show that penalized Cox regression offers a more reliable alternative to the classical Cox PH model for microarray-based survival prediction.
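To make the role of the penalty concrete, the following minimal sketch shows one common way of writing the objective that Ridge, LASSO, and Elastic Net Cox models minimize: the negative Cox log partial likelihood plus a penalty on the coefficients. It is an illustration only, not the implementation used in this study. The function names, the toy data, the scaling of the likelihood by the sample size, and the parametrization by `lam` (overall penalty strength) and `alpha` (mixing weight, with `alpha = 0` giving Ridge and `alpha = 1` giving LASSO) are assumptions made here for the example.

```python
import math


def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox log partial likelihood.

    beta  : list of coefficients, one per predictor
    X     : list of rows, each a list of predictor values
    time  : observed follow-up times
    event : 1 if the event was observed, 0 if censored
    Ties, if any, are handled in the Breslow manner (shared risk set).
    """
    # Linear predictor eta_i = x_i . beta for every subject
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if d_i == 1:
            # Risk set: subjects whose observed time is at least t_i
            risk_sum = sum(
                math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i
            )
            nll -= eta[i] - math.log(risk_sum)
    return nll


def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * L1 norm + 0.5 * (1 - alpha) * squared L2 norm)."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def penalized_cox_objective(beta, X, time, event, lam, alpha):
    """Objective to minimize: scaled negative log partial likelihood + penalty.

    Dividing by the sample size is a convention assumed for this sketch.
    """
    n = len(time)
    nll = cox_neg_log_partial_likelihood(beta, X, time, event)
    return nll / n + elastic_net_penalty(beta, lam, alpha)


# Hypothetical toy data: four subjects, two predictors
X_toy = [[0.5, 1.0], [1.5, -0.3], [-0.7, 0.2], [0.1, 0.9]]
time_toy = [5.0, 3.0, 8.0, 2.0]
event_toy = [1, 1, 0, 1]

ridge_value = penalized_cox_objective([0.2, -0.1], X_toy, time_toy, event_toy, lam=0.5, alpha=0.0)
lasso_value = penalized_cox_objective([0.2, -0.1], X_toy, time_toy, event_toy, lam=0.5, alpha=1.0)
```

In practice the coefficients would be obtained by minimizing this objective over `beta`, with `lam` typically chosen by cross-validation. Because the L1 term can set coefficients exactly to zero, LASSO and Elastic Net perform feature selection, whereas the Ridge penalty only shrinks coefficients toward zero.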
Copyright (c) 2026 Author

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, adaptation, and reproduction in any medium, provided that the original work is properly cited.
