Using independent components for estimating logistic regression with high-dimensional multicollinear data: Simulation and application

Authors

  • Sana Ali, M.Phil Scholar, Department of Statistics, Bahauddin Zakariya University, Multan, Punjab, Pakistan
  • Saima Afzal, Assistant Professor, Department of Statistics, Bahauddin Zakariya University, Multan, Punjab, Pakistan
  • Nasir Saleem, Ph.D Scholar, Department of Statistics, Bahauddin Zakariya University, Multan, Punjab, Pakistan

DOI:

https://doi.org/10.71085/sss.04.02.271

Keywords:

Dimension reduction, Independent components, Logistic regression, Multicollinearity, Breast cancer

Abstract

The logistic regression model is used to predict a binary response variable from a set of explanatory variables. When the predictors are multicollinear, the model parameters are estimated imprecisely and their interpretation in terms of odds ratios may be misleading. A second problem is that a large number of predictors is usually required to explain the response. To improve the estimation of the logistic model parameters under multicollinearity, and to reduce the dimension of data with continuous covariates, we propose using a reduced set of optimum independent components of the original predictors as the covariates of the logistic model. The performance of the proposed independent component logistic regression model is analyzed in a simulation study that compares different methods for selecting the optimum independent components under different regressors, sample sizes, and correlations among the regressors. A breast cancer data set serves as the real-data application. Independent component logistic regression is compared with principal component logistic regression, and independent component logistic regression gives better results.
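
The abstract does not give the model equations. The following is a minimal sketch of how an independent component logistic regression could be written, by analogy with the principal component approach of Aguilera, Escabias, and Valderrama (2006). All symbols ($X$, $\bar{x}$, $W$, $S$, $\gamma$, $s$) are notation assumed here for illustration, not taken from the paper, and the sketch assumes the centered predictor matrix admits an invertible unmixing matrix.

```latex
% Logistic model on the p original predictors, i = 1, ..., n:
\pi_i = P(Y_i = 1 \mid x_{i1}, \dots, x_{ip})
      = \frac{\exp\!\big(\beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j\big)}
             {1 + \exp\!\big(\beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j\big)}

% Independent components of the centered predictors
% (X is n x p, \bar{x} is the p-vector of column means,
%  W is an assumed p x p unmixing matrix):
S = (X - \mathbf{1}\bar{x}^{\top})\, W

% Reduced model on s < p selected components
% (W_{(s)} holds the s selected columns of W, S_{(s)} the matching columns of S):
\operatorname{logit}(\pi) = \gamma_0 \mathbf{1} + S_{(s)}\,\gamma_{(s)}

% Coefficients expressed back in terms of the original predictors:
\beta_{(s)} = W_{(s)}\,\gamma_{(s)}, \qquad
\beta_{0(s)} = \gamma_0 - \bar{x}^{\top} W_{(s)}\,\gamma_{(s)}
```

The last line follows by substituting $S_{(s)} = (X - \mathbf{1}\bar{x}^{\top}) W_{(s)}$ into the reduced model and collecting the terms in $X$. Under this sketch, only $s + 1$ parameters are estimated rather than $p + 1$, and the covariates entering the fit are no longer collinear, which is where the gain under multicollinearity would come from.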

Published

2025-04-29

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Section

Articles

How to Cite

Ali, S., Afzal, S., & Saleem, N. (2025). Using independent components for estimating logistic regression with high-dimensional multicollinear data: Simulation and application. Social Sciences Spectrum, 4(2), 288-316. https://doi.org/10.71085/sss.04.02.271