Volume 53 - Article 21 | Pages 629–660
Analysing migrant fertility using machine learning techniques: An application of random survival forest to longitudinal data from France
By Isaure Delaporte, Andrew Ibbetson, Hill Kulu
References
Adham, D., Abbasgholizadeh, N., and Abazari, M. (2017). Prognostic factors for survival in patients with gastric cancer using a random survival forest. Asian Pacific Journal of Cancer Prevention 18(1): 129.
Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9(7): 1545–1588.
Andersson, G. (2004). Childbearing after migration: Fertility patterns of foreign-born women in Sweden. International Migration Review 38(2): 747–774.
Andersson, G. and Scott, K. (2007). Childbearing dynamics of couples in a universalistic welfare state: The role of labor-market status, country of origin, and gender. Demographic Research 17(30): 897–938.
Arpino, B., Le Moglie, M., and Mencarini, L. (2021). What tears couples apart: A machine learning analysis of union dissolution in Germany. Demography 59(1): 161–186.
Baudin, T. (2015). Religion and fertility: The French connection. Demographic Research 32(13): 397–420.
Berghammer, C. (2009). Religious socialisation and fertility: Transition to third birth in the Netherlands/Socialisation religieuse et fécondité: L’arrivée du troisième enfant aux Pays-Bas. European Journal of Population/Revue européenne de Démographie 25: 297–324.
Best, K., Gilligan, J., Baroud, H., Carrico, A., Donato, K., and Mallick, B. (2022). Applying machine learning to social datasets: A study of migration in southwestern Bangladesh using random forests. Regional Environmental Change 22(52): 1–12.
Best, K.B., Gilligan, J.M., Baroud, H., Carrico, A.R., Donato, K.M., Ackerly, B.A., and Mallick, B. (2021). Random forest analysis of two household surveys can identify important predictors of migration in Bangladesh. Journal of Computational Social Science 4(1): 77–100.
Billari, F.C., Fürnkranz, J., and Prskawetz, A. (2006). Timing, sequencing, and quantum of life course events: A machine learning approach. European Journal of Population/Revue Européenne de Démographie 22(1): 37–65.
Breiman, L. (2001). Random forests. Machine Learning 45: 5–32.
Breiman, L., Friedman, J., Olshen, R.A., and Stone, C.J. (1984). Classification and regression trees. Belmont, CA: Thomson Wadsworth.
Cafri, G., Li, L., Paxton, E.W., and Fan, J. (2018). Predicting risk for adverse health events using random forest. Journal of Applied Statistics 45(12): 2279–2294.
Cleves, M., Gutierrez, M., Gould, W., and Marchenko, Y. (2010). An introduction to survival analysis using Stata. College Station: Stata Press.
De Rose, A. and Pallara, A. (1997). Survival trees: An alternative non-parametric multivariate technique for life history analysis. European Journal of Population/Revue européenne de Démographie 13(3): 223–241.
Delaporte, I. and Kulu, H. (2022). Interaction between childbearing and partnership trajectories among immigrants and their descendants in France: An application of multichannel sequence analysis. Population Studies 77(1): 55–70.
Dudoit, S., Shaffer, J.P., and Boldrick, J.C. (2003). Multiple hypothesis testing in microarray experiments. Statistical Science 18(1): 71–103.
Ehrlinger, J. (2016). ggRandomForests: Exploring random forest survival. arXiv:1612.08974.
Erman, J. (2022). Cohort, policy, and process: The implications for migrant fertility in West Germany. Demography 59(1): 221–246.
Fawagreh, K., Gaber, M.M., and Elyan, E. (2014). Random forests: From early developments to recent advancements. Systems Science & Control Engineering 2(1): 602–609.
Garip, F. (2020). What failure to predict life outcomes can teach us. Proceedings of the National Academy of Sciences 117(15): 8234–8235.
Hamidi, O., Tapak, M., Poorolajal, J., Amini, P., and Tapak, L. (2017). Application of random survival forest for competing risks in prediction of cumulative incidence function for progression to AIDS. Epidemiology, Biostatistics and Public Health 14(4).
Hanson, H.A., Martin, C., O’Neil, B., Leiser, C.L., Mayer, E.N., Smith, K.R., and Lowrance, W.T. (2019). The relative importance of race compared to health care and social factors in predicting prostate cancer mortality: A random forest approach. The Journal of Urology 202(6): 1209–1216.
Hays, J.J. and Guzzo, K.B. (2022). Does sibling composition in childhood contribute to adult fertility behaviors? Journal of Marriage and Family 84(1): 53–79.
Ho, T.K. (ed.) (1995). Random decision forests. Montreal, QC: IEEE (Proceedings of 3rd international conference on document analysis and recognition).
Ho, T.K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8): 832–844.
Hsich, E., Gorodeski, E.Z., Blackstone, E.H., Ishwaran, H., and Lauer, M.S. (2011). Identifying important risk factors for survival in patient with systolic heart failure using random survival forests. Circulation: Cardiovascular Quality and Outcomes 4(1): 39–45.
Ishwaran, H. (2007). Variable importance in binary regression trees and forests. Electronic Journal of Statistics 1: 519–537.
Ishwaran, H., Gerds, T.A., Kogalur, U.B., Moore, R.D., Gange, S.J., and Lau, B.M. (2014). Random survival forests for competing risks. Biostatistics 15(4): 757–773.
Ishwaran, H. and Kogalur, U.B. (2014). RandomForestSRC: Random forests for survival, regression and classification (RF-SRC). R package version (0).
Ishwaran, H. and Kogalur, U.B. (2008). RandomSurvivalForest 3.2.2. R package.
Ishwaran, H., Kogalur, U.B., Blackstone, E.H., and Lauer, M.S. (2008). Random survival forests. Annals of Applied Statistics 2(3): 841–860.
Ishwaran, H., Kogalur, U.B., Chen, X., and Minn, A.J. (2011). Random survival forests for high‐dimensional data. Statistical Analysis and Data Mining: The ASA Data Science Journal 4(1): 115–132.
Ishwaran, H., Kogalur, U.B., Gorodeski, E.Z., Minn, A.J., and Lauer, M.S. (2010). High-dimensional variable selection for survival data. Journal of the American Statistical Association 105(489): 205–217.
Jiang, S. (2019). Prediction based on Random Survival Forest. American Journal of Biomedical Science and Research 6(2).
Kashyap, R., Rinderknecht, R.G., Akbaritabar, A., Alburez-Gutierrez, D., Gil-Clavel, S., Grow, A., Kim, J., Leasure, D.R., Lohmann, S., Negraia, D.V., Perrotta, D., Rampazzo, F., Tsai, C.J., Verhagen, M.D., Zagheni, E., and Zhao, X. (2022). Digital and computational demography. SocArXiv.
Keramati, A., Lu, P., Iranitalab, A., Pan, D., and Huang, Y. (2020). A crash severity analysis at highway-rail grade crossings: The random survival forest method. Accident Analysis and Prevention 144: 105683.
Krapf, S. and Wolf, K. (2016). Persisting differences or adaptation to German fertility patterns? First and second birth behavior of the 1.5 and second generation Turkish migrants in Germany. In: Hank, K. and Kreyenfeld, M. (eds.). Social Demography – Forschung an der Schnittstelle von Soziologie und Demographie. Wiesbaden: Springer VS: 137–164.
Kulu, H. and González-Ferrer, A. (2014). Family dynamics among immigrants and their descendants in Europe: Current research and opportunities. European Journal of Population 30(4): 411–435.
Kulu, H. and Hannemann, T. (2016). Why does fertility remain high among certain UK-born ethnic minority women? Demographic Research 35(49): 1441–1488.
Kulu, H., Hannemann, T., Pailhé, A., Neels, K., Krapf, S., González-Ferrer, A., and Andersson, G. (2017). Fertility by birth order among the descendants of immigrants in selected European countries. Population and Development Review 43(1): 31–60.
Kulu, H. and Milewski, N. (2007). Family change and migration in the life course: An introduction. Demographic Research 17(19): 567–590.
Kulu, H., Milewski, N., Hannemann, T., and Mikolai, J. (2019). A decade of life-course research on fertility of immigrants and their descendants in Europe. Demographic Research 40(46): 1345–1374.
Kulu, H. and Hannemann, T. (2016). Introduction to research on immigrant and ethnic minority families in Europe. Demographic Research 35(2): 31–46.
Liaw, A. and Wiener, M. (2002). Classification and regression by randomForest. R News 2(3): 18–22.
Loh, W.Y. (2011). Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1(1): 14–23.
Miao, F., Cai, Y.P., Zhang, Y.T., and Li, C.Y. (eds.) (2015). Is random survival forest an alternative to Cox proportional model on predicting cardiovascular disease? Cham: Springer (6th European Conference of the International Federation for Medical and Biological Engineering).
Milewski, N. (2007). First child of immigrant workers and their descendants in West Germany: Interrelation of events, disruption, or adaptation? Demographic Research 17(29): 859–896.
Milewski, N. (2010). Immigrant fertility in West Germany: Is there a socialization effect in transitions to second and third births? European Journal of Population/Revue européenne de Démographie 26(3): 297–323.
Mussino, E. and Cantalini, S. (2022). Influences of origin and destination on migrant fertility in Europe. Population, Space and Place 28(7): 2567.
Mussino, E. and Strozza, S. (2012). Does citizenship still matter? Second birth risks of migrants from Albania, Morocco, and Romania in Italy. European Journal of Population/Revue européenne de Démographie 28(3): 269–302.
Pailhé, A. (2017). The convergence of second-generation immigrants’ fertility patterns in France: The role of sociocultural distance between parents’ and host country. Demographic Research 36(45): 1361–1398.
Rezaei, M., Tapak, L., Alimohammadian, M., Sadjadi, A., and Yaseri, M. (2020). Review of Random Survival Forest method. Journal of Biostatistics and Epidemiology 6(1): 59–68.
Rojas, E.A.G., Bernardi, L., and Schmid, F. (2018). First and second births among immigrants and their descendants in Switzerland. Demographic Research 38(11): 247–286.
Salganik, M.J., Lundberg, I., Kindel, A.T., Ahearn, C.E., Al-Ghoneim, K., Almaatouq, A., and McLanahan, S. (2020). Measuring the predictability of life outcomes with a scientific mass collaboration. Proceedings of the National Academy of Sciences 117(15): 8398–8403.
Scheffner, I., Gietzelt, M., Abeling, T., Marschollek, M., and Gwinner, W. (2020). Patient survival after kidney transplantation: Important role of graft-sustaining factors as determined by predictive modeling using random survival forest analysis. Transplantation 104(5): 1095–1107.
Spooner, A., Chen, E., Sowmya, A., Sachdev, P., Kochan, N.A., Trollor, J., and Brodaty, H. (2020). A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction. Scientific Reports 10(1): 20410.
Taylor, J.M.G. (2011). Random survival forests. Journal of Thoracic Oncology 6(12): 1974–1975.
Wang, H. and Li, G. (2017). A selective review on random survival forests for high dimensional data. Quantitative Bio-Science 36(2): 85.
Wang, P., Li, Y., and Reddy, C.K. (2019). Machine learning for survival analysis: A survey. ACM Computing Surveys 51(6): 1–36.
Whetten, A.B., Stevens, J.R., and Cann, D. (2021). The implementation of random survival forests in conflict management data: An examination of power sharing and third party mediation in post-conflict countries. PLoS ONE 16(5): e0250963.
Wilson, B. (2020). Understanding how immigrant fertility differentials vary over the reproductive life course. European Journal of Population 36(3): 465–498.
Witten, D.M. and Tibshirani, R. (2010). Survival analysis with high-dimensional covariates. Statistical Methods in Medical Research 19(1): 29–51.
Ziegler, A. and König, I.R. (2014). Mining data with random forests: Current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4(1): 55–63.