Adult mortality among second-generation immigrants in France: Results from a nationally representative record linkage study

BACKGROUND France has a large population of second-generation immigrants (i.e., native-born children of immigrants) who are known to experience important socioeconomic disparities by country of origin. The extent to which they also experience disparities in mortality, however, has not been previously examined. METHODS We used a nationally representative sample of individuals 18 to 64 years old in 1999 with mortality follow-up via linked death records until 2010. We compared mortality levels for second-generation immigrants with their first-generation counterparts and with the reference (neither first- nor second-generation) population using mortality hazard ratios as well as probabilities of dying between age 18 and 65. We also adjusted hazard ratios using educational attainment reported at baseline. RESULTS We found a large amount of excess mortality among second-generation males of North African origin compared to the reference population with no migrant background. This excess mortality was not present among second-generation males of southern European origin, for whom we instead found a mortality advantage, nor among North African–origin males of the first generation. This excess mortality remained large and significant after adjusting for educational attainment. CONTRIBUTION In these first estimates of mortality among second-generation immigrants in France, males of North African origin stood out as a subgroup experiencing a large amount of excess mortality. This finding adds a public health dimension to the various disadvantages already documented for this subgroup. Overall, our results highlight the importance of second-generation status as a significant and previously unknown source of health disparity in France.

This appendix presents supplementary information for assessing the robustness of the paper's results to various data quality issues (Sections 1-5) and model specifications (Sections 6-7).
The overall conclusion of this sensitivity analysis is that despite various data quality issues inherent to the Echantillon Longitudinal de Mortalité (ELM), this data set appears to be a reliable source for mortality estimation (Section 5). Moreover, patterns of case exclusion resulting from missing data and other issues suggest that the paper's main result is conservative, that is, that the excess mortality we find among second-generation (G2) immigrant males of North African origin likely underestimates the true amount of excess mortality for this group (Sections 3-4).

Response rate in the EHF
The Etude de l'Histoire Familiale (EHF) was conducted at the same time as the 1999 census with an average sampling rate of 1/170 for males and 1/110 for females. (Females were oversampled as they were the focus of many of the planned analyses in the EHF.) Eligible individuals were provided with the EHF questionnaire along with the usual census questionnaire. The overall response rate in the EHF (i.e., the proportion of EHF-eligible individuals who completed the census form and also answered the EHF questionnaire) was 79.4%, a response rate that is on par with other large sample surveys commonly used in this literature. An analysis of the probability of response using census variables as explanatory variables shows a lower response among individuals ages 85 and older, individuals who were unmarried, individuals born abroad, or individuals who did not report their level of education in the census (Lefèvre and Filhon 2005).
In order to address these nonresponses, post-stratification weights were provided in the EHF data set, based on the following seven variables: sex, age, education, country of birth, date of arrival in France, region of residence, and size of the place of residence. The results of our paper are based on the unweighted data because, as we show below, our final sample differs from the EHF sample since it excludes individuals for whom survival status is unknown (see Section 3 in this appendix). Nonetheless, an analysis comparing hazard ratios in unweighted vs. weighted models (Tables A-1 and A-2) shows that results for G2 subgroups are robust to the use of post-stratification weights. This suggests that nonresponses in the EHF are unlikely to be the main explanation for the paper's results.
Note: (1) 'reference' refers to individuals born in metropolitan France to two parents born in metropolitan France; (2) significance levels at ** p < 0.01, * p < 0.05, and † p < 0.10.

Missing data in the EHF sample
Our results exclude EHF individuals who could not be allocated to a specific population group due to missing values on the relevant variables (place of birth, parental place of birth, languages, and nationality at birth). Figures A-1 and A-2 show the process of subgroup attribution and the extent of missing data at each step. In these figures, the missing cases in a given box correspond to individuals who had missing values on variables needed for the attribution of categories at the same level. Among males, a total of 14,405 individuals, or 12.1% of the total EHF sample of 119,473 individuals, had missing information on variables necessary for subgroup attribution. In terms of proportion missing among non-missing individuals in the previous level, the largest percentage is for individuals born in France who did not report the information necessary for allocation to a specific reference or G2 category (9.6%).
The amount of missing data was slightly lower for females. Out of a total of 183,814 females in the EHF, 20,064 (10.9%) had missing information for subgroup attribution. Among females born in France, 8.3% could not be attributed to a specific reference or G2 category.
Note (Figures A-1 and A-2): G1 = first generation; G2 = second generation; G2m = mixed second generation. N.Afr = North African origin; S.Eur = southern European origin. G1/G2 N.Afr Def 1: first- or second-generation immigrants of North African origin, based on country of birth information only. G1/G2 N.Afr Def 2: first- or second-generation immigrants of North African origin, based on country of birth, language, and nationality information (see text for details).
Table A-3 shows how these native-born individuals with missing information for G2 vs. reference subgroup attribution were distributed according to the type of missing information (paternal and/or maternal place of birth). While the majority of these missing cases had missing place of birth for both parents, a large proportion declared one parent born in France (36.9% for males and 41.4% for females). While we cannot attribute these individuals to a specific subgroup category, we do know that they do not belong to the two main G2 categories of interest in the paper (G2 southern Europe and G2 North Africa). However, they may or may not belong to the reference category (born in France to two parents born in France). In order to examine the robustness of our results to this specific type of missing information (one parent France, other parent missing), we estimated our main model using two extreme scenarios: (1) a scenario in which none of these individuals belong to the reference category; (2) a scenario in which all of these individuals belong to the reference category.
The first scenario is examined in a model treating native-born individuals with one parent born in France and the other parent with missing country of birth (one parent France, one unknown) as a separate category. Results from this model are shown in Table A-4. The second scenario is examined in a version of the model that includes all these individuals in the reference category. Results are shown in Table A-5. Results show that while these G2 missing cases have higher mortality than the reference category (Table A-4), the hazard ratios for G2 subgroups of interest are robust to these different model specifications (Tables A-4 and A-5). In particular, the excess mortality among G2 North African males and the mortality advantage among G2 southern European males discussed in the paper are not affected by the choice of scenario for handling these G2 missing cases. While this robustness test does not address all the G2 missing cases, it addresses a substantial portion of them. The remaining cases with missing parental place of birth not addressed by this robustness test are 6,251 for males (6.0% of all native-born males) and 7,806 for females (4.9% of all native-born females).
Note: (1) 'reference' refers to individuals born in metropolitan France to two parents born in metropolitan France; (2) significance levels at ** p < 0.01, * p < 0.05, and † p < 0.10.
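The two bounding scenarios amount to a simple recoding of the subgroup variable before the model is re-estimated. The following is a minimal sketch of that recoding only, not the code used for the paper; the category labels and the toy sample are hypothetical.

```python
from collections import Counter

REFERENCE = "reference"
# Hypothetical label for native-born respondents with one parent born in
# France and the other parent's place of birth missing.
ONE_PARENT_UNKNOWN = "one_parent_france_one_unknown"


def recode(groups, scenario):
    """Return subgroup labels under one of the two bounding scenarios.

    scenario 1: ambiguous cases are kept as a separate category
    scenario 2: every ambiguous case is folded into the reference category
    """
    if scenario == 1:
        return list(groups)
    if scenario == 2:
        return [REFERENCE if g == ONE_PARENT_UNKNOWN else g for g in groups]
    raise ValueError("scenario must be 1 or 2")


# Toy sample (hypothetical, for illustration only).
sample = [
    "reference", "g2_north_africa", ONE_PARENT_UNKNOWN,
    "g2_southern_europe", ONE_PARENT_UNKNOWN, "reference",
]

counts_s1 = Counter(recode(sample, 1))
counts_s2 = Counter(recode(sample, 2))
print(counts_s1)
print(counts_s2)
```

In both scenarios the G2 categories of interest keep the same members; only the composition of the comparison group changes, which is why the two models bracket the possible effect of these cases on the hazard ratios.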

Missing survival status among EHF individuals
The results presented in the paper are based on the ELM data set, which includes only those EHF individuals who could be matched with the RNIPP (National Directory for the Identification of Natural Persons), as explained in the paper. Individuals who could not be matched with the RNIPP had an unknown survival status and were thus excluded from the final sample. As explained in the paper, the matching procedure was based on information on first and last names as well as date of birth. The overall proportion of the EHF individuals who could be matched with the RNIPP is 87.3% for males and 76.3% for females. Tables A-6 and A-7 show proportions matched for each of the subgroups identified in the above flowcharts. Results for males show that the proportions matched were highest for the reference population (90.9%) and lowest for the foreign-born groups (68.2% for G1 southern Europe and 61.0% for G1 North Africa). Second-generation immigrant groups were somewhere in between, with 82.8% matched for G2 southern Europe and 75.5% for G2 North Africa. Repatriates, whether G1 or G2, had proportions matched that were close to the reference population, which is consistent with the expectation that a large majority of repatriates had French last names (vs. Arabic last names for the North African immigrants) that were likely more easily matched with the RNIPP.
Proportions matched among females were generally lower than for males, presumably because of changes in last names after marriage, making it more difficult to match female respondents with the RNIPP.
Note (Tables A-6 and A-7): G1 = first generation; G2 = second generation; G2 mixed = mixed second generation. G1/G2 North Africa Definition 1: first- or second-generation immigrants of North African origin, based on country of birth information only. G1/G2 North Africa Definition 2: first- or second-generation immigrants of North African origin, based on country of birth, language, and nationality information (see text for details).
All individuals who could not be matched with the RNIPP were removed from our analysis because their survival status could not be ascertained. In order to assess the impact of these matching failures on our mortality estimates, we examined which background variables were associated with the probability of being unmatched using multivariate logistic regression. Results (Table A-8) confirm the matching patterns by population subgroup observed in Tables A-6 and A-7. Additionally, they show that individuals with lower education or who were unemployed were more likely to be unmatched. Being married was associated with a higher likelihood of being unmatched for females but not for males, which is consistent with the expectation that changes in last name make matching more problematic. Overall, Table A-8 suggests a downward bias in mortality estimates in the ELM due to selective exclusion of individuals from lower SES categories. Given the lower proportions matched among G2 North Africa by comparison with the reference category, the downward bias is likely to be larger for this group, suggesting that the excess mortality we find among G2 North African-origin males underestimates the true amount of excess mortality for this group.
The conclusion that the ELM produces conservative estimates of the true amount of excess mortality for G2 North African-origin males is further supported by a comparison of education distributions for all individuals (whether matched or unmatched in the RNIPP) vs. the education distributions for matched individuals only (i.e., those on the basis of whom mortality hazard ratios are estimated). Results (Table A-9) show that for the reference population and G2 southern Europe, there is little distortion in educational distribution for the matched sample vs. the entire EHF sample. For the G2 North Africa group, however, the matched sample is substantially distorted toward higher education categories. The proportions with primary education for this group are indeed systematically lower in the matched sample than in the entire EHF sample. This further suggests that the excess mortality we find for second-generation North African-origin males underestimates the true amount of excess mortality for this group.

Impact of out-migration on mortality estimates
As explained in the text, the ELM does not contain information on international out-migrations. As a result, individuals who leave France during the follow-up period (1999-2010) erroneously remain in the risk pool, producing a downward bias in mortality rates. This is a classic bias inherent to many studies in this literature (Palloni and Arias 2004). In the paper, we explain that G2 individuals are more likely to out-migrate than individuals with no immigration background, implying that the downward bias in mortality rates will be larger for G2 individuals than for the reference population. We conclude that the excess mortality we find among G2 North African males cannot be explained by a lack of information on international out-migrations.
Here we illustrate this conclusion with simulations. These simulations were carried out using a Poisson regression framework with death and exposure terms broken down into two periods (1999-2004 and 2005-2010). We applied various rates of out-migration to our two main G2 groups (North Africa and southern Europe) and examined the impact of these out-migration scenarios on incidence rate ratios. Out-migrations were uniformly distributed during the follow-up period. For example, the scenario with a 10% out-migration rate assumes that by the end of the follow-up period, 10% of the baseline G2 population left France, generating a 2.5% decrease in exposure for the period 1999-2004 and a 7.5% decrease in exposure for the period 2005-2010. The Poisson model is then estimated with dummy variables for population subgroup, age, and time period as explanatory variables. Out-migration rates in these simulations correspond to the amount of additional out-migration that these G2 groups experience relative to the reference population. (If all groups experienced the same rates of out-migration, hazard ratios would remain unbiased.) Results are shown in Figure A-3. In this figure, the baseline scenario with an out-migration rate of 0% produces results that correspond to those presented in the paper, which is expected given that the paper does not adjust for out-migration. (The Poisson framework used here produces results almost identical to those of the Gompertz framework used in the paper.) When out-migration is introduced, the incidence rate ratios systematically increase, illustrating the point made in the paper that our hazard ratios underestimate true hazard ratios whenever G2 groups experience more out-migration than the reference population. Results for G2 North African males confirm the paper's conclusion that our lack of information on out-migrations produces conservative estimates of the true hazard ratios for this population.
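The exposure reductions quoted for the 10% scenario follow directly from the uniform-departure assumption. The sketch below reproduces that arithmetic and shows its effect on a crude rate ratio. It is an illustration only: it assumes a 12-year follow-up split into two 6-year periods, the death counts, exposure, and reference rate are hypothetical, and it omits the age and period dummies of the Poisson model.

```python
def mean_fraction_departed(total_rate, start, end, follow_up_years=12.0):
    """Average share of the baseline group that has left during [start, end].

    With departures spread uniformly over the follow-up, the share gone at
    time t is total_rate * t / follow_up_years, so its mean over an interval
    equals its value at the interval midpoint.
    """
    midpoint = (start + end) / 2.0
    return total_rate * midpoint / follow_up_years


def adjusted_rate_ratio(deaths, exposure, ref_rate, loss):
    """Crude rate ratio after shrinking exposure by the fraction `loss`."""
    return (deaths / (exposure * (1.0 - loss))) / ref_rate


# 10% of the baseline group gone by the end of follow-up.
loss_p1 = mean_fraction_departed(0.10, 0, 6)    # first period: about 0.025
loss_p2 = mean_fraction_departed(0.10, 6, 12)   # second period: about 0.075

# Hypothetical inputs: 50 deaths, 10,000 person-years, reference rate 0.004.
unadjusted = adjusted_rate_ratio(50, 10_000, 0.004, 0.0)
adjusted = adjusted_rate_ratio(50, 10_000, 0.004, loss_p2)
print(round(loss_p1, 4), round(loss_p2, 4), unadjusted < adjusted)
```

Because deaths are held fixed while exposure shrinks, any positive amount of additional out-migration raises the rate ratio, which is the direction of bias described above.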
Interestingly, our simulations also show that the mortality advantage we find among G2 southern European males is also unlikely to be explained by out-migration. Even in a scenario in which 15% of individuals at baseline leave by the end of the observation period, incidence rate ratios for this group would still remain below 1 and statistically significant. Results for females show that incidence rate ratios remain insignificant whatever the amount of assumed out-migration.

Figure A-3: Mortality incidence rate ratios (ages 18-64) for second-generation immigrant subgroups estimated using different out-migration scenarios, France, 1999-2010
A different type of mechanism that could potentially affect our mortality estimates involves selective out-migration of healthier G2 individuals prior to the baseline year of 1999. Indeed, if such selective out-migration was taking place before 1999, this would make the baseline sample in 1999 less healthy than in the absence of out-migration, potentially generating an upward bias in mortality estimates. The importance of this mechanism is difficult to assess in the absence of longitudinal follow-up since birth. However, it is unlikely that the excess mortality we observe during the follow-up period among G2 North African-origin males would be explained by this mechanism, because out-migration among this population, while higher than for the reference population, is estimated to be rather small (Richard 2004). Also, a recent study has shown that the mortality disadvantage among second-generation North African individuals is already observed at infant ages, a result that cannot be explained by left-truncation bias (Wallace, Guillot, and Khlat 2019).

Overall quality of the ELM for mortality estimation purposes
The previous sections examine the robustness of our mortality estimates to various sources of errors in the ELM. In this section, we examine the overall quality of the ELM data for mortality estimation purposes by comparing ELM-based adult mortality estimates with mortality estimates based on official exhaustive census and vital registration (VR) data. This unlinked data forms the basis for the calculation of official life tables in France and thus constitutes a useful comparison point for evaluating the ELM-based mortality data.
This comparison is possible for only the native-born and the foreign-born, since G2 status cannot be derived from the information available on death certificates. We were able to access VR death data by nativity for 2005-2009, a period that is not exactly the same but overlaps with the time frame of the ELM (1999-2010). Exposure terms by sex and nativity for the period 2005-2009 were derived from census information. Deaths and exposure terms were then combined to calculate age-specific mortality rates, which were then converted into probabilities of dying between age 18 and 65 using standard life table methodologies. (For more information about these sources, see Guillot et al. 2018.) Table A-10 compares these probabilities of dying calculated on the basis of these two different data sources. Results show that despite its limitations, the ELM produces mortality estimates that are reassuringly close to those based on exhaustive official VR data. The ELM has a tendency to underestimate mortality, which is expected given the discussion of biases in the preceding sections. Nonetheless, the difference is never more than 9%. The difference is even smaller for the foreign-born population, despite the specific data limitations inherent in this population. Overall, this comparison suggests that despite its limitations, the ELM appears to be a reliable source of mortality information for these population subgroups in France.
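As an illustration of how age-specific rates are converted into a probability of dying between ages 18 and 65, the sketch below applies one common life table conversion. It assumes that deaths occur on average at the midpoint of each age interval, which may differ from the exact method used for Table A-10; the age grouping and the mortality rates are hypothetical.

```python
def prob_dying(rates, widths):
    """Probability of dying across a run of consecutive age intervals.

    rates:  age-specific death rates (deaths per person-year), one per interval
    widths: interval lengths in years
    Assumes deaths fall, on average, at the midpoint of each interval.
    """
    survival = 1.0
    for m, n in zip(rates, widths):
        q = n * m / (1.0 + 0.5 * n * m)  # probability of dying in the interval
        survival *= 1.0 - q
    return 1.0 - survival


# Age groups 18-19, 20-24, ..., 60-64: 47 years in total.
widths = [2] + [5] * 9

# Hypothetical death rates, for illustration only.
rates = [0.0006, 0.0008, 0.0009, 0.0011, 0.0015,
         0.0023, 0.0036, 0.0055, 0.0080, 0.0115]

q_18_65 = prob_dying(rates, widths)
print(round(q_18_65, 4))
```

The same function can be applied to rates from either data source, so that the two resulting probabilities are compared on an identical basis.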

Defining the G1 and G2 North African-origin group
According to the French national statistical office, an immigrant is a person residing in France who was born abroad and had a foreign nationality at birth. Building on that definition, a second-generation immigrant is a person residing in France who was born in France and had at least one immigrant parent. As explained in the text, we generally relied only on country of birth information for determining first- and second-generation immigrant status. The reason is that although we have information on the respondent's country of birth, nationality at birth, and parental country of birth in our data, we do not have information on parental nationality at birth, which would be necessary for determining second-generation immigrant status as officially defined. Using country of birth as the sole piece of information for identifying immigrants is an acceptable approximation for most countries of birth because the proportion of foreign-born individuals who have a French nationality at birth is negligible in most cases. In the case of North African countries, however, this approximation is problematic. Indeed, a substantial share of France's North Africa-born population includes 'repatriates,' a group of individuals who were born in Algeria during the colonial period and relocated to France following Algeria's independence in 1962. Repatriates include three main categories: (1) individuals of European descent; (2) North African Jews; (3) some North African Muslims, including soldiers who fought with the French army against independence (also called 'harkis') and officials of the former colonial administration who feared for their security in postindependence Algeria. Typically these 'repatriates' are not considered immigrants in the French context because not only were they French by birth but they did not lose their French nationality after Algeria's independence (Beauchemin, Hamel, and Simon 2016).
Available estimates show that among repatriates, individuals of European descent constituted by far the largest category (about 80%) (Moumen 2010).
In response to the specificity of North African countries, and in the absence of information on parental nationality at birth, we used additional variables to identify immigrants (vs. repatriates) from North Africa and their native-born children. As explained in the text, our approach uses language and nationality information for this distinction.
This approach is not perfect. In particular, it tends to classify repatriates of the third category (North African Muslims) as immigrants rather than repatriates. We believe this is not problematic for our paper because (1) North African Muslims represent a small percentage of the 'repatriates' category (about 9%); and (2) although they are not considered 'immigrants' per se due to their nationality status, they share the same ethnic background with their immigrant counterparts and are thus likely to face similar barriers to education and employment in France. Our approach also classifies second-generation immigrants from North Africa who report that their parents spoke to them only in French and who had a French nationality at birth as children of repatriates. We also believe that this will have a small impact on estimates because studies have shown that while proficiency in Arabic or Berber at adult ages among G2 North African-origin individuals is somewhat variable, exposure to these languages as children in households with two immigrant parents is very high. In the TeO survey, 86% of G2 individuals with two North African immigrant parents reported some exposure to Arabic or Berber when they were children (Condon and Régnard 2016, Annex 5). Moreover, our use of a second variable (nationality at birth) for those reporting that their parents spoke to them only in French at age 5 further alleviates concerns that some second-generation immigrants may be misclassified in our study as children of repatriates. Figures A-1 and A-2 show that the distinction between immigrants and repatriates is not trivial demographically. Among males, 41% of respondents born in North Africa are identified in the EHF as repatriates and 36% of respondents born in France to two parents born in North Africa are identified as native-born children of repatriates. For females, the proportions are 45% and 39%, respectively.
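The rule described above for native-born respondents with two parents born in North Africa can be written as a two-condition test. The sketch below covers only that case as it is stated in this appendix; the full attribution procedure is described in the paper's text. The function name, argument names, and labels are hypothetical, and both inputs are assumed to be non-missing.

```python
def classify_g2_north_africa(parents_spoke_only_french, french_at_birth):
    """Classify a native-born respondent with two North Africa-born parents.

    parents_spoke_only_french: True if the respondent reports that the
        parents spoke to them only in French during childhood
    french_at_birth: True if the respondent had French nationality at birth

    A respondent is treated as a child of repatriates only when both
    conditions hold; otherwise as a second-generation immigrant.
    """
    if parents_spoke_only_french and french_at_birth:
        return "g2_repatriates"
    return "g2_north_africa"


# All four combinations of the two indicators.
for only_french in (True, False):
    for french in (True, False):
        label = classify_g2_north_africa(only_french, french)
        print(only_french, french, label)
```

Requiring both conditions keeps the repatriate category narrow: a respondent exposed to Arabic or Berber at home, or without French nationality at birth, remains in the G2 North Africa group.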
In this section we examine the impact of making this distinction between immigrants and repatriates from North Africa on our results. First, instead of excluding repatriates from the analysis, as we do in the paper, we treated them as separate G1 and G2 categories, allowing us to examine whether repatriates and their native-born children have a distinct mortality pattern (Table A-11). Second, we estimated our model without making the distinction between immigrants and repatriates from North Africa; that is, we treated all individuals born in North Africa as first-generation immigrants from North Africa and all individuals born in France to two parents born in North Africa as second-generation immigrants from North Africa (Table A-12).
Results including repatriates as separate G1 and G2 categories (Table A-11) show that for the G2 male subgroups, there is a clear distinction between the children of immigrants per se (G2 North Africa), who as we know exhibit excess mortality, and the children of repatriates (G2 repatriates), who have mortality levels that are not statistically different from the reference population. This is expected given that for the most part children of repatriates from North Africa do not face the same barriers to education and employment as children of immigrants from North Africa per se (Beauchemin, Hamel, and Simon 2016).
Models using country of birth information only for the identification of immigrant subgroups (i.e., without making the distinction between immigrants and repatriates from North Africa) are presented in Table A-12. Results show that when repatriates and immigrants from North Africa are merged, the hazard ratio for G2 North African males decreases from 1.82 to 1.32 and loses significance. This is also expected given the more favorable mortality patterns of G2 repatriates. It illustrates the importance of going beyond parental country of birth information when examining second-generation immigrants from North Africa in the French context.
Note: (1) 'reference' refers to individuals born in metropolitan France to two parents born in metropolitan France; (2) significance levels at ** p < 0.01, * p < 0.05, and † p < 0.10.

Using alternative age breakdowns
The hazard ratios presented in the paper are based on mortality risks for the age range 18 to 64, summarizing mortality at working ages. In this section, we examine whether the hazard ratios for population subgroups vary depending on different age specifications in order to better target ages within the adult age range where subgroups may be particularly vulnerable or advantaged. We focus on the following age groups: 18 to 44 and 45 to 64. These two age groups are distinct epidemiologically, with a larger share of external causes in the age range 18 to 44 and a larger share of noncommunicable diseases in the age range 45 to 64. Results are presented in Table A-13 for mortality at ages 18 to 44 and in Table A-14 for mortality at ages 45 to 64. The main lesson of this exercise is that excess mortality among second-generation North African-origin males is particularly salient in the age range 18 to 44, with a hazard ratio of 2.02, higher than when considering the entire 18 to 64 age range. No statistically significant excess mortality is detected for this population subgroup at ages 45 to 64. The reverse is true for second-generation southern European-origin males: Their advantage is salient in the age range 45 to 64 but not in the age range 18 to 44.
Results for G2 females, which were not significant in the age range 18 to 64, remain insignificant in these models with alternative age breakdowns.
Note: (1) 'reference' refers to individuals born in metropolitan France to two parents born in metropolitan France; (2) significance levels at ** p < 0.01, * p < 0.05, and † p < 0.10.