This note describes the steps needed to replicate the results in the paper “The importance of correcting for survey non-response when estimating health expectancies: Evidence from The HUNT Studies”. It is also available as the PDF file “InstructionsForReplication.pdf” in the archive zip folder. FS 23/11-2023.

1—Access to the datasets
The dataset used in the study consists of health data from waves 1, 2 and 3 of The HUNT Studies and administrative register data on income and educational attainment. To protect participants' privacy, the HUNT Research Centre aims to limit storage of data outside the HUNT Databank and cannot deposit data in open repositories. However, the HUNT Databank has precise information on all data exported to different projects and can reproduce these exports on request.
To access the health data, the research team needs to send a reasoned application to the HUNT Data Access Committee, using the format explained at https://www.ntnu.edu/hunt/data.
Researchers from other countries are welcome to apply in cooperation with a Norwegian principal investigator.  Since the health data are to be merged with administrative data, the application needs to include a Data Protection Impact Assessment (DPIA).
The application to HUNT requires that the project has been approved by a Regional Committee for Medical and Health Research Ethics (REK).  Such approval requires an application to the REK portal.  
The REK administration then allocates the application to one of the regional REK committees.  New applications are as a rule allocated to the regional committee for the region where the principal 
investigator’s institution is located.  The details about applying to REK can be found at https://rekportalen.no/?#hjem/s%C3%B8ke_REK.  
When the aim is to replicate the results in the paper, researchers should refer to project “2010/9328, Changes in disability-free expected lifetime in Norway (as life expectancy increases)”.  This project 
has project id number 102049.  
After the approval by REK and by HUNT, researchers can apply to Statistics Norway (Section for microdata) to get the HUNT data merged with the administrative data on income and educational attainment.  Information on the application procedure is available at https://www.ssb.no/en/data-til-forskning/utlan-av-data-til-forskere.  Researchers should refer to project “11/2172 Changes in disability free life expectancy” in order to apply for exactly the same register data on income and education that were used in the project. Statistics Norway owns the scrambling key with which the 
HUNT data were merged with the administrative data.  

2—Software
The software that was used is Stata 14/MP.  

3—The data sets
The following data sets were made available as Stata data files:
--“data102049.dta”: the original HUNT data file covering HUNT1, HUNT2 and HUNT3. It contains 106440 records corresponding to distinct residents of Nord-Trøndelag who participated at least once in the HUNT Studies.
--“dato_inviterte.dta”: a HUNT file containing the date of invitation for each resident ever invited to HUNT1, HUNT2 or HUNT3. It contains 139419 records. It has more records than the previous data file because not all invited residents participated at least once. This file was received after the original HUNT data file.
--“MaritalStatusFileH3.dta”: a HUNT file containing, for each resident ever invited to HUNT1, HUNT2 or HUNT3, the marital status of that resident during HUNT3 if that resident participated in HUNT3. This file was received after the original HUNT data file, when it transpired that the latter did not contain marital status information for HUNT3.
A researcher wishing to replicate the results in the paper should therefore order a HUNT data file that includes the date of invitation and marital status during HUNT3.
--“w11-2172_inntekt_ut.dta”: a Statistics Norway file containing comprehensive income for each of the years 1993-2010 for all residents of Nord-Trøndelag (NT) who participated in at least one of the HUNT Studies.
Comprehensive income is a net income measure calculated by the tax authorities for each person subject to taxation in Norway.  It consists of all types of taxable income (labour income, capital income, entrepreneurial income, transfer income) minus all deductible expenses and losses. 
--“w11-2172_pensjon_ut.dta”: a Statistics Norway file containing pension-entitling income for the years 1993-2010 for all residents in NT who participated in at least one of the HUNT Studies. Pension-entitling income is the sum of all income that forms the basis for building up pension rights. It is a measure of attachment to the labour market.
--“w11-2172_utdanning_ut.dta”: a Statistics Norway file containing, for all residents in NT who participated in at least one of the HUNT Studies, the degrees completed since 1970 (6-digit education code, following the NUS2000 classification) and the year and month of completion of each degree. If the person had completed degrees before November 1970, the first recorded degree is the highest degree completed as of November 1970.

4—Building up the dataset that is used for the analysis
Note: These instructions assume that the do-files reside under “S:\Fred\do\” and the data files under “S:\Fred\data\”. The directory path “S:\Fred\” should be replaced throughout by the researcher's own path.
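One way to keep this path replacement in a single place is to build every path from a global macro. The sketch below is only an illustration: the macro name “root” is hypothetical, and the do-files themselves may hard-code the path, in which case they would have to be edited accordingly.

```stata
* Assumption: "root" is a hypothetical macro name; set it to your own folder.
global root "S:/Fred"

* Forward slashes work on Windows and avoid "\$" being read as an escape,
* which would stop Stata from expanding the macro.
display "$root/do/"
display "$root/data/data102049.dta"
```

With such a macro in place, a command like use "$root/data/data102049.dta", clear would no longer depend on the original drive letter.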
The dataset can be constructed by running the do-file “DRMaster Hunt 2018.do” in the folder “DataManagement”. This do-file calls the following do-files:
--"DRkobling_msH3fil.do"
Marital status for residents invited to H3 was not included in the original dataset received from HUNT. It was sent later as the data file “MaritalStatusFileH3.dta”, which contains 139419 records. The above do-file merges this data file with the data file “dato_inviterte.dta” and stores the result as “dato_inviterte_inkl_msH3.dta”.
--"DRUtdanning.do"
The code creates a variable for highest achieved education in the "main" H1, H2 and H3 year.
--"DRInntekter.do" 
This code constructs the mean of pension-entitling (p-e) income over the three years during which each HUNT survey was carried out. The constructed variables are stored in "P_Inntekter.dta". Next, it constructs the corresponding mean of comprehensive income (only for H2 and H3, since no data on comprehensive income are available before 1993). The constructed variables, together with the variables in "P_Inntekter.dta", are stored in the data file "Inntekter.dta".
--"DRkobling_registerdata.do" 
This code merges the HUNT data file "data102049.dta" with the constructed data file on highest educational achievement ("Utdanning.dta") and the constructed data file on p-e and comprehensive incomes ("Inntekter.dta").
--"DRnye_data_alle_inviterte.do" 
This code merges the current data file with the aforementioned file "dato_inviterte_inkl_msH3.dta".
--"DRUtdanningskategorierNy.do" 
This code defines the highest educational attainment using the Norwegian classification system for education; see NUS2000 (Norwegian Standard Classification of Education, 2000 version).
--"DRage_agecategory_ny.do" 
This code first constructs an income variable that corresponds to pension-entitling income in the year a person is invited to a HUNT survey. Next, it constructs the age of both participants and non-participants as the difference between the invitation year and the year of birth. It also constructs 5-year age group categories.
--"DRFunctional impairment.do"
This code (i) names central variables, (ii) constructs indicator variables for the different marital statuses, (iii) constructs the functional impairment indicator used in the paper (the precise coding of 
this variable is explained in Section A.3 (Appendix A) of the paper).  The code also renames the variables for illness diagnosis.
Finally, the open data file is compressed and saved as "DRworkingfile_new.dta".  This data file contains 139411 records. 
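The data-management steps listed above can be illustrated on a toy dataset. The sketch below is not the code of the do-files: all variable names (pid, ms_h3, birth_year, inv_year, nus2000, inc1995 to inc1997) and all values are hypothetical, the 1:1 merge assumes one record per invited resident in both files, and the years 1995-1997 are used only as an example of three survey years. It relies on the fact that the first digit of a 6-digit NUS2000 code denotes the level of education; how the do-files group these levels is not shown here.

```stata
* Toy data only; pid and all other variable names are hypothetical.
clear
input long pid byte ms_h3
1 1
2 2
3 .
end
tempfile marital
save `marital'

clear
input long pid birth_year inv_year long nus2000 inc1995 inc1996 inc1997
1 1950 1995 401101 200000 210000 220000
2 1921 1995 201199 150000 . 170000
3 1970 1995 641120 300000 310000 320000
end

* One record per invited resident in both files, hence a 1:1 merge.
merge 1:1 pid using `marital'
assert _merge == 3
drop _merge

* Age at invitation and 5-year age groups labelled by their lower bound.
generate age = inv_year - birth_year
generate agegrp5 = 5 * floor(age / 5)

* The first digit of a 6-digit NUS2000 code gives the education level.
generate byte edulevel = floor(nus2000 / 100000)

* Mean income over the three survey years; rowmean() skips missing values.
egen double mean_inc_h2 = rowmean(inc1995 inc1996 inc1997)

list pid age agegrp5 edulevel mean_inc_h2 ms_h3
```

Note that rowmean() averages over the non-missing years only; whether the original code treats missing income years in the same way is not stated in this note.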
Next, one should run the do-file “DRCreationOfPanelStructure.do”.  This code accesses "DRworkingfile_new.dta" and creates a dataset with panel structure "DRHUNT_panel.dta".  This data file contains 418233 records (=3 waves x 139411 records per wave).  The variables used in the analysis, and their descriptive statistics, are presented in Descriptive Appendix A.4 of the paper. 
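The record count of the panel file follows from stacking one record per person and wave: 3 x 139411 = 418233. A minimal sketch of such a wide-to-long conversion is given below, assuming hypothetical variable names (pid, fi1 to fi3) and toy values; the do-file may build the panel in a different way.

```stata
* Toy wide file: one record per person, one column per wave (hypothetical names).
clear
input long pid fi1 fi2 fi3
1 0 0 1
2 . 1 1
end
local n_persons = _N

* Stack the three wave-specific columns into one record per person and wave.
reshape long fi, i(pid) j(wave)

* The long file has exactly three records per person.
assert _N == 3 * `n_persons'
list, sepby(pid)
```

Persons who did not participate in a wave keep their record for that wave, with missing values for the survey variables, which is why the panel has the same number of records in every wave.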

5—The statistical analysis
The do-files for the statistical analysis reside in the folder “StatisticalAnalysis”. The main do-file is “DRSelectionAttritionHUNTPanelWaves18.do”. This file loads into memory six user-written programmes that are stored in the do-file “AETprobitNew.do”. These programmes are:
--datetime
--AETprobit
--myboot_pFIforH2andH3men
--myboot_pFIforH2andH3women
--myboot_pFIforH2andH3menSA
--myboot_pFIforH2andH3womenSA
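In Stata, a programme becomes available in memory once the do-file containing its “program define” block has been run. The sketch below shows the mechanism with a hypothetical programme named show_time; it is not one of the six programmes listed above, and their actual content is not reproduced here.

```stata
* Drop any earlier definition so the do-file can be rerun without error.
capture program drop show_time

* Hypothetical programme that prints the current date and time.
program define show_time
    display "Started: " c(current_date) " " c(current_time)
end

* Once defined, the programme can be called like any other command.
show_time
```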

“DRSelectionAttritionHUNTPanelWaves18.do” accesses "DRHUNT_panel.dta" and goes through the following steps. (Line numbers refer to the PDF version of the do-file “DRSelectionAttritionHUNTPanelWaves18.do”, stored as “DRSelectionAttritionHUNTPanelWaves18.do.pdf” in the folder “StatisticalAnalysis”.)
A—(lines 1—203) Uploading in memory of the programmes in the do-file “AETprobitNew.do”.  
Construction of new variables: age categories, age splines, income variables, bmi categories, residence indicator, in-sample indicators, temporary attrition indicator.
B—(lines 205—245) Construction of  Table A.4.-1 with the descriptive statistics, and identification of relevant ages mentioned in the main text.
C—(lines 248—276) Preamble for the main analysis: construction of various variables and of labels to be attached to the results.
D—(lines 279—378)  Estimations corresponding to “Step 2” of Section 4.2 in the text.  Maximum likelihood estimation of the model under the restriction “SoU = 1 x SoO” using the AETprobit code.  If 
“dothis” equals 1, the code bootstraps the AET estimations to obtain the correct standard errors.   
The code next computes the corresponding average partial effects (which are the results reported in Table 3) with the associated bootstrapped standard errors. 
E—(lines 381—904) Estimations corresponding to “Step 3” of Section 4.2 in the text.  First for men.  
Maximum likelihood estimation of the model under the restriction “SoU = 1 x SoO” using the AETprobit code.  Construction of the age profiles for the prevalence of FI.  If “dothis” equals 1, the 
code bootstraps the AET estimations to obtain the correct standard errors; it next computes the corresponding average partial effects (which are the results reported in Table 4) with the associated 
bootstrapped standard errors (lines 466-483). Next, the code bootstraps the entire procedure (i.e. steps 1-3) to obtain the standard errors for the estimated age profiles (note that this takes up to 26 hours for men). For this it uses the routine “myboot_pFIforH2andH3men”, which is included in the do-file “AETprobitNew.do” (lines 486-528). The results of this bootstrapping procedure are then stored in a new data file “outputAETimpfinalMay23.dta” (lines 530-566). This data file contains only 26 records (13 age groups x 2 genders). Next, the code performs the sensitivity analysis explained in Technical Appendix B.5 (lines 568-613). For this it uses the routine “myboot_pFIforH2andH3menSA”. These results are also stored in “outputAETimpfinalMay23.dta” (lines 615-645). Lines 648-904 do the same as above, but now for women (the bootstrapping of the entire procedure takes about 36 hours for women).
F—(lines 907—1126) Instructions to compute the age profiles for participation rates and the answer to the main FI question, necessary to construct Figures 1, 2 and A.3-1 in the paper.  These age 
profiles are also stored in the data file “outputAETimpfinalMay23.dta”.
G—(lines 1128—1205) Instructions to compose Table 1 of the paper.
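Steps D and E use the switch “dothis” to decide whether the time-consuming bootstrap is run. The sketch below illustrates this pattern under several assumptions: the switch is written as a local macro, Stata's built-in probit stands in for the user-written AETprobit routine, and the bundled auto data replace the HUNT panel. The number of replications and the seed are arbitrary.

```stata
* Illustration on Stata's bundled auto data, not on the HUNT data.
sysuse auto, clear

* Assumption: the switch is a local macro; the do-file may define it differently.
local dothis = 1

if `dothis' == 1 {
    * Built-in probit stands in for the user-written AETprobit routine.
    bootstrap _b, reps(50) seed(12345): probit foreign mpg weight
}
else {
    * Without the bootstrap, only the usual ML standard errors are reported.
    probit foreign mpg weight
}
```

Setting the switch to 0 while testing the rest of the code avoids the long run times mentioned in step E.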

6—Construction of the figures
The do-file for the construction of the figures is called “DRPlottingFIandFILEasPDF.do”, which resides in the folder “FigureConstruction”.  This code accesses the data file “outputAETimpfinalMay23.dta” 
containing all the age profiles in order to build up the different figures in the paper. The figures are stored in the folder “graphs” with directory path “M:\ALLEMINE\DFLE-HUNT\Analyse\graphs”.  The 
directory path “M:\ALLEMINE\DFLE-HUNT\Analyse\” should be replaced by the appropriate path for the researcher.