Replication materials for: “Summertime, and the livin’ is easy:
Winter and summer pseudoseasonal life expectancy in the United States” 
to appear in Demographic Research

This file prepared by Tina Ho & Andrew Noymer.  noymer@uci.edu

PREAMBLE:

This project uses two languages, IDL and Stata.  IDL is further described at:
http://www.harrisgeospatial.com/ProductsandTechnology/Software/IDL.aspx and 
is very similar to MATLAB.  GDL is syntax-compatible with IDL:
http://gnudatalanguage.sourceforge.net/  Stata is further described at:
http://www.stata.com/ and will be more familiar to the social science community.


FILES IN THIS ZIPFILE:
-------------------------
flowchart.txt       THIS FILE
deaths.zip          described at (i) below
exposure.zip        described at (ii) below
misc_programs.zip   described at (iii) below
misc_data.zip       described at (iv) below


All files w/extension "*.do" are Stata programs
All files w/extension "*.pro" are IDL programs
All files w/extension "*.tsv" are tab-sep-value ASCII data


Source data:
--------------------
(i) Deaths:
  deaths.zip 
this file contains male_all_causes.txt and female_all_causes.txt,
which are tab-delimited ASCII files of deaths by 22 age groups and by
month.  Extracted from NCHS data:
  www.cdc.gov/nchs/data_access/vitalstatsonline.htm 
  www.nber.org/data/vital-statistics-mortality-data-multiple-cause-of-death.html 
The publicly available USA mortality data have monthly time resolution.

(ii) Exposures:
  exposure.zip
this file contains  exposure_22groups_monthly.txt, exposures by 22 age
groups and by month.  These have been interpolated from HMD annual
exposure data.  The IDL code that does this is
interpolate-denominators.pro  (this takes as input:
leap-year.txt [this is just a table of leap years so that
the program can know how many days to budget for February] and
hmd_1x1 [these are 1x1 exposures, straight from HMD website] and 
gives as output: exposure_1x1_monthly.txt).  Also: make_22_groups.do,
a Stata program that condenses exposure_1x1_monthly.txt into
exposure_22groups_monthly.txt.
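
To illustrate why the leap-year table matters, here is a minimal Python
sketch (not the IDL code in interpolate-denominators.pro).  It assumes the
simplest possible scheme: a year's exposure is split across months in
proportion to the number of days in each month.  Whether the IDL program
does exactly this is not stated here, so treat the scheme, the function
name monthly_exposure, and the example numbers as assumptions.  Python's
calendar module stands in for leap-year.txt.

```python
import calendar

def monthly_exposure(year, annual_exposure):
    """Split one year's exposure across months in proportion to days per month.

    This is an assumed apportionment rule, used only for illustration.
    """
    # monthrange returns (weekday of first day, number of days in month)
    days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
    total_days = sum(days)  # 366 in leap years, 365 otherwise
    return [annual_exposure * d / total_days for d in days]

# Made-up exposure totals, chosen so each day carries one person-year.
feb_2000 = monthly_exposure(2000, 366.0)[1]  # leap year: February has 29 days
feb_2001 = monthly_exposure(2001, 365.0)[1]  # common year: February has 28 days
```

With these made-up inputs February receives 29 units in 2000 and 28 units
in 2001, and the twelve monthly values always add back up to the annual
total.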


(iii) misc_programs.zip

contains:

(A) collapse_by_season_v01.do
    Stata program.  Takes as input: female_all_causes.txt &
    male_all_causes.txt [see (i)] and outputs
    numerators_pseudoyear.dta, which is the deaths, arranged by pseudoyear,
    in Stata format.  wide_2_long_v01.do takes this file as input and
    outputs numerators_long.dta, the same data, re-shaped, in Stata format.

(B) collapse_exposure_by_season_v01.do
   Stata program.  Takes as input: 
   exposure_22groups_monthly.txt [from (ii)] and outputs:
   exposure_pseudoyear.dta which is a Stata-format dataset
   with exposure shaped into pseudoseasons.

(C) calc_e0.do  takes as input exposure_pseudoyear.dta
   (from B) and numerators_long.dta (from A), and outputs
   summer_e0.csv and winter_e0.csv which are pseudoseasonal life
   expectancy data.  These (along with E0per.csv, which are the HMD
   data for calendar years) are source data for Figures 2&3 of the
   paper.

(D) extract_heatmap_v00a.do takes as input exposure_pseudoyear.dta
   (from B) and numerators_long.dta (from A) and outputs: 
   heatmap_male.tsv heatmap_female.tsv, to make Figure 4.

(E) PH_v010.pro takes as input: males.tsv females.tsv and 
    calculates proportional hazard data (Figures 5-7 and proportional
    hazard information in the text body).  These input files come from
    extract_data.do, which, in turn, takes as input canon_dataset.dta,
    which is produced by make_canon_dataset_v00.do, which itself
    takes as input numerators_long.dta and exposure_pseudoyear.dta (see A and B).

(F) gompertz_example_v04.do takes as input numerators_long.dta and
    exposure_pseudoyear.dta (see A and B) and performs Poisson-Gompertz regression.
    The output of this gets cut+pasted from Stata to
    gompertz_example_results.tsv, which gompertz_example_v03.pro will
    convert to Table 1.
    
(G) interpolate-denominators.pro & make_22_groups.do
    See (ii), above.


(iv) misc_data.zip:

as explained above:
  males.tsv females.tsv  canon_dataset.dta
  exposure_pseudoyear.dta numerators_long.dta
  numerators_pseudoyear.dta
  exposure_1x1_monthly.txt
  gompertz_example_results.tsv
  summer_e0.csv winter_e0.csv
  E0per.csv heatmap_male.tsv heatmap_female.tsv
  hmd_1x1

as well as:
  pct_positive.tsv : CDC flu data, obtained from the source listed in
  the paper.  Used to make Figure 1.


--------------------------------------------------------------------------------
NARRATIVE EXPLANATION:
---------------------

To do everything from scratch:

Start at collapse_by_season_v01.do, which is run in Stata, and which
needs as input female_all_causes.txt & male_all_causes.txt (from
deaths.zip).  This will produce numerators_pseudoyear.dta; then
run wide_2_long_v01.do which will make numerators_long.dta.
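
For readers unfamiliar with a wide-to-long reshape, the idea can be
sketched in Python as follows.  This is not the Stata code in
wide_2_long_v01.do; the column names (pseudoyear, age0, age1) and the
numbers are hypothetical, since the actual variable names in
numerators_pseudoyear.dta are not listed in this file.

```python
def wide_to_long(rows, id_key, prefix):
    """Turn one wide row per id into one long row per (id, age group)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                long_rows.append({
                    id_key: row[id_key],
                    # the suffix after the prefix identifies the age group
                    "agegroup": int(key[len(prefix):]),
                    "deaths": value,
                })
    return long_rows

# Hypothetical wide data: one row per pseudoyear, one column per age group.
wide = [
    {"pseudoyear": 1960, "age0": 10, "age1": 4},
    {"pseudoyear": 1961, "age0": 9, "age1": 5},
]
long_rows = wide_to_long(wide, "pseudoyear", "age")
```

Each wide row with two age-group columns becomes two long rows, so the
example yields four rows in total.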

Then run collapse_exposure_by_season_v01.do in Stata, which
takes as input exposure_22groups_monthly.txt (from exposure.zip). This
outputs exposure_pseudoyear.dta.

Then run calc_e0.do in Stata.  You have now reproduced the data for
Figs 2 & 3 and the corresponding parts of the paper.
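
This file does not spell out how calc_e0.do turns deaths and exposure
into life expectancy.  A standard period life table is one way to do it,
and the Python sketch below shows that approach.  It is not a translation
of calc_e0.do: the function name, the a0 default, the mid-interval
assumption for deaths, and the example inputs are all illustrative
choices.

```python
def life_expectancy_at_birth(deaths, exposure, widths, a0=0.1):
    """Period life expectancy at birth from age-specific deaths and exposure.

    widths: length in years of every age interval except the open-ended last.
    a0: assumed average years lived in the first interval by those dying in it.
    """
    rates = [d / e for d, e in zip(deaths, exposure)]
    survivors = 1.0     # l_x with a radix of 1
    person_years = 0.0  # running sum of L_x
    for i, n in enumerate(widths):
        m = rates[i]
        a = a0 if i == 0 else n / 2.0        # deaths at mid-interval (assumed)
        q = min(n * m / (1.0 + (n - a) * m), 1.0)  # probability of dying
        dying = survivors * q
        # survivors live the full interval; those dying live a years on average
        person_years += n * (survivors - dying) + a * dying
        survivors -= dying
    # Open-ended last interval: everyone remaining dies at the last rate.
    person_years += survivors / rates[-1]
    return person_years

# Sanity check with a made-up constant death rate of 0.01 in every age group:
# life expectancy under a constant hazard m is 1/m, so this should be near 100.
e0_check = life_expectancy_at_birth([1.0] * 5, [100.0] * 5, [1, 4, 5, 5])
```

The constant-rate check is useful because the expected answer is known in
advance, which makes it a cheap test of any life-table routine before it
is applied to the pseudoseasonal data.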

Then run extract_heatmap_v00a.do in Stata.  You have now reproduced
the data for Figure 4 and the corresponding parts of the paper.

Then run make_canon_dataset_v00.do in Stata. This creates
canon_dataset.dta.  Then run extract_data.do in Stata.  This
creates males.tsv and females.tsv.  Then run PH_v010.pro in IDL.
You have now reproduced the data for Figs 5-7 and the corresponding
parts of the paper.

Then run gompertz_example_v04.do in Stata.  Cut and paste its output
from Stata into gompertz_example_results.tsv.  Run
gompertz_example_v03.pro in IDL to make Table 1.

To re-create Figure 1, use the data in pct_positive.tsv.

You have now replicated the entire paper.
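
The run order above can be checked mechanically.  The Python sketch below
records, for each program, which other programs must have run first (as
described in this file) and asks the standard library for a valid order.
It is only a bookkeeping aid and is not part of the replication archive;
the variable names are made up for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Shared inputs of the downstream Stata programs.
base = {"wide_2_long_v01.do", "collapse_exposure_by_season_v01.do"}

# program -> set of programs whose output it needs
needs = {
    "collapse_by_season_v01.do": set(),
    "wide_2_long_v01.do": {"collapse_by_season_v01.do"},
    "collapse_exposure_by_season_v01.do": set(),
    "calc_e0.do": set(base),
    "extract_heatmap_v00a.do": set(base),
    "make_canon_dataset_v00.do": set(base),
    "extract_data.do": {"make_canon_dataset_v00.do"},
    "PH_v010.pro": {"extract_data.do"},
    "gompertz_example_v04.do": set(base),
    # a manual cut+paste step sits between these two Gompertz programs
    "gompertz_example_v03.pro": {"gompertz_example_v04.do"},
}

# static_order yields every program after all of its prerequisites
run_order = list(TopologicalSorter(needs).static_order())
```

Any order produced this way is valid; the narrative above is one such
order.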

The above narrative assumes you start with
exposure_22groups_monthly.txt as the exposures data.
If, in turn, you wish to reproduce these, use
interpolate-denominators.pro in IDL, which starts
with the raw HMD annual exposure data, and then run
make_22_groups.do in Stata [see (ii)].

EOF.