The estimation of birth intensities from EU-SILC data
Readme, Angela Greulich, Laurent Toulemon
Example: Filling and Reading the document Germany2010.xlsx:
1) Getting the data
2) Data compilation: STATA do-file “datacompil.do”
3) Data output: STATA do-file “dataoutputDE.do”
4) Data calculation with Excel “Germany2010.xls”
1) Getting the data
Eurostat, the provider of EU-SILC data, does not allow us to deposit the data. We are also not allowed to provide a small sample dataset for illustrative purposes.
Basic meta-information about the data files and the way for getting access are, however, provided in a separate document, called meta-file document.
2) Data compilation (STATA program code)
We illustrate here our data compilation with EU-SILC, using the cross-sectional waves 2011, 2012 and 2013. These waves were used to calculate period fertility measures for calendar year 2010 (including our semi-retrospective approach). Three waves have been grouped together for sample size-reasons.
For wave 2011, for example, we used version “rev3” of the R-file (register file) for obtaining demographic information, and “rev3” of the P-file (personal file) for obtaining information about the educational level of women.
3) Data output (STATA data output code)
We illustrate here how we use our compiled data for obtaining a series of descriptive statistics that are necessary to calculate education-, birth order- and age-specific period fertility measures, for the example of Germany. The descriptive statistics we obtain in this step are then copied into an Excel file. They include frequencies with weight (“freq ww”), frequencies without weight (“freq”) as well as fertility rates (“tcb”).
4) Further calculation with Excel
The Excel file “Germany2010.xls” contains several sheets allowing to transform the copied education-, birth-order and age-specific frequencies into birth intensities, including Bayesian statistics (prior/posterior calculation) to obtain credible intervals for small sample size-categories. Rates are used to calculate confidence intervals.
In the following, we go into more detail for each sheet, from left to right:
Sheet freq ww
Weighted sample size by education, (women N and births n), age, birth order (parity+1: women at parity r give birth to children of birth order r+1). All data come from STATA file.
Sheet freq
Unweighted sample size by education, (women N and births n), age, birth order. All data come from STATA file. Data included for checking sample size, not used in the computations.
Sheet n for priors
Weighted sample of women giving birth by age and birth order. Read from freq ww.
We stop at birth order 5 because for birth order 6 on, fertility levels are so low that we do not need to have an accurate estimate.
Sheet N for priors
Weighted sample of women by age and birth order (parity+1). Read from freq ww.
Sheet w for priors
Weighted population at risk by age and birth order (parity+1) corresponding to the weight of the prior. The weight of the prior has been set to 2 times the mean weight (equivalent to 2 (two) respondents added to N and 2qp added to n, with qp being the prior probability). The parameter 2 is in cell C1, and can be changed for more or less weight given to the prior.
For more information on the sensitivity of our estimates on the weight of the prior, see last page of this document.
For high parities and low ages, there can be no respondent, and thus no estimated weight. In that case, we set the weight to the mean weight for that parity (we set a value to avoid computational errors but when there is no respondent only the prior estimate of fertility will be used and the mean weight for the empty cell does not matter).
Sheet priors
Prior probabilities B4:N39. Pre-prior (all birth orders combined) in column N; order-specific priors in columns B:F.
Sheet educ prior (low)
Respondents with a low educational level. 5 tables on top
First birth probabilities D5:H40 (columns I, R, AA, AJ, and AS are not used as in some cases the Bernoulli parameters cannot be estimated (they are set to 0 when N=0). For first births,
- n number of events D5:D40 = 'freq ww'!B44
- N population at risk E5:E40 = 'freq ww'!B4
- q F5:F40 = n/N Bernoulli estimate of probability (set to 0 if N=0)
- q posterior G5:G40 = N/(N+w) q + w/(N+w) qp, with w prior weight and qp prior probability. q posterior=(D5+'priors '!B4*'w for priors '!B4)/(E5+'w for priors '!B4)
- CumulqL1 H5:H40: sum(15…xx) (q posterior) cumulated probabilities (an estimate of the cumulated hazard function H(x))(cumulated probabilities not in calculus, but helps us see importance of priors:)
- CumqL1 without p I5:I40: sum(15…xx) (q=n/N) cumulated probabilities (an estimate of the cumulated hazard function H(x) without priors)
(Without priors in light grey just presented to see impact of Bayesian stat., not used. Reminder: the last page of this document contains information on the sensitivity of our estimates on the weight of the prior.)
Qcb2 M5:Q40. For second births, same set of estimates
And V5:Z40 for third births, AE5:AI40 for fourth births, AN5:AR40 for fifth births,
4 tables down
B45: Post-stratification scaling factor: Input data. This scaling factor is the ratio of the TFR based on official estimates by the TFR obtained from our EU-SILC data. It is applied to all probabilities (see text 3.e and Figure D.1).
C44:O80 birth probabilities, by age and birth order, multiplied by the post-stratification scaling factor (> or < than 1).
Note that from birth order 6 on, we suppose that the probabilities are the same as for birth order 5, because a precise estimation for birth orders 6+ has no impact on total birth intensities. For the example of Germany: Out of the 192.7 births per 100 women (total birth intensity at age 49; AZ79), 2.4 births are of order 5 and only 0.4 are of order 6+.
Q44:AD80 life table population by age and parity based on the birth probabilities C44:O80, with initial population P(15,0)=100
AF44:AR79 life table events by age and birth order
AT44:AZ79 life table events by birth order (1-4,5+), cumulated by age. AZ79 gives the total intensity, all birth orders combined.
AZ79: total birth intensity at age 49 for low educated
Sheet educ prior (med)
Respondents with a middle educational level. N and n different, all the rest same as Sheet educ prior (low).
B45: Post-stratification scaling factor. From low: B45
AZ79: total birth intensity at age 49 for middle educated
Sheet educ prior (high)
Respondents with a high educational level. N and n different, all the rest same as Sheet educ prior (low).
B45: Post-stratification scaling factor. From low: B45
AZ79: total birth intensity at age 49 for high educated
Sheet educ prior (all)
Adds events and populations by educational levels: respondents all educational levels combined. N and n different, all the rest same as Sheet educ prior (low).
B45: Post-stratification scaling factor. From low: B45
AZ79: total birth intensity at age 49 for all education groups combined.
Sheet tcb
Objective of this sheet: calculate confidence intervals for education- and age-specific birth intensities based on education- and age-specific fertility rates.
(yellow means data comes from STATA file)
A2:E40: Four series of age-specific fertility rates (all low_educ med educ high educ): All data come from STATA file.
B39:E39: sum of the age-specific fertility rates (i.e. TFRs) (all low_educ med educ high educ), multiplied by the post-stratification scaling factor in T3 (copied from educ prior (low): B45).
B40:E40: total birth intensities at age 49 (all low_educ med educ high educ); copied from the respective ‘educ prior’ sheets (serves to enable a rapid comparison with TFRs)
H2:S38: Standard error (se) and variance (var) and cumulated variance (cumvar) by education:
H3:J40: se low se middle se high: Standard errors of the rates. All data come from STATA file.
K3-S38: variance = se² and sumvar = sum(15…x)(var) and (se low intensity se middle intensity se high intensity) with standard errors of cumulated fertility, multiplied by the post-stratification scaling factor from sheet ‘educ prior (low)’ cell B45.
Confidence interval limits for cumulated fertilities:
From 'educ prior (low)'!$AZ$45:$AZ$79 and Q3:Q37 and same cells from '…(med)… ' '…(high)… ': R3:R37 and S3:S37.
Sheet 1st cb
One graph, on birth intensities by age and education: first births.
Cumulated intensity by age, one curve by education. Source:
- =SERIE("low education";'educ prior (low)'!$AT$45:$AT$79;'educ prior (low)'!$AU$45:$AU$79;1)
- =SERIE("middle education";'educ prior (med) '!$AT$45:$AT$79;'educ prior (med) '!$AU$45:$AU$79;2)
- =SERIE("high education";'educ prior (high) '!$AT$45:$AT$79;'educ prior (high) '!$AU$45:$AU$79;3)
Sheet 2nd cb
One graph, on birth intensities by age and education, second births.
Cumulated intensity by age, one curve by education. Source:
- =SERIE("low education";'educ prior (low)'!$AT$45:$AT$79;'educ prior (low)'!$AV$45:$AV$79;1)
- =SERIE("middle education";'educ prior (med) '!$AT$45:$AT$79;'educ prior (med) '!$AV$45:$AV$79;2)
- =SERIE("high education";'educ prior (high) '!$AT$45:$AT$79;'educ prior (high) '!$AV$45:$AV$79;3)
Sheet 3rd cb
One graph, on birth intensities by age and education, third births.
Cumulated intensity by age, one curve by education. Source:
- =SERIE("low education";'educ prior (low)'!$AT$45:$AT$79;'educ prior (low)'!$AW$45:$AW$79;1)
- =SERIE("middle education";'educ prior (med) '!$AT$45:$AT$79;'educ prior (med) '!$AW$45:$AW$79;2)
- =SERIE("high education";'educ prior (high) '!$AT$45:$AT$79;'educ prior (high) '!$AW$45:$AW$79;3)
Sheet all cb
Two graphs, on birth intensities by age and education, all birth orders together
Cumulated intensity by age, one curve by education. Source in C45-L80 (copied from sheets ‘educ pior (…)’ and ‘tcb’.