A1 A Simple Binomial Model
Formula {4} is motivated by the following idea. Fix a territory and demographic group. Localities are
indexed by $i = 1, \ldots, L$. Focus on a particular property, e.g., living in multi-unit housing. Suppose people
in that territory and group are independent, and in locality $i$ there is a common probability $p_i$ of having the
property in question. Observed heterogeneity is inflated by binomial variation, and the correction term in {4}
is an estimate of that binomial variation.
More particularly, from locality $i$ we choose a block at random and observe the $N_i$ persons in that
block; $X_i$ persons have the property in question. Conditioned on the choice of blocks, the $X_i$ are
independent binomial variables, with $N_i$ for the number of trials and success probability $p_i$.
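Now $\hat p_i = X_i/N_i$ and $\bar{\hat p} = \sum_i \hat p_i/L$. Of course,

$$E(\hat p_i) = p_i, \qquad \text{while} \qquad \operatorname{var}(\hat p_i) = \frac{p_i(1-p_i)}{N_i}.$$

The expected value of the naive estimator {3} is now easy to work out, and is

$$E\Big\{\frac{1}{L}\sum_{i=1}^{L}(\hat p_i - \bar{\hat p})^2\Big\} = H^2 + B,$$

where

$$B = \Big(1 - \frac{1}{L}\Big)\frac{1}{L}\sum_{i=1}^{L}\frac{p_i(1-p_i)}{N_i}$$

is the excess binomial variance. Finally, by design, the expected value of the correction term in {4}
equals

$$E\Big\{\Big(1 - \frac{1}{L}\Big)\frac{1}{L}\sum_{i=1}^{L}\frac{\hat p_i(1-\hat p_i)}{N_i - 1}\Big\} = B,$$

just canceling the contribution from excess binomial variance.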
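The cancellation can be checked exactly on a small example. The sketch below assumes the naive estimator {3} is $L^{-1}\sum_i(\hat p_i-\bar{\hat p})^2$ and the correction term in {4} is $(1-1/L)L^{-1}\sum_i \hat p_i(1-\hat p_i)/(N_i-1)$; the function names and the example rates and block sizes are hypothetical. It computes the binomial moments of each $\hat p_i$ by summing over the probability mass function, so no simulation noise is involved.

```python
from math import comb


def binomial_moments(n, p):
    """Exact E(phat) and E(phat^2) for phat = X/n with X ~ Binomial(n, p)."""
    m1 = m2 = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * p**x * (1 - p) ** (n - x)
        m1 += prob * x / n
        m2 += prob * (x / n) ** 2
    return m1, m2


def expectations(p, n_sizes):
    """Return H^2, E(naive estimator) and E(correction term).

    Assumed forms (hypothetical reconstruction of {3} and {4}):
      naive      = (1/L) * sum_i (phat_i - mean(phat))^2
      correction = (1 - 1/L) * (1/L) * sum_i phat_i (1 - phat_i) / (N_i - 1)
    Localities are independent, so E(phat_i phat_j) = p_i p_j for i != j.
    Every N_i must be at least 2.
    """
    L = len(p)
    moments = [binomial_moments(n, q) for q, n in zip(p, n_sizes)]
    e1 = [m1 for m1, _ in moments]
    e2 = [m2 for _, m2 in moments]
    # E(sum_i phat_i)^2 = sum_i E(phat_i^2) + sum_{i != j} E(phat_i) E(phat_j)
    e_sum_sq = sum(e2) + sum(e1) ** 2 - sum(v * v for v in e1)
    e_naive = (sum(e2) - e_sum_sq / L) / L
    # E[phat(1 - phat)] = E(phat) - E(phat^2)
    e_correction = (1 - 1 / L) / L * sum(
        (m1 - m2) / (n - 1) for (m1, m2), n in zip(moments, n_sizes)
    )
    p_bar = sum(p) / L
    h2 = sum((q - p_bar) ** 2 for q in p) / L
    return h2, e_naive, e_correction


h2, e_naive, e_corr = expectations([0.2, 0.35, 0.5], [8, 12, 10])
print(h2, e_naive - h2, e_corr)  # the last two agree up to rounding
```

The naive estimator overshoots $H^2$ by exactly the expected correction term, which is the point of the design.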
A2 Data-Dependent Areas
Our measure has simple properties in simple settings. If the local areas have fixed boundaries and
samples of fixed numbers of individual post-stratum members are drawn from the local areas, then the
theory just developed applies, and $\hat H^2$ is unbiased; the binomial formulas are easily adapted to simple
random sampling. However, P-12 is not a simple setting. Data-dependent aggregation of blocks
into superblocks, to be described shortly, implies local areas with random boundaries. The
numbers of sampled individuals in these areas are themselves random, not fixed, and that leaves
the correction term in the definition of $\hat H^2$ in need of justification. Sampling block clusters
instead of individuals introduces a term for cluster-level heterogeneity into the expectations.
We sketch our treatment of the data-dependence first and the term for clustered sampling
next.
The data-dependent boundaries turn $p_i$ and $H$ into random quantities with expectations, and the goal is
to justify approximations of the form

$$E(\hat p_i) \approx E(p_i), \qquad E(\hat H^2) \approx E(H^2) + (\text{terms of the form } d_i W_i).$$

In the display, $W_i$ accounts for within-area between-cluster covariance and $d_i$ is the analog of a
finite-sample correction factor. Both are defined below. We believe both are small, but our
argument is only heuristic, and that is one reason why our conclusions in this paper are somewhat
tentative.
The Census Bureau’s aggregation process, merging sample blocks into sample superblocks, may be
described as follows [Bateman 1991]. Within each post-stratum, after the P-12 sample has been drawn,
members are pooled together from block after block, following the sequence of blocks in the sample list,
until a minimum of ten members are included or a state boundary is reached. Post-strata represent a
fine-grained subdivision of the population along demographic lines, so most blocks contain at most a
handful of people from the same post-stratum. The stopping rule for superblock completion typically puts
half a dozen blocks into a superblock.
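A minimal sketch of the stopping rule as described, assuming each sampled block is represented by a hypothetical pair (state, number of post-stratum members) listed in sample-list order. The text does not say what happens to a superblock that is cut off short by a state boundary; the sketch simply keeps it as a superblock with fewer than ten members, which is an assumption.

```python
def form_superblocks(blocks, minimum=10):
    """Group sampled blocks (in sample-list order) into superblocks.

    blocks: list of (state, members) pairs, where members is the number of
    post-stratum members observed in that sampled block (hypothetical layout).
    Returns a list of superblocks, each a list of block indices.
    """
    superblocks, current, count, state = [], [], 0, None
    for idx, (block_state, members) in enumerate(blocks):
        if current and block_state != state:
            # A state boundary closes the current superblock even if it is short.
            superblocks.append(current)
            current, count = [], 0
        state = block_state
        current.append(idx)
        count += members
        if count >= minimum:
            # Minimum reached; the last block may overshoot it.
            superblocks.append(current)
            current, count = [], 0
    if current:
        superblocks.append(current)
    return superblocks


print(form_superblocks([("CA", 3), ("CA", 4), ("CA", 5), ("CA", 2), ("NV", 6), ("NV", 7)]))
```

Here the first superblock stops at 12 members, an overshoot of 2 contributed by its last block, which is the kind of overshoot discussed in the argument below.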
The list for the sampling frame snakes its way through the territory spanned by the post-stratum from
place to place among places of the same place type. The sampled blocks amalgamated into one sample
superblock are therefore often but not always drawn from the same contiguous area. Superblocks are put
together separately for each post-stratum, and superblocks formed for different post-strata do not
coincide.
For our formal arguments, we use the word “locality” for the local area defined to correspond to a
particular superblock in the following way. Split the ordered list of blocks in the sampling frame randomly
at a uniformly distributed point between the last sampled block in the previous superblock and the first
sampled block in the current superblock. Repeat the procedure between the current superblock and the
succeeding one. That gives two breakpoints. The locality corresponding to the current superblock is the
set of all blocks in the list between the two breakpoints. The superblock then equals the subset of blocks in
the locality selected into the sample.
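The splitting construction can be sketched as follows. This is an illustration under stated assumptions: blocks in the sampling frame are indexed by 0-based positions, the "uniformly distributed point" is taken to be a uniformly chosen gap between consecutive frame positions, and the first and last localities are assumed to extend to the ends of the frame. The function name and arguments are hypothetical.

```python
import random


def split_localities(superblock_positions, frame_size, rng):
    """Assign every block in the frame to the locality of one superblock.

    superblock_positions: for each superblock, in frame order, the sorted
    frame positions (0-based) of its sampled blocks.
    Returns one list of frame positions per locality.
    """
    cuts = []
    for prev, cur in zip(superblock_positions, superblock_positions[1:]):
        last_prev, first_cur = prev[-1], cur[0]
        # Uniform cut point: the split falls just after position k,
        # with last_prev <= k < first_cur.
        cuts.append(rng.randint(last_prev, first_cur - 1))
    starts = [0] + [k + 1 for k in cuts]
    ends = cuts + [frame_size - 1]
    return [list(range(s, e + 1)) for s, e in zip(starts, ends)]


rng = random.Random(1)
localities = split_localities([[2, 5], [9, 12], [20]], frame_size=25, rng=rng)
print(localities)
```

Each locality contains all sampled blocks of its superblock, and the localities partition the frame; which unsampled blocks fall into which locality depends on the random cuts.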
The order in the sampling frame maintains the integrity of address-register areas and Census district
office areas, so a locality is often a contiguous or nearly contiguous area, but not always so. The rate $p_i$ is
calculated for all the members in all the blocks in the sampling frame in the ith locality. It is a
random quantity because it depends on sample selection, on the operation of the stopping rule,
and on the outcome of the splitting. The randomness in $p_i$ turns $H$ into a random quantity as
well.
We can write $\hat p_i$ in the form

$$\hat p_i = \frac{\sum_m X_m\, J(c_m \in S)}{\sum_m J(c_m \in S)}, \qquad N_i = \sum_m J(c_m \in S),$$

where the sums run over all members $m$ of the post-stratum in the ith locality. In our notation,
$X_m$ is the binary outcome for the mth member of the post-stratum in the ith locality. For example, for
multi-unit housing rates, $X_m$ equals 1 if the corresponding person lives in multi-unit housing and equals 0
otherwise.
$c_m$ is the block cluster (P-12 sampling unit) to which the mth member belongs.
$S$ is the set of clusters in the sample in the ith superblock, $|S|$ in number; a subscript for $i$ is
suppressed.
$J$ is the indicator function of a set.
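The indicator form of $\hat p_i$ translates directly into code. In this sketch the locality's members are given as hypothetical parallel lists of outcomes $X_m$ and cluster labels $c_m$, and $S$ is a set of sampled cluster labels; the function name is illustrative only.

```python
def superblock_rate(outcomes, clusters, sampled):
    """Return (phat_i, N_i) from the indicator form of the estimator.

    outcomes: X_m (0 or 1) for every member m of the post-stratum in locality i
    clusters: c_m, the block-cluster label of each member
    sampled:  S, the set of cluster labels in the ith superblock
    """
    indicators = [1 if c in sampled else 0 for c in clusters]  # J(c_m in S)
    n_i = sum(indicators)
    if n_i == 0:
        raise ValueError("superblock contains no post-stratum members")
    phat = sum(x * j for x, j in zip(outcomes, indicators)) / n_i
    return phat, n_i


# Six members in three clusters; clusters "a" and "b" are sampled.
print(superblock_rate([1, 0, 1, 1, 0, 0], ["a", "a", "b", "b", "c", "c"], {"a", "b"}))
```

In this toy locality the universe rate $p_i$ is 3/6 = 0.5, while the sampled clusters give $\hat p_i$ = 3/4; the argument that follows concerns when such discrepancies average out.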
The argument that $E(\hat p_i) \approx E(p_i)$ has four steps. First, we express $E(\hat p_i)$ as the expectation of the
conditional expectation given $N_i$, the number of members in the ith superblock. Second, we argue that
$E[J(c_m \in S) \mid N_i]$ is nearly constant in $m$. That entails arguing against any sizable endpoint effects
stemming from the random boundaries of the localities. It also entails arguing that conditioning on $N_i$ has
little impact, inasmuch as the stopping rule produces values of $N_i$ that exceed the required minimum of 10
members per superblock only by the overshoot contributed by the last included block. Third, we count up
terms with $X_m = 0$ and $X_m = 1$; the answers are familiar combinatorial expressions. Fourth, we
argue that the number of people per cluster in the universe divided by the number of people per cluster in the sample
should be close to unity and not strongly associated with $p_i$. That is enough to
conclude that $E(\hat p_i) \approx E(p_i)$.
The same line of reasoning leads, with more effort, to an approximation for $E[(\hat p_i - p_i)^2]$. Some terms
coincide with the binomial-formula terms found in the definition of $\hat H^2$.
One set of cross-product terms, involving clusters in different localities, cancels. Another set of cross-product terms,
involving pairs of clusters in the same locality, contributes the terms diW i discussed in the next
subsection.
These considerations are in principle further complicated by the fact that the PES and P-12
samples are stratified samples with some variation in sampling weights. Sampling stratum
membership is not indicated in the P-12 dataset. Sampling strata and sampling weights have
major effects in the PES, but we expect their effects in P-12 to be minor for several reasons,
including the absence of movers, the lack of non-response reweighting and special small-block
samples, and the fact that our $\hat H^2$ and $\bar{\hat p}$ are not weighted
averages but simple averages across localities.
A3 Effects of Clustered Sampling
The P-12 sample is a clustered sample primarily because individuals are clustered into blocks and
secondarily because blocks are clustered into block clusters (containing one or two blocks in most cases).
In the presence of clustered sampling, heterogeneity from cluster to cluster within localities makes a
downweighted but nonzero contribution to sampling variability in $(\hat p_i - \bar{\hat p})^2$
and introduces, as we have said, a term of the form $d_i W_i$ into $E(\hat H^2)$. The average within-cluster
covariance in the universe of members of the ith locality is given by

$$W_i = \frac{\sum_c \sum_{m \ne m' \in c} (X_m - p_i)(X_{m'} - p_i)}{\sum_c M_c (M_c - 1)}.$$

The sums range over all clusters in the ith locality, and $M_c$ is the number of members in the cth block cluster.
The denominator is the number of terms in the numerator. For the contribution to sampling variability, $W_i$
must be multiplied by $d_i$, where

$$d_i = \frac{\sum_{c \in S} M_c (M_c - 1)}{N_i^2}.$$

If members of the post-stratum were spread out with one member per cluster, $d_i$ would be zero. If each
cluster always had 10 members, forcing $M_c = N_i = 10$ under the stopping rule and creating
single-cluster superblocks, $d_i$ would be 9/10. (With our notation, if the ith superblock in the sample has
index $c$ in the sampling frame, then $N_i = M_c$.)
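Both quantities are easy to compute once cluster membership is known. This sketch assumes $W_i$ is the average of $(X_m - p_i)(X_{m'} - p_i)$ over ordered pairs of distinct members in the same cluster, and $d_i = \sum_{c\in S} M_c(M_c-1)/N_i^2$; these forms reproduce the two special cases in the text (0 for one member per cluster, 9/10 for a single 10-member cluster). Function names and the input layout are hypothetical.

```python
def within_cluster_covariance(cluster_outcomes):
    """W_i: average of (X_m - p_i)(X_m' - p_i) over ordered pairs m != m' in a cluster.

    cluster_outcomes: list of lists, the 0/1 outcomes of all members of each
    block cluster in the ith locality (the universe, not only the sample).
    """
    all_x = [x for xs in cluster_outcomes for x in xs]
    p_i = sum(all_x) / len(all_x)
    numerator = 0.0
    pairs = 0
    for xs in cluster_outcomes:
        dev = [x - p_i for x in xs]
        # Sum over ordered pairs m != m' = (sum of deviations)^2 - sum of squares.
        numerator += sum(dev) ** 2 - sum(d * d for d in dev)
        pairs += len(xs) * (len(xs) - 1)
    if pairs == 0:
        return 0.0  # no cluster has two or more members, so no pairs contribute
    return numerator / pairs


def downweighting_factor(sampled_cluster_sizes):
    """d_i = sum_{c in S} M_c (M_c - 1) / N_i^2, with N_i = sum_{c in S} M_c."""
    n_i = sum(sampled_cluster_sizes)
    return sum(m * (m - 1) for m in sampled_cluster_sizes) / n_i**2


print(downweighting_factor([1] * 10), downweighting_factor([10]))
print(within_cluster_covariance([[1, 1], [0, 0]]))
```

The last line is the extreme case of clusters that are all ones or all zeros: with $p_i = 1/2$ it gives $W_i = p_i(1-p_i) = 0.25$.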
The covariance factor $W_i$ measures how much more often the outcomes for two members of the same
cluster agree compared to the outcomes for two randomly chosen members of the whole locality. At the
extreme, each cluster could consist entirely of ones or entirely of zeros, irrespective of size, and then we
would have $W_i = p_i(1 - p_i)$, the variance of the outcome for a single randomly selected member of the
locality. The downweighting $d_i$ would divide this variance by a kind of effective sample size for the
clustered sampling. Usually, however, knowing $X_m$ gives only limited information about
$X_{m'}$, and $W_i$ will be close to zero.
The only non-zero contributions to $W_i$ come from clusters with two or more members; large
contributions come only from clusters with many members. Clusters with many members appear to be rare. The
identity of blocks is erased in the P-12 data set; however, we have detailed census and PES data for
metropolitan areas outside central cities in the Pacific division, nicknamed the “Berkeley data set.” In
these data, of the clusters that contain any post-stratum member, about 20% contain only one such person.
(We are averaging over post-strata.) Another 16% contain 2 people, and only about 20% contain 7 or
more. The $d_i$ factors average out near 1/2.
We cannot measure $W_i$ directly from P-12, and the PES sample is much too small for stable estimates.
There is, however, an empirical test of the extreme hypothesis that all or most of the observed values of
$\hat H^2$ are contributed by within-cluster covariances. Under this hypothesis, $W_i$ would not increase
as localities and superblocks are merged into superlocalities and supersuperblocks, and $d_i$
would decrease in accordance with the formula {11}. Values of $\hat H^2$ have been inspected under a
sequence of mergings for selected post-strata: $\hat H^2$ falls off substantially more slowly than its
predicted value under this extreme hypothesis. Any other outcome would be surprising; the
small numbers of post-stratum members per cluster make the sampling quite close to random
sampling of individuals and thus to the case where the within-cluster covariance contribution is
absent.
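To see what the extreme hypothesis predicts, consider how $d_i$ responds to merging. The sketch assumes $d_i = \sum_{c\in S} M_c(M_c-1)/N_i^2$ and, for simplicity, merges $k$ superblocks that share the same hypothetical cluster-size profile: the numerator grows like $k$ while $N_i^2$ grows like $k^2$, so $d_i$ falls like $1/k$. With $W_i$ held fixed, the within-cluster contribution to $\hat H^2$ would fall at roughly that rate.

```python
def downweighting_factor(sampled_cluster_sizes):
    """d = sum_c M_c (M_c - 1) / N^2 over the sampled clusters of one superblock."""
    n = sum(sampled_cluster_sizes)
    return sum(m * (m - 1) for m in sampled_cluster_sizes) / n**2


def factors_under_merging(cluster_sizes, max_merge):
    """d after merging k superblocks with the same cluster-size profile, k = 1..max_merge."""
    return [downweighting_factor(cluster_sizes * k) for k in range(1, max_merge + 1)]


d = factors_under_merging([3, 3, 4], max_merge=4)
# Under the extreme hypothesis W_i stays fixed, so the d_i W_i contribution
# would shrink like d[k - 1] = d[0] / k.
print(d)
```

Real mergings combine superblocks with different profiles, so the decline is only approximately $1/k$; the observed $\hat H^2$ declining much more slowly than this is the evidence cited above.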
Both between-locality heterogeneity and between-cluster within-locality heterogeneity are forms of
heterogeneity. $\hat H^2$ remains a measure of heterogeneity whether or not the $W_i$ contributions are
small. But between-locality heterogeneity is of primary interest; it is the contribution that
directly affects estimates for whole local areas. The arguments in this section support the view
that in the P-12 data set the approximation $E(\hat H^2) \approx E(H^2)$ is a workable one, and that the
values in Table 1 are principally to be interpreted as evidence of heterogeneity from locality to
locality.
A4 The Standard Errors for Table 1
The nine Census divisions represent, with a handful of exceptions, disjoint groups in the sampling scheme,
and their values of $\hat H^2$ are essentially independent of each other. The squared measure $\hat H^2$ in Table 1 is the
weighted mean of the nine divisional values of $\hat H^2$, weighting by the number of post-strata in each division.
For our calculation we make the assumption that the expected values of the nine measures for the
divisions are all the same (cf. Table 4), while the nine variances differ. We write down an unbiased
estimator for the variance of the overall measure as a weighted average of the squared deviations of the
divisional measures from the overall weighted mean. The weights are functions of the numbers of
post-strata in the divisions. This estimate should be something of an upper bound, because part of
the variability in divisional measures must reflect small differences among expected values
rather than sampling variability as assumed. We convert to square roots with a delta-method
approximation.
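One way to build an estimator of the kind described is sketched below; it is our own derivation under the stated assumptions, not necessarily the exact weighting behind Table 1. With weights $w_k = n_k/\sum_j n_j$ ($n_k$ post-strata in division $k$), independent divisional values $h_k$ with common mean and variances $v_k$, and $\hat h = \sum_k w_k h_k$, one has $E(h_k - \hat h)^2 = v_k(1 - 2w_k) + \sum_j w_j^2 v_j$. Taking $a_k = w_k^2/[(1-2w_k)(1+C)]$ with $C = \sum_j w_j^2/(1-2w_j)$ makes $\sum_k a_k (h_k - \hat h)^2$ unbiased for $\operatorname{var}(\hat h) = \sum_k w_k^2 v_k$, provided every $w_k < 1/2$. With equal weights it reduces to the familiar $s^2/K$. The delta method then gives $\operatorname{se}(\hat H) \approx \operatorname{se}(\hat H^2)/(2\hat H)$. Function and argument names are hypothetical.

```python
import math


def pooled_h2_and_se(h2_by_division, n_poststrata):
    """Weighted mean of divisional H^2 values, its variance, and a delta-method SE for H.

    Assumes independent divisional values with a common expectation but
    unequal variances. Requires every weight w_k < 1/2 and a positive mean.
    """
    total = sum(n_poststrata)
    w = [n / total for n in n_poststrata]
    if any(wk >= 0.5 for wk in w):
        raise ValueError("each division's weight must be below 1/2")
    h2 = sum(wk * hk for wk, hk in zip(w, h2_by_division))
    if h2 <= 0:
        raise ValueError("delta method needs a positive pooled H^2")
    c = sum(wk**2 / (1 - 2 * wk) for wk in w)
    a = [wk**2 / ((1 - 2 * wk) * (1 + c)) for wk in w]
    # Unbiased for var(h2) under the assumptions above.
    var_h2 = sum(ak * (hk - h2) ** 2 for ak, hk in zip(a, h2_by_division))
    h = math.sqrt(h2)
    se_h = math.sqrt(var_h2) / (2 * h)  # delta method: d sqrt(x)/dx = 1/(2 sqrt(x))
    return h2, var_h2, h, se_h


print(pooled_h2_and_se([0.0010, 0.0020, 0.0015, 0.0012], [5, 5, 5, 5]))
```

The requirement $w_k < 1/2$ simply rules out a single division carrying half or more of the post-strata, where this particular construction breaks down.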
Two alternative estimates for the sampling standard errors in $\hat p$ are given
in Table 1. An indirect approach is required because the identity of the sampling units has been erased
by the superblock aggregation process; to our knowledge, the Census Bureau has not published direct estimates
of standard errors for P-12. The low estimate in Table 1 is obtained by treating individuals as if they
were the sampling units; the sampling variance for a post-stratum-wide $\hat p$ is then computed as
$\hat p(1 - \hat p)/\sum_i N_i$.
The high estimate treats superblocks as if they were the sampling units. Then the sampling variance is computed
as

$$\frac{1}{L(L-1)}\sum_{i=1}^{L}(\hat p_i - \bar{\hat p})^2,$$

where $L$ is the number of superblocks associated with the post-stratum in question. These estimates
apply to P-12 itself; finally, we rescale in proportion to sample size, as measured by block
clusters.
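The two bounds can be sketched for a single post-stratum as follows. Assumptions, all labeled: the high estimate is taken to be the between-superblock variance of the mean superblock rate, $\sum_i(\hat p_i-\bar{\hat p})^2/[L(L-1)]$; "rescaling in proportion to sample size" is read as multiplying each variance by the ratio of P-12 block clusters to the block clusters in the target sample; and the function and argument names are hypothetical.

```python
import math


def low_high_se(x_counts, n_members, p12_clusters, target_clusters):
    """Low and high standard errors for a post-stratum-wide rate phat.

    x_counts[i], n_members[i]: X_i and N_i for the L superblocks of one post-stratum.
    Assumptions (hypothetical): the high estimate uses the between-superblock
    variance of the mean rate, and variances are rescaled by the ratio of
    P-12 block clusters to block clusters in the target sample.
    """
    L = len(n_members)
    n_total = sum(n_members)
    phat = sum(x_counts) / n_total
    var_low = phat * (1 - phat) / n_total  # individuals as sampling units
    rates = [x / n for x, n in zip(x_counts, n_members)]
    mean_rate = sum(rates) / L
    var_high = sum((r - mean_rate) ** 2 for r in rates) / (L * (L - 1))  # superblocks as units
    scale = p12_clusters / target_clusters  # a smaller sample inflates the variance
    return math.sqrt(var_low * scale), math.sqrt(var_high * scale)


print(low_high_se([3, 5, 4], [10, 10, 10], p12_clusters=2, target_clusters=1))
```

Treating individuals as independent ignores any clustering and so tends to understate the error; treating superblocks as units absorbs both clustering and between-locality heterogeneity, which is why it serves as the high estimate.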