## Appendix

A1 A Simple Binomial Model

Formula {4} is motivated by the following idea. Fix a territory and demographic group. Localities are indexed by $i = 1, \ldots, L$. Focus on a particular property, e.g., living in multi-unit housing. Suppose people in that territory and group are independent, and in locality $i$ there is a common probability $p_i$ of having the property in question. Heterogeneity is amplified by binomial variation, and it is an estimate of binomial variation that is the correction term in {4}.

More particularly, from locality $i$ we choose a block at random and observe the $N_i$ persons in that block; $X_i$ persons have the property in question. Conditioned on the choice of blocks, the $X_i$ are independent binomial variables, with $N_i$ for the number of trials and success probability $p_i$.
Now $\hat p_i = X_i/N_i$ and $\hat p = \sum_i \hat p_i/L$. Of course,

$$E(\hat p_i) = p_i,$$

while

$$\operatorname{Var}(\hat p_i) = \frac{p_i(1 - p_i)}{N_i}.$$
The expected value of the naive estimator {3} is now easy to work out, and is

$$H^2 + \frac{1}{L}\sum_{i=1}^{L}\frac{p_i(1 - p_i)}{N_i},$$

where the second term is the excess binomial variance. Finally, by design, the expected value of the correction term in {4} equals this second term, just canceling the contribution from excess binomial variance.
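The cancellation can be checked numerically. Formulas {3} and {4} are not reproduced in this appendix, so the sketch below assumes that the correction term averages $\hat p_i(1 - \hat p_i)/(N_i - 1)$ over localities; that form, the function names, and the example values are illustrative assumptions, not taken from the paper. The code computes the exact binomial expectation of the assumed correction and compares it with the average of $p_i(1 - p_i)/N_i$.

```python
from math import comb


def binomial_pmf(n, k, p):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_correction(n, p):
    """Exact E[p_hat * (1 - p_hat) / (n - 1)] for X ~ Binomial(n, p), p_hat = X / n.

    ASSUMED form of the per-locality correction term; requires n >= 2.
    """
    total = 0.0
    for k in range(n + 1):
        p_hat = k / n
        total += binomial_pmf(n, k, p) * p_hat * (1 - p_hat) / (n - 1)
    return total


def binomial_variance(n, p):
    """Var(X / n) for X ~ Binomial(n, p)."""
    return p * (1 - p) / n


# Hypothetical localities as (N_i, p_i) pairs, chosen only for illustration.
localities = [(10, 0.2), (12, 0.5), (15, 0.35)]
L = len(localities)

# Average binomial variance across localities (the excess term).
excess = sum(binomial_variance(n, p) for n, p in localities) / L
# Expected value of the assumed correction term.
correction = sum(expected_correction(n, p) for n, p in localities) / L
```

Under these assumptions `excess` and `correction` agree up to floating-point rounding, because $E[\hat p_i(1 - \hat p_i)] = p_i(1 - p_i)(N_i - 1)/N_i$ for a binomial count.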

A2 Data-Dependent Areas

Our measure has simple properties in simple settings. If the local areas have fixed boundaries and samples of fixed numbers of individual post-stratum members are drawn from the local areas, then the theory just developed applies, and $\hat H^2$ is unbiased; the binomial formulas are easily adapted to simple random sampling. However, P-12 is not a simple setting. Data-dependent aggregation of blocks into superblocks, to be described shortly, implies local areas with random boundaries. The numbers of sampled individuals in these areas are themselves random, not fixed, and that leaves the correction term in the definition of $\hat H^2$ in need of justification. Sampling block clusters instead of individuals introduces a term for cluster-level heterogeneity into the expectations. We sketch our treatment of the data-dependence first and the term for clustered sampling next.

The data-dependent boundaries turn $p_i$ and $H$ into random quantities with expectations, and the goal is to justify the formulas

In the display, $W_i$ accounts for within-area between-cluster covariance and $d_i$ is the analog of a finite-sample correction factor. Both are defined below. We believe both are small, but our argument is only heuristic, and that is one reason why our conclusions in this paper are somewhat tentative.

The Census Bureau’s aggregation process, merging sample blocks into sample superblocks, may be described as follows [Bateman 1991]. Within each post-stratum, after the P-12 sample has been drawn, members are pooled together from block after block, following the sequence of blocks in the sample list, until at least ten members are included or a state boundary is reached. Post-strata represent a fine-grained subdivision of the population along demographic lines, so most blocks contain at most a handful of people from the same post-stratum. The stopping rule for superblock completion typically puts half a dozen blocks into a superblock.
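The stopping rule can be sketched in a few lines. This is a simplified illustration, not the Census Bureau's procedure: it ignores state boundaries, it keeps a trailing group that never reaches the minimum as its own superblock (an assumption), and the block counts are invented.

```python
def form_superblocks(block_counts, minimum=10):
    """Group consecutive sampled blocks into superblocks.

    block_counts: number of post-stratum members in each sampled block,
    listed in sample-list order. Returns a list of superblocks, each a
    list of block indices. State boundaries are ignored (simplification).
    """
    superblocks, current, members = [], [], 0
    for index, count in enumerate(block_counts):
        current.append(index)
        members += count
        if members >= minimum:  # stopping rule: minimum reached
            superblocks.append(current)
            current, members = [], 0
    if current:  # leftover blocks at the end of the list (assumed handling)
        superblocks.append(current)
    return superblocks


# Hypothetical member counts for eleven consecutive sampled blocks.
counts = [3, 0, 4, 2, 5, 1, 1, 6, 2, 0, 3]
superblocks = form_superblocks(counts)
sizes = [sum(counts[b] for b in sb) for sb in superblocks]
```

In this example the first superblock ends with 14 members: the total passes the minimum only by the overshoot contributed by the last included block.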

The list for the sampling frame snakes its way through the territory spanned by the post-stratum from place to place among places of the same place type. The sampled blocks amalgamated into one sample superblock are therefore often but not always drawn from the same contiguous area. Superblocks are put together separately for each post-stratum and superblocks formed for different post-strata do not coincide.

For our formal arguments, we use the word “locality” for the local area defined to correspond to a particular superblock in the following way. Split the ordered list of blocks in the sampling frame randomly at a uniformly distributed point between the last sampled block in the previous superblock and the first sampled block in the current superblock. Repeat the procedure between the current superblock and the succeeding one. That gives two breakpoints. The locality corresponding to the current superblock is the set of all blocks in the list between the two breakpoints. The superblock then equals the subset of blocks in the locality selected into the sample.
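The splitting construction can be illustrated as follows. The sketch treats positions in the sampling frame as integers and draws each breakpoint as a uniformly chosen cut between two adjacent frame positions; that discrete reading, the function names, and the example positions are assumptions made for illustration.

```python
import random


def locality_cuts(sampled_positions, rng):
    """Draw one cut between each pair of consecutive superblocks.

    sampled_positions: for each superblock, the sorted frame positions of
    its sampled blocks. A cut value k means the frame is split just after
    position k. Each cut is uniform over the positions from the last
    sampled block of one superblock up to, but not including, the first
    sampled block of the next.
    """
    cuts = []
    for prev, curr in zip(sampled_positions, sampled_positions[1:]):
        cuts.append(rng.randrange(prev[-1], curr[0]))
    return cuts


def locality_of(position, cuts):
    """Index of the locality that contains a given frame position."""
    return sum(position > k for k in cuts)


# Hypothetical frame positions of the sampled blocks in three superblocks.
frame_positions = [[2, 5, 9], [14, 20], [27, 31, 33]]
cuts = locality_cuts(frame_positions, random.Random(0))
```

By construction, every sampled block of a superblock falls in the locality with the same index, and each unsampled block between two superblocks is assigned to one side or the other by the random cut.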

The order in the sampling frame maintains the integrity of address-register areas and Census district office areas, so a locality is often a contiguous or nearly contiguous area, but not always so. The rate $p_i$ is calculated for all the members in all the blocks in the sampling frame in the $i$th locality. It is a random quantity because it depends on sample selection, on the operation of the stopping rule, and on the outcome of the splitting. The randomness in $p_i$ turns $H$ into a random quantity as well.

We can write $\hat p_i$ in the form

$$\hat p_i = \frac{1}{N_i}\sum_{m} X_m\, J(c_m \in S).$$

In our notation, $X_m$ is the binary outcome for the $m$th member of the post-stratum in the $i$th locality. For example, for multi-unit housing rates, $X_m$ equals 1 if the corresponding person lives in multi-unit housing and equals 0 otherwise. $c_m$ is the block cluster (P-12 sampling unit) to which the $m$th member belongs. $S$ is the set of clusters in the sample in the $i$th superblock, $|S|$ in number; a subscript for $i$ is suppressed. $J$ is the indicator function of a set.

The argument that $E(\hat p_i) \approx E(p_i)$ has four steps. First, we express $E(\hat p_i)$ as the expectation of the conditional expectation given $N_i$, the number of members in the $i$th superblock. Second, we argue that $E[J(c_m \in S) \mid N_i]$ is nearly constant in $m$. That entails arguing against any sizable endpoint effects stemming from the random boundaries of the localities. It also entails arguing that conditioning on $N_i$ has little impact, inasmuch as the stopping rule produces values of $N_i$ that exceed the required minimum of 10 members per superblock only by the overshoot contributed by the last included block. Third, we count up terms with $X_m = 0$ and $X_m = 1$; the answers are familiar combinatorial expressions. Fourth, we argue that the people per cluster in the universe divided by the people per cluster in the sample should be close to unity and not strongly associated with $p_i$. That is enough to conclude that $E(\hat p_i) \approx E(p_i)$.

The same line of reasoning leads, with more effort, to an approximation for $E[(\hat p_i - p_i)^2]$. Some terms coincide with the binomial-formula terms found in the definition of $\hat H^2$. One set of cross-product terms, involving clusters in different localities, cancels. Another set of cross-product terms, involving pairs of clusters in the same locality, contributes the terms $d_i W_i$ discussed in the next subsection.

These considerations are in principle further complicated by the fact that the PES and P-12 samples are stratified samples with some variation in sampling weights. Sampling stratum membership is not indicated in the P-12 dataset. Sampling strata and sampling weights have major effects in the PES, but we expect their effects in P-12 to be minor for several reasons, including the absence of movers, the lack of non-response reweighting and special small-block samples, and the fact that our $\hat p$ and $p$ are not weighted averages but simple averages across localities.

A3 Effects of Clustered Sampling

The P-12 sample is a clustered sample primarily because individuals are clustered into blocks and secondarily because blocks are clustered into block clusters (containing one or two blocks in most cases). In the presence of clustered sampling, heterogeneity from cluster to cluster within localities makes a downweighted but nonzero contribution to sampling variability in $(\hat p_i - \hat p)^2$ and introduces, as we have said, a term of the form $d_i W_i$ into $E(\hat H^2)$. The average within-cluster covariance in the universe of members of the $i$th locality is given by

The sums range over all clusters in the $i$th locality, and $M_c$ is the number of members in the $c$th block cluster. The denominator is the number of terms in the numerator. For the contribution to sampling variability, $W_i$ must be multiplied by $d_i$, where

If members of the post-stratum were spread out with one member per cluster, $d_i$ would be zero. If each cluster always had 10 members, forcing $M_c = N_i = 10$ under the stopping rule and creating single-cluster superblocks, $d_i$ would be 9/10. (With our notation, if the $i$th superblock in the sample has index $c$ in the sampling frame, then $N_i = M_c$.)

The covariance factor $W_i$ measures how much more often the outcomes for two members of the same cluster agree compared to the outcomes for two randomly chosen members of the whole locality. At the extreme, each cluster could consist entirely of ones or entirely of zeros, irrespective of size, and then we would have $W_i = p_i(1 - p_i)$, the variance of the outcome for a single randomly selected member of the locality. The downweighting $d_i$ would scale this variance by a kind of effective sample size for the clustered sampling. Usually, however, knowing $X_m$ gives only limited information about $X_{m'}$, and $W_i$ will be close to zero.

The only non-zero contributions to $W_i$ come from clusters with two or more members; large contributions come only from clusters with many members. Clusters with many members appear to be rare. The identity of blocks is erased in the P-12 data set; however, we have detailed census and PES data for metropolitan areas outside central cities in the Pacific division, nicknamed the “Berkeley data set.” In these data, of the clusters that contain any post-stratum member, about 20% contain only one such person. (We are averaging over post-strata.) Another 16% contain 2 people, and only about 20% contain 7 or more. The $d_i$ factors average out near 1/2.

We cannot measure $W_i$ directly from P-12, and the PES sample is much too small for stable estimates. There is, however, an empirical test of the extreme hypothesis that all or most of the observed values of $\hat H^2$ are contributed by within-cluster covariances. Under this hypothesis, $W_i$ would not increase as localities and superblocks are merged into superlocalities and supersuperblocks, and $d_i$ would decrease in accordance with the formula {11}. Values of $\hat H$ have been inspected under a sequence of mergings for selected post-strata: $\hat H$ falls off substantially more slowly than its predicted value under this extreme hypothesis. Any other outcome would be surprising; the small numbers of post-stratum members per cluster make the sampling quite close to random sampling of individuals and thus to the case where the within-cluster covariance contribution is absent.

Both between-locality heterogeneity and between-cluster within-locality heterogeneity are forms of heterogeneity. $\hat H$ remains a measure of heterogeneity whether or not the $W_i$ contributions are small. But between-locality heterogeneity is of primary interest; it is the contribution which directly affects estimates for whole local areas. The arguments in this section support the view that in the P-12 data set the approximation $E(\hat H^2) \approx E(H^2)$ is a workable one, and that the values in Table 1 are principally to be interpreted as evidence of heterogeneity from locality to locality.

A4 The Standard Errors for Table 1

The nine Census divisions represent, with a handful of exceptions, disjoint groups in the sampling scheme, and their RMS $\hat H^2$ values are essentially independent of each other. The squared measure $\hat H^2$ in Table 1 is the weighted mean of the nine $\hat H^2$ values for the divisions, weighting by the number of post-strata in each division. For our calculation we make the assumption that the expected values of the nine measures for the divisions are all the same (cf. Table 4), while the nine variances differ. We write down an unbiased estimator for the variance of the overall measure as a weighted average of the squared deviations of the divisional measures from the overall weighted mean. The weights are functions of the numbers of post-strata in the divisions. This estimate should be something of an upper bound, because part of the variability in divisional measures must reflect small differences among expected values rather than sampling variability as assumed. We convert to square roots with a delta-method approximation.
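The weighted mean and the delta-method conversion can be sketched as follows. The divisional values and post-stratum counts are invented for illustration, and the standard error of the squared measure is taken as a given input because the weights in the variance estimator are not spelled out here. The conversion uses the usual first-order approximation $\mathrm{SE}(\sqrt{Y}) \approx \mathrm{SE}(Y)/(2\sqrt{Y})$.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Weighted mean of values with non-negative weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def delta_method_se_of_sqrt(squared_value, se_of_squared):
    """First-order delta method: SE(sqrt(Y)) ~ SE(Y) / (2 * sqrt(Y)), for Y > 0."""
    return se_of_squared / (2 * sqrt(squared_value))


# Hypothetical squared measures for nine divisions and their post-stratum counts.
divisional_squared = [0.0040, 0.0025, 0.0036, 0.0049, 0.0030,
                      0.0044, 0.0028, 0.0038, 0.0033]
post_strata_counts = [12, 8, 10, 14, 9, 11, 7, 13, 10]

overall_squared = weighted_mean(divisional_squared, post_strata_counts)
overall = sqrt(overall_squared)

# Hypothetical standard error for the squared measure, converted to the root scale.
se_squared = 0.0004
se_overall = delta_method_se_of_sqrt(overall_squared, se_squared)
```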

Two alternative estimates for the sampling standard errors in $\hat p$ are given in Table 1. An indirect approach is required because the identity of the sampling units has been erased by the superblock aggregation process; to our knowledge, the Census Bureau has not published direct estimates of standard errors for P-12. The low estimate in Table 1 is obtained by treating individuals as if they were the sampling units; the sampling variance for a post-stratum-wide $\hat p$ is then computed as $\hat p(1 - \hat p)/\sum_i N_i$. The high estimate treats superblocks as if they were the sampling units. Then the sampling variance is computed as

where $L$ is the number of superblocks associated with the post-stratum in question. These estimates apply to P-12 itself; finally, we rescale in proportion to sample size, as measured by block clusters.
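The two approaches can be compared on a toy example. The low estimate follows the expression $\hat p(1 - \hat p)/\sum_i N_i$. The display for the high estimate is not reproduced in this text, so the sketch substitutes the conventional variance of a simple mean of $L$ units, $\sum_i (\hat p_i - \hat p)^2 / (L(L - 1))$; that form is an assumption, as are the function names and the data.

```python
def low_variance(p_hat, counts):
    """Individuals as sampling units: p_hat * (1 - p_hat) / sum of N_i."""
    return p_hat * (1 - p_hat) / sum(counts)


def high_variance(rates):
    """Superblocks as sampling units.

    ASSUMED form: sample variance of the superblock rates divided by L,
    i.e. sum((p_i - mean)^2) / (L * (L - 1)). Requires L >= 2.
    """
    L = len(rates)
    mean = sum(rates) / L
    return sum((r - mean) ** 2 for r in rates) / (L * (L - 1))


# Hypothetical superblock rates and member counts for one post-stratum.
rates = [0.1, 0.6, 0.3, 0.8]
counts = [10, 12, 11, 14]
p_hat = sum(rates) / len(rates)  # simple average across superblocks

low = low_variance(p_hat, counts)
high = high_variance(rates)
```

With rates this dispersed, the superblock-based figure exceeds the individual-based one, which is the ordering the labels "low" and "high" suggest; with nearly equal rates the ordering can reverse.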

 Measuring Local Heterogeneity with 1990 U.S. Census Data Kenneth W. Wachter, David A. Freedman © 2000 Max-Planck-Gesellschaft ISSN 1435-9871 http://www.demographic-research.org/Volumes/Vol3/10