## 2. Heterogeneity

### 2.1 The Measure

This section presents our measure of heterogeneity, and procedures for estimating it from the data. Throughout this discussion we restrict attention to a single variable (e.g., the proportion of persons living in multi-unit housing) and to a single demographic group, for example, Hispanic women aged 20-29. Our object of study is variability from local area to local area within broader areas. We need some terminology and notation to clarify the distinctions.

Territory is our name for one of the broader areas, and each territory spans many local areas. The territories are determined by the design of the underlying estimation project and data-collection effort. In small-area estimation with uniform ratio estimators, all small areas within one territory are assigned a common “uniform” estimated value, based on aggregating across the territory; in census applications, estimates are uniform within combinations of territory and demography called “post-strata.” Typically, a territory consists of all places of a particular type across some regional subdivision of the country. In P-12, the central cities in New England are an example of a territory.

Each territory is dissected into localities: examples might be the Back Bay, North End, or Beacon Hill in Boston; or Colorado’s Snowmass Mountain Basin, Sangre de Cristos foothills, etc. The estimation project itself may go down to smaller units than these local areas - Census undercount estimation goes all the way down to blocks - but we are measuring heterogeneity only down to the scale allowed by P-12. For any particular variable of interest, each local area has a value that differs from the territory-wide value, the latter being an average of the former. The deviations from average are what we call heterogeneity, and it is heterogeneity that we are going to measure.

In our notation, within a given territory, for a given variable and demographic group:
$p_i$ is the true rate for all $n_i$ group members in the $i$th of $L$ localities;

$p = \sum_i p_i / L$ is the arithmetic mean of the true local rates.

The quantity whose measurement is the goal of the study is the population-level “variance due to heterogeneity,” the variance of the true local rates about their mean:

$$H^2 = \frac{1}{L}\sum_{i=1}^{L} (p_i - p)^2 \qquad \{1\}$$

This quantity is important because it appears in formulas for estimation error. Let $\hat{p}$ be an estimator of $p$. The mean squared error of estimation that results from using $\hat{p}$ as the estimate of $p_i$ for all local areas, as if they shared a common value, is

$$\frac{1}{L}\sum_{i=1}^{L} (p_i - \hat{p})^2 = H^2 + (p - \hat{p})^2 \qquad \{2\}$$

The first term on the right, $H^2$, represents errors due to heterogeneity, resulting directly from attributing one common rate to local areas whose true rates vary. The second term represents errors due to bias and sampling variability in the territory-wide estimator $\hat{p}$. The $H^2$ term vanishes only if the local areas are in fact homogeneous, so that all $p_i$ equal each other and hence equal $p$. Otherwise, $H^2$ is a contribution to error that cannot be reduced by increasing sample size.
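The split of the error into a heterogeneity term and an estimator term is an exact algebraic identity, because the deviations of the $p_i$ from their unweighted mean sum to zero. The following Python sketch checks this numerically. The local rates and the territory-wide estimate are simulated, hypothetical values, not P-12 data, and the function names are illustrative only.

```python
import random

def heterogeneity_variance(rates):
    """H^2: variance of the local rates about their unweighted mean."""
    L = len(rates)
    p = sum(rates) / L
    return sum((p_i - p) ** 2 for p_i in rates) / L

def uniform_estimate_mse(rates, p_hat):
    """Average squared error when p_hat is assigned to every locality."""
    return sum((p_i - p_hat) ** 2 for p_i in rates) / len(rates)

random.seed(0)
true_rates = [random.random() for _ in range(500)]  # hypothetical local rates
p = sum(true_rates) / len(true_rates)
p_hat = p + 0.03  # hypothetical territory-wide estimate, off by 0.03

H2 = heterogeneity_variance(true_rates)
mse = uniform_estimate_mse(true_rates, p_hat)

# The two printed values agree up to floating-point rounding.
print(mse, H2 + (p - p_hat) ** 2)
```

Changing `p_hat` moves only the second term; the `H2` component stays fixed however accurate the territory-wide estimate becomes.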

Some fine points require mention. To begin with, $H^2$ is centered on $p$, in order to make straightforward the interpretation as a variance due to heterogeneity. Similarly, $p$ is the unweighted mean of the $p_i$. The P-12 data were aggregated by a procedure that tends to equalize the counts of stratum members in local areas, so there is little numerical difference between weighted and unweighted means. Conceptually, however, the weighted mean is the natural target of a ratio estimator obtained from a numerator and denominator separately aggregated over $i$. If $\hat{p}$ is such an estimator, then the term $(p - \hat{p})^2$ in {2} includes the squared difference between weighted and unweighted means, as well as a contribution from sampling error and from “ratio estimator bias,” whose underlying source is again heterogeneity among the $p_i$. Ratio estimator bias enters through $p - \hat{p}$, decreases with sample size, and is a minor part of the story compared to $H^2$ [Freedman, Stark and Wachter 2000].

We now discuss estimation of $H^2$ from the P-12 sample. For the moment, we fix a territory and a variable, and consider only persons in one demographic group. In our setup, the $i$th superblock represents the $i$th locality, so the number of superblocks coincides with the number of localities. Let

$\hat{p}_i$ be the rate for the $N_i$ sample persons in the $i$th superblock;

$\hat{p} = \sum_i \hat{p}_i / L$ be the mean of these rates.

A naive estimator of $H^2$ would be

$$\frac{1}{L}\sum_{i=1}^{L} (\hat{p}_i - \hat{p})^2 .$$

However, this estimator of variance due to heterogeneity is inflated by variance due to sampling. We therefore define $\hat{H}^2$ by subtracting from the naive estimator a second term, which is an approximate correction for sampling variability in $\hat{p}_i$ and $\hat{p}$ [see Appendix].
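To illustrate why a correction is needed, the Python sketch below simulates a territory with no heterogeneity at all and compares a naive variance of sample rates with a corrected version. The correction used here is an assumption made for illustration: it treats the sample persons in each superblock as a simple binomial sample, so that the sampling variance of each local rate is estimated by $\hat{p}_i(1-\hat{p}_i)/(N_i-1)$. The exact correction described in the Appendix may differ, since the actual design is a stratified cluster sample.

```python
import random

def naive_h2(p_hats):
    """Variance of the sample rates about their unweighted mean."""
    L = len(p_hats)
    p_bar = sum(p_hats) / L
    return sum((ph - p_bar) ** 2 for ph in p_hats) / L

def corrected_h2(p_hats, sizes):
    """Naive estimate minus an assumed binomial sampling-variance term.

    Assumption (not from the source): independent binomial sampling within
    each superblock, with every N_i at least 2.
    """
    L = len(p_hats)
    var_hats = [ph * (1 - ph) / (n - 1) for ph, n in zip(p_hats, sizes)]
    # The factor (1 - 1/L) allows for sampling noise in the mean rate itself.
    correction = (1 - 1 / L) * sum(var_hats) / L
    return naive_h2(p_hats) - correction

# Simulated homogeneous territory: every locality has the same true rate,
# so the true H^2 is zero. All values are hypothetical.
random.seed(1)
L, N, p_true = 2000, 16, 0.3
sizes = [N] * L
p_hats = [sum(random.random() < p_true for _ in range(N)) / N for _ in range(L)]

print(naive_h2(p_hats))             # near p(1-p)/N, i.e. pure sampling noise
print(corrected_h2(p_hats, sizes))  # near zero; can come out negative
```

Because the corrected value is a difference of two sample quantities, nothing forces it to be non-negative, even though the quantity it estimates is.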

### 2.2 Variables and Strata

Four variables are examined in our study:

(i) The multi-unit housing rate is the proportion of persons residing in multi-unit structures.

(ii) The non-mailback rate is the proportion of people who did not mail back their Census form, out of all people in the Census who were meant to mail it back.

(iii) The allocation rate is the proportion of persons with at least one of six key characteristics imputed. The six characteristics are relationship to householder, age, sex, race, Hispanic origin, and marital status.

(iv) The substitution rate is the proportion of persons whose whole record was imputed or “substituted” into the Census, typically in households from which no detailed information was obtained.

These variables provide a good variety of cases with which to examine local heterogeneity. They include one structural variable, one behavioral variable, and two measures of data completeness, all taking values between zero and one. They are four of the five main variables treated in the Census Bureau’s P-12 Project Report [Kim 1991]: the fifth variable, the mail universe rate, is more narrowly administrative in character, and is not considered here.

Documentation of the P-12 data set is to be found in [Thompson 1990, U.S. Bureau of the Census 1990, Bateman 1991, Kim 1991]. The sample is a stratified cluster sample selected using essentially the same design as the Census Bureau’s 1990 Post-Enumeration Survey (PES) but with 116,619 block clusters in place of 5293. These clusters include 204,394 blocks compared to 12,964 blocks for the PES.

The stratification is the one proposed for adjusting the 1990 Census. The population is divided into 116 PSGs (“Post-Stratum Groups”) defined in part by geography and in part by demography. The geographical classification is based on census division (9 areas) and place-type (7 types). The demographic breakdown is by race-ethnicity (4 categories), and renter-owner status (2 groups). In principle, there could be 9 × 7 × 4 × 2 = 504 PSGs, but smaller ones are collapsed and the largest urban areas are treated differently. In total, there are 116 PSGs: a list can be found in Table A.1 of Hogan (1993). We exclude the PSG for Indians living on reservations, and deal with the remaining 115; we also exclude the so-called “residual population” not surveyed by the PES. Each PSG is broken down by 6 age groups and 2 sexes into 12 “post-strata,” so we have 115 × 12 = 1380 post-strata to consider.
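The counts in this paragraph can be retraced with a few lines of arithmetic. The Python sketch below simply multiplies out the classification levels quoted in the text; the variable names are hypothetical labels chosen for illustration.

```python
# Levels of each classification, as quoted in the text.
census_divisions = 9
place_types = 7
race_ethnicity_categories = 4
tenure_groups = 2  # renter / owner

# Upper bound on the number of PSGs before collapsing small cells.
max_psgs = (census_divisions * place_types
            * race_ethnicity_categories * tenure_groups)

actual_psgs = 116               # after collapsing, per the text
studied_psgs = actual_psgs - 1  # excluding the PSG for Indians on reservations

age_groups = 6
sexes = 2
post_strata = studied_psgs * age_groups * sexes

print(max_psgs, studied_psgs, post_strata)
```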

Each post-stratum is defined in part by demography (race-ethnicity, renter-owner, age, sex) and in part by geography (census division, place-type). Three examples give the flavor of the post-strata:

(i) Non-minority females age 0-9 living in a central city in New England.

(ii) Black males age 10-19 living in rental units in a central city in a large metropolitan area in the South Atlantic division (Florida, Georgia, and so forth).

(iii) Black and Hispanic females age 60 and over in New England, living either in a central city or in a metropolitan area but not in its central city.

The geography is the “territory” associated with the post-stratum, and the demography can be considered as providing the stratification within which small-area estimation takes place. In post-stratum (i), for instance, the territory is central cities in New England; small-area estimation would be uniform within the part of this territory inhabited by the demographic group consisting of non-minority females age 0-9. With post-stratum (ii), the territory consists of central cities in large metropolitan areas in the South Atlantic division, and the demographic group consists of black males age 10-19 living in rental units. Thus, the boundaries of the territories depend on the post-stratum, and so will the dissection of each territory into local areas. For groups whose members are numerous, high resolution is possible and the local areas are small. For groups whose members are few and far between, the local areas are extended.

As noted above, the specification of territories, and localities within a territory, is data-dependent. More specifically, the records in P-12 correspond to unique - and non-overlapping - intersections of post-strata and superblocks. These records were built up from more basic information for each post-stratum and sample block. The algorithm used to build up the records required that for any given post-stratum, a P-12 record must have at least ten post-stratum members; the corresponding geography must span whole Census block clusters and must not cross state lines. (These three constraints could not always be satisfied, and we eliminated some 2000 exceptional records.) Given a post-stratum, a “superblock” is the collection of block clusters put together during the construction of a record; there is one superblock per record. There is one locality per superblock, consisting of all the block clusters in the population corresponding to the one block cluster in the sample. Due to the sample design, this informal idea can be made fairly precise [see Appendix].

The total U.S. population is around 250 million. Our P-12 dataset has about 12,000,000 people, and we think of it as a 1-in-20 sample; in reality, the sampling fraction varies from one part of the country to another. There were 750,000 records, so the average number of sample persons in a record is 12,000,000/750,000 = 16: given a typical post-stratum and superblock, about 16 members of the post-stratum will be found in the superblock. This corresponds to roughly 300 post-stratum members per locality, since each sample person represents some 20 people. The algorithm used to construct the P-12 records tends to equalize the local-area counts.

There are 115 PSGs; these do not overlap, and their average size is around 250 million/115 or 2.2 million people. Each PSG is defined by a combination of geography - the “territory” - and demography. The territories will have a population that is several times larger than the PSG: 5-10 million people is a representative range, so there must be several hundred localities per territory. We estimate an average of 6 blocks per superblock, hence, 120 blocks per locality, with 6,000 persons in all demographic groups combined - although this is only an order-of-magnitude calculation. We have done the P-12 aggregation ourselves on the “Berkeley subset” of census data [see Appendix], and those simulations suggest a population of 10,000 per locality. In the end, we think most localities will have populations in the range 2,500-25,000.
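The order-of-magnitude reasoning in the last two paragraphs can be laid out step by step. The Python sketch below uses only the round figures quoted in the text. The uniform expansion factor of 20 is the text's own simplification, since the real sampling fraction varies across the country, and the variable names are hypothetical.

```python
sample_persons = 12_000_000
records = 750_000
expansion = 20  # each sample person stands for about 20 people

persons_per_record = sample_persons / records          # sample persons per superblock
members_per_locality = persons_per_record * expansion  # post-stratum members per locality

blocks_per_superblock = 6
blocks_per_locality = blocks_per_superblock * expansion

us_population = 250_000_000
average_psg_size = us_population / 115

print(persons_per_record, members_per_locality, blocks_per_locality)
print(round(average_psg_size))

# Localities per territory, for the quoted ranges of territory and locality size.
for territory_pop in (5_000_000, 10_000_000):
    for locality_pop in (6_000, 10_000):
        print(territory_pop, locality_pop, round(territory_pop / locality_pop))
```

The quoted figure of 6,000 persons in 120 blocks corresponds to about 50 persons per block, which is implied by the text rather than stated in it.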

The geometry is confusing at first. The superblocks and localities associated with any particular post-stratum do not overlap. As we move from one post-stratum to another within the same PSG, the territory remains the same - but due to the aggregation procedure, the superblocks change and new superblocks overlap the old. (Similar statements apply to the localities.) As we move from one PSG to another, the territories change and overlap: compare post-strata (i) and (iii) above.

Although details are complicated, the basic picture is straightforward. There are two scales which govern any measurement of heterogeneity. Variability occurs within some big unit, across some small units. Here the big unit is a territory encompassing something like 7 million people. The small units are local areas encompassing some 10,000 people, with about 300 people in each of 30 demographic groups. On these scales, P-12 allows estimates of residual heterogeneity after stratification by demographic group. The measurement of heterogeneity within post-strata across the superblocks of P-12 is relatively unambiguous; tying the results to more familiar geographical units must be more tentative, due to the complexities of the P-12 data structure.

### 2.3 Results

Estimates of residual heterogeneity from P-12 are shown in Table 1, along with related values for comparison. The four columns correspond to the four outcome variables in our study. The values shown for $\hat{H}$ and for sampling standard error are root-mean-square (RMS) values calculated over all post-strata. We report $\hat{H}$ rather than $\hat{H}^2$ to make units and scale more easily understandable.

The first row of Table 1 shows the values of $\hat{H}$ for local areas in P-12. The first entry is 22.3%, signifying that within post-strata, the local area-to-area differences in the rates of multi-unit housing are on the order of 22.3%. In other words, ascribing the overall rate of multi-unit housing to the local areas within a territory incurs an RMS error due to heterogeneity of 22.3%, even after controlling for the geographic and demographic variables in the post-stratification. This outcome reveals a remarkable degree of diversity in the clustering of apartment buildings and multiple-family houses. The other entries range from 10.7% for the non-mailback rate down to 2.3% for the Census substitution rate. The calculation of the standard errors is described below [see Appendix]; these are plausible upper bounds.

If people belonging to the same post-stratum shared the same rates, wherever they resided, the values of $\hat{H}^2$ would all be estimates of zero. Though $\hat{H}^2$ is estimating a non-negative quantity, its sample values are not constrained to be non-negative. None of the 1380 post-strata have negative estimates for multi-unit housing or non-mailbacks; five do for allocations and 50 for substitutions. The small standard errors and the rarity of negatives both indicate the strong statistical significance of the observed heterogeneity, thanks to the large sample size of P-12.

Our formula for $\hat{H}$ can be applied to measure state-to-state heterogeneity by letting $i$ in the definition range over the 50 states plus the District of Columbia. The third row of Table 1 shows the state-level RMS values of $\hat{H}$ across the 1224 post-strata that intersect more than one state. We see that $\hat{H}$ for multi-unit housing falls only to a little less than half its local-level value at this much larger level of aggregation. For substitutions, $\hat{H}$ is still one-fourth of its local level. Heterogeneity is not simply produced by small-scale flutters in concentrations: if it were, heterogeneity would average out at larger scales like states. The values for states in Table 1 are generally only a bit smaller than the comparable values for states in Table 5 of [Freedman and Wachter 1994], where a coarser 357-fold post-stratification is used; the value for multi-unit housing is actually bigger. This suggests that the measures of heterogeneity are somewhat robust to moderate changes in the post-stratification. Put another way, refining the stratification may not yield much reduction in heterogeneity.

The practical significance of the levels of heterogeneity indicated by the first row of Table 1 may be judged by various standards of comparison. One natural comparison is with the standard deviation of the post-stratum mean rates across post-strata, shown in the fourth row of Table 1. This standard deviation suggests itself when one thinks of the values for, say, the multi-unit housing rate as entries in a two-way table whose rows are superblocks and whose columns are post-strata. The index $\hat{H}$ then measures the residual variability after controlling for column effects, and the standard deviation over post-strata measures the variability “explained” by the column effects. Table 1 shows that the residual variability is roughly as large as the explained variability. That is true for the first three variables. For the fourth, substitutions, the residual variability is three times as large. For comparable data at the state level and an algebraic treatment that dispels the air of paradox, see [Freedman and Wachter 1994].
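The two-way-table comparison can be made concrete with a toy calculation. The Python sketch below takes a small hypothetical table of local rates, one column per post-stratum, and computes the RMS within-column standard deviation (the "residual" variability) alongside the standard deviation of the column means (the "explained" variability). The numbers are invented for illustration, the function name is hypothetical, and the sketch omits the correction for sampling variability.

```python
import math

def residual_and_explained(table):
    """table maps a post-stratum label to its list of local rates.

    Returns (RMS within-column SD, SD of the column means).
    """
    means = {k: sum(v) / len(v) for k, v in table.items()}
    within_var = {
        k: sum((x - means[k]) ** 2 for x in v) / len(v)
        for k, v in table.items()
    }
    # Residual: root mean square of the within-column variances.
    residual_rms = math.sqrt(sum(within_var.values()) / len(within_var))
    # Explained: spread of the column means about their unweighted average.
    grand_mean = sum(means.values()) / len(means)
    explained_sd = math.sqrt(
        sum((m - grand_mean) ** 2 for m in means.values()) / len(means)
    )
    return residual_rms, explained_sd

# Hypothetical toy table: two post-strata, two local areas each.
toy = {"stratum A": [0.1, 0.3], "stratum B": [0.5, 0.7]}
residual, explained = residual_and_explained(toy)
print(residual, explained)
```

In this toy table the explained variability is twice the residual; the finding reported for Table 1 is that in the P-12 data the residual is at least as large as the explained component.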

The levels of $\hat{H}$ may also be judged by comparison with the sampling standard errors for the post-stratum mean rates $\hat{p}$. The last two rows of Table 1 show a low and a high estimate of sampling standard error based on a sample of the size of the PES, the post-enumeration survey for 1990. The derivation of our two illustrative estimates of standard error is explained below [see Appendix]. It turns out that the local heterogeneity measured by $\hat{H}$ is much larger than the sampling standard error with samples of this size. Even at the state level, heterogeneity is comparable to the sampling standard errors for $\hat{p}$. Obviously, heterogeneity cannot be taken to be negligible in comparison with sampling variability in any settings like the ones considered here. The numbers in Table 1 are based on averaging post-strata; however, examination of scatterplots (not presented here) indicates that the conclusions hold for practically all individual post-strata.

For variables like ours, taking values between zero and one, the mean of the variable imposes a constraint on the variance due to heterogeneity. Hence we expect the levels of $\hat{H}$ to be strongly influenced by the post-stratum means $\hat{p}$. For instance, only 1.1% of person-records on average are Census substitutions while 28.6% correspond to people in multi-unit housing; the corresponding values of $\hat{H}$ are 2.3% and 22.3%. Post-stratum by post-stratum plots of $\hat{H}^2$, not given here, show that $\hat{H}$ tends to vary like a fraction of $\sqrt{\hat{p}(1-\hat{p})}$. We call the ratio $\hat{H}/\sqrt{\hat{p}(1-\hat{p})}$ the “max-fraction.” Its median value across the 1380 post-strata is roughly 1/2 for multi-unit housing, 1/4 for the non-mailback rate, 1/6 for the allocation rate, and 1/5 for the substitution rate.

The “max-fraction” is given its name for the following reason. The maximum amount of heterogeneity in (say) multi-unit housing is achieved by an all-or-nothing arrangement where a proportion $p$ of the local areas have nothing but apartments and the remaining $1 - p$ of the local areas have nothing but single-family houses. Under this arrangement, $H$ takes on the maximum value consistent with an overall mean of $p$, namely $\sqrt{p(1-p)}$. Under any less heterogeneous arrangement, $H$ takes on some fraction of its maximum. The max-fraction, a sample-based estimate of the population-level quantity $H/\sqrt{p(1-p)}$, is a measure of heterogeneity standardized for the level of $\hat{p}$. Since mean max-fractions are sensitive to a handful of outliers, medians may be more descriptive. For multi-unit housing, $\hat{H}$ is over half the maximum possible level. By this standardized measure, the allocation rate shows the least heterogeneity and the multi-unit housing rate the most.
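The bound behind the max-fraction can be checked directly. The Python sketch below builds a hypothetical all-or-nothing arrangement, confirms that its heterogeneity equals $\sqrt{p(1-p)}$, and then forms the ratio for the aggregate figures quoted in the text. Those ratios are computed from the overall RMS heterogeneity and the overall mean rates, so they only approximate the medians across post-strata reported above. The function names are illustrative.

```python
import math

def max_heterogeneity(p):
    """Largest H consistent with an overall mean rate p."""
    return math.sqrt(p * (1 - p))

def max_fraction(h, p):
    """Heterogeneity h as a fraction of its maximum for mean rate p."""
    return h / max_heterogeneity(p)

# Hypothetical all-or-nothing territory: 286 of 1000 local areas are
# entirely multi-unit housing, the other 714 have none.
rates = [1.0] * 286 + [0.0] * 714
L = len(rates)
p = sum(rates) / L
H = math.sqrt(sum((r - p) ** 2 for r in rates) / L)
print(H, max_heterogeneity(p))  # the two values coincide

# Ratios from the aggregate figures in the text (illustrative only).
multi_unit = max_fraction(0.223, 0.286)    # a little under 1/2
substitution = max_fraction(0.023, 0.011)  # a little over 1/5
print(multi_unit, substitution)
```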

Table 2 presents estimated values of $\hat{H}$ for various groups of strata; only two age-ranges are shown. The differences are modest: for instance, values for males and females are very close. The higher values of $\hat{H}$ are generally associated with higher mean values of $\hat{p}$. The post-strata which mix renters and owners together do not show more heterogeneity than the post-strata which separate out renters: the latter strata have the higher mean values of $\hat{p}$.

Breakdowns by groups, like those in Table 2, show that heterogeneity is pervasive. Heterogeneity is not concentrated among post-strata of any particular type. Strata which mix groups like owners and renters produce levels of heterogeneity similar to those of strata which separate them. That outcome is further evidence that dependence on the details of post-stratification is not severe. By contrast, heterogeneity would be expected to vary with the geographical resolution. Table 3 shows values of $\hat{H}$ from studies with different levels of resolution; the variable used is the allocation rate.

Table 3 has results for “Public Use Microdata Areas” (PUMAs), which are aggregations of cities and counties into areas each of which contains at least 100,000 people. The results are due to Marcey-Jo Rhyne and are quoted by permission. Her post-stratification for the PUMAs follows the one used in the 1990 PES, to the extent feasible: no distinctions of place-type can be made; renters are distinguished from owners in all cases, as are blacks, non-black Hispanics, Asians and Pacific Islanders, and whites and others. She looked only at allocations. It is interesting that the heterogeneity across the relatively large PUMA units within one state is nearly as high as the heterogeneity across the much smaller local areas within larger territorial groupings.

 Measuring Local Heterogeneity with 1990 U.S. Census Data Kenneth W. Wachter, David A. Freedman © 2000 Max-Planck-Gesellschaft ISSN 1435-9871 http://www.demographic-research.org/Volumes/Vol3/10