2.1 The Measure
This section presents our measure of heterogeneity, and procedures for estimating it from the data.
Throughout this discussion we restrict attention to a single variable (e.g., the proportion of persons living
in multiunit housing) and to a single demographic group, for example, Hispanic women aged 20-29. Our
object of study is variability from local area to local area within broader areas. We need some terminology
and notation to clarify the distinctions.
Territory is our name for one of the broader areas, and each territory spans many local areas. The
territories are determined by the design of the underlying estimation project and data-collection effort. In
small-area estimation with uniform ratio estimators, all small areas within one territory are assigned
a common “uniform” estimated value, based on aggregating across the territory; in census
applications, estimates are uniform within combinations of territory and demography called
“poststrata.” Typically, a territory consists of all places of a particular type across some regional
subdivision of the country. In P12, the central cities in New England are an example of a territory.
Each territory is dissected into localities: examples might be the Back Bay, North End, or Beacon Hill
in Boston; or Colorado’s Snowmass Mountain Basin, Sangre de Cristos foothills, etc. The
estimation project itself may go down to smaller units than these local areas (Census undercount
estimation goes all the way down to blocks), but we are measuring heterogeneity only down
to the scale allowed by P12. For any particular variable of interest, each local area has a
value that differs from the territory-wide value, the latter being an average of the former. The
deviations from average are what we call heterogeneity, and it is heterogeneity that we are going to
measure.
In our notation, within a given territory, for a given variable and demographic group:
$p_i$ is the true rate for all $n_i$ group members in the $i$th of $L$ localities;
$\bar{p} = \sum_i p_i / L$ is the arithmetic mean of the true local rates.
The quantity whose measurement is the goal of the study is the population-level “variance due to
heterogeneity,” the variance of the true local rates about their mean:
$$H^2 = \frac{1}{L} \sum_{i=1}^{L} (p_i - \bar{p})^2 . \tag{1}$$
This quantity is important because it appears in formulas for estimation error. Let $\hat{p}$
be an estimator of $\bar{p}$.
The mean squared error of estimation that results from using $\hat{p}$ as the estimate of $p_i$ for all local areas, as if
they shared a common value, is
$$\frac{1}{L} \sum_{i=1}^{L} (\hat{p} - p_i)^2 = H^2 + (\bar{p} - \hat{p})^2 . \tag{2}$$
The first term on the right, $H^2$, represents errors due to heterogeneity, resulting directly from attributing one
common rate to local areas whose true rates vary. The second term represents errors due to
bias and sampling variability in the territory-wide estimator $\hat{p}$.
The $H^2$ term vanishes only
if the local areas are in fact homogeneous, so that all $p_i$ equal each other and
hence equal $\bar{p}$. Otherwise, $H^2$ is a contribution to error which cannot be
reduced by increasing sample size.
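The decomposition just described is an algebraic identity: the mean squared error of one common estimate equals the variance due to heterogeneity plus the squared distance of the estimate from the unweighted mean. A minimal numerical check is sketched below. The local rates, the estimate, and the function names `heterogeneity_variance` and `mse_uniform` are all hypothetical, chosen only for illustration.

```python
def heterogeneity_variance(rates):
    """Variance H^2 of the local rates about their unweighted mean."""
    L = len(rates)
    mean = sum(rates) / L
    return sum((p - mean) ** 2 for p in rates) / L


def mse_uniform(rates, estimate):
    """Mean squared error from assigning one common estimate to every locality."""
    return sum((estimate - p) ** 2 for p in rates) / len(rates)


# Hypothetical true local rates and a hypothetical territory-wide estimate.
rates = [0.10, 0.25, 0.40, 0.05, 0.70]
estimate = 0.33

mean = sum(rates) / len(rates)
h2 = heterogeneity_variance(rates)
lhs = mse_uniform(rates, estimate)
rhs = h2 + (mean - estimate) ** 2

# The two sides agree up to floating-point rounding.
print(abs(lhs - rhs) < 1e-12)
```

Moving `estimate` closer to the unweighted mean shrinks only the second term; the first term stays fixed, which is the sense in which heterogeneity error cannot be reduced by a better territory-wide estimate.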
Some fine points require mention. To begin with, $H^2$ is centered on $\bar{p}$, in order
to make straightforward the interpretation as a variance due to heterogeneity. Similarly, $\bar{p}$ is the
unweighted mean of the $p_i$. The P12 data were aggregated by a procedure that
tends to equalize the counts of stratum members in local areas, so there is little numerical difference
between weighted and unweighted means. Conceptually, however, the weighted mean is the natural target
of a ratio estimator obtained from a numerator and denominator separately aggregated over $i$.
If $\hat{p}$ is such an estimator,
then the term $(\bar{p} - \hat{p})^2$
in (2) includes the squared difference between weighted and unweighted means, as well as a
contribution from sampling error and from “ratio estimator bias,” whose underlying source is
again heterogeneity among the $p_i$. Ratio estimator bias enters through
$\bar{p} - \hat{p}$, decreases with
sample size, and is a minor part of the story compared to $H^2$ [Freedman, Stark and Wachter
2000].
We now discuss estimation of $H^2$ from the P12 sample. Temporarily, we have fixed a territory and a
variable, and are considering only persons in one demographic group. In our setup, the $i$th superblock
represents the $i$th locality, so the number of superblocks coincides with the number of localities.
Let
$\hat{p}_i$ be the rate for the $N_i$ sample persons in the $i$th superblock;
$\hat{p} = \sum_i \hat{p}_i / L$ be the mean of these rates.
A naive estimator of $H^2$ would be
$$\frac{1}{L} \sum_{i=1}^{L} (\hat{p}_i - \hat{p})^2 .$$
However, this estimator of variance due to heterogeneity is inflated by variance due to sampling. We
therefore define $\hat{H}^2$ by the equation
[The defining equation for $\hat{H}^2$ is not legible in the source text.]
The second term is an approximate correction for sampling variability in
$\hat{p}_i$ and $\hat{p}$ [see Appendix].
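To see why the naive estimator is inflated, and what a sampling correction can look like, consider the following minimal sketch. It assumes simple binomial sampling within each superblock and subtracts the average estimated binomial variance of the sample rates. This is an illustrative stand-in, not the correction defined in the Appendix, which also allows for sampling variability in the mean rate and may differ in detail. The function names and the simulated data are hypothetical.

```python
import random


def naive_h2(sample_rates):
    """Naive estimator: variance of the sample rates about their unweighted mean."""
    L = len(sample_rates)
    mean = sum(sample_rates) / L
    return sum((r - mean) ** 2 for r in sample_rates) / L


def corrected_h2(sample_rates, sample_sizes):
    """Naive estimator minus an ASSUMED binomial sampling-variance correction.

    The correction is the average of r * (1 - r) / (N - 1), which under
    binomial sampling estimates the sampling variance of each rate.
    The result is not constrained to be nonnegative.
    """
    L = len(sample_rates)
    correction = sum(
        r * (1 - r) / (n - 1) for r, n in zip(sample_rates, sample_sizes)
    ) / L
    return naive_h2(sample_rates) - correction


# Simulated data (hypothetical): 2000 localities sharing the same true rate 0.3,
# so the true H^2 is zero, with 16 sample persons in each superblock.
random.seed(0)
sizes = [16] * 2000
rates = [sum(random.random() < 0.3 for _ in range(n)) / n for n in sizes]

print(naive_h2(rates))             # near 0.3 * 0.7 / 16, i.e. well above zero
print(corrected_h2(rates, sizes))  # near zero
```

With no true heterogeneity, the naive value reflects sampling noise alone, while the corrected value hovers around zero and can fall slightly below it.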
2.2 Variables and Strata
Four variables are examined in our study:
(i) The multiunit housing rate is the proportion of persons residing in multiunit structures.
(ii) The nonmailback rate is the proportion of people who did not mail back their Census form, out of all
people in the Census who were meant to mail it back.
(iii) The allocation rate is the proportion of persons with at least one of six key characteristics imputed.
The six characteristics are relationship to householder, age, sex, race, Hispanic origin, and marital
status.
(iv) The substitution rate is the proportion of persons whose whole record was imputed or
“substituted” into the Census, typically in households from which no detailed information was
obtained.
These variables provide a good variety of cases with which to examine local heterogeneity. They include
one structural variable, one behavioral variable, and two measures of data completeness, all taking values
between zero and one. They are four of the five main variables treated in the Census Bureau’s P12 Project
Report [Kim 1991]: the fifth variable, the mail universe rate, is more narrowly administrative in character,
and is not considered here.
Documentation of the P12 data set is to be found in [Thompson 1990, U.S. Bureau of the Census 1990,
Bateman 1991, Kim 1991]. The sample is a stratified cluster sample selected using essentially the same
design as the Census Bureau’s 1990 Post-Enumeration Survey (PES) but with 116,619 block clusters
in place of 5,293. These clusters include 204,394 blocks compared to 12,964 blocks for the
PES.
The stratification is the one proposed for adjusting the 1990 Census. The population is divided into
116 PSGs (“Post-Stratum Groups”) defined in part by geography and in part by demography. The
geographical classification is based on census division (9 areas) and place-type (7 types). The
demographic breakdown is by race-ethnicity (4 categories) and renter-owner status (2 groups). In
principle, there could be 9 × 7 × 4 × 2 = 504 PSGs, but smaller ones are collapsed and the largest urban
areas are treated differently. In total, there are 116 PSGs: a list can be found in Table A.1 of Hogan (1993).
We exclude the PSG for Indians living on reservations, and deal with the remaining 115; we also exclude
the so-called “residual population” not surveyed by the PES. Each PSG is broken down by
6 age groups and 2 sexes into 12 “poststrata,” so we have 115 × 12 = 1380 poststrata to
consider.
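The counts in this paragraph can be retraced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable names are illustrative.

```python
# Sizes of the classifications quoted in the text.
census_divisions = 9
place_types = 7
race_ethnicity_categories = 4
tenure_groups = 2  # renter / owner

# Full cross-classification, before any collapsing.
potential_psgs = (
    census_divisions * place_types * race_ethnicity_categories * tenure_groups
)
print(potential_psgs)  # 504

actual_psgs = 116            # after collapsing, as stated in the text
psgs_used = actual_psgs - 1  # excluding the PSG for Indians living on reservations

age_groups = 6
sexes = 2
poststrata_per_psg = age_groups * sexes
print(poststrata_per_psg)              # 12
print(psgs_used * poststrata_per_psg)  # 1380
```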
Each poststratum is defined in part by demography (race-ethnicity, renter-owner, age, sex)
and in part by geography (census division, place-type). Three examples give the flavor of the
poststrata:
(i) Nonminority females age 0-9 living in a central city in New England.
(ii) Black males age 10-19 living in rental units in a central city in a large metropolitan area in the South
Atlantic division (Florida, Georgia, and so forth).
(iii) Black and Hispanic females age 60 and over in New England, living either in a central city or in a
metropolitan area but not in its central city.
The geography is the “territory” associated with the poststratum, and the demography can be considered
as providing the stratification within which small-area estimation takes place. In poststratum (i), for
instance, the territory is central cities in New England; small-area estimation would be uniform within the
part of this territory inhabited by the demographic group consisting of nonminority females age 0-9. With
poststratum (ii), the territory consists of central cities in large metropolitan areas in the South Atlantic
division, and the demographic group consists of black males age 10-19 living in rental units. Thus, the
boundaries of the territories depend on the poststratum, and so will the dissection of each territory into
local areas. For groups whose members are numerous, high resolution is possible and the
local areas are small. For groups whose members are few and far between, the local areas are
extended.
As noted above, the specification of territories, and localities within a territory, is data-dependent.
More specifically, the records in P12 correspond to unique (and nonoverlapping) intersections of
poststrata and superblocks. These records were built up from more basic information for each
poststratum and sample block. The algorithm used to build up the records required that for any given
poststratum, a P12 record must have at least ten poststratum members; the corresponding
geography must span whole Census block clusters and must not cross state lines. (These three
constraints could not always be satisfied, and we eliminated some 2000 exceptional records.)
Given a poststratum, a “superblock” is the collection of block clusters put together during the
construction of a record; there is one superblock per record. There is one locality per superblock,
consisting of all the block clusters in the population corresponding to the one block cluster
in the sample. Due to the sample design, this informal idea can be made fairly precise [see
Appendix].
The total U.S. population is around 250 million. Our P12 dataset has about 12,000,000
people, and we think of it as a 1-in-20 sample; in reality, the sampling fraction varies from one
part of the country to another. There were 750,000 records, so the average number of sample
persons in a record is 12,000,000/750,000 = 16: given a typical poststratum and superblock,
about 16 members of the poststratum will be found in the superblock. This corresponds to
roughly 300 poststratum members per locality, since each sample person represents some
20 people. The algorithm used to construct the P12 records tends to equalize the local-area
counts.
There are 115 PSGs; these do not overlap, and their average size is around 250
million/115 or 2.2 million people. Each PSG is defined by
a combination of geography (the “territory”) and demography.
The territories will have a population that is several times larger than the PSG: 5-10 million people is a
representative range, so there must be several hundred localities per territory. We estimate an average of 6
blocks per superblock, hence 120 blocks per locality, with 6,000 persons in all demographic groups
combined, although this is only an order-of-magnitude calculation. We have done the P12 aggregation
ourselves on the “Berkeley subset” of census data [see Appendix], and those simulations suggest a
population of 10,000 per locality. In the end, we think most localities will have populations in the range
2,500-25,000.
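The order-of-magnitude figures in the last two paragraphs can be retraced as follows. This is a rough sketch using the rounded numbers quoted in the text. The step that divides records by poststrata to count localities per territory is an assumption made here for illustration, not a calculation stated by the authors.

```python
# Figures quoted in the text (rounded, order-of-magnitude only).
us_population = 250_000_000
sample_persons = 12_000_000
records = 750_000
psgs = 115
poststrata = 1380
expansion = 20  # each sample person stands for about 20 people

persons_per_record = sample_persons / records          # 16.0
members_per_locality = persons_per_record * expansion  # 320, "roughly 300"
avg_psg_size = us_population / psgs                     # about 2.2 million
blocks_per_locality = 6 * expansion                     # 120

# ASSUMPTION (not stated in the text): records per poststratum taken as a
# rough count of superblocks, and hence localities, per territory.
localities_per_territory = records / poststrata         # about 540

# Implied locality population for territories of 5 to 10 million people.
low = 5_000_000 / localities_per_territory
high = 10_000_000 / localities_per_territory
print(round(low), round(high))
```

Under that assumption the implied locality populations come out near 9,000 to 18,000, which sits inside the 2,500-25,000 range given in the text.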
The geometry is confusing at first. The superblocks and localities associated with any particular
poststratum do not overlap. As we move from one poststratum to another within the same PSG, the
territory remains the same, but due to the aggregation procedure, the superblocks change and
new superblocks overlap the old. (Similar statements apply to the localities.) As we move
from one PSG to another, the territories change and overlap: compare poststrata (i) and (iii)
above.
Although details are complicated, the basic picture is straightforward. There are two scales
which govern any measurement of heterogeneity. Variability occurs within some big unit,
across some small units. Here the big unit is a territory encompassing something like 7 million
people. The small units are local areas encompassing some 10,000 people, with about 300
people in each of 30 demographic groups. On these scales, P12 allows estimates of residual
heterogeneity after stratification by demographic group. The measurement of heterogeneity within
poststrata across the superblocks of P12 is relatively unambiguous; tying the results to more
familiar geographical units must be more tentative, due to the complexities of the P12 data
structure.
2.3 Results
Estimates of residual heterogeneity from P12 are shown in Table 1, along with related values
for comparison. The four columns correspond to the four outcome variables in our study.
The values shown for $\hat{H}$ and for sampling standard error are root-mean-square
(RMS) values calculated over all poststrata. We report $\hat{H}$ rather than
$\hat{H}^2$ to make units and scale more easily understandable.
TABLE 1
The first row of Table 1 shows values of $\hat{H}$ for local areas in P12.
The first entry is 22.3%, signifying that
within poststrata, the local area-to-area differences in the rates of multiunit housing are on the order of
22.3%. In other words, ascribing the overall rate of multiunit housing to the local areas within a territory
incurs an RMS error due to heterogeneity of 22.3%, even after controlling for the geographic and
demographic variables in the poststratification. This outcome reveals a remarkable degree of diversity in
the clustering of apartment buildings and multiple-family houses. The other entries range
from 10.7% for the nonmailback rate down to 2.3% for the Census substitution rate. The
calculation of the standard errors is described below [see Appendix]; these are plausible upper
bounds.
If people belonging to the same poststratum shared the same rates, wherever they resided,
the values of $\hat{H}^2$ would all be estimates of zero. Though
$\hat{H}^2$ is estimating a nonnegative
quantity, its sample values are not constrained to be nonnegative. None of the 1380 poststrata
have negative estimates for multiunit housing or nonmailbacks; five do for allocations and
50 for substitutions. The small standard errors and the rarity of negatives both indicate the
strong statistical significance of the observed heterogeneity, thanks to the large sample size of P12.
Our formula for $\hat{H}$ can be applied to measure state-to-state
heterogeneity by letting $i$ in the definition range over the 50 states plus the District
of Columbia. The third row of Table 1 shows the state-level RMS values of $\hat{H}$
across the 1224 poststrata that intersect more than one state. We see that $\hat{H}$
for multiunit housing only falls to a little less than half its local-level value at this much larger level
of aggregation. For substitutions, $\hat{H}$ is still one-fourth of its local level.
Heterogeneity is not simply produced by small-scale flutters in concentrations: if it were, heterogeneity
would average out at larger scales like states. The values for states in Table 1 are generally only
a bit smaller than the comparable values for states in Table 5 of [Freedman and Wachter 1994],
where a coarser 357-fold poststratification is used; the value for multiunit housing is actually bigger.
This suggests that the measures of heterogeneity are somewhat robust to moderate changes in the
poststratification. Put another way, refining the stratification may not yield much reduction in
heterogeneity.
The practical significance of the levels of heterogeneity indicated by the first row of Table 1 may be
judged by various standards of comparison. One natural comparison is with the standard deviation of the
poststratum mean rates $\hat{p}$ across poststrata, shown in the fourth row of
Table 1. This standard deviation suggests itself when one thinks of the values for, say, the multiunit
housing rate as entries in a two-way table whose rows are superblocks and whose columns are poststrata. The
index $\hat{H}$ then measures the residual variability after controlling for column
effects, and the standard deviation over poststrata measures the variability “explained” by
the column effects. Table 1 shows that the residual variability is
roughly as large as the explained variability. That is true for the first three variables. For the
fourth, substitutions, the residual variability is three times as large. For comparable data at
the state level and an algebraic treatment that dispels the air of paradox, see [Freedman and
Wachter 1994].
The levels of $\hat{H}$ may also be judged by comparison with the sampling standard
errors for the poststratum mean rates $\hat{p}$. The last two rows of Table 1
show a low and a high estimate of sampling standard error based on a sample of the size of the PES, the
post-enumeration survey for 1990. The derivation of our two illustrative estimates of standard error
is explained below [see Appendix]. It turns out that the local heterogeneity measured by $\hat{H}$
is much larger than the sampling standard error with samples of this size.
Even at the state level, heterogeneity is comparable to the sampling standard errors for
$\hat{p}$. Obviously, heterogeneity cannot be taken to be negligible in
comparison with sampling variability in any settings like the ones considered here. The numbers in
Table 1 are based on averaging poststrata; however, examination of scatterplots (not presented here)
indicates that the conclusions hold for practically all individual poststrata.
For variables like ours, taking values between zero and one, the mean of the variable imposes a
constraint on the variance due to heterogeneity. Hence we expect the levels of $\hat{H}$
to be strongly influenced by the poststratum means $\hat{p}$.
For instance, only 1.1% of person-records on average are Census substitutions while 28.6% correspond
to people in multiunit housing; the corresponding values of $\hat{H}$ are 2.3% and
22.3%. Poststratum-by-poststratum plots of $\hat{H}^2$ versus
$\hat{p}(1 - \hat{p})$, not given here, show that $\hat{H}$ tends to vary like a
fraction of $\sqrt{\hat{p}(1 - \hat{p})}$. We call the ratio
$\hat{H} / \sqrt{\hat{p}(1 - \hat{p})}$
the “max-fraction.” Its median value across the
1380 poststrata is roughly 1/2 for multiunit housing, 1/4 for the nonmailback rate, 1/6 for the allocation
rate, and 1/5 for the substitution rate.
The “max-fraction” is given its name for the following reason. The maximum amount of heterogeneity
in (say) multiunit housing is achieved by an all-or-nothing arrangement where a proportion
$\bar{p}$ of the local areas have nothing but apartments and the remaining $1 - \bar{p}$ of the local areas
have nothing but single-family houses. Under this arrangement, $H$ takes on the maximum
value consistent with an overall mean of $\bar{p}$, namely $\sqrt{\bar{p}(1 - \bar{p})}$.
Under any less heterogeneous arrangement, $H$ takes on some fraction of its maximum.
The max-fraction $\hat{H} / \sqrt{\hat{p}(1 - \hat{p})}$, a sample-based
estimate of the population-level quantity $H / \sqrt{\bar{p}(1 - \bar{p})}$, is a measure of heterogeneity standardized for the level
of $\hat{p}$. Since mean max-fractions are sensitive to a handful of outliers,
medians may be more descriptive. For multiunit housing, $\hat{H}$ is over half the
maximum possible level. By this standardized
measure, the allocation rate shows the least heterogeneity and the multiunit housing rate the
most.
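The bound behind the max-fraction can be illustrated with a short sketch: an all-or-nothing arrangement of local rates attains the maximum, the square root of (mean rate) × (1 − mean rate), and any milder arrangement with the same mean falls below it. The local rates and function names below are hypothetical. The last two lines plug in the aggregate figures quoted in the text (RMS heterogeneity and average rates), which is only a rough stand-in for the medians over individual poststrata reported above.

```python
import math


def h(rates):
    """Square root of the variance of local rates about their unweighted mean."""
    L = len(rates)
    mean = sum(rates) / L
    return math.sqrt(sum((p - mean) ** 2 for p in rates) / L)


def max_fraction(h_value, mean_rate):
    """Heterogeneity as a fraction of its maximum for the given mean rate."""
    return h_value / math.sqrt(mean_rate * (1 - mean_rate))


# All-or-nothing arrangement (hypothetical): 3 of 10 local areas at rate 1, rest at 0.
all_or_nothing = [1.0] * 3 + [0.0] * 7
print(max_fraction(h(all_or_nothing), 0.3))  # 1.0 up to rounding

# A milder arrangement (hypothetical) with the same mean of 0.3.
milder = [0.5] * 3 + [0.3] * 4 + [0.1] * 3
print(max_fraction(h(milder), 0.3))          # well below 1

# Aggregate figures quoted in the text.
print(max_fraction(0.223, 0.286))  # multiunit housing, roughly 0.49
print(max_fraction(0.023, 0.011))  # substitutions, roughly 0.22
```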
TABLE 2
Table 2 presents estimates $\hat{H}$ for various groups of strata; only
two age-ranges are shown. The differences are modest: for instance, values for males and females are very close.
The higher values of $\hat{H}$ are generally associated with higher mean
rates $\hat{p}$. The poststrata which mix renters and owners together do not
show more heterogeneity than the poststrata which separate out renters: the latter strata have the higher
mean rates $\hat{p}$.
Breakdowns by groups, like those in Table 2, show that heterogeneity is pervasive. Heterogeneity is not
concentrated among poststrata of any particular type. Strata which mix groups like owners and renters
produce levels of heterogeneity similar to those of strata which separate them. That outcome is further evidence
that dependence on the details of poststratification is not severe. By contrast, heterogeneity would be
expected to vary with the geographical resolution. Table 3 shows values of $\hat{H}$ from studies with different levels of
resolution; the variable used is the allocation rate.
TABLE 3
Table 3 has results for “Public Use Microdata Areas” (PUMAs), which are aggregations of cities
and counties into areas each of which contains at least 100,000 people. The results are due
to MarceyJo Rhyne and are quoted by permission. Her poststratification for the PUMAs
follows the one used in the 1990 PES, to the extent feasible: no distinctions of place-type can be
made; renters are distinguished from owners in all cases, as are blacks, nonblack Hispanics,
Asian and Pacific Islanders, and whites and others. She looked only at allocations.
It is interesting that the heterogeneity across the relatively large PUMA units within one state is nearly
as high as the heterogeneity across the much smaller local areas within larger territorial groupings.
