The results of Section 2 provide guidance about the likely size of errors due to heterogeneity in the Census
Bureau’s small-area estimates of undercounts from the 1990 PES. They provide such guidance to the
extent that the P12 variables provide meaningful analogues to undercounts with respect to
place-to-place variability, and to the extent that P12 resembles the PES in sample design and
poststratification. The P12 variables were chosen specifically to provide such analogues. Like
undercounts, they are Census coverage indicators, and the Census Bureau goes so far as to call them
“proxies” or “surrogates” for undercount. The P12 sample design was chosen to be essentially the
same as that for the PES, and the poststratifications are identical. These considerations all
support the idea of taking P12 as a guide to the effects of heterogeneity on 1990 undercount
estimates.
On the other hand, there is no direct validation of the posited similarity between P12 variables and
undercounts. The main available comparisons are in terms of overall levels and indices of dispersion.
These are presented in this section. It turns out that undercounts fall well within the range of alternatives
spanned by the four P12 variables, but no single P12 variable is a close match in both level and
dispersion.
Net undercounts can be negative (when there is an overcount) but the P12 variables are always
nonnegative. This is an important difference which weakens the analogy. The net undercount is
approximately equal to the difference between two nonnegative variables, the rates of “gross
omissions” (e.g., missed persons) and “erroneous enumerations” (e.g., duplicates or fabrications).
The P12 variables may be better analogues for these two components of undercount than
for their difference, but the overall picture is complicated by the correlations between gross
omissions and erroneous enumerations which extend within poststrata all the way down to Census
blocks.
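The sign issue and the role of the correlation can be illustrated with a minimal Python sketch. The rates below are made-up numbers for illustration, not PES or P12 data, and the function name is hypothetical. The sketch relies only on the standard identity Var(O - E) = Var(O) + Var(E) - 2 Cov(O, E).

```python
import statistics

# Hypothetical block-level rates, for illustration only (not PES data).
gross_omission_rate = [0.05, 0.02, 0.08, 0.03, 0.06]
erroneous_enum_rate = [0.03, 0.04, 0.06, 0.01, 0.05]

# Net undercount is approximately the difference of two nonnegative rates,
# so it can be negative (an overcount) even though neither component is.
net_undercount = [o - e for o, e in zip(gross_omission_rate, erroneous_enum_rate)]


def pvariance_of_difference(x, y):
    """Population variance of x - y via Var(x) + Var(y) - 2 Cov(x, y)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return statistics.pvariance(x) + statistics.pvariance(y) - 2 * cov
```

A positive correlation between the two components shrinks the variance of their difference, which is one reason the dispersion of net undercount need not resemble the dispersion of either component.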
Information on levels and indices of dispersion for undercount variables is shown in Table 4. The entries are
to be compared to the corresponding rows for P12 variables in Table 1. In Table 4, following common
Bureau practice, centered adjustment factors are used in place of undercount rates. The centered
adjustment factor for any unit is calculated by taking the estimated true count, dividing by the Census
count, and subtracting one. The centered adjustment factor is close to the undercount rate itself. The first
column in Table 4 pertains to the Bureau’s “smoothed” adjustment factors, the factors actually used
for the Bureau’s calculation of adjusted counts. The second column pertains to the “raw”
adjustment factors. These are dual-system estimates from PES data, calculated poststratum by
poststratum. The raw factors were transformed into the smoothed factors by an empirical Bayes
smoothing algorithm [Freedman et al. 1993]. The final two columns pertain to the gross omission
and erroneous enumeration rates. Neither Table 4 nor Table 1 is weighted for poststratum
size.
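The definition of the centered adjustment factor given above can be written out as a short Python sketch. The counts are hypothetical, and the undercount rate is taken here, as an assumption, to be the net undercount expressed as a share of the estimated true count.

```python
def centered_adjustment_factor(estimated_true_count, census_count):
    """Estimated true count divided by the Census count, minus one."""
    return estimated_true_count / census_count - 1.0


def undercount_rate(estimated_true_count, census_count):
    # Assumption: net undercount as a share of the estimated true count.
    return (estimated_true_count - census_count) / estimated_true_count


# Hypothetical unit: Census count 1000, estimated true count 1020.
f = centered_adjustment_factor(1020.0, 1000.0)
u = undercount_rate(1020.0, 1000.0)
```

Under this assumed definition the two quantities are linked by u = f / (1 + f), so they differ only in the second order when f is a few percent.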
TABLE 4
The level and dispersion of a variable undoubtedly affect the numerical values of for the variable, so
the comparisons between Table 1 and Table 4 are important indicators of the relevance of P12 to
undercounts. With one exception, we see that all entries in Table 4 fall between the corresponding
values for substitutions and for allocations in Table 1. The exception is the 5.9% sampling
standard error for the raw factors, which falls above the standard error for allocations and
just below the high estimate of standard error for multi-unit housing. Thus, in terms of the
quantities shown in Table 4, the P12 variables do span the relevant range, but none matches on all
dimensions.
An important conclusion is suggested by comparing the figure of 2.0% in the lower left of Table 4 with
the figures in the first row of Table 1. The 2.0% is the RMS of the Bureau’s estimates of sampling standard
error for its smoothed adjustment factors, and it is lower than any of the RMS values of
for local areas
in Table 1. If the P12 variables are at all valid analogues, then the estimated PES sampling variances are
evidently dominated by the variance due to heterogeneity measured by
^{2}. Sampling variance is the
contribution to error which the Bureau did include in its error margins for adjusted local counts
[U.S. Bureau of the Census 1991]. Variance due to heterogeneity is one of the contributions it did not
include. The data here suggest that what was left out is more important than what was put
in.
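The arithmetic behind this comparison can be sketched in a few lines of Python. The sketch assumes that sampling error and heterogeneity error are uncorrelated, so that their variances add. The 4.0% heterogeneity figure is a placeholder chosen for illustration, not a value taken from Table 1, and the function name is hypothetical.

```python
def heterogeneity_share(sampling_sd, heterogeneity_sd):
    """Share of combined variance due to heterogeneity.

    Assumes the two error components are uncorrelated, so variances add.
    """
    s2 = sampling_sd ** 2
    h2 = heterogeneity_sd ** 2
    return h2 / (s2 + h2)


sampling_rms = 0.020              # the 2.0% figure quoted in the text
hypothetical_heterogeneity = 0.040  # placeholder only; see Table 1 for real values
share = heterogeneity_share(sampling_rms, hypothetical_heterogeneity)
```

With these illustrative inputs, heterogeneity would account for four fifths of the combined variance. Any heterogeneity standard deviation larger than the sampling standard error gives a share above one half under the same assumption.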
It is likely that some part of the true contribution from sampling variability was also left out. The 2.0%
figure for sampling standard deviation is believed to be a considerable underestimate [Fay and
Thompson 1993, Freedman et al. 1993]. In principle, sampling variance can be traded off against variance
due to heterogeneity by adopting a coarser or finer poststratification. But the variances due to
heterogeneity implied by Table 1 are so large that the leeway for such tradeoffs appears rather
slight.
The particular use we are making of P12, with our concentration on heterogeneity alone and our direct calculation
of within poststrata, avoids certain difficulties which would confront more ambitious
uses. We are not calculating measures of overall error for local counts or shares. Thus we are not engaged
in assessing the augmentations or cancellations of error that take place when the positive or negative
estimated adjustments for different poststrata in the same local area are added together to yield the
total estimate for the area. We cannot do so with P12, because P12 superblocks for different
poststrata do not coincide. Heterogeneity implies error both in Census counts and in adjusted
counts, and the balance between these errors appears to be a delicate function of patterns of
cancellation when poststratum contributions are summed. We are also not engaged in studying the
interaction between errors in local counts due to heterogeneity and errors at all levels due to bias
in poststratum-wide adjustment factors. We are studying errors in an idealized, bias-free
setting. This setting would correspond to a PES in which the poststratum-wide adjustment
factors were known perfectly. Our counterparts of poststratum-wide factors, that is, our
s, are unbiased.
The poststratum-wide adjustment factors in the real PES are known to be biased. There is,
of course, some ratio-estimator bias. That is a side-effect of heterogeneity, and should be
distinguished from the heterogeneity studied in this report, which affects estimated rates for local
areas within poststrata. There are other, more important, biases in the adjustment factors
estimated by the PES. Attempts have been made to measure some of these by quality-control and
follow-up studies, but only at the level of large aggregations of poststrata. Biases are quantified in
[Breiman 1994] and in Table 15 of the Census Bureau’s P16 Project Report. Unfortunately, this
crucial table is omitted from the published version [Mulry and Spencer 1993]. There is also
unmeasured “correlation bias” resulting from the tendency for people missed by the Census to
be more likely to be missed by the PES as well. Essentially nothing is known about how
the measured biases are distributed among the poststrata, and even less about the size and
distribution of correlation bias. Thus there is not yet a basis on which definitive assessments
of the relative accuracy of adjusted and unadjusted counts for local areas could be made,
unless some rather heroic assumptions are to be imposed on the data. For recent reviews, see
[Brown et al. 1999, Wachter and Freedman 2000], but those findings seem to be disputed in
[Prewitt 2000].
In short, at the local level, what can be made are assessments of components of error like
heterogeneity, not assessments of relative accuracy. To strengthen the assessments, it would be valuable to
relate P12 more closely to the PES. The Census Bureau (as far as we can tell) has not released data
sufficient to calculate place-to-place correlations between the variables studied here and undercounts. In
principle, substitutions, allocations, non-mailback rates, and multi-unit housing rates exist along
with undercount estimates for the 5392 PES block clusters. Even more relevant than such
cross-correlations would be autocorrelation functions for the variables, calculated as functions of
physical or notional distance between areas. The PES sample size is small for this purpose, but
some insights could be gleaned. At present, the correlations that can be computed are those
that are least relevant, namely those across poststrata. Smoothed adjustment factors correlate 0.60 with
non-mailback rates, 0.23 with multi-unit housing rates, 0.18 with substitution rates, and 0.07 with
allocation rates, across poststrata. Substitution and allocation rates correlate 0.61 with each
other.
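For readers who wish to reproduce correlations of this kind from poststratum-level files, a minimal Python sketch of an unweighted Pearson correlation follows. The input vectors are hypothetical placeholders, one value per poststratum, and the function name is ours. Whether the correlations reported above were weighted is not stated here, so the unweighted form is an assumption.

```python
import math


def pearson(x, y):
    """Unweighted Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical poststratum-level values, for illustration only.
smoothed_factor = [0.010, 0.035, 0.020, 0.050, 0.005]
nonmailback_rate = [0.20, 0.35, 0.30, 0.45, 0.25]
r = pearson(smoothed_factor, nonmailback_rate)
```

The same function applied to values indexed by block cluster rather than by poststratum would give the place-to-place correlations that the text describes as more relevant, if such data were available.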
The PES sample is too small to give estimates of heterogeneity of the precision obtained from P12. At
the local level, the data for an calculation are not available to us at all for most poststrata. At the state
level, using weighted data by poststratum and calculating as if the sampling weights were uniform
within poststrata, we find RMS values for for state-to-state heterogeneity of 10% for gross
omissions and 7% for erroneous enumerations. These figures fall near the upper end of the
RMS state-level values in Table 1. The PES estimates for single poststrata are unstable to
the extent that about 25% of poststrata come out with negative estimated values of ^{2}. The
RMS values over all 1380 poststrata are bound to be more stable, and the figures suggest that
heterogeneity in components of undercount is at least as great as heterogeneity in the P12
variables.
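How a variance estimate for a single poststratum can come out negative, while a pooled RMS remains usable, can be seen in the following Python sketch. It uses a generic moment-based estimator (observed variance of area rates minus average sampling variance), which is an assumption chosen for illustration and not necessarily the estimator used in this report. All numbers and function names are hypothetical.

```python
import math
import statistics


def heterogeneity_variance(rates, sampling_variances):
    """Observed variance of area rates minus the mean sampling variance.

    A generic moment estimator; it is unbiased in simple settings but can
    be negative when sampling noise is large relative to true differences.
    """
    return statistics.variance(rates) - statistics.fmean(sampling_variances)


def pooled_rms(variance_estimates):
    """Square root of the mean variance estimate across poststrata."""
    mean_var = statistics.fmean(variance_estimates)
    return math.sqrt(mean_var) if mean_var > 0 else float("nan")


# Two hypothetical poststrata, three areas each, sampling variance 0.0004.
noisy = heterogeneity_variance([0.020, 0.030, 0.025], [0.0004] * 3)
spread = heterogeneity_variance([0.00, 0.10, 0.05], [0.0004] * 3)
overall = pooled_rms([noisy, spread])
```

Averaging the variance estimates before taking the square root lets negative and positive estimation errors offset one another, which is why an RMS over many poststrata is more stable than any single estimate.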
