In the United States, census counts are used to apportion congressional seats to states, and to draw the boundaries of electoral districts within states (“redistricting”). The counts also enter the formulas for allocating tax funds to states, counties, cities, and smaller jurisdictions. Thus, the census has some effect on the distribution of power and money [Skerry 2000]. Controversy over proposed statistical adjustments of population counts from decennial censuses has stimulated an extended program of demographic research over twenty years [see 4]. These issues have been brought again to the fore by the current Director of the U.S. Census Bureau, Kenneth Prewitt.
In considering whether to certify adjusted or unadjusted counts as the official census counts, Prewitt directs attention to the problem of geographical heterogeneity in quality of coverage, which limits the accuracy of small-area estimates. He acknowledges the renewed importance of data from 1990, recognizing that decisions will be made before much of the data from the 2000 Census evaluation process become available, and he favors certifying the adjusted counts, barring some unforeseen developments when the data are collected and analyzed [Prewitt 2000]. Because the U.S. Supreme Court ruled that federal law mandates the use of unadjusted population counts for apportionment, the impact of certification will fall on the use of census data for redistricting within states and on the allocation of tax funds to state and substate jurisdictions [Brown et al. 1999].
A large data set for studying geographical heterogeneity in quality of coverage for substate areas as well as for states was assembled around 1990 by the U.S. Census Bureau in its P-12 Evaluation Project. However, most analysis was directed toward state-by-state heterogeneity. In this paper, we analyze the scale of substate heterogeneity as revealed by the P-12 data, to provide scientific background for the political decisions at stake in the Prewitt report.
The issue of heterogeneity should be viewed in the broader statistical context of small-area estimation. Classical statistical sampling theory is about inferences upward from the part to the whole, from sample to population. Accuracy is limited by the size of the sample, essentially through the square root of the sample size. In small-area estimation, the situation is different. The aim is to make inferences sideways from a few parts to all other parts. The plan for the U.S. Census in 2000 calls for extrapolating sideways from a sample of 12,000 block clusters to separate estimates of census undercount for tens of thousands of local areas and each of 5 million inhabited Census blocks. Accuracy is limited not only by sample size but also fundamentally by the amount of heterogeneity from local area to local area. The square root law ceases to apply, even if all data processing can be done without error.
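To make the point concrete, here is a small simulation of the "sideways" inference just described. All numbers are made up for illustration and are not census figures: each local area is given its own enumeration-difficulty rate, scattered around a common mean with between-area standard deviation tau, and the mean rate in a sample of areas is used as the estimate for every area outside the sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative numbers only, not census figures: each local area has
# its own enumeration-difficulty rate, scattered around a common mean
# with between-area standard deviation tau.
n_areas, mean_rate, tau = 50_000, 0.02, 0.01
true_rates = rng.normal(mean_rate, tau, n_areas)

rmse = {}
for n_sampled in (25, 100, 1_600, 25_000):
    sampled = rng.choice(n_areas, n_sampled, replace=False)
    estimate = true_rates[sampled].mean()     # extrapolated "sideways"
    others = np.delete(true_rates, sampled)   # areas outside the sample
    rmse[n_sampled] = np.sqrt(np.mean((others - estimate) ** 2))
    print(f"sampled areas: {n_sampled:6d}   rmse: {rmse[n_sampled]:.4f}")

# The rmse quickly plateaus near tau: past a point, sampling more
# areas cannot improve accuracy for the areas left out of the sample.
```

The root-mean-square error for the out-of-sample areas never falls below tau, no matter how large the sample of areas; that floor is the sense in which the square root law ceases to apply.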
It is standard practice to apply uniform ratio estimators and other small-area techniques only after stratifying on available variables like age, sex, and race [Ghosh and Rao 1994]. Through stratification some heterogeneity is removed, leaving residual heterogeneity which at some point still imposes diminishing returns on the gains in accuracy achievable from larger sample size. Before 1990, little was known about levels of residual heterogeneity and the pace of diminishing returns to sample size. Since then, interest in census adjustment has led to a series of studies in the United States [see 4], principally focused on state-to-state heterogeneity in various indices of enumeration difficulty. The U.S. Census Bureau created, in its P-12 Evaluation Project, a unique data set suitable for studying local as well as state-level heterogeneity. The present study exploits P-12 to derive the first, albeit somewhat tentative, measurements of residual heterogeneity for local areas containing on the order of 10,000 people each.
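A minimal sketch of a post-stratified uniform ratio estimator may help fix ideas; the strata, counts, and rates below are invented for illustration and do not correspond to actual census post-strata. One adjustment factor is estimated per stratum from a sample of areas and then applied uniformly to every area in that stratum.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented for illustration; not actual census post-strata or rates.
n_areas = 1_000
strata = rng.integers(0, 4, n_areas)            # e.g. age-sex-race cells
census = rng.integers(5_000, 15_000, n_areas)   # census count per area

# True population: each stratum has its own average undercount rate,
# plus residual area-to-area heterogeneity within strata (sd 0.01).
stratum_rate = np.array([0.01, 0.02, 0.03, 0.05])[strata]
true = census * (1 + stratum_rate + rng.normal(0, 0.01, n_areas))

# Sample some areas and estimate one ratio per stratum ...
sample = rng.choice(n_areas, 200, replace=False)
factors = np.ones(4)
for s in range(4):
    in_s = sample[strata[sample] == s]
    factors[s] = true[in_s].sum() / census[in_s].sum()

# ... then apply that factor uniformly within each stratum.
adjusted = census * factors[strata]
```

In this sketch the adjustment removes the between-stratum differences in undercount, but the within-stratum term (the residual heterogeneity) is untouched; it is exactly this residual component, at the scale of localities of about 10,000 people, that the present study seeks to measure.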
The measurements of heterogeneity in this study provide a benchmark for assessing small-area undercount estimation in the census. The issues are summarized in [Prewitt 2000], with extensive references; for another perspective, see [Brown et al. 1999]. An underlying probability model is useful for distinguishing the effects of geographical heterogeneity treated here from other components of error [Freedman, Stark and Wachter 2000]. As well as playing a role in discussions of Census adjustment, the measurements in the present study also bear on the likely accuracy of small-area estimation in many other applications. They follow, in an American context, on the new scientific interest in structural properties of geographical heterogeneity kindled by [Le Bras 1993].
Several questions are frequently asked about research in this area. (i) Why study indices of enumeration difficulty rather than undercounts themselves? (ii) Can residual heterogeneity not be eliminated by finer stratification? (iii) What about other datasets?
(i) The Census does not measure its own undercount. Surveys that do measure undercounts, large as they are, are much too small to measure heterogeneity at any fine geographical scale. Problems with data quality in the 1990 Post-Enumeration Survey (PES) also restrict its usefulness for appraising heterogeneity. Data from the 2000 PES, renamed “Accuracy and Coverage Evaluation” (ACE), will not be available for some time, and research projects to assess the data quality in ACE have uncertain completion dates.
(ii) Possibilities for finer stratification are limited. For Census Bureau purposes, only variables recorded for all respondents on Census short forms are usable for stratification. Moreover, there is little evidence that doubling or tripling the number of post-strata would achieve any marked reduction in heterogeneity; we return to this point below.
(iii) Other publicly available data sets known to us lack one or another key feature of P-12. The U.S. Public Use Microdata Samples (PUMS) only identify geographical location down to “Public Use Microdata Areas” (PUMAs) with more than 100,000 people each. The Census Bureau’s Summary Tabulation Files have precise geography but little cross-classification by stratifying variables. The other large U.S. surveys, like the Current Population Survey, are much smaller than P-12. Similar limitations of one kind or another apply to data sets collected in other developed countries. The data for French communes achieve geographical resolution an order of magnitude finer than P-12, but lack stratification variables [Le Bras 1993].
Every silver lining has its cloud, and P-12 is no exception. The P-12 data were aggregated by the Bureau in a data-dependent way into “superblocks,” in order to protect the confidentiality of the respondents. Superblocks range in size from a city block in Manhattan to some large swath of rural Wyoming. The data we have are based on superblocks: our summary statistics show the heterogeneity in these units, thereby averaging across a full spectrum of more familiar geography. The data suggest, however, that a typical superblock represents a locality on the order of 10,000 inhabitants, and our results are best interpreted on that geographic scale. More formal arguments are postponed to the Appendix.
Measuring Local Heterogeneity with 1990 U.S. Census Data
Kenneth W. Wachter, David A. Freedman
© 2000 Max-Planck-Gesellschaft ISSN 1435-9871