Longitudinal Household Studies The HRS Software System

3. Data Management for Longitudinal Systems

The Matlab data management system, while flawed as a model for replicable surveillance, represents an appropriate model for longitudinal community health research [Note 8]. At least four alternatives to the Matlab population-based surveillance approach have been proposed. Each has been associated with technical pitfalls.

3.1 Cohort studies

The most common longitudinal studies are cohort studies, in which individuals at risk are followed over time and events are monitored. Individuals are lost to observation through out-migration and death, but life table methods are used to adjust for possible biases that may arise from sample loss. The main limitation of cohort studies is attrition bias. This is particularly problematic in places like West Africa where migration is extensive. In the HRS design, monitoring social units provides a continuing basis for renewing the population at risk. In an HRS cohort, data on individuals leaving observation through out-migration are recorded together with information on individuals migrating into the study area. By recording both in-migration and out-migration, the HRS preserves an element of representativeness that is not feasible in cohort studies. Moreover, designs that use households rather than individuals as the primary level of organization provide a means of generating data on migration that can be used to match records of in-migrants with out-migrant records, thereby diminishing censoring of individual event histories. Even with such aids, matching is a complex and time-consuming process. However, as discussed in section 4.2, features of the HRS in conjunction with appropriate field procedures can prove effective in resolving internal and return migration.
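The exposure accounting implied by this design can be sketched in a few lines. This is only an illustration under stated assumptions, not the HRS software: the episode layout, the names EPISODES, person_days, and death_rate_per_1000, and the sample records are all hypothetical. Each spell of residence contributes exposure only while the person is under observation, so an out-migrant stops contributing at departure and a return migrant resumes contributing at re-entry.

```python
from datetime import date

# Hypothetical residency episodes (illustrative only, not HRS data): one row
# per spell of observation, as (person_id, start_date, end_date, end_event).
EPISODES = [
    ("P1", date(1995, 1, 1), date(1995, 12, 31), "out-migration"),
    ("P1", date(1996, 7, 1), date(1997, 1, 1), "end of observation"),  # return migrant
    ("P2", date(1995, 1, 1), date(1996, 1, 1), "death"),
]


def person_days(episodes):
    """Total days under observation, summed over all residency episodes."""
    return sum((end - start).days for _, start, end, _ in episodes)


def death_rate_per_1000(episodes):
    """Deaths per 1,000 person-years of observed exposure."""
    deaths = sum(1 for *_, end_event in episodes if end_event == "death")
    person_years = person_days(episodes) / 365.25
    return 1000 * deaths / person_years


print(person_days(EPISODES))
print(round(death_rate_per_1000(EPISODES), 1))
```

In this sketch the gap between P1's out-migration and return is simply excluded from exposure. If the two P1 episodes could not be matched to the same person, the second spell would look like a new individual with a truncated history, which is the censoring problem that record matching is meant to reduce.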

Although cohort studies are a seemingly efficient means of conducting longitudinal research, they are often so focused that a given study contributes little to other longitudinal studies that may be planned in the same population [Note 9]. In general, the combined cost of a series of independent cohort studies is much higher than the cost of an HRS core that monitors demographic dynamics in a defined study area, provides a platform for the rapid implementation of multiple cohort studies, and allows new initiatives to build on a wealth of pre-existing data for the study population.

3.2 Longitudinal events with cross-sectional censuses

Experimental studies requiring rates on aggregate populations have been conducted with surveillance systems that monitor births and deaths only, avoiding the difficult data collection and data management problems associated with migration, under the assumption that periodic censuses provide accurate estimates of populations at risk [4,58]. This is a reasonable design if all that is needed are accurate rates for aggregate areal data. However, its usefulness is limited by the fact that individual level analysis is not possible, since risk accruing to individuals cannot be monitored unless all components of demographic dynamics are observed. Although new "verbal autopsy" techniques, in which structured reporting by non-medical staff is used as a cheap way of identifying cause of death in broad categories, can simplify event surveillance and produce fairly reliable data, the logistical demands of conducting repeat censuses and event surveillance remain complex. Longitudinal studies require reinterviewing, and the process of recruiting, training, and deploying workers in annual rounds can actually be more costly than maintaining a team of interviewers who interview respondents in shorter rounds. The reason is that periodic restaffing of surveys in developing countries can occupy the time of expensive senior scientific staff and incur high overhead costs associated with recruitment and repeated training. These designs lead to rather superficial descriptive analyses of experimental demographic endpoints, without ancillary research on determinants or covariates of project results. Since risk at the individual or family level is unknown, most statistical methods, including regression analyses, are not applicable. Research is thus confined to the narrow goals and purposes of experiments, failing to capitalize on the full potential of field station-based studies for research on demographic determinants or interrelationships.
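To make the aggregate nature of this design concrete, the rate calculation it supports can be sketched as follows. The function name crude_rate_per_1000, its parameters, and the figures are hypothetical and chosen only for illustration. The mid-period approximation of exposure shown here is one common convention, assumed for this sketch rather than taken from the studies cited.

```python
def crude_rate_per_1000(events, census_start, census_end, years):
    """Events per 1,000 person-years, with exposure approximated from two censuses.

    Exposure is taken as the average of the two census counts multiplied by
    the length of the interval; no individual contributes a known amount.
    """
    mid_population = (census_start + census_end) / 2
    return 1000 * events / (mid_population * years)


# Hypothetical figures: 306 deaths registered between two censuses taken
# two years apart, with populations of 10,000 and 10,400.
rate = crude_rate_per_1000(events=306, census_start=10_000, census_end=10_400, years=2)
print(rate)
```

The calculation yields a single rate for the area. Because the denominator is an interpolated population count rather than a sum of individual exposure times, the events cannot be related to individual or household characteristics.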

3.3 Relational files linked from batch files

Longitudinal studies are often designed as systems in which separate batches of data are managed for each type of demographic event, with periodic episodes of merging, matching, and linking of batches to construct relational files of longitudinal event histories for analysis. The batch-and-episode approach invites designs in which computer operations are not coordinated with field operations, allowing flawed data to accumulate before errors are corrected. The record of such studies suggests that economies achieved by simplifying data entry have been offset by the long-term costs and complexity of managing, linking, and cleaning data. If logical problems are not detected until the longitudinal files are constructed, even minor logical lapses can lead to major delays and difficulties, in some cases preventing use of data for any purpose other than rudimentary descriptions of study populations [45,62].
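The alternative to deferred linking is to test each event against the current register at the time of entry. The sketch below illustrates the idea only: the status labels, the TRANSITIONS table, and the function screen_events are hypothetical and are not drawn from the HRS software. An event is accepted only if it is logically possible given the person's current status. Rejected events are set aside for field follow-up instead of accumulating silently.

```python
# Hypothetical rules: (current status, event) -> new status.
# A status of None means the person is not yet known to the register.
TRANSITIONS = {
    (None, "enumeration"): "resident",
    (None, "birth"): "resident",
    (None, "in-migration"): "resident",
    ("out-migrated", "in-migration"): "resident",
    ("resident", "out-migration"): "out-migrated",
    ("resident", "death"): "dead",
}


def screen_events(events):
    """Apply events in order; return the final statuses and the rejected events."""
    status = {}
    rejected = []
    for person_id, event in events:
        key = (status.get(person_id), event)
        if key in TRANSITIONS:
            status[person_id] = TRANSITIONS[key]
        else:
            # Logically impossible given what the register already holds.
            rejected.append((person_id, event))
    return status, rejected


events = [
    ("P1", "enumeration"),
    ("P1", "out-migration"),
    ("P1", "death"),         # rejected: P1 is not currently resident
    ("P1", "in-migration"),
    ("P2", "death"),         # rejected: P2 was never registered
]
status, rejected = screen_events(events)
print(status)
print(rejected)
```

In a batch design, the two rejected events would be discovered only when the death file is linked to the migration and census files, possibly long after the interviewers could have resolved them in the field.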

3.4 Panel surveys

Throughout Africa, field experiments in family planning have been implemented to test the demographic impact of service delivery systems. In the most common type of field experiment, family planning is monitored as the outcome variable, although in a few studies fertility histories are compiled in the baseline and end-of-project surveys. This design permits use of statistical controls for baseline reproductive motives in the analysis of program effects. While panel designs provide important insights into the impact of family planning experiments, the demographic impact of those experiments cannot be precisely gauged unless longitudinal surveillance approaches are used.

First, retrospective recall of events is always flawed, particularly in traditional non-numerate settings. Panel survey rounds typically take place a year or more apart. Retrospective recall of births and deaths may be reliable over such durations, but complex procedures for recording proximate fertility determinants require information that is not reliably recalled. Family planning use, for instance, overlaps with post-partum amenorrhea or substitutes for traditional fertility regulation practices. Prospective observation is required to measure and elucidate these dynamics accurately.

Second, panel designs administered in annual survey rounds have proved to be no less complex to conduct than prospective surveillance studies administered in quarterly cycles. In rural traditional settings where field stations are based, the task of hiring and training teams for annual panels is difficult and costly. A professional team working in a continuous interviewing cycle can be less expensive to manage and develop than recurrently convened interviewing teams, because interviewer salary costs are low relative to the costs of senior project leaders and supervisors. A continuous interviewing design provides data that are relatively free of recall bias and free of the administrative complexities of regenerating interviewing teams for successive panels.

Finally, the complexity of panel designs has proved to be difficult to manage in the absence of surveillance data. As the gap between interviews increases, censoring increases owing to migration and data linkage problems. Fieldwork associated with resolving censoring problems in panel designs is more costly than corresponding field costs of surveillance systems with interviewing rounds of 90 days or less. Panel designs are selected to save costs; however, it is likely that repeat survey costs will be lower than surveillance costs only if no attempt is made to link responses over time.


The Household Registration System:
Computer Software for the Rapid Dissemination of Demographic Surveillance Systems

James F. Phillips, Bruce B. MacLeod, Brian Pence
© 2000 Max-Planck-Gesellschaft ISSN 1435-9871