Introduction Data Management for Longitudinal Systems

2. Longitudinal Household Studies

2.1 The role of field stations in health research

The data management issues associated with monitoring births, deaths, marriages, and migrations in a fixed geographical region are complex and can complicate health research and intervention efforts. Despite the intrinsic complexity of longitudinal data systems, investment is justifiable on scientific grounds. Demographic events of interest, such as neonatal, post-neonatal, child, and maternal mortality, are sufficiently rare as to require precise information on both the events and the population at risk. Cross-sectional survey techniques, so widely used in fertility research, are ill-suited to mortality studies because the determinants of mortality, such as morbidity, nutritional status, and lactational behavior, are either intervening events or longitudinal processes that can only be effectively studied in concomitant event history analyses. Recall biases associated with retrospective studies seriously compromise inference. Furthermore, health service interventions may vary over time, and their impact may be a function of their timing relative to seasonality of adversity and specific episodes of illness. When household relationships and characteristics are maintained longitudinally, time-referenced data can clarify the causes and consequences of adult, maternal, and child morbidity and mortality.

For many health studies, members of a household must be studied as a group. Strong observed correlations between social and economic status and health outcomes attest to the value of information on household member relationships, customs, and behaviors in research on the determinants of survival. It is crucial for research on health interventions to account for household-level characteristics, since health service effectiveness is determined as much by household social and behavioral factors as by the efficacy of medical technology.

Health research protocols require data on specific determinants of illness and mortality over time. Because mortality events are rare, research that focuses on mortality outcomes requires observation of large study populations. Morbidity events of interest are intermittent, requiring prospective studies rather than retrospective surveys. Field stations are established where individuals in populations can be observed in laboratory fashion, and interventions can be assigned to individuals in randomized trials or to groups in factorial experiments. Research on mortality requires a defined population at risk, and this requirement obliges investigators to monitor all components of population dynamics: births, deaths, in-migrations, and out-migrations [Note 2]. Monitoring population dynamics with demographic surveillance is thus the core scientific resource for field station-based epidemiological research.

Field stations have been used for epidemiological research since the 1920s [Note 3]. Early epidemiological research demonstrated the influence of sociological and demographic factors on the occurrence of disease and illness, establishing the importance of social research in understanding morbidity risks [Note 4]. In the 1950s, the use of field stations was expanded from investigating disease to conducting controlled trials of health interventions [Note 5]. Field stations were developed in Guatemala, India, and Pakistan for international health research programs [3,27,28,31,35,52,54,55,63]. Field stations were also employed for population research in the 1950s. The Khana study, launched in India in 1953, was a test of the impact of family planning service delivery on fertility in eleven rural Punjab villages. A similar study was launched in Singur, West Bengal in 1956 [Note 6].

2.2 The Matlab station in Bangladesh

Of the field stations that have been established in developing countries for health research, the most productive has been the Matlab station of the International Centre for Diarrhoeal Disease Research, Bangladesh. Originally established in 1960 as a project for testing vaccines against cholera, the Matlab station eventually became a population laboratory for a wide range of research on the epidemiology of enteric disease, the determinants of health behavior and survival, and the impact of community health and family planning services [Note 7].

The Demographic Surveillance System (DSS) is the core scientific resource of the Matlab field station. The DSS enables researchers to study a defined population over time. It identifies the population in study areas at any point in time and monitors all components of demographic dynamics: births, deaths, migration into study areas, and migration out of study areas. In the Matlab DSS, risk is measured at the individual level, and the calculation of person-days of observation permits the individual-level longitudinal studies of greatest scientific interest: individual-level randomized trials, causal analysis of determinants of demographic outcomes, the analysis of concomitant events, and other statistical studies of covariates of rates. By following individuals over time, the Matlab DSS thus provides the basis for a wide range of research protocols and activities in health and population [16-18,40].
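
The person-days calculation described above can be illustrated with a minimal sketch. It is not the Matlab DSS code, which this section does not reproduce. The event names, the `Event` record, the `person_days` function, and the half-open observation window are all assumptions made for illustration, as is the "enumeration" event used to mark residents counted at a baseline census. Each entry event (enumeration, birth, in-migration) opens a residency episode, each exit event (death, out-migration) closes it, and episodes still open at the end of the window are censored there.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical event codes. The text names the demographic components
# but does not specify an encoding; "enumeration" is an added assumption
# for people already resident at a baseline census.
ENTRY_EVENTS = {"enumeration", "birth", "in_migration"}
EXIT_EVENTS = {"death", "out_migration"}


@dataclass(frozen=True)
class Event:
    person_id: str
    kind: str
    on: date


def person_days(events, start, end):
    """Return {person_id: days resident} within the window [start, end)."""
    totals = {}
    entered = {}  # person_id -> date the currently open episode began

    def close(pid, exit_on):
        # Clip the episode to the observation window before counting days.
        lo = max(entered.pop(pid), start)
        hi = min(exit_on, end)
        totals[pid] = totals.get(pid, 0) + max((hi - lo).days, 0)

    # sorted() is stable, so same-day events keep their recorded order.
    for ev in sorted(events, key=lambda e: e.on):
        if ev.kind in ENTRY_EVENTS:
            # Ignore a duplicate entry while an episode is already open.
            entered.setdefault(ev.person_id, ev.on)
        elif ev.kind in EXIT_EVENTS:
            # An exit with no open episode contributes no exposure.
            if ev.person_id in entered:
                close(ev.person_id, ev.on)
        else:
            raise ValueError(f"unknown event kind: {ev.kind!r}")

    # Episodes still open are censored at the end of the window.
    for pid in list(entered):
        close(pid, end)
    return totals


sample = [
    Event("A", "enumeration", date(2000, 1, 1)),
    Event("A", "out_migration", date(2000, 1, 31)),
    Event("A", "in_migration", date(2000, 3, 1)),
    Event("B", "birth", date(2000, 2, 10)),
    Event("B", "death", date(2000, 2, 20)),
    Event("C", "enumeration", date(2000, 1, 1)),
]
exposure = person_days(sample, date(2000, 1, 1), date(2000, 4, 1))
print(exposure)
```

In this sketch, person A contributes two separate episodes because of an out-migration followed by a return, which is the kind of exposure accounting that a cross-sectional count cannot provide. Summing the values gives a denominator for rates; dividing a count of deaths by that sum would give a crude rate per person-day.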

Although Matlab has been a productive source of research on health and population issues, Matlab data management technology has not been utilized by other field stations. For the DSS operation, several thousand lines of database computer code are managed on a mainframe system in Dhaka, geographically remote from Matlab field operations. Field work and computing are managed separately, an operation that requires a large clerical staff for managing data correction operations. Managing this complex computer and field system requires a team of highly trained experts. Changing the simplest parameter of system code involves time-consuming and expensive expert assistance, and replicating Matlab computer operations demands a substantial investment in system development. New systems, in turn, have little to gain from using Matlab software designs that require expensive mainframe hardware.

The Household Registration System (HRS) is designed to resolve the limitations of the Matlab system while replicating the features of Matlab technology that make it a productive scientific center. The HRS preserves Matlab principles of structure, linkage, and checking, while introducing modern principles of database management, low-cost microcomputer hardware, and flexible and extensible software. We turn next to a discussion of alternatives to the Matlab/HRS data model before describing the HRS software.



The Household Registration System:
Computer Software for the Rapid Dissemination of Demographic Surveillance Systems

James F. Phillips, Bruce B. MacLeod, Brian Pence
© 2000 Max-Planck-Gesellschaft ISSN 1435-9871