Abstract Longitudinal Household Studies

1. Introduction

Population laboratories have been an invaluable resource for research on the determinants of demographic dynamics, the prevention of disease, the impact of health technologies, the efficacy of family planning services, and the consequences of demographic change. Because longitudinal data are complex to compile and manage, however, most population laboratory-based research has been produced by a few generously funded, rigorously managed, and well-equipped research stations located in developing countries. Of these, the Matlab Demographic Surveillance System in Bangladesh exemplifies the characteristics of population laboratories that justify their cost: individuals in a large population are registered in a computerized database, and field surveillance is designed to record all events and changes in household relationships over time. Procedures record the person-days of risk accruing to each individual so that rates, intervals, and relationships are readily linked with covariates of interest, thereby providing the machinery for a wide range of practical health and demographic research applications [18,51].
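The role of person-days of risk can be illustrated with a minimal sketch. It is not taken from the Matlab system or the HRS; the function names, the observation window, and the residency episodes are hypothetical and chosen only to show how exposure time becomes the denominator of a rate.

```python
from datetime import date

def person_days(episodes, start, end):
    """Sum the days of residence that fall inside the window [start, end)."""
    total = 0
    for entry, exit_ in episodes:
        # An open episode (exit_ is None) is censored at the end of the window.
        lo = max(entry, start)
        hi = min(exit_ or end, end)
        if hi > lo:
            total += (hi - lo).days
    return total

def rate_per_1000_person_years(events, days):
    """Crude event rate per 1,000 person-years of exposure."""
    return 1000.0 * events / (days / 365.25)

# Hypothetical residency episodes: (entry date, exit date or None if still resident).
episodes = [
    (date(1999, 1, 1), date(1999, 7, 1)),  # out-migrated mid-year
    (date(1999, 3, 1), None),              # still resident at the end of the window
]
exposure = person_days(episodes, date(1999, 1, 1), date(2000, 1, 1))
```

Because exposure is accumulated per individual, the same episodes can be split by any covariate (age group, village, treatment arm) before the rate is computed.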

The logical rigor of the Matlab data management system, which makes this important longitudinal health research possible, is also a factor constraining its replication. Posting demographic information to computer files involves a myriad of checks on the logical consistency of new events with extant registered information. More than 4,000 lines of computer code are required to check that computer registers correspond to logically possible relationships, event patterns, and dynamics. Generating a system of this size and complexity requires a team of skilled computer scientists, demographic research guidance, and field capacities to respond to queries and advise system managers of needed changes. The Matlab system was developed over three decades of intensive field research, but few contemporary studies have this luxury of time and resources. Systems must be developed rapidly for focused investigations. This typically leads to compromising simplifications that inadvertently introduce system limitations and eventually produce intractable data management challenges [Note 1]. The long delays involved in developing surveillance systems and utilizing surveillance data have elevated research costs and diminished the usefulness of data for policy. As a consequence, the field station model has been criticized as too costly and complex for practical applications. The few sites where systems are functioning are overburdened with multiple research protocols.
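A consistency check of the kind described above might look like the following sketch. It is an illustration only, not code from the Matlab system: the `Person` record, the `check_birth_event` function, and the three rules shown are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Person:
    person_id: str
    sex: str                           # "F" or "M"
    birth_date: date
    death_date: Optional[date] = None  # None while the person is alive

def check_birth_event(registry, mother_id, child_birth_date):
    """Return a list of inconsistencies; an empty list means the event can be posted."""
    mother = registry.get(mother_id)
    if mother is None:
        return ["mother is not registered"]
    errors = []
    if mother.sex != "F":
        errors.append("mother is not recorded as female")
    if child_birth_date <= mother.birth_date:
        errors.append("child born before mother")
    if mother.death_date is not None and child_birth_date > mother.death_date:
        errors.append("birth after mother's recorded death")
    return errors

# Hypothetical register of previously recorded individuals.
registry = {
    "P001": Person("P001", "F", date(1970, 5, 2)),
    "P002": Person("P002", "M", date(1968, 1, 9)),
}
```

Each new event type (death, marriage, migration) needs its own set of such rules, which is how the volume of checking code grows into the thousands of lines.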

This paper presents an overview of an initiative that addresses the problem of developing low-cost, rigorously designed demographic surveillance systems. The Household Registration System (HRS) is a software system currently in use at eleven research sites in Africa and Asia. The HRS exploits recent developments in object-oriented programming and automated program generation to simplify the process of developing data management systems for a diverse collection of longitudinal household studies. The computational foundation of the HRS is a relational database system that resolves many of the complex data management issues associated with monitoring births, deaths, marriages, and migrations in a fixed geographical area. Experienced or novice programmers can build on this foundation by changing or adding to a collection of visually presented objects that correspond to the various items on data entry screens.

Some of these objects have small amounts of computer code (ten to twenty lines) associated with them that control, for instance, the range of legal values, legal relationships with other variables, and the circumstances under which the field can be edited. Programmers who modify and extend the system need only specify variable names, legal values for variables, and screen layouts for additional data. The programmer can also specify rules governing how new data must relate to other variables or to previously compiled information, either within the same record or in a different record. A user can then regenerate the system, integrating the project-specific data and new code with the HRS core to create a project-specific demographic surveillance system.
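The three kinds of per-field rule mentioned above (legal values, relationships with other variables, and conditions for editing) can be sketched as follows. This is a hypothetical illustration in Python, not the HRS's own code or its development environment; `FieldRule`, `validate`, and the example fields are assumed names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]

@dataclass
class FieldRule:
    name: str
    legal: Callable[[Any], bool]                            # range of legal values
    cross_check: Optional[Callable[[Record], bool]] = None  # relation to other fields
    editable: Callable[[Record], bool] = lambda record: True  # when editing is allowed

def validate(record: Record, rules: List[FieldRule]) -> List[str]:
    """Return one message per field that violates its rules."""
    problems = []
    for rule in rules:
        value = record.get(rule.name)
        if not rule.legal(value):
            problems.append(f"{rule.name}: illegal value {value!r}")
        elif rule.cross_check is not None and not rule.cross_check(record):
            problems.append(f"{rule.name}: inconsistent with other fields")
    return problems

# Hypothetical project-specific fields added to a core record.
rules = [
    FieldRule("sex", legal=lambda v: v in ("F", "M")),
    FieldRule("age", legal=lambda v: isinstance(v, int) and 0 <= v <= 120),
    FieldRule(
        "pregnant",
        legal=lambda v: v in (True, False),
        cross_check=lambda r: not r["pregnant"] or r["sex"] == "F",
        editable=lambda r: not r.get("deceased", False),
    ),
]
```

Keeping each rule as a small declaration attached to a field, rather than scattering checks through the program, is what allows a project to add variables without rewriting the core.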

The code generated by the HRS maintains, retrieves, and reports on cross-sectional and longitudinal data associated with studies of households and their members. Investigators are freed from the need to rewrite and modify large amounts of computer code. The generated database programs incorporate rigorous design principles, enforce logical checks on much of the database information, and are capable of reporting key demographic rates. The amount of error-free program code generated for a particular application is substantial, fulfilling the requirements of demographers, epidemiologists, and social scientists for comprehensively cross-referenced information on households and members.

De novo development of the data management software required for longitudinal demographic surveillance can require years of technical assistance from highly trained specialists; the end result is often a system that no one else can understand, modify, or manage. This "expert assistance" model takes ownership away from the research institution and reduces the policy relevance of research. The HRS, in contrast, embodies an empowering form of technology transfer to developing country institutions. While the system is appropriate for research in the developed world as well, the lack of permanent identification numbers and computerized records in most developing countries vastly complicates the data management of longitudinal research.

The HRS is built around a core structure that is flexible enough to accommodate most longitudinal studies of populations. Innovations in object-oriented programming permit even beginning programmers to modify these core specifications easily and tailor the system to their needs. Finally, the structure of the HRS lends itself to a remote technical assistance model, whereby data managers can email difficulties to an HRS "expert" who can in return email short code segments to be incorporated into that site's HRS.

After reviewing the technical requirements of longitudinal population studies [section 2] and various alternatives to the HRS data model [section 3], this paper discusses the structure of the HRS and describes the process of tailoring and operationalizing the system to a specific study site [section 4]. Finally, in "Next Steps," we address plans to further enhance the system through modifications that are being incorporated into a new version, the HRS-3, which is currently under development [section 5].



The Household Registration System:
Computer Software for the Rapid Dissemination of Demographic Surveillance Systems

James F. Phillips, Bruce B. MacLeod, Brian Pence
© 2000 Max-Planck-Gesellschaft ISSN 1435-9871