Data Management for Longitudinal Systems Conclusion

4. The HRS Software System

[Note 10]

4.1 A model for household registration and surveillance

The HRS is a software system currently operating in eleven research centers in Asia and Africa [see Table 1] that maintains a consistent record of significant demographic events that occur to a population in a fixed geographic region, generates up-to-date registration books that are used by the field workers, and computes basic demographic information (age-specific birth, death, and migration rates; age and sex distributions of the population; and life table functions). Investigators add to and sometimes modify this core set of data and software to tailor the HRS to their particular projects. The investigator can insert data fields, define logical checks, and interactively specify the screen layout. Core data and code specification and the ability to modify and add to the core specification all combine to give the user the flexibility to develop a site-specific household registration program [37-38].

4.2 Integrated data collection and data management

The HRS data management system operates in conjunction with a field system for collecting data on household members. The HRS requires an integrated field operation and data management system, as shown in Figure 1. No particular requirement for the duration of the work interval is specified in the HRS. However, 90 days has been selected for most HRS applications because this interval is short enough to ensure that all pregnancies can be seen by interviewers within a round, but long enough to ensure that all data can be entered, checked, processed, and reported in the work cycle. The central management task in HRS operations involves ensuring that field operations are linked to computer operations so that errors and problems are noted, fed back to interviewers, and corrected within the routine work cycle. Transactions with paper "Household Registration Book" (HRB) registers are designed to match computer database transactions. In a typical HRS application, detailed instructions are issued to ensure standardized interviewing and recording, with registers designed to maintain four rounds of interviews to facilitate probing and data checking at the time of visitation (Item a, Figure 1). Supervisors check a random sub-sample of the study population. This re-interview is used to detect problems to be discussed in monthly staff meetings. The HRB is arranged by household in the order that households are contacted.


Upon completion of a round of interviewing, registers are passed to data entry staff for updating the computer database (b). Upon entry, a series of logical checks is imposed to assess the consistency of the new information with data already present in the database. Clean data are archived (c), and an error report is generated (d) and reported to the field (e). Supervisors examine error reports and review actions to be taken with relevant staff. The HRB is manually corrected (f), and relevant corrections on event forms are communicated to the data management center for processing (g). Data passing all tests are archived, and information from households failing logical tests is printed with relevant error messages for field diagnosis and correction. Thus each cycle generates fully edited and cleaned data before updated HRBs are printed (h) or a new cycle of data collection begins (a).

The critical event in the Figure 1 cycle is the shaded box between (b) and (c), which imposes uniform logical rules on data as they are entered and provides reports to field supervisors and data managers at the time of data entry [29]. This step ensures both that the clerk is entering legal values for the variables and that any information recorded by the system is logically consistent with demographic information compiled in the past. For instance, the visit date must lie within the three-month period of the round in question, and the date of a recorded event cannot fall after the visit date. A woman who was observed as pregnant during the previous round must either be observed as still pregnant or have a pregnancy outcome (birth or miscarriage) recorded. A member who has migrated out of the study area cannot have any events recorded unless s/he has migrated back in.
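As an illustration only, the entry-time checks listed above might be sketched as follows. The sketch is in Python rather than the FoxPro used by the HRS, and every name in it (Member, check_visit, the event code "IN_MIGRATION") is hypothetical and not taken from the HRS code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Member:
    # Hypothetical, simplified view of what the database already knows.
    member_id: str
    resident: bool              # False once an out-migration has been recorded
    pregnant_last_round: bool   # observed pregnant in the previous round


def check_visit(member: Member, round_start: date, round_end: date,
                visit_date: date, event_date: Optional[date] = None,
                event_code: Optional[str] = None,
                still_pregnant: bool = False,
                outcome_recorded: bool = False) -> List[str]:
    """Return error messages; an empty list means the entry is consistent."""
    errors = []
    # The visit must fall inside the round being entered.
    if not (round_start <= visit_date <= round_end):
        errors.append("visit date outside round")
    # An event cannot be dated after the visit that recorded it.
    if event_date is not None and event_date > visit_date:
        errors.append("event date after visit date")
    # A pregnancy seen last round must be continued or resolved.
    if member.pregnant_last_round and not (still_pregnant or outcome_recorded):
        errors.append("pregnancy unresolved")
    # An out-migrated member can only have a return migration recorded.
    if (not member.resident and event_code is not None
            and event_code != "IN_MIGRATION"):
        errors.append("event for out-migrated member")
    return errors
```

An entry would be archived only when the returned list is empty; otherwise the messages would feed the error report sent back to the field.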

Accurately monitoring migration represents one of the most daunting challenges to longitudinal research, especially in developing country settings that lack computerized records and permanent individual identification numbers. Monitoring the population in a fixed geographic area can reduce censoring by tracking migration and linking individual histories from in- and out-migration records. In practice, of course, the field process of resolving internal and return migration is difficult, time-consuming, and error-prone. The most significant problems occur when departure is not detected in a timely fashion, i.e. when an individual who moves within the study area is observed to have arrived in a new location before the corresponding departure from the previous residence is registered. Rapid resolution of this double entry is necessary in order to minimize the artificial inflation of the population at risk.

Features of the HRS help streamline the reconciliation of migration movements. The system generates reports of all in-migrants in a round who are not recognized as previous members of the study population and all out-migrants who have not been recorded as in-migrants elsewhere in the same round. These reports can be generated for specific subsets of the population such as men, children, or residents of particular geographic areas, in order to facilitate the matching process. Plans for subsequent versions of the HRS include features to further simplify this process. Even with these features and personal identification information (such as names and dates of birth), resolving migration remains a difficult problem. It is essential that the software operate in conjunction with well-designed and effective field procedures to follow up and resolve migration inconsistencies.
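The two reconciliation reports can be pictured with a minimal sketch, assuming that each in-migration record carries an optional "claimed_id" naming the person's earlier identity in the study population. That field, the function name, and the record layout are assumptions made for illustration and do not describe the HRS tables.

```python
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, Optional[str]]


def reconciliation_report(in_migrants: List[Record],
                          out_migrant_ids: List[str],
                          known_ids: Set[str]) -> Tuple[List[Record], List[str]]:
    """Build the two lists a supervisor would follow up for one round."""
    # In-migrants who cannot be tied to any previous member.
    unrecognized_in = [rec for rec in in_migrants
                       if rec.get("claimed_id") not in known_ids]
    # Previous members who were recorded arriving somewhere this round.
    matched = {rec["claimed_id"] for rec in in_migrants
               if rec.get("claimed_id") in known_ids}
    # Out-migrants with no matching arrival in the same round.
    unmatched_out = [pid for pid in out_migrant_ids if pid not in matched]
    return unrecognized_in, unmatched_out
```

Restricting the report to a subset such as men or children would amount to filtering both input lists before the call.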

The design of data collection procedures has been informed by database concepts. Workers are equipped with a register, printed from the population database, that is designed to facilitate data management. Pages are arranged in the order that households are visited. Rows in the register correspond to individuals in the households, and page headings include the name of the household head and information about characteristics that household members share as a group, such as primary family religion and household wealth and size. Rows for individuals list names, ages, relationships, and other basic information. Each column in the worker register corresponds to a visit cycle, and space is provided in each column for workers to enter codes corresponding to vital events or household status changes observed during that round (births, deaths, marriages, migrations, and pregnancies). This procedure limits the flow of loose paper and enforces data linkage between observed and past events at the time of the visit. If the register lists a woman as being pregnant at the last visit, the interviewer will know to probe whether she is still pregnant or whether a birth or miscarriage occurred. Since event data are recorded together with data on the individuals at risk, the worker's register structures data collection in a manner that is conceptually similar to computer operations for linking and checking records in the database.

Data entry involves passing registers to the data entry clerks who key in the household numbers and event codes and perform other requisite transactions with the database [Note 11]. The entry of only the events that affect the structure of a household (births, deaths, marriages, migrations) represents one of the fundamental differences between the HRS and other batch-oriented systems. Rather than completely reinterview the sample population, create a new record for each household, and then, with the computer, link those records back to previous interviews (an error-prone process), individuals in the study area are linked to households and past-event histories with the paper registers carried by the interviewer (mirroring the linkages in the relational database). This substantially reduces the quantity of data entry and the subsequent costs of consistency checking. Most data inconsistencies are caught at data entry and reports of these inconsistencies are printed for supervisory action. These field operations are deemed sufficiently well developed at this point to be generic to any longitudinal study of demographic dynamics [Note 12].

4.3 The Core HRS Data

There are characteristics of households, members, relationships, and demographic events that are common to all longitudinal studies of human populations. The logic for these characteristics is embedded in the core system. The HRS structures data and maintains logical integrity on the following basic elements of a household unit:
  • All households have defined members. Rules unambiguously exclude nonmembers.
  • All households have a single head at a given point in time, and members relate to one another and to the head in definable ways.
  • Members have names, dates of birth, and other characteristics that do not change.
  • Events can occur to members, such as death, birth, in- and out-migration, and marital status change. Events change household membership or relationships according to fixed rules.
  • Episodes (longitudinal events such as pregnancy, marriage, or residency) occur to individuals at risk (i.e. active members) and must follow simple logical relationships.

    This core structure must be adapted to conform to the local population and area. A household member is usually defined as a person registered as resident in the household at the initial enumeration, born to a member of the household, or who migrated into the household. The definition of a household head is somewhat more complex. In most current HRS applications, for instance, the household head must be a member of the household. A study operating in a polygamous society might need to relax this requirement, since one man could be the head of several households of which he is not a resident.

    Although the list above is seemingly trivial, everyday relationships tend to become complex and unwieldy when arrayed as a logical system of longitudinal population data. Portraying even simple relationships requires rigorous standards to avoid error. For example, to register a death in the population, a household member must be resident in the study area; a birth to a woman five months after she gave birth to another child is an inconsistent event. This logic may seem mundane, but lapses in the integrity of data management can generate deaths to individuals who are not logically members of the risk set, births to nonexistent mothers, or migrations among the deceased. The accumulation of minor logical lapses can render data useless for all but the most basic analyses. In addition, errors generated at one stage tend to cause additional errors in later stages; this compounding effect can quickly cause the database to become completely out of step with the study population. Longitudinal household research requires defining rigorous and unambiguous standards for data management. The logical integrity of the HRS core when paired with appropriate field procedures permits these standards to be met.
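The two examples in the preceding paragraph can be written down as a short sketch. The minimum spacing between pregnancy outcomes used here (180 days) is an assumed placeholder, not a value taken from the HRS, and the function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# Assumed threshold for illustration only; a project would set its own value.
MIN_BIRTH_INTERVAL_DAYS = 180


def birth_interval_errors(outcome_dates: List[date]) -> List[Tuple[date, date, int]]:
    """Flag consecutive pregnancy outcomes for one woman that are too close.

    Each date stands for one pregnancy outcome, so a multiple birth
    contributes a single date rather than one date per child.
    """
    ordered = sorted(outcome_dates)
    errors = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = (later - earlier).days
        if gap < MIN_BIRTH_INTERVAL_DAYS:
            errors.append((earlier, later, gap))
    return errors


def can_register_death(is_member: bool, is_resident: bool) -> bool:
    """A death may be recorded only for a member currently resident."""
    return is_member and is_resident
```

A birth five months after a previous birth falls well under the assumed threshold and would be reported for field follow-up rather than stored.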

    4.4 The functional components of HRS software

    Viewing the HRS software from a user's perspective provides an overview of the HRS program structure. First, we present screens as they appear to a data entry clerk. Subsequently, we discuss data entry screens that a programmer would use to modify and extend the software.

    When a user first logs onto the HRS, the following main menu screen appears:

    FIGURE 2

    The options in the main menu of the HRS are:
  • Data entry: Allows for the entry, deletion, and editing of the baseline and longitudinal data. Baseline household information includes the household location, individuals within the household, relationships between individuals, and familial social groups. Longitudinal information includes basic information related to pregnancy observations and outcomes, deaths, migrations in and out of the study area, marriages, and any other measures specified by the investigator.
  • Validation: Checks the logical consistency of data for subsets of households and members.
  • Reports and Output: Calculates and displays demographic rates and life tables. Age-specific and overall rates can be computed.
  • Visit Register: Used to print the household registration book. The household registration book is used by the field workers to record information during household interviews.
  • Utilities: An option that is primarily used by the system administrator. It includes capabilities for adding new user IDs, setting interview round information, and generating reconciliation reports to help track down unreported pregnancy outcomes and unmatched internal migrants.

    Collectively, these functions form a part of every application generated from the HRS.

    Most HRS user interaction occurs in the data entry screens. Two screens are fundamental to the data entry process: the baseline screen which allows data entry from the initial enumeration, and the update screen for the entry of longitudinal information. The data entry window for baseline information is presented in Figure 3:

    FIGURE 3

    Much of the information displayed in this screen can be adjusted to suit project-specific needs. For example, the labels and many of the field values can be changed. Also, while the HRS requires that all locations and individuals have unique IDs, the format of the ID (character, numeric) can be changed from project to project. The HRS requires that the gender of individuals be input to the system, but again, the format of this information (e.g. M/F, 1/2) can be changed. More details are provided in the HRS user documentation.

    After entering the initial enumeration of locations, individuals, and relationships, the HRS is used to generate field books for entering demographic information collected during subsequent visits to the locations. After every visit round, the field worker will hand the books to the computer center and the changes in the demographic status of the household can be entered using the update screen:

    FIGURE 4

    The data entry clerk enters the information at the top of the screen (location ID, ...). Once this information is provided, the grid of individuals currently resident at the location is displayed. If an event occurred to an individual (for example, a birth to Ajua Adugbire), then the user would scroll down to the relevant individual and click on the button representing the event (in our example, the Preg Outcome button would need to be pressed). Clicking on buttons causes an event-specific entry screen to appear that collects additional information about the event (an example of an event screen is provided below).

    The above screen can be adapted to accommodate new types of "events" by adding additional event buttons. For example, suppose that information about malaria fever episodes occurring to children under five years old is required. A "Malaria fever" button could be added and checks could be put in place to restrict entries to the appropriate subset of individuals (in this example, under-five children). The process of adding new fields is described in section 4.6.
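A restriction of this kind might look like the following minimal sketch. It assumes that "under five" means fewer than five completed years of age on the date of the episode; the function names are hypothetical and the code is Python, not the FoxPro used in the HRS.

```python
from datetime import date


def age_in_years(birth_date: date, on_date: date) -> int:
    """Completed years of age on a given date (negative if not yet born)."""
    years = on_date.year - birth_date.year
    # Subtract one if the birthday has not yet been reached in that year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def malaria_entry_allowed(birth_date: date, episode_date: date) -> bool:
    """Allow the hypothetical malaria fever event only for under-five children."""
    return 0 <= age_in_years(birth_date, episode_date) < 5
```

The same pattern (compute an attribute of the individual, then compare it with the rule for the event) would apply to other project-specific event buttons.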

    The event forms collect additional information about the particular event. One of the more complex events is a pregnancy outcome. After selecting Ajua Adugbire and clicking on the Preg Outcome button, the data entry clerk would see the following screen:

    FIGURE 5

    Certain information is automatically entered by the system (field worker, mother ID, observation number, and previous birth history). A unique Event ID is automatically generated to refer to this birth. The clerk fills in the other fields, including type (live vs. still birth), date, and father ID. Since this is a live birth, the system generates a new Individual ID number for Akalou Adugbire. Her father's ID is that of Akumdaare Adugbire, the household head [see Figure 4], thus her "Relation to Head" is recorded as "CHD" (child). Finally, the Status of Data field reflects the consistency status of the record. It initially contains the value "P" (pending); when the record has passed all validation checks, it is set to "V" (valid).
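The life cycle of such a record can be sketched as follows. The status codes "P" and "V" and the relation code "CHD" come from the text above; everything else is an assumption made for illustration, including the outcome codes "LIVE" and "STILL", the sequential ID counters, and the rule that a child is coded "CHD" whenever either parent is the household head.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, List, Optional

# Hypothetical ID generators; the HRS ID formats are project-specific.
_event_ids = count(1)
_individual_ids = count(1)


@dataclass
class PregnancyOutcome:
    mother_id: str
    father_id: Optional[str]
    outcome_type: str                       # "LIVE" or "STILL" (assumed codes)
    event_id: int = field(default_factory=lambda: next(_event_ids))
    child_id: Optional[int] = None          # set only for a live birth
    relation_to_head: Optional[str] = None
    status: str = "P"                       # pending until validated


def register_outcome(mother_id: str, father_id: Optional[str],
                     outcome_type: str, head_id: str) -> PregnancyOutcome:
    rec = PregnancyOutcome(mother_id, father_id, outcome_type)
    if outcome_type == "LIVE":
        # A live birth adds a new individual to the household.
        rec.child_id = next(_individual_ids)
        if head_id in (mother_id, father_id):
            rec.relation_to_head = "CHD"
    return rec


def validate(rec: PregnancyOutcome,
             checks: List[Callable[[PregnancyOutcome], bool]]) -> str:
    """Mark the record valid only when every check passes."""
    if all(check(rec) for check in checks):
        rec.status = "V"
    return rec.status
```

A record that fails any check simply stays at "P", which is what allows pending records to be listed for correction.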

    Information can be added to this core data about a birth. For example, a birth attendant field could be added to the information at the top of the screen and a birth-weight field could be added to the child-level fields in the grid.

    The other event forms are similar in structure to (although in most cases, less complicated than) the Pregnancy Outcome form.

    4.5 Output

    Once data have been collected over a time interval, the HRS provides capabilities for generating key demographic rates, such as fertility, mortality, and migration. The HRS can also produce life tables and a population age distribution. The various rates can be calculated for a user-specified time interval and for different geographic subsets of the population.
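For readers unfamiliar with the calculation, an age-specific rate is conventionally the number of events in an age group divided by the person-years lived in that group over the interval. The sketch below shows that arithmetic under the assumption that events and person-years have already been tabulated by age group; it is not the HRS reporting code, and the function names are hypothetical.

```python
from typing import Dict, Optional


def age_specific_rates(events: Dict[str, int],
                       person_years: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Events per person-year for each age group; None where exposure is zero."""
    return {group: (events.get(group, 0) / py if py > 0 else None)
            for group, py in person_years.items()}


def overall_rate(events: Dict[str, int],
                 person_years: Dict[str, float]) -> Optional[float]:
    """Total events divided by total person-years across the listed groups."""
    total_py = sum(person_years.values())
    if total_py <= 0:
        return None
    total_events = sum(events.get(group, 0) for group in person_years)
    return total_events / total_py
```

With purely illustrative inputs of 12 deaths over 400 person-years at ages 0-4, the age-specific rate is 12/400 = 0.03 deaths per person-year. Restricting the calculation to a time interval or geographic subset changes only how the two input tables are tabulated.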

    4.6 Internal Logic and Customization of the HRS System

    Typically, a study will require amending the core HRS specification to include new systems of variables on household attributes, individual characteristics, or events. The HRS is built from the form (data entry screen), menu, and database builders of the Microsoft Visual FoxPro System (currently developed in Version 6.0). FoxPro is not needed to run the HRS; however, it is required to make any modifications to the programming. The FoxPro tools encourage and facilitate a modular, object-oriented software development approach. In the HRS, most of the objects represent variables from the HRS core tables and these objects can appear on the data entry, reporting, and rate generation forms. Small "code snippets" are segments of code that can be "attached" to these objects. Some code snippets control when data can be entered for a variable (i.e. at baseline or in updates); others enforce rules for legal variable values and legal relationships with other variables.

    Collectively, the database, form, and menu specifications along with their associated code snippets define the HRS core. When changes to the core are required, a programmer locates the database table, menu, or form object where changes are needed and then works with the code snippets attached to the object. Since there are only a few code snippets attached to an object and these code snippets are usually short, the process of modifying and extending the HRS is significantly easier than changing code in a thousand-line program.

    This method of modifying and extending software is very different from the process used only a few years ago. Formerly, software development required a programmer to manage both the control logic of a program (for example, the sequence in which fields in a data entry screen are visited) and the semantic logic (for example, logical checks). When changes were required, the relevant code section had to be manually located in the many thousands of lines of code. Seemingly straightforward modifications such as altering the format of the individual ID would require changing code in many different places. Then the programmer had to ensure that those changes did not cause problems with the control code in other portions of the program. Only experienced programmers could work with large software systems written in this way. In contrast, the HRS specifies the format of the individual ID in only one place (the class directory). When the format of the ID is modified, the change immediately cascades through the entire program, updating the format of the ID in every place that it appears.
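The idea of defining a format once and letting every screen inherit it can be illustrated with a small Python analogy. This is not how the FoxPro class directory is implemented; the class names and both ID patterns are invented for the example.

```python
import re


class IdFormat:
    """Single definition of the individual ID format (hypothetical pattern)."""
    pattern = re.compile(r"[A-Z]{3}\d{6}")

    @classmethod
    def is_valid(cls, value: str) -> bool:
        return cls.pattern.fullmatch(value) is not None


class IndividualIdField:
    """A data entry field that defers to the shared format definition."""

    def __init__(self, label: str, id_format=IdFormat):
        self.label = label
        self.id_format = id_format

    def accept(self, value: str) -> bool:
        return self.id_format.is_valid(value)


# Two fields on different forms share the one definition.
mother_id = IndividualIdField("Mother ID")
father_id = IndividualIdField("Father ID")
```

Replacing IdFormat.pattern in this sketch changes what both fields accept, with no edit to either field, which is the behavior the paragraph above describes for the individual ID.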

    The process of code modification is mechanized in the HRS. The new object-oriented, visual style of programming is much easier for both beginning and experienced programmers. Point-and-click specifications combined with a reasonable default behavior allow the beginning programmer to develop adequate software systems with relative ease. In addition, this new form of software development allows an experienced programmer to develop objects that can be dropped into an application. These objects can then be used by beginning programmers to develop new applications. The HRS contains many predesigned objects that are intended to facilitate demographic surveillance system development. For instance, template forms exist for each of the three different types of data (invariants, episodes, and events; see the paper detailing the Reference Data Model under "Publications"). In starting a project to study malaria incidence among children under 5, the programmer would first open a new "Event" form which would come equipped with all of the navigational buttons along the bottom. Data fields are other examples of predesigned objects. The programmer can cut-and-paste the Individual ID, Location ID, Date, and Field Worker boxes from another event form, inheriting all of the associated logic.

    The FoxPro form builder allows objects in a form specification to be visually manipulated. When a programmer wants to change the form layout or variable labels, the appropriate object is selected and then moved, edited, or deleted. It may be necessary to modify the core specification for a number of reasons. This may involve adding a variable, changing data prompts to a new language, changing logical constraints, or adding controls governing when data can be entered. For example, the update form specification appears to the programmer as follows:

    FIGURE 6

    Obviously, the specification and the data entry form seen by users are visually very similar. If there is a problem with a data field, such as the Location Number, then a programmer can mouse-click on the corresponding location field, display the list of properties associated with that field, and make any corrections. Control buttons at the bottom of the screen also serve a useful function. These buttons contain a substantial amount of logic to control user interaction with the form. A novice programmer who needs to create a new data entry form need not understand all of the detailed logic associated with a control button, but can simply paste the control buttons onto the new data entry form and inherit all the embedded logic. This is an extremely powerful utility for code reuse. If the reusable objects are sufficiently powerful and versatile, then programmers need only insert the right objects in the right places to invoke the requisite logic and operation.

    Asking for the properties of any of the objects embedded in the form will reveal more of the details of that object's behavior and characteristics. For example, a few of the properties associated with the location number are shown:

    FIGURE 7

    The object has tabs to reflect the various categories of attributes and behaviors. The Data tab allows the programmer to associate the object with a particular field in a table. The Methods tab allows the programmer to attach computer code to particular events that occur during the running of the program. In the above example, the "When" code snippet returns "true" if a user can enter data into the field and "false" otherwise (in this particular case, the code in the lower box stipulates that this field cannot be edited after the data has been entered). The Layout attributes affect the appearance of the object on the form, and the Other tab contains a few miscellaneous attributes.
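The behavior of a "When" snippet can be imitated in a few lines of Python. This is an analogy rather than FoxPro code: the FormField class and the editable_until_entered function are hypothetical, and only the true/false contract is taken from the description above.

```python
from typing import Callable, Optional


class FormField:
    """A form object with an attached 'when' snippet guarding data entry."""

    def __init__(self, name: str, when: Callable[["FormField"], bool]):
        self.name = name
        self.value: Optional[str] = None
        self._when = when

    def set(self, value: str) -> bool:
        # The snippet decides whether the user may enter data right now.
        if not self._when(self):
            return False
        self.value = value
        return True


def editable_until_entered(form_field: FormField) -> bool:
    """Return True only while the field is still empty."""
    return form_field.value is None
```

Attaching a different function to the same field class would give a different entry rule, for example one that permits entry only at baseline.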

    Logic about legal values and relationships for variables can also be associated with the objects embedded in a form. But often a better place to put this logic is in the database table itself. This logic can be associated with a field in a table or with the entire row in the table. If data is inserted into the table, then the logical checks are applied to the data before it is inserted; if there are any errors, the data cannot be inserted. Logical checks involve ensuring both that the entry is a legal value for this variable and that it is consistent with past information on that individual or household. This capability is one of the most significant and powerful features for the HRS programmer. It allows changes to be made to some of the HRS logical checks (for example, changing the allowable codes for relationship to household head) as well as adding new project-specific logic to HRS fields or new fields. The following figure shows some of the fields for the individual table as well as more detailed information, including a logical rule, for the gender field:

    FIGURE 8

    The Field Comment gives the programmer directions on how to change the legal values for the variable. In this case, to change the coding scheme of Gender from M/F to 1/2, the programmer would simply change "M" and "F" to "1" and "2" in the Rule (and adjust the Message accordingly). Imposing consistency checks with past data (for example, to allow a pregnancy observation to be recorded only for a woman) is a significantly more sophisticated task which requires attaching row-level rules to the database tables themselves.
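The difference between a field-level rule and a row-level rule can be sketched as below. The code is a Python analogy under stated assumptions: tables are plain lists of dictionaries, a failed rule raises ValueError with its message, and all names (GENDER_CODES, FIELD_RULES, insert_row, only_women) are hypothetical rather than taken from the HRS table definitions.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Rule = Tuple[Callable, str]   # (check, message shown when the check fails)

# Field-level rule: switching the coding scheme means editing these two lines.
GENDER_CODES = ("M", "F")
FIELD_RULES: Dict[str, Rule] = {
    "gender": (lambda v: v in GENDER_CODES, "Gender must be M or F"),
}


def insert_row(table: List[Row], row: Row,
               field_rules: Dict[str, Rule], row_rules: List[Rule]) -> None:
    """Apply every rule before the row is stored; reject on the first failure."""
    for name, (check, message) in field_rules.items():
        if not check(row.get(name)):
            raise ValueError(message)
    for check, message in row_rules:
        if not check(row):
            raise ValueError(message)
    table.append(row)


def only_women(individuals: Dict[str, Row]) -> Rule:
    """Row-level rule: a pregnancy observation must refer to a woman on file."""
    def check(row: Row) -> bool:
        person = individuals.get(row["individual_id"])
        return person is not None and person["gender"] == "F"
    return check, "Pregnancy observation allowed only for a woman"
```

The field rule needs only the value being entered, whereas the row rule has to consult data already stored, which is why the second kind is the more sophisticated task.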

    We anticipate that most of the consistency logic for an application can reside in the table definition. This means that data entry screens that use these fields will only need to specify when the data should be entered as well as the position and layout of the data field.



    The Household Registration System:
    Computer Software for the Rapid Dissemination of Demographic Surveillance Systems

    James F. Phillips, Bruce B. MacLeod, Brian Pence
    © 2000 Max-Planck-Gesellschaft ISSN 1435-9871