Dealing with Errors in the Sources Linkage Results

6. Linkage Criteria

In his classic manual, E. A. Wrigley clearly advocates using all available context information in the family reconstitution process; residence and occupation are no exceptions [Wrigley 1966, 130]. The same point is made in a Norwegian handbook of historical demography [Dyrvik 1983, 110]. Using the same variables first in the linkage process and then in the substantive analysis is, however, problematic [Wrigley and Schofield 1973, 90-91]. Even if the research goals are purely demographic, the overall results can be biased if the reconstructed population differs from the real population. IREP in Canada therefore recommends using only information that is stable over the whole life course in the linkage process [Bouchard 1992, 69]. Still, the Cambridge Group has used both residence and occupation in its automatic reconstitution [Schofield 1992]. The same has been done in Sweden [Bengtsson & Lundh 1993].

The linkage criteria for Asker and Bærum were, in order of priority, the names (of both the person to be linked and his or her relatives), the year of birth, and the residence. A very common first name reduced the weight of the name criterion, but seriously so only if the patronymic was also among the most frequent. Even in these cases, age would distinguish different people. Where age was not given, or where several records showed approximately the same age, identical residence was sufficient in most cases. In this study, residence is the area belonging to a specific land register number. This area typically covers one or two main farms and some rented cottages, so residence is a quite limited area. The chance that two different people with the same name and age lived at the same place at two consecutive points in time, the first having left and the second having moved in, is small indeed.
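The priority order of the criteria can be pictured as a filter that narrows a pool of candidate records step by step. The sketch below is a hypothetical illustration, not the procedure used in this study, where linkage was done interactively. The record fields (`first_name`, `patronymic`, `birth_year`, `land_register_no`) and the two-year tolerance on the stated birth year are assumptions introduced only for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical fields for one individual event record.
    first_name: str
    patronymic: str
    birth_year: Optional[int]  # None when no age was given
    land_register_no: str      # residence: the area of one land register number


def candidates(target: Record, pool: List[Record], year_tolerance: int = 2) -> List[Record]:
    """Narrow the pool by names, then birth year, then residence, in that order.

    year_tolerance is an assumed value; the text does not state one.
    """
    # 1. Names: first name and patronymic must agree.
    matches = [r for r in pool
               if r.first_name == target.first_name and r.patronymic == target.patronymic]
    # 2. Birth year: drop records whose stated year is clearly incompatible.
    #    Records without a stated year are kept.
    if target.birth_year is not None:
        matches = [r for r in matches
                   if r.birth_year is None
                   or abs(r.birth_year - target.birth_year) <= year_tolerance]
    # 3. Residence: consulted only when more than one candidate is left.
    #    If nobody lives at the target's place (a mover), all remaining candidates are kept.
    if len(matches) > 1:
        same_place = [r for r in matches if r.land_register_no == target.land_register_no]
        if same_place:
            matches = same_place
    return matches
```

With two Hans Olsen records born in 1780 and 1781 at different land register numbers, the age step keeps both, and the residence step singles out the one at the target's own place. If neither lives there, both remain and the choice is left to the researcher.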

Excluding residence from the linkage criteria would skew the results unduly. For people with less common names, name and age were sufficient linkage criteria; they would be linked whether they were movers or stayers. Without residence, however, linkage for people with common names would often yield either a false link or an incomplete life course. Clearly, any study based on family reconstitution benefits from reconstructing as many reliable life histories as possible, both for those who stayed at the same place and for those who moved within the parish. Families would often stay at one place for some years, move, stay at the new place for some time, and then move on or return to the first place. Such patterns would be much harder to detect without residence as a linkage criterion. Long-distance migrants would be lost in any case.

There were, however, times when several individual event records competed for inclusion in a particular life course, and it was impossible to know which record to choose. At other times no record seemed to fit an unnatural "hole" in a particular life course, for example when a person was mentioned in the sources both before and after a census but was not found in the census itself. For resolving clusters of competing records for people with identical names, two different strategies have been used in automatic record linkage [Bouchard 1992, 70]. My preference was to follow the philosophy of IREP: to give priority to optimal accuracy over optimal completeness, i.e. it was more important that the links were secure than that as many individual event records as possible were linked. However, unlike automatic systems, which compare only two links at a time, I could consider several links simultaneously, so the trade-off between accuracy and completeness was not generally pressing. The most difficult and time-consuming individual event records to link were those with common names. Certain combinations of very common first names and patronymics, such as Hans Olsen, generated up to several hundred individual event records.
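The accuracy-first rule, combined with the ability to weigh several competing links at once, can be sketched as follows. This is a hypothetical illustration, not the author's procedure: it assumes that each life course and each event record carries an identifier, that a record can belong to only one life course, and that the candidate sets have already been narrowed by the linkage criteria. A link is made only when a single candidate remains; once a record is linked, it is withdrawn from every competing life course, and anything still ambiguous is left unlinked.

```python
from typing import Dict, Optional, Set


def resolve_jointly(options: Dict[str, Set[str]]) -> Dict[str, Optional[str]]:
    """Link life courses to records, preferring accuracy over completeness.

    options maps a (hypothetical) life-course id to the ids of records that
    could fit it. The input is not modified.
    """
    remaining = {course: set(recs) for course, recs in options.items()}
    links: Dict[str, Optional[str]] = {course: None for course in options}
    progress = True
    while progress:
        progress = False
        for course, recs in remaining.items():
            # Link only when exactly one candidate is left: never guess.
            if links[course] is None and len(recs) == 1:
                record = next(iter(recs))
                links[course] = record
                # The record now belongs to this course, so it is removed
                # from every competing course, which may resolve them in turn.
                for other, other_recs in remaining.items():
                    if other != course:
                        other_recs.discard(record)
                progress = True
    return links
```

Considering all competing links together lets one secure link resolve another: if life course A can only take record r1, then r1 drops out of B's candidates, and B may become unambiguous. Where two life courses still share the same two candidates, both are left unlinked rather than risking a false link.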


Interactive Record Linkage: The Cumulative Construction of Life Courses
Eli Fure
© 2000 Max-Planck-Gesellschaft ISSN 1435-9871