The Data Material and the Demolink System Dealing with Errors in the Sources

4. Record Linking Strategy

In Demolink, all individual events are first linked interactively to individual life courses. Afterwards an automatic procedure groups the linked individuals into families. The record linkage was carried out first for all males, then for females. The pocketing (grouping) variables were first names and patronymics. Each section of the file, holding a particular combination of first name and patronymic, was linked before I went on to a new combination.
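The pocketing step can be illustrated with a minimal sketch. It is not Demolink's actual code: the field names (sex, first_name, patronymic), the sample records and the plain alphabetical ordering within each sex are assumptions made for the example.

```python
from collections import defaultdict

def pocket_records(records):
    """Group event records by (sex, first name, patronymic)."""
    pockets = defaultdict(list)
    for rec in records:
        key = (rec["sex"], rec["first_name"], rec["patronymic"])
        pockets[key].append(rec)
    return pockets

def linking_order(pockets):
    """Return the pocket keys with all male pockets before the female ones."""
    # "M" sorts after "F", so rank the sexes explicitly instead of by letter.
    sex_rank = {"M": 0, "F": 1}
    return sorted(pockets, key=lambda k: (sex_rank[k[0]], k[1], k[2]))

# Hypothetical event records, invented for illustration only.
records = [
    {"id": 1, "sex": "F", "first_name": "Anne", "patronymic": "Olsdatter"},
    {"id": 2, "sex": "M", "first_name": "Ole", "patronymic": "Hansen"},
    {"id": 3, "sex": "M", "first_name": "Ole", "patronymic": "Hansen"},
    {"id": 4, "sex": "M", "first_name": "Anders", "patronymic": "Olsen"},
]
pockets = pocket_records(records)
order = linking_order(pockets)
```

Each key in order identifies one section of the file; all records in pockets[key] would be worked through before the next key is opened.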

Men were linked first because men had more records than women. More records often mean more identifying items and consequently more secure links. It was mainly men who were mentioned in land registers, and the marriage records mentioned the fathers, never the mothers, of the bride and groom. When I started to link the females, the already linked life courses of their fathers and husbands were useful.

To avoid starting with the problems connected with common names, male names with infrequent initials, such as B and D, were taken up first. Thereafter the males were linked alphabetically from A through V, then the females in the same way, except for the names Mari/Marie/Maria/Maren (all standardized to the same name code). These variants were the last to be linked, because together they formed the most common female name. They also occurred frequently as the second Christian name, so when the records from this section of the file were to be linked, many of them were found to be linked already under the first Christian name. This was because, when an individual event record with two first names was linked at one place in the file, the duplicate record, sorted and listed somewhere else in the file, was automatically marked as linked too.
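A minimal sketch of the two mechanisms described here, name standardization and the automatic marking of duplicate listings, is given below. The code table, the function names and the sample events are hypothetical; only the grouping of Mari/Marie/Maria/Maren under one name code is taken from the text.

```python
# Hypothetical standardization table: spelling variants share one name code.
NAME_CODES = {"mari": "MARI", "marie": "MARI", "maria": "MARI", "maren": "MARI"}

def name_code(name):
    """Return the standardized code for a first name (default: upper-cased name)."""
    return NAME_CODES.get(name.lower(), name.upper())

def expand_event(event_id, first_names, patronymic):
    """List an event once per first name; all listings share the same event id."""
    return [
        {"event_id": event_id, "name_code": name_code(n), "patronymic": patronymic}
        for n in first_names
    ]

def mark_linked(listings, linked_ids, event_id):
    """Mark an event as linked and return the listings that are still unlinked."""
    linked_ids.add(event_id)
    return [item for item in listings if item["event_id"] not in linked_ids]

# Event 1 concerns "Anne Marie", so it is listed under both ANNE and MARI.
listings = (
    expand_event(1, ["Anne", "Marie"], "Olsdatter")
    + expand_event(2, ["Maren"], "Olsdatter")
)
linked_ids = set()
remaining = mark_linked(listings, linked_ids, 1)  # linked while working on ANNE
```

Because the linked status is stored per event rather than per listing, the MARI listing of event 1 no longer appears in remaining once the event has been linked under ANNE.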

The records belonging to individuals with rare names practically stood out by themselves by virtue of the visual pattern of records in the individual event file. For the most frequent combinations of first name and patronymic, there could be hundreds of individual event records and consequently no overview on the screen. These cases needed a complementary paper print-out of the file. Notes, brackets and arrows on the print-out prepared the actual linkage on the screen.

When the number of individual event records associated with a particular combination of first name and patronymic exceeded the screen's capacity, i.e. 30 records, I started with an individual event record connected to a marriage. It could be a person who was mentioned as father of the bride or the groom, or it could be the groom or the bride. Because so much information was connected to marriages, in most cases it was possible to find links to both the family of origin and the family of procreation. Choosing the most informative records first made it easier to see where records with less abundant or less discriminating information should fit in. Moreover, by taking the people who were mentioned in a marriage first, a substantial number of the records were linked, and thus the number of records that initially seemed to fit into different life courses was reduced.
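The ordering rule for crowded pockets can be sketched as follows. The role labels and the function name are hypothetical; the capacity of 30 records is the figure given above.

```python
SCREEN_CAPACITY = 30  # from the text: the screen held 30 records

# Hypothetical role labels for records connected to a marriage.
MARRIAGE_ROLES = {"groom", "bride", "father_of_groom", "father_of_bride"}

def work_queue(pocket):
    """Put marriage-related records first when a pocket overflows the screen."""
    if len(pocket) <= SCREEN_CAPACITY:
        return list(pocket)
    # False sorts before True, so marriage roles come first. sorted() is
    # stable, so the original order is preserved within each group.
    return sorted(pocket, key=lambda rec: rec["role"] not in MARRIAGE_ROLES)

# A pocket of 31 invented records: 30 baptisms followed by one groom.
pocket = [{"id": i, "role": "baptized"} for i in range(30)]
pocket.append({"id": 30, "role": "groom"})
queue = work_queue(pocket)
```

In this invented pocket the groom's record moves to the head of the queue, while a pocket that fits on the screen is returned in its original order.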

During subsequent examinations of the same combination of first name and patronymic, records for people who were married, but whose actual marriage record was missing, were linked. Thereafter unmarried people were linked. Finally, before introducing a new combination of first name and patronymic, a last check for missed links in the stored life courses was done. Cases with two or more alternative records that might fit into a particular life course were also settled, if possible. The process of choosing between competing links closely resembles the procedure explained for French-Canadian material [Jetté 1989].

In contrast to the traditional manual method, where the events are added chronologically as they occur, Demolink encourages retrospective record linkage. The retrospective strategy offered the opportunity to establish the most obvious links first, and thus the number of uncertain links was reduced. If the starting point had been a baptism, the possibility of the person moving out or dying would have had to be kept in mind before linking the baptism to a later census or marriage.

The record linking process was also in a certain way iterative, in that stored life courses could always be changed if I found information that altered previously linked life courses. Stored life courses were only rarely split into two separate parts later. It happened more often that the linkage revealed that two or more stored life course fragments belonged to the same individual. The tendency to get fragments rather than erroneous events in the life course is due to inaccuracies or errors in the sources [Note 3]. The record linkage was thus a cumulative process, where insights gained in linking one person could be exploited to link other people, partly by removing apparently competing records, but also by discovering errors in the sources.
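The merging of life course fragments can be illustrated with a disjoint-set (union-find) structure. This is a swapped-in textbook technique chosen for the sketch, not a description of how Demolink stores life courses, and the record identifiers are invented.

```python
class LifeCourses:
    """Disjoint-set structure over event record ids (illustrative only)."""

    def __init__(self):
        self.parent = {}

    def find(self, rec_id):
        """Return the representative record of the life course holding rec_id."""
        self.parent.setdefault(rec_id, rec_id)
        while self.parent[rec_id] != rec_id:
            # Path halving: point the record at its grandparent as we walk up.
            self.parent[rec_id] = self.parent[self.parent[rec_id]]
            rec_id = self.parent[rec_id]
        return rec_id

    def link(self, a, b):
        """Merge the life courses (or fragments) that contain records a and b."""
        self.parent[self.find(a)] = self.find(b)

    def same_person(self, a, b):
        return self.find(a) == self.find(b)

lc = LifeCourses()
lc.link("baptism_1801", "census_1815")    # first fragment
lc.link("marriage_1825", "burial_1870")   # second fragment
separate_before = lc.same_person("baptism_1801", "burial_1870")

# Later evidence shows that both fragments belong to one individual.
lc.link("census_1815", "marriage_1825")
merged_after = lc.same_person("baptism_1801", "burial_1870")
```

Merging is cheap in this representation, which suits a process where fragments are joined far more often than life courses are split. The rarer split is not supported directly and would require rebuilding the affected life course from its individual links.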

 


Interactive Record Linkage: The Cumulative Construction of Life Courses
Eli Fure
© 2000 Max-Planck-Gesellschaft ISSN 1435-9871
http://www.demographic-research.org/Volumes/Vol3/11