Record Linking Strategy Linkage Criteria

5. Dealing with Errors in the Sources

Errors in the sources, and particularly errors in names, are obviously detrimental to correct record linkage. There is no reason to believe that the data set from Asker and Bærum was unusually replete with errors, but Demolink offered good opportunities for finding and correcting errors, either by looking more closely at a record in its source context or by checking the already established life courses of people related to the person in question.

Because the same source entry could be viewed from as many angles as there were people mentioned in it, the chances of detecting errors increased. The example of Johannes Johannesen illustrates this. The point of departure for this linkage was his presence in the 1865 census. According to his age in 1865, Johannes should have been born in 1841, but there was apparently no suitable candidate baptism record in the years 1840-1842. Here, Demolink's ability to show the previous or next source entry proved useful. In the household following that of Johannes Johannesen in the 1865 census, I found a woman, Mari Monsdatter, reported to be the mother of Johannes Johannesen. I consulted the section of the file listing the records pertaining to women named Mari Monsdatter, to see whether a Mari Monsdatter appeared as mother at the baptism of a child called Johannes. There was no such record, but there were many records belonging to a Mari Monsdatter married to a Johannes Nilsen. This Johannes Nilsen provided further clues. By looking up the section of the file listing Johannes Nilsen's individual event records, I found many events reciprocal to those belonging to Mari Monsdatter: he was naturally listed as father at the baptisms where she appeared as mother. In addition, there was one baptism where he was listed as father but the mother's name was Mari Hansdatter. This baptism took place in 1841, but the child's name was not Johannes but Johanne, the female form of the name. This explained the missing record: I finally found the baptism record for Johannes in the female section of the individual event file, among the records with the name Johanne Johannesdatter.

There were two errors in this example. One was already present in the church book, namely the wrong patronymic of the mother. The other was introduced when the source was computerized: the last 's' in Johannes was almost invisible, so the name was read as Johanne. No automatic record linkage system would have been able to resolve these errors, and the link would therefore have been missed.
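The cross-check in this example follows reciprocal events: each person named in a source entry offers another route into the same entry. The Python sketch below assumes a hypothetical flat layout for the individual event file (the names `Event`, `co_persons` and `unexpected_mothers` are illustrative, not Demolink's), and uses toy records modelled on the example. Listing the baptisms at which Johannes Nilsen appears as father and comparing each mother with his known wife surfaces the 1841 entry with the deviant patronymic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One person's appearance in one source entry (hypothetical layout)."""
    entry_id: int     # shared by everyone mentioned in the same source entry
    source: str       # e.g. "baptism 1841"
    role: str         # e.g. "child", "father", "mother"
    first_name: str
    patronymic: str


def co_persons(events, person, role):
    """For each entry where `person` (first name, patronymic) has `role`,
    return the other people recorded in that entry, keyed by entry_id."""
    ids = {e.entry_id for e in events
           if (e.first_name, e.patronymic) == person and e.role == role}
    return {eid: [e for e in events if e.entry_id == eid and e.role != role]
            for eid in sorted(ids)}


def unexpected_mothers(events, father, expected_mother):
    """Yield entries where `father` appears but the mother is not his known
    wife: candidates for a miswritten name, to be checked by hand."""
    for eid, others in co_persons(events, father, "father").items():
        for m in others:
            if m.role == "mother" and (m.first_name, m.patronymic) != expected_mother:
                children = [(c.first_name, c.patronymic)
                            for c in others if c.role == "child"]
                yield eid, (m.first_name, m.patronymic), children


# Toy records modelled on the example; entry ids and the sibling are placeholders.
events = [
    Event(1, "baptism", "father", "Johannes", "Nilsen"),
    Event(1, "baptism", "mother", "Mari", "Monsdatter"),
    Event(1, "baptism", "child", "(sibling)", "Johannesen"),
    Event(2, "baptism 1841", "father", "Johannes", "Nilsen"),
    Event(2, "baptism 1841", "mother", "Mari", "Hansdatter"),
    Event(2, "baptism 1841", "child", "Johanne", "Johannesdatter"),
]

for hit in unexpected_mothers(events, ("Johannes", "Nilsen"), ("Mari", "Monsdatter")):
    print(hit)  # (2, ('Mari', 'Hansdatter'), [('Johanne', 'Johannesdatter')])
```

Such a check only flags candidates; deciding whether Mari Hansdatter is a miswritten Mari Monsdatter, and whether Johanne is really Johannes, still rests with the historian.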

Since first name was the primary sorting variable in the individual event file, the linking strategy was very vulnerable to errors in first names. Many such errors were discovered indirectly during the work with this file. Errors were difficult to detect when searching for the person to be linked, but could be revealed through the records of a related person.
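Why first-name sorting makes the strategy fragile can be shown with a small sketch. It assumes, hypothetically, that the file is split by sex and that each part is sorted with first name as the leading key, so a name's section can be found by binary search with Python's `bisect` module. An exact lookup of "Johannes" never reaches the miscoded record; the prefix lookup is only an illustration of a more tolerant search, not a Demolink feature described here.

```python
from bisect import bisect_left, bisect_right

# Hypothetical record layout: (first_name, patronymic, role_and_source).
# Each file is sorted as tuples, so first name is the leading sort key.
male_file = sorted([
    ("Johannes", "Johannesen", "person, census 1865"),
    ("Johannes", "Nilsen", "father, baptism 1841"),
])
female_file = sorted([
    ("Johanne", "Johannesdatter", "child, baptism 1841"),  # miscoded 'Johannes'
    ("Mari", "Hansdatter", "mother, baptism 1841"),
])


def section(records, first_name):
    """Exact lookup: the contiguous run of records with this first name."""
    keys = [r[0] for r in records]
    return records[bisect_left(keys, first_name):bisect_right(keys, first_name)]


def prefix_section(records, prefix):
    """Tolerant lookup (illustrative only): every record whose first name
    starts with `prefix`, still located by binary search."""
    keys = [r[0] for r in records]
    return records[bisect_left(keys, prefix):bisect_right(keys, prefix + "\uffff")]


# Searching for Johannes Johannesen's baptism by exact first name:
exact = [r for r in section(male_file, "Johannes") if r[2].startswith("child")]
print(exact)  # prints []; the miscoded baptism lies in another section

# A prefix search across both files brings it back into view:
loose = [r for f in (male_file, female_file) for r in prefix_section(f, "Johanne")
         if r[2].startswith("child")]
print(loose)  # [('Johanne', 'Johannesdatter', 'child, baptism 1841')]
```

The prefix search also returns both Johannes records, so it widens the set of candidates the historian must inspect rather than deciding the link.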

Through the progressive disclosure of errors in the data set, more and more records were accepted as reliable parts of a life course, despite inaccuracies in names and ages. After the whole individual event file had been worked through, the earliest links were reviewed, and links that had not been made at the beginning were now established. Reviewing stopped when the return, in the form of more complete life courses, became negligible. The linking process is thus also a learning process for the historian. This iterative process would not be possible in traditional family reconstitution.
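The review cycle can be modelled as repeated passes over the unlinked records that stop once a pass yields a negligible number of new links. In this sketch, `try_link` is a hypothetical stand-in for the historian's judgment, the toy dependencies encode the idea that a record becomes linkable only after a related person's record has been linked, and the threshold `min_gain` is likewise an assumption.

```python
def review_until_stable(records, try_link, min_gain=1, max_passes=20):
    """Repeatedly pass over records that are still unlinked.

    Links made in one pass can make further links possible in the next,
    so iterate until a pass adds fewer than `min_gain` new links.
    Returns the set of linked records and the number of new links per pass.
    """
    linked = set()
    gains = []
    for _ in range(max_passes):
        # Judge every unlinked record against the links known at the start of this pass.
        new = {r for r in records - linked if try_link(r, linked)}
        gains.append(len(new))
        linked |= new
        if len(new) < min_gain:
            break
    return linked, gains


# Toy model (hypothetical): a record can be linked only once the record of
# the related person it depends on is linked; None means no dependency.
depends_on = {"A": None, "B": "A", "C": "B", "D": "X"}  # "X" is never linked


def try_link(record, linked):
    dep = depends_on[record]
    return dep is None or dep in linked


linked, gains = review_until_stable(set(depends_on), try_link)
print(sorted(linked), gains)  # ['A', 'B', 'C'] [1, 1, 1, 0]
```

In the toy run each pass unlocks exactly one more record and the fourth pass adds nothing, so the loop stops with D still unlinked.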



Interactive Record Linkage: The Cumulative Construction of Life Courses
Eli Fure
© 2000 Max-Planck-Gesellschaft ISSN 1435-9871