
9. Concluding Remarks

As an advocate of a fully automatic record linkage process, Roger Schofield of the Cambridge Group has claimed that "if the historian's judgment has any claim to intellectual respectability, the principles on which it is based must be specifiable in algorithmic form and so be executable automatically by the computer without further human intervention." [Schofield 1992, 75]. Experience from this record linkage, however, shows that insight into the sources and their historical context is necessary to obtain good results. The researcher/linker must know the geography of the area, understand which names, even if not standardized, are close enough to be regarded as the same name, and recognize what constitutes a clear error. Human qualities like curiosity, imagination, alertness and attention are also important. This does not mean that only university-educated historians with an interest in this type of work could successfully use this record linkage system. Well-educated amateur local historians could also be trained to do the job, but it is definitely not a routine, mechanical task. It is also an advantage if the historian who will examine the sources for substantive analyses does at least some of the record linking. The work gives a unique acquaintance with the sources and the data set, and it raises many questions that can later become the subject of more systematic research.

The number of identifying items in the sources is also important for record linkage. The good quality of the Norwegian 19th-century sources clearly contributed to the success of the interactive linkage. But even when the records contained poor or erroneous identifying items, Demolink's ability to display many sources simultaneously made it possible to find the life courses to which the events belonged.

Since the different sources did not completely overlap in time, less complete life courses were an expected result. Often it is not solely up to the historian to select which sources to use: budget, availability and time constraints are legion. Given these constraints, the linkage strategy I chose worked reasonably well: standardized first names and patronymics as pocketing variables, males first and then females, both basically in alphabetical order. An alternative would have been to link people with uncommon names first, regardless of sex. This might have eased the task of linking people with common names, since those who were married to people with uncommon names could be linked more quickly by looking up the life courses of their spouses. On the other hand, it would have been harder to keep track of what had been done.
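
The pocketing strategy above corresponds to what is usually called blocking in record linkage. The following Python sketch only illustrates it under stated assumptions and is not Demolink's implementation: the `EventRecord` fields, the explicit `sex` field used for ordering, and all function names are hypothetical. It groups records into pockets by standardized first name and patronymic, and contrasts the order actually used (males first, then females, alphabetically) with the alternative of taking the rarest names first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    record_id: int
    sex: str             # "M" or "F" (hypothetical field, used only for ordering)
    std_first_name: str  # standardized first name
    std_patronymic: str  # standardized patronymic
    event_type: str      # e.g. "baptism", "marriage", "burial"
    year: int


def build_pockets(records):
    """Group records into pockets keyed by (sex, first name, patronymic)."""
    pockets = defaultdict(list)
    for rec in records:
        pockets[(rec.sex, rec.std_first_name, rec.std_patronymic)].append(rec)
    return dict(pockets)


def order_males_first(pockets):
    """The order described in the text: males, then females, alphabetically."""
    sex_rank = {"M": 0, "F": 1}
    return sorted(pockets, key=lambda k: (sex_rank[k[0]], k[1], k[2]))


def order_uncommon_first(pockets):
    """The alternative: smallest pockets (rarest names) first, regardless of sex."""
    return sorted(pockets, key=lambda k: (len(pockets[k]), k[1], k[2]))


# Hypothetical sample data.
sample = [
    EventRecord(1, "M", "Ole", "Hansen", "marriage", 1850),
    EventRecord(2, "M", "Ole", "Hansen", "baptism", 1825),
    EventRecord(3, "F", "Anne", "Olsdatter", "marriage", 1850),
    EventRecord(4, "M", "Erik", "Larsen", "burial", 1870),
]
pockets = build_pockets(sample)
print(order_males_first(pockets))
print(order_uncommon_first(pockets))
```

With this sample, the males-first order processes Erik Larsen and Ole Hansen before Anne Olsdatter, while the uncommon-first order starts with the single-record pockets of Anne and Erik and leaves the larger Ole Hansen pocket for last.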

Whereas the order of linkage could have been altered, the cumulative and at least partly retrospective linking strategy within a pocket of events belonging to people with the same standardized first name and patronymic seems more fixed. Rather than starting with a birth and then adding events chronologically, starting preferably with a marriage and subsequently adding events with fewer identifying items seemed to be the most efficient way to produce secure life courses.
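
The within-pocket procedure can be sketched as follows, assuming (this is not taken from the original system) that each record carries a dictionary of identifying items. `work_order` only proposes the sequence in which a linker might review the records, marriages first and then records with progressively fewer identifying items; accepting a link remains a human decision. `accumulate` illustrates the cumulative aspect: once a record is accepted, its items are added to what is already known about the life course. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PocketRecord:
    record_id: int
    event_type: str
    year: int
    identifying_items: dict  # hypothetical, e.g. {"birth_year": 1825, "father": "Hans"}


def n_items(rec):
    """Number of identifying items actually present in the record."""
    return sum(1 for v in rec.identifying_items.values() if v not in (None, ""))


def work_order(pocket):
    """Marriages first, then records with fewer and fewer identifying items.
    The year only breaks ties, so the order is not chronological."""
    return sorted(pocket, key=lambda r: (r.event_type != "marriage", -n_items(r), r.year))


def accumulate(known, rec):
    """Cumulative step, applied only after the linker has accepted rec:
    add its identifying items to the life course without overwriting
    values already recorded."""
    merged = dict(known)
    for key, value in rec.identifying_items.items():
        if value not in (None, "") and key not in merged:
            merged[key] = value
    return merged


# Hypothetical pocket for one standardized first name + patronymic.
pocket = [
    PocketRecord(1, "baptism", 1825, {"father": "Hans", "mother": "Kari", "residence": "Tofte"}),
    PocketRecord(2, "marriage", 1850, {"birth_year": 1825, "father": "Hans",
                                       "residence": "Tofte", "spouse": "Anne"}),
    PocketRecord(3, "burial", 1870, {"age": 45, "residence": None}),
    PocketRecord(4, "census", 1865, {"birth_year": 1825, "residence": "Tofte"}),
]
order = work_order(pocket)
print([r.record_id for r in order])

known = {}
for rec in order[:2]:  # pretend the linker accepted the first two records
    known = accumulate(known, rec)
print(known)
```

Here the marriage comes first and the burial, with a single usable item, comes last, whereas a chronological order would have started with the baptism.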

The main part of the Demolink system was developed before I could start the actual record linkage, but the linking process itself suggested ways to improve the system. Some of these ideas were implemented during the linking, such as sorting the individual event record file in different ways. Using patronymic or residence as the first sorting key gave other views of the records, sometimes making it easier to see which records belonged to the same life course. This was particularly useful when the first name was wrong or missing. The possibility of linking people by typing their individual event record numbers, i.e. without necessarily having the records on the screen, was also added. Producing lists of unlinked records was useful too, as a help in completing the final linking of people with common names.
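
A minimal sketch of these file-handling features, assuming a simple in-memory list of records with hypothetical field names (Demolink's actual data structures are not described here): alternative sort keys for the event file, linking by record number, and a list of unlinked records. The sample shows how a record with a missing first name, which sorts apart from its namesakes in the name view, lands next to them when the patronymic is the first key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    record_id: int
    first_name: str      # "" when missing in the source
    patronymic: str
    residence: str
    year: int
    life_course: Optional[int] = None  # hypothetical life-course id; None = unlinked


# Alternative sort orders for the individual event record file (hypothetical keys).
SORT_KEYS = {
    "name": lambda r: (r.first_name, r.patronymic, r.residence, r.year),
    "patronymic": lambda r: (r.patronymic, r.first_name, r.residence, r.year),
    "residence": lambda r: (r.residence, r.patronymic, r.first_name, r.year),
}


def sorted_view(records, view):
    """Return the records in the order given by the chosen view."""
    return sorted(records, key=SORT_KEYS[view])


def link_by_number(records, record_id, life_course_id):
    """Attach a record to a life course by its number, on screen or not."""
    by_id = {r.record_id: r for r in records}
    by_id[record_id].life_course = life_course_id  # KeyError for an unknown number


def unlinked(records):
    """Numbers of records not yet attached to any life course, in name order."""
    return [r.record_id for r in sorted_view(records, "name") if r.life_course is None]


records = [
    Rec(1, "Ole", "Hansen", "Tofte", 1850),
    Rec(2, "", "Hansen", "Tofte", 1852),   # first name missing in the source
    Rec(3, "Anders", "Olsen", "Nes", 1848),
    Rec(4, "Ole", "Hansen", "Tofte", 1870),
]
print([r.record_id for r in sorted_view(records, "name")])        # record 2 separated from Ole Hansen
print([r.record_id for r in sorted_view(records, "patronymic")])  # record 2 next to Ole Hansen
link_by_number(records, 1, 100)
link_by_number(records, 4, 100)
print(unlinked(records))
```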

Other ideas that arose during the record linkage process demand more fundamental changes. Among these is the need for better search routines to find candidate records for linkage when errors or missing information have separated records that ideally should have appeared close to each other. Some kind of more automatic procedure for linking the "easy" cases, reducing the time spent in front of the computer without giving up the claim of correctness, would also be welcome. There is, however, always the problem of how to detect and handle non-systematic errors in the sources.
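
One possible shape for such a search routine is approximate name matching across pockets. The sketch below swaps in Python's standard `difflib.SequenceMatcher` as the similarity measure; this is an illustrative choice, not something used in Demolink, and the record layout and thresholds (`min_score=0.8`, `max_year_gap=80`) are assumptions. When a first name is missing, the score falls back to the patronymic alone.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_candidates(target, records, min_score=0.8, max_year_gap=80):
    """Rank records whose names resemble the target's, so that a misspelled
    or missing first name does not hide a record in another pocket.
    Thresholds are illustrative assumptions."""
    found = []
    for rec in records:
        if rec["id"] == target["id"] or abs(rec["year"] - target["year"]) > max_year_gap:
            continue
        if rec["first"] and target["first"]:
            score = (similarity(rec["first"], target["first"])
                     + similarity(rec["patronymic"], target["patronymic"])) / 2
        else:
            # Missing first name: judge on the patronymic alone.
            score = similarity(rec["patronymic"], target["patronymic"])
        if score >= min_score:
            found.append((score, rec))
    return sorted(found, key=lambda pair: pair[0], reverse=True)


# Hypothetical records.
target = {"id": 1, "first": "Ole", "patronymic": "Hansen", "year": 1850}
records = [
    {"id": 2, "first": "Olle", "patronymic": "Hansen", "year": 1852},   # misspelling
    {"id": 3, "first": "", "patronymic": "Hansen", "year": 1855},       # first name missing
    {"id": 4, "first": "Anders", "patronymic": "Olsen", "year": 1848},
    {"id": 5, "first": "Ole", "patronymic": "Hansen", "year": 1950},    # outside the year window
]
for score, rec in find_candidates(target, records):
    print(rec["id"], round(score, 2))
```

In the sample, record 3 ranks highest only because its missing first name leaves the patronymic as the sole evidence. A routine like this can propose candidates, but it cannot by itself tell a non-systematic error from a genuinely different person, so the linking decision stays with the linker.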

The interactive method, based on the researcher's knowledge of the sources and the historical background as well as on cognitive agility, has a qualitative aspect. But it does not necessarily follow that it is intellectually less respectable than the strict application of formal algorithms. In history there are many ways to evaluate a piece of work. In his seminal article of 1973, Ian Winchester argued that all history is basically "speculation about the past controlled by record linkage operations" [Winchester 1973]. If this is true, it is not obvious that record linkage should always be based on a fully quantitative approach. The method, whether manual, interactive or fully automated, should be chosen and evaluated according to the goals of the research, the resources available and the quality of the sources.



Interactive Record Linkage: The Cumulative Construction of Life Courses
Eli Fure
© 2000 Max-Planck-Gesellschaft ISSN 1435-9871