Introduction Data Bases
2 Data resources developed for biodemographic studies
Several data sets were collected directly for the purpose of studies of familial resemblance in human longevity and estimation of heritability of human life span. These data sets deserve to be made available for further studies of familial clustering of human longevity.

2.1 Data on Finnish and Swedish middle class and noble families

Data comprising 12,768 cases were collected from various published Finnish and Swedish nobility and middle-class genealogies by Dr. Eva Jalavisto of the University of Helsinki (Finland) in 1951. This study was supported by the Finnish Cultural Foundation ("Suomen Kulttuurirahasto"). The results of the data analysis were published [91], but the data were not used later. We attempted to contact Dr. Jalavisto to request her database and learned that, unfortunately, she died in 1966. We plan to contact the colleagues of Dr. Jalavisto (Dr. P.C. Holmberg in particular) at the Department of Physiology, University of Helsinki, to find out the fate of this data collection.

2.2 French data on village of Arthez d’Asson

This data set is a reconstruction of families from 1744-1975 based on the parish registers and civil registers (5196 births, 1538 marriages, 4365 deaths) for the village of Arthez d’Asson, situated in the Ouzom Valley in Bearn (Department of Pyrenees Atlantiques). The data were collected and analyzed by Dr. Jean-Pierre Bocquet-Appel (Laboratoire d’Informatique pour les Sciences de l’Homme, Centre National de la Recherche Scientifique, Musee de l'Homme, 17 place du Trocadero, 75116, Paris, France) and Dr. Lucienne Jakobi (CRA, Musee de l’Homme, F-75116, Paris, France). They have records for 542 paternal grandfathers (maximum life span of 98 years), 542 maternal grandfathers (maximum life span of 92 years), 542 paternal grandmothers (maximum life span of 97 years) and 542 maternal grandmothers (maximum life span of 92 years). For parents they have 542 records for fathers (maximum life span of 98 years) and 542 records for mothers (maximum life span of 102 years). For children they have 674 records for sons (maximum life span of 94 years) and 688 records for daughters (maximum life span of 102 years). A brief description of their database has been published [24-25].

2.3 Hauge-Harvald Database on Elderly Danish Twins

This database is a part of the Odense Archive of Population Data on Aging (Program Director - Dr. James W. Vaupel).

The Hauge-Harvald Database on Elderly Danish Twins consists of individual level data on twin pairs born in Denmark between 1870 and 1930. For each twin pair, date of birth and dates of death (if dead), sex, and zygosity are available. Data availability: All of the above data are available and will be sent to qualified researchers on request.

Contact address:

Cindy Owens, Program on Population, Policy, and Aging, Box 90245, Sanford Institute, Duke University, Durham, N.C. 27708-0245; ph: (919)-613-7321; fax: (919)-681-8288.

This database was used in many studies on human longevity [42, 43, 82, 118].
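
The fields listed above for each twin pair (date of birth, dates of death if dead, sex, zygosity) could be held in a record along the following lines. This is only a minimal sketch: the class name `TwinPair`, the field names, and the "MZ"/"DZ" and "M"/"F" codes are assumptions made for illustration and do not describe the actual file layout of the Hauge-Harvald Database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class TwinPair:
    """Hypothetical record for one twin pair (field names are assumptions)."""
    birth: date                                   # shared date of birth
    deaths: Tuple[Optional[date], Optional[date]]  # None means still alive
    sex: str                                      # assumed coding, e.g. "M" or "F"
    zygosity: str                                 # assumed coding: "MZ" or "DZ"

    def life_spans(self) -> Tuple[Optional[float], Optional[float]]:
        """Age at death in years for each twin; None for a living twin."""
        return tuple(
            None if d is None else (d - self.birth).days / 365.25
            for d in self.deaths
        )

    def both_dead(self) -> bool:
        """True when the life spans of both twins are fully observed."""
        return all(d is not None for d in self.deaths)


# Fictitious example pair, not taken from the database.
pair = TwinPair(date(1900, 1, 1), (date(1980, 1, 1), None), "F", "MZ")
spans = pair.life_spans()
```

Keeping a missing date of death as `None` rather than dropping the pair makes it explicit which life spans are still censored.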

2.4 Cambridge Group for the History of Population and Social Structure Research Projects


English sources allow the construction of large-scale, individual data-sets for demographic analysis of adult males from c1300 and for the general population from 1538. These fall into two groups: those with full reproductive histories from family reconstitutions or genealogies, and those for unlinked individuals.

For the general English population from 1538 - 1837 the researchers have family reconstitutions of 110,000 marriages in 26 communities. Prof. J. Knodel has kindly provided access to his family reconstitution study of 29,000 marriages in 14 German villages (1600–1950) [101]. Also, Dr. T.H. Hollingsworth has kindly allowed the researchers to use his descendant genealogy of the British Peerage from 1603 to 1959, which contains full life-histories for 28,000 individuals and partial information on their grand-children.

The unlinked individuals, suitable for mortality studies only, come from elite groups and include a medieval database containing about 750 land-holders and 750 monks. In addition, there are records for 14,000 Scottish church ministers (1530 - 1927), and the researchers have access to data for 17,000 Members of Parliament (1500-1860), courtesy of the History of Parliament Trust.

1st Project: Long-Run Changes in Adult Mortality

Researcher: J. Oeppen

The major findings of this project are that, with the exception of two early epidemic periods, levels of adult survival were largely unchanged until c1700, when the modern rise begins. From this date, females follow a pattern of steady improvement shared by other European countries. For men of all social classes, this upward trend is checked by urbanisation and industrialisation in the first half of the nineteenth century. There is little evidence of differentials in life-expectancy by sex or social class before 1800, although there is a strong marital status effect. Only amongst women aged 65 to 85 does wealth seem to be an advantage.

2nd Project: Historical Bio-Demography

Researchers: J. Oeppen and R. Davies

Amongst the hypotheses being investigated, a number are of interest to analysts of longevity, and exploit the fact that most of the data is for periods with high exogenous stress over the life-course, and natural fertility. For example, it was found across all the data that the reproductive history of a woman has no influence on her survival after age 50, thus confirming a long-held view amongst demographers. It was also found that the difference in mortality in the reproductive years between single and married women in the Peerage is explained by the number of children born, multiplied by the risk of maternal mortality. However, the gap between men and women in these ages is too large to be explained by maternal-mortality alone. Before 1900, maternal mortality is higher for elite-group women than amongst the general population. See [128] for more information.
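
The statement that the married-single mortality gap equals the number of children born multiplied by the risk of maternal mortality can be illustrated with a small calculation. The sketch below is not the researchers' model: the function name and the input values (6 births, a 1% risk per birth) are hypothetical, and it assumes the same independent risk at every birth.

```python
def maternal_death_risk(n_births: int, risk_per_birth: float):
    """Cumulative risk of a maternal death over n_births.

    Returns (exact, approx):
      exact  = 1 - (1 - q)^n, assuming an independent, constant risk q per birth
      approx = n * q, the "children born times risk" product used in the text
    """
    exact = 1.0 - (1.0 - risk_per_birth) ** n_births
    approx = n_births * risk_per_birth
    return exact, approx


# Hypothetical illustration only: 6 births, 1% risk per birth.
exact, approx = maternal_death_risk(6, 0.01)
```

For small risks per birth the simple product stays close to the exact cumulative risk, which is why it is a reasonable first approximation.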

2.5 Longevity studies on the Valserine database

Research team

Cournil A, Legay J-M. Laboratoire de Biométrie-Génétique et Biologie des populations. UMR 5558, Université C. Bernard Lyon 1.

Brunet G, Bideau A. Centre d'études démographiques, Université Lumière, Lyon 3.

Current research projects

Patterns of inheritance of human longevity (in co-operation with F. Schächter)

Modelling of sex-linked survival traits (in co-operation with Tan Q. and Vaupel J.)


Historical population database of the Valserine valley

Short description of the database

The Valserine register reconstructs the population of five villages located in a narrow valley of the French Jura mountains, near the Swiss frontier, 10 km west of Geneva. Although more than 4,000 inhabitants dwelt in this region at the beginning of the 19th century, about 1,000 locals live there nowadays. The reconstruction of the population was carried out through the analysis of all parochial and civil registers available. Data about approximately 70,000 vital events such as birth, baptism, marriage, death, burial from 1680 to 1980 have been recorded into two computer files of the SYGAP software. One file contains 46,390 individual mentions, the other one contains 14,115 union mentions. Individuals of this database can be linked to each other through kinship networks. This database was implemented on the initiative of Bideau A, Brunet G and Plauchu H within the framework of a co-operation between historical demographers and geneticists aimed at studying the transmission of Rendu-Osler disease, a rare benign autosomal dominant vascular disorder.

From this database a set of familial biographies has been selected to study the relationships between life spans of parents and children. The main focus in the study was on late mortality, taking into account post-reproductive survival (above age 50) for both generations. The first step of the study was to test for the existence of a familial component of longevity. The results showed a non-negligible contribution of a familial component to the variability of life span above age 50. The second step consists in detecting particular patterns of inheritance associated with specific factors. In other words, the goal is to test the influence of particular factors on the magnitude of the "correlation" between parents' and children's longevity. A strong sex effect in the transmission of longevity has been identified, suggesting the contribution of sex-linked genetic traits. These results have led the researchers to develop and fit survival models integrating sex-linked traits.

Studies on longevity are published in [44, 45]. Valserine Database was also used in many other biodemographic studies [21, 49, 50, 83, 141].
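
The first step described above, measuring the parent-child resemblance in life span among those who survived past age 50, might be sketched as follows. The source does not state which statistic lies behind the "correlation", so the Pearson coefficient is an assumption here. The function names and the sample pairs are likewise hypothetical and are not drawn from the Valserine register.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def late_life_correlation(pairs, min_age=50):
    """Correlate parent and child life spans, keeping only pairs in which
    both parent and child survived to at least min_age."""
    kept = [(p, c) for p, c in pairs if p >= min_age and c >= min_age]
    parents, children = zip(*kept)
    return pearson(parents, children)


# Fictitious (parent life span, child life span) pairs, in years.
sample = [(60, 62), (70, 72), (80, 82), (30, 90), (75, 10)]
r = late_life_correlation(sample)
```

To look for a sex effect, the same function can be applied separately to father-son, father-daughter, mother-son and mother-daughter subsets of the pairs.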

2.6 Framingham Longevity Study

Family patterns for age at death were examined in a 40 year follow-up of 5209 men and women (2900 deceased, 2309 living) in the Framingham Longevity Study [33].

2.7 Database on Six New England Families

13,656 records for members of 6 New England families (Bradford, Bulkeley, Cushman, Denison, Pardee, Waterman) born 1650-1874 were collected and analyzed to study the historical trend in resemblance between first-degree relatives for age at death [117]. The maximum recorded life span in this sample was 109 years (a female born in the 18th century).

2.8 The Utah Population Database

The creation and development of the Utah Population Database (UPDB) became possible when early in the 1970s permission was provided to copy records which are maintained by the Genealogical Society of Utah (GSU), an organization operated by the Church of Jesus Christ of Latter-day Saints (LDS). As a religious mandate members of the LDS church have been encouraged to identify their ancestors [14]. The GSU, located in Salt Lake City, Utah, is the world's largest repository of genealogical records. Each family group sheet submitted to the GSU by the members of the LDS church represents data on three generations: a husband and wife, their children, and their parents. The core of the Utah Population Database is a set of genealogies with more than 1.2 million names linked together in familial structures. These genealogies were selected for those records where at least one family member was born or died in Utah or on the pioneer trail followed by the members of the LDS church as they migrated from the midwestern states to Utah [14].

The second major set of records is the Utah Cancer Registry that was initiated in 1952 in large hospitals and which became statewide in 1966. The Utah Cancer Registry is maintained as a separate database, but could be linked to genealogical data if necessary.

UPDB also includes 1880 census data for the state of Utah representing approximately 143,000 individuals.

Another set of data is a file of death certificates for the period 1934 through 1981 that was included into UPDB. Now the entry of birth certificates is also in progress.

The UPDB is under continuous development, with 20th century death certificates and birth certificates being entered and linked to the genealogical data. Combined with census data, this database could provide a unique opportunity for longevity studies, since many variables (e.g., social status data from the census, causes of death from death certificates) could be taken into account in addition to the standard familial variables (birth order, life span of parents and spouses, parity, etc.) drawn from genealogies. For more information on the Utah Population Database see special publications [11, 12, 14-17, 95, 112, 119, 157].

The Utah Population Database was used in several studies on life span inheritance. In one study 20,682 familial longevity records for 9,719 families with twins and sibs were collected and analyzed by Dr. Grace Wyshak (Department of Preventive and Social Medicine, Harvard Medical School, Boston, MA) [182]. In another study an analysis of twin longevity on 2,242 sets of twins, extracted from the Mormon Genealogy Data Base (currently UPDB) was carried out by Dr. Dorit Carmelli [39-40].

2.9 The Laredo Epidemiological Project

The Laredo Epidemiological Project is a study of the patterns of degenerative disease, particularly cancer, in the families of Laredo, Texas. This mostly Mexican-American population is of manageable size and relatively culturally homogeneous. The project is based on family reconstitution using parish records on births, deaths and marriages from all 12 Catholic parishes in Laredo, and also records from civil registries and the one hospital in Laredo [38, 173]. The genealogical history of Laredo was reconstituted by grouping 350,000 individual church and civil vital event records into multigenerational families, with record linkage based on matching names [38]. The Laredo population database was used in several studies of familial aggregation of chronic diseases [38, 174, 175].
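
Record linkage based on matching names, as mentioned above, can be illustrated by grouping birth records under a normalized key built from the parents' names. This is a simplified sketch and not the linkage procedure actually used in the Laredo project: the record layout, the function names, and the normalization rules (dropping accents, case and extra spaces) are assumptions, and the names are invented.

```python
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Lower-case a name, strip accents and collapse repeated spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def group_by_parents(birth_records):
    """Group children whose father and mother names match after normalization."""
    families = defaultdict(list)
    for rec in birth_records:
        key = (normalize(rec["father"]), normalize(rec["mother"]))
        families[key].append(rec["child"])
    return dict(families)


# Invented records with spelling variants of the same parents.
records = [
    {"child": "Ana", "father": "José García", "mother": "María López"},
    {"child": "Luis", "father": "JOSE  GARCIA", "mother": "Maria Lopez"},
    {"child": "Rosa", "father": "Pedro Ruiz", "mother": "Elena Díaz"},
]
families = group_by_parents(records)
```

Exact matching on normalized names will still merge unrelated couples who share names and split families whose names were spelled differently, so a real linkage would need additional evidence such as dates and places.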

2.10 Genealogical Data on European Royal and Noble Families

One of the best sources of genealogical data available is the famous German edition of the "Genealogisches Handbuch des Adels" (Genealogical Yearbook of Nobility) - the most reliable and complete data source on European royal and noble families [75-78]. This edition is known world-wide as the 'Gotha Almanach' ('Old Gotha' published in Gotha in 1763-1944 [7], and 'New Gotha' published in Marburg since 1951 [75-78]). Data from the Gotha Almanach were often used in early biodemographic studies of fertility (see [85], pp. 199-224, for references), although later this important source of genealogical data was undeservedly forgotten.

Each volume of the New Gotha Almanach contains about 2,000 genealogical records appropriate for analysis, with more than 100 volumes of this edition already published. Thus, more than 200,000 genealogical records are available from this data source. The high quality of information published in this edition is ensured by the fact that the primary information is drawn from the German Noble Archive (Deutsches Adelsarchiv). The Director of the German Noble Archive (Archivdirektor) is also the Editor of the New Gotha Almanach.

It was not until 1995 that the information from the "New Gotha" Almanach was partially computerized by the research team of Dr. Leonid A. Gavrilov and Dr. Natalia S. Gavrilova (Center on Aging, NORC and University of Chicago). By the end of 1998 the database contained information on over 20,000 adult persons (over 30 years) born in 1700-1900 with complete information on their parents (including birth and death dates) and spouses. The studies of longevity using this database demonstrated substantially non-linear sex-specific transmission of human longevity from parents to offspring [66, 67, 71-74] and effects of parental age at reproduction on the survival of adult offspring [57, 58, 65, 68-70].

2.11 The Database of Qing Nobility (China)

This database has been developed by Dr. James Lee (California Institute of Technology) and his colleagues (Dr. Wang Feng and Dr. Cameron Campbell) using genealogies of the Qing dynasty (1640-1911). By the mid-twentieth century the Imperial Lineage in its entirety included almost 200,000 persons: 80,000 in the principal line, and 120,000 in collateral lines. Dr. James Lee and colleagues transcribed and organized vital records for more than 80,000 individuals from the principal imperial line (most of whom lived in Beijing) using data from the Chinese historical archives and the copy of the principal-line genealogy available through the Genealogical Society of Utah. One of the advantages of this database is the virtually complete registration of females, with a sex ratio of 109 for children born before the Opium War (1839-42), which is close to the sex ratio at birth of 105. After 1840, however, the quality of the data deteriorates significantly (a reflection of dynastic decline) [106, 107, 170].
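
The comparison of the observed sex ratio (109) with the sex ratio at birth (105) can be turned into a rough estimate of how completely females were registered. The calculation below is only an illustration under two stated assumptions that do not come from the source: that all males were registered, and that the true ratio among these children was 105 males per 100 females. The function names are hypothetical.

```python
def sex_ratio(males: float, females: float) -> float:
    """Sex ratio expressed as males per 100 females."""
    return 100.0 * males / females


def implied_female_coverage(observed_ratio: float, expected_ratio: float = 105.0) -> float:
    """Share of females registered, assuming every male is registered and
    the true sex ratio equals expected_ratio."""
    return expected_ratio / observed_ratio


# Ratios quoted in the text: 109 observed, 105 expected at birth.
coverage = implied_female_coverage(109.0)
```

Under these assumptions the result is about 0.96, that is, roughly 96 of every 100 females registered, which is consistent with the description of female registration as virtually complete.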

2.12 Wang Genealogical Database (China)

Chinese genealogies were collected and analyzed for the purpose of mortality studies by Zhongwei Zhao. The main source in his study was the General Genealogy of the Wang Clan, or Wang genealogy, of Chinese upper class families. The Wang genealogy covers a very long period of time, from 900 to 1800. The database contained records for about 30,000 individuals, although many records were incomplete [183]. Also, as in most Chinese genealogies, no individual records were made for women. The life expectancy of individuals in the database was very low for the entire historical period (e.g., about 30 years in 1700-1749 at age 30) even for these upper class families [183].

2.13 Special populations

Data on special religious sects are often used by epidemiologists and geneticists in their studies. These sects usually share a relatively uniform environment and have a rather unique life-style that isolates them from the general population. In addition to Mormon data (described earlier), Hutterite and Amish populations are well studied now.

A. Hutterite Population

The Hutterites are an Anabaptist sect that originated in Moravia in 1528. Between 1874 and 1877, approximately 900 members of the sect migrated from Russia to the United States to the area which is now South Dakota. The Hutterites represent a closed population with high levels of fertility and consanguinity. The group maintains a stable residence pattern and keeps extensive genealogical records that could be used in studies of familial aggregation of human longevity. Until recently Hutterite data were used in numerous studies of fertility and inheritance of genetic disorders including [86, 104, 127].

B. The Old Order Amish Genealogy Database

The unique genealogical registry of the Lancaster County, Pennsylvania, Amish contains information on 8,163 marriages, dating back to the time of the pioneer migrants in the 1700s and spanning more than 10 generations. This database also represents a closed population with a high level of inbreeding. The individual records in this database, however, are heavily truncated and the total number of persons in the database is not very high [100]. An Old Order Amish genealogy database is now maintained at the Department of Epidemiology, Johns Hopkins University School of Hygiene and Public Health, Baltimore, Maryland. This database was used in studies of consanguinity effects and infant mortality and in other studies in genetic epidemiology [3, 51, 98, 100].


Data Resources for Biodemographic Studies on Familial Clustering of Human Longevity
Natalia S. Gavrilova, Ph.D.
Leonid A. Gavrilov, Ph.D.
© 1999 - 2000 Max-Planck-Gesellschaft ISSN 1435-9871