Abstract Data Resources
1 Introduction
In contrast to the remarkable progress in the genetics of yeast and nematode aging, little is known about genes that control human longevity. What is behind the records of extreme human longevity: just a lucky chance, favorable environment, or "good" genes? How can the apparent contradiction between strong familial clustering of human longevity and poor resemblance in life span among blood relatives be resolved? What is the nature of the genetic component for such a complex quantitative trait as human longevity: special "longevity assurance genes" or just individual variation in the burden of deleterious mutations? These fundamental problems remain unresolved, and the major obstacle to their solution is the lack of appropriate data.

The idea of this review on data resources, and our understanding of its significance, came to us as a result of long scientific discussions held at two International Research Workshops "Genes, Genealogies, and Longevity" (Louvain-la-Neuve, Belgium, October 1998, and Rostock, Germany, May 1999). These two workshops, sponsored by the Max-Planck-Institute for Demographic Research (Rostock, Germany) with the active participation of its Director, Dr. James Vaupel, revealed many extremely interesting and bold scientific ideas on familial longevity that cannot be tested now because of the lack of appropriate data!

The importance of searching for new, large data resources on familial clustering of longevity was also confirmed in our discussions with the participants of the Gordon Research Conferences on the Biology of Aging (Ventura, USA, January 1997, and Il Ciocco, Italy, May 1998), and the meetings of the Gerontological Society of America (Philadelphia, November 1998), the Population Association of America (Chicago, April 1998, and New York, March 1999) and the American Aging Association (Seattle, June 1999). As a result of the discussions following our presentations [68-71, 73, 74], it became quite clear that further advancement in understanding the mechanisms of familial aggregation of longevity crucially depends on the development of new publicly available databases with large amounts of reliable data.

1.1 Illustrative example: The hopes and disappointments with data on British aristocracy

In 1997, we found a computerized historical genealogical database for about 33,000 British aristocrats (distributed by a rather obscure British company, S&N Genealogy Supplies), which we initially believed could be a solution to the data problem. We shared this information with our colleagues, and it was successfully used by Westendorp and Kirkwood [176] in their provocative study of the trade-off between human longevity and fertility, published in Nature with an acknowledgment of our efforts "for identifying the database" (see [176], p. 746). The database became quite famous, but our further data quality control revealed that this data set is extremely incomplete and, for this reason, unfortunately can NOT be a solution to the data problem [61].

First, the incompleteness of the database on British aristocracy is evident from an extremely biased sex ratio, indicating severe underreporting of women. The British database contains records for 19,380 males but for only 13,667 females, corresponding to a sex ratio of 1.42, that is, about 142 males per 100 females [61]. The sex ratio in complete, high-quality genealogies is close to the sex ratio at birth [101], which for Caucasian populations generally falls between 102 and 107 males per 100 females [101, 169].
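The comparison above is simple arithmetic, and it can be reproduced with a short Python sketch. The counts and the reference range are taken from the text; the variable names are illustrative only and are not part of any database or tool described in this review.

```python
# Counts reported in the text for the British aristocracy database [61].
males = 19_380
females = 13_667

# Sex ratio as a plain ratio and as males per 100 females.
sex_ratio = males / females
males_per_100_females = 100 * sex_ratio

# Reference range cited in the text for the sex ratio at birth
# (males per 100 females) in Caucasian populations.
expected_low, expected_high = 102, 107
within_expected_range = expected_low <= males_per_100_females <= expected_high

print(f"Total records: {males + females}")
print(f"Sex ratio: {sex_ratio:.2f} ({males_per_100_females:.0f} males per 100 females)")
print(f"Within the expected range at birth: {within_expected_range}")
```

The same check could be applied to any candidate genealogy as a quick first screen for underreporting of one sex, before more detailed quality control.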

Second (and even more important), in most cases there are no birth dates for women in this database, which makes the calculation of their life span impossible. Although 13,667 females are mentioned in the British database, the life span could be calculated in only 2,441 cases (see [176], Table 2 on p. 745). In fact, this problem with British aristocratic women was first noticed by Karl Pearson a century ago [19, 20]. He studied the British Peerage data and had to exclude women from his consideration for the following reason: "The limitation to the male line was enforced upon us partly by the practice of tracing pedigrees only through the male line, partly by the habitual reticence as to the age of women, even at death, observed by the compilers of peerages and family histories" ([20], pp. 50-51).
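To make the scale of the missing data concrete, the following Python sketch computes the share of female records for which a life span can be calculated, which comes to fewer than one in five. Both counts come from the text; the variable names are illustrative assumptions.

```python
# Counts reported in the text (see [176], Table 2).
females_listed = 13_667          # women mentioned in the database
females_with_life_span = 2_441   # women whose life span can be calculated

# Share of female records that are usable for life span analysis,
# and the number of records that are not.
usable_share = females_with_life_span / females_listed
females_without_life_span = females_listed - females_with_life_span

print(f"Usable female records: {usable_share:.1%}")
print(f"Female records without a computable life span: {females_without_life_span}")
```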

Thus, although the reasons for the incompleteness of the British database may sound rather funny (reticence to indicate the age of the British ladies), the scientific consequences of such data incompleteness are quite serious: this British database unfortunately can NOT be used for scientific analysis in its present form.

This example demonstrates the need for systematic work on the careful screening, collection and evaluation of various data resources, in order to select the most appropriate data sets. This review is a first step in this direction.

The review also describes the results and findings of our previous feasibility study, conducted at the request of the National Institute on Aging (NIA) and summarized in a 35-page report submitted to the NIA [60].

In this work, we conducted a pilot search for prospective data resources related to familial aggregation of longevity that could be used in biodemographic studies of human longevity.

The worldwide search for data resources covered the following directions:

  • Computerized products available on the international market (genealogical databases);
  • Published family history data that could be recommended for use after computerization;
  • Data resources developed for research purposes by other investigators;
  • Special data sets on long-living people and centenarians in particular.

The search was conducted through Internet search engines, reviews of scientific publications and direct mail requests. An Inventory of Data Resources was developed, and these resources were characterized with regard to their strengths and weaknesses for research purposes. We concluded that millions of familial longevity records are now available to researchers, and that the potential of existing data resources has been underestimated and, as a result, the data resources have been underused. However, a further, deeper review and analysis of the data resources would be desirable in order to characterize their applicability to scientific research and to facilitate their use in scientific analysis.

1.2 The content of this review

This review consists of the following 5 sections:

  • Section 2 Data resources developed for biodemographic studies of longevity.
  • Section 3 Databases created for the studies in historical demography.
  • Section 4 Data resources for long-lived persons and their families.
  • Section 5 Computerized genealogical data (products available on the international market).
  • Section 6 Published genealogical and family history data (that could be recommended for use after computerization).

The authors would appreciate any comments and suggestions to improve the completeness and quality of this review.

Data Resources for Biodemographic Studies on Familial Clustering of Human Longevity
Natalia S. Gavrilova, Ph.D.
Leonid A. Gavrilov, Ph.D.
1999 - 2000 Max-Planck-Gesellschaft ISSN 1435-9871