Computational Genealogy

Genealogical research is a popular hobby. Many people pore over sources such as birth and death certificates, wills, letters, and land deeds to assemble their family trees. Many of these trees are uploaded and shared in online projects such as WikiTree, producing vast datasets of interconnected family trees. Machine learning techniques can analyze such data to reveal patterns in population gender ratios, marriage trends, fertility, lifespans, and the frequency of twins and triplets.

On this site, we survey the use of genealogical data by presenting results on a range of population dynamics. As you explore these trends, you can also see how the popularity of your own first name has varied over the years.

This site is based on the paper "Quantitative Analysis of Genealogy Using Digitized Family Trees."


WikiTree Dataset

The source of our data is WikiTree, a free, collaborative, worldwide family tree project created by a community of amateur genealogists. Their mission is to create a single worldwide family tree that will make genealogy free and accessible to everyone. The specific dataset we used included information on 6.67 million people in over 150 countries, going as far back as the first century.

On WikiTree, you can explore your family tree. On this website, you can explore the trends that family trees can capture.

Figure 1. Number of recorded births in the WikiTree dataset in each decade. The number of births per decade peaks near the end of the 19th century. We suspect that the considerable decline in the 20th century reflects privacy concerns: many active WikiTree users prefer not to expose the details of their immediate families.
Figure 2. Choropleth world map of recorded births per country. Each shaded country is the birthplace or place of death of at least 100 people in the WikiTree dataset; the map was created using the Google Maps API.

Name Trends

Enter your name in the box below, and the graph will show how frequently your name has been used over time. With the quick links provided, you can also view the most popular names of all time and see how fashionable biblical and presidential names have been.

50 Most Common First Names of Twins on WikiTree, 1800 to 1900
100 Most Popular Names on WikiTree


Insert a first name below, or choose one of the following groups:
American Presidents  Biblical Names  Most Popular Names 

Figure 3. Frequency of common first names over time.
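A frequency curve like the one in Figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the site's actual pipeline: the input is a hypothetical list of (first_name, birth_year) pairs, and the function name `name_frequency_by_decade` is ours. We assume each decade's count is divided by the total births recorded in that decade, so that the large swings in recorded births shown in Figure 1 are not mistaken for changes in a name's popularity.

```python
from collections import Counter


def name_frequency_by_decade(records, name):
    """Return {decade: share of that decade's births given `name`}.

    `records` is a hypothetical iterable of (first_name, birth_year) pairs;
    the real WikiTree export uses its own schema.
    """
    births_per_decade = Counter()
    name_per_decade = Counter()
    target = name.strip().lower()  # case-insensitive match
    for first_name, year in records:
        decade = (year // 10) * 10
        births_per_decade[decade] += 1
        if first_name.strip().lower() == target:
            name_per_decade[decade] += 1
    # Normalize by all births in the decade rather than plotting raw counts.
    return {
        decade: name_per_decade[decade] / total
        for decade, total in sorted(births_per_decade.items())
    }


sample = [("Wendy", 1951), ("Mary", 1952), ("Wendy", 1955),
          ("John", 1948), ("Mary", 1941)]
print(name_frequency_by_decade(sample, "wendy"))
```

Normalizing is a design choice: raw counts would mostly track how many profiles exist for each decade.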

Births and Fertility

Today, women wait longer before having children, and many people attribute this to the desire to establish a career and to earn enough money to buy a home. In fact, this is not a new trend: in our data, the average age at which mothers gave birth to their first and last children increases steadily over time.
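The quantity plotted in Figure 5 can be approximated with the sketch below. It assumes a hypothetical input format, a list of (parent_birth_year, [child_birth_years]) tuples, and groups parents into 50-year periods by the year of their first child's birth; both the format and the bucketing are illustrative assumptions, not the method used in the paper.

```python
from statistics import mean


def ages_at_first_and_last_child(parent_birth_year, child_birth_years):
    """Return (age at first child, age at last child) in whole years."""
    return (min(child_birth_years) - parent_birth_year,
            max(child_birth_years) - parent_birth_year)


def average_ages_by_period(parents, period=50):
    """Average first- and last-child ages per `period`-year bucket.

    `parents` is a hypothetical list of (birth_year, [child birth years]).
    Each parent is assigned to the bucket containing their first child's
    birth year.
    """
    buckets = {}
    for birth_year, children in parents:
        if not children:
            continue  # no recorded children: nothing to measure
        first_age, last_age = ages_at_first_and_last_child(birth_year, children)
        bucket = (min(children) // period) * period
        buckets.setdefault(bucket, []).append((first_age, last_age))
    return {
        bucket: (mean(a for a, _ in pairs), mean(b for _, b in pairs))
        for bucket, pairs in sorted(buckets.items())
    }


mothers = [(1600, [1622, 1630, 1641]), (1605, [1628, 1640]), (1700, [1726])]
print(average_ages_by_period(mothers))
```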

We also examined the frequency of twins and triplets. Hellin's Law states that approximately one in every 89 human maternities produces twins and one in every 89² (7,921) produces triplets; note, however, that it is an empirical rule of thumb rather than a true law of nature. Nevertheless, it holds reasonably well in our data: of 960,803 births, 10,141 were twins (1.06%) and 118 were triplets (0.012%). The three sex combinations among twins were almost equally common: male-male, 3,260 (32.15%); female-female, 3,380 (33.33%); and male-female, 3,309 (32.63%).
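The comparison with Hellin's Law is simple arithmetic, sketched below using the counts quoted above. Hellin's rule generalizes to roughly one in 89^(n-1) maternities for n-fold births. As in the text, the sketch divides by total births rather than by maternities, which is an approximation; the function names are ours.

```python
# Counts reported above for the WikiTree data.
TOTAL_BIRTHS = 960_803
OBSERVED = {2: 10_141, 3: 118}  # multiplicity -> number of births
LABELS = {2: "twins", 3: "triplets"}


def hellin_rate(multiplicity, base=89):
    """Hellin's Law: roughly 1 in base**(n - 1) maternities yields n babies."""
    return 1 / base ** (multiplicity - 1)


def compare_with_hellin(total=TOTAL_BIRTHS, observed=OBSERVED):
    """Return {multiplicity: (observed rate, Hellin's predicted rate)}."""
    return {n: (count / total, hellin_rate(n)) for n, count in observed.items()}


for n, (obs, pred) in compare_with_hellin().items():
    print(f"{LABELS[n]}: observed {obs:.3%}, Hellin {pred:.3%}")
```

With these counts, the twin rate comes out somewhat below Hellin's 1 in 89 (about 1.06% versus 1.12%), while the triplet rate is close to 1 in 7,921.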

Figure 4. Gender ratio of recorded births between the years 0 and 2000.
Figure 5. Distributions of fathers' and mothers' average ages at the birth of their first- and last-born children, between 1600 and 1950.
Figure 6. Distribution of women's ages when giving birth.

Marriages

In every period, men tend to marry later than women, and ages at marriage increase over time. Girls aged 12 and boys aged 14 did sometimes marry in the medieval period, but, perhaps contrary to popular belief, such young ages were far from the average.
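Figure 8 reports both the average and the median age at marriage. The sketch below shows one way to compute both per sex and per century; it assumes a hypothetical list of (sex, birth_year, marriage_year) records, and the century buckets are an illustrative choice rather than the paper's. Reporting the median alongside the mean is useful here because a few unusually young or old brides and grooms shift the mean more than the median.

```python
from collections import defaultdict
from statistics import mean, median


def marriage_age_stats(marriages, period=100):
    """Return {(period_start, sex): (mean age, median age)} at marriage.

    `marriages` is a hypothetical list of (sex, birth_year, marriage_year)
    tuples; grouping by `period`-year buckets is an illustrative choice.
    """
    ages = defaultdict(list)
    for sex, birth_year, marriage_year in marriages:
        age = marriage_year - birth_year
        if age < 0:
            continue  # drop records with inconsistent dates
        bucket = (marriage_year // period) * period
        ages[(bucket, sex)].append(age)
    return {
        key: (mean(values), median(values))
        for key, values in sorted(ages.items())
    }


marriages = [("M", 1300, 1326), ("M", 1290, 1314),
             ("F", 1305, 1322), ("F", 1310, 1322), ("F", 1302, 1330)]
print(marriage_age_stats(marriages))
```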

Figure 7. Number of recorded marriages in each decade.
Figure 8. Average and median ages at which males and females married.

Lifespan

Please Select a State:

Figure 9. Average lifespan over time in the selected state.
Figure 10. Median lifespan over time in the selected state.

Event Detection

Cultural and economic events can affect births, deaths, and marriages. Births often spike after the end of an economic depression or a war, and specific events can even increase the popularity of a first name: use of the name Wendy, for example, peaked soon after Disney's Peter Pan was released. Click on the dots to see some of the events that have had an impact.

Figure 11. World map of events detected by analyzing the number of births and deaths over time.
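This page does not describe how the events in Figure 11 were detected. As a stand-in, the sketch below uses a simple trailing-window z-score, a common baseline for spotting unusual spikes or dips in a yearly count series; it is not necessarily the method used in the paper. The input format (a dict mapping consecutive years to birth or death counts), the 10-year window, and the threshold of 3 standard deviations are all assumptions.

```python
from statistics import mean, stdev


def detect_spikes(counts_by_year, window=10, threshold=3.0):
    """Flag years whose count deviates strongly from the preceding window.

    `counts_by_year` maps year -> number of births (or deaths) and is
    assumed to contain one entry per consecutive year. A year is flagged
    when its count lies more than `threshold` sample standard deviations
    from the mean of the previous `window` years.
    """
    years = sorted(counts_by_year)
    flagged = []
    for i in range(window, len(years)):
        history = [counts_by_year[y] for y in years[i - window:i]]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: z-score undefined, skip this year
        z = (counts_by_year[years[i]] - mu) / sigma
        if abs(z) > threshold:
            flagged.append((years[i], round(z, 1)))
    return flagged


counts = {year: 100 + year % 3 for year in range(1930, 1951)}
counts[1946] = 160  # synthetic post-war baby boom
print(detect_spikes(counts))
```

Because each year is compared only with the years before it, a detected spike also inflates the baseline for the following window, which makes a second event shortly afterward harder to detect.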

Download

Publications

  • Michael Fire and Yuval Elovici, "Data Mining of Online Genealogy Datasets for Revealing Lifespan Patterns in Human Population," ACM Transactions on Intelligent Systems and Technology, 2014 (in press).

  • Michael Fire, Thomas Chesney, and Yuval Elovici, "Quantitative Analysis of Genealogy Using Digitized Family Trees," 2014.

Datasets

Download First Name Distributions | Download Anonymized WikiTree Multigraph

The full WikiTree dataset is available on request from WikiTree's founder and CEO, Chris Whitten.
Most of the analysis results presented on this site are also available on request from Michael Fire.


About Us

    • Dr. Michael Fire is a Washington Research Foundation Innovation Postdoctoral Fellow in Data Science and a University of Washington Moore/Sloan Data Science Postdoctoral Fellow, under the mentorship of Professors Carlos Guestrin and Joshua Blumenstock. He holds an M.Sc. (magna cum laude) in Mathematics from Bar-Ilan University (BIU) and a Ph.D. (summa cum laude) in Information Systems Engineering from Ben-Gurion University of the Negev (BGU), where he won the Kreitman Prize for excellence in Ph.D. studies. In recent years, Michael has published dozens of papers in leading conferences and journals in the fields of social network analysis and data mining. He also has extensive experience as a data scientist for several companies and organizations.

    • Dr. Thomas Chesney is an associate professor of information systems and co-author of the European edition of the popular textbook Principles of Business Information Systems, now in its second edition. He holds a PhD in Information Systems from Brunel University, an MSc in Informatics from the University of Edinburgh, and a BSc in Information Management from Queen's University Belfast. Based at Nottingham University Business School, he studies the behavior of networked individuals; his research has appeared in many journals, including the Information Systems Journal and Decision Support Systems.

    • Prof. Yuval Elovici is the director of the Telekom Innovation Laboratories, head of the Cyber Security Labs, and a professor in the Department of Information Systems Engineering, all at Ben-Gurion University of the Negev (BGU). He holds BSc and MSc degrees in Computer and Electrical Engineering from BGU and a PhD in Information Systems from Tel-Aviv University. He has published more than 60 articles in leading peer-reviewed journals and over 100 papers. He has also co-authored books on social network security and on information leakage detection and prevention, and he consults professionally in the area of cyber security.

Acknowledgments

We would like to thank DTbaker for designing the website's HTML template, AttivoDesigz for designing the website logo, Carol Teegarden for proofreading and editing the article and this website, and Adam Poole for creating our animation video. Above all, we thank Chris Whitten and the supportive WikiTree community, who provided us with the WikiTree dataset.