What’s Your Research Style? Power Up Your Family History with Record Clustering Analysis

NOTE: The methods in this article are designed for use with 19th and 20th century genealogical research in the UK, particularly England and Wales. Record Clustering Analysis is readily adaptable to other eras and jurisdictions though, so watch out for a follow-up article in 2021!

Pull up a chair, put the kettle on and let’s sit down for a think. We’re about to indulge in a little introspection about how we undertake our family history research.

Tightly-packed bookshelf with groups of books on it.
Successful research doesn’t just come from knowing where to look or how to structure your critical thinking, but from understanding your own habits as a researcher. Photo credit: Upper Library, Christ Church Oxford, by the Portland Seminary via Flickr, CC BY-SA 2.0.

When people discuss genealogical methods, it’s often about quite specific “how-tos”: how to navigate record sets; how to carry out effective searches; or how to critique the evidence for a specific ancestor. These ideas all relate to a small scale of function, often involving a specific, well-defined issue.

Today we’re going large-scale instead! If you can visualise all the records which come together to make your tree, then the patterns of success, omission and difficulty which emerge can reveal bigger secrets about your own research tendencies. Read on and I’ll walk you through the process, and help you to use this information to become a more resourceful, more curious and – hopefully – a more successful researcher.

Genealogist, Know Thyself

First, a little honesty. If someone asked you to characterise the style or habits of your genealogy research, what would you tell them? Have you done a more thorough job on some lines of your tree than others? What are your strengths and weaknesses? How varied are the sources you rely upon? Write down your thoughts and responses to these questions right now. We’ll revisit them later.

Today I’m going to take a look at the different vantage points we have available to us in our genealogy research, and how each of these can be useful. We’ll revisit the idea of Negative Space at a larger scale using a method called Record Clustering Analysis, and use it to pinpoint ways for you to evolve as a researcher.

So, what’s your genealogy style? Are you a Heretic or an Enumerator, a Glutton or somewhat Scurvy-Prone? Jump on board with the methods below and find out!

Introducing the Perspective Pyramid

You’re probably familiar with the notion of the brick wall: a stoppage point in our research which arises when there’s insufficient clear evidence to proceed. Brick walls usually relate to a particular individual – for instance, if you can’t locate your great-grandfather’s birth or marriage registration and are unable to go further without those vital records.

Sometimes a brick wall crops up because certain documents haven’t survived, or because the underlying story is complicated. But sometimes it can be due to a deficiency in your research technique, or a record set which you’ve overlooked. It can be really beneficial to appraise your own research habits and understand where the gaps are. This is a complete change of focus from scrutinising one individual in your tree: we need a telescope rather than a magnifying glass.

To help us explore this bigger picture, let me introduce a little graphic which I call the Perspective Pyramid, shown below. It summarises the various components of your genealogy, because you’ll get a different type of story depending on how closely you zoom in on parts of your tree. From the base of the pyramid to the peak, we have four distinct levels: Individuals, Families, Branches, and the overall Tree.

Coloured schematic of a pyramid, demonstrating the four types of scale we can look at in genealogy. Numbered 1 to 4, we have individuals at the base, a smaller number of families above them, then an even smaller number of branches above the families, and finally a single researcher working on one tree at the top.
The Perspective Pyramid has four distinct levels, each one corresponding to a different scope of focus on your genealogy research. Depending on your perspective, you might be looking at Individuals, Families, Branches or the entire Tree. Each viewpoint provides its own insights.

When you’re near the base of the pyramid – researching individuals and families – you’ll acquire detailed insights into specific people. When you’re up near the top, you see broader trends rather than detail: the trajectory of your ancestral narratives through time AND patterns in your general approach to the record sets you tend to use.

So our research can tell us different things depending on what level of the pyramid we’re at. We’re used to working at Levels 1 and 2, on individuals and families, so let’s climb further up now – the technique I want to share with you today mostly relates to those upper levels of the pyramid.

Scaling the Pyramid: Large-Scale Negative Space

Do you recall the concept of Negative Space, which I introduced in a previous post? Negative Space describes the gaps in our ancestral timelines where we lack evidence to tell the story. By noticing these gaps, we gain a clearer impression of the narrative and understand which time periods need to be given more research attention. But this approach operates at the lower levels of the pyramid, with Individuals and Families. How might it translate to Branches and the overall Tree?

Your family tree may contain many hundreds of people, each with negative space in their timeline. As we climb the pyramid, we focus less on the detail of individual timelines. From the summit of the Perspective Pyramid, you’d make yourself dizzy if you tried to take in all the details of each person in your tree. Instead you need a way of visualising and understanding the dominant patterns of your research, including:

  • which record sets you rely on most heavily for particular eras;
  • which branches you have invested more time in;
  • which record sets you haven’t yet searched.
Archival filing drawers
Image: Archival filing drawers by Carolina Prysyazhnyuk via Flickr, CC BY-SA 2.0.

Just as our Negative Space timelines showed you the information you had and the gaps where you had nothing, we need to identify which record sets we’re using a lot and which ones are missing. Think of this as a bigger-picture, larger-scale version of Negative Space. It’s time to introduce a method which will help us explore our habits and identify these gaps.

Record Clustering Analysis: Get Insights Into How You Work

Try the following technique, which I call Record Clustering Analysis (or RCA for short). This method will identify which record sources and sets dominate your work.

And let’s be clear: there’s no shaming here! RCA is a positive exercise to show ourselves how we can develop as a researcher. Although it requires an honest appraisal of our habits, RCA is about personal growth rather than harsh self-judgment.

A Rainbow of Record Sources: The Basics of Record Clustering Analysis

Rainbow vista over a treetop mountain region - the range of colours in the image is the key visual
Photo: Andreas Krispler via Flickr, CC BY-ND 2.0.

Things are about to get colourful. Choose a single branch of your family tree now and dig out your research. You can work by hand or on computer – if you prefer the digital option, then I’ve prepared an Excel template which you can download here.

1. Extract Your Evidence

For each person in your chosen branch, find the direct evidence which mentions them by name. Make a list of all the individuals along this branch and note down the documentary evidence you have for each of them.

2. Cluster and Count the Records

Perform a quick count of your pieces of evidence. Rather than examining the details of each record, you’re going to form clusters of each record type for every individual. For each person in your chosen branch, write down the number of:

  • Census entries which this individual directly appears on;
  • All civil registration certificates (actual certificates, not index entries) for birth, marriage or death which name this individual. This includes any appearances as a witness or informant, as well as birth or marriage registrations for their children;
  • Parish register entries naming this person. Again, this could include entries for children where parents are named, and weddings where they act as a witness;
  • Other record sources. This cluster admits a variety of documents, including occupational sources, newspapers, wills and probate, street and trade directories, parish chest documents, apprenticeship papers, guild membership, registers of professional bodies or divorce papers.

Once you’ve done this, calculate the total number of records you hold for each individual. This is an important baseline indicator of how much research you’ve done overall.
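If you keep your research notes digitally and would rather script the count than tally it by hand, the clustering step might look something like the minimal Python sketch below. It is only an illustration: the evidence list, the name "Thomas Price" and the helper `cluster_counts` are all hypothetical, and you would swap in your own people and documents.

```python
from collections import Counter

# The four record clusters used in this article.
CLUSTERS = ("census", "civil registration", "parish register", "other")

# Hypothetical evidence list: one (person, cluster) pair per document held.
evidence = [
    ("Thomas Price", "census"),
    ("Thomas Price", "census"),
    ("Thomas Price", "civil registration"),
    ("Thomas Price", "parish register"),
    ("Thomas Price", "other"),
    ("Euphemia Williams", "census"),
]

def cluster_counts(evidence):
    """Return {person: Counter of records per cluster}."""
    counts = {}
    for person, cluster in evidence:
        if cluster not in CLUSTERS:
            raise ValueError(f"Unknown record cluster: {cluster}")
        counts.setdefault(person, Counter())[cluster] += 1
    return counts

counts = cluster_counts(evidence)
for person, tally in counts.items():
    total = sum(tally.values())  # baseline: how much evidence is held overall
    row = ", ".join(f"{c}: {tally[c]}" for c in CLUSTERS)
    print(f"{person} -> {row} (total {total})")
```

A `Counter` reports zero for any cluster with no records, which is handy here: the zeros are exactly the gaps you want to notice.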

To make the most of your record clustering information, you need to visualise it. We assign colours to each record type: the census is green, civil registration is blue, parish registers are orange, and all other sources are yellow. The aim now is to produce some bar graphs which will give you an at-a-glance sense of which record sets you use and which you’re missing.

If you’re comfortable working on a computer using Excel, then go to Step 3a. If you prefer to work by hand, I’ve got some low-tech suggestions for you in Step 3b.

3a. Visualise the High-Tech Way

For each individual, we’re interested in what proportion each of the main record types contributes to our knowledge. A special graph called a 100% stacked bar chart is a good way of doing this. Each individual is given a bar of the same length, split up into blocks which indicate the proportion of evidence which comes from each record type.
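To make the arithmetic behind a 100% stacked bar concrete, here is a small Python sketch. It is an illustration only, not the Excel method: the function names, the 20-character bar width and the example counts are all assumptions. Each letter stands for one of the article's colours (G green, B blue, O orange, Y yellow).

```python
def proportions(tally):
    """Convert record counts into percentages of that person's total."""
    total = sum(tally.values())
    if total == 0:
        return {cluster: 0.0 for cluster in tally}
    return {cluster: 100 * n / total for cluster, n in tally.items()}

def text_bar(tally, width=20):
    """Draw a fixed-width bar; each letter stands for one record cluster."""
    letters = {"census": "G", "civil registration": "B",
               "parish register": "O", "other": "Y"}
    total = sum(tally.values())
    if total == 0:
        return "." * width  # no evidence yet for this person
    bar = ""
    running = 0
    for cluster, n in tally.items():
        running += n
        # Cumulative rounding keeps every bar exactly `width` characters long.
        target = round(width * running / total)
        bar += letters[cluster] * (target - len(bar))
    return bar

# Hypothetical counts for one ancestor.
tally = {"census": 5, "civil registration": 2, "parish register": 2, "other": 1}
print(proportions(tally))
print(text_bar(tally))
```

Because every bar has the same length, an ancestor with three records and one with thirty can be compared directly. That is the point of the 100% scaling, and also the reason to keep each person's total in view alongside the bar.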

You can see a fictional example of RCA plots in the image below.

A range of horizontally-lying stacked bar graphs, coloured to show the proportion of evidence coming from each of the four main record categories, for 11 fictional ancestors. Graphs at the top for a WILLIAMS branch are predominantly green, whilst those for the PRICE branch at the bottom include each of the four colours.
Record Clustering Analysis plots for three fictionalised lineages, drawn using Excel. This method uses 100% stacked bar charts, allowing us to compare the reliance of different branches of our tree upon particular record sets.

To avoid a big digression, I’ve put together a separate blog article, showing you how to do this in Microsoft Excel. Ancestors whom you haven’t researched much may not have many records associated with them, so note these names down and be mindful of this when you’re interpreting the finished plots.

3b. Visualise the Low-Tech Way

If you’re visualising by hand, then use coloured stickers, Post-it notes or index cards, or draw plots out freehand. Make this as bold and colourful as possible! Cluster the colours together for each individual and observe the trends and patterns in the sort of documents you’re using.

You could choose to calculate the percentage totals and draw a 100% stacked bar chart by hand, but it’s probably more straightforward to focus on a quick and easy approach when you’re working manually. In the above example, the stacked bars are not scaled by percentages and instead show the absolute number of records involved for each research subject.

4. Analyse Your Plots

Step back and pay attention to what the overall trends are along the whole branch – remember, we’re interested in the bigger picture rather than individual detail. Is your portfolio of evidence heavily weighted towards one colour? Are some colours completely missing? Where’s the negative space within your record clusters? Identify which sources you haven’t looked in yet.
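For those working digitally, the branch-level questions above can be sketched in a few lines of Python. Treat this as one possible approach rather than a fixed recipe: the people, counts and function names are hypothetical.

```python
CLUSTERS = ("census", "civil registration", "parish register", "other")

def branch_summary(branch_counts):
    """Add up each cluster across everyone in the branch."""
    totals = {cluster: 0 for cluster in CLUSTERS}
    for tally in branch_counts.values():
        for cluster in CLUSTERS:
            totals[cluster] += tally.get(cluster, 0)
    return totals

def missing_clusters(totals):
    """Clusters with no records at all: the branch-level negative space."""
    return [cluster for cluster, n in totals.items() if n == 0]

# Hypothetical counts for a three-person branch.
branch = {
    "Person A": {"census": 4, "civil registration": 1},
    "Person B": {"census": 3, "other": 1},
    "Person C": {"census": 1},
}
totals = branch_summary(branch)
print(totals)
print("Not yet searched:", missing_clusters(totals))
```

In this made-up branch the census dominates and parish registers are absent altogether, so the parish registers would be the obvious next place to look.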

Tips for Using Record Clustering Analysis

Here are some key points to bear in mind when looking at your RCA plots.


Record availability

An important factor in any research, record availability influences RCA plots. Take the decennial census, for example. Named individuals didn’t appear on the England & Wales census until 1841, and the 100-year-rule currently prevents us from viewing any censuses beyond 1911. Individuals who lived during this 1841-1911 timeframe are therefore likely to have larger green components in their plots. Similarly, civil registration in England and Wales dates from 1837. The green census and blue civ-reg clusters fade out in the plots of earlier generations.
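As a rough guide to how much green you could expect for a given ancestor, the short Python sketch below lists the census years that fall within a lifespan, using the 1841 to 1911 window described above. The function name and the example years are hypothetical, and the result is an upper bound: people can be missed by the enumerator, living abroad, or born later in a census year than census night.

```python
# Census years with named individuals that were open to researchers
# when this article was written: every ten years from 1841 to 1911.
CENSUS_YEARS = range(1841, 1912, 10)

def possible_censuses(birth_year, death_year):
    """Census years falling within a lifespan (an upper bound, not a promise)."""
    return [y for y in CENSUS_YEARS if birth_year <= y <= death_year]

print(possible_censuses(1835, 1902))  # someone born 1835, died 1902
print(possible_censuses(1780, 1838))  # someone who died before the 1841 census
```

Comparing this list with the census entries you actually hold shows whether a small green block reflects the era your ancestor lived in or a search you haven't yet made.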

Extent of research

If your research is in its early stages, you probably have fewer pieces of evidence. This will generate a highly skewed RCA plot, such as those for Euphemia and William Henry Williams in the example above. You just need more time to get to know them!

Record survival

If a branch of your family lived in a parish with poor record survival, this will shape search outcomes. Compare the record clustering analysis between different branches of your tree, as this might help to identify lineages afflicted by record survival issues.

Blind spots differ

Each researcher has his or her own blind spots: things we forget, record sources we overlook. Join a genealogy friend, carry out your Record Clustering Analysis together and compare the outcomes. Mutual support is valuable because no two individuals have the same blind spots! Discuss your work with others and they’ll provide a fresh pair of eyes and hopefully some good suggestions for new collections to search.

So, What Is Your Research Style?

It’s time to revisit those thoughts you jotted down, where you gave your gut reaction on your research style. Do they match up with the record clustering analysis? If your own assessment of your style was very different to the picture that emerges from the RCA plots, then make particular efforts to work on your research blind spots.

Now let’s take a look at some strategy suggestions for overcoming those challenging branches of your family tree – all tailored to your research style. It’s time to find out which type you are!

A Balanced Genealogical Diet

Identify which colours dominate your RCA plots and which ones are missing. If your plots look very similar, you may be quite a consistent researcher. More likely, the plots for certain branches of your family tree may have a characteristic “look”. This is either telling you something about your approach to the research, or how that family’s story has interfaced with the official record.

Table set with a range of foods constituting a balanced diet: pulses, eggs, milk, oats, fruit and vegetables.
The balanced diet concept can help keep your family tree healthy. Be sure to include all the main genealogy food groups alongside those special other sources, and you’ll keep your research on track. Photo by Marco Verch via Flickr, CC-BY 2.0.

Choosing record sets to use in family history research is a bit like trying to have a balanced diet. It’s not healthy for your genealogy findings to survive on one record set alone. You need components of all the major “genealogy food groups” alongside a range of more specialist items to get the best results. The census, civil registration and parish register documents are rather like the protein, carbs and fats of your family history, whilst the range of “other records” are like vitamins and minerals.

Based on the plots you made, let’s take a peek now at what type of researcher you are. These are delivered slightly tongue-in-cheek but hopefully they’ll give you some useful pointers! If you want to compare typical plots for the different research styles, then jump to the combined graphic at the end.

Awash With Green? Meet The Enumerator

What’s going on here? If your cluster analysis is so overwhelmingly green that it reminds you of being knee-deep in a grassy meadow, then your genealogy research may be overly reliant on census sources and you probably have substantial negative space inside your timelines. This pattern also turns up when you have a newly-discovered individual whom you’ve not had chance to research.

Stacked bar plot for The Enumerator. Predominantly green with only 10% blue and 10% yellow.

How does this affect my research? No historical sources are served well by being used in isolation: records realise their power from being connected to and compared with other ones. Remember that negative space we looked for last time? The England and Wales census is decennial – so if it’s the only source you’re using, you may have ten years of negative space in between each of your data points.

Extract from a census enumeration book for Walmersley, Bury, Lancashire in 1851. Image supplied by FindMyPast: HO107, Piece 2212, Folio 357, Page 3.

What are my next steps? Keep on extracting as much as you can from the census (after all, it’s a great source of information about family and community networks as well as individuals!) but be aware that you need to diversify. Try to connect your census work with evidence from the other key “food groups” – parish registers and civil registration. You’re here to uncover your ancestors’ stories, and this new task of locating them within other record sets is going to be an intriguing adventure.

Where’s the Blue? Meet the GRO Avoider

What’s going on here? This outcome arises in 19th and 20th century research when you’ve been relying solely on the indexes, rather than the official certificates from the General Register Office. For earlier generations of your tree which pre-date civil registration, there won’t be any blue but there should be a compensating amount of orange from parish registers.

Stacked bar plot for the GRO Avoider, with equal components of yellow and green, a small amount of orange but no blue.

How does this affect my research? The cost of certificates from the General Register Office does build up over time, and this added expense can deter many people – that’s understandable. Where applicable, I get the certificates for every ancestor and accept a slower pace of research in return for much higher quality data. Certificates provide address and occupational data which doesn’t appear on the indexes – vital for triangulating your sources and mapping your family’s whereabouts. They’re an essential step in separating several candidates with the same name, and can even offer up genealogical gold such as a father’s military service number.

Original birth certificate from 1909.
Original birth certificate for Jack William CRABBE, born in Salisbury in 1908. Image by crabchick via Flickr, CC BY 2.0.

What are my next steps? Make a list of certificates you would like to order. When resources allow, work through this list and order via the GRO’s official website. You’ll need to make a free account and pay for any certificates you need (I’d advise against buying certificates through third-party providers, as they usually charge a hefty extra fee for the service). All GRO registrations are available as a physical certificate costing £11; a substantial subset of the records can be ordered as a cheaper PDF download costing £7.50. Revel in the extra information, use them to cross-check your research and as launch pads to other record sets.
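If you'd like to budget for your wish list before ordering, a quick calculation along these lines can help. This Python sketch uses the two prices quoted above, which may have changed since this article was written. The wish list and the `pdf_available` flag are hypothetical, so check on the GRO website whether a PDF is offered for each record.

```python
# Prices in pence, as quoted in this article; check the GRO website for current fees.
CERTIFICATE_PENCE = 1100
PDF_PENCE = 750

def order_cost(wish_list):
    """Total cost in pounds, choosing the PDF wherever one is available."""
    pence = sum(PDF_PENCE if item["pdf_available"] else CERTIFICATE_PENCE
                for item in wish_list)
    return pence / 100

# Hypothetical wish list; whether a PDF is offered must be checked per record.
wish_list = [
    {"entry": "first certificate", "pdf_available": True},
    {"entry": "second certificate", "pdf_available": True},
    {"entry": "third certificate", "pdf_available": False},
]
print(f"£{order_cost(wish_list):.2f}")
```

Working in pence avoids rounding surprises when the list grows long.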

Where’s the Orange? Meet The Heretic

What’s going on here? If you’re short of orange entries in your record clustering plots, then perhaps you’ve been avoiding church! Either you haven’t searched parish registers, have looked in the wrong parish, or you’ve searched the correct parish but without positive matches.

Stacked bar plot for The Heretic, containing green, blue and yellow in near equal measures, but no orange.

How does this affect my research? In England and Wales research we’re used to the luxury of having both civil registration and census records for a good chunk of the 19th century. Since it’s possible to get by looking at only these sources, many people forget about the rich seam of findings on offer within parish registers, without which you may miss a lot of detail and cross-checking.

Have you been avoiding church in your research? Photo of All Saints Church, Leamington Spa provided by barnyz via Flickr, CC BY-NC-ND 2.0.

Non-conformism is a major reason behind a family’s absence from parish registers. If your record clustering plots are low on orange for a whole section of your tree, consider whether the families concerned could have been members of a different Christian denomination, such as Baptists, Congregationalists or Methodists.

Details within parish registers offer you a chance to triangulate and verify existing data points in your tree and enhance each family’s story. Digitised offerings from the big commercial providers usually allow you to flip through the pages as if the register is in front of you, so as well as finding your ancestor’s entry, you can also see who else was baptised, married or buried on the same day or surrounding weeks. Families sometimes arranged for cousin baptisms to take place on the same day; if infectious disease ran through a family and caused a run of fatalities in a short period of time, you may discover several burials over a short run of weeks.

What are my next steps? If you’ve simply omitted to explore any parish registers, that’s easily remedied: include them in your research plan from now on. Use census and civil registration information to work out which parish you need to look in. Phillimore’s Atlas and Index of Parish Registers can help you to work out which parish you need or locate the archives which hold original registers – and its maps are now available on Ancestry. A substantial number of baptism and marriage entries are indexed on FamilySearch, so sign up for a free account and start exploring.

Alternatively, if you’ve made fruitless searches in parish registers, then it may be time to revisit your choice of parish for the search. Have you made too many assumptions about the church your ancestors went to? Double-check the parish and try searching in neighbouring parishes if need be, or seek out non-conformist registers.

Where’s the Yellow? Meet The Scurvy-Ridden Researcher

What’s going on here? This situation is easily diagnosed: usually your plots will be devoid of yellow.

Stacked bar plot for the Scurvy Prone Researcher, made of mostly green and orange with a bit of blue, but no yellow.

How does this affect my research? In the balanced diet of sources in our family history research, sources beyond the traditional core ones are like vitamins and minerals – we need a wide range of these to keep our research healthy and to bring some amazing stories to light and to life!

Image showing abundance of yellow, with hundreds of lemons spilling out of two barrels
When scurvy threatens, you need to up your Vit C intake. It’s time to seek out more unusual record sources! Photo by Mike Mozart via Flickr, CC-BY 2.0.

What are my next steps? Try to use a wider range of sources in your research. This outline of the British Library’s reference collection will give you some great ideas, and highlights some key UK genealogy organisations. Join the Society of Genealogists and make use of their collections. Remember: names and dates aren’t the only details to be discovered. Occupations, wills, newspapers and all sorts of other resources will add colour and life to your ancestors’ stories.

Lots of Yellow? Meet The Glutton

What’s going on here? Another one that’s easy to spot, the Glutton will have a lot of yellow in their RCA plots, and probably a more balanced series of constituent colour blocks overall.

Balanced colours on a stacked bar plot, showing The Glutton. Good amounts of all four colours and over a third is yellow.

How does this affect my research? Gorging on other sources is usually a positive finding, because you’re reaching out beyond the core record sets to deepen your understanding of the narratives that make up your tree.

What are my next steps? If you’re using a large number of other sources, it’s important to have a good system in place for recording where they have all come from. Be sure to maintain a good research log (explore these articles by Janine Smith if you’d like to know more) and try to write up your research regularly to ensure that you’re keeping the main story in mind and not losing focus or direction (pick up some writing tips from Natalie Pithers here).

Finding the Digital Balance

Our Record Clustering Analysis focused on four main record clusters: census – civil registration – parish registers – other sources. We can sometimes forget that the majority of records have yet to be digitised. If you restrict yourself only to what’s available online, you could be missing out on a substantial part of your ancestor’s story.

Frontage of the UK National Archives at Kew.
The UK National Archives, Kew, Surrey. Photo by Sophie Kay, August 2017.

Once you’re comfortable with record clustering analysis, I’d advise adding in a purple fifth column, indicating sources found in archives. Then the yellow category becomes “other sources consulted online” and your fifth column, “physical sources consulted in an archive”. Our archives are an Aladdin’s cave of historical and genealogical riches, so if your extended plots are devoid of purple then you may want to consider dipping your toes in archival waters and exploring what else is on offer out there.

The Last Word(s)

Hopefully I’ve been able to persuade you how Record Clustering Analysis can shed light on how you might develop as a researcher, and encourage you to be curious and innovative in seeking new sources of information. Continue to update your RCA bar charts over time and identify new directions to explore. And remember: every genealogist needs a balanced family history diet, so be sure to include all the main “genealogy food groups” in your research. Good luck with scaling that Perspective Pyramid!

As always, I’d love to hear how you get on with these methods. Feel free to leave a comment below, or share images and screenshots of your record clustering analysis with me on Twitter – I’m @ScientistSoph.

Research Types: A Side-by-Side Guide

Here are the example record clustering analysis plots for the various types, shown side-by-side so you can see the differences:

Side by side comparison of the RCA plots for the different research styles
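For anyone who has calculated percentage shares for a branch, the tongue-in-cheek types can be roughly sketched as a set of rules in Python. The thresholds here (70% green for an Enumerator, more than a third yellow for a Glutton) are illustrative assumptions rather than figures defined in this article, and the function name is hypothetical. A plot can match more than one label.

```python
def research_styles(shares, dominant=70.0, glutton=100 / 3):
    """Suggest style labels from a branch's percentage share per cluster.

    The thresholds are illustrative assumptions, not figures from the article.
    """
    styles = []
    if shares["census"] >= dominant:
        styles.append("Enumerator")
    if shares["civil registration"] == 0:
        styles.append("GRO Avoider")
    if shares["parish register"] == 0:
        styles.append("Heretic")
    if shares["other"] == 0:
        styles.append("Scurvy-Prone Researcher")
    elif shares["other"] > glutton:
        styles.append("Glutton")
    return styles or ["No single style stands out"]

# Shares echo the example plot described for The Enumerator: mostly green,
# with 10% blue and 10% yellow.
enumerator = {"census": 80, "civil registration": 10, "parish register": 0, "other": 10}
print(research_styles(enumerator))
```

Treat the labels as prompts for reflection rather than verdicts. A branch that pre-dates 1837 will show no blue however thorough the research, so read the output alongside the record availability of the era.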

Additional Photo Credits

Many thanks to the colour block photographers, whose images were supplied via Flickr, licenced as follows: Orange by Christian Heindel, CC BY-SA 2.0; Blue by Liz West, CC-BY 2.0; Yellow by Evelyn Berg, CC BY-NC-ND 2.0; Green by Bob Scott, CC BY-NC-ND 2.0.

3 thoughts on “What’s Your Research Style? Power Up Your Family History with Record Clustering Analysis”

  1. Lots of food for thought, Sophie. I’m definitely a glutton, I think, and in more ways than one, haha!! I actively target every document conceivable if my ancestors’ names might appear. Of course, records typically run out around the turn of the nineteenth century, so there are limited numbers to pursue. But, as well as that, now I’m going after every letter of DNA they left behind.

    1. Thanks so much Dara – really glad you enjoyed the article! Genealogy research is one of the few places where gluttony is a goal rather than a bad thing 😉 The core records can tell us some fantastic stories, but I think it’s usually those “yellow/purple” records where the richness really lies. I’ll be writing a follow-up post sometime in 2021 looking at how these methods can be extended to pre-19thC eras…but in the meantime, I hope your DNA exploration brings some exciting discoveries!
