Record Clustering Analysis Using Excel (A Tutorial)

Computer keyboard

In this short tutorial, we’re going to walk through the process of using Microsoft Excel to make plots for Record Clustering Analysis, or RCA for short. Even if you don’t have a Microsoft subscription, these instructions should work on the free, online-access Excel version included in Microsoft Office for the Web.

RCA is a strategy for understanding your own research habits by examining the range of sources you use in your genealogy work. It aims to discover if there are some record sets you’re habitually overlooking, or whether your conclusions are overly reliant on one particular type of record. Before you perform any record clustering analysis, you should read my introduction to the method on the main article here.

Download My Excel Template & Get Started

The instructions below provide an illustrated step-by-step guide and I hope you find them easy to follow. The spreadsheet itself is quick to make, but if you’re impatient to get started then download my Excel template here. This file provides all the column headers for you and has the framework for a stacked bar chart already set up on a separate sheet. As you fill in your data it should automatically appear on the chart.

Once you’ve completed my tutorial, you’ll be able to create RCA plots for any ancestor you choose, just like the ones in the image below.

Comparative record clustering plots for eleven different individuals, comprising a series of stacked bar charts all scaled to the same length and showing the percentage each record types contributes to the body of evidence.

1. Decide Which Branch(es) of Your Tree You’re Analysing, and Acquire Your Data

If you’ve been researching your family history for some time, then you may have a vast tree containing many hundreds of individuals. To gain useful insights from the RCA method, you probably don’t want to plot everyone’s data in one go. Choose a single branch or two from your lineage and focus on them for now. This way, you’ll be able to see the resulting plots clearly and benefit from the patterns and research habits they illuminate, without being overwhelmed by too much data.

Refer back to my descriptions of the four categories and how to count them, on the main RCA article.

2. Enter the Names Into Your Excel Spreadsheet

An easy step to start! Once you’ve decided which branches you’re going to analyse, type the names down column A. You could separate these into two columns as forename-surname if you wish, but I prefer the way Excel handles the labels in the resulting plots when the names are in one column.

Largely empty spreadsheet including names of individuals and section headers, but no data yet.

I put the surnames in capitals to make it easier to identify the separate lineages at a glance. In the fictitious example shown above, I’ve listed names from the PRICE, RHODES and WILLIAMS branches of a family tree.

3. Enter the Data Into Your Excel Spreadsheet

Now, enter the data into your sheet. In the template I sent you, there are separate columns B to E for the four categories of cluster analysis: Census, Civil Registration, Parish Registers and Other. You can read more about which items you’re meant to include in these counts on my Record Clustering Analysis article here. Fill in the category count for each individual included in your analysis.

Then set up a totals formula in cell F2 – write =SUM(B2:E2) here. Although we won’t include these totals directly in our plots, it’s better to have a visual check within the spreadsheet on the extent of your research so far.

Filled spreadsheet containing data as well as names, but no totals

4. Copy the Formula Down the Totals Column

If you downloaded my Excel template at the start of this article, then the totals formula should have already been filled in for you. If so, then Excel may have already calculated the total number of sources for each person. If this hasn’t happened or you preferred to make your own spreadsheet from scratch, then you’ll need to copy the formula down the column manually.

Demonstration of how to fill in a formula in Excel and copy it down a column

To do this, select cell F2 by left-clicking on it. Move your mouse to the lower right-hand corner of the cell and the cursor should change to look like a thick black plus, as shown in the image above. A left double-click will fill the formula all the way down the column. Personal record totals should now appear down the F column wherever you have people listed.

Completed spreadsheet, with names, data, and totals.

5. Select the Data You Want to Plot

Now comes the visual bit at last! Once you have names, raw data and totals in the spreadsheet, you’re ready to visualise your data.

Go to the last person on your spreadsheet (in this case Euphemia WILLIAMS) and select the cell in the E column, containing their Other total. In our example, this is cell E12, but in your spreadsheet it might be a different cell, depending on how many people you have in your analysis. Click and drag upwards until you arrive at cell A1, which contains the Name header. It’s important to include the first row in your selection, because those column labels will appear in the legend of your plot.

The data in your spreadsheet should now be highlighted in grey, indicating that it’s selected in preparation for plotting a graph, as in the figure below.

Excel spreadsheet with all data and headers highlighted ready for plotting

6. Plot Your Data as a Stacked Bar Graph

Record Clustering Analysis uses stacked bar graphs – we’re going to plot them now.

At the top of the screen there is a series of menu titles, reading File; Home; Insert; Page Layout, and so on. Left-click on the Insert menu and it should bring up a range of options, including a section on charts and graphs. To the immediate right of the listing for “Recommended Charts“, there’s a little icon of a bar chart, highlighted in orange in the example below. Left-click on the drop-down arrow and a series of options will appear, showing you several different types of bar chart you can draw.

Excel screenshot showing how to select a 2D stacked bar chart

Navigate down to either of the charts labelled as “2D Bar” or “3D Bar” and move your mouse pointer over to the right until you are over the 100% Stacked Bar chart – it should be the last of three options. Left-click on it.

This option produces a bar of the same length for each person, but divides up the bar to show the percentage of sources coming from each record type. 100% stacked bars are useful for showing you how the relative contribution of sources for each ancestor varies.

7. Move Your Chart to Its Own Page

If a chart is worth producing, it’s worth making it as big as possible to make the insights really clear. So let’s move the chart to a separate page. You can do this by clicking your middle mouse button on the chart and selecting “Move Chart”.

Excel screenshot demonstrating how to move a chart to its own sheet

A box will pop up, asking you where you want the chart to go. Select “New Sheet” and type in the sheet label you’d like it to have – I’ve called mine “RCA Stacked Bar” in the example below. Once you press OK to confirm, the series of stacked bar charts should appear in its own sheet, labelled at the bottom of the sheet with your chosen name.

How to move a bar chart in Excel onto its own sheet

8. Correct the Colour Scheme and Labels

If the main article on Record Clustering Analysis, I set out a colour scheme for each of the types of record set. You might feel happy enough sticking with whatever colour scheme that Microsoft has assigned to your plot. That’s fine, as long as it’s clear to you which colour corresponds to which record type.

Here’s how to change these to match the colours I mentioned in my RCA article, namely:

  • Census – green;
  • Civil registration certificates – blue;
  • Parish registers – orange;
  • Other sources – yellow.
Excel screenshot showing how to change the colours of the different segments of the bar charts.

Left-click on the leftmost section in one of your bars, one which you know to be a census contribution. All the census sections should highlight at once, and you can use the colour selector in the Format Data Series window to change these to green (see image above).

The font size of the title and labels probably needs increasing too, so click on those in turn and increase the font size.

Excel screenshot showing how to change font size of plot labels

9. Look With Pride at Your Completed RCA Plot – AND USE IT

Congratulations – if you’ve got this far, then you should now have a series of completed RCA stacked bar plots in front of you, one for each ancestor in the branch you wanted to investigate. Bear in mind though, that these visualisations only become useful once we look at them critically and pay attention to the insights they offer. So rather than just drawing these plots, be sure to use them and act on that knowledge!

If you want to save the graph series as a PDF, then navigate to the sheet where the bar charts are. Then go to File>Save As and select PDF as the output file type. Once you press Save, this should export your clustering analysis as a PDF.

Example of a completed RCA plot

Now you’re ready to revisit the main Record Clustering Analysis article, read the Research Style descriptions of the second half, and see how the insights from record clustering analysis can help you find new directions as a researcher. Good luck!

One thought on “Record Clustering Analysis Using Excel (A Tutorial)

Leave a Reply

Your email address will not be published.