Doing cohort analysis in excel and spreadsheets - yemijohnson.com

How to do your Cohorts analysis in Google Spreadsheet & Excel (A detailed guide)

I have a love-hate relationship with Excel spreadsheets. Love them because they are so useful, hate them because of how cumbersome it gets and I have to deal with many. I am the sort of guy that does everything (your typical generalists), and my activities typically cover SEO audits, paid marketing assessments, financial forecasts, monthly reports, et al. So anytime I get to find a hack such as Fuzzy lookup to get the work done faster, I jump at it. Or sometimes, I pass on my troubles to the dev running the MySQL query 🙂

So I gifted myself with a headache this Christmas: I exported user’s emails and purchase dates so that I could build my cohorts and glean some other useful insights. I soon became stuck on which formula to use, how to write my IF statements, running into cyclical redundancies, and being jammed by arguments that made no sense.

So I swore to Sweet Jesus that if I figured it out that I would write a post on it for those that might struggle with this calculation some day. If you are an Excel buff, it’s okay if you stop reading now — it’s nothing you don’t already know.

For the uninitiated, a cohort is a group of users that share a common characteristic defined by a period in time. For instance, the retention rate of all users that signed up to use your app in January.

So let’s jump into how to build Cohorts on a spreadsheet. I will use a data dump of unique customer IDs and purchase dates. I will show you all the steps I took and the formula too. You might have questions at the end of the day, mail me ayemijohnson[at]gmail.com if you need help.

I got my data dump from Database Test Data, and I will be using over 9,000 purchases in this example.

STEP 1: The Customer IDs are unique values I use to represent each customer in my database while the dates are the days they made their purchases. Take note to have your date field in a consistent format to prevent errors during your calculations

STEP 2: We will then create a new column called repeat purchases. This is not necessary in calculating your Cohorts but you might need it based on your use cases for making your Cohorts

Repeat Purchase Rate = Total Number of Repeat Customers/Total Number of Customers.

To use the formula I used here, you will need to batch your repeating Customer IDs together, a simple sort in ascending order for customer IDs or sort in alphabetical order for email IDs will help batch your customer IDs together.

Create a new column called Purchase rate and apply an “IF statement”

Your “IF Statement” checks to see if the customer ID has appeared before in the columns of unique IDs and if it has, it adds +1 to the last count, if not it returns a value of 1.

If you want to know how many times each user made a purchase then you would want to use a “COUNT IF Statement” and remove the duplicate IDs to get repeat purchases as a lump sum. I will not advise using a count statement especially on a large volume of data because of how cyclical it is. You could spend an hour waiting for your excel functions to process each time you make a change.

STEP 3: From my data if I filter the “Purchase Rate” column I can see I have had some customers with over 20 purchases, and now I will like to establish cohort dates for each of my users. This cohort dates will be their Join dates.

The “IF Statement” checks how many times a user has made a purchase and returns the corresponding date for the user’s first purchase. If the user has made several purchases, it returns the date the user made his first purchase.

You can then use this simple formula to group your Cohorts by month: =TEXT(D2, “mmm-yyy”)

It’s critical that you maintain your date format to prevent inconsistencies and errors.

STEP 4: For the purpose of this example, I have grouped my Cohorts by month, but in my original cohorts analysis I grouped my Cohorts on a quaterly basis taking into account the nature of my user’s purchase cycle.

If you are a monthly subscription SaaS business, a monthly cohort is optimal. If you are a travel website for leisure travellers, quaterly or half yearly will be an optimal period for your cohorts.

To obtain the periods in which users in each Cohort come back, I used this formula: =DATEDIF(D19,B19,”d”)

The “d” gives you the difference between the 2 dates in days and if I substitute the “d” for “m”, I would get the period in months, but for the purpose of this post and those that might wish to calculate their cohorts on a different time period, I will leave the difference in days

When using the DATEDIF function, ensure that the start date (earlier date) comes before the enddate in your function

As a side note, I originally subtracted the 2 dates, divided the result by the period I wanted (30, 60, 90 days) and used a round up function to get the period in which each user in a cohort returned, it took me a long after to notice that some of my users in each cohort were being slot into the wrong return period, so I used a VLOOKUP instead.

A VLOOKUP is one of the most useful functions/features in Excel, right after pivot tables. If you have got a sheet filled with a lot of data values (reference data) and you need to find the corresponding value for multiple queries, then VLOOKUP is your guy.

To use the VLOOKUP, I created a new sheet because of how crowded my current sheet is and placed my reference data in it.

This is sheet 2. You can automate entering the periods using an add function. Look at my function bar.

The way VLOOKUP works is that it crosschecks your queries (data in Column F) with reference data (data in sheet 2, above).

Side note: the dollar sign, “$” ensures that whatever range of data you are working with doesn’t change as you drag your formula. “2” instructs the formula to check the second column of your lookup table for the data value to your query, and “TRUE” makes an approximate match when dealing with a data range and it’s surprisingly accurate, and it can be substituted for “FALSE” if you want an exact match.

For the purpose of this view, I filtered out users that never returned, but we need them so don’t delete them.

STEP 5: Now that all of the above has been sorted, it’s time to build your Cohorts table and Pivot tables are especially useful for this purpose.

Place your cursor anywhere within your workspace and create a pivot table, and ensure your entire sheet or range of data you wish to use are selected in the dialog box

So after creating a Pivot table, I did a couple of stuff that I will not be able to show you in screenshots because they are in multiple steps but very easy to explain.

Action 1: I dragged the column, “Join Date” to Row Labels. Remember that in the original worksheet, Join Date was showing actual dates and not months, so I applied the group function. To do this, select any of the date cells, right click and select group. The date option will pop up if you have been consistently using the correct date format and the you can select by which method you which to group your periods. You can group by months, quarters, years, et al.

Action 2: I dragged the column, “Customer IDs” to Values. Since, my customer IDs are all numbers, they were automatically summed up rather than counted. If they were emails or text, they would have been counted instead of being summed. To fix this, click on the drop down arrow under Values and select Value Field Settings, change “SUM” to “COUNT, and you are all set.

Action 3: I dragged the column, “Age by month” to Column Labels. And my Cohorts table was then completed.

You should delete the grand total below and by the side as there’s no real need for it.

STEP 6: To calculate retention rates, simply click on the options tab under Pivot tables. Locate the “Show Values as” function and select “% of” from the drop down. In this example, “Age by month” is the base field and “0” is the base item. Close the dialog box when done and viola!

Well, that’s that and for your sake, I created this in a spreadsheet so that you can view the formula and calculations, however, spreadsheets do not support Pivot tables, but my guide above is comprehensive enough to help out if you use an Excel table.

Here’s the link: https://docs.google.com/spreadsheets/d/1khZDP3vJXheVqJrv0DcaMRkDnn1rdj-cMdr88o11UhQ/edit#gid=0


It makes a lot of sense to know these numbers, but there’s no point knowing your Cohorts, LTV, Retention rates, et al, if you are taking no actions to understand them, so here are list of articles that will give you better insights on things you can glean from your data and how to use it to affect your bottom line.

  1. Andrew Chen is one of the most recognizable voices on Cohorts, visit his blog here: Andrewchen.co
  2. Intercom.io (that very popular customer support app you see on almost every SaaS site) wrote a detailed guide on visualizing your Cohorts: blog.intercom.io
  3. Faisal (owner of young growth marketer) has a complete resource guide on understanding Cohorts: Younggrowthmarketer.com

This post was originally published and last edited in December of 2016. If you like my content, follow me on Twitter – @yemi_uc

Leave a Reply

Your email address will not be published. Required fields are marked *