Best Practices

A Data Scientist’s Guide to Fighting Product Churn

Most companies religiously monitor their churn rate. If they don’t, they should.

Acquiring new customers is a costly investment of both time and capital, so you naturally want to minimize the risk of new customers ending their relationship with you.

As a lagging indicator, churn rate only lets you look back at what has already happened in your product, whether wistfully or with alarm. But there are ways to analyze customer lifecycle data and build models that predict which of your new customers are likely to churn.

As the data scientist at Pendo, I watch churn closely. Our product is meant to help our customers reduce churn, so it’s no surprise that we pay close attention to it ourselves. My particular focus these days is on predicting churn by looking at how our customers use our product.

Even if you don’t have a data scientist on your team, here are some things that you can do to build better predictive models, and mount an attack on churn.

1. Start with Definitions

Based on your business and transaction model, the churn rate could be defined in different ways.

For example, you might define it as the percentage of customers who never returned to the app (or made a new purchase) after their first 60 days.
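To make that example definition concrete, here is a minimal sketch in Python. It assumes each customer is represented as a signup date plus a list of activity dates; that representation, the `churn_rate` function, and the sample data are illustrative assumptions, not a description of any particular system.

```python
from datetime import date, timedelta

def churn_rate(customers, as_of, window_days=60):
    """Share of eligible customers with no activity after their first `window_days`."""
    window = timedelta(days=window_days)
    eligible = churned = 0
    for signup, activity_dates in customers:
        cutoff = signup + window
        # Customers still inside their first window cannot be judged yet.
        if cutoff >= as_of:
            continue
        eligible += 1
        # Churned under this definition: nothing recorded after the cutoff.
        if not any(day > cutoff for day in activity_dates):
            churned += 1
    return churned / eligible if eligible else 0.0

# Made-up sample data: (signup date, dates with activity or a purchase).
customers = [
    (date(2017, 1, 1), [date(2017, 1, 5), date(2017, 4, 1)]),    # came back after day 60
    (date(2017, 1, 10), [date(2017, 1, 12), date(2017, 2, 1)]),  # never came back
    (date(2017, 5, 20), [date(2017, 5, 21)]),                    # too new to judge
]
rate = churn_rate(customers, as_of=date(2017, 6, 1))
print(f"Churn rate: {rate:.0%}")
```

Excluding customers who are still inside their first window keeps the rate from being diluted by accounts that have not had a chance to churn yet.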

Whatever the case may be, after you’ve defined exactly what churn means for you, the next step is to identify the length of time for which you have reliable, comparable churn data. Does your churn rate in 2016 compare favorably to your churn rate in 2017? Did your pricing models change from one period to the other? Did your product significantly change from one period to the other?

Establish your baseline, as this will be the starting point for your attack on churn.

2. Are You Tracking the Right Thing?

The next step in battling churn is tracking and measuring the data that are essential for prediction. In our case, we aim to have our customers achieve value in the product as soon as they onboard, so we track time to value for key onboarding actions. We also track aggregate usage data in the product and map it to regular points in our customers’ lifecycles.
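As a rough illustration of tracking time to value, the sketch below measures the number of days from signup to the first occurrence of each key onboarding action. The action names (`install_snippet`, `create_guide`, `invite_teammate`) and the event format are hypothetical placeholders; substitute whatever your own product treats as key actions.

```python
from datetime import date

def time_to_value(signup, events, key_actions):
    """Days from signup to the first occurrence of each key action (None if never seen)."""
    first_seen = {}
    for action, day in events:
        if action not in key_actions:
            continue
        # Keep the earliest date on which each key action happened.
        if action not in first_seen or day < first_seen[action]:
            first_seen[action] = day
    return {
        action: (first_seen[action] - signup).days if action in first_seen else None
        for action in key_actions
    }

# Hypothetical action names and made-up events for one account.
key_actions = ["install_snippet", "create_guide", "invite_teammate"]
events = [
    ("install_snippet", date(2017, 3, 2)),
    ("create_guide", date(2017, 3, 9)),
    ("create_guide", date(2017, 3, 5)),
    ("view_dashboard", date(2017, 3, 3)),  # not a key action, ignored
]
ttv = time_to_value(date(2017, 3, 1), events, key_actions)
print(ttv)
```

A `None` value is worth keeping rather than dropping: an account that never reaches a key action is itself a signal.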

Here are some questions that should help you figure out if you’re tracking the right things:

  • What are the steps in your customers’ journey as they onboard, both within and outside of your product?
  • Do you track how many users are active in these customer subscriptions at a regular cadence (e.g., daily, weekly, monthly)?
  • Do you have well-defined monetary transactional data for your customers in their lifecycle?
  • Do you have sufficient metadata on your customers that is relevant to your analysis (geography, company size, industry, user base, etc.)?
  • Do you track their satisfaction with you (e.g., NPS, CSAT, poll responses) at critical points in their lifecycle (e.g., 90 days, 6 months, 12 months)?
  • How do you aggregate support ticket data generated by these customers?
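For the question about active users at a regular cadence, one possible starting point is to count distinct users per account per week from a raw event log. The sketch below assumes events arrive as `(account, user, date)` tuples and buckets them by ISO week; the account and user names are made up.

```python
from collections import defaultdict
from datetime import date

def weekly_active_users(events):
    """Map (account, ISO year, ISO week) to the number of distinct active users."""
    users = defaultdict(set)
    for account, user, day in events:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        users[(account, iso[0], iso[1])].add(user)
    return {key: len(members) for key, members in users.items()}

# Made-up event log: (account, user, day the user was active).
events = [
    ("acme", "u1", date(2017, 3, 6)),
    ("acme", "u2", date(2017, 3, 8)),
    ("acme", "u1", date(2017, 3, 9)),    # same user, same week: counted once
    ("acme", "u1", date(2017, 3, 13)),   # following week
    ("globex", "u9", date(2017, 3, 7)),
]
wau = weekly_active_users(events)
for key in sorted(wau):
    print(key, wau[key])
```

The same grouping works for daily or monthly cadences by changing the bucket key.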

3. Churn Indicators

Now that you know what you’re looking for, and you have all this data, you can identify the most significant indicators to use in a churn prediction model. Make sure to only consider relevant and non-redundant indicators, and don’t overfit the model. You want new predictions to work with new input, so build with the future in mind!
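One simple way to spot redundant indicators is to look for pairs that are almost perfectly correlated, since the second one adds little new information. The sketch below computes Pearson correlations by hand; the 0.9 threshold, the indicator names, and the numbers are illustrative assumptions rather than recommended values.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant indicator has no defined correlation
    return cov / (spread_x * spread_y)

def redundant_pairs(indicators, threshold=0.9):
    """Pairs of indicators whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(indicators, 2)
        if abs(pearson(indicators[a], indicators[b])) >= threshold
    ]

# Made-up values for five accounts.
indicators = {
    "days_active": [2, 5, 9, 12, 20],
    "sessions": [4, 10, 18, 25, 41],   # roughly twice days_active
    "features_used": [3, 1, 4, 1, 5],
}
print(redundant_pairs(indicators))
```

Here `days_active` and `sessions` move together, so keeping both would mostly duplicate one signal.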

In our case, average time spent per active day, number of features used, total number of active users, and number of days logged in over the period of analysis were broad indicators of overall engagement in the product. We also considered more specific indicators, such as the number of core features used and time spent on specific groups of features or product actions.

You may have your own ideas about what makes certain features your core, but don’t forget to also listen to your users. Even a brand new user’s first few days in your product can teach you a lot about what you should consider core. In our case, we’ve learned that activity in the first 10 days can be a good predictor of how likely a user is to return to the app.
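An early-activity indicator like this can be as simple as counting the distinct days a user was active in their first 10 days. The function below is a sketch under that assumption; how "activity" is recorded will differ from product to product.

```python
from datetime import date, timedelta

def early_active_days(signup, activity_dates, window_days=10):
    """Number of distinct active days within the first `window_days` after signup."""
    end = signup + timedelta(days=window_days)  # exclusive upper bound
    return len({day for day in activity_dates if signup <= day < end})

# Made-up activity for one user who signed up on January 1.
activity = [
    date(2017, 1, 1),
    date(2017, 1, 1),   # second session on the same day, counted once
    date(2017, 1, 3),
    date(2017, 1, 10),  # day 10, still inside the window
    date(2017, 1, 11),  # day 11, outside the window
]
early_days = early_active_days(date(2017, 1, 1), activity)
print(early_days)
```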

4. Choose Models Wisely

Churn prediction models (like most models) are only as good as the data going into them. There is no standard strategy for setting up a churn prediction model. Before deciding which model to use, it is important to meticulously assess which indicators you are going to incorporate into your model.

Simple overall statistics on these indicators can lead you to decide which set of models to test: what are the distributions of your indicators, and are there outliers? Can you describe their irregularity? What is the relationship between sets of features?
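A quick summary like the one below can answer the first of those questions for a single indicator. It uses the common 1.5 × IQR rule to flag outliers, which is one convention among several; the indicator name and values are made up.

```python
from statistics import mean, median, quantiles

def describe(values):
    """Mean, median, and outliers flagged by the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "median": median(values),
        "outliers": [v for v in values if v < low or v > high],
    }

# Made-up indicator: average minutes per active day for eight accounts.
minutes_per_active_day = [12, 15, 14, 13, 16, 15, 14, 120]
summary = describe(minutes_per_active_day)
print(summary)
```

A mean far above the median, as in this sample, suggests a skewed distribution, which may call for transforming the indicator or choosing a model that tolerates outliers.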

One of the things we were particularly interested in was predicting account renewal based only on the first three months of product usage. Some approaches that we explored included logistic regression, random forest, clustering, k-nearest neighbors, and SVM. Optimizing based on the accuracy of the predictions with our validation datasets, we decided on a logistic regression model for predicting churn.
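To give a feel for the approach, here is a dependency-free sketch of a logistic regression fitted by plain gradient descent and scored by accuracy on a held-out validation set. It is not our production model: the two indicators, the data, the learning rate, and the epoch count are all made up for illustration, and in practice you would likely reach for an established statistics or machine learning library instead of hand-rolling the fit.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)

def churn_probability(weights, bias, row):
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))

def fit_logistic(rows, labels, lr=0.5, epochs=5000):
    """Fit weights by batch gradient descent on the mean log loss."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, label in zip(rows, labels):
            error = churn_probability(weights, bias, row) - label
            grad_b += error
            for j, value in enumerate(row):
                grad_w[j] += error * value
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def accuracy(weights, bias, rows, labels):
    """Fraction of rows where the 0.5-threshold prediction matches the label."""
    hits = sum(
        (churn_probability(weights, bias, row) >= 0.5) == bool(label)
        for row, label in zip(rows, labels)
    )
    return hits / len(rows)

# Made-up indicators scaled to [0, 1]: (share of days active, share of core features used)
# over the first three months. Label 1 means the account churned, 0 means it renewed.
train_rows = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.15, 0.25],
              [0.8, 0.9], [0.9, 0.7], [0.7, 0.8], [0.85, 0.75]]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
val_rows = [[0.2, 0.2], [0.2, 0.25], [0.8, 0.8], [0.8, 0.85]]
val_labels = [1, 1, 0, 0]

weights, bias = fit_logistic(train_rows, train_labels)
val_accuracy = accuracy(weights, bias, val_rows, val_labels)
print(f"Validation accuracy: {val_accuracy:.0%}")
```

The same `accuracy` check on the same validation set is what lets you compare candidate models on equal footing. Real data will be far less cleanly separated than this toy sample.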

5. Pull on the Goggles: Experiment Time

Our inferences of churn probability are most useful to us when we can run experiments based on what we learn and measure their success. Whatever model you end up using, you want to translate it into experiments as quickly as possible.

In our analysis, we observed a significant difference in onboarding speed between customers who churned and those who didn’t.
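One way to check whether a difference like this is more than noise is a permutation test on the difference in group means. The sketch below is a generic illustration with made-up numbers, not the analysis we ran; the number of shuffles and the fixed seed are arbitrary choices.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Observed difference in means and a two-sided permutation p-value."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle group membership and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p-value of 0.
    return observed, (extreme + 1) / (n_permutations + 1)

# Made-up data: days each account took to complete onboarding.
churned_days = [21, 30, 25, 28, 35, 27]
retained_days = [7, 10, 9, 12, 8, 11]
difference, p_value = permutation_test(churned_days, retained_days)
print(f"Churned accounts took {difference:.1f} more days on average (p = {p_value:.4f})")
```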

Once we saw that, it was experiment time. Our team started tailoring new onboarding strategies for any segments that the model flagged as at risk of not renewing. When renewal time comes, we measure how effective each strategy was in reducing the risk for those accounts: keep what’s successful, drop what isn’t, and pivot.
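A first pass at measuring such an experiment can be as simple as comparing renewal rates across strategies for the at-risk accounts. In the sketch below, the strategy names and outcomes are made up, and a real evaluation would also need enough accounts per strategy to trust the difference.

```python
from collections import defaultdict

def renewal_rate_by_strategy(outcomes):
    """Share of at-risk accounts that renewed under each onboarding strategy."""
    totals = defaultdict(int)
    renewed = defaultdict(int)
    for strategy, did_renew in outcomes:
        totals[strategy] += 1
        renewed[strategy] += int(did_renew)
    return {strategy: renewed[strategy] / totals[strategy] for strategy in totals}

# Made-up outcomes: (strategy assigned to an at-risk account, whether it renewed).
outcomes = [
    ("control", True), ("control", False), ("control", False), ("control", True),
    ("guided_tour", True), ("guided_tour", True), ("guided_tour", False), ("guided_tour", True),
]
rates = renewal_rate_by_strategy(outcomes)
print(rates)
```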

6. User-friendly Models and Results

Even the most sophisticated and accurate models are only as powerful as they are accessible to their end users: those in the company who ultimately stand to learn from them and apply their insights.

Make sure you invest in gathering input from those in your organization who are closest to your customers (from pre-sales to customer relationships) to ensure that you are considering the right model variables and that you have captured the depth of your customers’ product experiences.

Get Ahead of the Churn Curve

If you’re not super into data, some of these details may have made your eyes glaze over. But churn affects everyone and should be everyone’s concern. The good news is that even without a full-time data scientist, you can probably take some of these suggestions and implement them in your organization to build a predictive churn model that produces action items for different teams. Taking the time to build one thoughtfully can help you get ahead of the churn curve.

About the Author

Dr. Suja Thomas is a data scientist at Pendo. She has over 7 years of experience analyzing large, complex business and customer behavior data. Suja utilizes her expertise in statistical and mathematical modeling and predictive analytics to help Pendo achieve growth goals and develop a better product.