Machine learning is now routinely applied to many areas of industry. At C9, we apply advanced machine learning to the sales pipelines of some of the world’s largest enterprises. In this blog post I want to highlight our recent data science whitepaper and present an overview of our predictive sales technology for two specific problems: predicting which deals will be won in the current quarter (opportunity scoring), and predicting sales revenue (forecasting).

Traditionally, sales forecasts and pipelines as reported by sales teams are subject to subjective and emotional biases. These typically arise in several ways:

  • Unrealistic targets: sales reps are often required to have a certain coverage ratio (say 3x quota), and consequently they add low-quality deals they don’t want to commit to.
  • Happy ears: sales reps sometimes hear only the good news and not the bad, which biases their opinion of the opportunity.
  • Sandbagging: sales reps’ commission is often related to the fraction of quota they achieve, and this quota is adjusted based on their past performance. This gives an incentive to ‘sandbag’ – to delay reporting of deals that would otherwise be possible to close earlier.

The result is that pipelines contain spurious data and forecasts are missed. For many companies, this is a significant and common problem.

C9 helps eliminate these biases by applying data science to sales pipelines. Our models can typically identify winning opportunities over 45 days before closing, with over 80% accuracy (but see the whitepaper for a more thorough explanation of why ‘accuracy’ isn’t the whole story).

This helps sales teams in several ways:

  • Find opportunities that are promising but not committed (sandbagging)
  • Find opportunities that are committed but may be at risk (happy ears)
  • Gauge if the quality of the pipeline can support the current targets (unrealistic targets)
  • Produce a more accurate sales forecast

Our predictive analytics engine does two main things:

1) Opportunity scoring – for each opportunity, it estimates the probability the opportunity will close in the current quarter. We recognize that simply providing scores isn’t enough; therefore our models surface the positive and negative indicators associated with a particular opportunity, as it currently stands. This is both predictive and prescriptive.

One interesting aspect of our opportunity scoring models is that they use a rich mixture of public data and fine-grained temporal behavioral data. By fine-grained data, I mean the following: for each opportunity, we typically see dozens of observations as it progresses from a lead to its final closed state. By learning temporal patterns as the opportunity evolves, our models can be much richer than those which only consider the variables at the final state. For example, ‘how long did it spend in each stage’, ‘how many times did it revisit stage X’, ‘how did the amount change with each stage’, ‘what was the email frequency between stages X and Y’, and so on. These are not typically possible to include without detailed historical data.
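To make the idea of temporal features concrete, here is a minimal Python sketch that turns a sequence of opportunity snapshots into a few of the features mentioned above. The snapshot layout (date, stage, amount), the stage names, and the function name `temporal_features` are assumptions made for illustration; this is not C9’s actual feature pipeline.

```python
from collections import Counter, defaultdict
from datetime import date


def temporal_features(snapshots):
    """Derive simple temporal features from one opportunity's history.

    `snapshots` is a list of (date, stage, amount) tuples sorted by date.
    This layout is a hypothetical one chosen for the example.
    """
    days_in_stage = defaultdict(int)     # total days spent in each stage
    entries = Counter()                  # how many times each stage was entered
    amount_change = defaultdict(float)   # amount change when entering a stage
    prev_stage = None

    for i, (day, stage, amount) in enumerate(snapshots):
        if stage != prev_stage:
            entries[stage] += 1
            if i > 0:
                amount_change[stage] += amount - snapshots[i - 1][2]
        if i + 1 < len(snapshots):
            # Time until the next observation is credited to the current stage.
            days_in_stage[stage] += (snapshots[i + 1][0] - day).days
        prev_stage = stage

    # A stage entered n times was revisited n - 1 times.
    revisits = {s: n - 1 for s, n in entries.items() if n > 1}
    return {
        "days_in_stage": dict(days_in_stage),
        "revisits": revisits,
        "amount_change_on_entry": dict(amount_change),
    }


# Hypothetical history: the deal falls back to "Qualify" once before closing.
history = [
    (date(2015, 1, 5), "Qualify", 50000.0),
    (date(2015, 1, 19), "Propose", 60000.0),
    (date(2015, 2, 2), "Qualify", 60000.0),
    (date(2015, 2, 10), "Propose", 55000.0),
    (date(2015, 3, 1), "Closed Won", 55000.0),
]
features = temporal_features(history)
print(features["days_in_stage"])  # {'Qualify': 22, 'Propose': 33}
print(features["revisits"])       # {'Qualify': 1, 'Propose': 1}
```

None of these features can be computed from the final state alone, which is why the full history of observations matters.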

2) Predictive forecasting – our models will give accurate predictions of the final revenue for the current and next quarters. They take into account not just the past revenue trends and seasonality, but also look at the quality of the current sales pipeline (in terms of its statistical moments), combing through every deal in the pipeline, what the sales reps are judging and committing, and what sort of deals are statistically expected to close between now and the end of the quarter. We’ll discuss this more in a future post.


We like to publish detailed results so that customers can see exactly how we are doing. One important thing I’d like to highlight is this: the term ‘accuracy’ is misleading on its own. If your vendor tells you they’re 80% accurate and nothing more, then you should be wary. Why? Here’s a simple example – if my win rate is 20%, then I can easily give you a predictor that is 80% accurate (always predict lost). Since 20% is not an uncommon win rate, this example is not far-fetched.
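The always-predict-lost example is easy to check. The sketch below uses 100 made-up opportunities with a 20% win rate; the data and variable names are illustrative only.

```python
def accuracy(actual, predicted):
    """Fraction of opportunities whose outcome was predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# 100 hypothetical opportunities with a 20% win rate (True = won).
actual = [True] * 20 + [False] * 80

# A "predictor" that ignores the data and always says lost.
always_lost = [False] * len(actual)

baseline_accuracy = accuracy(actual, always_lost)
wins_identified = sum(a and p for a, p in zip(actual, always_lost))

print(f"accuracy: {baseline_accuracy:.0%}")   # accuracy: 80%
print(f"wins identified: {wins_identified}")  # wins identified: 0
```

The predictor scores 80% accuracy while never identifying a single winning deal, so it is useless for planning.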

Instead, your vendor should report standard metrics such as precision/recall: the fraction of predicted won opportunities actually won (precision) and the fraction of actually won opportunities correctly predicted (recall). We report F1 scores, which combine recall and precision into one metric.
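As a minimal sketch of these definitions, the function below computes precision, recall and F1 from lists of actual and predicted outcomes. The function name and the sample data are made up for illustration, and returning 0.0 when a denominator is zero is a convention assumed here rather than part of the definitions.

```python
def precision_recall_f1(actual, predicted):
    """Precision, recall and F1 for won/lost predictions (True = won)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)        # predicted won, actually won
    fp = sum((not a) and p for a, p in pairs)  # predicted won, actually lost
    fn = sum(a and (not p) for a, p in pairs)  # predicted lost, actually won

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall, written in counts.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# Ten hypothetical opportunities: 4 actually won, 5 predicted won, 3 in common.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, True, False, False, False, False]

precision, recall, f1 = precision_recall_f1(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.60 recall=0.75 f1=0.67
```

Applied to a predictor that always says lost, all three metrics come out as zero, which is exactly the failure that a bare accuracy figure hides.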

Splitting our results into recall and precision is very useful. What we typically see is that sales teams have high precision, but low recall. This means that once an opportunity is committed, it is likely (~80%) to win. However, sales teams are often reluctant to commit to opportunities, particularly early in the quarter (we often see recall ~10% in the first week of a quarter). One could say they have a fear of commitment.

In contrast, C9’s predictive models are willing to commit earlier to more deals (they typically have recall ~70-80% in the first week of the quarter), while maintaining roughly the same precision as the sales team (typically around 75-80%). The figure below shows this clearly – grouping by sales stage, our models perform much better for ‘earlier’ stages. The advantage of this is huge: sales and finance teams can plan better, with more confidence about where their revenue is likely to come from.

[Figure: Accuracy by stage]
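To see what this difference in recall means for a combined score, here is a rough back-of-the-envelope calculation of F1 from the first-week figures quoted above. The model inputs are the midpoints of the quoted ranges, which is an assumption made for this example; the outputs are illustrative arithmetic, not reported results.

```python
def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Sales team in the first week of a quarter: precision ~80%, recall ~10%.
sales_team_f1 = f1_from_rates(precision=0.80, recall=0.10)

# Model: midpoints of the quoted ranges (75-80% precision, 70-80% recall).
# Using midpoints is an assumption made for this example.
model_f1 = f1_from_rates(precision=0.775, recall=0.75)

print(f"sales team F1: {sales_team_f1:.2f}")  # sales team F1: 0.18
print(f"model F1: {model_f1:.2f}")            # model F1: 0.76
```

Because F1 is a harmonic mean, it is pulled towards the weaker of the two numbers, so high precision cannot make up for very low recall.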

More details can be found in our data science whitepaper, also viewable here.

 Andy Twigg is the chief technology officer at C9.
Twitter: @lambdatwigg