Have you ever tried to explain what was driving a gap in a KPI, only to be overwhelmed by the number of effects coming into play? Then you might find the following article quite useful…
At DataMa, we're passionate about improving online conversion through data mining. One of the key problems our clients regularly face is understanding the differences between two segments on a given KPI. So we have built a script in R that makes their lives much easier… It's fast, it's easy to understand and communicate, it doesn't require any expensive BI tool because R is open source, the output can be plugged into any visualization tool you already have (Tableau, Shiny, Qlikview or even Excel…), AND we can set up a demo adapted to your business in just a few hours… for free!
The following article gives you a use case based on some open data provided by Google Analytics with output in R Shiny, but use cases are endless. Just reach out here to submit yours.
Use Case
Imagine you are the Google Merchandise Store E-commerce officer, in charge of the business performance. One of the key metrics you look at constantly is Revenue per User.
Now, imagine it's Monday morning, and you look at this week's performance vs. last week's. Bad news: Revenue per User went down from $4.96 to $4.04. That's an 18.6% decrease! You ask yourself: “What on earth is driving my conversion down this week?”
Typically, this type of question triggers another series of complex questions and investigations across the whole organization, and generally ends, a long time later, with a vague idea of “something that might be partly explaining this, but we're not sure by how much, and anyway it's an old story now…”
But because you're smart, your data is actually available to the world through the Google Analytics demo account… which allows companies like DataMa to work on it. So you give them access to the following flat table, and they do the job for you:
Waterfall analysis
The first part of the DataMa waterfall tool helps you identify which step of the conversion funnel is driving the change. For the Google Merchandise Store example, we split $/User into 4 main metrics:
$/User = Session/User * Checkout/Session * Purchase/Checkout * $/Purchase
Note that, as long as you have the data tagged, that funnel could be much more complex, even with different forks in parallel. It still works!
The DataMa waterfall first identifies how much each step of the funnel contributes to the overall gap observed between ‘Start’ (last week) and ‘End’ (this week). The output brings you that in a classic ‘waterfall’ chart, as any good consultant would do:
“All right, looks like our Checkout/Session is driving most of the change. Now, let's look at the underlying dimension changes behind that drop…”
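For the curious, the arithmetic behind such a waterfall can be sketched in a few lines of R. The numbers below are made up for illustration (only the overall Revenue per User roughly matches the figures above), and the logarithmic split is just one common way to attribute a gap across a product of ratios; DataMa's actual implementation may differ.

```r
# Funnel ratios at 'Start' and 'End' -- illustrative numbers only, chosen so the
# products roughly match the ~$4.96 -> ~$4.04 Revenue per User quoted above.
r_start <- c(sessions_per_user = 1.25, checkouts_per_session = 0.060,
             purchases_per_checkout = 0.55, revenue_per_purchase = 120.2)
r_end   <- c(sessions_per_user = 1.24, checkouts_per_session = 0.049,
             purchases_per_checkout = 0.54, revenue_per_purchase = 123.1)

kpi_start <- prod(r_start)   # Revenue per User at Start
kpi_end   <- prod(r_end)     # Revenue per User at End

# Because the KPI is a product of step ratios, log-changes add up, so each
# step's share of the gap can be taken as its log-change over the total
# log-change (an LMDI-style split; other conventions exist).
contribution <- (kpi_end - kpi_start) * log(r_end / r_start) / log(kpi_end / kpi_start)

waterfall <- data.frame(step = names(r_start), contribution = round(contribution, 3))
print(waterfall)
all.equal(sum(contribution), kpi_end - kpi_start)   # contributions sum to the gap
```

In this made-up example, the Checkout/Session step carries almost all of the negative contribution, which is exactly the kind of picture the waterfall chart gives you at a glance.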
Dimension analysis
Where the DataMa tool starts to be really useful is in analyzing the effect of each dimension in the data set and reconciling it with the funnel picture previously built. It is a two-step process: first, analyzing mix effects on each dimension, and then understanding which segment within each dimension drives the observed performance.
A view very similar to the previous one lets you quickly visualize the dimensions hiding behind each variation. You can directly see that, on the Checkout/Session step, there is an important mix effect (which we explain right after) and a performance effect mainly driven by the medium dimension: your organic traffic, in particular, converted noticeably worse, all other things being equal. This view aggregates only the most significant dimensions; to go further, we have to dig into the details of each effect.
Mix effect analysis
As an E-commerce officer, you might be familiar with mix effects: for instance, say mobile users generally convert much lower than desktop users (which is often the case). If last week you had many more desktop users than this week, then conversion this week will be mathematically lower, just because of that change in device mix.
Well, what the DataMa mix effect analysis does is replicate that logic for each step of the funnel and for each dimension provided in the data set. As long as those dimensions are not interdependent, you end up with a share of the gap on each step that can be attributed to mix effects on a given dimension:
“Ok… looks like about a third of the observed drop on Checkout/Session is just due to mix effects, on medium and device.” If you jump to the detailed view below, you'll see that the share of ‘organic’ sessions has increased quite a bit, and because ‘organic’ sessions convert into checkouts at a much lower rate than average, the overall conversion is driven down by this mix effect.
Note that a share of the gap will always remain ‘unexplained’ by mix effects. Some of this is normal, as some changes (e.g. product changes) will not affect the mix of traffic going into each step of the funnel. However, the more driving dimensions you have in the input data set, the better you will be able to identify what is really driven by something other than mix effects.
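To make that logic concrete, here is a minimal base-R sketch of a mix vs. performance split for one step (Checkout/Session) and one dimension (medium). The table is made up, and the symmetric midpoint decomposition is just one textbook convention; the tool's exact formula may differ.

```r
# Checkout/Session split by the 'medium' dimension -- illustrative numbers.
medium <- data.frame(
  segment    = c("organic", "cpc", "referral", "direct"),
  mix_start  = c(0.43, 0.12, 0.20, 0.25),     # share of sessions at Start
  mix_end    = c(0.48, 0.11, 0.18, 0.23),     # share of sessions at End
  rate_start = c(0.030, 0.090, 0.080, 0.075), # Checkout/Session at Start
  rate_end   = c(0.020, 0.088, 0.078, 0.074)  # Checkout/Session at End
)

overall_start <- sum(medium$mix_start * medium$rate_start)
overall_end   <- sum(medium$mix_end   * medium$rate_end)

# Split the gap into a mix effect (shares moved) and a performance effect
# (segments converted differently), using symmetric midpoints as the reference.
mix_effect  <- sum((medium$mix_end - medium$mix_start) *
                     (medium$rate_start + medium$rate_end) / 2)
perf_effect <- sum((medium$mix_start + medium$mix_end) / 2 *
                     (medium$rate_end - medium$rate_start))

round(c(total = overall_end - overall_start,
        mix = mix_effect, performance = perf_effect), 5)
# mix + performance reconstructs the total gap exactly
```

With these made-up numbers, the mix effect accounts for roughly a third of the gap on that step, in line with the quote above.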
Segment performance analysis
Now, within each dimension and each step of the funnel, beyond mix effects, you might wonder which specific segment is driving the observed gap. For instance, within the device dimension, maybe the mobile segment's conversion from Session to Checkout simply dropped, driving the overall conversion down on that step. Segment performance analysis automates that approach for you.
The outcome basically gives you the biggest segment driving the gap up (Max) or down (Min) for each step and each dimension, specifying the change in conversion of that specific segment relative to the overall change. This is important to capture, in order to understand whether that segment is the biggest driver because most of the traffic goes through it or because it is actually facing an abnormal change in its conversion.
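Here is a rough sketch of how such a relative-change multiplier can be computed, reusing the same made-up table as in the mix effect sketch; the exact definition of the “x X” figure in the tool may differ.

```r
# Same made-up Checkout/Session table as in the mix effect sketch above.
medium <- data.frame(
  segment    = c("organic", "cpc", "referral", "direct"),
  mix_start  = c(0.43, 0.12, 0.20, 0.25),
  mix_end    = c(0.48, 0.11, 0.18, 0.23),
  rate_start = c(0.030, 0.090, 0.080, 0.075),
  rate_end   = c(0.020, 0.088, 0.078, 0.074)
)

# Overall (mix-weighted) Checkout/Session change between Start and End.
overall_change <- sum(medium$mix_end * medium$rate_end) /
                  sum(medium$mix_start * medium$rate_start) - 1

# Relative conversion change per segment, and how it compares to the overall
# change. A multiplier close to 1 means the segment simply moved with the
# average; a large one flags a segment whose own conversion shifted abnormally.
medium$rel_change <- medium$rate_end / medium$rate_start - 1
medium$multiplier <- round(medium$rel_change / overall_change, 2)

medium[order(-medium$multiplier), c("segment", "rel_change", "multiplier")]
```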
“Hmm… Looks like the drop on Checkout/Session is mainly driven by male users landing on the Home page and using their desktop. But that is just because those are also the segments getting most of the sessions. The relative drop on those segments isn't much higher than the average drop (you can see that in the “x X” in parentheses staying close to 1). The only thing that might be worth digging into deeper is the drop on organic, going down x2.43 faster than the average, which is concerning…”
Detailed view
As we know that details matter, once you have identified the main drivers, you might find it useful to look more closely at what is really happening for a given step and a given dimension. The detailed view allows you to do this, showing the mix (X axis) and conversion (Y axis) variations of each segment between Start (beginning of the arrow) and End (end of the arrow). You also have the ability to deep dive into the raw data.
“This is where you see the organic segment going from 43% to 48% of sessions while at the same time experiencing a huge drop in its already low conversion. Changes in SEO might be causing this. This is where I would like to investigate.”
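For illustration, here is a minimal ggplot2 sketch of such an arrow view, again using the made-up segment table from above; the column names and figures are assumptions, not the actual Google Analytics data.

```r
library(ggplot2)

# One arrow per segment: from (mix, conversion) at Start to (mix, conversion) at End.
detail <- data.frame(
  segment    = c("organic", "cpc", "referral", "direct"),
  mix_start  = c(0.43, 0.12, 0.20, 0.25),
  mix_end    = c(0.48, 0.11, 0.18, 0.23),
  rate_start = c(0.030, 0.090, 0.080, 0.075),
  rate_end   = c(0.020, 0.088, 0.078, 0.074)
)

ggplot(detail) +
  geom_segment(aes(x = mix_start, y = rate_start,
                   xend = mix_end, yend = rate_end, colour = segment),
               arrow = arrow(length = unit(0.25, "cm"))) +
  scale_x_continuous("Share of sessions (mix)", labels = scales::percent) +
  scale_y_continuous("Checkout / Session", labels = scales::percent) +
  ggtitle("Detailed view: mix vs. conversion, Start to End")
```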
Conclusion
All right, now you know what's going on: “My conversion is going down mainly due to more, and less qualified, sessions coming in from the organic medium.” You probably want to have a chat with the person in charge of SEO and see how to fix this.
The good thing is that all those charts come together in a single dashboard in the viz tool you are already familiar with, so you can quickly share it with your stakeholders to support action.
The even better thing is that, because the whole back end is just simple R code, it can easily be replicated for other business problems, other metrics, other dimensions, other ‘Start’ and ‘End’ definitions… and it runs in just a few seconds.
Do you want to try this out for your own company and business case? The demo is free and fully confidential; just send a message here and we'll get back to you within a few days. It's simple. It's not a six-month, expensive BI project. All we need is a flat table.
Also, please share with your network if you found this useful!