Decomposition: Low-Hanging Econometric Fruit for the Business World
Breakdown, it's alright.
Econometrics in Application
In my side gig providing econometric consulting in the private sector, I very quickly got used to the fact that a lot of what econometricians do is, for businesspeople, either overkill or underappreciated, depending on who you ask: working too hard to address concerns that this particular strain of real-world people don’t actually worry that much about.
However, there has been one consistent exception to this. A tool from econometrics that, whenever I bring it up in a private-sector setting, is met with “oh my God how did you do that?” rather than “why did you bother to do that?” Despite this response, it’s a tool that I’ve seen becoming less common in econometrics classes. That tool is decomposition, also known as Oaxaca decomposition, or Oaxaca-Blinder decomposition, or Kitagawa-Oaxaca-Blinder decomposition, depending on how many of its original authors you’re in the mood to acknowledge on that particular day.1 In this article I’ll call it KOB decomposition.
KOB solves a simple problem. That problem is: We notice that something is different between two different observations - perhaps a mean changes over time (monthly sales went up!) or across settings (company A’s monthly sales are higher than company B’s!). We also know that we have some compositional feature that is different between those two observations (monthly sales went up, but also we launched a bunch of new products, so we have more products to make sales on).
KOB asks: how can we separate the difference in the mean to differences in compositional features vs. other kinds of changes? How much of our increase in sales was because we started selling more products, how much was because the products we had started selling more, and how much is something else entirely? And it provides that answer. That seems like a pretty useful answer to have. And when I present it, others seem to agree. So what is it, exactly?
How it Works
Let’s walk through the (pretty basic) math underlying this process. We’ll stick to a single explanatory variable in a linear model, and a single level of decomposition, putting a pin in the fact that we can be more general; we’ll come back to that.
As an example, let’s say we’re a retail-oriented business with two kinds of storefronts: mini-storefront (K)iosks, and full-size (S)tores. You observe that the kiosks have considerably higher profits: $50k per month vs. $40k on average. Kiosks seem great! But you also know that kiosks tended to get placed in more-expensive areas with higher-income customers, while the full-size stores went in lower-income areas where floor space was cheaper.
If we assume a linear effect of consumer incomes on profits, we can write

ProfitKi = αK + βK·IncomeKi + εKi
ProfitSi = αS + βS·IncomeSi + εSi

where Ki is each individual kiosk in your data, and Si is each individual full-size store.
We can take average profits from both of these equations (keeping in mind those εs will average out to 0, since we have our intercept α terms), subtract one from the other, and do some basic algebra!

mean(ProfitKi) − mean(ProfitSi) = βK·mean(IncomeKi) − βS·mean(IncomeSi) + (αK − αS)

Now we can add and subtract βK·mean(IncomeSi) to get

mean(ProfitKi) − mean(ProfitSi) = βK·(mean(IncomeKi) − mean(IncomeSi)) + (βK − βS)·mean(IncomeSi) + (αK − αS)
This is our decomposition! How is it useful? We can split the interpretation into two parts:
βK·(mean(IncomeKi) − mean(IncomeSi))

This first part is the amount of the gap in profit between the two store types that can be explained using our observed income variable. In other words, of the $10k monthly profit gap we observe, how much of that is just because kiosks are in wealthier areas? If we were to, for example, turn the full-size stores into kiosks, we wouldn’t expect this part of the profit gap to translate over. The remaining part,

(βK − βS)·mean(IncomeSi) + (αK − αS)

cannot be explained by differences in the level of the explanatory variable. Instead it mixes differences in how important that explanatory variable is for each group (the (βK − βS)·mean(IncomeSi) piece) - this is the part of the profit gap for which we might say something like “$X of the gap is explained by the fact that kiosks are better at taking advantage of higher-income environments” (or perhaps this element is negative - maybe the kiosks are worse at taking advantage, but are in more advantageous environments anyway - you can see how nuanced our interpretation can get here for something fairly simple!).
Finally there’s the difference in those α values - these are differences unrelated to our explanatory variable. It could be that there really is some special secret sauce to the kiosks, or it could be some other explanatory variable we left out (are kiosks not just in higher-income environments but also higher foot-traffic environments?).
That’s the rundown. A pretty straightforward application of some algebra to a difference in some means, that lets you really break down where the difference is coming from.
Let’s imagine some numbers we might get here. I generated some random data with 500 kiosks and 500 full-size stores, with local incomes in the area drawn from a standard normal, with a .5 boost for kiosks, and profits drawn from a uniform from $30,000 to $50,000, plus $10,000 times the income, plus a $5,000 boost for kiosks generally.
When I run a basic Oaxaca decomposition on this,2 I find a difference in average profits of $9,568. This is split into a $4,751 gap explained by customer income differences (which is roughly what it should be - Kiosks got a .5-unit boost, times the $10,000-per-unit profit boost), and a $4,817 gap not explained by consumer income differences (again, roughly what it should be - the $5,000 direct boost to kiosks I baked into the data).
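Since not everyone has Stata handy, here’s a minimal sketch of the same exercise in Python, doing the twofold decomposition by hand with numpy. The variable names and random draws are mine, so the exact figures will differ a bit from the numbers above, but they should land near the same $5k/$5k split.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # stores per group

# Local income: standard normal, with a 0.5 boost for kiosk locations
income_k = rng.standard_normal(n) + 0.5
income_s = rng.standard_normal(n)

# Profit: uniform $30k-$50k base, plus $10k per unit of income,
# plus a $5k direct boost for kiosks
profit_k = rng.uniform(30_000, 50_000, n) + 10_000 * income_k + 5_000
profit_s = rng.uniform(30_000, 50_000, n) + 10_000 * income_s

# Fit a separate linear regression for each group
beta_k, alpha_k = np.polyfit(income_k, profit_k, 1)
beta_s, alpha_s = np.polyfit(income_s, profit_s, 1)

# Twofold KOB decomposition of the gap in mean profits
gap = profit_k.mean() - profit_s.mean()
explained = beta_k * (income_k.mean() - income_s.mean())
unexplained = (beta_k - beta_s) * income_s.mean() + (alpha_k - alpha_s)

print(f"gap: {gap:,.0f}, explained: {explained:,.0f}, unexplained: {unexplained:,.0f}")
```

Because each group’s regression line passes through that group’s means, the explained and unexplained pieces add up to the raw gap exactly (up to floating-point error).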
If this kind of breakdown feels familiar, you might have seen something similar from the world of accounting (a breakdown I suspect at some point was inspired by KOB, although I don’t really know), using the fact that revenues = price times volume (or, across many products, imagine “average price” instead of P):

RB − RA = PB·VB − PA·VA = VB·(PB − PA) + PA·(VB − VA)

which shows you fairly directly how revenue changes from time A to time B can be split up: VB(PB − PA) is how much revenues changed purely because of price changes, and PA(VB − VA) is how much revenues changed purely because of volume changes.
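If you want to convince yourself that the accounting identity works, it’s a two-line calculation. Here’s a quick sketch with made-up prices and volumes (the numbers are purely illustrative):

```python
# Hypothetical numbers: price goes from $10 to $12, volume from 100 to 110 units
p_a, v_a = 10.0, 100.0  # time A
p_b, v_b = 12.0, 110.0  # time B

revenue_change = p_b * v_b - p_a * v_a  # total change in revenue
price_part = v_b * (p_b - p_a)          # change due purely to price
volume_part = p_a * (v_b - v_a)         # change due purely to volume

print(revenue_change, price_part, volume_part)  # 320.0 220.0 100.0
```

The two pieces sum exactly to the total change, just as the explained and unexplained parts do in the KOB version.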
That basic calculation is, more or less, all you need. Pretty simple!
There’s more you can do - adding more than one predictor (maybe foot traffic and store age in addition to just customer income) is certainly an option. You can do a “threefold decomposition” which adds an element that allows for a shared/interacted impact of predictor changes and effect-of-predictor changes.3 This is a decent walkthrough of these options, as well as a more detailed technical walkthrough in general and a discussion of some limitations. You can also do stuff like use nonlinear models or focus on non-mean parts of the distribution (like a median). And if your two settings are two different periods of time, you may want to take special care to account for time dependence when calculating standard errors on these estimates.
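To see what those threefold pieces look like, here’s a small numeric sketch using hypothetical per-group coefficients and predictor means (these numbers are invented for illustration, not taken from the simulation):

```python
# Hypothetical per-group regression results
beta_k, beta_s = 10_000.0, 9_000.0  # effect of income on profit in each group
mean_k, mean_s = 0.5, 0.0           # average income in each group

# Threefold decomposition of the slope-driven part of the gap
endowments = beta_s * (mean_k - mean_s)              # differences in predictor levels
coefficients = mean_s * (beta_k - beta_s)            # differences in the predictor's effect
interaction = (beta_k - beta_s) * (mean_k - mean_s)  # shared/interacted piece

# The three pieces add up exactly to beta_k*mean_k - beta_s*mean_s
total = beta_k * mean_k - beta_s * mean_s
print(endowments, coefficients, interaction, total)  # 4500.0 0.0 500.0 5000.0
```

The intercept difference (αK − αS) still sits outside these three pieces, just as it does in the twofold version.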
That said, if you’re not doing those things, this isn’t too difficult to do by hand. If you also want standard errors on all of this (or easy implementations of many of those extras), there are automatic software implementations. In R and Stata there are packages called oaxaca. In Python, the statsmodels package has you covered with its OaxacaBlinder class (in statsmodels.stats.oaxaca).
I’ve said a few times that I think the private sector would find this useful, and that I’ve gotten positive receptions from doing KOB decompositions before. What have I been doing exactly? When is this useful? What other uses might it have even in cases I haven’t done myself?
Most commonly, applications I’ve done have been single-variable decompositions looking at changes in revenues over time, somewhat along the lines of the accounting-style KOB I mentioned earlier (but with some of the additional detail we get from doing it the KOB way). A company sees sharp changes in revenue over time and wants to know why. Is it just the addition of new SKUs to their lineup? An expansion of customers? New locations? Or is the average customer increasing their spend, or is each individual SKU or location doing better? What share of the increase can be attributed to each?
I’ve done similar applications looking at the mix of products. How much of the standard seasonal variation in revenues is due to a shift towards premium products during certain times of year, as opposed to aggregate purchasing levels overall?
I’ve also done more-economics-less-accounting applications where the explanatory variable isn’t something like “product mix” or “number of SKUs/customers/locations” but something more external. The customer-income example from the previous section is based on a real application I did. There is, more generally, a great capacity to break down differences in performance into compositional differences vs. other kinds.
In general, it’s a common business analytics task to compare two different settings or time periods, notice that there’s a confounder keeping the comparison from being a clean one, and want to account for that confounder. Adjusting for the confounder via regression or matching is one approach, and lets you figure out how much of a gap remains afterward.
But the nice thing about decomposition is that it gives you an easily-interpretable second layer of results: not only can you adjust for sources of difference, but you can measure how much of the gap is because of those differences, how much is left over, and get confidence intervals on both (and break down both of those by each confounder if you have more than one!). Apples-to-apples comparisons become easier to understand and reason about in a detailed way, and any time there’s a need for that, KOB can help.
Kitagawa got to it first in a 1955 JASA article. Oaxaca and Blinder independently (both independently of each other and independently of Kitagawa) proposed similar methods in 1973. So sayeth Wikipedia. I had only ever heard of it as Oaxaca or Oaxaca-Blinder when I learned it the first time around; the push to recognize Kitagawa’s contributions is, at least to this economist and I think most applied econometricians, recent.
I decided to run this simulation in Stata, since I happen to be on a computer that has Stata installed for the first time in ages, so why not? Here’s the code:
* ssc install oaxaca
clear
set obs 1000
set seed 1000
g Type = "Kiosk"
replace Type = "Full Size" if _n > 500
g Kiosk = Type == "Kiosk"
g Income = rnormal() + .5*(Type == "Kiosk")
g Profit = runiform()*20000 + 30000 + 10000*Income + 5000*(Type == "Kiosk")
oaxaca Profit Income, by(Kiosk) pooled
Notice how the two parts are “difference in coefficients times a single predictor average” and “difference in predictor average times a single coefficient”? The interacted term is “difference in coefficients times difference in predictor average”.