Clusterjerk, the much anticipated sequel

This was a week of nerdily viral statistics posts on my blog. A few days ago I talked about the knee-jerk clustering of standard errors in so many papers, and whether we should ever do this in individually-randomized experiments.

Turns out a lot of you have opinions and answers. Thanks for that.

For me the best news was an email from Stanford econometrician Guido Imbens:

The bottom line is that you are right. You have a sample, collected in whatever way. You do a randomized experiment with randomization at the unit level. Then you do not need to cluster. Without doing any cluster adjustment your standard robust variance estimator will lead to the correct inferences for the average effect for the (convenience) sample you have.

He backed this up with a forthcoming working paper of his with Alberto Abadie, Susan Athey, and Jeffrey Wooldridge. This is basically the Mount Rushmore of applied econometrics, so: game, set, and match?

Not entirely. The paper is not yet ready for circulation (wait for early 2016). So let me outline some of the other intuitions and proofs people argued.

First, there are proofs out there. From one commenter:

This review article by Cameron and Miller is helpful. On page 21, they write:

First, given V[b] defined in (7) and (9), whenever there is reason to believe that both the regressors and the errors might be correlated within cluster, we should think about clustering defined in a broad enough way to account for that clustering. Going the other way, if we think that either the regressors or the errors are likely to be uncorrelated within a potential group, then there is no need to cluster within that group.

I think the second sentence is what you care about.

You can see the idea more easily in the parameterized Moulton formula, equation (6) in the paper. The equation shows that the “inflation” is a product of the within-cluster correlation in the regressor (treatment) and the within-cluster correlation in the outcome. If either of those terms is equal to 0, then there is no variance inflation to worry about. In an individual-level randomized experiment, the within-cluster correlation in the treatment will be zero, and so there is no need to cluster.

The one-regressor example that they give in section IIA applies well to experiments with person-level random assignment, and it does not use the parametric Moulton approach. It sets things up with the sort of clustered standard errors that come out of the Stata cluster option.

I received similar feedback from Cyrus Samii, among others.
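
To make the Moulton logic concrete, here is a minimal sketch in Python. It assumes the approximate form of the inflation factor as I read it in Cameron and Miller, 1 + rho_x * rho_u * (average cluster size - 1), where rho_x is the within-cluster correlation of the regressor and rho_u is the within-cluster correlation of the errors. The function name and the numbers are made up for illustration and do not come from the paper or from any study.

```python
def moulton_factor(rho_x, rho_u, avg_cluster_size):
    """Approximate variance inflation from ignoring clustering.

    rho_x: within-cluster correlation of the regressor (here, treatment)
    rho_u: within-cluster correlation of the errors
    avg_cluster_size: average number of observations per cluster
    """
    return 1.0 + rho_x * rho_u * (avg_cluster_size - 1)


# Illustrative numbers only: 20 units per cluster and a within-cluster
# error correlation of 0.5.
# Individual-level randomization: treatment is (roughly) uncorrelated
# within cluster, so rho_x is taken to be 0.
individual = moulton_factor(rho_x=0.0, rho_u=0.5, avg_cluster_size=20)
# Cluster-level randomization: everyone in a cluster shares the same
# treatment status, so rho_x is 1.
clustered = moulton_factor(rho_x=1.0, rho_u=0.5, avg_cluster_size=20)

print(individual)        # 1.0: no variance inflation
print(clustered)         # 10.5: unclustered variance is too small by 10.5x
print(clustered ** 0.5)  # the same thing on the standard error scale
```

The point of the two calls is that the error correlation is identical in both designs. Only the treatment correlation changes, and that alone switches the inflation on or off.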

That’s theory. John Horton, a professor at NYU, generously and heroically decided to simulate the answer in a post titled Monte Carlo Clusterjerk. (Jeff Mosenkis at IPA correctly pointed out that this should be the name of our indie band. John, I’m down, and I play a mean kazoo.)

John basically shows the above logic with an empirical example, and here is a cool looking graph without context to prove it:
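
If you want to try this kind of check yourself, here is a minimal Monte Carlo sketch in Python. To be clear, this is not John's code and it does not reproduce his graph. It is a toy version under my own assumptions: 20 clusters of 20 units, outcomes equal to a shared cluster shock plus individual noise (both standard normal), a true treatment effect of zero, and the usual unclustered standard error for a difference in means. The function names are hypothetical. It compares the average unclustered standard error to the actual spread of the estimates across simulated experiments.

```python
import random


def mean_var(xs):
    """Return the sample mean and the (n - 1) sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def se_ratio(assign_by_cluster, n_clusters=20, cluster_size=20,
             n_reps=400, seed=0):
    """Average unclustered SE divided by the true SD of the estimates.

    A ratio near 1 means the unclustered SE is about right; a ratio
    well below 1 means it understates the uncertainty.
    """
    rng = random.Random(seed)
    cluster_of = [g for g in range(n_clusters) for _ in range(cluster_size)]
    n = len(cluster_of)
    estimates, naive_ses = [], []
    for _ in range(n_reps):
        # Outcomes are correlated within cluster through a shared shock.
        shock = [rng.gauss(0, 1) for _ in range(n_clusters)]
        y = [shock[g] + rng.gauss(0, 1) for g in cluster_of]
        if assign_by_cluster:
            # Treat half of the clusters, everyone inside them.
            treated = set(rng.sample(range(n_clusters), n_clusters // 2))
            treat = [g in treated for g in cluster_of]
        else:
            # Treat half of the individuals, ignoring clusters.
            treat = [i < n // 2 for i in range(n)]
            rng.shuffle(treat)
        y1 = [yi for yi, t in zip(y, treat) if t]
        y0 = [yi for yi, t in zip(y, treat) if not t]
        m1, v1 = mean_var(y1)
        m0, v0 = mean_var(y0)
        estimates.append(m1 - m0)
        naive_ses.append((v1 / len(y1) + v0 / len(y0)) ** 0.5)
    mean_se, _ = mean_var(naive_ses)
    _, var_est = mean_var(estimates)
    return mean_se / var_est ** 0.5


ratio_individual = se_ratio(assign_by_cluster=False)
ratio_cluster = se_ratio(assign_by_cluster=True)
print("individual-level randomization:", round(ratio_individual, 2))
print("cluster-level randomization:  ", round(ratio_cluster, 2))
```

Under these assumptions the first ratio should land close to 1 and the second well below 1, which is the pattern the theory above predicts. The exact numbers depend on the seed and the made-up parameters.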

From another commenter, here is another way to think about the issue:

Others have already pointed out that looking at the Moulton inflation factor shows that when assignment is at the individual level, clustering isn’t necessary. If you’re like me, this probably doesn’t help much with the intuition though. Here’s an alternate explanation: if you assume each unit has a defined outcome under treatment and control (i.e. SUTVA), then each treatment and control unit in a randomized experiment is a random draw from one of two distributions: the distribution of potential treatment outcomes and the distribution of potential control outcomes. (This is a slight simplification. For full details see page 87 of Imbens and Rubin, 2015.) Thus, the treatment and control means are averages of independent, identically distributed variables, and the usual estimate of the variance (without clustering) is justified. Note that this explanation does not make any assumptions whatsoever about the distribution of potential outcomes in the overall population (other than the basic stuff necessary for the CLT to hold). Also note that this would not be the case if you randomized at a higher level.
On a related note, these same issues are present when testing for baseline balance in a randomized experiment. I have seen quite a number of papers where the authors randomize at a group level, then do balance tests at the unit level, and erroneously come to the conclusion that their randomization failed.
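
The balance-test problem is easy to see in a toy simulation. The sketch below is my own illustration, not anything from the commenter: it assumes 20 clusters of 20 units, a baseline covariate made of a shared cluster shock plus individual noise, and treatment assigned to half of the clusters at random. Randomization is valid by construction, so a well-behaved 5 percent test should reject balance about 5 percent of the time. The function names and the critical value of 2.101 (the two-sided 5 percent cutoff for a t distribution with 18 degrees of freedom) are my choices for this example.

```python
import random


def t_stat(a, b):
    """Difference in means divided by its unclustered standard error."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5


def false_rejection_rates(n_clusters=20, cluster_size=20,
                          n_reps=400, seed=1):
    """Share of valid cluster randomizations flagged as unbalanced.

    Returns (unit-level test rate, cluster-mean test rate).
    """
    rng = random.Random(seed)
    unit_rejections = 0
    cluster_rejections = 0
    for _ in range(n_reps):
        treated = set(rng.sample(range(n_clusters), n_clusters // 2))
        unit_x = {True: [], False: []}
        cluster_means = {True: [], False: []}
        for g in range(n_clusters):
            # Baseline covariate: shared cluster shock plus unit noise.
            shock = rng.gauss(0, 1)
            xs = [shock + rng.gauss(0, 1) for _ in range(cluster_size)]
            arm = g in treated
            unit_x[arm].extend(xs)
            cluster_means[arm].append(sum(xs) / cluster_size)
        # Naive test: treats all 400 units as independent.
        if abs(t_stat(unit_x[True], unit_x[False])) > 1.96:
            unit_rejections += 1
        # Test at the level of randomization: 10 vs. 10 cluster means.
        if abs(t_stat(cluster_means[True], cluster_means[False])) > 2.101:
            cluster_rejections += 1
    return unit_rejections / n_reps, cluster_rejections / n_reps


unit_rate, cluster_rate = false_rejection_rates()
print("unit-level balance test rejects:   ", unit_rate)
print("cluster-level balance test rejects:", cluster_rate)
```

With these made-up parameters the unit-level test should reject far more often than 5 percent, even though nothing went wrong with the randomization, while the test on cluster means should stay near the nominal rate.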

Or also from John Horton:

One closing thought, a non-econometric argument why clustering can’t be necessary for a true experiment with randomization at the individual level: for *any* experiment, presumably there is some latent (i.e., unobserved to the researcher) grouping of the data such that the errors within that group are correlated with each other. As such, we could never use our standard tools for analyzing experiments to get the right standard errors if taking this latent grouping into account was necessary.

19 Responses

  1. Chris,

    Thank you for your informative posts on the clustering issue. Initially I felt quite strongly that you were wrong about clustering being unnecessary in many contexts, but reading over this new post and the paper you mentioned, I’ve come over to your side on this issue.

    The way I’m interpreting it, at least in intuitive terms, is as follows: We want to estimate two means, say—one for the treatment and one for the control. The exogenous regressor was assigned and measured at the observation level, and we’re making independent draws from the population. Even though the error terms are correlated for some pairs of observations, each draw from the population (our sampling process) is independent, so this autocorrelation has no effect on our ability to measure the likely value (or distribution) of y_i for the next observation that we’d draw–i.e., the independence that’s relevant is that of our sampling process, not of the underlying process determining y_i.

    The cases in which clustering is going to be necessary are when the regressor is applied/measured at the group level (e.g., using state or village-level proxies to measure individual characteristics) or when the sampling process is not independent (e.g., sampling families, interviewing each member, and making inferences about the overall population, as in the PSID).

    Regards,
    Chris