The Central Limit Theorem: Misunderstood

Asymptotics are the building blocks of many models. They’re basically lego: sturdy, functional and capable of allowing the user to exercise great creativity. They also hurt like hell when you don’t know where they are and you step on them accidentally. I’m pushing it on the last, I’ll admit. But I have gotten very sweary over recalcitrant limiting distributions in the past (though I may be in a small group there).

One of the fundamentals of the asymptotic toolkit is the Central Limit Theorem, or CLT for short. If you didn’t study eight semesters of econometrics or statistics, then it’s something you (might have) sat through a single lecture on and walked away with the hot take “more data is better”.

The CLT is actually a collection of theorems, but the basic entry-level version is the Lindeberg-Lévy CLT. It states that for any sample of n independent observations drawn from the same distribution with finite mean (μ) and finite standard deviation (σ), if we calculate the sample mean x̄ then,

$$\sqrt{n}\,\frac{\bar{x} - \mu}{\sigma} \;\xrightarrow{d}\; N(0,\,1) \quad \text{as } n \to \infty$$

In my time both in industry and in teaching, I’ve come across a number of interpretations of this result: many of them very wrong from very smart people. I’ve found it useful to clarify what this result does and does not mean, as well as when it matters.

Not all distributions become normal as n gets large. In fact, most things don’t “tend to normality” as n gets large. Often, they just get really big or really small. Some quantities are asymptotically normal, but most “things”, estimators and distributions alike, are not.
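If you want to see that for yourself, here’s a minimal Python sketch (my own illustration, not from the original post; the Uniform(0, 1) draws, the seed and the sample sizes are arbitrary choices). The sum of the draws just keeps growing with n, and the sample maximum piles up against the upper endpoint; neither settles into a bell curve on its own.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

for n in (10, 1_000, 100_000):
    x = rng.uniform(0.0, 1.0, size=n)  # n draws from Uniform(0, 1)
    print(f"n={n:>7}: sum of draws = {x.sum():10.1f}   sample max = {x.max():.4f}")
```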

The sample mean by itself does not become normal as n gets large. What would happen if you added up a huge series of numbers? You’d get a big number. What would happen if you divided that big number by the huge count of numbers you added up? Go on, whack some experimental numbers into your calculator!

Whatever you put into your calculator, it’s not a “normal distribution” you get when you’re done: the sample mean just homes in on the true population mean, collapsing towards a single point rather than spreading out into a bell curve. The sample mean alone does not tend to a normal distribution as n gets large.
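Here’s another quick Python sketch (again my own, with an assumed Exponential(1) parent and arbitrary sample sizes): as n grows, the sample means don’t spread out into anything, they just huddle closer and closer around the true mean.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
true_mean = 1.0                 # an Exponential(1) distribution has mean 1

for n in (10, 1_000, 10_000):
    # 1,000 replications of "draw n observations and take the sample mean"
    means = rng.exponential(scale=true_mean, size=(1_000, n)).mean(axis=1)
    print(f"n={n:>6}: average of the sample means = {means.mean():.4f}   "
          f"spread (sd) of the sample means = {means.std():.5f}")
```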

The studentised sample mean has a distribution which is normal in the limit. There are some adjustments we need to make before the sample mean has a stable limiting distribution: subtract the population mean and divide by σ/√n. The adjusted quantity is the one often known as the z-score, and it’s this quantity that tends to normality as n gets large.
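To see that adjustment doing its work, here’s a sketch (my own, with an assumed Exponential(1) parent, which is quite skewed, and arbitrary choices for n and the number of replications). The z-scores come out with mean near 0, standard deviation near 1 and far less skewness than the parent distribution.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
mu, sigma = 1.0, 1.0            # Exponential(1): mean 1, standard deviation 1
n, reps = 50, 20_000            # sample size and number of simulated samples

def skewness(a):
    """Simple moment-based skewness (0 for a symmetric distribution)."""
    a = np.asarray(a, dtype=float)
    return ((a - a.mean()) ** 3).mean() / a.std() ** 3

x = rng.exponential(scale=mu, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma  # the studentised sample mean (z-score)

print(f"mean of z: {z.mean():+.3f}   sd of z: {z.std():.3f}")
print(f"skewness of the raw draws: {skewness(x.ravel()):.2f}   skewness of z: {skewness(z):.2f}")
print(f"P(|z| > 1.96) ~= {np.mean(np.abs(z) > 1.96):.3f}   (standard normal: about 0.05)")
```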

How large does n need to be? This theorem works for any distribution with a finite mean and standard deviation, i.e. as long as x comes from a distribution with these features. Generally, statistics texts quote the figure of n=30 as a “rule of thumb”. This works reasonably well for simple estimators and models like the sample mean in a lot of situations, though the more skewed or heavy-tailed the parent distribution, the larger the n you’ll need.
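And here’s one last sketch to kick the tyres on that rule of thumb (my own, not from the post; the Exponential(1) parent and the sample sizes are arbitrary choices). It checks how often the z-score falls below -1.645, which should be about 5% of the time if the normal approximation is good; with a right-skewed parent like this, the lower tail is where the approximation tends to struggle.

```python
import numpy as np

rng = np.random.default_rng(2)       # arbitrary seed
mu, sigma, reps = 1.0, 1.0, 50_000   # Exponential(1) parent, 50,000 replications

for n in (5, 30, 200):
    x = rng.exponential(scale=mu, size=(reps, n))
    z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma
    lower_tail = np.mean(z < -1.645)  # about 0.05 if z were standard normal
    print(f"n={n:>3}: P(z < -1.645) ~= {lower_tail:.3f}   (standard normal: about 0.05)")
```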

This isn’t to say, however, that if you have “big data” your problems are gone. You just got a whole different set, I’m sorry. That’s a different post, though.

So that’s a brief rundown on the simplest of central limit theorems: it’s not a complex or difficult concept, but it is a subtle one. It’s the building block upon which models such as regression and logistic regression, and their known properties, have been built.

The infographic below is the same information, but for some reason my students find information in that format easier to digest. When it comes to asymptotic theory, I am disinclined to argue with them: I just try to communicate in whatever way works. On that note, if this post was too complex or boring, here is the CLT presented with bunnies and dragons.** What’s not to love?

[CLT infographic]

** I can’t help myself: the reason the distribution of average bunny weights gets narrower as the sample size gets larger is that the sample mean is tending towards the true population mean. For a discussion of this behaviour vs the CLT see here.

It’s my only criticism of what was an otherwise delightful video. Said video being in every way superior to my own version, done late one night for a class with my dog assisting and my kid’s drawing book. No bunnies or dragons, but it’s here.