One of the perennial discussions is normal vs t distributions: which do you use, when, why and so on. This is one of those cases where for most sample sizes in a business analytics/data science context it probably makes very little practical difference. Since that’s such a rare thing for me to say, I thought it was worth explaining.
Now I’m all for statistical rigour: you should use the right one at the right time for the right purpose, in my view. However, this can be one of those cases where if the sample size is large enough, it’s just not that big a deal.
The actual simulations I ran are very simple, just 10 000 draws from normal and t-distributions with the t varying at different degrees of freedom. Then I just plotted the density for each on the same graph using ggplot in R. If you’d like to have a play around with the code, leave a comment to let me know and I’ll post it to github.