This post is a curated list of my favourite tutorials and guides because “that one where Hadley Wickham was talking about cupcakes” isn’t the most effective search term. You can find my list of cheat sheets here. There are a lot of great resources on data science (I’ve included my top picks), so I don’t intend to reinvent the wheel here. This is just a list of my favourites all laid out so I can find them again or point other people in their direction when it comes up in conversation. I’ve also added a number of the “how to” type posts I’ve written on this blog as I often answer an enquiry in that format.

**Data Science**

- Not So Standard Deviations podcast: very funny, very accessible.
- A wealth of information on data science at the Becoming a Data Scientist website here.
- Where to start with data mining and data science.
- What I’d say to a new data scientist: read Jane Austen.
- Yes, you can: learn data science.

**Tutorials and videos: General**

- Hadley Wickham making everyone a better programmer with cupcakes. (This is a man who knows his audience.)
- Sharing data amongst nerds: how-to guide by Jeff Leek (not a puppet).

**Puppets teach data science too**

- Render a 3D object in R. I have no idea where I would ever use this information in my practice, but it’s presented BY A PUPPET. Great fun.
- DIY your data science. Another offering from the puppet circle on the data science Venn diagram.

**Econometrics**

- Censored, truncated, categorical data: the differences.
- Why you should care about autocorrelation, a.k.a. the horrible things it can do to your modelling.
- Not sure what a confound is? Not your problem? You’ll get it after reading this.
- Elasticity and marginal effects: super simple explanation.

**Statistics**

- What is a probability distribution?
- Probability cheat sheet: really useful if you’re new to the concepts of thinking probabilistically.
- Conditional probability: visual guide.
- Ten simple rules for effective statistical practice from the ASA: very useful reminder.
- Explaining histograms. Fantastic .gif here; even better, find the code here.
- Visualise correlations: weak to strong.
- Statistical significance as explained by The Economist.
- Starting a data analysis: questions to ask the first time.
- More questions to ask: data analysis.
- Data analysis: enough with the questions already.
- P-values: very thorough guide. Don’t throw that baby out with the bathwater. I’m a committed worshipper at the altar of Neyman-Pearson, but know what you’re doing with them.
- And its opposite: the p-hacker app. Have a good laugh.
- Correlation vs causation. It matters.
- New to reading data analysis? Check out this chart.

**Work Flow**

- Guide to modern statistical workflow. Really great organisation of background material.
- Tidy data, tidy models. Honestly, if I could have had one thing 10 years ago, I wish it had been this. The time saved and the accuracy gained using this method are phenomenal.
- Extracting data from the web. You found the data, now what to do? Look here.

**Linear Algebra**

- What on God’s green earth are eigenvalues and eigenvectors? Not instruments of torture. Look here.
- Shiny app for estimating principal components. The author is on my list of “favourite nerds” just for this.

**Asymptotics**

- Law of large numbers vs central limit theorem: in gif form.
- Law of large numbers.
- Visualise the central limit theorem.
- Understanding asymptotics in practice: at what sample size do correlations stabilise?
- The central limit theorem explained.
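
If you want to poke at the difference between the two theorems yourself, here is a minimal sketch. It is mine, not taken from any of the links above, and the die-rolling setup, the function names and the seed are all just assumptions for illustration. It rolls a fair die `n` times, takes the mean, and repeats that a couple of thousand times for several values of `n`.

```python
import random
import statistics

TRUE_MEAN = 3.5
TRUE_SD = (35 / 12) ** 0.5  # standard deviation of one fair die roll


def sample_mean(n, rng):
    """Mean of n rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


def simulate(n, repeats=2000, seed=42):
    """Return `repeats` sample means, each computed from n rolls.

    The seed is fixed (an arbitrary choice) so the run is reproducible.
    """
    rng = random.Random(seed)
    return [sample_mean(n, rng) for _ in range(repeats)]


for n in (5, 50, 500):
    means = simulate(n)
    # Law of large numbers: the sample means settle around 3.5 as n grows.
    # Central limit theorem: their spread is close to TRUE_SD / sqrt(n).
    print(
        n,
        round(statistics.mean(means), 3),
        round(statistics.stdev(means), 3),
        round(TRUE_SD / n ** 0.5, 3),
    )
```

The last two columns should sit close to each other on every row: the observed spread of the sample means next to the spread the central limit theorem predicts. Plot a histogram of `means` for each `n` and you get the bell shape the gifs above show.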

**Bayes**

- Bayesian statistics for beginners. Includes a great discussion of frequentism vs bayesianism and no one got hurt.
- Visualising Bayesian updating: this is a fantastic way of making sense of the concept.
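
For the numerically inclined, updating can also be sketched in a few lines. This is my own toy example rather than anything from the links above: it assumes a coin with unknown bias, a flat Beta(1, 1) prior, and made-up batches of flips. With a Beta prior and coin-flip data, the update is just adding the counts.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing new coin flips.

    Relies on Beta-Binomial conjugacy: heads are added to alpha,
    tails are added to beta.
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Flat prior: every bias between 0 and 1 is equally plausible to begin with.
alpha, beta = 1, 1

# Hypothetical data: (heads, tails) observed in three successive batches.
batches = [(3, 1), (6, 4), (14, 6)]

for heads, tails in batches:
    alpha, beta = update_beta(alpha, beta, heads, tails)
    print(f"Beta({alpha}, {beta}) posterior mean = {beta_mean(alpha, beta):.3f}")
```

Each posterior becomes the prior for the next batch, which is the whole idea in miniature.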

**Machine learning**

- Cross validation methodology review.
- Variable selection in big data.
- An introduction in 15 hours of lectures.
- Random Forests in R.

**Data visualisation**

- Five tips for coding for data visualisation: particularly useful, I thought.
- Geographical data visualisation: comparisons.
- Walkthrough for tree mapping.
- Revisualisation: a really interesting and thoughtful discussion.
- Voronoi layers for maps.
- Helpful checklist for visualisation: remember it’s a start, not a recipe book.

**Natural Language Processing**

- Extending the word cloud.
- Social network analysis how-to guide.
- Social networks in the golden age of Latin literature.
- Creating a term-document matrix and retrieving text from Twitter: great tutorial here.
- This post has a number of resources I found useful when learning about simple NLP and word clouds.
- Term frequency and TF-IDF in R.
- Dimensionality reduction in text data. A very clear step-by-step.
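
If TF-IDF is new to you, the arithmetic is small enough to sketch by hand. The version below is my own illustration, not code from the tutorials above: it assumes whitespace tokenisation, toy documents I made up, and one common definition of the score (term frequency as a share of the document, inverse document frequency as the natural log of documents over documents containing the term). Real packages differ in the details.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = count of the term in the document / tokens in the document
    idf = ln(number of documents / documents containing the term)
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

for doc_scores in tf_idf(docs):
    top = max(doc_scores, key=doc_scores.get)
    print(top, round(doc_scores[top], 3))
```

A word like “the” appears in every document, so its score is zero however often it is used, while a word unique to one document floats to the top. That is the extension beyond a plain word cloud.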

I’ll continue to update this list as I find things I think are useful or interesting.

Edit: actually, “that one where Hadley Wickham was talking about cupcakes” is surprisingly accurate as a search term.