Doing a data analysis

You probably know a fair bit about the methods we can use to do data analysis: charts, statistics and a basic idea of what they mean. But it’s not always easy to see how to put all of that together in a usable context. How does it all fit when you need to ‘mine for insight’, as some of the more unfortunate business-speak would have it?

Let’s try this using the loanbook from the Australian arm of Ratesetter, which is a fintech company. Ratesetter specialises in marketplace lending: people can apply for a loan and other people can fund it. They publish their loanbook every quarter and the loanbook we’ll be using is from 30th September, 2017.

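library(readxl)   # provides read_xlsx()
library(ggplot2)  # loaded for plotting
# Skip the first 8 rows of the sheet; the next row supplies the column names
loanBook <- read_xlsx("data/20170930loanbook.xlsx", sheet = "RSLoanBook",
                      col_names = TRUE, skip = 8)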
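
After an import like the one above, it can help to check that what landed in R is what you expected before going any further. The sketch below is one way to do that, not a prescribed step: `describe_import()` is a hypothetical helper written for this illustration, and the `toy` data frame, including its column names and values, is invented so the example runs without the spreadsheet. With the real data you would pass `loanBook` instead.

```r
# Hypothetical helper (not from any package): summarise the shape of a
# freshly imported data frame.
describe_import <- function(df) {
  list(
    n_rows    = nrow(df),
    n_cols    = ncol(df),
    # First class of each column, e.g. "numeric" or "character"
    col_types = vapply(df, function(col) class(col)[1], character(1)),
    # Count of missing values in each column
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1))
  )
}

# Toy stand-in for loanBook; these columns and values are invented
# purely for illustration.
toy <- data.frame(
  amount  = c(5000, 12000, NA),
  purpose = c("Car", "Debt consolidation", "Other"),
  stringsAsFactors = FALSE
)

import_summary <- describe_import(toy)
str(import_summary)
```

If the row count, column types or missing-value counts look off, the `skip` and `sheet` arguments to `read_xlsx()` are the first things worth revisiting.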

Exploratory data analysis is a series of open-ended questions

I usually think of exploratory analysis as a series of questions I ask myself as I’m working with the data. This involves a lot of intense staring at the screen and muttering, which can be awkward in a shared workspace. If you have the mental capacity for it, I’d recommend asking the questions silently. If you’ve been following my blog, you’ll have seen these infographics - it’s time to dust them off and show how I’d use them with a real data set.

First questions

Below is the first(!) list of some of the questions I ask myself while I’m working through the data set. You’ll note that in my first list I don’t even ask myself about means, variances or charts. This goes against my every instinct as a data scientist. I’m here for the stats, not the admin! But every time I’ve skipped this step, I’ve regretted it, usually six months later when someone has come back to me and said “we need clarification on \(x\)” and I have absolutely no idea what on earth \(x\) was.