Data Analysis: More Questions

In our last post on data analysis, we asked a lot of questions. Data analysis isn’t a series of generic questions we can apply to every dataset we encounter, but generic questions can be a helpful way to frame the beginning of your analysis. This post is, simply, some more questions to ask yourself if you’re having trouble getting started.

The terminology I use below (tall, dense and wide) is due to Francis Diebold. You can find his original post here and it’s well worth a read.

Remember, these generic questions aren’t a replacement for a thoughtful, strategic analysis. But maybe they will help you generate your own questions to ask your data.

Data analysis infographic

Data Analysis: Questions to Ask the First Time

Data analysis is one of the most underrated but most important parts of data science/econometrics/statistics/whatever it is you do with data.

It’s not impressive when it’s done right because it’s like being impressed by a door handle: it is something that is both ubiquitous and obvious. But when you’re missing the door handles, you can’t open the door.

There are lots of guides to data analysis but fundamentally there is no one-size-fits-most approach that can be guaranteed to work for every data set. Data analysis is a series of open-ended questions to ask yourself.

If you’re new to data science, or coming to it from a background that did not emphasise statistics or econometrics (or storytelling with data in general), it can be hard to know which questions to ask.

I put together this guide to offer some insight into the kinds of questions I ask myself when examining my data for the first time. It’s not complete: work through this guide and you won’t have even started the analysis proper. This is just the first time you open your data, after all.

But by uncovering the answers to these questions, you’ll have a more efficient analysis process. You’ll also (hopefully) think of more questions to ask yourself.

Remember, this isn’t all the information you need to uncover: this is just a start! But hopefully it offers you a framework to think about your data the first time you open it. I’ll be back with some ideas for the second time you open your data later.


Congratulations to the Melbourne Data Science Group!

Last week, I attended the Melbourne Data Science Initiative and it was definitely the highlight of my data science calendar this year! The event was superbly organised by Phil Brierley and his team. Events included tutorials on Machine Learning, Deep Learning, Business Analytics and talks on feature engineering, big data and the need to invest in analytic talent amongst others.

The speakers were knowledgeable and interesting, with everything covered from the hilarious building of a rhinoceros catapult (thanks to Eugene from Presciient, it’s possible I’ll never forget that one) to the paramount importance of the “higher purpose” in business analytics, as discussed by Evan Stubbs from SAS Australia and New Zealand.

If you’re in or around Melbourne and into Data Science at all, check out the group who put on this event here.