Data Analysis: Enough with the Questions Already

We’ve talked a lot about data analysis lately. First we asked questions. Then we asked more. Hopefully when you’re doing your own analyses you have your own questions to ask. But sooner or later, you need to stop asking questions and start answering them.

Ideally, you’d really like to write something that doesn’t leave the reader with a keyboard imprint across their forehead due to analysis-induced narcolepsy. That’s not always easy, but here are some thoughts.

Know your story.

Writing up data analysis shouldn’t be about listing means, standard deviations and some dodgy histograms. Yes, sometimes you need that stuff, but mostly what you need is a compelling narrative. What is the data saying to support your claims?

It doesn’t all need to be there. 

You worked out that tricky bit of code and did that really awesome piece of analysis that led you to ask questions and… sorry, no one cares. If it’s not a direct part of your story, it probably needs to be consigned to telling your nerd friends on Twitter; at least they’ll understand what you’re talking about. But keep it out of the write-up!

How is it relevant?

Data analysis is rarely an end in and of itself. How does your analysis support the rest of your project? Does it offer insight for modelling or forecasting? Does it offer insight for decision making? Make sure your reader knows why it’s worth reading.

Do you have an internal structure?

Data analysis is about translating complex numerical information into text. A clear and concise structure for your analysis makes life much easier for the reader.

If you’re staring at the keyboard wondering whether checking every social media account you’ve had since high school is a valid procrastination option, try starting with “three important things”. Then maybe add three more. Now you have a few things to say and can build from there.

Who are you writing for?

Academia, business, government, your culture, someone else’s, fellow geeks, students… all of these have different expectations around communication. All of them are interested in different things. Try not to have a single approach for communicating analysis to different groups. Remember, what’s important to you may not be important to your reader.

Those are just a few tips for writing up your analyses. As we’ve said before: it’s not a one-size-fits-all approach. But hopefully you won’t feel compelled to give a list of means, a correlation matrix and four dodgy histograms that fit in the space of a credit card. We can do better than that!

Data Analysis: More Questions

In our last post on data analysis, we asked a lot of questions. Data analysis isn’t a series of generic questions we can apply to every dataset we encounter, but it can be a helpful way to frame the beginning of your analysis. This post is, simply, some more questions to ask yourself if you’re having trouble getting started.

The terminology I use below (tall, dense and wide) is due to Francis Diebold. You can find his original post here and it’s well worth a read.

Remember, these generic questions aren’t a replacement for a thoughtful, strategic analysis. But maybe they will help you generate your own questions to ask your data.

Data analysis infographic

Data Analysis: Questions to Ask the First Time

Data analysis is one of the most underrated, but most important, parts of data science/econometrics/statistics/whatever it is you do with data.

It’s not impressive when it’s done right because it’s like being impressed by a door handle: it is something that is both ubiquitous and obvious. But when you’re missing the door handles, you can’t open the door.

There are lots of guides to data analysis but fundamentally there is no one-size-fits-most approach that can be guaranteed to work for every data set. Data analysis is a series of open-ended questions to ask yourself.

If you’re new to data science, or coming to it from a background that did not emphasise statistics or econometrics (or storytelling with data in general), it can be hard to know which questions to ask.

I put together this guide to offer some insight into the kinds of questions I ask myself when examining my data for the first time. It’s not complete: work through this guide and you won’t have even started the analysis proper. This is just the first time you open your data, after all.

But by uncovering the answers to these questions, you’ll have a more efficient analysis process. You’ll also (hopefully) think of more questions to ask yourself.

Remember, this isn’t all the information you need to uncover: this is just a start! But hopefully it offers you a framework to think about your data the first time you open it. I’ll be back with some ideas for the second time you open your data later.
