Open datasets for analysis

So you’re a new data scientist and you’re exploring everything the internet has to offer (a lot). But having explored, you’re ready to try something on your own. Here is a (short) list of data sources you can tackle:

I’ll keep adding to the list as I come across interesting things.

Three things every new data scientist should know

Anyone who has spent any time in the online data science community knows that this kind of post is a genre all on its own. “N things you should know/do/be/learn/never do” is something that pops up in my Twitter feed several times a day. These posts range from useful ways to improve your own practice to clickbait listing reams of accomplishments that make Miss Bingley’s “accomplished young ladies” speech in Pride and Prejudice appear positively unambitious.

Miss Bingley’s pronouncement could easily be applied to data scientists everywhere:

“Oh! certainly,” cried his faithful assistant, “no [woman] can be really esteemed accomplished who does not greatly surpass what is usually met with. A woman must have a thorough knowledge of music, singing, drawing, dancing, and the modern languages, to deserve the word; and besides all this, she must possess a certain something in her air and manner of walking, the tone of her voice, her address and expressions, or the word will be but half-deserved.”

Swap the references to women for “data scientist”, throw in a different skill set, and there we have it:

“Oh! certainly,” cried his faithful assistant, “no data scientist can be really esteemed accomplished who does not greatly surpass what is usually met with. A data scientist must have a thorough knowledge of programming in every conceivable language that was, is or shall be, linear algebra, business acumen, obscure models only ever applied in obscure places, and whatever is ‘hot’ this year, to deserve the title; and besides all this, she must possess a certain something in her air and manner of tweeting, the tone of her blogging, her LinkedIn profile, and be a snappy dresser, or the title will be but half-deserved.”

Put like that, you’d be forgiven for not allowing the Miss Bingleys of the world to define you.

If I had a list of things to say to new data scientists, it wouldn’t have much to do with data science at all:

  1. You define yourself and your own practice. Not Twitter, not an online community, not blogs from people who may or may not know your work. Data science is an incredibly broad array of people, ideas and tools. Maybe you’re in the middle of it, maybe you’re on the edge. That’s OK, it’s all valuable.
  2. You’re more than a bot. This is an industry that is increasing automation every day. You add value to your organisation in ways that a bot never can. What is the value you add? Cultivate and grow it.
  3. The online community is a wonderful place full of people who want to help you grow your practice and potential. Dive in and explore, but remember that the advice and pronouncements are just that. They don’t always apply to you all the time. Take what’s useful today and put the rest aside until it’s useful later.

It’s a short list!

Congratulations to the Melbourne Data Science Group!

Last week, I attended the Melbourne Data Science Initiative and it was definitely the highlight of my data science calendar this year! The event was superbly organised by Phil Brierley and his team. Events included tutorials on Machine Learning, Deep Learning, Business Analytics and talks on feature engineering, big data and the need to invest in analytic talent amongst others.

The speakers were knowledgeable and interesting, covering everything from the hilarious building of a rhinoceros catapult (thanks to Eugene from Presciient, it’s possible I’ll never forget that one) to the paramount importance of the “higher purpose” in business analytics, as discussed by Evan Stubbs from SAS Australia and New Zealand.

If you’re in or around Melbourne and into Data Science at all, check out the group who put on this event here.