Things I wish I’d noticed in grad school

Back in the day, I tended to get a little hyper-focussed on things. I’m sure someone, sometime, somewhere pointed this stuff out to me. But at the time it went over my head and I learned these things the hard way. Maybe my list of things I wish I’d noticed helps someone else.

  • Your professional contacts matter and it’s OK to ask for help. You’re not researching in a vacuum: the people around you want to help.
  • You need to look outside your department and university. There’s a bigger, wider world out there, and while what’s going on inside your little world seems important, you need to be aware of what’s outside too.
  • Being methodologically/theoretically robust matters, yes. But learning when to let it go is going to be harder than learning the theory/methodology. No easy answers here, all you can do is make your decision and own it.
  • It doesn’t matter how much you read, you’re not going to be an expert across your whole field. Just be aware of the field and be an expert in what you’re doing right now. That’s OK.
  • Get a life. Really.

Tiny Coders

I’ve mentioned it before, but I run the local code club out here in rural Australia. We are using the Code Club curriculum, designed for kids aged 9-12. Due to our particular circumstances with transport and distance, our code club needs to offer fun and learning for the age range 5-8 as well. Some of our littles are finding the materials too challenging to be fun, so as of this week we are running two streams:

  • The “Senior Dev Team”: in time-honoured managerial tradition, I told them they could be senior devs with a badge, if they helped the littles. That’s right, more responsibility and nothing but a badge to show for it. The senior dev team is going to keep going with the regular code club projects and they are smashing them out. Seriously, all I need to do is get these kids a black t-shirt each and they’re regular programmers already.
  • The “red team”: these are our kids that are struggling with the projects we have been doing and not having fun because of it. We’ll be doing multistage projects with lots of optional end points for kids to stop and go play: these are really young kids sitting down to code after six hours of school, so for some of them 20 minutes is more than enough. For them, it’s enough that they learn that computers and code are fun and interesting. For the older/more capable kids in this group we’ll still be learning about loops and conditional statements and all the good stuff, but our projects will be pared back and more basic so they aren’t overwhelming.

Our first red team project is here: Flying Cat Instructions, and on GitHub here.

Of course, none of this would be possible without an amazing team of dedicated parent and teacher volunteers, many of whom had very few computer skills before we started and NO coding skills. They’re as amazing as the kids.

Data Visualisation: Hex Codes, Pantone Colours and Accessibility

One of the things I find hardest about data visualisation is colouring. I’m not a natural artist, much preferring everything in gentle shades of monochrome. Possibly beige. Obviously, for any kind of data visualisation, this is limiting. Quite frankly, this is the kind of comfort zone that needs setting on fire.

I’ve found this site really helpful: it’s a listing of the Pantone colours with both Hex and RGB codes for inserting straight into your visualisations. It’s a really useful correspondence if I’m working with someone (they can give me the Pantone colour numbers of their website or report palette- I just search the page).
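If you only have one of the two codes to hand, converting between Hex and RGB is simple arithmetic: each pair of hex digits is one colour channel from 0 to 255. Here is a minimal sketch in plain Python. The function names hex_to_rgb and rgb_to_hex are my own for illustration, and the example colour is an arbitrary orange, not a Pantone value.

```python
def hex_to_rgb(hex_code):
    """Convert a hex colour such as '#FF8800' to an (R, G, B) tuple of 0-255 ints."""
    digits = hex_code.strip().lstrip("#")
    if len(digits) == 3:
        # Expand shorthand such as 'f80' to 'ff8800'.
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"expected 3 or 6 hex digits, got {hex_code!r}")
    # Each pair of hex digits is one channel; int(..., 16) rejects non-hex characters.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(red, green, blue):
    """Convert three 0-255 channel values to an upper-case hex colour string."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel value out of range: {channel}")
    return f"#{red:02X}{green:02X}{blue:02X}"


if __name__ == "__main__":
    print(hex_to_rgb("#FF8800"))
    print(rgb_to_hex(255, 136, 0))
```

This is handy when a collaborator sends a palette in one form and your plotting tool wants the other.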

One thing I’ve found, however, is that a surprising (to me) number of people have some kind of colour-based visual impairment. A palette that looks great to me may be largely meaningless to someone I’m working with. I found this out in one of those forehead slapping moments when I couldn’t understand why a team member wasn’t seeing the implications of my charts. That’s because, to him, those charts were worse than useless. They were a complete waste of his time.

Some resources I’ve found helpful in making my visualisations more accessible are the colourblind-friendly palettes discussed here and this discussion on R-Bloggers. The latter made me realise that up until now I’ve been building visualisations that were obscuring vital information for many users.

The things I think are important for building an accessible visualisation are:

  • Yes, compared to more subtle palettes, colour-blind friendly palettes look like particularly lurid unicorn vomit. They don’t have to look bad if you’re careful about combinations, but I’m of the opinion that prioritising accessibility for my users is more important than “pretty”.
  • Redundant encoding (discussed in the R-Bloggers link above) is a great way of ensuring users can make out the information you’re trying to get across. To make sure this is apparent in your scale, use a combination of ggplot2’s scale_colour_manual() and scale_linetype_manual(). The latter works the same as scale_colour_manual() but is not as well covered in the literature.
  • Consider reducing the information you’re putting into each chart, or using a combination of facets and multiple panels. The less there is to differentiate, the easier it can be on your users. This is a good general point and not limited to those with colourblindness.
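The redundant encoding idea from the list above can be sketched without any plotting library at all: give every series both a colour and a line type, so that nobody has to rely on colour alone. The sketch below is in plain Python rather than ggplot2. The palette is the Okabe-Ito set, one commonly cited colour-blind-friendly palette (my choice, not necessarily the one in the links above). The function name redundant_styles is hypothetical, and the line type labels are meant to mirror ggplot2’s names, so translate them for whatever plotting tool you use.

```python
from itertools import cycle

# Okabe-Ito palette: eight colours chosen to stay distinguishable
# under the common forms of colour blindness.
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]

# Illustrative line type labels (assumed to match your plotting tool's names).
LINE_TYPES = ["solid", "dashed", "dotted", "dotdash"]


def redundant_styles(series_names):
    """Map each series name to a colour AND a line type."""
    if len(series_names) > len(OKABE_ITO):
        raise ValueError("more series than colours: consider facets instead")
    styles = {}
    # Line types repeat after four series, but each colour is used once,
    # so every (colour, line type) pair is still unique.
    for name, colour, linetype in zip(series_names, OKABE_ITO, cycle(LINE_TYPES)):
        styles[name] = {"colour": colour, "linetype": linetype}
    return styles


if __name__ == "__main__":
    for name, style in redundant_styles(["north", "south", "east"]).items():
        print(name, style)
```

The error for too many series is deliberate: if you need more than eight lines on one chart, splitting it into panels is probably kinder to your users than adding more colours.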

Yes, you can: learn data science

Douglas Adams had it right in Dirk Gently’s Holistic Detective Agency. Discussing the mathematical complexity of the natural world, he writes:

… the mind is capable of understanding these matters in all their complexity and in all their simplicity. A ball flying through the air is responding to the force and direction with which it was thrown, the action of gravity, the friction of the air which it must expend its energy on overcoming, the turbulence of the air around its surface, and the rate and direction of the ball’s spin. And yet, someone who might have difficulty consciously trying to work out what 3 x 4 x 5 comes to would have no trouble in doing differential calculus and a whole host of related calculations so astoundingly fast that they can actually catch a flying ball.

If you can catch a ball, you are performing complex calculus instinctively. All we are doing in formal mathematics and data science is putting symbols and a syntax around the same processes you use to catch that ball.

Maybe you’ve spent a lot of your life believing you “can’t” or are “not good at” mathematics, statistics or whatever bugbear of the computational arts is getting to you. These are concepts we begin to internalise at a very early age and often carry through our lives.

The good news is: yes, you can. If you can catch that ball (occasionally at least!) then there is a way for you to learn data science and all the things that go with it. It’s just a matter of finding the one that works for you.

Yes you can.

Teaching kids to code

Kids coding is a topical issue, particularly given the future of employment. The jobs our children will be doing are different to the ones our parents did/are doing and to our own. Programming skills are one of the few things that the experts agree are important.

There are lots of great online resources already in place to help children learn the computer skills they will need in the future. You can start early, you can make it fun and it doesn’t have to cost you a fortune.

Let me be clear: this isn’t a parenting blog. I do have kids. I do program. I do have a kid that wants to learn to program (mostly I think because he thinks I’ll give him a free pass on other human-necessary skills such as creativity, interpersonal relationships and trying on sports day).

My personal parenting philosophy (if anyone cares) is that kids learn very well when you give them interesting tools to explore the world with. That might include programming, but for some kids it won’t. That’s OK. It doesn’t mean they’re never going to get a job: it just means they may prefer to climb trees because they’re kids. There’s a lot of learning to be had up a tree.

But part of providing interesting resources with which to explore the world is knowing where to find them. Here’s a run down of some resources broken down by age group. Yes, kids can start as early as preschool!

Preschool Age (4 +)

The best resources for kids this age are fun interactive apps. If it’s not fun, they won’t engage and frankly nobody wants to stand over a small child making them do something when they could be learning autonomously through undirected play. Here are my favourites:

  • Lightbot. This is a fun interactive app available on Android and Apple that teaches kids the basics of programming using icons rather than language-based code. It comes in both junior coding (4-8 years) and programming puzzles (9+) versions. My kids have had the apps for six months and enjoyed them.
  • Cargo-bot was recommended to me by a fellow programming-parent and I love the interface and the puzzles. My friends have had the app for a few months and young I. enjoys it a lot.
  • Flow isn’t a coding app. It’s an app that encourages visual motor planning development. Anyone that’s done any coding at all will know that visual motor planning is a critical skill for programming. First this then that. If I put this here then that needs to go there. Flow is a great game that helps kids develop this kind of planning. And that’s helpful not only for programming, but everything else too.

School Age Kids (9 +)

Once kids are comfortable reading and manipulating English as a language, they can move on to a language-based program. There are a few different ones available, some specifically designed for kids like Tynker and Scratch. For the kid that I have in this age bracket (taking into account his interests and temperament) I’m just going to go straight to Python or R for him. As with everything parenting: your mileage may vary and that’s OK.

Some resources for learning Python with kids include:

  • This great post from Geekwire. Really simple ideas to engage with your kid.
  • Python Tutorials for kids 13+ is a companion site to the For Dummies book Python for kids I’ve mentioned previously. We got the book from the library a month or so back and I’m thinking of shelling out the $$ to buy it and keep it here permanently.
  • The Invent with Python blog has some great discussion of the issue generally.

R doesn’t seem to have as many kid-friendly resources, but the turtle graphics package looks like it might be worth a try.

General Resources for Teaching Kids to Code

Advocates for programming have been beating this drum for a long time. I came across a number of useful posts while writing this one, so here they are for your reference:

Good luck and enjoy coding with your kid. And if your kid doesn’t want to learn code, enjoy climbing that tree instead!

Open datasets for analysis

So you’re a new data scientist and you’re exploring everything the internet has to offer (a lot). But having explored, you’re ready to try something on your own. Here is a (short) list of data sources you can tackle:

I’ll keep adding to the list as I come across interesting things.

Three things every new data scientist should know

Anyone who has spent any time in the online data science community knows that this kind of post is a genre all on its own. “N things you should know/do/be/learn/never do” is something that pops up in my Twitter feed several times a day. These posts range from useful ways to improve your own practice to clickbait listing reams of accomplishments that make Miss Bingley’s “accomplished young ladies” speech in Pride and Prejudice appear positively unambitious.

Miss Bingley’s pronouncement could easily be applied to data scientists everywhere:

“Oh! certainly,” cried his faithful assistant, “no [woman] can be really esteemed accomplished who does not greatly surpass what is usually met with. A woman must have a thorough knowledge of music, singing, drawing, dancing, and the modern languages, to deserve the word; and besides all this, she must possess a certain something in her air and manner of walking, the tone of her voice, her address and expressions, or the word will be but half-deserved.”

Swap out the references to women with “data scientist”, throw in a different skill set and there we have it:

“Oh! certainly,” cried his faithful assistant, “no data scientist can be really esteemed accomplished who does not greatly surpass what is usually met with. A data scientist must have a thorough knowledge of programming in every conceivable language that was, is or shall be, linear algebra, business acumen, obscure models only ever applied in obscure places, and whatever is “hot” this year, to deserve the title; and besides all this, she must possess a certain something in her air and manner of tweeting, the tone of her blogging, her LinkedIn profile and be a snappy dresser, or the title will be but half-deserved.”

Put like that, you’d be forgiven for not allowing the Miss Bingleys of the world to define you.

If I had a list of things to say to new data scientists, they wouldn’t have much to do with data science at all:

  1. You define yourself and your own practice. Not Twitter, not an online community, not blogs from people who may or may not know your work. Data science is an incredibly broad array of people, ideas and tools. Maybe you’re in the middle of it, maybe you’re on the edge. That’s OK, it’s all valuable.
  2. You’re more than a bot. This is an industry that is increasing automation every day. You add value to your organisation in ways that a bot never can. What is the value you add? Cultivate and grow it.
  3. The online community is a wonderful place full of people who want to help you grow your practice and potential. Dive in and explore: but remember that the advice and pronouncements are just that. They don’t always apply to you all the time. Take what’s useful today and put the rest aside until it’s useful later.

It’s a short list!

Congratulations to the Melbourne Data Science Group!

Last week, I attended the Melbourne Data Science Initiative and it was definitely the highlight of my data science calendar this year! The event was superbly organised by Phil Brierley and his team. Events included tutorials on Machine Learning, Deep Learning, Business Analytics and talks on feature engineering, big data and the need to invest in analytic talent amongst others.

The speakers were knowledgeable and interesting, with everything covered from the hilarious building of a rhinoceros catapult (thanks to Eugene from Presciient, it’s possible I’ll never forget that one) to the paramount importance of the “higher purpose” in business analytics as discussed by Evan Stubbs from SAS Australia and New Zealand.

If you’re in or around Melbourne and into Data Science at all, check out the group who put on this event here.