A Primer on Basic Probability

… and by basic, I mean basic. I sometimes find people come to me with questions and no one has ever taken the time to give them the most basic underpinnings in probability that would make their lives a lot easier. A friend of mine is having this problem and is on a limited time frame for solving it, so this is quick and dirty and contains both wild ad-lib on my part and swearing. When I get some more time, I’ll try and expand and improve, but for now it’s better than nothing.

YouTube explainer: done without a microphone, sorry. Time limit again.

Slides I used:


I mentioned two links in the screencast. One was Allen Downey’s walkthrough with Python. You don’t need to know anything about Python to explore this one, and it’s well worth it. The other is Victor Powell’s visualisation of conditional probability. Again, worth a few minutes’ exploration.
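If you'd like something to type in while you explore, here's a minimal sketch of conditional probability by brute-force counting. The two-dice setup and the function name `conditional_probability` are illustrative assumptions for this example only; they are not taken from the screencast or from either link.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice
# (an assumed toy example, chosen because it is easy to count by hand).
outcomes = list(product(range(1, 7), repeat=2))


def conditional_probability(event, given):
    """Return P(event | given) by counting: |event and given| / |given|."""
    given_outcomes = [o for o in outcomes if given(o)]
    both = [o for o in given_outcomes if event(o)]
    return Fraction(len(both), len(given_outcomes))


def total_is_8(outcome):
    return outcome[0] + outcome[1] == 8


def first_is_6(outcome):
    return outcome[0] == 6


# Unconditional probability: condition on something that is always true.
p_8 = conditional_probability(total_is_8, lambda outcome: True)

# Knowing the first die shows 6 changes the answer.
p_8_given_6 = conditional_probability(total_is_8, first_is_6)

print(p_8, p_8_given_6)  # prints: 5/36 1/6
```

Counting works here only because every outcome is equally likely. Once that stops being true you need to weight the outcomes, which is where the two links above are a better guide than this snippet.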

Good luck! Hit me up in the comments section if you’ve got any questions. This was a super quick run-through, so it’s a summary at best.

Machine Learning is Basically the Reversing Camera on Your Car

I’ve been spending a bit of time on machine learning lately. But when it comes to classification or regression: it’s basically the reversing camera on your car.

Let me elaborate: machine learning, like a reversing camera, is awesome. Both things let you do stuff you already could do, but faster and more often. Both give you insights into the world around you that you may not have had without them. However, both can give a narrower view of the world than some other techniques (in this case, expanded statistical/econometric methodologies and/or your mirrors and checking your blindspots).

As long as everything around you remains perfectly still and doesn’t change, the reversing camera will let you get into a tight parking spot backwards and give you some insights into where the gutter and other objects are that you didn’t have before. Machine learning does great prediction when the inputs are not changing.

But if you have to go a long way in reverse (like reversing down your driveway; mine is 400m long), or things are moving around you (other cars, pet geese, STUPID big black dogs that think running under your wheels is a great idea. He’s bloody fine, stupid mutt), then the reversing camera alone is not all the information you need.

In the same way, if you need to explain relationships, because your space is changing and prediction is not enough, then it’s very useful to expand your machine learning toolbox with statistical/econometric techniques like hypothesis testing, information criteria and solid model building methodologies (as opposed to relying solely on lasso or ridge methods). Likewise, causality and endogeneity matter a lot.
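To make the "great prediction when the inputs are not changing" point concrete, here's a rough sketch. A least-squares straight line stands in for a machine learning model, and the true relationship is assumed to be y = x², which is purely a made-up example. The helper names `fit_line` and `mean_abs_error` are hypothetical too.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the line's predictions and the truth."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)


def truth(x):
    # Assumed "real world": curved, so a straight line is only locally OK.
    return x ** 2


train_x = [i / 20 for i in range(21)]        # inputs seen in training: 0.0 to 1.0
shifted_x = [2 + i / 20 for i in range(21)]  # inputs after the world moves: 2.0 to 3.0

model = fit_line(train_x, [truth(x) for x in train_x])

err_same = mean_abs_error(model, train_x, [truth(x) for x in train_x])
err_shifted = mean_abs_error(model, shifted_x, [truth(x) for x in shifted_x])

print(f"error on familiar inputs: {err_same:.3f}")
print(f"error on shifted inputs:  {err_shifted:.3f}")
```

On the range it was fitted to, the line does a tolerable job. Move the inputs and the error grows by more than an order of magnitude, even though nothing about the model changed. That is the parked car versus the dog under the wheels.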

So, in summary, machine learning and reversing cameras are awesome, but aren’t the whole picture in many cases. Make your decision about what works best in your situation: don’t just default to what you’re used to.

(Also, I’m not convinced this metaphor extends in the forwards direction. Data analysis? You only reverse maybe 5% of the time you’re driving. But you’re driving forward the rest of the time: data analysis is 95% of my workflow. Yours?)

The Seven Stages of Being a Data Scientist

Becoming a data scientist is a fraught process as you frantically try to mark off all the bits on the ridiculous Venn diagram that will allow you to enter the high priesthood of data and be a “real” data scientist. In that vein, I offer you the seven stages on the road to becoming a “real” data scientist.

Like the Venn diagrams (the best and most accurate is here), you should take these stages just as seriously.

(1) You find out that this data and code at the same time thing makes your brain hurt.

(2) OK, you’re getting it now! [insert popular methodology du jour] is the most amazing thing ever! It’s so cool! You want to learn all about it!

(3) Why the hell won’t your matrix invert? You need to know how to code in how many damn languages?

(4) While spending three increasingly frustrated hours looking for a comma, bracket or other infinitesimal piece of code in the wrong place, realise most of your wardrobe is now some variation on jeans and a local t-shirt, or whatever your local equivalent is. Realise you’ve crossed some sort of psychological divide. Wonder what the meaning of life is and remember it’s 42. Try to remember the last time you ate something that wasn’t instant coffee straight off the spoon. Ponder the pretty blinking cursor for a bit. Find your damn comma and return from the hell of debugging. Repeat stage (4) many times. (Pro tip: print statements are your friend.)

(5) Revise position on (2) to “it does a good enough job in the right place.”

(6) Revise position on (5) to “… that’s what the client wants and I need to be a better negotiator to talk them out of it because it’s wrong for this project.” All of a sudden your communication skills matter more than your code or your stats geek stuff.

(7) By this stage, you don’t really care what language or which method someone uses as long as they can get the job done right and explain it to the client so they understand it. The data and code at the same time thing still makes your brain hurt, though.

Code Club Happened!

Today was the first day of code club and we had about 20 kids between the ages of 5 and 12. Wow! That was different to any class I’ve taught previously. For starters, no one’s Mum has ever picked them up from one of my classes before.

I noticed a couple of things that blew me away:

  • These kids have no fear of failure (yet). Something doesn’t work? Doesn’t matter. Try something else.
  • They are native problem solvers. Our project was sound based, but sound didn’t work on some computers. They just made it visual.
  • They have no preconceptions about their coding ability. No fear of code. The most naked ‘yes we can’ I have ever seen in a class.
  • Also they’re noisy fun seekers and full of chaotic enthusiasm. Hats off to anyone who can teach kids six hours a day every day!

I also had a few parents with zero coding knowledge drop by to help and they were just as amazing as the kids. The school that’s hosting us has also been amazingly supportive with multiple teachers staying back to help out and try something new. I can’t believe how much support there is for the club from the school community.

We’re having fun with Scratch, but I set the kids up a Minecraft server too and I’m slowly adding kids one at a time. Honestly, they’re most interested in that right now!