Machine Learning: Beware Enthusiasts Bearing Algorithms

Machine learning is not the emperor with no clothes. It’s a serious, important discipline that has a lot to offer many industries. I’m not anti-machine learning. What I think is that machine learning is a discipline with a lot of hype surrounding it at the moment. Eventually this hype will die away and what will be left are the serious practitioners developing useful, robust analyses with real implications. In the meantime, those working with data scientists or with data science would do well to beware enthusiasts bearing gifts.

There are a lot of parallels between the enthusiasm for machine learning right now and the enthusiasm for Bayesian methods about ten years ago. Then, as now, there were a large number of enthusiasts, a moderate number of people producing serious, useful analysis, and a misguided belief in some quarters that Bayesian methods were the solution to just about everything. Sound familiar?

Then as now, Bayesian methods weren’t the solution to everything, but they offered great solutions to many problems. Machine learning is the same.

If you’re not a data scientist or not familiar with machine learning methods, beware the enthusiast who believes machine learning solves just about everything. It’s one tool in a whole suite of options. A good data scientist understands it; a great data scientist uses the whole toolbox.

If your enthusiast can’t tell you what’s in the black box, or how their algorithm works, then be cautious and keep asking questions. Sometimes the initial confusion is because the data scientist and the businessperson may actually be speaking two different languages. Try not to be put off by that: often your friendly nerd is doing an internal translation between geek speak and regular language. It doesn’t mean they don’t know what they’re doing. When the statistician and the machine learning expert have to check in with each other regularly about terminology, this is definitely a “thing”!

Keep asking questions, keep listening to the answers: you’ll get a pretty good idea if this technique is being used by someone who knows how it works under the hood.

Things I’m glad got beaten into me in grad school

There are a few things that were, painstakingly and with great patience, inserted into my skull during grad school by my Ph.D. supervisor. A great supervisor is the best thing that can happen to you during a Ph.D. So, in no particular order, here are the things I’m glad he taught me (as of tonight; the list changes regularly):

  • You might think it’s all about the numbers, but you need to know how to write if you want anyone to care about the numbers.
  • Do it PROPERLY. No hacks, no bodge fixes. It will save you time and the occasional preventable heart attack in the long run.
  • It doesn’t really matter what programming language you use, but learn to code and learn to document that code thoroughly.
  • Even if you’re going into applied work, learn the theory: the hard stuff especially. Once you know the theory, you know you have options. You don’t have to default to what you’re familiar with, you have the skills to go and explore the unfamiliar.
  • Likewise, even if you’re going into theoretical work, learn how good applied work happens. Don’t be cavalier about applied work: in many cases the applied work is the purpose of the theory. It doesn’t exist in a vacuum.
  • Reverse parking. Yes, he taught me to reverse park too.

Thanks for everything Andy, it was the best x

Data Visualisation: Hex Codes, Pantone Colours and Accessibility

One of the things I find hardest about data visualisation is colouring. I’m not a natural artist, much preferring everything in gentle shades of monochrome. Possibly beige. Obviously, for any kind of data visualisation, this limited palette doesn’t cut it. Quite frankly, this is the kind of comfort zone that needs setting on fire.

I’ve found this site really helpful: it’s a listing of the Pantone colours with both hex and RGB codes for inserting straight into your visualisations. It’s a really useful correspondence if I’m working with someone: they can give me the Pantone colour numbers of their website or report palette, and I just search the page.

One thing I’ve found, however, is that a surprising (to me) number of people have some kind of colour-based visual impairment. A palette that looks great to me may be largely meaningless to someone I’m working with. I found this out in one of those forehead slapping moments when I couldn’t understand why a team member wasn’t seeing the implications of my charts. That’s because, to him, those charts were worse than useless. They were a complete waste of his time.

Some resources I’ve found helpful in making my visualisations more accessible are the colourblind-friendly palettes discussed here and this discussion on R-Bloggers. The latter made me realise that up until now I’ve been building visualisations that were obscuring vital information for many users.

The things I think are important for building an accessible visualisation are:

  • Yes, compared to more subtle palettes, colour-blind friendly palettes look like particularly lurid unicorn vomit. They don’t have to look bad if you’re careful about combinations, but I’m of the opinion that prioritising accessibility for my users is more important than “pretty”.
  • Redundant encoding (discussed in the R-bloggers link above) is a great way of ensuring users can make out the information you’re trying to get across; see the sketch after this list. To make sure this is apparent in your scale, use a combination of scale_colour_manual() and scale_linetype_manual(). The latter works the same way as scale_colour_manual() but is not as well covered in the literature.
  • Consider reducing the information you’re putting into each chart, or using a combination of facets and multiple panels. The less there is to differentiate, the easier it can be on your users. This is a good general point and not limited to those with colourblindness.
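As a concrete illustration, here is a minimal ggplot2 sketch of redundant encoding. The data frame and column names are made up; the point is that the same grouping variable drives both the colour scale and the linetype scale, so the series stay distinguishable even when the colours can’t be told apart.

```r
library(ggplot2)

# Hypothetical data: three series we want to tell apart
df <- data.frame(
  x     = rep(1:10, times = 3),
  y     = as.vector(replicate(3, cumsum(rnorm(10)))),
  group = rep(c("A", "B", "C"), each = 10)
)

ggplot(df, aes(x = x, y = y, colour = group, linetype = group)) +
  geom_line() +
  # Redundant encoding: the same grouping drives both scales
  scale_colour_manual(values = c("#0072B2", "#D55E00", "#009E73")) +  # colour-blind friendly hexes
  scale_linetype_manual(values = c("solid", "dashed", "dotdash"))
```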

Decision vs Default: Abdicating to the Algorithm

Decision vs default is something I’ve been thinking a lot about lately in data science practice. It’s easy to stick with what we know and do well, rather than push our own boundaries. That’s pretty typical for life in general, but in data science it means defaulting to our norms instead of making a decision, and that leads to less-than-optimal outcomes.

One way this can manifest is to default to the models and methods we know best. If you’re a machine learning aficionado, then you tend to use machine learning tools to solve your problems. Likewise, if you’re an econometrician by training, you may default to explicit model build and testing regimes. Playing to our strengths isn’t a bad thing.

When it comes to model construction, both methods have their good points. But the best outcome is when you make the decision to use one or the other in the knowledge that you have options, not because you defaulted to the familiar.

Explicit model build and testing is a useful methodology if explanation of your model matters: if your stakeholder needs to know why, not just what. These models are built with a priori assumptions about causation, relationships and functional forms. They require a reasonable level of domain knowledge, and the model may be built out of many iterations of testing and experimenting. Typically, I use the Campos (2006) general-to-specific method, but only after extensive data analysis that informs my views on interactions, polynomial terms and other transformed inputs. In this regime, predictive power comes from a combination of domain knowledge, statistical methodologies and a canny understanding of your data.
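To make that workflow concrete, here is a minimal general-to-specific sketch in R. The data frame and variable names (my_data, sales, price, income, region) are hypothetical; the point is starting from a general specification chosen from domain knowledge, then testing down.

```r
# Start general: main effects, an interaction and a quadratic term,
# chosen from domain knowledge and prior data analysis (my_data is hypothetical)
general <- lm(sales ~ price * region + income + I(income^2), data = my_data)
summary(general)

# Test down: drop a term the data don't appear to support, then check the
# restriction against the more general model with an F-test
specific <- update(general, . ~ . - I(income^2))
anova(specific, general)

# Diagnostics before accepting the simpler specification
plot(specific)
```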

Machine learning methodologies, on the other hand, are useful if you need lots of models, in real time, and you want them more for prediction than explanation. Machine learning methodologies that use techniques like lasso or ridge regression let the algorithm guide feature selection to a greater degree than explicit model build methods do. Domain knowledge still matters: interactions, decisions about polynomial inputs and so on still have to be explicitly constructed in many cases. Causation may not be obvious.
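For a flavour of the algorithm guiding feature selection, here is a hedged sketch using the lasso via the glmnet package. The simulated X and y are stand-ins; in practice you would build the design matrix from your own data (for example with model.matrix()).

```r
library(glmnet)

set.seed(42)
X <- matrix(rnorm(500 * 20), nrow = 500)        # 20 candidate features
y <- 2 * X[, 1] - 0.5 * X[, 3] + rnorm(500)     # only two of them matter

# alpha = 1 is the lasso; alpha = 0 would be ridge regression
cv_fit <- cv.glmnet(X, y, alpha = 1)

# At the cross-validated lambda, most coefficients are shrunk exactly to zero:
# the algorithm has done the feature selection
coef(cv_fit, s = "lambda.min")
```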

Neither machine learning nor statistical modelling is better in all scenarios. Either may perform substantially better in some, depending on what your performance metric is. But make your decision knowing you have options. Don’t default and abdicate to an algorithm.

Interpreting Models: Coefficients, Marginal Effects or Elasticities?

I’ve spoken about interpreting models before. I think communicating results is the most important part of our work, yet it’s often overlooked when discussing the how-to of data science. It’s also why I think marginal effects and elasticities are better for this purpose than coefficients alone.

Model build, selection and testing is complex and nuanced. Communicating the model is sometimes harder, because a lot of the time your audience has no technical background whatsoever. Your stakeholders can’t go up the chain with, “We’ve got a model. And it must be a good model because we don’t understand any of it.”

Our stakeholders also have a limited attention span, so the explanation process is twofold: explain the model, and do it fast.

For these reasons, I usually interpret models for my stakeholders with marginal effects and elasticities, not coefficients or log-odds. Coefficient interpretation differs across regressions depending on functional form, and if you have interactions or polynomials built into your model, the coefficient is only part of the story. If you have a more complex model like a tobit, conditional logit or other option, the interpretation of coefficients is different again for each one.

I don’t know about your stakeholders and reporting chains: mine can’t handle that level of complexity.

Marginal effects and elasticities are also different for each of these models but they are by and large interpreted in the same way. I can explain the concept of a marginal effect once and move on. I don’t even call it a “marginal effect”: I say “if we increase this input by a single unit, I expect [insert thing here]” and move on.

Marginal effects and elasticities are often variable over the range of your sample: they may be different at the mean than at the minimum or maximum, for example. If you have interactions and polynomials, they will also depend on covarying inputs. Some people see this as added layers of complexity.
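As a small illustration, here is a sketch of an average marginal effect for a logit model in R. The data frame and variable names (my_data, bought, price, income) are hypothetical; the margins package can automate this, but the manual calculation shows what is going on.

```r
# Hypothetical logit: probability of purchase as a function of price and income
fit <- glm(bought ~ price + income, family = binomial, data = my_data)

# Average marginal effect of price: the derivative of the predicted probability
# with respect to price, averaged over the sample
linpred   <- predict(fit, type = "link")
ame_price <- mean(dlogis(linpred)) * coef(fit)["price"]
ame_price  # "a one-unit increase in price changes the probability by about ..."

# Evaluated at different points in the sample (say, at the mean) the effect can
# differ, which is exactly what is worth charting for stakeholders.

# With the margins package instead:
# library(margins)
# summary(margins(fit))
```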

In the age of data visualisation, I see it as an opportunity to chart these relationships and visualise how your model works for your stakeholders.

We all know they like charts!

Bonds: Prices, Yields and Confusion- a Visual Guide

Bonds have been the talk of the financial world lately. One minute it’s a thirty-year bull market, the next it’s a bondcano. Prices are up, yields are down and that’s bad. But then in the last couple of months, prices are down and yields are up and that’s bad too, apparently. I’m going to take some of the confusion out of these relationships and give you a visual guide to what’s been going on in the bond world.

The mathematical relationship between bond prices and yields can be a little complicated, and I know very few people who think their lives would be improved by more algebra. So for our purposes, the fundamental relationship is that bond prices and yields move in opposite directions: if one is going up, the other is going down. But it’s not a simple 1:1 relationship and there are a few other factors at play.

There are several different types of bond yields that can be calculated:

  • Yield to maturity: the yield you would get if you hold the bond until it matures.
  • Yield to call: the yield you would get if you hold the bond until its call date.
  • Yield to worst: the worst outcome on a bond, whether it is called or held to maturity.
  • Running yield: this is roughly the yield you would get from holding the bond for a year.

We are going to focus on yield to maturity here, but a good overview of yields generally can be found at FIIG. Another good overview is here.

 

To explain all this (without algebra), I’ve created two simulations. These show the approximate yield to maturity against the time to maturity, coupon rate and the price paid for the bond. For the purposes of this exercise, I’m assuming that our example bonds have a face value of $100 and a single annual payment.
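For anyone who does want a small peek at the algebra, one common back-of-the-envelope approximation to yield to maturity (not necessarily the exact calculation behind the animations) is the annual coupon plus the average annual capital gain or loss, divided by the average of the face value and the price. A quick R sketch:

```r
# Approximate yield to maturity for a bond with a single annual coupon payment
approx_ytm <- function(price, coupon_rate, years, face = 100) {
  coupon <- coupon_rate * face
  (coupon + (face - price) / years) / ((face + price) / 2)
}

# The same 5%-coupon, 10-year bond bought below, at and above face value:
approx_ytm(price = 50,  coupon_rate = 0.05, years = 10)  # ~0.13, well above the coupon rate
approx_ytm(price = 100, coupon_rate = 0.05, years = 10)  # 0.05, equal to the coupon rate
approx_ytm(price = 150, coupon_rate = 0.05, years = 10)  # 0.00, well below the coupon rate
```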

The first visual shows what happens as we change the price we pay for the bond. When we buy a bond below face value (at, say $50 when its face value is $100), yield is higher. But if we buy that bond at $150, then yield is much lower. As price increases, yield decreases.

The time the bond has until maturity matters a lot here, though. If there is only a short time to maturity, the differences between buying below and above face value can be very large. If there are decades to maturity, these differences tend to be much smaller. The shading of the blue dots represents the coupon rate that might be attached to a bond like this: the darkest colours have the highest coupon rates and the lightest colours the lowest. Again, the differences matter more when there is less time for a bond to mature.

Prices gif

The second animation is a representation of what happens as we change the coupon rate (i.e. the interest rate the debtor is paying to the bond holder). The lines of dots represent differences in the price paid for the bond. The lighter colours represent a cheaper purchase below face value (better yields, great!). The darker colours represent an expensive purchase above face value (lower yields, not so great).

If we buy a bond cheaply, then the yield may be higher than the coupon rate. If we buy it over the face value, then the yield may be lower than the coupon rate. The difference between them is less the longer the bond has to mature. When the bond is very close to maturity those differences can be quite large.

Coupon Gif

When discussing bonds, we often mention something called the yield curve, which describes the yield a bond (or group of bonds) will generate over its lifetime.

If you’d like to have a go at adjusting the coupon rate and the price to produce an approximate yield curve, you can check out the interactive I built here.

Remember that all of these interactives and animations are approximate: if you want to calculate yield to maturity exactly, you can use an online calculator like the one here.

So how does this match the real data that gets reported on daily? Our last chart shows data on the US Treasury securities that were sold on the 25th of November 2016. The black observations are bonds maturing within a year; the blue are those that have longer to run. Here I’ve charted the “asked yield”, which is the yield a buyer would receive if the seller sold their bond at the price they were asking. Sometimes, however, the bond is bought at a lower bid, so the actual yield would be a little higher. I’ve plotted this against the time until the bond matures. We can see that the actual yield curve produced is pretty similar to our example charts.

This was the yield curve from one day. The shape of the yield curve will change on a day-to-day basis depending on prevailing market conditions (e.g. prices). It will also change more slowly over time as the US Treasury issues bonds with higher or lower coupon rates, depending on economic conditions.

yield curve

Data: Wall Street Journal.

Bond yields and pricing can be confusing, but hopefully as you’re reading the financial pages they’re a lot less so now.

A huge thanks to my colleague, Dr Henry Leung at the University of Sydney for making some fantastic suggestions on this piece.

 

Update: Kids Are Going to Code

I’m a big believer that kids should be given access to learning about code and programming. I don’t believe there is any particular right, wrong or best method or language for kids: I think when they go pro, we’re going to have a whole new suite of languages. The most important thing is to teach them that they can solve interesting, relevant problems and give them this skill set generally.

It turns out there are a lot of kids and parents who feel strongly about learning too, but in rural Australia, where we’re situated, resources are thin on the ground. So Rex Analytics is about to enter a whole new world of education and training: one with very small people in it. That’s right, I’m running the local kids’ code club this year!

Our plan is to use the resources provided by Code Club Australia. We’ll start out with Scratch, move on to some basic CSS/HTML and then go on to Python, partly because the kids around here like snakes.

Code Club is designed for kids ages 9-12, but in our tiny town, we need to be able to provide activities for younger kids to ensure that all siblings can attend. So we’ll be flying by the seat of our pants with the younger group ages 5-8 and really just letting them guide us. The plan is lots of iPad-based learning which I’ve talked about before.

The school and the P and C are big supporters and I’ve got another parent lined up to help out (thanks Dave!), hopefully more will be interested as time goes on.

Wish me luck. I’ve always had a healthy respect for my kids’ teachers, but I think that’s about to increase exponentially…

Political Donations 2015/16

Yesterday, the ABC released a dataset detailing donations made to political parties in Australia during the 2015-16 period. You can find their analysis and the data here. The data itself isn’t a particularly good representation of what was happening during the period: there isn’t a single donation to the One Nation Party among the lot of them, for example. This data isn’t a complete picture of what’s going on.

While the ABC made a pretty valiant effort to categorise where the donations were coming from, “uncategorised” was the last resort for many of the donors.

Who gets the money?

In total, there were 49 unique groups that received money. Many of these were state branches of national parties, for example the Liberal Party of Australia – ACT Division, the Liberal Party of Australia (S.A. Division) and so on. I’ve grouped these and others like them under their national party. Other groups included small, narrowly focussed parties like the Shooters, Fishers and Farmers Party and the Australian Sex Party. Micro parties like the Jacqui Lambie Network, Katter’s Australian Party and so on were grouped together. Parties with a conservative focus (Australian Christians, Family First, Democratic Labor Party) were grouped, and those with a progressive focus (Australian Equality Party, Socialist Alliance) were also grouped together. Parties focused on immigration were combined.
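For anyone wanting to reproduce this kind of grouping, here is a hypothetical dplyr sketch. The data frame and column name (donations, recipient) and the patterns are illustrative only, not the exact rules used here.

```r
library(dplyr)

donations <- donations %>%
  mutate(recipient_group = case_when(
    grepl("Liberal Party|Liberal National", recipient) ~ "Liberal",
    grepl("Australian Labor Party", recipient)         ~ "Labor",
    grepl("Greens", recipient)                         ~ "Greens",
    grepl("Shooters|Sex Party", recipient)             ~ "Narrow focus",
    TRUE                                               ~ "Other / micro parties"
  ))
```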

The following chart shows the value of the donation declared and the recipient group that received it.

Scatter plot

Only one individual donation exceeded $500 000, and that was to the Liberal Party. It obscures the rest of the distribution, so I’ve removed it in the next chart. Both the major parties receive more donations than the other parties, which comes as no surprise to anyone. However, the Greens have a substantial proportion of very generous givers ($100 000+). The interesting question is not so much who received the money, but who gave it.

Scatter plot with outlier removed

 

Who gave the money?

This is probably the more interesting point. The following charts use the ABC’s categories to see if we can break down where the (declared) money trail lies (for donations $500 000 and under). Again, the data confirmed what everyone already knew: unions give to the Labor party. Finance and insurance gave heavily to the Liberal Party (among others). Several clusters stand out, though: uncategorised donors give substantially to minor parties and the Greens have two major clusters of donors: individuals and a smaller one in the agriculture category.

Donor categories and value scatter plot

Breaking this down further, if we just look at where the money came from and who it went to, we can see that the immigration-focused parties are powered almost entirely by individual donations, with some from uncategorised donors. Minor parties are powered by family trusts, unions and uncategorised donors; the Greens by individuals, uncategorised donors and agriculture, with some input from unions. What’s particularly interesting is the difference between Labor and Liberal donors. Compared to the Liberals, Labor has no donors in the tobacco industry and receives fewer donations in agriculture, alcohol, advocacy/lobby groups, sports and water management. Labor also has fewer donations from uncategorised donors and more from unions.

Donors and Recipients Scatterplot

What did we learn?

Some of what we learned here was common knowledge: Labor doesn’t take donations from tobacco, but it does from unions. The unions don’t donate to the Liberals, but advocacy and lobby groups do. The more interesting observations concern the smaller parties: the cluster of agricultural donations for the Greens (normally LNP heartland) and the individual donations powering the parties focussed on immigration. The latter may have something to say about the money powering the far right.

 

Productivity: In the Long Run, It’s Nearly Everything.

“Productivity … isn’t everything, but in the long run it’s nearly everything.” Paul Krugman, The Age of Diminished Expectations (1994)

So in the very long run, what’s the Australian experience? I recently did some work with the Department of Communications and the Arts on digital techniques and developments. Specifically, we were looking at the impacts that advances in fields like machine learning, artificial intelligence and blockchain may have on productivity in Australia. I worked with a great team at the department, led by Chief Economist Paul Paterson, and we’re looking forward to our report being published.

In the meantime, here’s the very long run on productivity downunder.

Australian Productivity Chart