Stargate SG1: Sentiment is not Enough for the Techno Bugs

One of the really great things as your kids get older is that you can share with them stuff you thought was cool when you were young, had ideals and spare time. One of those things my siblings and I spent a lot of that spare time doing was watching Stargate SG1. Now I’ve got kids and they love the show too.

When I sat down to watch the series the first time I was a history student in my first year of university, so Daniel’s fascination with languages and cultures was my interest too. Ironically, at the time I was also avoiding taking my first compulsory econometrics course because I was going to hate it SO MUCH.

Approximately one million years, a Ph.D. in econometrics and possibly an alternate reality later, I’m a completely different person. With Julia Silge’s fabulous Austen analyses fresh in my mind (for a start, see here and then keep exploring) I rewatched the series. I wondered: how might sentiment work for transcripts, rather than print-only media like a novel?

In my view, this is something like an instrumental variables problem. A transcript of a TV show is only part of the medium’s signal: imagery and sound round out the full product. So a sentiment analysis on a transcript is only an analysis of part of the presented work. But because dialogue is such an intrinsic and important part of the medium, might it give a good representation?

What is sentiment analysis?

If you’re not a data scientist, or you’re new to natural language processing, you may not know what sentiment analysis is. Basically, sentiment analysis compares a list of words (like you might find in a transcript, a speech or a novel) to a dictionary that measures the emotions the words convey. In its simplest form, we talk about positive and negative sentiment.

Here’s an example of a piece of text with a positive sentiment:

“I would like to take this opportunity to express my admiration for your cause. It is both honourable and brave.” – Teal’c to Garshaw, Season Two: The Tokra Part II.

Now here’s an example of a piece of dialogue with a negative sentiment:

“I mean, one wrong move, one false step, and a whole fragile world gets wiped out?” – Daniel, Season Two: One False Step

This is an example of a fairly neutral piece of text:

“Gentlemen, these planets designated P3A-575 and P3A-577 have been submitted by Captain Carter’s team as possible destinations for your next mission.”- General Hammond, Season Two: The Enemy Within.

It’s important to understand that sentiment analysis in its simplest form doesn’t really worry about how the words are put together. Picking up sarcasm, for example, isn’t really possible just by deciding which words are negative and which are positive.

Sentiment analyses like this can’t measure the value of a text: they are abbreviations of a text. In the same way we use statistics like a mean or a standard deviation to describe a dataset, a sentiment analysis can be used to succinctly describe a text.
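To make that concrete, here’s a minimal sketch of the mechanics in R, using the tidytext package and the Bing positive/negative lexicon (one of several lexicons you could choose). The line is Teal’c’s from above:

```r
library(tidytext)
library(dplyr)

# Teal'c's line from above, as a one-row data frame.
line <- data.frame(
  text = "I would like to take this opportunity to express my admiration for your cause. It is both honourable and brave.",
  stringsAsFactors = FALSE
)

line %>%
  unnest_tokens(word, text) %>%                        # split the line into single words
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep the words the lexicon scores
  count(sentiment)                                     # tally positive vs negative words
```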

If you’d like to find out more about how sentiment analysis works, check out Julia Silge’s blog post here which provided a lot of the detailed code structure and inspiration for this analysis.

What does sentiment analysis show for SG1?

I analysed the show on both a by-episode and by-series basis. With over 200 episodes and 10 series, the show covered a lot of ground with its four main characters. I found a couple of things that were interesting.

The sentiment arc for most episodes is fairly consistent.

Most episodes open with highly variable sentiment as the dilemma is explained, sarcastic, wry humour is applied and our intrepid heroes set out on whatever journey/quest/mission they’re tasked with. Season One’s Within the Serpent’s Grasp is a pretty good example. Daniel finds himself in an alternate reality where everything is, to put it mildly, stuffed.

Within the Serpent’s Grasp, Season 1. Daniel crosses over to an alternate reality where earth is invaded by evil parasitic aliens with an OTT dress sense.

According to these charts, however, about three quarters of the way through it all gets a bit “meh”.

Below is the sentiment chart for Season Two’s In the Line of Duty, where Sam Carter has an alien parasite in control of her body. If that’s not enough for any astrophysicist to deal with, an alien assassin is also trying to kill Sam.

If we take the sentiment chart below literally, nobody really cares very much at all about the impending murder of a major character. Except, that’s clearly not what’s happening in the show: it’s building to the climax.

 

In the Line of Duty, Season 2. Sam Carter gets a parasite in her head and if that’s not enough, another alien is trying to kill her.

So why doesn’t sentiment analysis pick up on these moments of high drama?

I think the answer here is that this is a scifi/adventure show: tension and action aren’t usually achieved through dialogue. They’re usually achieved by blowing stuff up in exciting and interesting ways.

The Season Three cliffhanger introduced “the replicators” for precisely this purpose. SG1 always billed itself as a family-friendly show. Except for an egregious full frontal nude scene in the pilot, everyone kept their clothes on. Things got blown up and people got thrown around the place by Really Bad Guys, but limbs and heads stayed on and the violence wasn’t that bad. SG1 was out to make the galaxy a better place with family values, a drive for freedom and a liberal use of sarcasm.

But scifi/adventure shows thrive on two things: blowing stuff up and really big guns. So the writers introduced the replicators, a “race” of self-generating techno lego that scuttled around in bug form for the most part.

In response to this new galactic terror, SG1 pulled out the shotguns and the grenades and had a delightful several seasons blasting them, blood-free. The show mostly maintained its PG rating.

The chart below shows the sentiment for the replicators’ introductory episode, Nemesis. The bugs are heading to earth to consume all technology in their path. The Asgard, a race of super-advanced Roswell Greys, have got nothing, and SG1 has to be called in to save the day. With pump action shotguns, obviously.

The replicator bugs don’t speak. The sound of them crawling around inside a space ship and dropping down on people is pretty damn creepy, but that’s not something that can be picked up by using a transcript as an instrument for the full product.

Nemesis, Season 3 cliffhanger: the rise of the techno bugs.

Season Four’s opener, Small Victories, solved the initial techno bug crisis, but not before a good half hour of two of our characters flailing around inside a Russian submarine with said bugs. Again, the sentiment analysis found it all a little “whatever” towards the end.

Small Victories, Season 4 series opener. Techno bugs are temporarily defeated.

Is sentiment analysis useless for TV transcripts then?

Actually, no. It’s just that in those parts of the show where dialogue is only of secondary importance, the other elements of the work obscure the usefulness of the transcript as an instrument. In order for the transcript to be a useful instrument, we need to do what we’d ideally do in many instrumental variables cases: look at a bigger sample size.

Let’s take a look at the sentiment chart for the entire sixth season. This is the one where Daniel Jackson is dead, but is leading a surprisingly active and fulfilling life for a dead man. We can see the overall structure of the story arc for the season below. The season starts with something of a bang as new nerd Jonas is introduced just in time for old nerd Daniel to receive a life-ending dose of explosive radiation. The tension goes up and down throughout. It’s most negative at about the middle of the season, where there’s usually a double-episode cliffhanger, and smooths out towards the end until tension increases with the final cliffhanger.

 

Series Six: Daniel is dead and new guy Jonas has to pick up the vacant nerd space.

Season Eight, in which the anti-establishment Jack O’Neill has become the establishment, follows a broadly similar pattern. (Jack copes with the greatness thrust upon him as a newly-starred general by being more himself than ever before.)

Note the end-of-series low levels of sentiment. This is caused by a couple of things: as with the individual episodes, moments of high emotion get big scores, and this obscures the rest of the distribution. I considered normalising it all between 0 and 1. That would be a good move for comparing between episodes and seasons, but it didn’t seem necessary in this case.
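For reference, that normalisation is just a min-max rescale; a quick sketch, where scores stands in for a hypothetical vector of sentiment scores:

```r
# Min-max rescale onto [0, 1]; `scores` is a hypothetical numeric vector.
rescale01 <- function(scores) {
  (scores - min(scores)) / (max(scores) - min(scores))
}

rescale01(c(-3, 0, 2, 10))  # 0.00 0.23 0.38 1.00
```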

The other issue going on here is the narrative structure of the overall arc. In these cases, I think the season is slowing down a little in preparation for the big finale.

Both of these issues were apparent in the by-episode charts as well.

Your turn now

For the fun of it, I built a Shiny app which will allow you to explore the sentiment of each episode and series on your own. I’ve also added simple word clouds for each episode and series. It’s an interesting look at the relative importance each character has and how that changed over the length of the show.

The early series were intensely focussed on Jack, but as the show progressed the other characters got more and more nuanced development.

Richard Dean Anderson made the occasional guest appearance after Season Eight, but no longer had a regular role on the show once Season Nine started. The introduction of new characters Vala Mal Doran and Cameron Mitchell took the show in an entirely new direction. The word clouds show those changes.

You can play around with the app below, or find a full screen version here. Bear in mind it’s a little slow to load at first: the corpus of SG1 transcripts comes with the first load, and that’s a lot of episodes. Give it a few minutes, it’ll happen!

 

The details.

The IMSDB transcript database provided the transcripts for this analysis, but not for all episodes: the database had only 182 of the more than 200 episodes that were filmed on file. I have no transcripts for any episode in Season 10 and only half of Season 9! If anyone knows where to find more, or if the spinoff Stargate Atlantis transcripts are online somewhere, I’d love to know.

A large portion of this analysis and build used Julia Silge and David Robinson’s tidytext package in R. They also have a book coming out shortly, which I have on preorder. If you’re considering learning about natural language processing, this is the book to have, in my opinion.

You can find the code I wrote for the project on Github here.

 

Gradient Descent vs Stochastic Gradient Descent: Some Observations of Behaviour

Anyone who does any machine learning at all has heard of the gradient descent and stochastic gradient descent algorithms. What interests me about machine learning is understanding how some of these algorithms (and as a consequence the parameters they are estimating) behave in different scenarios.

The following are single observations from a random data generating process:

y(i) = x(i) + e(i)

where e is a random variable distributed in one of a variety of ways:

  • e is distributed N(0,1). This is a fairly standard experimental set up and should be considered “best case scenario” for most estimators.
  • e is distributed t(4). This is a data generation process that’s very fat-tailed (leptokurtic). It’s a fairly common feature in fields like finance. All the required moments for the ordinary least squares estimator to have all its usual properties exist, but only just. (If you don’t have a stats background, think of this as the edge of reasonable cases to test this estimator against.)
  • e is distributed as a centred and standardised chi-squared(2). In this case, the necessary moments exist and we have ensured e has a zero mean. But the noise isn’t just fat-tailed, it’s also skewed. Again, not a great scenario for most estimators, but a useful one to test against.

The independent variable x is intended to be deterministic, but in this case is generated as N(1,1). To get all the details of what I did, the code is here. Put simply, I estimated the ordinary least squares estimators for the intercept and coefficient of the process (i.e. the true intercept here is zero and the true coefficient is 1). I did this using stochastic gradient descent and gradient descent to find the minimum of the loss function and generate the estimates.
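For flavour, here’s a rough sketch in R of the kind of estimation being run (not the exact code linked above; the learning rate, iteration count and initialisation are arbitrary choices for the sketch):

```r
set.seed(1)

# Data generating process: y = x + e (true intercept 0, true slope 1).
# e ~ N(0, 1) here; swap in rt(n, df = 4) or a centred, standardised
# rchisq(n, df = 2) for the fat-tailed and skewed scenarios above.
n <- 10000
x <- rnorm(n, mean = 1, sd = 1)
y <- x + rnorm(n)

# Stochastic gradient descent on the squared-error loss, updating on one
# randomly chosen observation at a time.
beta <- runif(2)   # random initialisation: (intercept, slope)
rate <- 0.01       # learning rate: an arbitrary choice for this sketch
for (step in 1:50000) {
  i     <- sample(n, 1)
  resid <- y[i] - (beta[1] + beta[2] * x[i])
  grad  <- -2 * resid * c(1, x[i])   # gradient of the squared error at obs i
  beta  <- beta - rate * grad
}
beta   # should end up near the true values (0, 1)
```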

Limits: the following are just single examples of what the algorithms do in each case. It’s not a full test of algorithm performance: we would need to repeat each of these thousands of times and then take overall performance measures to really know how these algorithms perform under these conditions. I just thought it would be interesting to see what happens. If I find something worth exploring in detail, I’ll develop this into a full simulation experiment.

Here’s what I found:

N is pretty large (10 000). Everything is super (for SGD)!

Realistically, unless dimensionality is huge, closed form ordinary least squares (OLS) estimation methods would do just fine here, I suspect. However, the real point here is to test differences between SGD and GD.

I used a random initialisation for the algorithm and you can see how quickly it got to the true parameter. In situations like this, where the algorithm uses only a few iterations, allowing for a burn-in and letting it run beyond the given tolerance level may be very wise if you are averaging SGD estimators. In this case, averaging the parameters over the entire set of iterations would give a worse result than just taking the last estimator.

Under each scenario, the stochastic gradient descent algorithm performed brilliantly and much faster than the gradient descent equivalent, which struggled. The different error distributions mattered very little here.

For reference, it took around 37 000 iterations for the gradient descent algorithm to reach its end point:

For the other experiments, I’ll just stick to a maximum of 10 000 iterations.

N is largish (N = 1000). Stochastic gradient descent doesn’t do too badly.

Now, to be fair, unless I was trying to estimate something with huge dimensionality (and this hasn’t been tested here anyway), I’d just use standard OLS estimating procedures for estimating this model in real life.

SGD takes marginally longer to get where it’s going, but the generating process made no material difference, although our chi-squared example was slightly off. Again, this is just one example.

N is smallish (N = 100): SGD gets brittle, but it gets somewhere close. GD has no clue.

Again, in real life this is a moment for simpler methods, but the interesting point here is that SGD takes a lot longer and fails to reach the exact true value, especially for the constant term (again, one example in each case, not a thorough investigation).

Here, GD has no clue and it’s taking SGD thousands more iterations to reach a less palatable conclusion. In practical machine learning contexts, it’s highly unlikely this is a sensible estimation method at these sample sizes: just use the standard OLS techniques.

I’m not destructive in general, but sometimes I like to break things.

I failed to achieve SGD armageddon with either weird error distributions (that were within the realms of reasonable for the estimator) or smallish sample sizes. So I decided to give it the doomsday scenario: smallish N and an error distribution that does not fulfil the requirements the simple linear regression model needs to work. I tried a t(1) distribution for the error: this thing doesn’t even have a finite variance.

SGD and GD are utterly messed up (and I suspect the closed form solution may not be a whole lot better here either). Algorithmic armageddon, achieved. In practical terms, you can probably safely file this under “never going to happen”.

Data Visualisation: Hex Codes, Pantone Colours and Accessibility

One of the things I find hardest about data visualisation is colouring. I’m not a natural artist, much preferring everything in gentle shades of monochrome. Possibly beige. Obviously, for any kind of data visualisation, this is limiting. Quite frankly, this is the kind of comfort zone that needs setting on fire.

I’ve found this site really helpful: it’s a listing of the Pantone colours with both Hex and RGB codes for inserting straight into your visualisations. It’s a really useful correspondence if I’m working with someone: they can give me the Pantone colour numbers of their website or report palette and I just search the page.

One thing I’ve found, however, is that a surprising (to me) number of people have some kind of colour-based visual impairment. A palette that looks great to me may be largely meaningless to someone I’m working with. I found this out in one of those forehead-slapping moments when I couldn’t understand why a team member wasn’t seeing the implications of my charts. That’s because, to him, those charts were worse than useless. They were a complete waste of his time.

Some resources I’ve found helpful in making my visualisations more accessible are the colourblind-friendly palettes discussed here and this discussion on R-Bloggers. The latter made me realise that up until now I’ve been building visualisations that were obscuring vital information for many users.

The things I think are important for building an accessible visualisation are:

  • Yes, compared to more subtle palettes, colour-blind friendly palettes look like particularly lurid unicorn vomit. They don’t have to look bad if you’re careful about combinations, but I’m of the opinion that prioritising accessibility for my users is more important than “pretty”.
  • Redundant encoding (discussed in the R-bloggers link above) is a great way of ensuring users can make out the information you’re trying to get across; see the sketch after this list. To make sure this is apparent in your scale, use a combination of scale_colour_manual() and scale_linetype_manual(). The latter works the same way as scale_colour_manual() but is not as well covered in the literature.
  • Consider reducing the information you’re putting into each chart, or using a combination of facets and multiple panels. The less there is to differentiate, the easier it can be on your users. This is a good general point and not limited to those with colourblindness.
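To make the redundant encoding point concrete, here’s a minimal ggplot2 sketch (the data are made up, and the colours are from the colourblind-friendly Okabe-Ito palette):

```r
library(ggplot2)

# Each series gets BOTH its own colour and its own line type, so the
# chart still reads for users who can't distinguish the colours.
df <- data.frame(
  x     = rep(1:10, times = 3),
  y     = c(1:10, (1:10) * 1.5, (1:10) * 2),
  group = rep(c("A", "B", "C"), each = 10)
)

ggplot(df, aes(x, y, colour = group, linetype = group)) +
  geom_line() +
  scale_colour_manual(values = c("#E69F00", "#56B4E9", "#009E73")) +
  scale_linetype_manual(values = c("solid", "dashed", "dotdash"))
```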

Bonds: Prices, Yields and Confusion - a Visual Guide

Bonds have been the talk of the financial world lately. One minute it’s a thirty-year bull market, the next it’s a bondcano. Prices are up, yields are down and that’s bad. But then in the last couple of months, prices are down and yields are up and that’s bad too, apparently. I’m going to take some of the confusion out of these relationships and give you a visual guide to what’s been going on in the bond world.

The mathematical relationship between bond prices and yields can be a little complicated and I know very few people who think their lives would be improved by more algebra in it. So for our purposes, the fundamental relationship is that bond prices and yields move in opposite directions. If one is going up, the other is going down. But it’s not a simple 1:1 relationship and there are a few other factors at play.

There are several different types of bond yields that can be calculated:

  • Yield to maturity: the yield you would get if you hold the bond until it matures.
  • Yield to call: the yield you would get if you hold the bond until its call date.
  • Yield to worst: the worst outcome on a bond, whether it is called or held to maturity.
  • Running yield: this is roughly the yield you would get from holding the bond for a year.

We are going to focus on yield to maturity here, but a good overview of yields generally can be found at FIIG. Another good overview is here.

 

To explain all this (without algebra), I’ve created two simulations. These show the approximate yield to maturity against the time to maturity, coupon rate and the price paid for the bond. For the purposes of this exercise, I’m assuming that our example bonds have a face value of $100 and a single annual payment.
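If you’re curious about the calculation anyway, a standard back-of-the-envelope approximation of yield to maturity (close in spirit to what the animations show, though I’ll keep it as a sketch rather than a financial calculator) is the annual coupon plus the average annual capital gain or loss, divided by the average of face value and price. In R:

```r
# Textbook approximation of yield to maturity: annual coupon plus the
# average annual capital gain or loss, over the average of face value
# and price. Approximate only, not a financial calculation.
approx_ytm <- function(coupon, face, price, years) {
  (coupon + (face - price) / years) / ((face + price) / 2)
}

# A $100-face bond paying a $5 annual coupon, with 10 years to run:
approx_ytm(coupon = 5, face = 100, price = 50, years = 10)   # bought cheap: ~13.3%
approx_ytm(coupon = 5, face = 100, price = 150, years = 10)  # bought dear: 0%
```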

The first visual shows what happens as we change the price we pay for the bond. When we buy a bond below face value (at, say, $50 when its face value is $100), the yield is higher. But if we buy that bond at $150, then the yield is much lower. As price increases, yield decreases.

The time the bond has until maturity matters a lot here, though. If there is only a short time to maturity, the differences between buying below and above face value can be very large. If there are decades to maturity, these differences tend to be much smaller. The shading of the blue dots represents the coupon rate that might be attached to a bond like this: the darkest colours have the highest coupon rates and the lightest colours the lowest. Again, the differences matter more when there is less time for a bond to mature.

Prices gif

The second animation is a representation of what happens as we change the coupon rate (i.e. the interest rate the debtor is paying to the bond holder). The lines of dots represent differences in the price paid for the bond. The lighter colours represent a cheaper purchase below face value (better yields, great!). The darker colours represent an expensive purchase above face value (lower yields, not so great).

If we buy a bond cheaply, the yield may be higher than the coupon rate. If we buy it above face value, the yield may be lower than the coupon rate. The longer the bond has to mature, the smaller the difference between them; when the bond is very close to maturity, those differences can be quite large.

Coupon Gif

When discussing bonds, we often mention something called the yield curve, which charts the yields a bond (or group of bonds) will generate against the time left until maturity.

If you’d like to have a go at manipulating the coupon rate and the price to manipulate an approximate yield curve, you can check out this interactive I built here.

Remember that all of these interactives and animations are approximate; if you want to calculate yield to maturity exactly, you can use an online calculator like the one here.

So how does this match the real data that gets reported on daily? Our last chart shows the data from the US Treasury securities quoted on the 25th of November 2016. The black observations are bonds maturing within a year; the blue are those that have longer to run. Here I’ve charted the “Asked Yield”, which is the yield a buyer would receive if the seller sold their bond at the price they were asking. Sometimes, however, the bond is bought at a lower bid, so the actual yield would be a little higher. I’ve plotted this against the time until the bond matures. We can see that the actual yield curve produced is pretty similar to our example charts.

This was the yield curve from one day. The shape of the yield curve will change on a day-to-day basis depending on the prevailing market conditions (e.g. prices). It will also change more slowly over time as the US Treasury issues bonds with higher or lower coupon rates, depending on economic conditions.

yield curve

Data: Wall Street Journal.

Bond yields and pricing can be confusing, but hopefully as you’re reading the financial pages they’re a lot less so now.

A huge thanks to my colleague, Dr Henry Leung at the University of Sydney for making some fantastic suggestions on this piece.

 

Political Donations 2015/16

Yesterday, the ABC released a dataset detailing donations made to political parties in Australia during the 2015-16 period. You can find their analysis and the data here. The data isn’t a complete picture of what was happening during the period: there isn’t a single donation to the One Nation Party among the lot of them, for example.

While the ABC made a pretty valiant effort to categorise where the donations were coming from, “uncategorised” was the last resort for many of the donors.

Who gets the money?

In total, there were 49 unique groups who received the money. Many of these were state branches of national parties, for example the Liberal Party of Australia – ACT Division, Liberal Party of Australia (S.A. Division) and so on. I’ve grouped these and others like them together under their national party; see the sketch below. Other groups included small, narrowly-focussed parties like the Shooters, Fishers and Farmers Party and the Australian Sex Party. Micro parties like the Jacqui Lambie Network, Katter’s Australian Party and so on were grouped together. Parties with a conservative focus (Australian Christians, Family First, Democratic Labor Party) were grouped, and those with a progressive focus (Australian Equality Party, Socialist Alliance) were also grouped together. Parties focused on immigration were combined.
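For those playing along at home, that grouping is straightforward to sketch with dplyr (the donations data frame, its column names and the matching rules here are illustrative assumptions, not the exact ones I used):

```r
library(dplyr)

# Roll state branches and themed parties up into broader groups.
# `donations` and its `recipient` column are hypothetical stand-ins.
donations <- donations %>%
  mutate(recipient_group = case_when(
    grepl("Liberal Party of Australia", recipient) ~ "Liberal Party",
    grepl("Australian Labor Party", recipient)     ~ "Labor Party",
    recipient %in% c("Australian Christians", "Family First",
                     "Democratic Labor Party")     ~ "Conservative-focus",
    recipient %in% c("Australian Equality Party",
                     "Socialist Alliance")         ~ "Progressive-focus",
    TRUE                                           ~ "Other"
  ))
```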

The following chart shows the value of the donation declared and the recipient group that received it.

Scatter plot

Only one individual donation exceeded $500 000, and that was to the Liberal Party. It’s obscuring the rest of the distribution, so I’ve removed it in the next chart. Both the major parties receive more donations than the other parties, which comes as no surprise to anyone. However, the Greens have quite a substantial proportion of very generous givers ($100 000+). The interesting question is not so much who received the money, but who gave it.

Scatter plot with outlier removed

 

Who gave the money?

This is probably the more interesting point. The following charts use the ABC’s categories to see if we can break down where the (declared) money trail lies (for donations $500 000 and under). Again, the data confirmed what everyone already knew: unions give to the Labor party. Finance and insurance gave heavily to the Liberal Party (among others). Several clusters stand out, though: uncategorised donors give substantially to minor parties and the Greens have two major clusters of donors: individuals and a smaller one in the agriculture category.

Donor categories and value scatter plot

Breaking this down further, if we just look at where the money came from and who it went to, we can see that the immigration-focused parties are powered almost entirely by individual donations, with some from uncategorised donors. Minor parties are powered by family trusts, unions and uncategorised donors. The Greens are powered by individuals, uncategorised donors and agriculture, with some input from unions. What’s particularly interesting is the differences between Labor and Liberal donors. Compared to the Liberals, Labor does not have donors in the tobacco industry, and it also has fewer donations in agriculture, alcohol, advocacy/lobby groups, sports and water management. Labor also has fewer donations from uncategorised donors and more from unions.

Donors and Recipients Scatterplot

What did we learn?

Some of what we learned here was common knowledge: Labor doesn’t take donations from tobacco, but it does from unions. The unions don’t donate to the Liberals, but advocacy and lobby groups do. The more interesting observations concern the smaller parties: the cluster of agricultural donations for the Greens – normally LNP heartland – and the individual donations powering the parties focussed on immigration. The latter may have something to say about the money powering the far right.

 

Productivity: In the Long Run, It’s Nearly Everything.

“Productivity … isn’t everything, but in the long run it’s nearly everything.” Paul Krugman, The Age of Diminished Expectations (1994)

So in the very long run, what’s the Australian experience? I recently did some work with the Department of Communications and the Arts on digital techniques and developments. Specifically, we were looking at the impacts advances in fields like machine learning, artificial intelligence and blockchain may have on productivity in Australia. I worked with a great team at the department led by the Chief Economist Paul Paterson and we’re looking forward to our report being published.

In the meantime, here’s the very long run on productivity downunder.

Australian Productivity Chart

Yield to Maturity: A Basic Interactive

The yield to maturity concept describes the approximate rate of return a bond generates if it’s held until its redemption date. It depends on a few things, including the coupon rate (nominal interest rate), the face value of the bond, the price of the bond and the time until maturity.

The mathematics behind it can get a little confusing, so I’ve created a simple Shiny app that allows you to manipulate the inputs and observe what happens. Bear in mind this is not a financial calculator; it’s an interactive for educational purposes. It also gives the approximate, not exact, yield to maturity of a bond, which is fine for our purposes.

I’ve mapped the yield up to 30-year redemption and assumed a face value of $100. The coupon rate varies between 0% and 25%. The current price of the bond can vary between $50 and $150. Mostly, the yield curve is very flat in this simplified approximation, but observe what happens when there is only a short time to maturity (0-5 years) and rates or prices are extreme. You can find the interactive directly here.
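To see that flattening in numbers, here’s a quick sketch using the standard textbook approximation of yield to maturity (my working assumption for the calculation, not the app’s exact code):

```r
# Approximate yield to maturity: coupon plus average annual capital gain
# or loss, over the average of face value and price.
approx_ytm <- function(coupon, face, price, years) {
  (coupon + (face - price) / years) / ((face + price) / 2)
}

# $100 face value, 5% coupon, bought cheaply at $80: the yield premium
# from the cheap purchase shrinks as maturity lengthens.
sapply(c(1, 5, 10, 30), function(yrs) approx_ytm(5, 100, 80, yrs))
# ~0.28 at 1 year, ~0.10 at 5 years, ~0.08 at 10 years, ~0.06 at 30 years
```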

 

 

Remember, this is just an approximation. For a more accurate calculation, see here.

Kids can code: update

So the kid heard I might know a thing or two about random number generators and could see how these might be useful for choosing a drink for dinner. So we worked on a short program together and he was quite happy with the result. We also had a good discussion about random seeds and when/if you want to use them. Altogether a great nerd-bonding session.
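A minimal sketch of that kind of program in R (not his actual code, and the drinks are made up):

```r
# Pick tonight's drink at random; fixing the seed makes the "random"
# choice repeatable, which is handy when you're checking the program.
drinks <- c("water", "milk", "juice", "lemonade")

set.seed(10)
sample(drinks, 1)

# Leave set.seed() out and every run gives a genuinely fresh pick.
```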

Then, being ten years old, he had to alter the program to his own satisfaction. He used his own name, but privacy and all of that: you get the point.

Cheeky child's code

Does it matter in practice? Normal vs t distribution

One of the perennial discussions is normal vs t distributions: which do you use, when, why and so on. This is one of those cases where, for most sample sizes in a business analytics/data science context, it probably makes very little practical difference. Since that’s such a rare thing for me to say, I thought it was worth explaining.

Now I’m all for statistical rigour: you should use the right one at the right time for the right purpose, in my view. However, this can be one of those cases where if the sample size is large enough, it’s just not that big a deal.

The actual simulations I ran are very simple: just 10 000 draws from normal and t distributions, with the t varying in its degrees of freedom. Then I plotted the density for each on the same graph using ggplot in R. If you’d like to have a play around with the code, leave a comment to let me know and I’ll post it to github.
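In the meantime, here’s a quick sketch of that kind of comparison (the degrees of freedom shown are arbitrary picks for illustration):

```r
library(ggplot2)

# 10 000 draws each from N(0,1) and from t distributions at a couple of
# degrees of freedom, overlaid as densities.
set.seed(42)
n <- 10000
draws <- data.frame(
  value = c(rnorm(n), rt(n, df = 4), rt(n, df = 30)),
  dist  = rep(c("N(0,1)", "t(4)", "t(30)"), each = n)
)

ggplot(draws, aes(x = value, colour = dist)) +
  geom_density() +
  coord_cartesian(xlim = c(-5, 5))  # trim the extreme tails for readability
```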

Describing simple statistics

I’m a huge believer in the usefulness of learning by doing. That makes me a huge believer in Shiny, which allows me to create and deploy simple apps that allow students to do just that.

This latest app is a simple one that allows you to manipulate either the mean or the variance of a normal distribution and see how that changes the shape of the distribution.
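A minimal sketch of that kind of app (not the deployed code) looks something like this:

```r
library(shiny)

# Sliders for the mean and variance of a normal distribution; the
# density curve redraws as they move.
ui <- fluidPage(
  sliderInput("mu",     "Mean",     min = -5,  max = 5,  value = 0),
  sliderInput("sigma2", "Variance", min = 0.1, max = 10, value = 1),
  plotOutput("density")
)

server <- function(input, output) {
  output$density <- renderPlot({
    x <- seq(-15, 15, length.out = 500)
    plot(x, dnorm(x, mean = input$mu, sd = sqrt(input$sigma2)),
         type = "l", xlab = "x", ylab = "density")
  })
}

shinyApp(ui, server)
```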

If you want to try out making Shiny apps, but need a place to start, check out Oliver Keyes’ excellent start up guide.

application view1

application view 2