Stargate SG1: Sentiment is not Enough for the Techno Bugs

One of the really great things about your kids getting older is that you can share with them the stuff you thought was cool back when you were young and had ideals and spare time. One of those things my siblings and I spent a lot of that spare time doing was watching Stargate SG1. Now I’ve got kids and they love the show too.

When I sat down to watch the series for the first time, I was a history student in my first year of university, so Daniel’s fascination with languages and cultures was my interest too. Ironically, at the time I was also avoiding taking my first compulsory econometrics course because I was going to hate it SO MUCH.

Approximately one million years, a Ph.D. in econometrics and possibly an alternate reality later, I’m a completely different person. With Julia Silge’s fabulous Austen analyses fresh in my mind (for a start, see here and then keep exploring) I rewatched the series. I wondered: how might sentiment analysis work for transcripts, rather than print-only media like novels?

In my view, this is something like an instrumental variables problem. A transcript of a TV show is only part of the medium’s signal: imagery and sound round out the full product. So a sentiment analysis on a transcript is only an analysis of part of the presented work. But because dialogue is such an intrinsic and important part of the medium, might it give a good representation?

What is sentiment analysis?

If you’re not a data scientist, or you’re new to natural language processing, you may not know what sentiment analysis is. Basically, sentiment analysis compares a list of words (like you might find in a transcript, a speech or a novel) to a dictionary that measures the emotions the words convey. In its simplest form, we talk about positive and negative sentiment.

Here’s an example of a piece of text with a positive sentiment:

“I would like to take this opportunity to express my admiration for your cause. It is both honourable and brave.” – Teal’c to Garshaw, Season Two: The Tokra Part II.

Now here’s an example of a piece of dialogue with a negative sentiment:

“I mean, one wrong move, one false step, and a whole fragile world gets wiped out?” – Daniel, Season Two: One False Step

This is an example of a fairly neutral piece of text:

“Gentlemen, these planets designated P3A-575 and P3A-577 have been submitted by Captain Carter’s team as possible destinations for your next mission.”- General Hammond, Season Two: The Enemy Within.

It’s important to understand that sentiment analysis in its simplest form doesn’t really worry about how the words are put together. Picking up sarcasm, for example, isn’t really possible just by deciding which words are negative and which are positive.

Sentiment analyses like this can’t measure the value of a text: they are abbreviations of a text. In the same way we use statistics like a mean or a standard deviation to describe a dataset, a sentiment analysis can be used to succinctly describe a text.
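If you’re wondering what this looks like in practice, here’s a minimal sketch (not the code behind the charts below) that scores Teal’c’s line above using the tidytext package and the Bing lexicon. Note that British spellings like “honourable” may not appear in the (American-English) lexicon.

```r
# Minimal sketch: score one line of dialogue against the Bing lexicon
library(dplyr)
library(tidytext)

line <- tibble(text = paste("I would like to take this opportunity to express",
                            "my admiration for your cause.",
                            "It is both honourable and brave."))

line %>%
  unnest_tokens(word, text) %>%                        # one row per word
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep words the lexicon knows
  count(sentiment)                                      # tally positive vs negative words
```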

If you’d like to find out more about how sentiment analysis works, check out Julia Silge’s blog post here which provided a lot of the detailed code structure and inspiration for this analysis.

What does sentiment analysis show for SG1?

I analysed the show on both a by-episode and by-series basis. With over 200 episodes and 10 series, the show covered a lot of ground with its four main characters. I found a couple of things that were interesting.

The sentiment arc for most episodes is fairly consistent.

Most episodes open with highly variable sentiment as the dilemma is explained, sarcastic, wry humour is applied and our intrepid heroes set out on whatever journey/quest/mission they’re tasked with. Season One’s Within the Serpent’s Grasp is a pretty good example. Daniel finds himself in an alternate reality where everything is, to put it mildly, stuffed.

Within the Serpent’s Grasp, Season 1. Daniel crosses over to an alternate reality where earth is invaded by evil parasitic aliens with an OTT dress sense.

According to these charts, however, about three-quarters of the way through it all gets a bit “meh”.

Below is the sentiment chart for Season Two’s In the Line of Duty, where Sam Carter has an alien parasite in control of her body. As if that’s not enough for any astrophysicist to deal with, an alien assassin is also trying to kill Sam.

If we take the sentiment chart below literally, nobody really cares very much at all about the impending murder of a major character. Except, that’s clearly not what’s happening in the show: it’s building to the climax.


In the Line of Duty, Season 2. Sam Carter gets a parasite in her head and if that’s not enough, another alien is trying to kill her.

So why doesn’t sentiment analysis pick up on these moments of high drama?

I think the answer here is that this is a sci-fi/adventure show: tension and action aren’t usually achieved through dialogue. They’re usually achieved by blowing stuff up in exciting and interesting ways.

The Season Three cliffhanger introduced “the replicators” for precisely this purpose. SG1 always billed itself as a family-friendly show. Except for an egregious full frontal nude scene in the pilot, everyone kept their clothes on. Things got blown up and people got thrown around the place by Really Bad Guys, but limbs and heads stayed on and the violence wasn’t that bad. SG1 was out to make the galaxy a better place with family values, a drive for freedom and a liberal use of sarcasm.

But sci-fi/adventure shows also thrive on two things: blowing stuff up and really big guns. Hence the replicators: a “race” of self-generating techno Lego that, for the most part, scuttled around in bug form.

In response to this new galactic terror, SG1 pulled out the shotguns and the grenades and spent several delightful seasons blasting them, blood-free. The show mostly maintained its PG rating.

The chart below shows the sentiment for the replicators’ introductory episode, Nemesis. The bugs are heading to earth to consume all technology in their path. The Asgard, a race of super-advanced Roswell Greys, have got nothing, and SG1 has to be called in to save the day. With pump action shotguns, obviously.

The replicator bugs don’t speak. The sound of them crawling around inside a space ship and dropping down on people is pretty damn creepy, but it’s not something that can be picked up by using a transcript as an instrument for the full product.

Nemesis, Season 3 cliffhanger: the rise of the techno bugs.

Season Four’s opener, Small Victories, solved the initial techno bug crisis, but not before a good half hour of two of our characters flailing around inside a Russian submarine with said bugs. Again, the sentiment analysis found it all a little “whatever” towards the end.

Small Victories, Season 4 opener. Techno bugs are temporarily defeated.

Is sentiment analysis useless for TV transcripts then?

Actually, no. It’s just that in those parts of the show where dialogue is of only secondary importance, the other elements of the work obscure the usefulness of the transcript as an instrument. In order for the transcript to be a useful instrument, we need to do what we’d ideally do in many instrumental variables cases: look at a bigger sample size.

Let’s take a look at the sentiment chart for the entire sixth season. This is the one where Daniel Jackson is dead, but is leading a surprisingly active and fulfilling life for a dead man. We can see the overall structure of the story arc for the season below. The season starts with something of a bang as new nerd Jonas is introduced just in time for old nerd Daniel to receive a life-ending dose of explosive radiation. The tension goes up and down throughout. It’s most negative at about the middle of the season, where there’s usually a double-episode cliffhanger, then smooths out towards the end of the season until tension increases again with the final cliffhanger.


Season Six: Daniel is dead and new guy Jonas has to pick up the vacant nerd space.

Season Eight, in which the anti-establishment Jack O’Neill has become the establishment, follows a broadly similar pattern. (Jack copes with the greatness thrust upon him as a newly-starred general by being more himself than ever before.)

Note the low levels of sentiment at the end of each season. A couple of things cause this: as with the episodes, moments of high emotion get big scores, and this obscures the rest of the distribution. I considered normalising it all to between 0 and 1. That would be a good move for comparing between episodes and seasons, but it didn’t seem necessary in this case.

The other issue going on here is the narrative structure of the overall arc. In these cases, I think the season is slowing down a little in preparation for the big finale.

Both of these issues were apparent in the by-episode charts as well.
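As an aside, the 0-1 rescaling mentioned above is a one-line mutate. The snippet below is just a sketch on made-up numbers, not part of the actual analysis code.

```r
# Sketch of min-max normalisation: rescale sentiment scores onto [0, 1]
library(dplyr)

season_sentiment <- tibble(chunk = 1:5,
                           score = c(-8, 2, 15, -3, 40))  # toy sentiment scores

season_sentiment %>%
  mutate(score_scaled = (score - min(score)) / (max(score) - min(score)))
```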

Your turn now

For the fun of it, I built a Shiny app that lets you explore the sentiment of each episode and series on your own. I’ve also added simple word clouds for each episode and series. It’s an interesting look at the relative importance each character has and how that changed over the length of the show.

The early series were intensely focussed on Jack, but as the show progressed the other characters got more and more nuanced development.

Richard Dean Anderson made the occasional guest appearance after Season Eight, but no longer had a regular role on the show once Season Nine started. The introduction of new characters Vala Mal Doran and Cameron Mitchell took the show in an entirely new direction. The word clouds show those changes.

You can play around with the app below, or find a full screen version here. Bear in mind it’s a little slow to load at first: the corpus of SG1 transcripts comes with the first load, and that’s a lot of episodes. Give it a few minutes, it’ll happen!
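If you’re curious how an app like this hangs together, here’s a bare-bones sketch. It is not the actual app code: the data frame and its columns (sg1_sentiment, episode, chunk, score) are made up for illustration.

```r
# Bare-bones Shiny sketch: pick an episode, draw its sentiment chart
library(shiny)
library(dplyr)
library(ggplot2)

# Toy stand-in for the real per-episode sentiment scores
sg1_sentiment <- tibble(
  episode = rep(c("Nemesis", "Small Victories"), each = 5),
  chunk   = rep(1:5, times = 2),
  score   = c(3, -2, 5, -6, 1, 2, 4, -3, -1, 0)
)

ui <- fluidPage(
  selectInput("episode", "Episode", choices = unique(sg1_sentiment$episode)),
  plotOutput("sentiment_plot")
)

server <- function(input, output) {
  output$sentiment_plot <- renderPlot({
    sg1_sentiment %>%
      filter(episode == input$episode) %>%   # keep the chosen episode only
      ggplot(aes(chunk, score)) +
      geom_col()
  })
}

shinyApp(ui, server)
```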


The details.

The IMSDB transcript database provided the transcripts for this analysis, but not for all episodes: the database only had 182 of the more than 200 episodes that were filmed. I have no transcripts for any episode in Season 10 and only half of Season 9! If anyone knows where to find more, or whether the spinoff Stargate Atlantis transcripts are online somewhere, I’d love to know.

A large portion of this analysis and build used Julia Silge and David Robinson’s tidytext package in R. They also have a book coming out shortly, which I have on preorder. If you’re considering learning about natural language processing, this is the book to have, in my opinion.

You can find the code I wrote for the project on Github here.


Using Natural Language Processing for Survey Analysis

Surveys have a specific set of analysis tools that are used for analysing the quantitative part of the data you collect (Stata is my particular poison of choice in this context). However, often the interesting parts of the survey are the unscripted, “tell us what you really think” comments.

Certainly this has been true in my own experience. I once worked on a survey deployed to teachers in Laos regarding resources for schools and teachers. All our quantitative information came back and was analysed, but one comment (translated for me into English by a brilliant colleague) stood out. It read something to the effect of “this is very nice, but the hole in the floor of the second story is my biggest concern as a teacher”. It’s not something that would ever have been included outright in the survey, but a simple sentence told us a lot about the resources this school had access to.

Careful attention to detailed comments in small surveys is possible. But if you have thousands upon thousands of responses, this is far more difficult. Enter natural language processing.

There are a number of tools which can be useful in this context. This is a short overview of some that I think are particularly useful.

  • Word Clouds. These are easy to prepare and very simple, but they can be a powerful way to communicate information. As with all data visualisation, there is good and there is bad. This is an example of a very simple word cloud, while this post by Fells Stats illustrates some more sophisticated ways of using the tool.

One way to extend the simple “bag of words” concept is to divide your sample into groups and compare clouds. Or create your own specific dictionary of words and concepts you’re interested in and cloud only those.

Remember that stemming the corpus is critical. For example, “work”, “worked”, “working” and “works” all belong to the same stem. They should be treated as one word, or else a particularly common theme is likely to swamp the cloud several times over in its different forms. (One way to do this is with the SnowballC package; see the sketch after this list.)

Note that no word cloud should be constructed without removing “stop words” like the, and, a, I, etc. Stop word dictionaries vary; they can (and should) be tailored to the problem at hand.

  • Network Analysis. If you have a series of topics you want to visualise relationships for, you could try a network-type analysis similar to this. The concept may be particularly useful if you manually decide topics of interest and then examine relationships between them. In this case, the outcome is very much user-dependent/chosen, but may be useful as a visualisation.
  • Word Frequencies. Alone, simple tables of word frequencies are not always particularly useful. In a corpus of documents about education, noting that “learning” is a common term isn’t particularly remarkable. However, how do these frequencies change by group? Do teachers speak more about “reading” than principals? Do people in one geographical area or salary bracket have a particular set of high-frequency words compared to another? This is a basic exercise in feature/variable engineering, and the usual data analysis tool kit applies (see here, here and here). Remember you don’t need to stop at high-frequency words: what about high-frequency phrases? (A tidytext sketch covering stop words, stemming, frequencies by group and TF-IDF follows this list.)
  • TF-IDF (term frequency-inverse document frequency) matrix. This may provide useful information and is the basis of many more complex analyses. TF-IDF downweights terms that appear in nearly all documents/comments (“the”, “i”, “and”, etc.) while upweighting rarer words that may be of interest. See here for an introduction.
  • Are the comments clustered across some lower-dimensional space? The k-means algorithm may provide some data-driven guidance there. This would be an example of “unsupervised machine learning”, vis-à-vis “this is an algorithm everyone has been using for 25 years but we need to call it something cool”. It may not generate anything obvious at first, but who is in those clusters and why are they there? (A clustering step is included in the sketch after this list.)
  • Sentiment analysis will be useful, possibly applied both to the entire corpus and to subsets. For example, among those who discussed “work life balance” (and derivative terms), is the sentiment positive or negative? Is this consistent across all work/salary brackets? Are truck drivers more upbeat than bus drivers? Again, basic feature/variable engineering applies here. If you’re interested in this area, you could do a lot worse than learning from Julia Silge, who writes interesting and informative tutorials in R on the subject.
  • Latent Dirichlet Allocation (LDA) and more complex topic analyses. Finally, LDA or other more complex topic models may be able to generate topics directly from the corpus. I think this would take a great deal of time for a new user and may have limited payoff, particularly if an early analysis suggests you already have a clear idea of which topics are worth investigating. It is, however, particularly useful when dealing with enormous corpora. This is a really basic rundown of the concept. This is a little more complex, but has useful information.
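To make a few of these concrete, here is a rough sketch of the stop word / stemming / frequency / TF-IDF / clustering workflow using the tidytext and SnowballC packages. The survey data frame, its column names and the toy comments are all invented for illustration; a real corpus would obviously be far larger.

```r
# Rough sketch of the survey-comment workflow described above
library(dplyr)
library(tidytext)
library(SnowballC)   # wordStem() for stemming

# Hypothetical survey: one free-text comment per respondent, plus a group
survey <- tibble(
  respondent = 1:4,
  group      = c("teacher", "teacher", "principal", "principal"),
  comment    = c("The reading resources are working well",
                 "We need more reading books and the roof needs repairs",
                 "Budget reporting takes too much of my time",
                 "Reporting requirements and budgets are my main concern")
)

tidy_survey <- survey %>%
  unnest_tokens(word, comment) %>%         # one row per word per respondent
  anti_join(stop_words, by = "word") %>%   # drop "the", "and", "of", ...
  mutate(stem = wordStem(word))            # "working"/"works" -> "work"

# Word (stem) frequencies by group: do teachers and principals talk about
# different things?
tidy_survey %>%
  count(group, stem, sort = TRUE)

# TF-IDF: upweight stems that distinguish one group from the others
tidy_survey %>%
  count(group, stem) %>%
  bind_tf_idf(stem, group, n) %>%
  arrange(desc(tf_idf))

# K-means on per-respondent TF-IDF weights: are the comments clustered?
dtm <- tidy_survey %>%
  count(respondent, stem) %>%
  bind_tf_idf(stem, respondent, n) %>%
  cast_sparse(respondent, stem, tf_idf)

set.seed(42)
kmeans(as.matrix(dtm), centers = 2)$cluster   # cluster label per respondent
```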

So that’s a brief rundown of some basic techniques you could try. There are plenty more out there; this is just the start. Enjoy!