Decoding error messages in R

Decoding error messages in R can be difficult for newcomers, that’s why I’m working on helpPlease. However, in the meantime, it’s important to be able to understand R errors and warnings in more detail than simply ‘R says no’. So here’s a quick rundown:

Errors in R an infographic

R gives both errors and warnings

An error is “R says no”. It’s R’s way of telling you why the chunk of code is not possible to execute.

Warnings mean “R says OK sure but maybe you won’t like what you’re going to get”. It’s R’s way of telling you the code is behaving in a different way than you might reasonably expect.

Decoding an error message

The error message typically comes in three parts. Here’s a common example from my code: I’ve tried to access a part of a array that doesn’t exist – my array has a column dimension of 5, so when R goes looking for a the 100th column it’s understandably confused and just gives up.

R error message

There are three main parts to this message:

  1. The declaration that it is an Error
  2. The location of the error – it’s in the line of my code fit[5,100,]
  3. The problem this mistake in my code caused: the subscript is out of bounds, i.e. I asked R to go an retrieve a part of this array that did not exist.

Decoding a warning message

Warning messages can be very variable in format, but there are often common elements. Here’s a common one that ggplot gives me:

ggplot2 warning message

Here I’ve asked ggplot2 to put a line chart together for me, but some of my data frame is missing. Ggplot2 can still put the chart together, but it’s letting me know I have missing values.

While warning messages can be very variable, there are some common elements that turn up fairly regularly:

  1. The declaration of a warning
  2. The behaviour being warned about
  3. The piece of code that caused the warning

Now that you know what warnings and errors are and what’s in them: how do you find out what they mean?

Where can you find help?

There’s lots of information out there to help you decode your warning and error messages. Here are some that I use all the time:

  • Typing ? or ?? and the name of the function that’s going wrong in the console will give you help within R itself
  • Googling the error message, warning or package is often very useful
  • Stack Overflow or the RStudio community forums can be searched for other people’s (solved!) problems
  • The vignettes and examples for the package you’re using are a wealth of information
  • Blog posts that use the package or function you are can be a very good step-by-step guide of how to prepare your data for the tool you’re trying to use
  • Building a reprex (a reproducible example) is a good way of getting ready to ask a question on Stack Overflow or the R community forums.

Good luck! And in the meantime, if you should come across an R message that could use explaining in plain text I’d really love to hear from you (especially if you’re new!).

excelTransition

excelTransition is designed as a series of ‘training wheels’ functions which allow you to create some outputs similar to those you’d already have created in Excel with a minimum amount of coding and time.

It’s a package designed for you to use and abandon quickly. One of the most costly things about learning a new tool is the time you spend learning to do simple things you can already do in your current tool. excelTransition will help you produce some (very) basic analyses in minimum time, leaving you more time to work on learning R.

It’s ideal for someone at the very beginning of their learning about programming. If you’re an experienced programmer, you may not need these ‘training wheels’ at all.

The package is currently under development and you can view it on Github.

Help Please – a package for new R users

Starting out in R can be quite overwhelming. There are lots of resources and people around who want to help, but navigating those can be hard and some of the R error messages and explainers within R itself can seem like something of a foreign language.

helpPlease is there to bridge the gap. The gap closes by itself over time, but let’s build a bridge and make life easier.

The package is in proof of concept stage and you can see it on Github. If you have an error message that could benefit by being explained in plain language, a term used in R that could use a plain language explanation, encouragement for new users or a troubleshooting tip, we’d love to hear from you.

I turned off commenting on new posts…

… not because I don’t like hearing from you all. It’s more that being offered amazing generic pharmaceutical deals 1500 times gets old.

If you’d like to throw some criticism or comment my way about anything on here, feel free to ping me on twitter @StephdeSilva or use the contact form to hit my inbox.

And if you want a great deal on generic pharmaceuticals, I can probably help you out there too.

Closures in R

Put briefly, closures are functions that make other functions. Are you repeating a lot of code, but there’s no simple way to use the apply family or Purrr to streamline the process? Maybe you could write your own closure. Closures enclose access to the environment in which they were created – so you can nest functions within other functions.

What does that mean exactly? Put simply, it can see all the variables enclosed in the function that created it. That’s a useful property!

What’s the use case for a closure?

Are you repeating code, but instead of variables or data that’s changing – it’s the function instead? That’s a good case for a closure. Especially as your code gets more complex, closures are a good way of modularising and containing concepts.

What are the steps for creating a closure?

Building a closure happens in several steps:

  1. Create the output function you’re aiming for. This function’s input is the same as what you’ll give it when you call it. It will return the final output. Let’s call this the enclosed function.
  2. Enclose that function within a constructor function. This constructor’s input will be the parameters by which you’re varying the enclosed function in Step 1. It will output the enclosed function. Let’s call this the enclosing function.
  3. Realise you’ve got it all wrong, go back to step 1. Repeat this multiple times. (Ask me how I know..)
  4. Next you need to create the enclosed function(s) (the ones from Step 1) by calling the enclosing function (the one from Step 2).
  5. Lastly, call and use your enclosed functions.

An example

Say I want to calculate the mean, SD and median of a data set. I could write:
x <- c(1, 2, 3)
mean(x)
median(x)
sd(x)

That would definitely be the most efficient way of going about it. But imagine that your real use case is hundreds of statistics or calculations on many, many variables. This will get old, fast.

I’m calling those three functions each in the same way, but the functions are changing rather than the data I’m using. I could write a closure instead:

stat <- function(stat_name){

function(x){
stat_name(x)
}

}

This is made up of two parts: function(x){} which is the enclosed function and stat() which is the enclosing function.

Then I can call my closure to build my enclosed functions:

mean_of_x <- stat(mean)
sd_of_x <- stat(sd)
median_of_x <- stat(median)

Lastly I can call the created functions (probably many times in practice):

mean_of_x(x)
sd_of_x(x)
median_of_x(x)

I can repeat this for all the statistics/outcomes I care about. This example is too trivial to be realistic – it takes about double the lines of code and is grossly ineffficient! But it’s a simple example of how closures work.

More on closures

If you’re producing more complex structures, closures are very useful. See Jason’s post from Left Censored for a realistic bootstrap example – closures can streamline complex pieces of code reducing mistakes and improving the process you’re trying to build. It takes modularity to the next step.

For more information in R see Hadley Wickham’s section on closures in Advanced R.

 

Some notes

I’m teaching a course in financial statistics this semester – it’s something I’ve been doing on and off for, well, it’s best measured in decades now.

To make my life easier, I started compiling my notes on various subjects: here they are in rough draft form, at least for the moment. I’ll add to them as I go.

Chapter 1 The World of Data Around Us

Chapter 2 Data Visualisation Matters

Data Analysis

An Introduction to Algebra

Introduction to Sigma Notation

Workflow by design

Failure Is An Option

Failure is not an option available to most of us, most of the time. With people depending on us, it’s a luxury few can afford. As a consequence, we shoot for a minimum viable product, minimise risks and, often, minimise creativity. This is a crying shame. If we want to be better programmers, better modellers or better analysts, we need to have space to fail occasionally. The opportunity to “try it and see” is an incredible luxury not offered to many people.

This isn’t a defence of mediocrity or incompetence. Let me be very clear: highly capable, brilliant people fail at things they try. They fail because they take risks. They push and push and push. They find out a whole bunch of things that don’t work and- if they’re lucky- find the piece of gold that does.

In our general workplaces, we often can’t afford this, unless we are very privileged. I spent last week at the ROpensci Oz Unconference and failure wasn’t just an option: it was encouraged.

This was an incredibly freeing approach to programming and one that generated a wealth of creativity and collaboration. Participants were users of all skill levels from just-starting-out to decades-long-veterans. They came from diverse fields like ecology, psychology, academia and business. The underlying ethos of the conference was “try it and see”.

Try it we did.

We had two days of learning, trying, succeeding, failing occasionally and solving those problems until we succeeded. Thanks to ROpensci and to our sponsors: having the space and support to fail made exploration, learning and creativity a priority. It’s a rare luxury and one I’d recommend to everyone if you can!

If you’re just starting out: remember that it’s OK to fail occasionally. You’ll be a better programmer, better analyst or better modeller for it.

Violence Against Women: Standing Up with Data

Today, I spent the day at a workshop tasked with exploring the ways we can use data to contribute to ending violence against women. I was invited along by The Minerva Collective, who have been working on the project for some time.

Like all good workshops there were approximately 1001 good ideas. Facilitation was great: the future plan got narrowed down to a manageable handful.

One thing I particularly liked was that while the usual NGO and charitable contributors were present (and essential) the team from Minerva had managed to bring in a number of industry contributors from telecommunications and finance who were able to make substantial contributions. This is quite a different approach to what I’ve seen before and I’m interested to see how we can work together.

I’m looking forward to the next stage, there’s a huge capacity to make a difference. While there are no simple answers or magic bullets, data science could definitely do some good here.

A Primer on Basic Probability

… and by basic, I mean basic. I sometimes find people come to me with questions and no one has ever taken the time to give them the most basic underpinnings in probability that would make their lives a lot easier. A friend of mine is having this problem and is on a limited time frame for solving it, so this is quick and dirty and contains both wild ad-lib on my part and swearing. When I get some more time, I’ll try and expand and improve, but for now it’s better than nothing.

Youtube explainer: done without microphone, sorry- time limit again.

Slides I used:

Probability

I mentioned two links in the screencast. One was Allen Downey’s walkthrough with python, you don’t need to know anything about Python to explore this one: well worth it. The other is Victor Powell’s visualisation of conditional probability. Again, worth a few minutes exploration.

Good luck! Hit me up in the comments section if you’ve got any questions, this was a super quick run through so it’s a summary at best.