Decoding error messages in R

Decoding error messages in R can be difficult for newcomers, which is why I’m working on helpPlease. In the meantime, it’s important to be able to understand R errors and warnings in more detail than simply ‘R says no’. So here’s a quick rundown:

Errors in R an infographic

R gives both errors and warnings

An error is “R says no”. It’s R’s way of telling you why the chunk of code is not possible to execute.

Warnings mean “R says OK sure but maybe you won’t like what you’re going to get”. It’s R’s way of telling you the code is behaving in a different way than you might reasonably expect.
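Here is a minimal sketch of the difference, using base R only. The two calls and the object names are my own illustration, chosen because they behave the same on any R installation: `log(-1)` still hands back a value but warns about it, while `log("a")` cannot run at all.

```r
# A warning: the code runs and a value comes back, but R flags a concern.
# Typed at the console, this prints "Warning message: In log(-1) : NaNs produced".
warned_value <- log(-1)
is.nan(warned_value)      # TRUE: you got a result, just maybe not one you like

# An error: the code cannot run, so no value comes back.
# try() catches the error so the rest of the script keeps going.
error_attempt <- try(log("a"), silent = TRUE)
inherits(error_attempt, "try-error")   # TRUE: R said no
```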

Decoding an error message

The error message typically comes in three parts. Here’s a common example from my code: I’ve tried to access a part of an array that doesn’t exist – my array has a column dimension of 5, so when R goes looking for the 100th column it’s understandably confused and just gives up.

R error message

There are three main parts to this message:

  1. The declaration that it is an Error
  2. The location of the error – it’s in the line of my code fit[5,100,]
  3. The problem this mistake in my code caused: the subscript is out of bounds, i.e. I asked R to go and retrieve a part of this array that did not exist.
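The three parts can also be pulled apart in code. This is a sketch under assumptions: the `fit` array below is a made-up stand-in with 5 columns (the dimensions of the real object aren't given here), and `tryCatch()` is used only so the error can be inspected instead of stopping the script.

```r
# Hypothetical stand-in for the array in the example: 10 rows, 5 columns, 3 layers.
fit <- array(0, dim = c(10, 5, 3))

# Asking for column 100 fails; tryCatch() hands the error back as an object.
err <- tryCatch(fit[5, 100, ], error = function(e) e)

inherits(err, "error")        # part 1: it is an error
deparse(conditionCall(err))   # part 2: the piece of code where it happened
conditionMessage(err)         # part 3: the problem (subscript out of bounds)
```

The wording of the message can differ if your R session is set to a language other than English.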

Decoding a warning message

Warning messages can be very variable in format, but there are often common elements. Here’s a common one that ggplot gives me:

ggplot2 warning message

Here I’ve asked ggplot2 to put a line chart together for me, but some of my data frame is missing. ggplot2 can still put the chart together, but it’s letting me know I have missing values.

While warning messages can be very variable, there are some common elements that turn up fairly regularly:

  1. The declaration of a warning
  2. The behaviour being warned about
  3. The piece of code that caused the warning
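A warning can be taken apart in the same way. To keep this sketch free of extra packages, it uses a base R warning from `mean()` rather than the ggplot2 one above; that swap is my own choice for illustration. Note that catching a warning with `tryCatch()` stops the calculation, so this is for looking at the warning, not for everyday use.

```r
# mean() of a character value triggers a warning and returns NA.
# tryCatch() hands the warning back as an object we can inspect.
wrn <- tryCatch(mean("a"), warning = function(w) w)

inherits(wrn, "warning")      # 1: it is a warning
conditionMessage(wrn)         # 2: the behaviour being warned about
deparse(conditionCall(wrn))   # 3: the piece of code that caused it
```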

Now that you know what warnings and errors are and what’s in them: how do you find out what they mean?

Where can you find help?

There’s lots of information out there to help you decode your warning and error messages. Here are some that I use all the time:

  • Typing ? or ?? followed by the name of the function that’s going wrong into the console will give you help within R itself
  • Googling the error message, warning or package is often very useful
  • Stack Overflow or the RStudio community forums can be searched for other people’s (solved!) problems
  • The vignettes and examples for the package you’re using are a wealth of information
  • Blog posts that use the package or function you are using can be a very good step-by-step guide to preparing your data for the tool you’re trying to use
  • Building a reprex (a reproducible example) is a good way of getting ready to ask a question on Stack Overflow or the R community forums.
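A rough sketch of the first and last items on that list, in base R. The tiny data set and the object names are invented for illustration; a reprex for a real question would use the smallest piece of your own data that still shows the problem.

```r
# 1. Help inside R. At the console you would type ?mean;
#    help("mean") is the long form of the same thing.
mean_help <- help("mean")

# 2. Ingredients of a reproducible example:
#    a tiny data set, the exact code that misbehaves, and your R version.
tiny <- data.frame(x = c(1, 2, NA))
dput(tiny)                      # prints code that recreates 'tiny', ready to paste
problem <- mean(tiny$x)         # NA, because of the missing value
r_version <- R.version.string   # worth quoting in your question
```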

Good luck! And in the meantime, if you should come across an R message that could use explaining in plain text I’d really love to hear from you (especially if you’re new!).

Where do things live in R? R for Excel Users

One of the more difficult things about learning a new tool is the investment you make while you’re learning things you already know in your current tool. That can feel like time wasted – it’s not, but it’s a very frustrating experience. One of the ways to speed up this part is to ‘translate’ concepts you know in your current tool into concepts for your new one.

In that spirit, here’s a brief introduction to where things live in R compared to Excel. Excel is a very visual medium – you can see and manipulate your objects all the time. You can do the same in R; it’s just that they are arranged in slightly different ways.

Where does it live infographic

Data

Data is the most important part. In Excel, it lives in the spreadsheet. In R it lives in a data structure – commonly a data frame. In Excel you can always see your data.

Excel spreadsheet

In R you can too – go to the environment window and click on the spreadsheet-looking icon. It will show your data in the viewer window if it’s an object that can be displayed like that (if you don’t have this option, your object may be a list, not a data frame). You can’t manipulate the data like this, however – you need code for that. You can also use commands like head(myData) to see the first few lines, tail(myData) to see the last few and print(myData) to see the whole object.

R environment view

view of data in R
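The commands mentioned above look like this in practice. The data frame here is made up so the example runs on its own; `myData` is just the placeholder name used in this post.

```r
# A small, made-up data frame standing in for your spreadsheet.
myData <- data.frame(
  id    = 1:10,
  score = c(12, 15, 9, 20, 18, 11, 14, 16, 13, 17)
)

head(myData)      # the first six rows
tail(myData, 3)   # the last three rows
str(myData)       # column names and types at a glance
print(myData)     # the whole object

# In RStudio, View(myData) opens the spreadsheet-style viewer from the console
# (interactive sessions only, so it is left as a comment here).
```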

Code

Excel uses code to make calculations and create statistics – but it often ‘lives’ behind the object it produces. Sometimes it can make your calculation look like the original data and create confusion for your stakeholders (and for you!).

Excel formula

In R, code is used in a similar way to Excel, but it lives in a script, a .R file. This makes it easier to reuse and understand, and more powerful to manipulate. Using code in a script saves a lot of time and effort.

R script
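As a small sketch of what might sit in a script: the lines below do the job of an Excel formula filled down a column, followed by a total. The `sales` data and column names are invented for this example.

```r
# Made-up data standing in for a worksheet.
sales <- data.frame(
  region = c("North", "South", "East"),
  units  = c(120, 95, 143),
  price  = c(2.5, 3, 2.75)
)

# Like a formula such as =B2*C2 filled down, but written once and visible.
sales$revenue <- sales$units * sales$price

# Like a SUM over that column; the original columns are left untouched.
total_revenue <- sum(sales$revenue)
total_revenue
```

Because the calculation sits in the script and not behind a cell, anyone reading it can see exactly how the number was made.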

Results and calculations

In Excel, results and calculations live in a worksheet in a workbook. They can be easy to confuse with the original data, it’s hard to check if things are correct, and re-running analyses (you often re-run them!) is time consuming.

In R, if you give your result or analysis a name, it will be in the Environment, waiting for you – you can print it, copy it, change it, chart it, write it out to Excel for a coworker and recreate it any time you need with your script.

A result in R
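A minimal sketch of a named result, with invented numbers. Writing to a CSV file is one simple way to hand something to a coworker who uses Excel, since Excel opens CSV files directly; writing a true .xlsx file needs an add-on package, which is left out here.

```r
scores <- c(72, 85, 90, 66)

# Give the result a name and it waits for you in the Environment.
average_score <- mean(scores)
exists("average_score")   # TRUE
print(average_score)      # 78.25

# Write it out to a temporary CSV file that Excel can open.
out_file <- tempfile(fileext = ".csv")
write.csv(data.frame(average_score = average_score), out_file, row.names = FALSE)
```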

That’s just a simple run down – there’s a lot more to R! But it helps a lot to know where everything ‘lives’ as you’re getting started. Good luck!

Things I wish I’d noticed in grad school

Back in the day, I tended to get a little hyper-focussed on things. I’m sure someone, sometime, somewhere pointed this stuff out to me. But at the time it went over my head and I learned these things the hard way. Maybe my list of things I wish I’d noticed helps someone else.

  • Your professional contacts matter and it’s OK to ask for help. You’re not researching in a vacuum; the people around you want to help.
  • You need to look outside your department and university. There’s a bigger, wider world out there, and while what’s going on inside your little world seems important, you need to be aware of what’s outside too.
  • Being methodologically/theoretically robust matters, yes. But learning when to let it go is going to be harder than learning the theory/methodology. No easy answers here: all you can do is make your decision and own it.
  • It doesn’t matter how much you read, you’re not going to be an expert across your whole field. Just be aware of the field and be an expert in what you’re doing right now. That’s OK.
  • Get a life. Really.

Cheat Sheets: The New Programmer’s Friend

Cheat sheets are brilliant, whether you’re learning to program for the first time or you’re picking up a new language. Most data scientists are probably programming regularly in multiple languages at any given time, and cheat sheets are a handy reference guide that saves you from googling how to “do that thing you know I did it in Python yesterday but how does it go in Stata?”

This post is an ongoing curation of cheat sheets in the languages I use. In other words, it’s a cheat sheet for cheat sheets. Because a blog post is more efficient than googling “that cheatsheet, with the orange bit and the boxes.” You can find my list of the tutorials and how-to guides I enjoyed here.

R cheat sheets + tutorials

Python cheat sheets

Stata cheat sheets

  • There is a whole list of them here, organised by category.
  • Stata cheat sheet: I could have used this five years ago. Also very useful when it’s been a while since you last played in the Stata sandpit.
  • This isn’t a cheat sheet, but it’s an exhaustive list of commands that makes it easy to find what you want to do, as long as you already have a good idea.

SPSS cheat sheets

  • “For Dummies” has one for SPSS too.
  • This isn’t so much a cheat sheet but a very basic click-by-click guide to trying out SPSS for the first time. If you’re new to this, it’s a good start. Since SPSS is often the gateway program for many people, it’s a useful resource.

General cheat sheets + discussions

  • Comparisons between R, Stata, SPSS, SAS.
  • This post from KDnuggets has lots of cheat sheets for R, Python, SQL and a bunch of others.

I’ll add to this list as I find things.