Common R Errors
Errors and mistakes that pop up a lot in R, what they mean, and what you can do about them.
This is a very minor reworking of an article from sometime in late 2022, posted to my data visualization class’s GitHub page. An actual Substack-native article is in the works and will be here before too long.
Common R Problems
I teach several courses in which students have to learn and use R. As such, I help students manage a lot of errors and mistakes in their code as they learn. Some of those errors and mistakes I see over and over again. This article explains what these errors are, and how to fix these things. You may find it handy if you are starting out in R, or if you are teaching or working with others starting out.
Things I often have students do include cleaning data using tidyverse packages including dplyr, making graphs with ggplot2, and rendering documents in Quarto/RMarkdown. So the errors covered here will be tilted towards issues that arise when working with those things.
I’ve divided this into two sections: (1) things breaking, and (2) mistakes in coding.
Things Breaking
These are common errors you might get, and common things that might cause errors.
General troubleshooting tips
Run your code one line at a time. Between each line, check the output and make sure it looks like you expect it to look. Often, errors occur because an earlier line of code didn’t do what we intended it to do. If you don’t spot any problems there, keep going until you hit the error message.
Read the error message and see if you can make sense of what it says
Often, even if you don’t understand the message, it will give you a hint as to what is causing the problem, and so you’ll know where to look to double-check your code
Google
There is no package called
If you get an error like there is no package called 'packagename'
, that means that you’ve tried to load or use a package - perhaps using library()
or something like that - but you don’t have that package installed.
The Fix:
Did you spell the package name correctly? Maybe you do have the package, but you made a typo
If you spelled it correctly, you may not have installed it. Install it (usually) with
install.packages('packagename')
where'packagename'
is replaced with the name of the package you actually want.
Could not find function
If you get an error like could not find function "functionname"
, that means it’s trying to look for that function, but cannot find it.
The Fix:
Did you spell the function name correctly? Maybe you made a typo.
If the function is in a package, did you load the package? Remember, the package must be re-loaded every time you restart R. If you’re working in Quarto, you must load the package inside of your Quarto script, since Quarto starts from a blank slate.
Object not found
If you get an error like object 'objectname' not found
, that means that you’ve made a reference to an object called objectname
, but there is no such object to refer to.
The Fix:
Did you spell the object’s name correctly? Maybe you made a typo.
Did you remember to create the object before referring to it? For example, the code
mean(x); x <- 1:10
would not work, because you try to take the mean ofx
before it’s been created. If you’re working in Quarto, remember that it will start from a blank slate, so you must create all objects inside of your Quarto script.
Column doesn’t exist
If you get an error like column 'columnname' doesn't exist
, that means it’s trying to refer to a column name that is not actually in your data.
The Fix:
Did you spell the column’s name correctly? Maybe you made a typo.
Are you referring to the proper dataset? Maybe you created the variable in
my_data2
but you’re trying to access it inmy_data
.Does your column name have spaces or other strange characters in it like dashes? If so, in many applications (anywhere you’re not treating the column name as a string) you have to surround the column name in backticks (`) so it knows where the column name starts and ends.
If your column name breaks some column-naming rules, like if it has a space in it or other strange special characters, or if it starts with a number, then you need to let R know where the column name actually starts and ends by using backticks `` to surround the column name.
Did you perhaps drop the column before trying to refer to it? For example, in the below code, the
group_by() %>% summarize()
will only keep columns named in thegroup_by()
or in thesummarize()
. So even though thempg
column was there at the start, it no longer is by the time we refer to it.
mtcars %>%
group_by(am) %>%
summarize(mean_hp = mean(hp)) %>%
mutate(hp_by_mpg = mean_hp/mpg)
Warning: Package was compiled with R version
If you load a package and get a warning that looks like Warning: packagename was compiled with R version...
, that just means that the package was compiled using a slightly newer version of R than yours. It’s generally not a problem as long as you have a somewhat-recent version of R.
The Fix:
Ignore it
If you want it to go away, update your R installation.
Rtools is required to build packages from source
If you are installing packages, you may be asked whether you’d like to build packages from source. If you say “Yes”, but you do not have Rtools installed, you will get an error, and the package won’t install properly.
The Fix:
When asked whether you want to install packages from source, say “No.”
Install Rtools before installing packages. The easiest way to do this is to install the installr package, and then run
installr::install.rtools()
.
File does not exist, or No such file or directory
If you get an error like 'filename.csv' does not exist
, that means that you’re trying to load a file, but the file is not in the location you’re telling R to look.
The Fix:
Did you spell the filename correctly? Maybe you made a typo. Or, perhaps, you’re trying to open an Excel fie (
.xlsx
) but telling R to look for a CSV (.csv
). Or you wrote “file.csv” but the file is “file (1).csv”.Is the file perhaps open in Excel? Sometimes, Microsoft Office will hold files hostage and not let them be opened in other programs while they’re open in Office. Close the Excel window that has the file open.
Most often, this is a working directory issue. If you tell R to look for
'filename.csv'
, it will look for that file in the working directory. Or, if you tell it to look for'data/filename.csv'
, it will look for a folder called “data” in the working directory, and for filename.csv inside of that. In recent versions of RStudio, the current working directory is listed right on top of the Console. Did you check whether the file is actually in that folder? If you’re running Quarto, it will set the working directory to the folder where the .qmd file is located. Did you put the file in the right spot relative to where the qmd is stored? See my video on filepaths for more help.
My code works fine in RStudio, but my Quarto won’t render
If your code runs fine when you run it by hand, but your Quarto throws errors, these are the most common issues:
You may have loaded a library while working on your code, but not loaded that same library in your Quarto. Include all necessary
library()
functions inside your Quarto document.You may have created an object while working on your code, but not created that same object in your Quarto. Make sure all objects are created within the Quarto itself.
To troubleshoot both (1) and (2) above, restart your R session (Session -> Restart R) and clear your environment (hit the broom icon in the Environment tab), and THEN run your code line by line to see where the error is.
If you’re loading in a file, you may have a different working directory than the Quarto is using. Remember, Quarto sets the working directory to the folder where the file is saved. See the previous “File does not exist” issue above.
Quarto hates when you try to install packages inside the Quarto (and it’s a bad idea anyway). Make sure there are no
install.packages()
orremotes::install_github()
lines in your Quarto doc.Did you make sure your code is inside of a chunk? Code outside of a chunk won’t run.
If none of those fix it, then look in the “Render” tab to see what exactly the error message is. Keep in mind that, in Quarto, the line number associated with the error message is the line number for the start of the chunk where the error is, not the actual line where the error occurs.
I don’t see an option to make a new Quarto document
If you are in RStudio and go to File → New File, you should see a “Quarto Document” option to start a new Quarto document.
If you do not see this, then you need to update your RStudio installation. Quarto integration is only in the newer versions of RStudio. Note I’m not saying to update your R installation, I’m saying to update your RStudio installation.
Object of type ‘closure’ is not subsettable
This is the most aggravating and least-informative R error. “Subsettable” is a hint though - you’re trying to take a subset of something (such as picking an element from a vector, or picking a column from a data set), but the thing you’re doing it to is a “closure” that you can’t do that to.
Most commonly, this is a case of accidentally giving the name of a function where you mean to give the name of a vector or data frame. For example:
# My data set is about how mean you are, so I'll call it mean_data
mean_data <- tibble(person = c('Me','You'), meanness = c(1000,999))
# Now let's look at that meanness variable, but oops, I'm referring to the function mean() instead of my data mean_data
mean$meanness
# Error in mean$meanness : object of type 'closure' is not subsettable
The Fix:
See where it says the error is “in” - that’s the place where you’ve referred to the wrong thing. fix it!
You probably made a typo or didn’t balance your parentheses
Just in general, a very large percentage of errors students come to me with occur simply because they made a typo, or didn’t balance their parentheses (pairing each ( with a )).
The Fix:
If you get an error message of any kind, read the relevant part of the code carefully to make sure you didn’t make a typo
Avoid coding where you have lots and lots of nested parentheses (this is something that tidyverse piping we learn is intended to avoid). Then, for whatever parentheses you do have, click on each one in RStudio - it will highlight its “partner” or show where there is a missing partner.
Mistakes in Coding
These are common mistakes I see students making, which may not always lead to error, but sometimes will. And these mistakes will often lead to incorrect results even if it runs okay.
Not checking
This is not specific to R, or really to coding. But I am begging you to do this:
After every line of code, check whether the code did what you expected it to do.
Ask yourself: what was that line of code trying to do? What should the data look like after that line runs?
Look at the data after you run the code. Does it look as expected?
If you created a variable you’d expect to take certain values, perhaps use a function like summary()
or unique()
to see if it takes those values.
If you tried to collapse the data so there should be one row per, say, country, look at your data (click it in the Environment pane) to see whether that happened.
I have been using R for years (and coding generally for decades). I can look at your code and, often, know immediately whether it’s going to do something unexpected. You (probably) don’t have that luxury. But you can replicate 90% of it by just checking your work. I cannot emphasize enough how much of a good idea this actually is and I’m not just saying this. I do this despite those decades of experience. Or perhaps because of it.
Using numbers as strings
When you put something in quotes, like ‘1’, it will treat it as a string variable - a set of characters to be read as text.
If you want a variable to be treated as a number, do not put it in quotes. This way lies madness.
R is pretty loose, and will allow you to do things like ask whether '2' > 1
, and will answer yes. This feels like it’s okay to put the 2 in quotes. But it’s not. What R is doing is saying “oof… I can’t compare a number to a letter, but you’re asking me to. I’ll just convert this number to a string as well, so I’m really doing '2' > '1'
, and since ‘2’ is alphabetically after ‘1’, that’s true, which is the desired result. Great! Except it also means that '02' > 1
is false, since ‘02’ is alphabetically before ‘1’. Same with '120'>'20'
being false. Oops.
The Fix:
If it’s a number, and you want it treated like a numeric value, don’t put quotes around it.
Using strings for variable names
If your main coding experience has been with Python so far, you’re probably used to always having to put variable names in quotes. If I want to get the variable mpg
from the Pandas DataFrame mtcars
, I do mtcars['mpg']
or mtcars.loc[:, 'mpg']
.
R is different. R makes a lot of use of what’s called “non-standard evaluation” which basically means you can often write variable names like regular code instead of putting it in strings. So we could do mtcars[['mpg']]
, but usually we’d do something like mtcars$mpg
or mtcars %>% pull(mpg)
.
Confusingly, sometimes R does require quotes (mtcars[['mpg']]
works but mtcars[[mpg]]
doesn’t), and sometimes it will accept things either way (mtcars %>% select(mpg)
and mtcars %>% select('mpg')
both work).
But often, especially when working with the tidyverse as we do, using strings for variable names can get you in trouble. For example, you’ll be scratching your head for a while to figure out why ggplot(data, aes(x = 'x', y = 'y')) + geom_point()
doesn’t work, before realizing that ggplot2 doesn’t accept variable names as strings, and it’s literally trying to graph the letter “x” against the letter “y”. Oops.
The Fix:
It’s generally a good idea to default to not using quotes around variable names.
Letting NAs propogate
In this class we do a whole lot of summary statistics! And you may find yourself with a lot of NA
(missing) values in the results. Why is that?
R’s standard summarizing functions, like mean()
or sum()
, will return NA
if any of the values you’re giving it are NA
. mean(c(1,2,3,4,5, NA))
returns NA
, not 3
.
So, if you’re doing something like a group_by() %>% summarize()
with a function like mean()
in it, then if there’s a missing value anywhere in your data, you’ll get back an NA
.
The Fix:
Whenever you use a function like
mean()
, there’s usually ana.rm = TRUE
option you can add to drop anyNA
values before computing the function. This is usually what you want.Note the
na.rm = TRUE
option goes inside themean()
or whatever function, not themutate()
or `summarize()
Indiscriminate na.omit()
One way to deal with missing values is to purge them from your data entirely. I don’t teach this in class, but I find a lot of students will look online and find the na.omit()
function for doing this.
However, be aware that na.omit()
removes rows for which there’s a missing value in any column. If you’re analyzing three columns of data, but your data set has 200 columns, then you lose a row if that missing value is anywhere. That’s probably not what you want.
The Fix:
Limit your data only to the columns you’re going to use (perhaps with
select()
) before runningna.omit()
Instead use the superior
drop_na()
in dplyr/tidyverse, which also lets you specify the set of variables to look for NAs in.data %>% drop_na(a, b)
will only drop rows if they have missing values in thea
orb
columns, not any others.
Mixing up <-, =, and ==
<-
is for assigning objects. You can say a <- 1
, and that will create an object a
with the value 1
inside.
=
can also be for assigning objects, just like <-
, but is also for assigning arguments and options in functions. In the function mean()
, which takes the argument x
, I can say mean(x = 1:10)
but not mean(x <- 1:10)
. (OK technically that last code works but probably not as you intend, and it’s not a great idea).
==
is for checking whether two things are equal. a == 1
will check whether the object a
is equal to 1
, and will return TRUE
if it is, or FALSE
otherwise.
Commonly, this will cause problems in cases like this:
data %>%
filter(state = "WA")
which we might expect to filter our data to rows that are in the state of Washington, but instead will try to assign a new object state
to be equal to "WA"
.
summarize() without summaries
summarize()
is a dplyr/tidyverse function that summarizes the data down to have one row, or one row per group if you’ve previously specified a group_by()
.
However, for it to do this, you need to tell it how to summarize that data. So, each argument needs to be a function that returns one value. Or else it won’t work.
Most commonly, students will just list a bunch of variables in their summarize()
and expect it to take the mean of each of them. But it doesn’t do that, and will instead return a non-summarized data set.
# wrong:
mtcars %>%
group_by(am) %>%
summarize(mpg, hp)
# right:
mtcars %>%
group_by(am) %>%
summarize(mpg = mean(mpg), hp = mean(hp))
# or if you want to be fancy
mtcars %>%
group_by(am) %>%
summarize(across(c(mpg, hp), mean))
Working way too hard
Data communications is a class in which we need to learn some coding to do what we need to do, but it’s not a coding class. If there’s a difficult task, I’ve usually gone over a simple solution somewhere in class.
If you’re trawling arcane StackExchange answers and pasting together a bunch of code you don’t understand, or if you’re spending tedious hours writing out a thousand lines of code, there’s a good chance that you’re doing something the hard way, when I’ve provided an easy way in class.
Some tips for avoiding this:
If you’re writing tiny variations on the same code a zillion times, try doing it in a loop instead
If you’re having difficulty working with a date variable, there’s probably a function that does things for you in the lubridate package
If you’re having difficulty working with a string variable, there’s probably a function that does things for you in the stringr package
If you’re spending hours and hours on what feels like a problem that should have a simple answer, it probably does. Ask someone online.