15.3 Adding a Column to a Data Frame

15.3.1 Problem

You want to add a column to a data frame.

15.3.2 Solution

Use mutate() from dplyr to add a new column and assign values to it. This returns a new data frame, which you’ll typically want to save over the original.

If you assign a single value to the new column, the entire column will be filled with that value. This adds a column named newcol, filled with NA:


ToothGrowth %>%
  mutate(newcol = NA)
#>     len supp dose newcol
#> 1   4.2   VC  0.5     NA
#> 2  11.5   VC  0.5     NA
#>  ...<56 more rows>...
#> 59 29.4   OJ  2.0     NA
#> 60 23.0   OJ  2.0     NA
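
Because mutate() returns a modified copy rather than changing ToothGrowth in place, the result has to be assigned to a name if you want to keep it. A minimal sketch of that step, where ToothGrowth2 is just an illustrative name (you could equally assign back to the original name):

```r
library(dplyr)

# mutate() does not modify its input; assign the result to keep the new column
ToothGrowth2 <- ToothGrowth %>%
  mutate(newcol = NA)

# The original still has three columns; the copy has the extra one
names(ToothGrowth)
names(ToothGrowth2)
```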

You can also assign a vector to the new column:

# Since ToothGrowth has 60 rows, we must create a new vector that has 60 elements
vec <- rep(c(1, 2), 30)

ToothGrowth %>%
  mutate(newcol = vec)
#>     len supp dose newcol
#> 1   4.2   VC  0.5      1
#> 2  11.5   VC  0.5      2
#>  ...<56 more rows>...
#> 59 29.4   OJ  2.0      1
#> 60 23.0   OJ  2.0      2

Note that the vector being added to the data frame must have either one element or the same number of elements as the data frame has rows. In the example above, we created a new vector with 60 elements by repeating the values c(1, 2) thirty times.
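
The length rule can be checked directly. The sketch below passes a two-element vector to mutate() inside tryCatch(), so the error is caught instead of stopping the script. It then builds a correctly sized vector with the length.out argument of rep(), which avoids hard-coding the repeat count. The names result and vec are only illustrative.

```r
library(dplyr)

# A length-2 vector is neither length 1 nor length 60, so mutate() signals an error
result <- tryCatch(
  ToothGrowth %>% mutate(newcol = c(1, 2)),
  error = function(e) "mutate() rejected the length-2 vector"
)
result

# rep() with length.out repeats the values until the vector has one element per row
vec <- rep(c(1, 2), length.out = nrow(ToothGrowth))
length(vec)
```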

15.3.3 Discussion

Each column of a data frame is a vector. R handles columns in data frames slightly differently from standalone vectors because all the columns in a data frame must have the same length.
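
One quick way to see this, as a small illustration, is to compare the length of each column with the number of rows:

```r
# lengths() reports the length of each column; each one matches nrow()
lengths(ToothGrowth)
nrow(ToothGrowth)
```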

To add a column using base R, you can simply assign values into the new column like so:

# Make a copy of ToothGrowth for this example
ToothGrowth2 <- ToothGrowth

# Assign NA's for the whole column
ToothGrowth2$newcol <- NA

# Assign 1 and 2, automatically repeating to fill
ToothGrowth2$newcol <- c(1, 2)

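With base R, the vector being assigned into the data frame is automatically repeated (recycled) to fill the rows of the data frame, provided the number of rows is an exact multiple of the vector’s length. Otherwise, the assignment raises an error.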
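
A small sketch of this recycling behavior, using a copy named df (a hypothetical name) and an arbitrary seven-element vector to show what happens when the vector's length does not divide the row count evenly:

```r
# Work on a copy so the built-in data set is left alone
df <- ToothGrowth

# 60 is a multiple of 2, so c(1, 2) is repeated to fill all 60 rows
df$newcol <- c(1, 2)
table(df$newcol)

# 60 is not a multiple of 7, so this assignment fails; tryCatch() captures the error
outcome <- tryCatch(
  {
    df$bad <- 1:7
    "assigned"
  },
  error = function(e) "error"
)
outcome
```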