15.13 Recoding a Categorical Variable to Another Categorical Variable
15.13.2 Solution
For the examples here, we’ll use a subset of the PlantGrowth data set:
# Work on a subset of the PlantGrowth data set
pg <- PlantGrowth[c(1,2,11,21,22), ]
pg
#>    weight group
#> 1    4.17  ctrl
#> 2    5.58  ctrl
#> 11   4.81  trt1
#> 21   6.31  trt2
#> 22   5.12  trt2
In this example, we’ll recode the categorical variable group into another categorical variable, treatment. If the old value was "ctrl", the new value will be "No", and if the old value was "trt1" or "trt2", the new value will be "Yes".
This can be done with the recode() function from the dplyr package:
library(dplyr)
recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
#> [1] No No Yes Yes Yes
#> Levels: No Yes
You can assign it as a new column in the data frame:
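A minimal sketch of that assignment, assuming the five-row pg subset created above. The column name treatment matches the one used later in this recipe:

```r
library(dplyr)

# Recreate the five-row subset so this snippet runs on its own
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Store the recoded factor as a new column next to the original group column
pg$treatment <- recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
pg
```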
Note that since the input was a factor, recode() returns a factor. If you want a character vector instead, use as.character():
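A minimal sketch, assuming the same pg subset: converting the input to character first means recode() hands back a character vector. The name treatment_chr is just illustrative:

```r
library(dplyr)

pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Character input, so recode() returns a character vector rather than a factor
treatment_chr <- recode(as.character(pg$group), ctrl = "No", trt1 = "Yes", trt2 = "Yes")
treatment_chr
#> [1] "No"  "No"  "Yes" "Yes" "Yes"
```

Calling as.character() on the factor returned by recode() gives the same values.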
15.13.3 Discussion
You can also use the fct_recode() function from the forcats package. It works the same way, except that the names and values are swapped, which may be a little more intuitive:
library(forcats)
fct_recode(pg$group, No = "ctrl", Yes = "trt1", Yes = "trt2")
#> [1] No No Yes Yes Yes
#> Levels: No Yes
Another difference is that fct_recode() always returns a factor, whereas recode() returns a character vector if it is given a character vector and a factor if it is given a factor. (dplyr does, however, provide a recode_factor() function, which always returns a factor.)
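One way to check the return types described above, as a sketch that assumes both dplyr and forcats are installed. The variable names grp_chr, out_recode, out_recode_factor, and out_fct_recode are illustrative only:

```r
library(dplyr)
library(forcats)

pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]
grp_chr <- as.character(pg$group)

# recode() follows its input type: character in, character out
out_recode <- recode(grp_chr, ctrl = "No", trt1 = "Yes", trt2 = "Yes")

# recode_factor() and fct_recode() return factors even for character input
out_recode_factor <- recode_factor(grp_chr, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
out_fct_recode <- fct_recode(grp_chr, No = "ctrl", Yes = "trt1", Yes = "trt2")

class(out_recode)         # returns "character"
class(out_recode_factor)  # returns "factor"
class(out_fct_recode)     # returns "factor"
```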
Using base R, recoding can be done with the match() function:
oldvals <- c("ctrl", "trt1", "trt2")
newvals <- factor(c("No", "Yes", "Yes"))
newvals[ match(pg$group, oldvals) ]
#> [1] No No Yes Yes Yes
#> Levels: No Yes
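To see why the match() approach works, it helps to look at the intermediate step. A sketch, assuming the same pg, oldvals, and newvals as above (idx is an illustrative name): match() returns, for each element of pg$group, the position of that value in oldvals, and those positions are then used to index into newvals.

```r
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

oldvals <- c("ctrl", "trt1", "trt2")
newvals <- factor(c("No", "Yes", "Yes"))

# Position of each group value within oldvals (match() compares factors by their labels)
idx <- match(pg$group, oldvals)
idx
#> [1] 1 1 2 3 3

# Use those positions to pick the corresponding new value
newvals[idx]
```

Any value of pg$group that is missing from oldvals would get an NA position from match(), and therefore an NA in the result.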
It can also be done with logical indexing, assigning each new value to the elements that match a condition:
pg$treatment[pg$group == "ctrl"] <- "No"
pg$treatment[pg$group == "trt1"] <- "Yes"
pg$treatment[pg$group == "trt2"] <- "Yes"
# Convert to a factor
pg$treatment <- factor(pg$treatment)
pg
#>    weight group treatment
#> 1    4.17  ctrl        No
#> 2    5.58  ctrl        No
#> 11   4.81  trt1       Yes
#> 21   6.31  trt2       Yes
#> 22   5.12  trt2       Yes
Here, we combined two of the factor levels and put the result into a new column. If you simply want to rename the levels of a factor, see Recipe 15.10.
The coding criteria can also be based on values in multiple columns, by using the & and | operators:
# Start again from the original subset (without the treatment column)
pg <- PlantGrowth[c(1,2,11,21,22), ]
pg$newcol[pg$group == "ctrl" & pg$weight < 5] <- "no_small"
pg$newcol[pg$group == "ctrl" & pg$weight >= 5] <- "no_large"
pg$newcol[pg$group == "trt1" | pg$group == "trt2"] <- "yes"
pg$newcol <- factor(pg$newcol)
pg
#>    weight group   newcol
#> 1    4.17  ctrl no_small
#> 2    5.58  ctrl no_large
#> 11   4.81  trt1      yes
#> 21   6.31  trt2      yes
#> 22   5.12  trt2      yes
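If you are already using dplyr, its case_when() function can express the same multi-column rules in a single call. This is a sketch of an alternative technique rather than the approach shown above, assuming the same pg subset. Conditions are checked in order, and rows that match none of them get NA:

```r
library(dplyr)

pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Each formula is condition ~ value; the first TRUE condition wins
pg$newcol <- case_when(
  pg$group == "ctrl" & pg$weight < 5  ~ "no_small",
  pg$group == "ctrl" & pg$weight >= 5 ~ "no_large",
  pg$group == "trt1" | pg$group == "trt2" ~ "yes"
)
pg$newcol <- factor(pg$newcol)
pg
```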
It’s also possible to combine two columns into one using the interaction() function, which joins the values with a . between them. This combines the weight and group columns into a new column, weightgroup:
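A minimal sketch of that step, assuming the five-row pg subset from the start of this recipe:

```r
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Join weight and group with "." (the default separator); the result is a factor
pg$weightgroup <- interaction(pg$weight, pg$group)
pg
```

By default, interaction() keeps every possible combination as a level (here 5 weights × 3 groups = 15 levels), even combinations that never occur in the data; passing drop = TRUE keeps only the combinations that are present.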