15.13 Recoding a Categorical Variable to Another Categorical Variable
15.13.2 Solution
For the examples here, we’ll use a subset of the PlantGrowth data set:
# Work on a subset of the PlantGrowth data set
pg <- PlantGrowth[c(1,2,11,21,22), ]
pg
#>    weight group
#> 1    4.17  ctrl
#> 2    5.58  ctrl
#> 11   4.81  trt1
#> 21   6.31  trt2
#> 22   5.12  trt2
In this example, we’ll recode the categorical variable group into another categorical variable, treatment. If the old value was "ctrl", the new value will be "No", and if the old value was "trt1" or "trt2", the new value will be "Yes".
This can be done with the recode() function from the dplyr package:
library(dplyr)
recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
#> [1] No  No  Yes Yes Yes
#> Levels: No Yes
You can assign it as a new column in the data frame:
pg$treatment <- recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
Note that since the input was a factor, recode() returns a factor. To get a character vector instead, convert the input with as.character():
recode(as.character(pg$group), ctrl = "No", trt1 = "Yes", trt2 = "Yes")
#> [1] "No"  "No"  "Yes" "Yes" "Yes"
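When many old values map to the same new value, listing each one can get repetitive. The sketch below assumes that recode()'s `.default` argument is acceptable for your data: it names only the exceptional value and sends everything else to a single replacement. It uses a character vector as input so that the result is a plain character vector.

```r
library(dplyr)

# Same subset of PlantGrowth as in the recipe
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Name only "ctrl" explicitly; every other value falls through to .default
treatment_chr <- recode(as.character(pg$group), ctrl = "No", .default = "Yes")
treatment_chr
```

Be careful with `.default`: any unexpected value (a typo, for instance) will silently be recoded to "Yes" as well.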
15.13.3 Discussion
You can also use the fct_recode() function from the forcats package. It works the same, except the names and values are swapped, which may be a little more intuitive:
library(forcats)
fct_recode(pg$group, No = "ctrl", Yes = "trt1", Yes = "trt2")
#> [1] No  No  Yes Yes Yes
#> Levels: No Yes
Another difference is that fct_recode() always returns a factor, whereas recode() returns a character vector if it is given a character vector, and a factor if it is given a factor. (dplyr does, however, have a recode_factor() function, which also always returns a factor.)
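The difference in return types is easy to check directly. The following is a small sketch that recodes the same values four ways and inspects the class of each result; the variable names (grp_chr, from_chr, and so on) are only illustrative.

```r
library(dplyr)
library(forcats)

pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]
grp_chr <- as.character(pg$group)

# recode() keeps the type of its input
from_chr <- recode(grp_chr, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
from_fct <- recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")

# fct_recode() and recode_factor() return a factor even for character input
via_fct_recode <- fct_recode(grp_chr, No = "ctrl", Yes = "trt1", Yes = "trt2")
via_recode_factor <- recode_factor(grp_chr, ctrl = "No", trt1 = "Yes", trt2 = "Yes")

class(from_chr)
class(from_fct)
class(via_fct_recode)
class(via_recode_factor)
```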
Using base R, recoding can be done with the match() function:
oldvals <- c("ctrl", "trt1", "trt2")
newvals <- factor(c("No", "Yes", "Yes"))

newvals[ match(pg$group, oldvals) ]
#> [1] No  No  Yes Yes Yes
#> Levels: No Yes
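To see why the match() approach works, it can help to look at the intermediate positions it returns, and to compare it with a closely related base R idiom: a named lookup vector. This is a sketch only; the names positions, lookup, and recoded are made up for the illustration.

```r
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

oldvals <- c("ctrl", "trt1", "trt2")

# match() gives, for each element of pg$group, its position in oldvals
positions <- match(pg$group, oldvals)
positions

# A named vector plays the role of oldvals and newvals at once
lookup <- c(ctrl = "No", trt1 = "Yes", trt2 = "Yes")

# Convert the factor to character first: indexing with a factor would use
# its integer codes rather than its labels
recoded <- factor(unname(lookup[as.character(pg$group)]))
recoded
```

The lookup vector keeps each old value next to its replacement, which may be easier to maintain than two parallel vectors.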
It can also be done by indexing in the vectors:
pg$treatment[pg$group == "ctrl"] <- "No"
pg$treatment[pg$group == "trt1"] <- "Yes"
pg$treatment[pg$group == "trt2"] <- "Yes"

# Convert to a factor
pg$treatment <- factor(pg$treatment)
pg
#>    weight group treatment
#> 1    4.17  ctrl        No
#> 2    5.58  ctrl        No
#> 11   4.81  trt1       Yes
#> 21   6.31  trt2       Yes
#> 22   5.12  trt2       Yes
Here, we combined two of the factor levels and put the result into a new column. If you simply want to rename the levels of a factor, see Recipe 15.10.
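The indexing approach can be shortened when several old values share one new value. The sketch below uses %in% to handle both treatment groups in one assignment, and shows ifelse() as an alternative for a simple two-way split. Starting from an all-NA column is an assumption made here so that any rows missed by the conditions would stand out.

```r
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Start with NA so that rows not covered by any condition are easy to spot
pg$treatment <- NA_character_
pg$treatment[pg$group == "ctrl"] <- "No"
pg$treatment[pg$group %in% c("trt1", "trt2")] <- "Yes"
pg$treatment <- factor(pg$treatment)

# For a two-way split, ifelse() does the same in one step
treatment2 <- factor(ifelse(pg$group == "ctrl", "No", "Yes"))
```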
The coding criteria can also be based on values in multiple columns, by using the & and | operators:
pg$newcol[pg$group == "ctrl" & pg$weight < 5]  <- "no_small"
pg$newcol[pg$group == "ctrl" & pg$weight >= 5] <- "no_large"
pg$newcol[pg$group == "trt1"] <- "yes"
pg$newcol[pg$group == "trt2"] <- "yes"
pg$newcol <- factor(pg$newcol)
pg
#>    weight group   newcol
#> 1    4.17  ctrl no_small
#> 2    5.58  ctrl no_large
#> 11   4.81  trt1      yes
#> 21   6.31  trt2      yes
#> 22   5.12  trt2      yes
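Multi-column conditions like these can also be written as a single expression. The following sketch uses case_when() from dplyr as one possible alternative to the repeated indexed assignments; each condition is tried in order and the first one that is TRUE determines the value.

```r
library(dplyr)

pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Conditions are evaluated top to bottom; rows matching none become NA
pg$newcol <- factor(case_when(
  pg$group == "ctrl" & pg$weight < 5  ~ "no_small",
  pg$group == "ctrl" & pg$weight >= 5 ~ "no_large",
  pg$group %in% c("trt1", "trt2")     ~ "yes"
))
pg
```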
It’s also possible to combine two columns into one using the interaction() function, which joins the values with a . in between. This combines the weight and group columns into a new column, weightgroup:
pg$weightgroup <- interaction(pg$weight, pg$group)
pg
#>    weight group weightgroup
#> 1    4.17  ctrl   4.17.ctrl
#> 2    5.58  ctrl   5.58.ctrl
#> 11   4.81  trt1   4.81.trt1
#> 21   6.31  trt2   6.31.trt2
#> 22   5.12  trt2   5.12.trt2
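By default, interaction() creates a level for every possible combination of its inputs, including combinations that never occur in the data. The sketch below compares the default with the `sep` and `drop` arguments, and with paste() for the case where a plain character vector is enough; the names wg_default, wg_custom, and wg_chr are only illustrative.

```r
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Default: "." separator, and levels for all weight-by-group combinations
wg_default <- interaction(pg$weight, pg$group)

# Custom separator, keeping only the combinations that actually appear
wg_custom <- interaction(pg$weight, pg$group, sep = "_", drop = TRUE)

nlevels(wg_default)
nlevels(wg_custom)

# paste() yields the same labels as a character vector
wg_chr <- paste(pg$weight, pg$group, sep = "_")
```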