15.13 Recoding a Categorical Variable to Another Categorical Variable
15.13.2 Solution
For the examples here, we’ll use a subset of the PlantGrowth data set:
# Work on a subset of the PlantGrowth data set
pg <- PlantGrowth[c(1,2,11,21,22), ]
pg
#>    weight group
#> 1    4.17  ctrl
#> 2    5.58  ctrl
#> 11   4.81  trt1
#> 21   6.31  trt2
#> 22   5.12  trt2
In this example, we’ll recode the categorical variable group into another categorical variable, treatment. If the old value was "ctrl", the new value will be "No", and if the old value was "trt1" or "trt2", the new value will be "Yes".
This can be done with the recode() function from the dplyr package:
library(dplyr)
recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
#> [1] No  No  Yes Yes Yes
#> Levels: No Yes
You can assign it as a new column in the data frame:
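A minimal sketch of that assignment, assuming the new column is named treatment as described above. The pg subset is rebuilt here only so the block runs on its own:

```r
library(dplyr)

# Rebuild the five-row subset so this block is self-contained
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Store the recoded factor as a new column named treatment
pg$treatment <- recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
pg
```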
Note that since the input was a factor, it returns a factor. If you want to get a character vector instead, use as.character():
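One way this might look is to convert the factor to a character vector before recoding. This is a sketch under that assumption; wrapping the result of recode() in as.character() should give the same values. The name treatment_chr is only illustrative:

```r
library(dplyr)

# Rebuild the five-row subset so this block is self-contained
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Character input gives character output from recode()
treatment_chr <- recode(as.character(pg$group),
                        ctrl = "No", trt1 = "Yes", trt2 = "Yes")
treatment_chr
```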
15.13.3 Discussion
You can also use the fct_recode() function from the forcats package. It works the same, except the names and values are swapped, which may be a little more intuitive:
library(forcats)
fct_recode(pg$group, No = "ctrl", Yes = "trt1", Yes = "trt2")
#> [1] No  No  Yes Yes Yes
#> Levels: No Yes
Another difference is that fct_recode() will always return a factor, whereas recode() will return a character vector if it is given a character vector, and will return a factor if it is given a factor. (Although dplyr does have a recode_factor() function which also always returns a factor.)
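The return types described above can be checked with class(). This is a small sketch; the variable names (grp_fct, grp_chr, and the cls_* results) are illustrative and not part of the recipe:

```r
library(dplyr)
library(forcats)

# The same five group values, once as a factor and once as a character vector
grp_fct <- PlantGrowth$group[c(1, 2, 11, 21, 22)]
grp_chr <- as.character(grp_fct)

# recode() mirrors the type of its input
cls_recode_chr <- class(recode(grp_chr, ctrl = "No", trt1 = "Yes", trt2 = "Yes"))
cls_recode_fct <- class(recode(grp_fct, ctrl = "No", trt1 = "Yes", trt2 = "Yes"))

# fct_recode() and recode_factor() return a factor even for character input
cls_fct_recode <- class(fct_recode(grp_chr, No = "ctrl", Yes = "trt1", Yes = "trt2"))
cls_recode_factor <- class(recode_factor(grp_chr, ctrl = "No", trt1 = "Yes", trt2 = "Yes"))

c(cls_recode_chr, cls_recode_fct, cls_fct_recode, cls_recode_factor)
```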
Using base R, recoding can be done with the match() function:
oldvals <- c("ctrl", "trt1", "trt2")
newvals <- factor(c("No", "Yes", "Yes"))
newvals[ match(pg$group, oldvals) ]
#> [1] No  No  Yes Yes Yes
#> Levels: No Yes
It can also be done by indexing in the vectors:
pg$treatment[pg$group == "ctrl"] <- "No"
pg$treatment[pg$group == "trt1"] <- "Yes"
pg$treatment[pg$group == "trt2"] <- "Yes"
# Convert to a factor
pg$treatment <- factor(pg$treatment)
pg
#>    weight group treatment
#> 1    4.17  ctrl        No
#> 2    5.58  ctrl        No
#> 11   4.81  trt1       Yes
#> 21   6.31  trt2       Yes
#> 22   5.12  trt2       Yes
Here, we combined two of the factor levels and put the result into a new column. If you simply want to rename the levels of a factor, see Recipe 15.10.
The coding criteria can also be based on values in multiple columns, by using the & and | operators:
pg$newcol[pg$group == "ctrl" & pg$weight < 5] <- "no_small"
pg$newcol[pg$group == "ctrl" & pg$weight >= 5] <- "no_large"
pg$newcol[pg$group == "trt1"] <- "yes"
pg$newcol[pg$group == "trt2"] <- "yes"
pg$newcol <- factor(pg$newcol)
pg
#>    weight group   newcol
#> 1    4.17  ctrl no_small
#> 2    5.58  ctrl no_large
#> 11   4.81  trt1      yes
#> 21   6.31  trt2      yes
#> 22   5.12  trt2      yes
It’s also possible to combine two columns into one using the interaction() function, which appends the values with a . in between. This combines the weight and group columns into a new column, weightgroup:
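A minimal sketch of that call, using base R only. The pg subset is rebuilt here so the block runs on its own, without the columns added in the earlier examples:

```r
# Rebuild the five-row subset so this block is self-contained
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# interaction() pastes the values of both columns together with a "."
# and returns a factor
pg$weightgroup <- interaction(pg$weight, pg$group)
pg
```

Because interaction() builds its levels from every combination of the input values, the resulting factor can have more levels than appear in the data.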