## 15.13 Recoding a Categorical Variable to Another Categorical Variable

### 15.13.1 Problem

You want to recode a categorical variable into another categorical variable.

### 15.13.2 Solution

For the examples here, we’ll use a subset of the `PlantGrowth` data set:

```r
# Work on a subset of the PlantGrowth data set
pg <- PlantGrowth[c(1,2,11,21,22), ]
pg
#>    weight group
#> 1    4.17  ctrl
#> 2    5.58  ctrl
#> 11   4.81  trt1
#> 21   6.31  trt2
#> 22   5.12  trt2
```

In this example, we’ll recode the categorical variable `group` into another categorical variable, `treatment`. If the old value was `"ctrl"`, the new value will be `"No"`, and if the old value was `"trt1"` or `"trt2"`, the new value will be `"Yes"`.

This can be done with the `recode()` function from the dplyr package:

```r
library(dplyr)

recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
#> [1] No  No  Yes Yes Yes
#> Levels: No Yes
```

You can assign it as a new column in the data frame:

```r
pg$treatment <- recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
```

Because the input is a factor, `recode()` returns a factor. To get a character vector instead, convert the input with `as.character()` first:

```r
recode(as.character(pg$group), ctrl = "No", trt1 = "Yes", trt2 = "Yes")
#> [1] "No"  "No"  "Yes" "Yes" "Yes"
```

### 15.13.3 Discussion

You can also use the `fct_recode()` function from the forcats package. It works the same way, except that the names and values are swapped (the new value is the name and the old value is the string), which may be a little more intuitive:

```r
library(forcats)
fct_recode(pg$group, No = "ctrl", Yes = "trt1", Yes = "trt2")
#> [1] No  No  Yes Yes Yes
#> Levels: No Yes
```

Another difference is that `fct_recode()` always returns a factor, whereas `recode()` returns a character vector if it is given a character vector and a factor if it is given a factor. (dplyr also has a `recode_factor()` function, which always returns a factor.)

Using base R, recoding can be done with the `match()` function:

```r
oldvals <- c("ctrl", "trt1", "trt2")
newvals <- factor(c("No", "Yes", "Yes"))

newvals[ match(pg$group, oldvals) ]
#> [1] No  No  Yes Yes Yes
#> Levels: No Yes
```
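The `match()` call works because it returns, for each element of `pg$group`, its position in `oldvals`, and those positions are then used to index `newvals`. A closely related base R idiom is to index a named vector by the old values. The following is a minimal sketch of that idea; the names `lookup` and `treatment` are made up for illustration:

```r
# Same five-row subset used throughout this recipe
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Hypothetical lookup table: names are the old values, elements the new ones
lookup <- c(ctrl = "No", trt1 = "Yes", trt2 = "Yes")

# Index by name. as.character() matters here: indexing with a factor
# would use its integer codes rather than its labels.
treatment <- factor(unname(lookup[as.character(pg$group)]))
treatment
```

This keeps the old-to-new mapping in one object, which can be easier to maintain than two parallel vectors when there are many values to recode.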

It can also be done by indexing into the column with logical conditions and assigning the new values:

```r
pg$treatment[pg$group == "ctrl"] <- "No"
pg$treatment[pg$group == "trt1"] <- "Yes"
pg$treatment[pg$group == "trt2"] <- "Yes"

# Convert to a factor
pg$treatment <- factor(pg$treatment)
pg
#>    weight group treatment
#> 1    4.17  ctrl        No
#> 2    5.58  ctrl        No
#> 11   4.81  trt1       Yes
#> 21   6.31  trt2       Yes
#> 22   5.12  trt2       Yes
```

Here, we combined two of the factor levels and put the result into a new column. If you simply want to rename the levels of a factor, see Recipe 15.10.
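Combining levels can also be sketched in base R without touching individual elements, by assigning a named list to `levels()`. In the list, each name is a new level and each element holds the old levels it should absorb. The variable name `treatment` below is only for illustration:

```r
# Same five-row subset used throughout this recipe
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Copy the factor so the original group column is left untouched
treatment <- pg$group

# Each list name is a new level; each element lists the old levels it replaces
levels(treatment) <- list(No = "ctrl", Yes = c("trt1", "trt2"))
treatment
```

The order of the list also sets the order of the new levels, so `No` comes before `Yes` here.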

The coding criteria can also be based on values in multiple columns, by using the `&` and `|` operators:

```r
pg$newcol[pg$group == "ctrl" & pg$weight < 5]  <- "no_small"
pg$newcol[pg$group == "ctrl" & pg$weight >= 5] <- "no_large"
pg$newcol[pg$group == "trt1"] <- "yes"
pg$newcol[pg$group == "trt2"] <- "yes"
pg$newcol <- factor(pg$newcol)
pg
#>    weight group   newcol
#> 1    4.17  ctrl no_small
#> 2    5.58  ctrl no_large
#> 11   4.81  trt1      yes
#> 21   6.31  trt2      yes
#> 22   5.12  trt2      yes
```
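With the indexing approach, any row that matches none of the conditions is left as `NA`. One alternative, shown here only as a sketch, is to build the same column with nested `ifelse()` calls so that every row falls into exactly one branch. The intermediate name `size_label` is hypothetical:

```r
# Same five-row subset used throughout this recipe
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Label every row by weight first; only the ctrl rows will use this label
size_label <- ifelse(pg$weight < 5, "no_small", "no_large")

# ctrl rows take the size label; all other groups become "yes"
newcol <- factor(ifelse(pg$group == "ctrl", size_label, "yes"))
newcol
```

The trade-off is that the final `"yes"` branch acts as a catch-all, so an unexpected group value would be silently labeled `"yes"` rather than showing up as `NA`.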

It’s also possible to combine two columns into one using the `interaction()` function, which joins the values with a `.` in between and returns a factor. This combines the `weight` and `group` columns into a new column, `weightgroup`:

```r
pg$weightgroup <- interaction(pg$weight, pg$group)
pg
#>    weight group weightgroup
#> 1    4.17  ctrl   4.17.ctrl
#> 2    5.58  ctrl   5.58.ctrl
#> 11   4.81  trt1   4.81.trt1
#> 21   6.31  trt2   6.31.trt2
#> 22   5.12  trt2   5.12.trt2
```
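One detail worth knowing is that, by default, `interaction()` creates a level for every possible combination of the input levels, not just the combinations that occur in the data. The sketch below compares the default with the `drop = TRUE` and `sep` arguments; the names `all_combos` and `observed` are made up for illustration:

```r
# Same five-row subset used throughout this recipe
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Default: one level per combination of 5 weights and 3 groups
all_combos <- interaction(pg$weight, pg$group)
nlevels(all_combos)
#> [1] 15

# drop = TRUE keeps only the combinations that actually occur;
# sep changes the separator placed between the values
observed <- interaction(pg$weight, pg$group, drop = TRUE, sep = "_")
nlevels(observed)
#> [1] 5
as.character(observed)
```

Dropping unused levels is usually what you want when the combined column will be used for grouping or plotting, since empty levels can otherwise appear in legends and summaries.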