15.13 Recoding a Categorical Variable to Another Categorical Variable

15.13.1 Problem

You want to recode a categorical variable to another categorical variable.

15.13.2 Solution

For the examples here, we’ll use a subset of the PlantGrowth data set:
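As a minimal sketch, the subset could be a handful of rows covering all three groups. The row indices below are an assumption chosen for illustration; any rows spanning "ctrl", "trt1", and "trt2" would do. PlantGrowth ships with base R in the datasets package.

```r
# Work on a small subset of PlantGrowth (row indices are illustrative)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]
pg
#>    weight group
#> 1    4.17  ctrl
#> 2    5.58  ctrl
#> 11   4.81  trt1
#> 21   6.31  trt2
#> 22   5.12  trt2
```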

In this example, we’ll recode the categorical variable group into another categorical variable, treatment. If the old value was "ctrl", the new value will be "No", and if the old value was "trt1" or "trt2", the new value will be "Yes".

This can be done with the recode() function from the dplyr package:
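A sketch of such a call, assuming the five-row subset pg used for illustration here. Each argument has the form old = "new". Recent dplyr releases mark recode() as superseded, but it remains available.

```r
library(dplyr)

# Illustrative subset of PlantGrowth (row indices are an assumption)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Map each old level (argument name) to its new value (argument value)
treatment <- recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
treatment
# Values: No, No, Yes, Yes, Yes (a factor with levels No and Yes)
```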

You can assign it as a new column in the data frame:
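For instance, assuming the same illustrative subset pg, the result can be stored in a treatment column:

```r
library(dplyr)

# Illustrative subset of PlantGrowth (row indices are an assumption)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Store the recoded values alongside the original group column
pg$treatment <- recode(pg$group, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
pg
```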

Note that since the input was a factor, it returns a factor. If you want to get a character vector instead, use as.character():
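One way to do this, sketched under the same assumptions about pg, is to convert the input before recoding it, so that recode() receives a character vector and returns one:

```r
library(dplyr)

# Illustrative subset of PlantGrowth (row indices are an assumption)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Converting the factor to character first yields a character result
treatment_chr <- recode(as.character(pg$group),
                        ctrl = "No", trt1 = "Yes", trt2 = "Yes")
treatment_chr
# Values: "No" "No" "Yes" "Yes" "Yes" (a character vector)
```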

15.13.3 Discussion

You can also use the fct_recode() function from the forcats package. It works the same way, except that the names and values are swapped (each argument is written as new = "old"), which may be a little more intuitive:
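A sketch of the forcats version, again assuming the illustrative subset pg. The new level is the argument name and the old level is the quoted value; repeating a name merges several old levels into one.

```r
library(forcats)

# Illustrative subset of PlantGrowth (row indices are an assumption)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# New level on the left, old level on the right
treatment <- fct_recode(pg$group, No = "ctrl", Yes = "trt1", Yes = "trt2")
treatment
# Values: No, No, Yes, Yes, Yes (a factor with levels No and Yes)
```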

Another difference is that fct_recode() will always return a factor, whereas recode() will return a character vector if it is given a character vector, and will return a factor if it is given a factor. (Although dplyr does have a recode_factor() function which also always returns a factor.)
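The return types described above can be checked with a short sketch. The variable names are hypothetical, and the input is deliberately converted to a character vector first:

```r
library(dplyr)
library(forcats)

# Illustrative subset of PlantGrowth (row indices are an assumption)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]
grp_chr <- as.character(pg$group)

# recode() keeps the input type: character in, character out
out_recode <- recode(grp_chr, ctrl = "No", trt1 = "Yes", trt2 = "Yes")

# recode_factor() and fct_recode() return a factor regardless of input type
out_recode_factor <- recode_factor(grp_chr, ctrl = "No", trt1 = "Yes", trt2 = "Yes")
out_fct_recode <- fct_recode(grp_chr, No = "ctrl", Yes = "trt1", Yes = "trt2")

class(out_recode)
class(out_recode_factor)
class(out_fct_recode)
```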

Using base R, recoding can be done with the match() function:
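A minimal sketch of the match() approach, assuming the illustrative subset pg. The names oldvals and newvals are hypothetical lookup vectors: match() returns the position of each group value in oldvals, and that position is then used to index newvals.

```r
# Illustrative subset of PlantGrowth (row indices are an assumption)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Parallel lookup vectors: the i-th old value maps to the i-th new value
oldvals <- c("ctrl", "trt1", "trt2")
newvals <- factor(c("No", "Yes", "Yes"))

# match() gives positions in oldvals; use them to pick from newvals
treatment <- newvals[match(pg$group, oldvals)]
treatment
# Values: No, No, Yes, Yes, Yes (a factor with levels No and Yes)
```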

It can also be done by indexing into the vectors:
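A sketch of the indexing approach, under the same assumptions about pg. Initializing the column with NA first is a choice made here for clarity, so that any unmatched rows are visibly missing.

```r
# Illustrative subset of PlantGrowth (row indices are an assumption)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Start with an all-NA character column, then fill it in by logical indexing
pg$treatment <- NA_character_
pg$treatment[pg$group == "ctrl"] <- "No"
pg$treatment[pg$group == "trt1"] <- "Yes"
pg$treatment[pg$group == "trt2"] <- "Yes"

# Convert to a factor
pg$treatment <- factor(pg$treatment)
pg
```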

Here, we combined two of the factor levels and put the result into a new column. If you simply want to rename the levels of a factor, see Recipe 15.10.

The coding criteria can also be based on values in multiple columns, by using the & and | operators:
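As an illustration, the sketch below splits the control group by weight. The column name newcol, the labels, and the cutoff of 5 are all hypothetical choices for this example.

```r
# Illustrative subset of PlantGrowth (row indices are an assumption)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# Combine conditions on group and weight with & (and) and | (or)
pg$newcol <- NA_character_
pg$newcol[pg$group == "ctrl" & pg$weight < 5]  <- "no_small"
pg$newcol[pg$group == "ctrl" & pg$weight >= 5] <- "no_large"
pg$newcol[pg$group == "trt1" | pg$group == "trt2"] <- "yes"

# Convert to a factor
pg$newcol <- factor(pg$newcol)
pg
```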

It’s also possible to combine two columns into one using the interaction() function, which joins the corresponding values with a . in between. This combines the weight and group columns into a new column, weightgroup:
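A sketch of this call, assuming the illustrative subset pg. The result is a factor whose labels are built from each pair of values.

```r
# Illustrative subset of PlantGrowth (row indices are an assumption)
pg <- PlantGrowth[c(1, 2, 11, 21, 22), ]

# interaction() builds a factor from every weight/group pairing
pg$weightgroup <- interaction(pg$weight, pg$group)
as.character(pg$weightgroup)
# "4.17.ctrl" "5.58.ctrl" "4.81.trt1" "6.31.trt2" "5.12.trt2"
```

Note that the resulting factor keeps a level for every possible combination of the inputs, including ones that never occur in the data; passing drop = TRUE to interaction() removes the unused levels.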

15.13.4 See Also

For more on renaming factor levels, see Recipe 15.10.

See Recipe 15.14 for recoding continuous values to categorical values.