15.14 Recoding a Continuous Variable to a Categorical Variable

15.14.1 Problem

You want to recode a continuous variable to a categorical variable.

15.14.2 Solution

Use the cut() function. In this example, we’ll use the PlantGrowth data set and recode the continuous variable weight into a categorical variable, wtclass:

pg <- PlantGrowth
pg$wtclass <- cut(pg$weight, breaks = c(0, 5, 6, Inf))
pg
#>    weight group wtclass
#> 1    4.17  ctrl   (0,5]
#> 2    5.58  ctrl   (5,6]
#>  ...<26 more rows>...
#> 29   5.80  trt2   (5,6]
#> 30   5.26  trt2   (5,6]

15.14.3 Discussion

To create three categories we specify four bounds, which can include Inf and -Inf. If a data value falls outside the specified bounds, it’s categorized as NA. The result of cut() is a factor, and you can see from the example that the factor levels are named after the intervals defined by the bounds.
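
As a minimal sketch of these points, the following uses a small hypothetical vector, x, whose values were chosen for illustration (they are not from PlantGrowth) to probe the bounds:

```r
# Hypothetical values: two fall outside the intervals, three fall inside
x <- c(-1, 0, 2.5, 5, 7)
classes <- cut(x, breaks = c(0, 5, 6, Inf))

is.factor(classes)   # TRUE: cut() returns a factor
levels(classes)      # "(0,5]" "(5,6]" "(6,Inf]"
is.na(classes)       # TRUE TRUE FALSE FALSE FALSE
```

Both -1 and 0 become NA here: -1 is below every bound, and 0 sits exactly on the lowest bound, which the first interval does not include.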

To change the names of the levels, set the labels:

pg$wtclass <- cut(pg$weight, breaks = c(0, 5, 6, Inf),
                  labels = c("small", "medium", "large"))
pg
#>    weight group wtclass
#> 1    4.17  ctrl   small
#> 2    5.58  ctrl  medium
#>  ...<26 more rows>...
#> 29   5.80  trt2  medium
#> 30   5.26  trt2  medium
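
One way to check a recoding like this is to count the rows in each category with table(). This is only a quick sanity check, sketched here with the same breaks and labels as above:

```r
pg <- PlantGrowth
pg$wtclass <- cut(pg$weight, breaks = c(0, 5, 6, Inf),
                  labels = c("small", "medium", "large"))

# Count the rows per category; the counts should sum to nrow(pg)
counts <- table(pg$wtclass)
counts             # small: 13, medium: 13, large: 4
sum(counts)        # 30
```

If the counts summed to less than the number of rows, some weights would have fallen outside the bounds and been coded as NA.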

As indicated by the factor levels, the intervals are by default open on the left and closed on the right. In other words, each interval excludes its lower bound but includes its upper bound. For the smallest category, you can have it include both its lower and upper bounds by setting include.lowest = TRUE. In this example, this would result in a weight of exactly 0 going into the small category; otherwise, 0 would be coded as NA.
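
The effect is easiest to see with values that sit exactly on the breaks. The following sketch uses a hypothetical vector, x, made up for illustration:

```r
x <- c(0, 5, 6)   # hypothetical values lying exactly on the breaks

# Default: the lowest break, 0, is excluded and becomes NA
default <- cut(x, breaks = c(0, 5, 6, Inf))
as.character(default)   # NA "(0,5]" "(5,6]"

# include.lowest = TRUE closes the first interval on the left as well
closed <- cut(x, breaks = c(0, 5, 6, Inf), include.lowest = TRUE)
as.character(closed)    # "[0,5]" "[0,5]" "(5,6]"
```

Note that only the first interval changes; the level name switches from (0,5] to [0,5] to reflect this.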

If you want the categories to be closed on the left and open on the right, set right = FALSE:

cut(pg$weight, breaks = c(0, 5, 6, Inf), right = FALSE)
#>  [1] [0,5)   [5,6)   [5,6)   [6,Inf) [0,5)   [0,5)   [5,6)   [0,5)   [5,6)  
#> [10] [5,6)   [0,5)   [0,5)   [0,5)   [0,5)   [5,6)   [0,5)   [6,Inf) [0,5)  
#> [19] [0,5)   [0,5)   [6,Inf) [5,6)   [5,6)   [5,6)   [5,6)   [5,6)   [0,5)  
#> [28] [6,Inf) [5,6)   [5,6)  
#> Levels: [0,5) [5,6) [6,Inf)
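
When right = FALSE, include.lowest = TRUE applies to the highest break instead of the lowest. The following sketch assumes a finite top break of 7 and a hypothetical vector, x, neither of which appears in the example above, so that the effect is visible:

```r
x <- c(0, 5, 7)   # hypothetical values lying exactly on the breaks

# With right = FALSE, the highest break, 7, is excluded and becomes NA
open_top <- cut(x, breaks = c(0, 5, 6, 7), right = FALSE)
as.character(open_top)     # "[0,5)" "[5,6)" NA

# include.lowest = TRUE now closes the last interval on the right
closed_top <- cut(x, breaks = c(0, 5, 6, 7), right = FALSE,
                  include.lowest = TRUE)
as.character(closed_top)   # "[0,5)" "[5,6)" "[6,7]"
```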

15.14.4 See Also

To recode a categorical variable to another categorical variable, see Recipe 15.13.