## 6.4 Making Multiple Density Curves from Grouped Data

### 6.4.1 Problem

You want to make density curves of multiple groups of data.

### 6.4.2 Solution

Use `geom_density()`, and map the grouping variable to an aesthetic like `colour` or `fill`, as shown in Figure 6.12. The grouping variable must be a factor or a character vector. In the `birthwt` data set, the desired grouping variable, `smoke`, is stored as a number, so we have to convert it to a factor first.

``````library(MASS) # Load MASS for the birthwt data set

birthwt_mod <- birthwt %>%
mutate(smoke = as.factor(smoke)) # Convert smoke to a factor

# Map smoke to colour
ggplot(birthwt_mod, aes(x = bwt, colour = smoke)) +
geom_density()

# Map smoke to fill and make the fill semitransparent by setting alpha
ggplot(birthwt_mod, aes(x = bwt, fill = smoke)) +
geom_density(alpha = .3)``````  Figure 6.12: Different line colors for each group (left); Different semitransparent fill colors for each group (right)

### 6.4.3 Discussion

To make these plots, the data must all be in one data frame, with one column containing a categorical variable used for grouping.

For this example, we used the `birthwt` data set. It contains data about birth weights and a number of risk factors for low birth weight:

``````birthwt
#>    low age lwt race smoke ptl ht ui ftv  bwt
#> 85   0  19 182    2     0   0  0  1   0 2523
#> 86   0  33 155    3     0   0  0  0   3 2551
#> 87   0  20 105    1     1   0  0  0   1 2557
#>  ...<183 more rows>...
#> 82   1  23  94    3     1   0  0  0   0 2495
#> 83   1  17 142    2     0   0  1  0   0 2495
#> 84   1  21 130    1     1   0  1  0   3 2495``````

We looked at the relationship between `smoke` (smoking) and `bwt` (birth weight in grams). The value of `smoke` is either 0 or 1, but since it’s stored as a numeric vector, ggplot doesn’t know that it should be treated as a categorical variable. To make it so ggplot knows to treat `smoke` as categorical, we can either convert that column of the data frame to a factor, or tell ggplot to treat it as a factor by using `factor(smoke)` inside of the `aes()` statement. For these examples, we converted `smoke` to a factor.

Another method for visualizing the distributions is to use facets, as shown in Figure 6.13. We can align the facets vertically or horizontally. Here we’ll align them vertically so that it’s easy to compare the two distributions:

``````ggplot(birthwt_mod, aes(x = bwt)) +
geom_density() +
facet_grid(smoke ~ .)``````

One problem with the faceted graph is that the facet labels are just 0 and 1, and there’s no label indicating that those values are for smoke. To change the labels, we need to change the names of the factor levels. First we’ll take a look at the factor levels, then we’ll assign new factor level names:

``````levels(birthwt_mod\$smoke)
#>  "0" "1"

birthwt_mod\$smoke <- recode(birthwt_mod\$smoke, '0' = 'No Smoke', '1' = 'Smoke')``````

Now when we plot our modified data frame, our desired labels appear (Figure 6.13, right):

``````ggplot(birthwt_mod, aes(x = bwt)) +
geom_density() +
facet_grid(smoke ~ .)``````  Figure 6.13: Density curves with facets (left); With different facet labels (right)

If you want to see the histograms along with the density curves, the best option is to use facets, since other methods of visualizing both histograms in a single graph can be difficult to interpret. To do this, map `y = ..density..`, so that the histogram is scaled down to the height of the density curves. In this example, we’ll also make the histogram bars a little less prominent by changing the colors (Figure 6.14):

``````ggplot(birthwt_mod, aes(x = bwt, y = ..density..)) +
geom_histogram(binwidth = 200, fill = "cornsilk", colour = "grey60", size = .2) +
geom_density() +
facet_grid(smoke ~ .)`````` Figure 6.14: Density curves overlaid on histograms