Chapter 12 Using Colors in Plots

In ggplot2’s implementation of the grammar of graphics, color is an aesthetic, just like x position, y position, and size. If color is just another aesthetic, why does it deserve its own chapter? The reason is that color is a more complicated aesthetic than the others. Instead of simply moving geoms left and right or making them larger and smaller, when you use color, there are many degrees of freedom and many more choices to make. What palette should you use for discrete values? Should you use a gradient with several different hues? How do you choose colors that can be interpreted accurately by those with color-vision deficiencies? In this chapter, I’ll address these issues.

12.1 Setting the Colors of Objects

12.1.1 Problem

You want to set the color of some geoms in your graph.

12.1.2 Solution

In the call to the geom, set the values of colour or fill (Figure 12.1):
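A minimal sketch of this, assuming the birthwt data set from the MASS package and the built-in mtcars data set (the object names hist_plot and point_plot are just for illustration):

```r
library(ggplot2)
library(MASS)  # assumed source of the birthwt data set

# Histogram: fill sets the interior of the bars, colour sets their outlines
hist_plot <- ggplot(birthwt, aes(x = bwt)) +
  geom_histogram(fill = "red", colour = "black")

# Scatter plot: colour sets the colour of the points
point_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "red")

hist_plot
point_plot
```

Because colour and fill are given outside of aes(), every bar and every point gets the same fixed color.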

Figure 12.1: Setting fill and colour (left); Setting colour for points (right)

12.1.3 Discussion

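In ggplot2, there’s an important difference between setting and mapping aesthetic properties. Setting gives every object in the geom the same fixed value, whereas mapping ties the aesthetic to a variable in the data (see Recipe 12.2). In the preceding example, we set the color of the objects to “red”.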

Generally speaking, colour controls the color of lines and of the outlines of polygons, while fill controls the color of the fill area of polygons. However, point shapes are sometimes a little different. For most point shapes, the color of the entire point is controlled by colour, not fill. The exception is the point shapes (21–25) that have both a fill and an outline.

You can use colour or color interchangeably with ggplot2. In this book, I’ve used colour, in keeping with the form used in the official ggplot2 documentation.

12.1.4 See Also

For more information about point shapes, see Recipe 4.5.

See Recipe 12.5 for more on specifying colors.

12.2 Representing Variables with Colors

12.2.1 Problem

You want to use a variable (column from a data frame) to control the color of geoms.

12.2.2 Solution

In the call to the geom, inside of aes(), set the value of colour or fill to the name of one of the columns in the data (Figure 12.2):
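A sketch of what this can look like, assuming the cabbage_exp data set from the gcookbook package and the built-in mtcars data set (the plot object names are illustrative):

```r
library(ggplot2)
library(gcookbook)  # assumed source of the cabbage_exp data set

# Map Cultivar to fill in ggplot(); the black outline is set, not mapped
bar_default <- ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_col(colour = "black", position = "dodge")

# The same mapping, specified inside the geom instead
bar_in_geom <- ggplot(cabbage_exp, aes(x = Date, y = Weight)) +
  geom_col(aes(fill = Cultivar), colour = "black", position = "dodge")

# Map the numeric column cyl to colour for points
point_cyl <- ggplot(mtcars, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point()

bar_default
point_cyl
```

The two bar charts should look the same; they differ only in where the fill mapping is declared.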

Figure 12.2: Mapping a variable to fill (left); Mapping a variable to colour for points (right)

When the mapping is specified in ggplot(), it is used as the default mapping, which is inherited by all the geoms. Within a geom, the default mappings can be overridden.

12.2.3 Discussion

In the cabbage_exp example, the variable Cultivar is mapped to fill. The Cultivar column in cabbage_exp is a factor, so ggplot treats it as a categorical variable. You can check the type using str():
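For instance, assuming cabbage_exp comes from the gcookbook package, the check could look like this:

```r
library(gcookbook)  # assumed source of the cabbage_exp data set

# Show the structure of the data frame, including each column's type
str(cabbage_exp)

# A narrower check for the single column of interest
is.factor(cabbage_exp$Cultivar)
```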

In the mtcars example, cyl is numeric, so it is treated as a continuous variable. Because of this, even though the actual values of cyl include only 4, 6, and 8, the legend has entries for the intermediate values 5 and 7. To make ggplot treat cyl as a categorical variable, you can convert it to a factor in the call to ggplot() (Figure 12.3, left), or you can modify the data so that the column is a character vector or factor (Figure 12.3, right):
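Both approaches are sketched here. The use of dplyr and the name mtcars_mod are assumptions made for illustration; base R would work just as well for the conversion:

```r
library(ggplot2)
library(dplyr)

# Option 1: convert cyl to a factor inside the call to ggplot()
p_inline <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# Option 2: modify a copy of the data so that cyl is a factor
mtcars_mod <- mtcars %>%
  mutate(cyl = as.factor(cyl))

p_modified <- ggplot(mtcars_mod, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point()

p_inline
p_modified
```

With the first option the legend title is the expression factor(cyl); with the second it is simply cyl.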

Figure 12.3: Converting cyl to a factor, within the call to ggplot (left); By modifying the dataframe (right)

12.2.4 See Also

You may also want to change the colors that are used in the scale. For continuous data, see Recipe 12.6. For discrete data, see Recipe 12.4 and Recipe 12.5.

12.3 Using a Colorblind-Friendly Palette

12.3.1 Problem

You want to select a color palette that can also be distinguished by colorblind viewers.

12.3.2 Solution

Use the color scales in the viridis package.

The viridis package contains a set of beautiful color scales that are each designed to span as wide a palette as possible, making it easier to see differences in your data. These scales are also designed to be perceptually uniform, printable in grey scale, and easier to read by those with colorblindness.

Here is an example from the introduction page to viridis (https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html; Figure 12.4):

Figure 12.4: Example of viridis color palette

The viridis color scales can be used with both continuous and discrete data. If your data is continuous, add scale_fill_viridis_c() to your plot. If your data is discrete, use scale_fill_viridis_d() instead, as in Figure 12.5:
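A sketch of the discrete case, assuming the uspopage data set from the gcookbook package, in which AgeGroup is a discrete variable (the name uspopage_plot is illustrative):

```r
library(ggplot2)
library(gcookbook)  # assumed source of the uspopage data set

# Base plot: stacked areas, one per age group
uspopage_plot <- ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
  geom_area()

# Add the discrete viridis fill scale
viridis_plot <- uspopage_plot +
  scale_fill_viridis_d()

viridis_plot
```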

Figure 12.5: A plot with the colorblind-friendly viridis palette

12.3.3 Discussion

About 8 percent of males and 0.5 percent of females have some form of color-vision deficiency, so there’s a good chance that someone in your audience will be among them. There are many different forms of color blindness. The palettes mentioned in this book are designed to enable people with any of the most common forms of color-vision deficiency to distinguish the colors. (Monochromacy, or total colorblindness, is rare. Those who have it can only see differences in brightness.)

The viridis color scales have been included in ggplot2 since version 3.0.0. There are also other color palettes that are friendly to users with color blindness, such as those in the cetcolor package (see below).

12.3.4 See Also

To see more on the different viridis palettes, see ?scales::viridis_pal.

The cetcolor scales: https://github.com/coatless/cetcolor.

The Color Oracle program (http://colororacle.org) can simulate how things on your screen appear to someone with color vision deficiency, but keep in mind that the simulation isn’t perfect. In my informal testing, I viewed an image with simulated red-green deficiency, and I could distinguish the colors just fine – but others with actual red-green deficiency viewed the same image and couldn’t tell the colors apart!

12.4 Using a Different Palette for a Discrete Variable

12.4.1 Problem

You want to use different colors for a discrete mapped variable.

12.4.2 Solution

Use one of the scales listed in Table 12.1.

Table 12.1: Discrete fill and color scales
Fill scale               Color scale                Description
scale_fill_discrete()    scale_colour_discrete()    Colors evenly spaced around the color wheel (same as hue)
scale_fill_hue()         scale_colour_hue()         Colors evenly spaced around the color wheel (same as discrete)
scale_fill_grey()        scale_colour_grey()        Greyscale palette
scale_fill_viridis_d()   scale_colour_viridis_d()   Viridis palettes
scale_fill_brewer()      scale_colour_brewer()      ColorBrewer palettes
scale_fill_manual()      scale_colour_manual()      Manually specified colors

In the example here we’ll use the default palette (hue), a viridis palette, and a ColorBrewer palette (Figure 12.6):
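The three variants might be written as follows. This sketch assumes the uspopage data set from the gcookbook package; the object names are illustrative:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the uspopage data set

# Base plot; with no fill scale added, the default (hue) palette is used
uspopage_plot <- ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
  geom_area()

# A viridis palette
plot_viridis <- uspopage_plot + scale_fill_viridis_d()

# A ColorBrewer palette (the scale's default palette)
plot_brewer <- uspopage_plot + scale_fill_brewer()

uspopage_plot
plot_viridis
plot_brewer
```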

Figure 12.6: Default palette (using hue; top); A viridis palette (middle); A ColorBrewer palette (bottom)

12.4.3 Discussion

Changing a palette is a modification of the color (or fill) scale: it involves a change in the mapping from numeric or categorical values to aesthetic attributes. There are two types of scales that use colors: fill scales and color scales.

With scale_fill_hue(), the colors are taken from around the color wheel in the HCL (hue-chroma-lightness) color space. The default lightness value is 65 on a scale from 0–100. This is good for filled areas, but it’s a bit light for points and lines. To make the colors darker for points and lines, as in Figure 12.7 (right), set the value of l (luminance/lightness):
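As a sketch, assuming the heightweight data set from the gcookbook package (the name hw_splot is illustrative):

```r
library(ggplot2)
library(gcookbook)  # assumed source of the heightweight data set

# Scatter plot with sex mapped to colour, using the default lightness
hw_splot <- ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point()

# The same plot with darker colours: lightness lowered to 45
hw_dark <- hw_splot + scale_colour_hue(l = 45)

hw_splot
hw_dark
```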

Figure 12.7: Points with default lightness (left); With lightness set to 45 (right)

The viridis package provides a number of color scales that make it easy to see differences across your data. See Recipe 12.3 for more details and examples.

The RColorBrewer package provides the ColorBrewer palettes. You can generate a graphic showing all of them, as shown in Figure 12.8:
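Assuming the RColorBrewer package is installed, a call along these lines draws the overview graphic:

```r
library(RColorBrewer)

# Draw every ColorBrewer palette on the current graphics device
display.brewer.all()

# The palette names are also available as the row names of brewer.pal.info
rownames(brewer.pal.info)
```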

Figure 12.8: All the ColorBrewer palettes

The ColorBrewer palettes can be selected by name. For example, this will use the “Oranges” palette (Figure 12.9):
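A sketch, again assuming the uspopage data set from the gcookbook package:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the uspopage data set

uspopage_plot <- ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
  geom_area()

# Select a ColorBrewer palette by name
plot_oranges <- uspopage_plot +
  scale_fill_brewer(palette = "Oranges")

plot_oranges
```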

Figure 12.9: Using a named ColorBrewer palette

You can also use a palette of greys. This is useful for print when the output is in black and white. The default is to start at 0.2 and end at 0.8, on a scale from 0 (black) to 1 (white), but you can change the range, as shown in Figure 12.10.
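One possible version, assuming the heightweight data set from the gcookbook package; the start and end values in the second plot are arbitrary choices for illustration:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the heightweight data set

hw_splot <- ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point()

# Default grey palette: from 0.2 (dark) to 0.8 (light)
hw_grey <- hw_splot + scale_colour_grey()

# A different range: from 0.7 (light) to 0 (black), which also reverses the direction
hw_grey_rev <- hw_splot + scale_colour_grey(start = 0.7, end = 0)

hw_grey
hw_grey_rev
```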

Figure 12.10: Using the default grey palette (left); A different grey palette (right)

12.4.4 See Also

See Recipe 10.4 for more information about reversing the legend.

To select colors manually, see Recipe 12.5.

For more about viridis, see https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html. For more about ColorBrewer, see http://colorbrewer2.org.

12.5 Using a Manually Defined Palette for a Discrete Variable

12.5.1 Problem

You want to specify your own colors for a discrete mapped variable.

12.5.2 Solution

In the example here, we’ll manually define colors by specifying values with scale_colour_manual() (Figure 12.11). The colors can be named, or they can be specified with RGB values:
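A sketch of the three variants, assuming the heightweight data set from the gcookbook package. The specific color values are examples only; the last pair is what viridis(2) generates:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the heightweight data set

hw_splot <- ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point()

# Named colours
hw_named <- hw_splot + scale_colour_manual(values = c("red", "blue"))

# Slightly different colours, given as RGB hexadecimal values
hw_rgb <- hw_splot + scale_colour_manual(values = c("#CC6666", "#7777DD"))

# The two end colours of the viridis scale
hw_viridis <- hw_splot + scale_colour_manual(values = c("#440154FF", "#FDE725FF"))

hw_named
hw_rgb
hw_viridis
```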

Figure 12.11: Scatter plot with named colors (top left); With slightly different RGB colors (top right); With colors from the viridis color scale (bottom)

For fill scales, use scale_fill_manual() instead.

12.5.3 Discussion

The order of the items in the values vector matches the order of the factor levels for the discrete scale. In the preceding example, the order of sex is f, then m, so the first item in values goes with f and the second goes with m. Here’s how to see the order of factor levels:
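Assuming the heightweight data set from the gcookbook package, the check is a single call:

```r
library(gcookbook)  # assumed source of the heightweight data set

# Factor levels, in the order the scale will use them
levels(heightweight$sex)
# "f" "m"
```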

If the variable is a character vector, not a factor, it will automatically be converted to a factor, and by default the levels will appear in alphabetical order.

It’s possible to specify the colors in a different order by using a named vector:
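A sketch, assuming the heightweight data set from the gcookbook package. The names in the vector must match the factor levels; the colors are arbitrary:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the heightweight data set

hw_splot <- ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point()

# Colours are matched to levels by name, so their order in the vector does not matter
hw_by_name <- hw_splot +
  scale_colour_manual(values = c(m = "blue", f = "red"))

hw_by_name
```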

There is a large set of named colors in R, which you can see by running colors(). Some basic color names are useful: “white”, “black”, “grey80”, “red”, “blue”, “darkred”, and so on. There are many other named colors, but their names are generally not very informative (I certainly have no idea what “thistle3” and “seashell” look like), so it’s often easier to use numeric RGB values for specifying colors.

RGB colors are specified as six-digit hexadecimal (base-16) numbers of the form #RRGGBB. In hexadecimal, the digits go from 0 to 9, and then continue with A (10 in base 10) to F (15 in base 10). Each color is represented by two digits and can range from 00 to FF (255 in base 10). So, for example, the color #FF0099 has a value of 255 for red, 0 for green, and 153 for blue, resulting in a shade of magenta. The hexadecimal numbers for each color channel often repeat the same digit because it makes them a little easier to read, and because the precise value of the second digit has a relatively insignificant effect on appearance.
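The arithmetic above can be checked with a few base R functions. This is only an illustration of the conversion, not something a plot requires:

```r
# Decompose a hexadecimal colour into its decimal channel values
channels <- col2rgb("#FF0099")
channels
# rows: red 255, green 0, blue 153

# Convert a single two-digit hexadecimal channel to decimal
blue_value <- strtoi("99", base = 16L)
blue_value
# 153

# Go the other way: build the hexadecimal string from decimal channel values
hex_colour <- rgb(255, 0, 153, maxColorValue = 255)
hex_colour
# "#FF0099"
```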

Here are some rules of thumb for specifying and adjusting RGB colors:

  • In general, higher numbers are brighter and lower numbers are darker.
  • To get a shade of grey, set all the channels to the same value.
  • The opposites of RGB are CMY: Cyan, Magenta, and Yellow. Higher values for the red channel make it more red, and lower values make it more cyan. The same is true for the pairs green and magenta, and blue and yellow.

You may want to manually select colors based on the color scales in the viridis package, as described in Recipe 12.4. You can do so by calling viridis() and passing it the number of discrete categories you have. This will generate the RGB hexadecimal values. You can similarly generate the RGB values for the other color scales in the viridis package: “magma”, “plasma”, “inferno”, and “cividis”.
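For example, with two categories. This sketch calls the function from the viridisLite package, on the assumption that it is installed; the viridis package makes the same viridis() function available:

```r
library(viridisLite)  # assumption: installed; library(viridis) offers the same function

# Two colours from the default viridis scale
two_colours <- viridis(2)
two_colours
# "#440154FF" "#FDE725FF"

# The other scales are selected with the option argument
two_magma <- viridis(2, option = "magma")
two_magma
```

The resulting vector can be passed directly as the values argument of scale_colour_manual().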

12.5.4 See Also

A chart of RGB color codes: http://html-color-codes.com.

12.6 Using a Manually Defined Palette for a Continuous Variable

12.6.1 Problem

You want to use different colors for a continuous variable.

12.6.2 Solution

In the example here, we’ll specify the colors for a continuous variable using various gradient scales (Figure 12.12). The colors can be named, or they can be specified with RGB values:
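A sketch of the four variants, assuming the heightweight data set from the gcookbook package. The specific colors and the midpoint of 110 are illustrative choices:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the heightweight data set
library(scales)     # for muted()

# Base plot with a continuous variable mapped to colour (default gradient)
hw_plot <- ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = weightLb)) +
  geom_point(size = 3)

# Two-colour gradient
hw_grad <- hw_plot +
  scale_colour_gradient(low = "black", high = "white")

# Diverging gradient with a white midpoint
hw_grad2 <- hw_plot +
  scale_colour_gradient2(
    low = muted("red"), mid = "white", high = muted("blue"),
    midpoint = 110
  )

# Gradient through n colours
hw_gradn <- hw_plot +
  scale_colour_gradientn(colours = c("darkred", "orange", "yellow", "white"))

hw_plot
hw_grad
hw_grad2
hw_gradn
```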

Figure 12.12: Clockwise from top left: default colors, two-color gradient (black and white) with scale_colour_gradient(), three-color gradient with midpoint with scale_colour_gradient2(), four-color gradient with scale_colour_gradientn()

For fill scales, use scale_fill_xxx() versions instead, where xxx is one of gradient, gradient2, or gradientn.

12.6.3 Discussion

Mapping continuous values to a color scale requires a continuously changing palette of colors. Table 12.2 lists the continuous color and fill scales.

Table 12.2: Continuous fill and color scales
Fill scale               Color scale                Description
scale_fill_gradient()    scale_colour_gradient()    Two-color gradient
scale_fill_gradient2()   scale_colour_gradient2()   Gradient with a middle color and two colors that diverge from it
scale_fill_gradientn()   scale_colour_gradientn()   Gradient with n colors, equally spaced
scale_fill_viridis_c()   scale_colour_viridis_c()   Viridis palettes

Notice that we used the muted() function in the examples. This is a function from the scales package that returns an RGB value that is a less-saturated version of the color chosen.
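To see what muted() does on its own, you can call it directly. This is a small sketch assuming the scales package is installed:

```r
library(scales)

# muted() takes a colour and returns a hexadecimal RGB string
muted_red  <- muted("red")
muted_blue <- muted("blue")

muted_red
muted_blue
```

The returned strings can be used anywhere a color is expected, such as the low and high arguments of a gradient scale.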

12.6.4 See Also

If you want to use a discrete (categorical) scale instead of a continuous one, you can recode your data into categorical values. See Recipe 15.14.


12.7 Coloring a Shaded Region Based on Value

12.7.1 Problem

You want to set the color of a shaded region based on the y value.
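One way to approach this is to add a column that categorizes y as positive or negative, and then map that column to fill. The sketch below rests on several assumptions: it uses the climate data set from the gcookbook package (with Source, Year, and Anomaly10y columns), keeps only the "Berkeley" source, relies on dplyr, and introduces a made-up column name, valence:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the climate data set
library(dplyr)

# Keep one data source and label each value as positive or negative
climate_mod <- climate %>%
  filter(Source == "Berkeley") %>%
  mutate(valence = if_else(Anomaly10y >= 0, "pos", "neg"))

# Map the label to fill so that each region gets its own colour
area_plot <- ggplot(climate_mod, aes(x = Year, y = Anomaly10y)) +
  geom_area(aes(fill = valence)) +
  geom_line() +
  geom_hline(yintercept = 0)

area_plot
```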

12.7.3 Discussion

If you look closely at the figure, you’ll notice that there are some stray shaded areas near the zero line. This is because each of the two colored areas is a single polygon bounded by the data points, and the data points are not actually at zero. To solve this problem, we can interpolate the data to 1,000 points by using approx():
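A sketch of the interpolation step, under the same assumptions as before: the climate data set from the gcookbook package, the "Berkeley" source, dplyr, and illustrative names (climate_mod, interp, cbi, valence):

```r
library(gcookbook)  # assumed source of the climate data set
library(dplyr)

climate_mod <- climate %>%
  filter(Source == "Berkeley")

# approx() returns a list with x and y vectors holding n interpolated points
interp <- approx(climate_mod$Year, climate_mod$Anomaly10y, n = 1000)

# Put the result in a data frame and recalculate the positive/negative label
cbi <- data.frame(Year = interp$x, Anomaly10y = interp$y) %>%
  mutate(valence = if_else(Anomaly10y >= 0, "pos", "neg"))

head(cbi)
```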

It would be more precise (and more complicated) to interpolate exactly where the line crosses zero, but approx() works fine for the purposes here.

Now we can plot the interpolated data (Figure 12.14). This time we’ll make a few adjustments – we’ll make the shaded regions partially transparent, change the colors, remove the legend, and remove the padding on the left and right sides:
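Putting it together might look like the following. As before, the data set (climate from gcookbook), the "Berkeley" source, and the object and column names are assumptions, and the two fill colors are arbitrary light shades:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the climate data set
library(dplyr)

# Rebuild the interpolated data so that this block runs on its own
climate_mod <- climate %>%
  filter(Source == "Berkeley")
interp <- approx(climate_mod$Year, climate_mod$Anomaly10y, n = 1000)
cbi <- data.frame(Year = interp$x, Anomaly10y = interp$y) %>%
  mutate(valence = if_else(Anomaly10y >= 0, "pos", "neg"))

interp_plot <- ggplot(cbi, aes(x = Year, y = Anomaly10y)) +
  geom_area(aes(fill = valence), alpha = 0.4) +    # partially transparent regions
  geom_line() +
  geom_hline(yintercept = 0) +
  scale_fill_manual(values = c("#CCEEFF", "#FFDDDD"), guide = "none") +  # new colours, no legend
  scale_x_continuous(expand = c(0, 0))             # no padding on the left and right

interp_plot
```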

Figure 12.14: Shaded regions with interpolated data