13.1 Making a Correlation Matrix

13.1.1 Problem

You want to make a graphical correlation matrix.

13.1.2 Solution

We’ll use the mtcars data set, which is included with R.
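A quick look at the data using base R only. Every column of mtcars is numeric, so the whole data frame can be passed to cor() without dropping anything.

```r
# mtcars ships with base R (datasets package); no installation needed
dim(mtcars)    # 32 rows (car models), 11 columns
head(mtcars)   # first six rows

# Confirm that every column is numeric before computing correlations
sapply(mtcars, is.numeric)
```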

First, generate the numerical correlation matrix using cor(), which computes the correlation coefficient (Pearson, by default) for each pair of columns.
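A minimal sketch of this step. The variable name mcor is our choice; the result is a square, symmetric matrix with one row and one column per variable and 1s on the diagonal.

```r
# Correlation coefficient for every pair of columns in mtcars
mcor <- cor(mtcars)

# Print the matrix rounded to 2 digits for readability
round(mcor, digits = 2)
```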

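Exclude any columns that you don’t want used in the correlations, such as a column of names; cor() accepts only numeric data and stops with an error if it is given a character column. If the data contains any NA cells, every coefficient involving a column with missing values will be NA, because the default is use="everything". To deal with this, you will probably want to use use="complete.obs", which drops every row that has a missing value before computing, or use="pairwise.complete.obs", which uses, for each pair of columns, all rows where both values are present.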
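The sketch below illustrates both points on a small, hypothetical data frame (dat and its columns are invented for this example): it keeps only the numeric columns, then compares the three settings of use.

```r
# Hypothetical data: a character column plus numeric columns containing NAs
dat <- data.frame(
  name = c("a", "b", "c", "d", "e"),
  x    = c(1, 2, 3, 4, 5),
  y    = c(2, 4, NA, 8, 10),
  z    = c(5, NA, 3, 2, 1)
)

# Keep only the numeric columns; cor() errors on character data
num <- dat[, sapply(dat, is.numeric)]

# Default use = "everything": NA for any pair involving y or z
cor(num)

# Drop rows 2 and 3 (each has an NA), then correlate the remaining rows
cor(num, use = "complete.obs")

# For each pair, use every row where both values are present
cor(num, use = "pairwise.complete.obs")
```

With "complete.obs" every coefficient is based on the same rows; with "pairwise.complete.obs" each coefficient may use a different subset, which keeps more data but means the entries are not all computed from the same observations.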

To plot the correlation matrix (Figure 13.1), we’ll use the corrplot() function from the corrplot package.
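A minimal sketch, assuming the corrplot package can be installed from CRAN. Called with no other arguments, corrplot() draws a circle for each pair, with size and color encoding the coefficient, plus a color legend.

```r
# install.packages("corrplot")   # run once if the package is not installed
library(corrplot)

mcor <- cor(mtcars)

# Default display: colored circles and a color legend
corrplot(mcor)
```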

Figure 13.1: A correlation matrix

13.1.3 Discussion

The corrplot() function has many options. Here is an example that draws the correlation matrix with colored squares and black labels, with the labels along the top rotated 45 degrees (Figure 13.2).
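A sketch of such a call, assuming corrplot is installed. method = "shade" draws filled squares but also adds diagonal lines marking negative correlations by default, so shade.col = NA is used here to hide them.

```r
library(corrplot)

mcor <- cor(mtcars)

corrplot(mcor,
         method    = "shade",  # filled, colored squares
         shade.col = NA,       # hide the correlation-direction lines
         tl.col    = "black",  # black text labels
         tl.srt    = 45)       # rotate the top labels 45 degrees
```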

Figure 13.2: Correlation matrix with colored squares and black, rotated labels

It may also be helpful to print the correlation coefficient on each square of the matrix. In this example we’ll make a lighter palette so that the text is readable, and we’ll remove the color legend, since it’s redundant once the values are printed. We’ll also reorder the variables so that correlated ones sit closer together, using the order="AOE" (angular order of eigenvectors) option. The result is shown in Figure 13.3.
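A minimal sketch, assuming corrplot is installed. The palette name pal and its five hex colors are illustrative choices: colorRampPalette() returns a function that interpolates between them, and corrplot() maps the first color to -1 and the last to +1.

```r
library(corrplot)

mcor <- cor(mtcars)

# Lighter diverging palette (red = negative, blue = positive) so black text stays legible
pal <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))

corrplot(mcor,
         method = "shade", shade.col = NA, tl.col = "black", tl.srt = 45,
         col         = pal(200),  # 200 interpolated colors
         addCoef.col = "black",   # print each coefficient in black
         cl.pos      = "n",       # omit the color legend
         order       = "AOE")     # angular order of eigenvectors
```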

Figure 13.3: Correlation matrix with correlation coefficients and no legend

Like many other standalone plotting functions, corrplot() has its own menagerie of options, which can’t all be illustrated here. Table 13.1 lists some useful options.

Table 13.1: Options for corrplot()
| Option | Description |
|---|---|
| type = "lower" or "upper" | Show only the lower or upper triangle |
| diag = FALSE | Don’t show values on the diagonal |
| addshade = "all" | Add lines indicating the direction of the correlation (with method = "shade") |
| shade.col = NA | Hide correlation direction lines |
| method = "shade" | Use colored squares |
| method = "ellipse" | Use ellipses |
| addCoef.col = "color" | Add correlation coefficients in the given color |
| tl.srt = number | Specify the rotation angle, in degrees, of the top labels |
| tl.col = "color" | Specify the label color |
| order = "AOE", "FPC", or "hclust" | Sort labels using angular order of eigenvectors, first principal component, or hierarchical clustering |
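As an illustration of combining several options from Table 13.1 (this particular combination is our own choice, not one of the recipe’s figures), the sketch below shows only the lower triangle as ellipses, hides the diagonal, and groups variables by hierarchical clustering. It assumes corrplot is installed.

```r
library(corrplot)

mcor <- cor(mtcars)

corrplot(mcor,
         type   = "lower",    # lower triangle only
         method = "ellipse",  # ellipses instead of circles
         diag   = FALSE,      # leave the diagonal blank
         order  = "hclust",   # reorder by hierarchical clustering
         tl.col = "black")    # black labels
```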

13.1.4 See Also

To create a scatter plot matrix, see Recipe 5.13.

For more on subsetting data, see Recipe 15.7.