13.1 Making a Correlation Matrix

13.1.1 Problem

You want to make a graphical correlation matrix.

13.1.2 Solution

We’ll look at the mtcars data set:

mtcars
#>                mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#>  ...<26 more rows>...
#> Ferrari Dino  19.7   6  145 175 3.62 2.770 15.50  0  1    5    6
#> Maserati Bora 15.0   8  301 335 3.54 3.570 14.60  0  1    5    8
#> Volvo 142E    21.4   4  121 109 4.11 2.780 18.60  1  1    4    2

First, generate the numerical correlation matrix using cor(). This computes the correlation coefficient for each pair of columns:

mcor <- cor(mtcars)
# Print mcor and round to 2 digits
round(mcor, digits = 2)
#>        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#> mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
#> cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
#> disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
#>  ...<5 more rows>...
#> am    0.60 -0.52 -0.59 -0.24  0.71 -0.69 -0.23  0.17  1.00  0.79  0.06
#> gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  0.21  0.79  1.00  0.27
#> carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66 -0.57  0.06  0.27  1.00

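cor() requires numeric input, so if there are any columns that you don’t want used for correlations (for example, a column of names), you should exclude them before calling it. If there are any NA cells in the original data, the resulting correlation matrix will have NA values for the affected pairs of columns. To deal with this, you will probably want to use the argument use="complete.obs" (drop every row that contains an NA) or use="pairwise.complete.obs" (for each pair of columns, use the rows where both values are present).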
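As a minimal sketch of both points, the following builds a hypothetical copy of mtcars with a name column and one missing value (the names cars, cars_num, mcor_na, and mcor_pw are made up for this illustration), drops the non-numeric column, and compares the default NA handling with use="pairwise.complete.obs":

```r
# Hypothetical data: mtcars plus a character column and one missing value
cars <- mtcars
cars$name <- rownames(mtcars)
cars$mpg[1] <- NA

# Keep only the numeric columns, since cor() requires numeric input
cars_num <- cars[, sapply(cars, is.numeric)]

# With the default settings, pairs involving mpg come out as NA
mcor_na <- cor(cars_num)

# Using pairwise-complete observations avoids the NA values
mcor_pw <- cor(cars_num, use = "pairwise.complete.obs")
```

Note that with pairwise deletion, different cells of the matrix can be based on different subsets of rows, so use="complete.obs" may be preferable when you want every coefficient computed from the same rows.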

To plot the correlation matrix (Figure 13.1), we’ll use the corrplot package:

# If needed, first install with install.packages("corrplot")
library(corrplot)

corrplot(mcor)

Figure 13.1: A correlation matrix

13.1.3 Discussion

The corrplot() function has many, many options. Here is an example of how to make a correlation matrix with colored squares and black labels, rotated 45 degrees along the top (Figure 13.2):

corrplot(mcor, method = "shade", shade.col = NA, tl.col = "black", tl.srt = 45)

Figure 13.2: Correlation matrix with colored squares and black, rotated labels

It may also be helpful to display labels representing the correlation coefficient on each square in the matrix. In this example we’ll make a lighter palette so that the text is readable, and we’ll remove the color legend, since it’s redundant. We’ll also order the items so that correlated items are closer together, using the order="AOE" (angular order of eigenvectors) option. The result is shown in Figure 13.3:

# Generate a lighter palette
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))

corrplot(mcor, method = "shade", shade.col = NA, tl.col = "black", tl.srt = 45,
         col = col(200), addCoef.col = "black", cl.pos = "n", order = "AOE")

Figure 13.3: Correlation matrix with correlation coefficients and no legend
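In the code above, colorRampPalette() returns a function rather than a vector of colors, which is why the palette is passed to corrplot() as col(200). A small sketch of this, using the hypothetical name pal to avoid confusion with the col argument:

```r
# colorRampPalette() returns a function that interpolates between the given colors
pal <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))

# Calling that function with n gives a character vector of n color codes
shades <- pal(200)
length(shades)
head(shades, 3)
```

A larger n gives smoother gradations between the red and blue ends of the palette.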

Like many other standalone plotting functions, corrplot() has its own menagerie of options, which can’t all be illustrated here. Table 13.1 lists some useful options.

Table 13.1: Options for corrplot()
Option                              Description
type={"lower" | "upper"}            Only use the lower or upper triangle
diag=FALSE                          Don’t show values on the diagonal
addshade="all"                      Add lines indicating the direction of the correlation
shade.col=NA                        Hide correlation direction lines
method="shade"                      Use colored squares
method="ellipse"                    Use ellipses
addCoef.col="color"                 Add correlation coefficients, in color
tl.srt=number                       Specify the rotation angle for top labels
tl.col="color"                      Specify the label color
order={"AOE" | "FPC" | "hclust"}    Sort labels using angular order of eigenvectors, first principal component, or hierarchical clustering
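Several of these options can be combined in one call. The following is a sketch rather than a recommended style; it assumes the corrplot package is installed and draws only the lower triangle, using ellipses, with the variables ordered by hierarchical clustering:

```r
library(corrplot)

# Recompute the correlation matrix so that this example stands alone
mcor <- cor(mtcars)

# Lower triangle only, no diagonal, ellipses, hierarchical clustering order
corrplot(mcor, type = "lower", diag = FALSE, method = "ellipse",
         order = "hclust", tl.col = "black", tl.srt = 45)
```

Showing only one triangle drops the mirrored half of the symmetric matrix, which can make larger matrices easier to read.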

13.1.4 See Also

To create a scatter plot matrix, see Recipe 5.13.

For more on subsetting data, see Recipe 15.7.