13.15 Creating a Mosaic Plot

13.15.1 Problem

You want to make a mosaic plot to visualize a contingency table.

13.15.2 Solution

Use the mosaic() function from the vcd package. For this example we’ll use the USBAdmissions data set, which is a contingency table with three dimensions. We’ll first take a look at the data in a few different ways:

The three dimensions are Admit, Gender, and Dept. To visualize the relationships between the variables (Figure 13.27), use mosaic() and pass it a formula with the variables that will be used to split up the data:

Mosaic plot of UC-Berkeley admissions data-the area of each rectangle is proportional to the number of cases in that cell

Figure 13.27: Mosaic plot of UC-Berkeley admissions data-the area of each rectangle is proportional to the number of cases in that cell

Notice that mosaic() splits the data in the order in which the variables are provided: first on admission status, then gender, then department. The resulting plot order makes it very clear that more applicants were rejected than admitted. It is also clear that within the admitted group there were many more men than women, while in the rejected group there were approximately the same number of men and women. It is difficult to make comparisons within each department, though. A different variable splitting order may reveal some other interesting information.

Another way of looking at the data is to split first by department, then gender, then admission status, as in Figure 13.28. This makes the admission status the last variable that is partitioned, so that after partitioning by department and gender, the admitted and rejected cells for each group are right next to each other:

Mosaic plot with a different variable splitting order: first department, then gender, then admission status

Figure 13.28: Mosaic plot with a different variable splitting order: first department, then gender, then admission status

We also specified a variable to highlight (Admit), and which colors to use in the highlighting.

13.15.3 Discussion

In the preceding example we also specified the direction in which each variable will be split. The first variable, Dept, is split vertically; the second variable, Gender, is split horizontally; and the third variable, Admit, is split vertically. The reason that we chose these directions is because, in this particular example, it makes it easy to compare the male and female groups within each department.

We can also use different splitting directions, as shown in Figure 13.29 and Figure 13.30:

Splitting Dept vertically, Gender vertically, and Admit horizontally

Figure 13.29: Splitting Dept vertically, Gender vertically, and Admit horizontally

Splitting Dept vertically, Gender horizontally, and Admit horizontally

Figure 13.30: Splitting Dept vertically, Gender horizontally, and Admit horizontally

The example here illustrates a classic case of Simpson’s paradox, in which a relationship between variables within subgroups can change (or reverse!) when the groups are combined. The UCBerkeley table contains admissions data from the University of California-Berkeley in 1973. Overall, men were admitted at a higher rate than women, and because of this, the university was sued for gender bias. But when each department was examined separately, it was found that they each had approximately equal admission rates for men and women. The difference in overall admission rates was because women were more likely to apply to competitive departments with lower admission rates.

In Figure 13.28 and Figure 13.29, you can see that within each department, admission rates were approximately equal between men and women. You can also see that departments with higher admission rates (A and B) were very imbalanced in the gender ratio of applicants: far more men applied to these departments than did women. As you can see, partitioning the data in different orders and directions can bring out different aspects of the data. In Figure 13.29, as in Figure 13.28, it’s easy to compare male and female admission rates within each department and across departments. Splitting Dept vertically, Gender horizontally, and Admit horizontally, as in Figure 13.30, makes it difficult to compare male and female admission rates within each department, but it is easy to compare male and female application rates across departments.

13.15.4 See Also

See ?mosaicplot for another function that can create mosaic plots.

P.J. Bickel, E.A. Hammel, and J.W. O’Connell, “Sex Bias in Graduate Admissions: Data from Berkeley,” Science 187 (1975): 398–404.

make a pie chart.

13.15.5 Solution

Use the pie() function. In this example (Figure 13.31), we’ll use the survey data set from the MASS library:

A pie chart

Figure 13.31: A pie chart

We passed pie() an object of class table. We could have instead given it a named vector, or a vector of values and a vector of labels like this, with the same result:

13.15.6 Discussion

The lowly pie chart is the subject of frequent abuse from data visualization experts. If you’re thinking of using a pie chart, consider whether a bar graph (or stacked bar graph) would convey the information more effectively. Despite their faults, pie charts do have one important virtue: everyone knows how to read them.