You want to make a balloon plot, where the area of the dots is proportional to their numerical value.
Use geom_point() with scale_size_area(). For this example, we’ll filter the data set countries to only include data from the year 2009, for certain countries we have specified in countrylist:
library(gcookbook)  # Load gcookbook for the countries data set
library(dplyr)      # Provides %>% and filter()

countrylist <- c("Canada", "Ireland", "United Kingdom", "United States",
  "New Zealand", "Iceland", "Japan", "Luxembourg", "Netherlands", "Switzerland")

cdat <- countries %>%
  filter(Year == 2009, Name %in% countrylist)

cdat
#>              Name Code Year      GDP laborrate healthexp infmortality
#> 1          Canada  CAN 2009 39599.04      67.8  4379.761          5.2
#> 2         Iceland  ISL 2009 37972.24      77.5  3130.391          1.7
#> 3         Ireland  IRL 2009 49737.93      63.6  4951.845          3.4
#>  ...<4 more rows>...
#> 8     Switzerland  CHE 2009 63524.65      66.9  7140.729          4.1
#> 9  United Kingdom  GBR 2009 35163.41      62.2  3285.050          4.7
#> 10  United States  USA 2009 45744.56      65.0  7410.163          6.6
If we just map GDP to size, the default size scale does not make the area of the dots proportional to GDP (Figure 5.36, left): the smallest GDP in the data is drawn at a nonzero minimum size, so a value of zero would not correspond to zero area, and this will distort the interpretation of the data. We instead want the area of each dot to be proportional to GDP, which we can do using scale_size_area() (Figure 5.36, right):

library(ggplot2)

# Create a base plot using the cdat data frame. We will call this base plot
# `cdat_sp` (for cdat scatter plot)
cdat_sp <- ggplot(cdat, aes(x = healthexp, y = infmortality, size = GDP)) +
  geom_point(shape = 21, colour = "black", fill = "cornsilk")

# Default size scale (scale_size_continuous): smallest GDP gets a nonzero size
cdat_sp

# GDP mapped to area instead, and larger circles
cdat_sp +
  scale_size_area(max_size = 15)
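The reason for scaling by area can be checked with a little arithmetic. The following is a minimal base-R sketch, not ggplot2's own code: the gdp values are made up for illustration, the helper rel_area() is hypothetical, and it assumes that the size value behaves like a diameter, so that area grows with its square. It compares a naive mapping, where the linear size is proportional to the value, with a square-root mapping of the kind scale_size_area() is understood to use.

```r
gdp <- c(10000, 20000, 40000)  # made-up values, each double the previous one

# Naive mapping: linear size (diameter) proportional to the value
size_linear <- 15 * gdp / max(gdp)

# Area mapping: linear size proportional to the square root of the value
size_area <- 15 * sqrt(gdp / max(gdp))

# Hypothetical helper: area relative to the smallest circle
# (area is proportional to the square of the linear size)
rel_area <- function(size) size^2 / min(size^2)

rel_area(size_linear)
#> [1]  1  4 16
rel_area(size_area)
#> [1] 1 2 4
```

With the naive mapping, each doubling of the value quadruples the area; with the square-root mapping, each doubling of the value doubles the area, which is the behavior we want in a balloon plot.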
The example here is a scatter plot, but that is not the only way to use balloon plots. It may also be useful to use balloon plots to represent values on a grid, where the x- and y-axes are categorical, as in Figure 5.37:
# Create a data frame that adds up counts for males and females
hec <- HairEyeColor %>%
  # Convert to long format
  as_tibble() %>%
  group_by(Hair, Eye) %>%
  summarize(count = sum(n))
#> `summarise()` has grouped output by 'Hair'. You can override using the
#> `.groups` argument.

# Create the base balloon plot
# (guide = "none" hides the size legend; guide = FALSE is deprecated)
hec_sp <- ggplot(hec, aes(x = Eye, y = Hair)) +
  geom_point(aes(size = count), shape = 21, colour = "black", fill = "cornsilk") +
  scale_size_area(max_size = 20, guide = "none") +
  geom_text(aes(
      y = as.numeric(as.factor(Hair)) - sqrt(count)/34, label = count),
      vjust = 1.3, colour = "grey60", size = 4
  )

hec_sp

# Add red guide points
hec_sp +
  geom_point(aes(y = as.numeric(as.factor(Hair)) - sqrt(count)/34),
    colour = "red", size = 1)
In this example we’ve used a few tricks to add the text labels under the circles. First, we used vjust = 1.3 to justify the top of the text slightly below the y coordinate. Next, we wanted to set the y coordinate so that it is at the bottom of each circle. This requires a little wrangling and arithmetic: we first need to convert the levels of Hair into numeric values, which involves converting the variable from a character vector to a factor, and then converting the factor to a numeric variable. We then take the numeric value of Hair and subtract a small value from it, where the value depends on count. This requires taking the square root of count, since the radius of each circle is proportional to the square root of count. The number that this value is divided by (34 in this case) is found by trial and error; it depends on the particular data values, radius, text size, and output image size.
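The offset arithmetic can be tried on its own, outside of ggplot2. The following is a minimal base-R sketch under a few assumptions: it uses margin.table() in place of the dplyr pipeline to add up the counts, it keeps the same trial-and-error divisor of 34, and the column names y_num and y_label are hypothetical, introduced only for illustration.

```r
# Sum the built-in HairEyeColor table over Sex (dimensions 1 and 2 are Hair, Eye)
hair_eye <- margin.table(HairEyeColor, c(1, 2))
hec <- as.data.frame(hair_eye, responseName = "count", stringsAsFactors = FALSE)

# character -> factor -> numeric: factor levels are sorted alphabetically,
# so Black = 1, Blond = 2, Brown = 3, Red = 4
hec$y_num <- as.numeric(as.factor(hec$Hair))

# Shift each label down by an amount proportional to the circle's radius,
# which grows with the square root of count
hec$y_label <- hec$y_num - sqrt(hec$count) / 34

# Inspect a few rows, ordered by position on the y-axis
head(hec[order(hec$y_num), ], 4)
```

Larger counts give larger circles, so their labels are pushed further below the numeric y position; a count of zero would leave the label at the circle's center.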
To help find the correct y offset, we can add guide points in red and adjust the value until they line up with the bottom of each circle. Once we have the correct value, we can place the text and remove the points.
The text under the circles is in a shade of grey. This is so that it doesn’t jump out at the viewer and overwhelm the perceptual impact of the circles, but is still available if the viewer wants to know the exact values.