Chapter 5 Scatter Plots

Scatter plots display the relationship between two continuous variables. In a scatter plot, each observation in a data set is represented by a point. Often, a scatter plot will also have a line showing the values predicted by some statistical model. Adding this line is easy with R and the ggplot2 package, and it can help you make sense of the data when trends aren’t obvious from the points alone.
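As a rough illustration of the idea, the sketch below draws a scatter plot and overlays a fitted line. The choice of the built-in faithful data set and of a linear model (method = "lm") are assumptions made for this example only; any pair of continuous variables and any suitable model could be substituted.

```r
library(ggplot2)

# faithful is a data set built into R: eruption duration and
# waiting time between eruptions, both continuous (assumed example data)
p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +                                # one point per observation
  geom_smooth(method = "lm", formula = y ~ x)   # fitted linear model line with a confidence band

print(p)
```

The fitted line is added as a separate layer, so the points and the model can be adjusted independently.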

With large data sets, plotting every observation can result in overplotting, where points overlap and obscure one another. To deal with overplotting, you’ll probably want to summarize the data before displaying it. We’ll see how to do that in this chapter as well.
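One possible way to summarize a large data set, shown here only as a sketch, is to divide the plotting area into rectangular bins and map the number of points in each bin to a fill color. The diamonds data set (bundled with ggplot2, roughly 54,000 rows) and the value bins = 50 are assumptions chosen for illustration.

```r
library(ggplot2)

# diamonds ships with ggplot2; with this many rows, a plain
# geom_point() layer would overplot heavily
p_binned <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d(bins = 50)   # count observations per rectangular bin and map the count to fill

print(p_binned)
```

Binning trades individual points for a picture of density, which is usually what matters once points start to obscure one another.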