15.18 Summarizing Data with Standard Errors and Confidence Intervals

15.18.1 Problem

You want to summarize your data with the standard error of the mean and/or confidence intervals.

15.18.3 Discussion

The summarise() function computes the columns in order, so an expression can refer to columns created earlier in the same call. That’s why se can use the sd and n columns.
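A minimal sketch of that ordering, assuming the dplyr package and the cabbages data set from MASS. The data set and the grouping columns Cult and Date are assumptions; the text here only names the HeadWt column.

```r
library(MASS)   # assumed source of the cabbages data (it has a HeadWt column)
library(dplyr)

ca <- cabbages %>%
  group_by(Cult, Date) %>%          # assumed grouping columns
  summarise(
    Weight = mean(HeadWt),
    sd     = sd(HeadWt),
    n      = n(),
    se     = sd / sqrt(n),          # uses the sd and n columns defined just above
    .groups = "drop"
  )
ca
```

Moving the se line above sd or n would fail, because those columns would not exist yet at that point.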

The n() function returns the number of rows in each group, including rows where a column is NA. To leave NA values out of the count, you need a different technique. For example, to ignore any NAs in the HeadWt column, use sum(!is.na(HeadWt)).
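The difference between the two counts can be seen on a small example. This is a sketch on hypothetical toy data (the data frame d and its grp column are made up for illustration), again assuming dplyr:

```r
library(dplyr)

# Hypothetical toy data: one missing HeadWt value in group "a"
d <- data.frame(
  grp    = c("a", "a", "a", "b", "b"),
  HeadWt = c(2.5, NA, 3.1, 1.9, 2.2)
)

counts <- d %>%
  group_by(grp) %>%
  summarise(
    n_rows  = n(),                       # counts every row, including the NA
    n_valid = sum(!is.na(HeadWt)),       # counts only non-missing HeadWt values
    sd      = sd(HeadWt, na.rm = TRUE),
    se      = sd / sqrt(n_valid),        # standard error based on the valid count
    .groups = "drop"
  )
counts
```

Here group "a" has three rows but only two valid measurements, so n_rows and n_valid differ for that group.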

15.18.3.1 Confidence Intervals

Confidence intervals are calculated using the standard error of the mean and the degrees of freedom. To calculate a confidence interval, use the qt() function to get the quantile, then multiply that by the standard error; the interval extends that far on either side of the mean. The qt() function gives quantiles of the t-distribution when given a probability level and degrees of freedom. For a 95% confidence interval, use a probability level of .975; because the bell-shaped t-distribution is symmetric, this in effect cuts off 2.5% of the area under the curve at each end. The degrees of freedom equal the sample size minus one.
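As a quick illustration of what qt() returns, here is a sketch for a single group of 10 observations (the variable names are only for illustration):

```r
n <- 10                            # sample size
t_mult <- qt(0.975, df = n - 1)    # 97.5th percentile of the t-distribution, 9 df
t_mult
#> [1] 2.262157

# Check: the area in the upper tail beyond t_mult is 2.5%
upper_tail <- pt(t_mult, df = n - 1, lower.tail = FALSE)
upper_tail
```

By symmetry the same area lies below -t_mult, leaving 95% in the middle.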

This will calculate the multiplier for each group. There are six groups and each has the same number of observations (10), so they will all have the same multiplier:
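A minimal sketch of this step. The name ciMult is an assumption, and the group sizes are taken from the description above rather than from a real summary table:

```r
n <- rep(10, 6)                    # six groups, 10 observations each
ciMult <- qt(0.975, df = n - 1)    # vectorized: one multiplier per group
ciMult
#> [1] 2.262157 2.262157 2.262157 2.262157 2.262157 2.262157
```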

Now we can multiply that vector by the standard error to get the 95% confidence interval:
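A sketch of the multiplication, using a hypothetical summary table named ca. The se values below are illustrative placeholders, not results from any data set:

```r
# Hypothetical summary table: se values are illustrative only
ca <- data.frame(
  n  = rep(10, 6),
  se = c(0.30, 0.09, 0.31, 0.14, 0.25, 0.07)
)

ciMult <- qt(0.975, df = ca$n - 1)   # multiplier for each group
half_width <- ca$se * ciMult         # distance from the mean to each end of the interval
half_width
```

The product is the half-width of the interval: the lower and upper limits are the group mean minus and plus this value.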

This could be done in one line, like this:
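A sketch of the one-line form, again on a hypothetical summary table ca with illustrative se values; the column name ci95 is an assumption:

```r
# Hypothetical summary table: se values are illustrative only
ca <- data.frame(
  n  = rep(10, 6),
  se = c(0.30, 0.09, 0.31, 0.14, 0.25, 0.07)
)

# Compute the multiplier and the product in a single expression
ca$ci95 <- ca$se * qt(0.975, df = ca$n - 1)
ca
```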

For a 99% confidence interval, use .995.
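More generally, the probability level passed to qt() can be derived from the confidence level. A small sketch (conf_level and p are illustrative names):

```r
conf_level <- 0.99
p <- 1 - (1 - conf_level) / 2      # 0.995: half of the excluded 1% goes in each tail
t_mult_99 <- qt(p, df = 9)         # multiplier for a group of 10, roughly 3.25
t_mult_99
```

The 99% multiplier is larger than the 95% one, so the interval is wider.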

Error bars that represent the standard error of the mean and confidence intervals serve the same general purpose: to give the viewer an idea of how good the estimate of the population mean is. The standard error is the standard deviation of the sampling distribution of the mean. Confidence intervals are a little easier to interpret. Very roughly, a 95% confidence interval means that there’s a 95% chance that the true population mean is within the interval (actually, it doesn’t mean this at all, but this seemingly simple topic is way too complicated to cover here; if you want to know more, read up on Bayesian statistics).

This function will perform all the steps of calculating the standard deviation, count, standard error, and confidence intervals. It can also handle NAs and missing combinations, with the na.rm and .drop options. By default, it provides a 95% confidence interval, but this can be set with the conf.interval argument:

The following usage example has a 99% confidence interval and handles NAs and missing combinations:
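What follows is a minimal sketch of such a function together with a usage example, assuming dplyr 1.0 or later. The name summary_se_sketch, its body, and the toy data are assumptions made for illustration; only the na.rm, .drop, and conf.interval options come from the text.

```r
library(dplyr)

# Hypothetical helper: name and implementation are assumptions
summary_se_sketch <- function(data, measurevar, groupvars,
                              na.rm = FALSE, conf.interval = 0.95,
                              .drop = TRUE) {
  data %>%
    # .drop = FALSE keeps combinations of factor levels that have no rows
    group_by(across(all_of(groupvars)), .drop = .drop) %>%
    summarise(
      # Count only non-missing values when na.rm is TRUE
      n    = if (na.rm) sum(!is.na(.data[[measurevar]])) else n(),
      mean = mean(.data[[measurevar]], na.rm = na.rm),
      sd   = sd(.data[[measurevar]], na.rm = na.rm),
      se   = sd / sqrt(n),
      # e.g. conf.interval = 0.95 gives a probability level of 0.975;
      # empty groups have df = -1, so the NaN warning is suppressed
      ci   = se * suppressWarnings(qt(conf.interval / 2 + 0.5, df = n - 1)),
      .groups = "drop"
    )
}

# Hypothetical toy data: one NA, and no rows at all for date "d20"
d <- data.frame(
  cult   = factor(c("c39", "c39", "c39", "c52", "c52", "c52")),
  date   = factor(rep("d16", 6), levels = c("d16", "d20")),
  HeadWt = c(2.5, NA, 3.1, 1.9, 2.2, 2.0)
)

res <- summary_se_sketch(d, "HeadWt", c("cult", "date"),
                         conf.interval = 0.99, na.rm = TRUE, .drop = FALSE)
res
```

With .drop = FALSE the result has a row for every cult and date combination, and the empty ones show n equal to 0 with missing values for the statistics.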

15.18.4 See Also

See Recipe 7.7 to use the values calculated here to add error bars to a graph.