## 15.2 Getting Information About a Data Structure

### 15.2.1 Problem

You want to find out information about an object or data structure.

### 15.2.2 Solution

Use the `str()` function:

```r
str(ToothGrowth)
#> 'data.frame':    60 obs. of  3 variables:
#>  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
#>  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
```

This tells us that `ToothGrowth` is a data frame with three columns, `len`, `supp`, and `dose`. `len` and `dose` contain numeric values, while `supp` is a factor with two levels.

Another useful function is `summary()`:

```r
summary(ToothGrowth)
#>       len        supp         dose
#>  Min.   : 4.20   OJ:30   Min.   :0.500
#>  1st Qu.:13.07   VC:30   1st Qu.:0.500
#>  Median :19.25           Median :1.000
#>  Mean   :18.81           Mean   :1.167
#>  3rd Qu.:25.27           3rd Qu.:2.000
#>  Max.   :33.90           Max.   :2.000
```

Instead of showing the first few values of each column as `str()` does, `summary()` provides basic descriptive statistics for numeric columns (the minimum, maximum, median, mean, and first and third quartiles) and, for factor columns, the number of values at each level. For a character column it reports only the length, class, and mode rather than per-value counts.
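When a column is stored as character, a per-value count can still be obtained with `table()`. The following is a minimal sketch using a character copy of `supp`; the names `tg_chr` and `counts` are introduced here only for illustration.

```r
# Make a copy of ToothGrowth whose supp column is character, not factor
# (tg_chr is a hypothetical name used only for this example)
tg_chr <- ToothGrowth
tg_chr$supp <- as.character(tg_chr$supp)

# Count how many rows have each distinct value of supp
counts <- table(tg_chr$supp)
counts[["OJ"]]
#> [1] 30
counts[["VC"]]
#> [1] 30
```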

### 15.2.3 Discussion

The `str()` function is very useful for finding out more about data structures. One common source of problems is a data frame where one of the columns is a character vector instead of a factor, or vice versa. This can cause puzzling issues with analyses or graphs.
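As a small illustration of how the two types behave differently, the sketch below compares a factor with a character copy of the same values. The names `supp_fct`, `supp_chr`, and `supp_rev` are assumptions made for this example. A factor carries an explicit, reorderable set of levels, while a character vector has none, which is one reason the same data can group or sort differently in an analysis or graph.

```r
supp_fct <- ToothGrowth$supp          # factor column
supp_chr <- as.character(supp_fct)    # same values stored as character

# A factor has levels; a character vector does not
levels(supp_fct)
#> [1] "OJ" "VC"
levels(supp_chr)
#> NULL

# The level order of a factor can be changed without changing the values
supp_rev <- factor(supp_fct, levels = c("VC", "OJ"))
levels(supp_rev)
#> [1] "VC" "OJ"

# The distinct values of a character vector simply sort alphabetically
sort(unique(supp_chr))
#> [1] "OJ" "VC"
```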

When you print out a data frame the normal way, by just typing the name at the prompt and pressing Enter, factor and character columns appear exactly the same. The difference will be revealed only when you run `str()` on the data frame, or print out the column by itself:

```r
tg <- ToothGrowth
tg$supp <- as.character(tg$supp)
str(tg)
#> 'data.frame':    60 obs. of  3 variables:
#>  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
#>  $ supp: chr  "VC" "VC" "VC" "VC" ...
#>  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
```

```r
# Print out the columns by themselves
# From old data frame (factor)
ToothGrowth$supp
#>  [1] VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC VC
#> [25] VC VC VC VC VC VC OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ
#> [49] OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ OJ
#> Levels: OJ VC
# From new data frame (character)
tg$supp
#>  [1] "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC"
#> [15] "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC" "VC"
#> [29] "VC" "VC" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ"
#> [43] "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ" "OJ"
#> [57] "OJ" "OJ" "OJ" "OJ"
```
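
The column type can also be checked in code rather than by reading printed output. The following is a rough sketch, not the only way to do this: it collects the class of every column with `sapply()`, tests one column with `is.factor()`, and converts the character column back with `factor()`. The name `col_classes` is introduced only for this example.

```r
tg <- ToothGrowth
tg$supp <- as.character(tg$supp)

# Class of every column, as a named character vector
col_classes <- sapply(tg, class)
col_classes[["supp"]]
#> [1] "character"

is.factor(tg$supp)
#> [1] FALSE

# Convert the character column back to a factor;
# by default the levels are the sorted unique values
tg$supp <- factor(tg$supp)
is.factor(tg$supp)
#> [1] TRUE
levels(tg$supp)
#> [1] "OJ" "VC"
```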