# 6 Descriptive Statistics

R provides many functions for descriptive statistics. The subsections below cover a selection of them.

As in Microsoft Excel, we can apply measures of central tendency and spread to a variable.

```r
k <- c(1, 5, 7, 9)
mean(k)
```
```
## [1] 5.5
```
```r
# Use the $ operator for columns in a dataset
mean(mtcars$mpg)
```
```
## [1] 20.09062
```
```r
sd(mtcars$mpg)
```
```
## [1] 6.026948
```

If you want to apply multiple functions to a single variable, the `with()` function can be useful: it evaluates an expression with the columns of the given dataset available as variables, so you do not have to use the `$` operator repeatedly.

```r
with(mtcars, c(mean = mean(mpg), median = median(mpg), sd = sd(mpg)))
```
```
##      mean    median        sd
## 20.090625 19.200000  6.026948
```
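As a small check on the point about `with()`, the sketch below compares the `$` form with the `with()` form and then passes a braced, multi-statement expression. The names `m_dollar`, `m_with`, `centered`, and `mpg_stats` are made up for this illustration, and the standard deviation is recomputed by hand only to show what `sd()` returns.

```r
# Both calls compute the same mean; with() only changes where `mpg` is looked up.
m_dollar <- mean(mtcars$mpg)
m_with   <- with(mtcars, mean(mpg))
identical(m_dollar, m_with)

# with() also accepts a braced expression containing several statements.
mpg_stats <- with(mtcars, {
  centered <- mpg - mean(mpg)  # deviations from the mean
  c(mean = mean(mpg),
    sd   = sqrt(sum(centered^2) / (length(mpg) - 1)))  # sample SD (n - 1 denominator)
})
mpg_stats
```

The variable `centered` exists only while the expression is evaluated; it is not added to `mtcars` or to the workspace.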

## 6.2 Minimum and Maximum

To compute the minimum and maximum of a variable, we can use the `min()` and `max()` functions respectively.

```r
x <- 1:10 # 1 through 10.

min(x)
```
```
## [1] 1
```
```r
max(x)
```
```
## [1] 10
```
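If both extremes are needed at once, base R's `range()` returns them as a two-element vector. The following is a brief sketch; the variable names `r` and `mpg_extremes` are chosen here for illustration only.

```r
x <- 1:10

r <- range(x)   # same values as c(min(x), max(x))
r
diff(r)         # distance between the minimum and the maximum

# The same functions work on a column of a dataset.
mpg_extremes <- c(min = min(mtcars$mpg), max = max(mtcars$mpg))
mpg_extremes
```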

## 6.3 Data Dimensions

To find the dimensions of an object in R, we can use `nrow()`/`NROW()` for the number of rows, `ncol()`/`NCOL()` for the number of columns, and `dim()` for both at once.

```r
NROW(mtcars)
```
```
## [1] 32
```
```r
NCOL(mtcars)
```
```
## [1] 11
```
```r
dim(mtcars)
```
```
## [1] 32 11
```
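The lowercase and uppercase variants give the same answer for a data frame, but they differ for a plain vector: `nrow()` and `ncol()` return `NULL` because a vector has no `dim` attribute, whereas `NROW()` and `NCOL()` treat it as a one-column matrix. A minimal sketch of that difference (the vector `v` is just an example):

```r
v <- 1:10

nrow(v)   # NULL: a plain vector has no dimensions
NROW(v)   # 10: the vector is treated as a single column
NCOL(v)   # 1

# For a data frame, both variants agree with dim().
c(nrow(mtcars), ncol(mtcars))
dim(mtcars)
```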

## 6.4 Data Summary

We can call `summary()` on an object to obtain summary information about it. This function is a useful follow-up to `str()`, as together they give you a sense of what your dataset is like.

```r
# Preview the dataset
str(iris)
```
```
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
```
```r
# Summarize the dataset.
summary(iris)
```
```
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
```

Note that because `Species` is a factor variable, we obtain counts by category for that column instead of quantiles and means like the others.
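The same distinction can be seen by calling `summary()` on individual columns. This is a short sketch; `species_counts` and `sepal_summary` are names introduced here for illustration.

```r
# A factor column is summarized as counts per level.
species_counts <- summary(iris$Species)
species_counts

# A numeric column is summarized with the minimum, quartiles, mean, and maximum.
sepal_summary <- summary(iris$Sepal.Length)
names(sepal_summary)
```

Because both results are named vectors, individual entries can be extracted by name, for example `species_counts["setosa"]`.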

## 6.5 Frequency Tables

To get counts by group, we can use the `table()` function; applying `prop.table()` to the result of `table()` produces proportions. The input of `table()` can be one or more columns, and the output is an object of class `table`.

### 6.5.1 Single-variable Case

For the single-variable case, we can simply input our desired column into the `table()` function.

```r
my_table <- table(iris$Species)

my_table
```
```
##
##     setosa versicolor  virginica
##         50         50         50
```

Additionally, we can apply `prop.table()` on our `my_table` object to obtain proportions.

```r
prop.table(my_table)
```
```
##
##     setosa versicolor  virginica
##  0.3333333  0.3333333  0.3333333
```
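Proportions from a one-way table sum to 1, and multiplying by 100 turns them into percentages. The sketch below illustrates both; rounding to one decimal place is an arbitrary choice made for this example, and `species_prop` and `species_pct` are illustrative names.

```r
species_prop <- prop.table(table(iris$Species))

sum(species_prop)   # the proportions of a one-way table add up to 1

# Convert proportions to percentages, rounded to one decimal place.
species_pct <- round(100 * species_prop, 1)
species_pct
```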

### 6.5.2 Multi-variable Case

For the case of multiple variables, we simply input the desired columns from a dataset.

```r
my_table2 <- with(mtcars, table(am, gear))

my_table2
```
```
##    gear
## am   3  4  5
##   0 15  4  0
##   1  0  8  5
```
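For a two-way table, `prop.table()` divides by the grand total unless its `margin` argument is supplied: `margin = 1` gives proportions within each row, and `margin = 2` within each column. A minimal sketch, using an `am_gear` table built the same way as above (the object names are illustrative):

```r
am_gear <- with(mtcars, table(am, gear))

overall  <- prop.table(am_gear)              # cells divided by the grand total
row_prop <- prop.table(am_gear, margin = 1)  # each row (am level) sums to 1
col_prop <- prop.table(am_gear, margin = 2)  # each column (gear level) sums to 1

rowSums(row_prop)
colSums(col_prop)
```

Which margin to use depends on the question: row proportions here describe the distribution of gears within each transmission type, while column proportions describe the transmission types within each gear count.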

When you input three or more variables, R returns a multi-dimensional table and prints it as a series of two-way tables, one for each level of the additional variable(s) (note that the class is still `table`).

```r
my_table3 <- with(mtcars, table(am, gear, cyl))

my_table3
```
```
## , , cyl = 4
##
##    gear
## am   3  4  5
##   0  1  2  0
##   1  0  6  2
##
## , , cyl = 6
##
##    gear
## am   3  4  5
##   0  2  2  0
##   1  0  2  1
##
## , , cyl = 8
##
##    gear
## am   3  4  5
##   0 12  0  0
##   1  0  0  2
```
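A three-variable table is stored as a three-dimensional array, so it can be indexed like one. The sketch below rebuilds the table under the illustrative name `cyl_table` and pulls out one slice, one cell, and the totals per `cyl` level.

```r
cyl_table <- with(mtcars, table(am, gear, cyl))

dim(cyl_table)             # 2 levels of am, 3 of gear, 3 of cyl

# Index by level name: the two-way slice for 4-cylinder cars.
cyl_table[, , "4"]

# A single cell: manual transmission (am = 1), 4 gears, 4 cylinders.
cyl_table["1", "4", "4"]

# Collapse over am and gear to get counts per cyl level.
cyl_totals <- margin.table(cyl_table, 3)
cyl_totals
```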

### 6.5.3 Converting to a Data Frame

If we apply the `as.data.frame()` function to an object of class `table`, the output has one column for each grouping variable and one column (`Freq`) for the frequency. This structure is useful because it can be written directly to a CSV file, for example.

```r
freq <- table(iris$Species)
prop <- prop.table(freq)

as.data.frame(freq)
```
```
##         Var1 Freq
## 1     setosa   50
## 2 versicolor   50
## 3  virginica   50
```
```r
as.data.frame(prop)
```
```
##         Var1      Freq
## 1     setosa 0.3333333
## 2 versicolor 0.3333333
## 3  virginica 0.3333333
```
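```r
my_table_df <- merge(as.data.frame(freq), as.data.frame(prop), by = 'Var1')

# After the merge, the columns are Var1, Freq.x (counts), and Freq.y (proportions).
# The last column holds proportions (0 to 1), so it is labeled accordingly.
names(my_table_df) <- c('Species', 'Frequency', 'Proportion')

my_table_df
```
```
##      Species Frequency Proportion
## 1     setosa        50  0.3333333
## 2 versicolor        50  0.3333333
## 3  virginica        50  0.3333333
```
```r
# row.names = FALSE keeps the row numbers out of the CSV file
write.csv(my_table_df, 'my_example_table.csv', row.names = FALSE)
```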
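The same conversion works for tables with more than one variable: each grouping variable becomes its own column, and every combination of levels gets a row, including combinations with a count of zero. A brief sketch using the `am` by `gear` table (the name `am_gear_df` is illustrative):

```r
am_gear_df <- as.data.frame(with(mtcars, table(am, gear)))

names(am_gear_df)   # one column per grouping variable, plus Freq
nrow(am_gear_df)    # 2 levels of am x 3 levels of gear = 6 rows

am_gear_df
```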

## 6.6 Summary

Table 6.1: Summary of Descriptive Statistics Functions

| Function | Description | Example |
|---|---|---|
| `mean(x)` | Computes the mean. | `mean(mtcars$mpg)` |
| `sd(x)` | Computes the standard deviation. | `sd(mtcars$mpg)` |
| `median(x)` | Computes the median. | `median(mtcars$mpg)` |
| `min(x)` | Computes the minimum. | `min(mtcars$mpg)` |
| `max(x)` | Computes the maximum. | `max(mtcars$mpg)` |
| `nrow(x)`/`NROW(x)` | Computes the number of rows. | `nrow(mtcars)`; `NROW(mtcars)` |
| `ncol(x)`/`NCOL(x)` | Computes the number of columns. | `ncol(mtcars)`; `NCOL(mtcars)` |
| `dim(x)` | Computes the number of rows and columns. | `dim(mtcars)` |
| `length(x)` | Computes the number of elements in a data object. | `length(mtcars$mpg)` |
| `summary(x)` | Summarizes a dataset. | `summary(mtcars)` |
| `table(x)` | Generates a frequency table for one or more variables. | `table(mtcars$gear)`; `with(mtcars, table(gear, am))` |
| `prop.table(table)` | Generates a proportions table. | `prop.table(table(mtcars$gear))` |