# 4 String Functions

String functions allow us to combine, pattern-match, and substitute character vectors. These functions are useful for detecting and recoding specific values.

## 4.1 Concatenate Strings

There are two concatenation functions we can use: `paste()` and `paste0()`. `paste()` separates the concatenated elements with a space by default (changeable through the `sep` argument), whereas `paste0()` uses no separator.

```r
paste('a', 'b')
## [1] "a b"
paste('a', 'b', sep = '-')
## [1] "a-b"
paste0('a', 'b')
## [1] "ab"
```
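Both functions are vectorised, and both accept a `collapse` argument that joins the resulting elements into a single string. The sketch below illustrates this; the vectors `first` and `second` are made-up example values, not data used elsewhere in this chapter.

```r
# Hypothetical example values
first  <- c('Mazda', 'Merc')
second <- c('RX4', '230')

# Element-wise concatenation: one result per pair of elements
pairs <- paste(first, second)        # "Mazda RX4" "Merc 230"

# collapse joins all elements of the result into one string
joined <- paste(c('a', 'b', 'c'), collapse = '-')   # "a-b-c"

# paste0() accepts collapse as well
formula_rhs <- paste0('x', 1:3, collapse = '+')     # "x1+x2+x3"
```

In short, `sep` goes between the arguments of a single call, while `collapse` goes between the elements of the result.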

## 4.2 Subset Strings

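In Excel, we can subset strings with `LEFT()`, `MID()`, and `RIGHT()`. In R, we can subset strings with `substr()` and `substring()`, both of which work much like Excel's `MID()`. One difference: `MID()` takes a start position and a number of characters, whereas `substr()` takes a start and a stop position.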

```r
x <- 'Albatross'

substr(x, 1, 4)
## [1] "Alba"
substring(x, 5) # Goes to the end by default
## [1] "tross"
```
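Base R has no direct counterparts to `LEFT()` and `RIGHT()`, but they can be approximated with `substr()` and `nchar()`. The following is a minimal sketch; the variable names (`left_4`, `right_5`, `mid_3_4`) are purely illustrative.

```r
x <- 'Albatross'
n <- nchar(x)   # number of characters in x (9)

# LEFT(x, 4): the first four characters
left_4 <- substr(x, 1, 4)            # "Alba"

# RIGHT(x, 5): the last five characters
right_5 <- substr(x, n - 5 + 1, n)   # "tross"

# MID(x, 3, 4): four characters starting at position 3.
# substr() needs a stop position, so stop = start + length - 1.
mid_3_4 <- substr(x, 3, 3 + 4 - 1)   # "batr"
```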

## 4.3 Split Strings

We can split strings with the `strsplit()` function. The output is a list, where each list element is a character vector.

```r
x <- c('This is a sentence.',
       'This is another sentence.',
       'This is yet another sentence.')

x
## [1] "This is a sentence."           "This is another sentence."
## [3] "This is yet another sentence."

# Split vector elements by space
my_split <- strsplit(x, split = ' ')

# Output is a list
my_split
## [[1]]
## [1] "This"      "is"        "a"         "sentence."
##
## [[2]]
## [1] "This"      "is"        "another"   "sentence."
##
## [[3]]
## [1] "This"      "is"        "yet"       "another"   "sentence."
```

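We can use `do.call()` and `c()` to combine these list elements into a single vector of 13 elements. The function `do.call()` calls a function once, passing the elements of a list as its arguments, and `c()` ("combine") combines its arguments into a vector. So `do.call(c, my_split)` is equivalent to `c(my_split[[1]], my_split[[2]], my_split[[3]])`.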

```r
do.call(c, my_split)
##  [1] "This"      "is"        "a"         "sentence." "This"      "is"
##  [7] "another"   "sentence." "This"      "is"        "yet"       "another"
## [13] "sentence."
```
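As an alternative sketch, `unlist()` flattens the same list in one step, and `lengths()` reports how many pieces each element produced. Note also that `split` is treated as a regular expression by default, so a character such as `.` (which matches any character in a regular expression) needs `fixed = TRUE` to be taken literally. The string `'a.b.c'` below is a made-up example.

```r
x <- c('This is a sentence.',
       'This is another sentence.',
       'This is yet another sentence.')
my_split <- strsplit(x, split = ' ')

# Flatten the list into a single character vector
words <- unlist(my_split)
length(words)                               # 13

# Same result as the do.call() approach
same <- identical(words, do.call(c, my_split))   # TRUE

# Number of words in each original sentence
n_words <- lengths(my_split)                # 4 4 5

# Split on a literal period rather than the regex "any character"
parts <- strsplit('a.b.c', split = '.', fixed = TRUE)[[1]]   # "a" "b" "c"
```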

## 4.4 Substitute Strings

We can make character substitutions with `gsub()`, which replaces every match of a pattern in each element of a character vector.

```r
x <- c('This is a sentence.',
       'This is another sentence.',
       'This is yet another sentence.')

gsub('sentence', 'drink', x)
## [1] "This is a drink."           "This is another drink."
## [3] "This is yet another drink."
```
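A related function, `sub()`, replaces only the first match in each element, which matters when a pattern occurs more than once. The sketch below contrasts the two and shows `fixed = TRUE` for literal (non-regex) patterns; the strings used are made-up examples.

```r
# Hypothetical string in which the pattern occurs twice
y <- 'a sentence about a sentence'

# sub() replaces only the first match
first_only <- sub('sentence', 'drink', y)    # "a drink about a sentence"

# gsub() replaces every match
all_matches <- gsub('sentence', 'drink', y)  # "a drink about a drink"

# fixed = TRUE treats the pattern as literal text, so '.' means a period
literal <- gsub('.', '!', 'End. Stop.', fixed = TRUE)   # "End! Stop!"

# ignore.case = TRUE matches regardless of letter case
no_case <- gsub('SENTENCE', 'drink', y, ignore.case = TRUE)
```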

## 4.5 Match String Patterns

We can pattern-match strings with `grep()` and `grepl()`. `grep()` returns the positions of the matching elements (or the elements themselves when `value = TRUE`), while `grepl()` returns a logical vector (`TRUE`/`FALSE`) with one entry per element.

```r
# Cars that start with "M"
grep('^M', rownames(mtcars), value = TRUE)
##  [1] "Mazda RX4"     "Mazda RX4 Wag" "Merc 240D"     "Merc 230"
##  [5] "Merc 280"      "Merc 280C"     "Merc 450SE"    "Merc 450SL"
##  [9] "Merc 450SLC"   "Maserati Bora"

# Which cars start with and do not start with "M"?
grepl('^M', rownames(mtcars))
##  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

# Selecting columns that start with "m".
# We set drop = FALSE to maintain a data frame.
head(mtcars[, grep('^m', names(mtcars)), drop = FALSE])
##                    mpg
## Mazda RX4         21.0
## Mazda RX4 Wag     21.0
## Datsun 710        22.8
## Hornet 4 Drive    21.4
## Hornet Sportabout 18.7
## Valiant           18.1
```
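Because `grepl()` returns one logical value per element, it combines naturally with `ifelse()` for recoding and with `sum()` for counting. The following is a minimal sketch using the built-in `mtcars` data; the labels `'Mercedes'` and `'Other'` and the variable name `make` are illustrative choices, not part of the data set.

```r
cars <- rownames(mtcars)

# Without value = TRUE, grep() returns the positions of the matches
m_positions <- grep('^M', cars)      # 1 2 8 9 10 11 12 13 14 31

# Summing a logical vector counts the matches
n_m <- sum(grepl('^M', cars))        # 10

# Recode: label cars whose name starts with "Merc", everything else "Other"
make <- ifelse(grepl('^Merc', cars), 'Mercedes', 'Other')
table(make)

# ignore.case = TRUE makes the pattern case-insensitive
m_any_case <- grep('^m', cars, ignore.case = TRUE, value = TRUE)
```

Here `grep()` is convenient for indexing, while `grepl()` is the better fit whenever a result aligned with the original vector is needed, as in the recoding step.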

Check out more regular expressions with RStudio’s cheat sheet on strings.

## 4.6 Summary

Table 4.1: Summary of String Functions

| Function | Description | Example |
|---|---|---|
| `paste(x, y)` / `paste0(x, y)` | Concatenation of `x` and `y`. | `paste('a', 'b')`; `paste0('a', 'b')` |
| `substr(x, start, stop)` | Subset strings. | `substr('Albatross', 1, 4)` |
| `strsplit(x, split = ' ')` | Split a string by a splitting character. | `x <- c('This is a sentence.', 'This is another sentence.', 'This is yet another sentence.')`; `strsplit(x, split = ' ')` |
| `gsub(pattern, replacement, x)` | Substitute a portion of a string vector based on a given pattern. | `gsub('sentence', 'drink', 'This is a sentence.')` |
| `grep(pattern, x)` / `grepl(pattern, x)` | Pattern match a string and output its position OR a Boolean (i.e. `TRUE`/`FALSE`). | `grep('^M', rownames(mtcars), value = TRUE)` |