String concatenation and splitting

String concatenation

When working with text data you often need to join pieces of text together or split them apart. The paste() function handles joining and the strsplit() function handles splitting.

Joining string variables together is called string concatenation. In the example below we concatenate the string variables x and y by passing them as arguments to paste().


Input:
x <- 'one string'
y <- 'two strings'
con <- paste(x,y)
print(con)

Output:
[1] "one string two strings"

In the example above the two strings are joined with a single space, which is the default separator. To control what is placed between them, pass the sep argument to paste() with the separator string you want. If you do not want anything between the strings, use sep = ""; otherwise a space is added by default.


Input:
conSep <- paste(x,y,sep=" - ")
print(conSep)

Output:
[1] "one string - two strings"
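
As a minimal sketch of the empty-separator case described above, the example below joins x and y with nothing between them. The variable names conNoSep and conNoSep0 are illustrative only. paste0() is a base R function that behaves like paste() with sep = "".

```r
x <- 'one string'
y <- 'two strings'

# An empty separator joins the strings with nothing in between
conNoSep <- paste(x, y, sep = "")
print(conNoSep)
# [1] "one stringtwo strings"

# paste0() is shorthand for paste(..., sep = "")
conNoSep0 <- paste0(x, y)
print(identical(conNoSep, conNoSep0))
# [1] TRUE
```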

String splitting

To split a string variable we use the strsplit() function. This function requires at least two arguments: the first is the string variable to be split and the second is the character(s) on which to split the string.

In the example below we split the string variable con using whitespace as the split point. The result is returned as a list containing a character vector of the pieces (see the lists section for more information on handling list format data).


Input:
print(con)
splt <- strsplit(con, " ")
print(splt)

Output:
[1] "one string two strings"
[[1]]
[1] "one"     "string"  "two"     "strings"
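
Because the result is a list, the individual words are not directly available until the character vector is pulled out of it. The sketch below shows two common ways to do that. The variable name words is illustrative only.

```r
con <- "one string two strings"
splt <- strsplit(con, " ")

# [[1]] extracts the character vector stored in the first list element
words <- splt[[1]]
print(words[2])
# [1] "string"

# unlist() flattens the whole list into a single character vector
print(length(unlist(splt)))
# [1] 4
```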

Splitting con on whitespace returned every word individually. In the example below we instead split the conSep variable on " - ", which appears only once in the string, so the split returns the original two pieces of text.


Input:
print(conSep)
spltSep <- strsplit(conSep, " - ")
print(spltSep)

Output:
[1] "one string - two strings"
[[1]]
[1] "one string"  "two strings"
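
One caveat when choosing a split point: by default strsplit() interprets its second argument as a regular expression, so characters such as "." or "|" have a special meaning. The sketch below, which uses a hypothetical string z, shows the fixed = TRUE argument that makes strsplit() treat the split point as literal text.

```r
z <- "one.two.three"   # hypothetical example string

# As a regular expression "." matches any character, so this call
# does not split on the literal full stops
regexSplit <- strsplit(z, ".")

# fixed = TRUE treats "." as literal text
parts <- strsplit(z, ".", fixed = TRUE)
print(parts)
# [[1]]
# [1] "one"   "two"   "three"
```

The split points used earlier (" " and " - ") contain no special characters, so they behave the same with or without fixed = TRUE.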