The apply family of functions exemplifies the vectorised style of processing that is characteristic of R as a statistical programming language. These functions are normally used when aggregating, transforming or iterating over data to produce new data structures as output. A brief summary of the main apply functions is given below, followed by examples of their use.
- apply() – applies a function over the rows or columns of a matrix or array
- lapply() – applies a function to each element of a vector, list or dataframe and returns a list
- sapply() – works like lapply() but tries to return the output in the simplest form (a vector or matrix where possible)
- vapply() – works like sapply() but returns the output in a form (type and length) that you specify
- mapply() – applies a function to the elements of several vectors or lists in parallel, returning the output in the simplest form
- rapply() – recursively applies a function to a list and any sub-lists and can return a list with the original list structure
- tapply() – applies a function to groups of values defined by a factor and returns an array with one result per group
apply
In the example below, we start by creating a matrix of data. In the apply() function we then enter the data over which the function is to work (x), the direction, rows (1) or columns (2), and finally the function that we want to be applied. The function is passed by name rather than called, so mean is written without brackets. Here the direction is 2, so y holds the mean of each column.
Input:
x <- matrix(rnorm(25),nrow=5,ncol=5)
x
Output:
-1.1530299 -0.43964399 0.6461054 -0.09678306 1.1527042
-2.2820481 0.03732001 -0.5070067 -2.12237654 -1.1424927
-1.1191257 0.71857743 -1.4357254 0.25865349 -0.2710555
-0.3153540 0.01545875 0.4006902 0.11588793 -1.0182689
0.7155965 -1.28879140 2.0635905 0.63958908 -1.1722304
Input:
y <- apply(x,2,mean)
y
Output:
-0.830792250214323 -0.191415841123202 0.233530803116317 -0.241005819826004 -0.49026865876211
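To make the difference between the two directions easier to check by hand, here is a minimal sketch using a small fixed matrix instead of random numbers. The names m, row_means and col_means are chosen only for this illustration.

```r
# Small deterministic matrix so the results can be verified by hand
m <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

row_means <- apply(m, 1, mean)  # direction 1: one result per row
col_means <- apply(m, 2, mean)  # direction 2: one result per column

row_means  # 3 4
col_means  # 1.5 3.5 5.5
```

For simple means, the built-in rowMeans() and colMeans() give the same values; apply() is the more general tool because it accepts any function.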
lapply
The function lapply() works on a list object so we first create a list of three matrices. Four arguments are entered into the lapply function: 1) the name of the list object, 2) a single opening square bracket in quotes, "[", which is referred to as the selection operator, 3) no value for the row to select (note the two commas) and 4) the column number, which in this instance is 2. The function then extracts the second column from each of the three matrices and outputs a list that contains the three columns as vectors.
Input:
a <- matrix(rnorm(25),nrow=5,ncol=5)
b <- matrix(rnorm(25),nrow=5,ncol=5)
c <- matrix(rnorm(25),nrow=5,ncol=5)
matList <- list(a,b,c)
matList
Output:
1.0044540 -0.09192938 1.5451326 1.0815440 0.1923556
-0.3924538 -0.86621777 -0.5419108 -0.0111666 1.3141072
-0.9955435 -0.06541948 0.0456705 0.9710961 -0.4519248
-1.8986475 0.29051101 -0.3281682 0.1850529 0.6012198
-0.8635181 0.45342276 -2.1492485 1.0034427 -0.2542290
-0.8548116 -0.8276839 -0.5481886 -1.3143125 1.04299454
2.1171626 0.9819266 1.7733759 -0.7288635 0.06230784
0.2109100 -0.4840691 -1.5370679 -0.4075649 -1.52659144
0.4285750 -0.1722746 -1.7334228 -0.6831702 -0.82712265
-0.2541561 0.1224135 -0.5868828 -0.5530220 1.54618395
2.3337205 0.3251544 -0.9497848 0.7063923 0.5205705
-1.1468535 0.4165905 0.7705145 0.7704186 -0.6130642
-0.2478028 -0.4005098 0.5429765 0.7738654 -1.9499126
0.4060105 -1.2381091 -0.7615761 -0.4143838 0.3857224
-0.8456466 -0.1017032 1.4971350 -1.0950214 -0.3679984
Input:
y <- lapply(matList,"[",,2)
y
Output:
-0.0919293816833333 -0.866217768212912 -0.0654194795448853 0.290511005954111 0.453422757495705
-0.827683931917062 0.981926551091218 -0.484069126880197 -0.172274626713826 0.122413479190549
0.325154403450905 0.416590455174112 -0.400509792017686 -1.23810908046145 -0.101703244347273
Input:
y[1]
Output:
-0.0919293816833333 -0.866217768212912 -0.0654194795448853 0.290511005954111 0.453422757495705
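The quoted "[" can look odd at first, so the sketch below shows that it is equivalent to an ordinary anonymous function, and also how single and double brackets differ when reading the result. The list mats and the other variable names are made up for this illustration.

```r
mats <- list(matrix(1:4, nrow = 2), matrix(5:8, nrow = 2))

# "[" is an ordinary function: m[, 2] is the same call as `[`(m, , 2)
second_cols <- lapply(mats, "[", , 2)
same_result <- lapply(mats, function(m) m[, 2])
identical(second_cols, same_result)  # TRUE

# Single brackets keep the list wrapper; double brackets extract the element
class(second_cols[1])    # "list"
class(second_cols[[1]])  # "integer"
second_cols[[1]]         # 3 4
```

So y[1] in the example above is still a list of length one, while y[[1]] would give the numeric vector itself.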
sapply
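The sapply() function works in much the same way as lapply() except that it tries to return the output in the simplest form. In the example below, y is returned as a vector and z is returned as a matrix with one column per list element (each 5 x 5 matrix is flattened into a column of 25 values).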
Input:
y <- sapply(matList,"[",1,2)
y
Output:
-0.0919293816833333 -0.827683931917062 0.325154403450905
Input:
z <- sapply(matList, function(x) x+5)
z
Output:
6.004454 4.145188 7.333721
4.607546 7.117163 3.853147
4.004457 5.210910 4.752197
3.101352 5.428575 5.406010
4.136482 4.745844 4.154353
4.908071 4.172316 5.325154
4.133782 5.981927 5.416590
4.934581 4.515931 4.599490
5.290511 4.827725 3.761891
5.453423 5.122413 4.898297
6.545133 4.451811 4.050215
4.458089 6.773376 5.770514
5.045670 3.462932 5.542977
4.671832 3.266577 4.238424
2.850752 4.413117 6.497135
6.081544 3.685687 5.706392
4.988833 4.271137 5.770419
5.971096 4.592435 5.773865
5.185053 4.316830 4.585616
6.003443 4.446978 3.904979
5.192356 6.042995 5.520570
6.314107 5.062308 4.386936
4.548075 3.473409 3.050087
5.601220 4.172877 5.385722
4.745771 6.546184 4.632002
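What "simplest form" means depends on the length of each individual result. The following sketch, using a small made-up list called nums, shows the three outcomes you are likely to meet.

```r
nums <- list(a = 1:3, b = 4:6)

# Every result has length 1 -> named vector
totals <- sapply(nums, sum)
totals  # a = 6, b = 15

# Every result has the same length (> 1) -> matrix, one column per element
ranges <- sapply(nums, range)
dim(ranges)  # 2 2

# Results of different lengths cannot be simplified -> plain list
kept <- sapply(nums, function(v) v[v > 2])
class(kept)  # "list"
```

Because the output type changes with the data, sapply() is convenient interactively but can surprise you inside scripts and functions.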
vapply
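The vapply() function works on vectors and lists and returns the output in a specified form, which in this instance is numeric(1): a single numeric value for each element. Here it is applied to the vector y from the sapply example. Because vapply() raises an error if a result does not match the specified form, it is a safer choice than sapply() when the shape of the output matters, for example with complex user-defined functions.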
Input:
z <- vapply(y, function(x) x+5, numeric(1))
z
Output:
4.90807061831667 4.17231606808294 5.32515440345091
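The sketch below illustrates what the third argument does, including what happens when the function does not return what was promised. The vector vals and the message string are invented for this example.

```r
vals <- c(1.5, 2.5, 3.5)

# numeric(1) promises exactly one numeric value per element
shifted <- vapply(vals, function(v) v + 5, numeric(1))
shifted  # 6.5 7.5 8.5

# A function that returns text breaks that promise, so vapply() stops
# with an error; tryCatch() is used here only to capture it
result <- tryCatch(
  vapply(vals, function(v) as.character(v), numeric(1)),
  error = function(e) "type mismatch caught"
)
result  # "type mismatch caught"

# Longer results are declared the same way: two numbers per element
rng <- vapply(list(c(1, 3), c(4, 6)), range, numeric(2))
dim(rng)  # 2 2
```

With sapply() the second call would have quietly returned a character vector instead of raising an error.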
mapply
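In the example below the mapply() function iterates over every element of the matrix, column by column, and applies the function to each one, returning a vector of 25 values. Note the different order of the input arguments: the function comes first and the data second. mapply() is most useful when several vectors or lists are supplied, in which case the function is applied to their first elements, then their second elements, and so on.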
Input:
a <- matrix(rnorm(25),nrow=5,ncol=5)
a
Output:
0.3669917 1.8954046 -0.49711160 -0.8095770 0.8790076
0.3357009 0.1163962 0.05962936 0.8998756 -0.7069533
0.9847834 -0.2352052 1.78994883 0.4930750 -1.0219635
-0.8288212 -1.1853601 0.33572082 -0.2599212 -0.2311300
-0.7365851 -1.1367864 -0.63415754 1.0041917 0.2427762
Input:
x <- mapply(function(x) x*2, a)
x
Output:
0.733983381885807 0.6714017172192 1.96956678382252 -1.65764231874593 -1.47317019070934 3.79080929884312 0.232792485246592 -0.470410408027959 -2.37072015127208 -2.2735728075773 -0.994223200215515 0.11925871401439 3.57989766877882 0.67144164516866 -1.26831508778958 -1.61915393110957 1.79975124795027 0.986150078689369 -0.51984244601826 2.00838330039842 1.75801522533198 -1.41390669911874 -2.04392703937856 -0.462260026403293 0.485552454055948
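Since the example above passes only one object, here is a short sketch of the multi-argument case that mapply() is designed for. The variable names and the constant k are chosen purely for illustration.

```r
# Element-wise over two vectors: f(1, 4), f(2, 5), f(3, 6)
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
sums  # 5 7 9

# Results of unequal length cannot be simplified, so a list is returned:
# rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)
reps[[2]]  # 2 2

# MoreArgs holds arguments that stay fixed for every call
scaled <- mapply(function(x, y, k) (x + y) * k, 1:3, 4:6,
                 MoreArgs = list(k = 10))
scaled  # 50 70 90
```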
rapply
If you want to use an apply function with a list that has sub-lists you will need to use rapply. The rapply() function will return a list with the same structure as the input list when the argument how="list" is used. If this argument is not included then the default, how="unlist", is used and the output is flattened into a vector.
Input:
l <- list(1,list(2,3),4,5,list(6,7,8),9)
l
Output:
1. 1
2. A. 2
B. 3
3. 4
4. 5
5. A. 6
B. 7
C. 8
6. 9
Input:
x <- rapply(l,function(x) (x*2)^3)
x
Output:
8 64 216 512 1000 1728 2744 4096 5832
Input:
y <- rapply(l,function(x) (x*2)^3, how="list")
y
Output:
1. 8
2. A. 64
B. 216
3. 512
4. 1000
5. A. 1728
B. 2744
C. 4096
6. 5832
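rapply() also accepts a classes argument and a third value for how, "replace", which are useful when a nested list mixes data types. The sketch below assumes a small made-up list called nested that contains both numbers and text.

```r
nested <- list(1, list("a", 3), list(4, "b"))

# classes = "numeric" restricts the function to numeric elements;
# how = "replace" leaves all other elements unchanged and keeps the nesting
doubled <- rapply(nested, function(v) v * 2,
                  classes = "numeric", how = "replace")
doubled[[2]][[1]]  # "a"
doubled[[2]][[2]]  # 6

# With the default how = "unlist", non-matching elements are dropped
flat <- rapply(nested, function(v) v * 2, classes = "numeric")
flat  # 2 6 8
```

Without the classes restriction, the function would be called on the text elements too and v * 2 would fail.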
tapply
The tapply() function uses data in a factor format to apply a function by group. This is common in dataframes where you will most likely be using mixed data types. In the example below, the male/female column is converted to the factor type and the function is applied to the two groups. Because cbind() builds a numeric matrix, the factor appears in the dataframe as its integer codes (1 for F, 2 for M); the grouping itself uses the factor b. The order of the arguments in the function is 1) the column to which the function is to be applied, 2) the factor containing the grouping variable and 3) the function being applied.
Input:
a <- 1:5
b <- c("F","M","M","F","F")
b <- as.factor(b)
c <- c(34,26,42,47,22)
d <- cbind(a,b,c)
data <- as.data.frame(d)
data
Output:
a b c
1 1 34
2 2 26
3 2 42
4 1 47
5 1 22
Input:
x <- tapply(data$c,b,sum,simplify=FALSE)
x
Output:
$F
103
$M
68
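The example above sets simplify=FALSE, which returns the results as a list. As a rough sketch of the default behaviour, the same values can be grouped without that argument; the names sex and score are used here only to make the example self-contained.

```r
sex   <- factor(c("F", "M", "M", "F", "F"))
score <- c(34, 26, 42, 47, 22)

# Default simplify = TRUE: a named array with one value per group
totals <- tapply(score, sex, sum)
totals["F"]  # 103
totals["M"]  # 68

# Any summary function can be used in the same way
means <- tapply(score, sex, mean)
means["M"]  # 34
```

The simplified form is usually easier to work with when each group produces a single number.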