The apply family

The apply family of functions exemplifies the functional style of processing that R is known for. Instead of writing an explicit loop, you pass a function to apply() or one of its relatives, and it calls that function on each element, row, column or group of your data. These functions are typically used to aggregate, transform or iterate over data, producing new data structures as output. A brief summary of the main apply functions is given below, followed by examples of their use.

  • apply() – applies a function over the rows or columns (the margins) of a matrix or array
  • lapply() – applies a function to each element of a list, vector or data frame and returns a list
  • sapply() – works like lapply() on a list or vector but tries to simplify the output to a vector or matrix
  • vapply() – works like sapply() but the type and length of each result are specified in advance
  • mapply() – applies a function element by element over several vectors or lists in parallel, returning the output in the simplest form
  • rapply() – recursively applies a function to a list and any sublists and returns a list with the original list structure
  • tapply() – applies a function to subsets of a vector defined by a grouping factor and returns the results by group
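As a quick illustration of how the return types differ, the sketch below applies sum() to the same small list with three of these functions. The list and its values are made up for the illustration.

```r
# A small named list of integer vectors (illustrative values)
nums <- list(a = 1:3, b = 4:6)

as_list   <- lapply(nums, sum)              # always a list: list(a = 6L, b = 15L)
as_vector <- sapply(nums, sum)              # simplified to a named vector: c(a = 6L, b = 15L)
as_typed  <- vapply(nums, sum, integer(1))  # like sapply(), but each result must be one integer

print(as_list)
print(as_vector)
print(as_typed)
```

In day-to-day use, lapply() is the predictable choice, sapply() the convenient one for interactive work, and vapply() the safe one when the output type matters.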

apply

In the example below, we start by creating a matrix of data. The apply() function then takes three arguments: the data over which the function is to work (x), the margin, which is 1 for rows or 2 for columns, and finally the function to be applied. Here mean() is applied to each column. The function is written without brackets because apply() needs the function itself, not the result of calling it; this is true of any function, not only built-in ones.


Input:
x <- matrix(rnorm(25),nrow=5,ncol=5)
x

Output:
-1.1530299	-0.43964399	0.6461054	-0.09678306	1.1527042
-2.2820481	0.03732001	-0.5070067	-2.12237654	-1.1424927
-1.1191257	0.71857743	-1.4357254	0.25865349	-0.2710555
-0.3153540	0.01545875	0.4006902	0.11588793	-1.0182689
0.7155965	-1.28879140	2.0635905	0.63958908	-1.1722304

Input:
y <- apply(x,2,mean)
y

Output:
-0.830792250214323 -0.191415841123202 0.233530803116317 -0.241005819826004 -0.49026865876211
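The same call can work over rows instead of columns, and any extra arguments placed after the function are passed on to it. The sketch below shows both, plus an anonymous function; the seed is an assumption added only so the results are reproducible.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible
x <- matrix(rnorm(25), nrow = 5, ncol = 5)

# MARGIN = 1 applies the function to each row instead of each column
row_means <- apply(x, 1, mean)

# Arguments after FUN are passed on to it: trim = 0.2 is handed to mean()
row_trimmed <- apply(x, 1, mean, trim = 0.2)

# An anonymous function works too: the spread (max - min) of each column
col_spread <- apply(x, 2, function(col) max(col) - min(col))

# For plain means, the built-in rowMeans()/colMeans() give the same answer faster
stopifnot(isTRUE(all.equal(row_means, rowMeans(x))))
print(row_trimmed)
print(col_spread)
```

For simple row or column sums and means, rowSums(), colSums(), rowMeans() and colMeans() are usually preferable; apply() earns its place when the function is something they do not cover.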

lapply

The function lapply() applies a function to each element of a list, so we first create a list of three matrices. In the call below, lapply() receives four arguments: 1) the list object, 2) the subsetting operator "[", written in quotes so that it is passed as a function, 3) an empty row index (note the two consecutive commas), meaning all rows, and 4) the column index, which in this instance is 2. Arguments 3 and 4 are passed on to "[", so each matrix m is subset as m[, 2]. The function therefore extracts the second column from each of the three matrices and returns a list containing the three columns as vectors.


Input:
a <- matrix(rnorm(25),nrow=5,ncol=5)
b <- matrix(rnorm(25),nrow=5,ncol=5)
c <- matrix(rnorm(25),nrow=5,ncol=5)
matList <- list(a,b,c)
matList

Output:
1.0044540	-0.09192938	1.5451326	1.0815440	0.1923556
-0.3924538	-0.86621777	-0.5419108	-0.0111666	1.3141072
-0.9955435	-0.06541948	0.0456705	0.9710961	-0.4519248
-1.8986475	0.29051101	-0.3281682	0.1850529	0.6012198
-0.8635181	0.45342276	-2.1492485	1.0034427	-0.2542290

-0.8548116	-0.8276839	-0.5481886	-1.3143125	1.04299454
2.1171626	0.9819266	1.7733759	-0.7288635	0.06230784
0.2109100	-0.4840691	-1.5370679	-0.4075649	-1.52659144
0.4285750	-0.1722746	-1.7334228	-0.6831702	-0.82712265
-0.2541561	0.1224135	-0.5868828	-0.5530220	1.54618395

2.3337205	0.3251544	-0.9497848	0.7063923	0.5205705
-1.1468535	0.4165905	0.7705145	0.7704186	-0.6130642
-0.2478028	-0.4005098	0.5429765	0.7738654	-1.9499126
0.4060105	-1.2381091	-0.7615761	-0.4143838	0.3857224
-0.8456466	-0.1017032	1.4971350	-1.0950214	-0.3679984

Input:
y <- lapply(matList,"[",,2)
y

Output:
-0.0919293816833333 -0.866217768212912 -0.0654194795448853 0.290511005954111 0.453422757495705
-0.827683931917062 0.981926551091218 -0.484069126880197 -0.172274626713826 0.122413479190549
0.325154403450905 0.416590455174112 -0.400509792017686 -1.23810908046145 -0.101703244347273

Input:
y[[1]]  # [[ ]] returns the first element itself; y[1] would return a list of length one

Output:
-0.0919293816833333 -0.866217768212912 -0.0654194795448853 0.290511005954111 0.453422757495705
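The "[" shortcut can also be written as an anonymous function, which many readers find easier to follow, and because a data frame is a list of columns, lapply() iterates over its columns. The sketch below checks that both forms give the same result; the seed and the height/weight values are made-up assumptions for illustration.

```r
set.seed(1)  # assumed seed for reproducibility
matList <- list(matrix(rnorm(25), 5), matrix(rnorm(25), 5), matrix(rnorm(25), 5))

# "[" with an empty row index and column 2 is the same as calling m[, 2] on each matrix
cols_op <- lapply(matList, "[", , 2)

# The same extraction written as an anonymous function
cols_fn <- lapply(matList, function(m) m[, 2])

# A data frame is a list of columns, so lapply() iterates over its columns
# (the heights and weights below are made-up values)
df <- data.frame(height = c(1.62, 1.75, 1.80), weight = c(55, 72, 80))
col_means <- lapply(df, mean)

print(identical(cols_op, cols_fn))  # TRUE
print(col_means)
```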

sapply

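The sapply() function works in much the same way as lapply() except that it tries to return the output in the simplest form. In the first example below, "[" with the arguments 1 and 2 extracts the single element in row 1, column 2 of each matrix, so y is returned as a vector of three values. In the second, each call returns a 5 x 5 matrix (25 values), so sapply() combines the results into z, a 25 x 3 matrix in which each column holds the values of one input matrix plus 5, read column by column.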


Input:
y <- sapply(matList,"[",1,2)
y

Output:
-0.0919293816833333 -0.827683931917062 0.325154403450905

Input:
z <- sapply(matList, function(x) x+5)
z

Output:
6.004454	4.145188	7.333721
4.607546	7.117163	3.853147
4.004457	5.210910	4.752197
3.101352	5.428575	5.406010
4.136482	4.745844	4.154353
4.908071	4.172316	5.325154
4.133782	5.981927	5.416590
4.934581	4.515931	4.599490
5.290511	4.827725	3.761891
5.453423	5.122413	4.898297
6.545133	4.451811	4.050215
4.458089	6.773376	5.770514
5.045670	3.462932	5.542977
4.671832	3.266577	4.238424
2.850752	4.413117	6.497135
6.081544	3.685687	5.706392
4.988833	4.271137	5.770419
5.971096	4.592435	5.773865
5.185053	4.316830	4.585616
6.003443	4.446978	3.904979
5.192356	6.042995	5.520570
6.314107	5.062308	4.386936
4.548075	3.473409	3.050087
5.601220	4.172877	5.385722
4.745771	6.546184	4.632002
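The "simplest form" follows a fixed rule: if every call returns one value the results become a vector, if every call returns the same number of values greater than one they become the columns of a matrix, and otherwise a list is returned. The sketch below shows each case with small made-up inputs.

```r
# Every call returns one value, so the results simplify to a vector: 1 4 9
sq <- sapply(1:3, function(i) i^2)

# Every call returns two values, so the results become the columns of a 2 x 3 matrix
pw <- sapply(1:3, function(i) c(i, i^2))

# Results of different lengths cannot be simplified, so a list comes back
rg <- sapply(1:3, function(i) seq_len(i))

# simplify = FALSE turns sapply() into lapply()
same <- identical(sapply(1:3, sqrt, simplify = FALSE), lapply(1:3, sqrt))

print(sq); print(dim(pw)); print(class(rg)); print(same)
```

Because the output shape depends on the data, sapply() can return a list where you expected a vector; the vapply() function below guards against that.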

vapply

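The vapply() function works like sapply() on a vector or list, but you must specify the type and length of each result through its third argument (FUN.VALUE). Here numeric(1) states that every call must return a single numeric value; if any call returns something else, vapply() stops with an error instead of silently returning a different structure. This makes vapply() a safer choice than sapply() inside your own functions, where the shape of the output needs to be predictable.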


Input:
z <- vapply(y, function(x) x+5, numeric(1))
z

Output:
4.90807061831667 4.17231606808294 5.32515440345091
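The template can describe more than one value, and it is enforced strictly. The sketch below, using a made-up named list, returns two summary values per element, shows the error raised when a result has the wrong type, and compares empty-input behaviour with sapply().

```r
# Made-up named list of numeric vectors
vals <- list(a = c(2, 4, 6), b = c(1, 3), c = c(10, 20, 30, 40))

# FUN.VALUE = numeric(2): every call must return exactly two numbers.
# The result is a 2 x 3 matrix; the row names come from the first result.
stats <- vapply(vals, function(v) c(min = min(v), max = max(v)), numeric(2))

# A result that does not match the template is an error, not a silent change of shape
res <- tryCatch(
  vapply(vals, function(v) as.character(length(v)), numeric(1)),
  error = function(e) "type mismatch"
)

# On empty input vapply() still returns the declared type; sapply() returns list()
empty_v <- vapply(list(), length, integer(1))
empty_s <- sapply(list(), length)

print(stats); print(res); print(class(empty_v)); print(class(empty_s))
```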

mapply

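The mapply() function is a multivariate version of sapply(): it applies a function to the first elements of each of its arguments, then to the second elements, and so on. In the example below only one argument, the matrix a, is supplied, so mapply() simply iterates over all 25 elements of the matrix in column order and returns a vector, dropping the matrix dimensions. Note the different order of the input arguments: the function comes first and the data second, the reverse of the other apply functions.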


Input:
a <- matrix(rnorm(25),nrow=5,ncol=5)
a

Output:
0.3669917	1.8954046	-0.49711160	-0.8095770	0.8790076
0.3357009	0.1163962	0.05962936	0.8998756	-0.7069533
0.9847834	-0.2352052	1.78994883	0.4930750	-1.0219635
-0.8288212	-1.1853601	0.33572082	-0.2599212	-0.2311300
-0.7365851	-1.1367864	-0.63415754	1.0041917	0.2427762

Input:
x <- mapply(function(x) x*2, a)
x

Output:
0.733983381885807 0.6714017172192 1.96956678382252 -1.65764231874593 -1.47317019070934 3.79080929884312 0.232792485246592 -0.470410408027959 -2.37072015127208 -2.2735728075773 -0.994223200215515 0.11925871401439 3.57989766877882 0.67144164516866 -1.26831508778958 -1.61915393110957 1.79975124795027 0.986150078689369 -0.51984244601826 2.00838330039842 1.75801522533198 -1.41390669911874 -2.04392703937856 -0.462260026403293 0.485552454055948
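The real strength of mapply() shows when several arguments are iterated together. The sketch below, using small made-up inputs, adds two vectors element by element, calls rep() with paired arguments, and uses MoreArgs for an argument that stays the same on every call.

```r
# Two vectors processed in parallel: 1 + 4, 2 + 5, 3 + 6
sums <- mapply(function(x, y) x + y, 1:3, 4:6)

# rep(1, 3), rep(2, 2), rep(3, 1): results differ in length, so a list is returned
reps <- mapply(rep, 1:3, 3:1)

# MoreArgs holds arguments that are the same on every call
pasted <- mapply(paste, c("a", "b"), c("x", "y"), MoreArgs = list(sep = "-"))

# For simple arithmetic on one matrix, vectorised code is simpler and keeps the shape
a <- matrix(1:4, nrow = 2)
doubled <- a * 2

print(sums); print(reps); print(pasted); print(doubled)
```

When the first argument is a character vector, as in the paste() call, mapply() uses it to name the results, so pasted prints with the names a and b.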

rapply

To apply a function to every element of a nested list (a list that contains sub-lists), use rapply(), which recurses into the sub-lists. By default (how="unlist") the results are flattened into a single vector. When the argument how="list" is used, rapply() instead returns a list with the same structure as the input list.


Input:
l <- list(1,list(2,3),4,5,list(6,7,8),9)
l

Output:
1. 1
2.  A. 2
    B. 3
3. 4
4. 5
5.  A. 6
    B. 7
    C. 8
6. 9

Input:
x <- rapply(l,function(x) (x*2)^3)
x

Output:
8 64 216 512 1000 1728 2744 4096 5832

Input:
y <- rapply(l,function(x) (x*2)^3, how="list")
y

Output:
1. 8
2.  A. 64
    B. 216
3. 512
4. 1000
5.  A. 1728
    B. 2744
    C. 4096
6. 5832
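The classes argument restricts the function to leaves of a given class, which is useful when a nested list mixes data types. The sketch below uses a made-up list containing a string and compares how="replace" with the default how="unlist".

```r
# A nested list mixing numbers and a string (illustrative values)
l <- list(1, list(2, "three"), 4)

# classes = "numeric" restricts f to numeric leaves;
# how = "replace" keeps the structure and leaves other elements untouched
r_replace <- rapply(l, function(x) x * 10, classes = "numeric", how = "replace")

# With the default how = "unlist", non-matching leaves become deflt (NULL by default)
# and disappear when the result is flattened
r_unlist <- rapply(l, function(x) x * 10, classes = "numeric")

str(r_replace)
print(r_unlist)  # 10 20 40
```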

tapply

The tapply() function applies a function to subsets of a vector, where the subsets are defined by a grouping variable that is treated as a factor. This is common with data frames, which usually hold mixed data types. In the example below, the male/female column is converted to a factor (a categorical type whose levels here are F and M) and the sum of column c is calculated for each of the two groups. The arguments of the function are, in order: 1) the vector to which the function is to be applied, 2) the grouping variable and 3) the function being applied. Setting simplify=FALSE returns the results as a list rather than an array.


Input:
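a <- 1:5
b <- c("F","M","M","F","F")
b <- as.factor(b)
c <- c(34,26,42,47,22)
# data.frame() keeps b as a factor; cbind() would have converted it to the integer codes 1 and 2
data <- data.frame(a, b, c)
data

Output:
a	b	c
1	F	34
2	M	26
3	M	42
4	F	47
5	F	22

Input:
# Group column c by the factor column b of the data frame and sum each group
x <- tapply(data$c,data$b,sum,simplify=FALSE)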
x

Output:
$F
103
$M
68
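tapply() can also group by more than one factor at once, and the same grouping can be done by hand with split(). The sketch below reuses the scores above and adds a hypothetical second grouping column, site, which is not part of the original data.

```r
# Same scores as above, plus a hypothetical second grouping column "site"
df <- data.frame(
  sex   = factor(c("F", "M", "M", "F", "F")),
  site  = factor(c("A", "A", "B", "B", "A")),
  score = c(34, 26, 42, 47, 22)
)

# One grouping factor, default simplify = TRUE: a 1-d array named by level
by_sex <- tapply(df$score, df$sex, sum)

# A list of two factors gives a matrix with one cell per combination
by_both <- tapply(df$score, list(df$sex, df$site), sum)

# The same grouping done by hand: split the vector, then sum each piece
by_split <- sapply(split(df$score, df$sex), sum)

print(by_sex); print(by_both); print(by_split)
```

Combinations of factor levels with no observations would appear as NA in by_both; here every combination has at least one row.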