It is often useful to sort data in a particular order, frequently by several columns at once, in which case the priority of the columns must be specified. Users of Excel will be aware that it is very easy to accidentally reorder one column in a set of data without adjusting the others. Fortunately, this is less of a problem in R.
R has a function called sort(), but it is not very helpful for tabular data because it only sorts a single vector. As we shall see, a better way is to use the order() function.
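To make the difference concrete, here is a minimal sketch using two short made-up vectors (x and labels are illustrative names, not part of any dataset). sort() returns the sorted values, whereas order() returns the positions that would sort them, and those positions can be used to rearrange a companion vector in the same way.

```r
# Two parallel vectors: labels[i] belongs to x[i] (illustrative data only)
x      <- c(30, 10, 20)
labels <- c("c", "a", "b")

sorted_values <- sort(x)         # the values themselves, in increasing order
positions     <- order(x)        # the positions in x that give that order
relabelled    <- labels[positions]  # companion vector rearranged to match

print(sorted_values)
print(positions)
print(relabelled)
```

Indexing a data frame's rows with the result of order() applies the same idea to every column at once, which is why the rows stay intact.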
Many built-in datasets ship with the R installation and can be loaded using the data() function.
The USArrests dataset contains arrest rates per 100,000 residents for murder, assault and rape in each of the 50 US states in 1973, alongside the percentage of the population living in urban areas.
Input:
data(USArrests)
Let's attach the dataset so that its columns can be referred to by name, without the $ sign:
Input:
attach(USArrests)
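attach() is convenient in a short session, but attached columns can be masked by, or can mask, other objects with the same names. As a hedged alternative, the sketch below shows that the same column can be reached without attaching anything; the variable names by_dollar, by_name and by_with are purely illustrative.

```r
data(USArrests)

# Three ways to reach the same column without attach()
by_dollar <- USArrests$UrbanPop           # explicit $ access
by_name   <- USArrests[["UrbanPop"]]      # access by column name
by_with   <- with(USArrests, UrbanPop)    # with() looks names up in the data frame

# All three give the same vector
print(identical(by_dollar, by_name))
print(identical(by_dollar, by_with))
```

The rest of this walkthrough keeps using the attached column names, so either style can be followed.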
And let's take a look at the first ten rows of the dataset:
Input:
USArrests[1:10, ]
Output:
            Murder Assault UrbanPop Rape
Alabama       13.2     236       58 21.2
Alaska        10.0     263       48 44.5
Arizona        8.1     294       80 31.0
Arkansas       8.8     190       50 19.5
California     9.0     276       91 40.6
Colorado       7.9     204       78 38.7
Connecticut    3.3     110       77 11.1
Delaware       5.9     238       72 15.8
Florida       15.4     335       80 31.9
Georgia       17.4     211       60 25.8
Suppose we want to sort the data by the percentage of urban population. We could try using sort():
Input:
sort(UrbanPop)
Output:
32 39 44 44 45 45 48 48 50 51 52 53 54 56 57 58 59 60 60 62 63 65 66 66 66 66 67 67 68 70 70 72 72 73 74 75 77 78 80 80 80 80 81 83 83 85 86 87 89 91
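However, we see that all this does is sort and output a single column; the other columns and the state names are left behind.
The order() function instead returns the row positions that would put the data in sorted order, so using it as the row index moves whole rows together. Here we sort first by urban population and then, where states tie, by murder rate.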
Input:
sort1_USArrests <- USArrests[order(UrbanPop, Murder), ]
sort1_USArrests[1:15, ]
Output:
               Murder Assault UrbanPop Rape
Vermont           2.2      48       32 11.2
West Virginia     5.7      81       39  9.3
North Dakota      0.8      45       44  7.3
Mississippi      16.1     259       44 17.1
South Dakota      3.8      86       45 12.8
North Carolina   13.0     337       45 16.1
Alaska           10.0     263       48 44.5
South Carolina   14.4     279       48 22.5
Arkansas          8.8     190       50 19.5
Maine             2.1      83       51  7.8
Kentucky          9.7     109       52 16.3
Montana           6.0     109       53 16.4
Idaho             2.6     120       54 14.2
New Hampshire     2.1      57       56  9.5
Iowa              2.2      56       57 11.3
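A two-key sort like this can be checked programmatically rather than by eye. The following is a minimal sketch, not part of the original walkthrough: it rebuilds the sorted data frame with explicit $ access (so it does not depend on attach()) and tests that the first key is non-decreasing and that the second key is non-decreasing within each group of tied first-key values. The names sorted, primary_ok and ties_ok are illustrative.

```r
data(USArrests)

# Same sort as above, written with explicit column access
sorted <- USArrests[order(USArrests$UrbanPop, USArrests$Murder), ]

# Primary key: UrbanPop should never decrease from one row to the next
primary_ok <- !is.unsorted(sorted$UrbanPop)

# Secondary key: within each UrbanPop value, Murder should never decrease
ties_ok <- all(tapply(sorted$Murder, sorted$UrbanPop,
                      function(m) !is.unsorted(m)))

print(primary_ok)
print(ties_ok)

# order() only produced an index; the original data frame is unchanged
print(head(rownames(USArrests), 3))
```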
Sometimes it is useful to sort a column in reverse order. For a numeric column, this is achieved with a - (minus) sign.
Input:
sort2_USArrests <- USArrests[order(UrbanPop, -Murder), ]
sort2_USArrests[1:15, ]
Output:
               Murder Assault UrbanPop Rape
Vermont           2.2      48       32 11.2
West Virginia     5.7      81       39  9.3
Mississippi      16.1     259       44 17.1
North Dakota      0.8      45       44  7.3
North Carolina   13.0     337       45 16.1
South Dakota      3.8      86       45 12.8
South Carolina   14.4     279       48 22.5
Alaska           10.0     263       48 44.5
Arkansas          8.8     190       50 19.5
Maine             2.1      83       51  7.8
Kentucky          9.7     109       52 16.3
Montana           6.0     109       53 16.4
Idaho             2.6     120       54 14.2
New Hampshire     2.1      57       56  9.5
Iowa              2.2      56       57 11.3
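The minus sign only works on numeric columns. As a hedged alternative, reasonably recent versions of R let order() take one decreasing flag per sort key when method = "radix" is requested. The sketch below assumes such a version; the names idx_minus, idx_radix and states are illustrative. It compares the two approaches on the same data and then reverses a character key, which the minus sign cannot do.

```r
data(USArrests)

# Minus-sign form, as used in the text (explicit $ access, no attach() needed)
idx_minus <- order(USArrests$UrbanPop, -USArrests$Murder)

# One direction per key: FALSE = ascending, TRUE = descending.
# A vector-valued 'decreasing' requires method = "radix".
idx_radix <- order(USArrests$UrbanPop, USArrests$Murder,
                   decreasing = c(FALSE, TRUE), method = "radix")

# Both indexes should put the rows in the same order
print(identical(rownames(USArrests)[idx_minus],
                rownames(USArrests)[idx_radix]))

# 'decreasing' also works for character keys, e.g. state names in reverse
states <- rownames(USArrests)
print(head(states[order(states, decreasing = TRUE)], 3))
```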
