Sorting

It is often useful to sort data into a particular order, sometimes by several columns at once, in which case the priority of the columns must be specified. Users of Excel will know how easy it is to accidentally reorder one column of a dataset without adjusting the others. Fortunately, this is less of a problem in R.

R has a function called sort(), but it is of limited use for datasets because it only sorts a single vector. As we shall see, a better way is to use the order() function.
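
To make the difference concrete, here is a minimal sketch on a small made-up vector (the name x and its values are purely illustrative and not part of any dataset). sort() returns the sorted values, whereas order() returns the positions that would put the values in order.

```r
# Illustrative vector; the values are invented for this example
x <- c(30, 10, 20)

# sort() returns the values themselves in ascending order
sorted_values <- sort(x)

# order() returns the positions of the elements in ascending order
positions <- order(x)

# Indexing with those positions reproduces the sorted vector
reordered <- x[positions]

print(sorted_values)
print(positions)
print(reordered)
```

Because order() gives positions rather than values, the same positions can be used to rearrange all the columns of a dataset at once, keeping each row intact.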

Many built-in datasets come with the R installation and can be loaded using the data() function.

The USArrests dataset contains arrest rates per 100,000 residents for murder, assault and rape in each of the 50 US states in 1973, alongside the percentage of the population living in urban areas.


Input:
data(USArrests)

Let's attach the dataset so that we can refer to its columns by name, without the $ sign.


Input:
attach(USArrests)

And let's take a look at the first ten rows of the dataset.


Input:
USArrests[1:10, ]

Output:
            Murder Assault UrbanPop Rape
Alabama       13.2     236       58 21.2
Alaska        10.0     263       48 44.5
Arizona        8.1     294       80 31.0
Arkansas       8.8     190       50 19.5
California     9.0     276       91 40.6
Colorado       7.9     204       78 38.7
Connecticut    3.3     110       77 11.1
Delaware       5.9     238       72 15.8
Florida       15.4     335       80 31.9
Georgia       17.4     211       60 25.8
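
As an aside, head() is a common shorthand for the same kind of inspection. The following is a minimal sketch using only base R; the names first_ten and same_rows are introduced here purely for illustration.

```r
# Load the built-in dataset (ships with base R)
data(USArrests)

# head() returns the first n rows; here n = 10
first_ten <- head(USArrests, 10)

# Check that these are the same rows as indexing with 1:10
same_rows <- identical(first_ten, USArrests[1:10, ])

print(first_ten)
print(same_rows)
```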

Suppose we want to sort the data by urban population (the UrbanPop column). We could try using sort().


Input:
sort(UrbanPop)

Output:
32 39 44 44 45 45 48 48 50 51 52 53 54 56 57 58 59 60 60 62 63 65 66 66 66 66 67 67 68 70 70 72 72 73 74 75 77 78 80 80 80 80 81 83 83 85 86 87 89 91

However, all this does is sort and output a single column; the state names and the other columns are not carried along.

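Using the order() function, we can sort first by urban population and then, among states with equal urban population, by murder rate. order() returns the row positions in sorted order, and we use these to index the rows of the dataset.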


Input:
# order() gives the row positions sorted by UrbanPop, with ties broken by Murder
sort1_USArrests <- USArrests[order(UrbanPop, Murder), ]
sort1_USArrests[1:15, ]

Output:
               Murder Assault UrbanPop Rape
Vermont           2.2      48       32 11.2
West Virginia     5.7      81       39  9.3
North Dakota      0.8      45       44  7.3
Mississippi      16.1     259       44 17.1
South Dakota      3.8      86       45 12.8
North Carolina   13.0     337       45 16.1
Alaska           10.0     263       48 44.5
South Carolina   14.4     279       48 22.5
Arkansas          8.8     190       50 19.5
Maine             2.1      83       51  7.8
Kentucky          9.7     109       52 16.3
Montana           6.0     109       53 16.4
Idaho             2.6     120       54 14.2
New Hampshire     2.1      57       56  9.5
Iowa              2.2      56       57 11.3
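
The same sort can also be written without attaching the dataset. The sketch below shows two possible alternatives, with() and the $ operator; the names by_pop_murder and by_pop_murder2 are hypothetical and chosen only for this example.

```r
# Load the built-in dataset (ships with base R)
data(USArrests)

# with() evaluates order() inside the data frame, so attach() is not needed
by_pop_murder <- USArrests[with(USArrests, order(UrbanPop, Murder)), ]

# Equivalent form that names the columns explicitly with the $ operator
by_pop_murder2 <- USArrests[order(USArrests$UrbanPop, USArrests$Murder), ]

# Compare the two results and show the first few rows
print(identical(by_pop_murder, by_pop_murder2))
print(head(by_pop_murder, 5))
```

Either form avoids any confusion that could arise if another object in the workspace happened to share a name with one of the attached columns.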

Sometimes it is useful to sort a column in reverse (descending) order. For a numeric column, this is achieved by placing a - (minus) sign in front of it.


Input:
# Ascending by UrbanPop; the minus sign makes the Murder tie-break descending
sort2_USArrests <- USArrests[order(UrbanPop, -Murder), ]
sort2_USArrests[1:15, ]

Output:
               Murder Assault UrbanPop Rape
Vermont           2.2      48       32 11.2
West Virginia     5.7      81       39  9.3
Mississippi      16.1     259       44 17.1
North Dakota      0.8      45       44  7.3
North Carolina   13.0     337       45 16.1
South Dakota      3.8      86       45 12.8
South Carolina   14.4     279       48 22.5
Alaska           10.0     263       48 44.5
Arkansas          8.8     190       50 19.5
Maine             2.1      83       51  7.8
Kentucky          9.7     109       52 16.3
Montana           6.0     109       53 16.4
Idaho             2.6     120       54 14.2
New Hampshire     2.1      57       56  9.5
Iowa              2.2      56       57 11.3
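
The minus sign only works on numeric columns. As a hedged sketch of some alternatives, order() also accepts a decreasing argument; with method = "radix" it can be given one value per sorting column. The object names below (minus_sorted, radix_sorted, names_desc) are hypothetical and used only for illustration.

```r
# Load the built-in dataset (ships with base R)
data(USArrests)

# Minus sign, as in the text: works because Murder is numeric
minus_sorted <- USArrests[order(USArrests$UrbanPop, -USArrests$Murder), ]

# One direction per column: UrbanPop ascending, Murder descending
radix_sorted <- USArrests[order(USArrests$UrbanPop, USArrests$Murder,
                                decreasing = c(FALSE, TRUE),
                                method = "radix"), ]

# Character data cannot be negated, so use decreasing = TRUE instead;
# here the rows are put in reverse alphabetical order of state name
names_desc <- USArrests[order(rownames(USArrests), decreasing = TRUE), ]

# Compare the two numeric approaches and peek at the reversed names
print(identical(rownames(minus_sorted), rownames(radix_sorted)))
print(head(rownames(names_desc), 3))
```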