Bar plots

Simple bar plot

When working with categorical data you might want to create a bar plot of the data. This is easily done by applying the table() function to your vector of categorical data: the resulting frequency count for each category is the input to the barplot() function. The table object can be passed straight into barplot() as the first argument, and the standard plotting arguments can then be used to add axis labels, a title, colours, etc.

x <- c(rep("Cluster 4",24), rep("Cluster 5", 31), rep("Cluster 6",14))
xTab <- table(x)   # frequency count for each category
xTab               # print the table

x
Cluster 4 Cluster 5 Cluster 6 
       24        31        14 

barplot(xTab, xlab="Care cluster", ylab="Frequency", main="Occurrences of care clusters")
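If you want to run this outside an interactive session, or try out more of the standard arguments, the sketch below shows one way to do it. It is an illustration rather than the original code: it reuses the same data, draws on a null PDF device (pdf(NULL)) so no file is created, and the col, ylim and las values are arbitrary choices. It also keeps the value that barplot() returns invisibly, the bar midpoints, which confirms that one bar is drawn per category.

```r
# Same categorical data as in the example above
x <- c(rep("Cluster 4", 24), rep("Cluster 5", 31), rep("Cluster 6", 14))
xTab <- table(x)  # frequency count for each category

# Null PDF device: the plot is drawn but no file is written
pdf(NULL)
# col, ylim and las are illustrative choices, not from the original example;
# las = 1 draws all axis labels horizontally
mids <- barplot(xTab,
                xlab = "Care cluster", ylab = "Frequency",
                main = "Occurrences of care clusters",
                col = "steelblue",
                ylim = c(0, max(xTab) + 5),
                las = 1)
invisible(dev.off())

# barplot() invisibly returns the bar midpoints: one per category
print(length(mids))  # [1] 3
```

Leaving some headroom with ylim keeps the tallest bar clear of the plot title.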


Grouped bar plot

Extensions of the basic bar plot are the stacked and grouped bar plots. Below is an example of a grouped bar plot. The input data for this type of plot is a two-way table, created by passing two categorical vectors of equal length to the table() function. In the example below these are the care cluster entry for each patient and the year of that entry. The first argument to table() supplies the rows of the table and the second supplies the columns, and barplot() draws one group of bars per column, so the bars are grouped by the second argument; in this instance they are grouped by year. If you wanted to reverse this and group the years by care cluster, you would enter table(y,x).
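To see how the argument order decides the grouping, you can build the table both ways and compare their shapes. This sketch uses the same data as the example below. Because barplot() draws one group per column, table(x, y) (care clusters as rows, years as columns) gives one group per year, while table(y, x) is simply its transpose and would give one group per care cluster.

```r
# Two equal-length categorical vectors: care cluster and year of entry
x <- c(rep("Cluster 4", 24), rep("Cluster 5", 31), rep("Cluster 6", 14),
       rep("Cluster 4", 17), rep("Cluster 5", 23), rep("Cluster 6", 29))
y <- c(rep("2016", 69), rep("2017", 69))
stopifnot(length(x) == length(y))  # table() requires vectors of equal length

xyTab <- table(x, y)  # rows = care clusters, columns = years
yxTab <- table(y, x)  # rows = years, columns = care clusters

print(dim(xyTab))  # [1] 3 2
print(dim(yxTab))  # [1] 2 3

# Swapping the arguments gives the transpose of the same counts
print(all(t(xyTab) == yxTab))  # [1] TRUE
```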

As with the simple bar plot above, the table object is the first argument to barplot(). The bars are coloured using the col argument. The beside argument is important because it switches between a stacked bar plot (beside=FALSE, the default) and a grouped bar plot (beside=TRUE). Setting legend.text=TRUE adds a legend whose labels are taken from the row names of the table. To change the position and formatting of the legend, use the args.legend argument: it takes a list of arguments that are passed on to legend(), which in this instance give the position of the legend (x=, y=). The colours and names are carried over from the barplot() call.
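The effect of the beside switch can be checked by drawing the same two-way table both ways. The sketch below is illustrative and not the full call from the example below: it draws on a null PDF device, and the legend position "topleft" passed through args.legend is an assumed choice (any keyword or coordinates accepted by legend() would do). barplot() returns one midpoint per stacked bar when beside = FALSE, and a matrix of midpoints with the same shape as the table when beside = TRUE.

```r
# Same two-way data as the example below
x <- c(rep("Cluster 4", 24), rep("Cluster 5", 31), rep("Cluster 6", 14),
       rep("Cluster 4", 17), rep("Cluster 5", 23), rep("Cluster 6", 29))
y <- c(rep("2016", 69), rep("2017", 69))
xyTab <- table(x, y)
cols <- c("red", "green", "blue")  # one colour per care cluster (row)

pdf(NULL)  # draw without writing a file

# beside = FALSE (the default): one stacked bar per year
mpStacked <- barplot(xyTab, col = cols, beside = FALSE,
                     legend.text = TRUE,  # labels come from rownames(xyTab)
                     args.legend = list(x = "topleft"))  # assumed position

# beside = TRUE: one group of three bars per year
mpGrouped <- barplot(xyTab, col = cols, beside = TRUE,
                     legend.text = TRUE,
                     args.legend = list(x = "topleft"))

invisible(dev.off())

print(length(mpStacked))  # [1] 2
print(dim(mpGrouped))     # [1] 3 2
```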

x <- c(rep("Cluster 4",24), rep("Cluster 5", 31), rep("Cluster 6",14),rep("Cluster 4",17),
       rep("Cluster 5", 23), rep("Cluster 6",29))
y <- c(rep("2016",69),rep("2017",69))
xyTab <- table(x,y)   # rows = care clusters, columns = years
xyTab                 # print the table

           y
x           2016 2017
  Cluster 4   24   17
  Cluster 5   31   23
  Cluster 6   14   29

barplot(xyTab, xlab="Care cluster", ylab="Frequency", main="Occurrences of care clusters",
        col=c("red","green","blue"), beside=TRUE, legend.text=TRUE,