Histograms

Histograms are another useful type of plot, particularly for exploring the shape of time-based data. Let’s start by creating some inter-arrival time data using the rexp() function, which draws random values from an exponential distribution. This function takes two arguments: 1) the number of values to generate and 2) the average rate of occurrence per time unit. In the example below we use hours as the time unit, so by entering 10 as the rate in rexp() we are saying that on average 10 people arrive per hour, which corresponds to a mean inter-arrival time of 1/10 of an hour (6 minutes).
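As a quick sanity check on the rate argument, the sketch below generates a larger sample and compares its mean with 1/rate. It is not part of the original example: the seed, the sample size of 10,000 and the variable name inter_arrival are assumptions chosen for illustration.

```r
# Assumption: a fixed seed so repeated runs give the same sample
set.seed(123)

# 10,000 inter-arrival times with a rate of 10 arrivals per hour
inter_arrival <- rexp(10000, rate = 10)

# The sample mean should be close to 1/rate = 0.1 hours (6 minutes)
mean(inter_arrival)

# Every value is a waiting time, so none can be negative
min(inter_arrival) >= 0
```
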

The hist() function requires only this single vector of data; there is no need to manually calculate the breaks or the frequency for each bin. It also accepts many of the same arguments as the plot() function, such as the x and y axis labels and a title.


Input:
x <- rexp(1000, 10)
hist(x,xlab="Inter-arrival time (Time unit = hours)",ylab="Frequency",
     main="Inter-arrival time of people to emergency department")

Output:
histFig1

The default output of hist() can sometimes look a bit messy, with axes ending prematurely and too few breaks. To fix the formatting we can use the breaks argument to request more or fewer break points. When breaks is a single number it is treated as a suggestion, and R rounds to nearby "pretty" values. We can set the x and y axis limits using the xlim and ylim arguments, giving the minimum and maximum of each axis in the format c(min,max).


Input:
hist(x,xlab="Inter-arrival time (Time unit = hours)",ylab="Frequency",
     main="Inter-arrival time of people to emergency department",
     breaks=15,xlim=c(0,1),ylim=c(0,500))

Output:
histFig2
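Because a single number passed to breaks is only a suggestion, it can help to see the alternative of supplying the break points explicitly. The following is a minimal sketch under stated assumptions: the seed, the 0.05-hour bin width and the names bin_width, upper, h_suggested and h_exact are illustrative choices, and plot = FALSE is used so that only the computed bins are returned.

```r
# Assumption: a fixed seed so the sample is repeatable
set.seed(42)
x <- rexp(1000, rate = 10)

# A single number is a suggestion; R chooses "pretty" break points near it
h_suggested <- hist(x, breaks = 15, plot = FALSE)

# An explicit vector of break points is used as given.
# The breaks must cover the full range of the data, so round the
# upper limit up to the next multiple of the bin width.
bin_width <- 0.05
upper <- ceiling(max(x) / bin_width) * bin_width
h_exact <- hist(x, breaks = seq(0, upper, by = bin_width), plot = FALSE)

# Number of bins produced by each approach
length(h_suggested$breaks) - 1
length(h_exact$breaks) - 1
```

With an explicit vector, the bin width stays fixed from one data set to the next, which makes histograms of different samples easier to compare.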

When creating histograms dynamically, perhaps in a for loop or a user-defined function, you will want to avoid hard-coding the x and y axis limits. One way to set them dynamically is to apply the min() and max() functions to your data and then add a certain percentage on top of the maximum. To get the maximum frequency for the y axis, run the hist() function and assign its output to a variable. The output from hist() is a list (of class "histogram") of the data used to construct the histogram. One of these pieces of data is the count for each bar, which can be accessed using $counts.


Input:
h <- hist(x,xlab="Inter-arrival time (Time unit = hours)",ylab="Frequency",
     main="Inter-arrival time of people to emergency department",breaks=15)
h

Output:
$breaks
 [1] 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70
[16] 0.75 0.80

$counts
 [1] 404 236 145  89  46  34  11  12   6   7   5   1   2   0   0   2

$density
 [1] 8.08 4.72 2.90 1.78 0.92 0.68 0.22 0.24 0.12 0.14 0.10 0.02 0.04 0.00 0.00
[16] 0.04

$mids
 [1] 0.025 0.075 0.125 0.175 0.225 0.275 0.325 0.375 0.425 0.475 0.525 0.575
[13] 0.625 0.675 0.725 0.775

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

histFig3
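The elements of the returned list are related to one another, which can be checked directly. The sketch below is illustrative only: it uses its own seeded sample (an assumption, so the numbers will differ from the output above) and plot = FALSE to skip drawing.

```r
# Assumption: a fixed seed; values will differ from the output shown above
set.seed(7)
x <- rexp(1000, rate = 10)
h <- hist(x, breaks = 15, plot = FALSE)

# Every observation falls in exactly one bin
sum(h$counts) == length(x)

# Density is the count divided by (number of observations * bin width)
bin_width <- diff(h$breaks)
all.equal(h$density, h$counts / (length(x) * bin_width))

# Each midpoint sits halfway between two neighbouring break points
all.equal(h$mids, head(h$breaks, -1) + bin_width / 2)
```

The same relationship holds in the printed output above: the first bin has a count of 404 and a width of 0.05, so its density is 404 / (1000 * 0.05) = 8.08.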

Input:
# Height of the tallest bar
maxH <- max(h$counts)
# Add 20% headroom so the tallest bar does not touch the top of the plot
yMax <- maxH * 1.2
hist(x,xlab="Inter-arrival time (Time unit = hours)",ylab="Frequency",
     main="Inter-arrival time of people to emergency department",
     breaks=15,xlim=c(0,1),ylim=c(0,yMax))

Output:
histFig4
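For the for-loop or user-defined-function case mentioned earlier, the steps can be wrapped up as in the sketch below. The function name plot_hist_dynamic and its headroom argument are hypothetical, not part of base R. As a variation on using min() and max() of the raw data, it takes the x limits from the computed break points so that the first and last bars are always fully visible. It also uses hist(..., plot = FALSE) to compute the counts without drawing the histogram twice.

```r
# Hypothetical helper: draw a histogram with axis limits derived from the data
plot_hist_dynamic <- function(x, breaks = 15, headroom = 0.2, ...) {
  # Compute the bins without drawing anything
  h <- hist(x, breaks = breaks, plot = FALSE)

  # x limits span the outermost break points
  x_lim <- range(h$breaks)

  # y limits run from 0 to the tallest bar plus the requested headroom
  y_lim <- c(0, max(h$counts) * (1 + headroom))

  # Draw the precomputed histogram object
  plot(h, xlim = x_lim, ylim = y_lim, ...)

  # Return the limits invisibly in case the caller wants to reuse them
  invisible(list(xlim = x_lim, ylim = y_lim))
}

# Usage: one histogram per arrival rate (assumed rates, fixed seed)
set.seed(1)
for (rate in c(5, 10, 20)) {
  arrivals <- rexp(1000, rate = rate)
  plot_hist_dynamic(arrivals,
                    xlab = "Inter-arrival time (Time unit = hours)",
                    ylab = "Frequency",
                    main = paste("Inter-arrival time, rate =", rate))
}
```
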