Advanced plotting with ggplot2

This notebook is best used alongside the recorded delivery of the training session, available at https://youtu.be/5klSpGC2puU, and the Advanced R presentation, available in the repository at https://gitlab.com/SManzi/r-for-healthcare-training.

Load the ggplot2 package

library(ggplot2)

Load the ‘mpg’ dataset that comes with ggplot2 and pass it to ggplot() to create a plot object

mpg <- ggplot2::mpg   #import data

ggplot(data=mpg)  #create a ggplot object from the data (draws an empty plot)
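
Because ggplot() returns an object, it can be stored in a variable and have layers added to it later. The lines below are a minimal sketch of that idea; the variable name p is only an illustrative choice and is not used elsewhere in this notebook.

```r
library(ggplot2)

# ggplot() on its own only sets up an empty plot;
# nothing is drawn until a geometry layer is added
p <- ggplot(data = mpg)

# add a point layer to the stored object
# (p is a name chosen for this example)
p + geom_point(mapping = aes(x = displ, y = hwy))
```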

Scatter plots and basic aesthetics

In the examples below, ‘displ’ is engine displacement (in litres) and ‘hwy’ is highway fuel efficiency (in miles per gallon).

# Map displ to the x-axis and hwy to the y-axis
ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy))
# Colour the points by class of vehicle
ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy, color=class))
# Change the size of the points based on the class of vehicle
ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy, size=class))
# Change the opacity (alpha) of the points based on the class of vehicle
ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy, alpha=class))
# Shape of the points is determined by the class of the vehicle
ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy, shape=class))
# All points on the plot are coloured blue
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
# Add a title and axis labels to the plot using the 'labs' component
ggplot(data=mpg) + 
  geom_point(mapping=aes(x=displ, y=hwy),
             color="blue") +
  labs(title="Example scatterplot",
       x="Displacement", y="Highway efficiency")
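
A common slip is to put a fixed colour inside aes(). The sketch below contrasts the two forms, using the illustrative names set_colour and mapped_colour. Inside aes(), "blue" is treated as a data value with a single level, so ggplot2 picks a colour from its default palette and adds a legend rather than drawing blue points.

```r
library(ggplot2)

# Setting: colour is given outside aes(), so every point is drawn blue
set_colour <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Mapping: "blue" inside aes() is treated as a one-level variable,
# so the points take a default palette colour and a legend appears
mapped_colour <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

set_colour
mapped_colour
```

The colours actually used by a layer can be checked with layer_data(), for example unique(layer_data(mapped_colour)$colour).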

Facet wrap

Faceting splits a plot into one panel for each subset of the data. For example, to plot our data separately for each value of the class variable, arranged in 2 rows, we use the facet_wrap() function.

# separate into individual plots by class and arrange in 2 rows
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)
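
By default every panel shares the same axis ranges. The following is a minimal sketch of the scales argument of facet_wrap(), which here lets the y-axis of each panel fit its own data; the name free_y is just an illustrative choice.

```r
library(ggplot2)

# same faceted plot as above, but each panel gets its own y-axis range
free_y <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2, scales = "free_y")

free_y
```

Free scales make patterns within a panel easier to see, at the cost of making it harder to compare values between panels.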

Faceting also allows you to split by two variables at once to enable comparisons. We use the facet_grid() function for this, with one variable defining the rows and the other the columns.

# separate into individual plots by drive type and
# number of cylinders arranged as a grid
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ cyl)

Multiple geometry layers

There are many geometry layers; see https://rpubs.com/hadley/ggplot2-layers for an overview.

# Plotting a smooth curve through the data
ggplot(data = mpg) + 
  geom_smooth(mapping = aes(x = displ, y = hwy))
# Plotting points and smooth curve
ggplot(data = mpg) + 
  geom_point(mapping=aes(x=displ,y=hwy, color=drv)) +
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))

Global mapping

If you define the data and aesthetic mapping in the ggplot() call, they are applied to every subsequent layer, which reduces repetition in the code.

# global mapping of the x and y axis variables
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping=aes(color=drv)) +
  geom_smooth(mapping=aes(linetype=drv))
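
A layer can still override what it inherits from ggplot(). The sketch below keeps the global x and y mapping but gives the smooth layer its own data, so the curve is fitted to the subcompact cars only. The name p_local and the choice of the subcompact class are illustrative assumptions.

```r
library(ggplot2)

p_local <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # colour is mapped for the points only
  geom_point(mapping = aes(color = class)) +
  # this layer swaps in its own data: just the subcompact cars;
  # se = FALSE hides the confidence band
  geom_smooth(data = subset(mpg, class == "subcompact"), se = FALSE)

p_local
```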

Bar plot

# quick bar plot
ggplot(data=mpg) +
  geom_bar(mapping=aes(x=class))

Statistical transformations

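There are a number of built-in statistical transformations that can be applied to your data to produce new values to plot. For example, stat_count() counts the rows in each class, which is the transformation geom_bar() uses by default, so the plot below matches the bar plot above.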

ggplot(data=mpg) +
  stat_count(mapping=aes(x=class))
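
To see what the transformation does, the counts can also be computed by hand and plotted as they are. This is a minimal sketch; manual_counts and counted are names chosen for this example, and Freq is the column name that base R gives to the counts.

```r
library(ggplot2)

# counts computed by hand with base R
manual_counts <- as.data.frame(table(class = mpg$class))

# geom_col() plots the values as given, with no counting step
ggplot(data = manual_counts) +
  geom_col(mapping = aes(x = class, y = Freq))

# the values that stat_count() computed can be inspected with layer_data()
counted <- ggplot(data = mpg) +
  stat_count(mapping = aes(x = class))
layer_data(counted)$count
```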

Coordinate transformations

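Different coordinate systems can be used in your plots. For example, coord_flip() swaps the x and y axes, which turns the vertical boxplots below into horizontal ones.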

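# boxplot of highway efficiency for each class of vehicle
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot()
# the same plot with the x and y axes swapped
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot() +
  coord_flip()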

Saving your plots

bar <- ggplot(data = mpg) + 
  geom_bar(
    mapping = aes(x = class, fill = class), 
    show.legend = FALSE,
    width = 1
  ) + 
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)

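bar + coord_flip()   # display with horizontal bars
bar + coord_polar()  # display in polar coordinates

# The plot will be saved to your working directory
# unless otherwise specified. Only the plot stored in 'bar'
# is saved, not the flipped or polar versions displayed above
ggsave("my_plot.png", plot=bar)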
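
To save a transformed version, assign it to a variable first and pass that to ggsave(). The sketch below also sets the output size and resolution. The names polar and out_file, the temporary folder and the 12 cm size are all illustrative assumptions rather than recommended values.

```r
library(ggplot2)

# store the polar version of the bar plot so it can be saved
polar <- ggplot(data = mpg) +
  geom_bar(
    mapping = aes(x = class, fill = class),
    show.legend = FALSE,
    width = 1
  ) +
  coord_polar()

# out_file is a hypothetical path; a temporary folder is used here
# so that nothing is written to the working directory
out_file <- file.path(tempdir(), "my_polar_plot.png")

# width and height are interpreted in the given units
ggsave(out_file, plot = polar, width = 12, height = 12,
       units = "cm", dpi = 300)
```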

Histogram

# histogram of highway efficiency;
# col sets the bar outline and fill sets the bar interior
ggplot(data=mpg) +
  geom_histogram(mapping=aes(x=hwy),
                 col="black",
                 fill="grey")
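
When no bin settings are given, geom_histogram() uses 30 bins and prints a message suggesting that you pick a better value. The following sketch sets the bin width explicitly; the width of 2 miles per gallon and the name hist_2mpg are only illustrative choices.

```r
library(ggplot2)

# each bar now covers a 2 mpg range of highway efficiency
hist_2mpg <- ggplot(data = mpg) +
  geom_histogram(mapping = aes(x = hwy),
                 binwidth = 2,
                 col = "black",
                 fill = "grey")

hist_2mpg
```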

Exercise

Using the ‘midwest’ dataset, create a graph or graphs that show something interesting about the data.

mid <- ggplot2::midwest
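
If you are unsure where to begin, the sketch below shows one possible starting point, not a model answer. The choice of the percollege and percbelowpoverty columns, the axis labels and the name starter are illustrative assumptions.

```r
library(ggplot2)

mid <- ggplot2::midwest

# look at the available columns before choosing what to plot
names(mid)

# one county per point: college education against poverty,
# coloured by state
starter <- ggplot(data = mid,
                  mapping = aes(x = percollege, y = percbelowpoverty)) +
  geom_point(mapping = aes(color = state), alpha = 0.6) +
  labs(x = "College educated (%)", y = "Below poverty line (%)")

starter
```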