Data Visualization (reference card)

Basic ggplot

  • ggplot() creates a blank plot object and sets default values (data and aesthetic mappings) for every layer added to it
  • data sets the data frame to plot
  • mapping sets which columns of the data frame correspond to which parts of the plot (x position, y position, color, etc.)
  • The mapping argument must be an “aesthetic” created using the aes() function
  • Add a geom_ function to represent the data, e.g., geom_point() for points
library(ggplot2)  # load the package once per session
ggplot(data = df, mapping = aes(x = col_name_1, y = col_name_2)) +
  geom_point()
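A minimal runnable sketch of the two-step idea, assuming R's built-in mtcars data frame in place of df (its wt and mpg columns stand in for col_name_1 and col_name_2). ggplot() alone only stores the defaults. The geom added with + creates the layer, and ggplot2's layer_data() is used here only to look at what that layer computed.

```r
library(ggplot2)

# ggplot() alone sets the default data and aesthetic mapping but draws nothing
base_plot <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Adding a geom creates a layer that inherits those defaults
scatter <- base_plot + geom_point()

# layer_data() returns the data ggplot2 computed for a layer: one row per point
point_data <- layer_data(scatter, 1)
head(point_data[, c("x", "y")])

# Typing `scatter` at the console (or print(scatter)) draws the plot
```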

Customizing geom

  • Pass fixed values as arguments to the geom (outside aes()) to change how every point looks, e.g., size, color, and alpha (transparency, from 0 to 1)
ggplot(data = df, mapping = aes(x = col_name_1, y = col_name_2)) +
  geom_point(size = 3, color = "blue", alpha = 0.5)
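A sketch contrasting fixed values with mapped ones, again using mtcars as a stand-in for df. Fixed values belong outside aes(); putting a constant inside aes() is a common mistake that maps it as if it were a column of data.

```r
library(ggplot2)

# Fixed values go directly in the geom, outside aes(): every point gets them
fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3, color = "blue", alpha = 0.5)

# A constant inside aes() is *mapped*: ggplot2 treats "blue" as a one-level
# category, picks a color from its default palette, and adds a legend
mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = "blue"))

fixed_cols  <- unique(layer_data(fixed)$colour)
mapped_cols <- unique(layer_data(mapped)$colour)
fixed_cols    # "blue"
mapped_cols   # a palette color, not "blue"
```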

Rescaling axes

  • Change how data values are translated onto the plot (axes, colors, sizes) using scale_ functions
  • To put both axes on a log10 scale:
ggplot(data = df, mapping = aes(x = col_name_1, y = col_name_2)) +
  geom_point() +
  scale_y_log10() +
  scale_x_log10()
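A sketch with a small made-up data frame (growth, size and rate are hypothetical names) showing that the log10 scales transform the data themselves: the values ggplot2 works with are log10 of the originals, while the axis labels still show the original numbers. Zero and negative values have no logarithm, so they cannot be placed on these axes.

```r
library(ggplot2)

# Hypothetical data spanning several orders of magnitude
growth <- data.frame(size = c(1, 10, 100, 1000), rate = c(2, 20, 200, 2000))

log_plot <- ggplot(growth, aes(x = size, y = rate)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

# The computed layer data is on the log10 scale; the axis still reads 1, 10, 100, ...
layer_data(log_plot)$x   # 0 1 2 3
```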

Grouping

Color

  • Color points depending on the value of a categorical variable (here, the TREATMENT column)
ggplot(df, aes(x = col_name_1, y = col_name_2, color = TREATMENT)) +
  geom_point()
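Whether a column is treated as categorical depends on its type. In mtcars (a stand-in for df here), cyl is stored as a number, so mapping it to color gives a continuous gradient; wrapping it in factor() gives one distinct color per value. A sketch:

```r
library(ggplot2)

# cyl is numeric, so mapping it directly gives a continuous color gradient
continuous <- ggplot(mtcars, aes(x = wt, y = mpg, color = cyl)) + geom_point()

# factor() makes it categorical: one distinct color for each value (4, 6, 8)
categorical <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) + geom_point()

length(unique(layer_data(categorical)$colour))  # 3
```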

Facets (i.e., subplots)

  • Split the data into facets (subplots), one for each value of a categorical variable (here, TREATMENT)
ggplot(df, aes(x = col_name_1, y = col_name_2)) +
  geom_point() +
  facet_wrap(vars(TREATMENT))
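A sketch using mtcars, with cyl standing in for TREATMENT. facet_wrap() also accepts the formula form ~ cyl, and ncol controls how many panels go in each row. Each point's panel appears in the PANEL column of layer_data().

```r
library(ggplot2)

# One panel per value of cyl, laid out in a single row of three
faceted <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(vars(cyl), ncol = 3)

# The formula interface gives the same panels
faceted_formula <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl, ncol = 3)

panels <- layer_data(faceted)$PANEL
table(panels)   # number of points in each panel
```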

Bar plot

  • geom_bar() counts the rows for each value of col_name and draws one bar per value
ggplot(df, aes(x = col_name)) +
  geom_bar()
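geom_bar() does the counting itself. When the bar heights are already stored in a column, geom_col() draws them directly from an explicit y. A sketch using mtcars, with cyl as the counted column (cyl_counts is a hypothetical helper table):

```r
library(ggplot2)

# geom_bar(): one bar per value of x, height = number of rows with that value
bars <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
layer_data(bars)$count   # one count per cylinder value

# geom_col(): heights come from a column that already holds the counts
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))   # columns: cyl, Freq
cols <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) + geom_col()
```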

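Histograms

  • geom_histogram() splits a continuous column into bins and counts the rows in each; bins sets the number of bins (30 by default) and fill sets the bar color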
ggplot(df, aes(x = col_name)) +
  geom_histogram(fill = "red", bins = 15)
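Instead of the number of bins, the width of each bin can be set with binwidth (use one or the other). A sketch on mtcars$mpg:

```r
library(ggplot2)

# Fix the number of bins...
by_bins  <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(fill = "red", bins = 15)
# ...or fix the width of each bin (here 2 miles per gallon)
by_width <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(fill = "red", binwidth = 2)

hist_data <- layer_data(by_bins)
sum(hist_data$count)   # 32: every car falls in exactly one bin
```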

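Stacked

  • By default, the histograms for each group of col_name_2 are stacked on top of one another
ggplot(df, aes(x = col_name_1, fill = col_name_2)) +
  geom_histogram()

Unstacked

  • position = "identity" draws every group's bars from zero so they overlap; alpha makes them semi-transparent so all groups stay visible
ggplot(df, aes(x = col_name_1, fill = col_name_2)) +
  geom_histogram(position = "identity", alpha = 0.5)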
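A sketch comparing the two positions on mtcars, grouping by fill = factor(cyl) (for histograms, fill colors the bars while color only changes their outlines). With position = "identity", each bar's top (ymax in layer_data()) is just its own count rather than a running total of the stack.

```r
library(ggplot2)

# Default position = "stack": each group's bars sit on top of the groups below
stacked <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(bins = 10)

# position = "identity": every group's bars start at zero and overlap
overlapping <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(bins = 10, position = "identity", alpha = 0.5)

unstacked_data <- layer_data(overlapping)
# Without stacking, the top of each bar equals that group's count in the bin
all(unstacked_data$ymax == unstacked_data$count)   # TRUE
```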

Multiple layers

  • Add more than one geom to get more than one layer of data
  • E.g., one layer of points and one layer showing a smoothed trend (method = "lm" fits a straight line with a linear model)
ggplot(df, aes(x = col_name_1, y = col_name_2)) +
  geom_point() +
  geom_smooth(method = "lm")
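A sketch of the two-layer plot on mtcars. Layers are drawn in the order they are added, so here the line is drawn over the points; se = FALSE turns off the shaded confidence band. By default geom_smooth() evaluates the fit at 80 points along x.

```r
library(ggplot2)

layered <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                            # layer 1: the raw data
  geom_smooth(method = "lm", se = FALSE)    # layer 2: straight-line fit, no band

# Each layer has its own computed data
nrow(layer_data(layered, 1))   # 32, one per car
nrow(layer_data(layered, 2))   # 80 points along the fitted line
```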

Changing values across layers

  • If you map an aesthetic inside a geom's aes(), it applies only to that layer; mappings set in ggplot() apply to every layer
  • So to color the points by col_name_3 but fit a single smooth line through all points, map color only in geom_point():
ggplot(df, aes(x = col_name_1, y = col_name_2)) +
  geom_point(mapping = aes(color = col_name_3)) +
  geom_smooth(method = "lm")
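A sketch on mtcars, with factor(cyl) standing in for col_name_3, contrasting the two places the color mapping can go. Set in ggplot(), it is inherited by geom_smooth(), which then fits one line per group; set only in geom_point(), a single line is fitted to all points.

```r
library(ggplot2)

# color mapped in ggplot(): every layer inherits it, so one fit per cyl group
per_group <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm")

# color mapped only in geom_point(): the smooth ignores cyl
single_fit <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(mapping = aes(color = factor(cyl))) +
  geom_smooth(method = "lm")

length(unique(layer_data(per_group, 2)$group))    # 3 fitted lines
length(unique(layer_data(single_fit, 2)$group))   # 1 fitted line
```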

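Saving plots

  • ggsave() saves the last plot you created or displayed; the file format is chosen from the extension (e.g., .jpg, .png, .pdf)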
ggsave("name_for_file.jpg")
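A sketch of a more reproducible save, assuming a hypothetical file name in R's temporary directory. Passing plot = explicitly avoids depending on whichever plot was made last, and width, height, units and dpi fix the size of the output.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Hypothetical output path; the .png extension selects the PNG format
out_file <- file.path(tempdir(), "mpg_vs_weight.png")

# Save this specific plot at 6 x 4 inches and 300 dots per inch
ggsave(out_file, plot = p, width = 6, height = 4, units = "in", dpi = 300)

file.exists(out_file)   # TRUE
```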