In-class exercise: ggplot2

Overview

The high-level goals of this exercise are to (1) reason about suitable data visualizations and (2) deepen your understanding of ggplot2.

This is a collaborative exercise:

  • Create a plot for each prompt/question below and submit your plots on Canvas.

  • If you get stuck on a question or anything is unclear while completing the exercise, ask the course staff for clarification.

Code cell shortcuts:

  • cmd + return: execute current line (step-by-step execution)

  • shift + return: execute current cell

Note

All required packages are automatically loaded.

# Uncomment to install the map packages if they are missing
#install.packages(c("maps", "mapproj"))
library(tidyverse)
library(xtable)
library(nycflights23)

Instructions

Data set

Recall the nycflights23 package, which provides the following data frames (tibbles):

  • airlines
  • airports
  • flights
  • planes
  • weather
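
As a quick refresher, the tables can be inspected and combined before plotting. The sketch below is only one way to get oriented. It assumes that `flights` and `airlines` share a `carrier` column, as in the package documentation, and `flights_named` is a name made up for this example.

```r
library(dplyr)
library(nycflights23)

# Column names and types of the main table
glimpse(flights)

# Attach the full airline name to each flight via the shared `carrier` key
# (`flights_named` is a hypothetical name used only in this sketch)
flights_named <- flights |>
  left_join(airlines, by = "carrier")
```

The same join pattern applies to `airports`, where the key `faa` matches `origin` or `dest` in `flights`.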

Plots

Create plots with ggplot2 for the following prompts/questions.

  1. How many flights took place each hour, broken down by origin airport and month?
  2. Which carrier has the largest proportion of cancelled flights?
  3. For the top 3 carriers (by total number of flights) across all 3 origin airports in the summer (July to September), show the relationship between departure and arrival delay. (Use color coding and faceting where appropriate.)
  4. For the top 10 destinations across all 3 origin airports, add a circle (or bubble) to a map. Make the circle size proportional to the total number of flights to that destination. (Use color coding where appropriate.)
  5. Create a plot of your own choice and design. (Please state the question your plot is trying to answer.)
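
For the map prompt, one possible starting point is a base map with points layered on top. The sketch below is not a solution: it plots the three origin airports rather than the destinations, and it assumes the maps package is installed (needed by `map_data()`) and that the origin codes appear in `airports$faa`. The names `states`, `origins`, `n_flights`, and `p` are made up for this example.

```r
library(ggplot2)
library(dplyr)
library(nycflights23)

# Outlines of the lower 48 US states; map_data() requires the maps package
states <- map_data("state")

# Number of flights per origin airport, with coordinates from `airports`
origins <- flights |>
  count(origin, name = "n_flights") |>
  left_join(airports, by = c("origin" = "faa"))

# Base map as polygons, airports as bubbles scaled by flight count
p <- ggplot() +
  geom_polygon(
    data = states,
    aes(x = long, y = lat, group = group),
    fill = "grey95", colour = "grey70"
  ) +
  geom_point(
    data = origins,
    aes(x = lon, y = lat, size = n_flights, colour = origin),
    alpha = 0.7
  ) +
  scale_size_area(max_size = 10) +  # bubble area proportional to the count
  coord_quickmap() +                # approximate projection, no extra package
  theme_void()

p
```

`scale_size_area()` maps the count to the area of the bubble rather than its radius, which tends to keep large values from looking exaggerated. `coord_map()` is an alternative to `coord_quickmap()` if the mapproj package is available.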