library(tidyverse)
library(nycflights23)
R Basics
Required packages
All packages are automatically loaded in this tutorial.
The tidyverse package is a metapackage – a collection of many related packages.
Quoting from the tidyverse website:
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
Explore a given data frame
The flights data frame is provided by the nycflights23 package. Usually, you will import data from an external source; a later tutorial will cover various data import options.
Column names
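The code for this step is not shown here, so the following is only a minimal sketch of one way to list the column names. It assumes the nycflights23 package is installed; the variable name column_names is made up for illustration.

```r
library(nycflights23)

# names() returns the column names of a data frame as a character vector;
# colnames(flights) gives the same result for data frames.
column_names <- names(flights)
print(column_names)
```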
Structure and example data
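As a sketch of what this step might look like (the original code is not shown), both base R and dplyr offer a compact overview of column types together with a few example values. Which of the two the tutorial uses is an assumption.

```r
library(dplyr)
library(nycflights23)

# Base R: one line per column with its type and the first few values.
str(flights)

# dplyr alternative: similar information, formatted to fit the console width.
glimpse(flights)
```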
First/last n rows
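A minimal sketch for this step, assuming the standard head() and tail() functions are meant. The value n = 5 and the variable names are chosen here only for illustration.

```r
library(nycflights23)

# First 5 rows of the data frame.
first_rows <- head(flights, n = 5)
print(first_rows)

# Last 5 rows of the data frame.
last_rows <- tail(flights, n = 5)
print(last_rows)
```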
Assignments
All three statements are equivalent. R allows the assignment target at the beginning (x <- value) or at the end (value -> x) of a statement. Also note the two different assignment operators used (<- vs. =); you may use either, but using <- for variable assignment and = for named arguments may improve clarity and readability.
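The statements below are a reconstruction for illustration, not necessarily the ones used in the tutorial. They show the three assignment forms side by side; the variable names are hypothetical.

```r
library(nycflights23)

# Leftward assignment with <- (the conventional form).
row_count_a <- nrow(flights)

# Leftward assignment with =.
row_count_b = nrow(flights)

# Rightward assignment with ->, placing the target at the end of the statement.
nrow(flights) -> row_count_c

# All three variables now hold the same value.
print(c(row_count_a, row_count_b, row_count_c))
```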
Summary statistics
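The questions further down refer to summary(flights), so the step presumably looks roughly like this sketch. Storing the result in flight_summary is an addition made here for illustration.

```r
library(nycflights23)

# summary() reports per-column descriptive statistics; what is shown
# depends on each column's data type.
flight_summary <- summary(flights)
print(flight_summary)
```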
A first (simple) visualization
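The original plot code is not shown. Since the questions ask for the distribution of distance to be shown per month instead, a box plot of distance per carrier is one plausible starting point. Both the choice of geom_boxplot() and of carrier as the x variable are assumptions.

```r
library(ggplot2)
library(nycflights23)

# Box plot of flight distance for each carrier (x variable assumed).
distance_plot <- ggplot(flights, aes(x = carrier, y = distance)) +
  geom_boxplot()

print(distance_plot)
```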
Documentation
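As a rough sketch of how to reach the built-in documentation (the topics chosen here are only examples), R provides the ? shortcut and the help() function:

```r
library(nycflights23)

# Shortcut form: open the help page for a function or a data set.
?summary
?flights

# Equivalent function form; the result can be stored and printed later.
summary_help <- help("summary")
print(summary_help)

# Overview of everything a package documents.
help(package = "nycflights23")
```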
Questions
Q1 Look at the documentation for the head function (?head). How many rows does head print by default? What programming language feature is used to achieve this default behavior?
Q2 Provide two syntactically different calls of the head function that both result in the same output of the first 3 rows of the flights data frame. Which syntax is preferable (to you) and why?
Q3 Given the output of summary(flights), what do you observe in terms of data types and descriptive statistics? Compare the output for carrier and time_hour: which is more useful? What changes would improve usefulness?
Q4 Change the plot to show the distribution of distance per month (i.e., x=month). Look at the warning message: what is the root cause of this problem? What are possible solutions?
Q5 Change the plot to show the distribution of distance per month, grouped by origin airport (i.e., x=month, fill=origin). Look at the plot and/or any warning messages: what is the correct solution (conceptually) to avoid the issues observed in Q4 and Q5?