Fall 2024
Data wrangling
Tibble vs. data frame
mtcars is a built-in data set in R.
When printed to the console, a tibble shows only its first 10 rows; a data frame prints every row.
Print the entire df and tibble to see how they are rendered differently.
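A minimal sketch of the comparison, assuming the tibble package is installed. The column name model for the former row names is an illustrative choice, not something the slides specify.

```r
library(tibble)

df <- mtcars                                  # base data frame, 32 rows
tb <- as_tibble(mtcars, rownames = "model")   # tibble; row names kept as a column

print(df)            # a data frame prints all of its rows
print(tb)            # a tibble prints the first 10 rows and summarizes the rest
print(tb, n = Inf)   # ask explicitly for every row of the tibble
```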
tidyverse packages
select, filter, mutate, group_by, summarize
Slicing the built-in mtcars data set.

R’s indexing operators
The basic indexing operators [], [[]], and $ are useful for simple tasks but may return unexpected results and quickly become hard to read.
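Two of the surprises can be sketched on mtcars with base R only. The variable names are illustrative.

```r
# Rows with 4 cylinders, two columns: the result is still a data frame
four_cyl <- mtcars[mtcars$cyl == 4, c("mpg", "hp")]

# Selecting a single column with [ , ] silently drops to a plain vector
mpg_vec <- mtcars[, "mpg"]
mpg_df  <- mtcars[, "mpg", drop = FALSE]   # drop = FALSE keeps a data frame

# [[ ]] and $ both extract one column as a vector
same_column <- identical(mtcars[["mpg"]], mtcars$mpg)

# $ partially matches column names on a data frame: "di" resolves to "disp"
partial_match <- identical(mtcars$di, mtcars$disp)
```

Tibbles are stricter on both points: single-column [ ] indexing keeps a tibble, and $ does not partially match names.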
dplyr functions
Explicit indexing operations:
With local variables to improve readability:
Local variables improve readability, but this approach is prone to errors.
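The slide's original code is not available, so the following is only a sketch of the two styles, assuming dplyr is installed. The particular steps (4-cylinder cars, two columns, sorted by mpg) are illustrative.

```r
library(dplyr)

# Nested calls: the steps have to be read from the inside out
nested <- arrange(select(filter(mtcars, cyl == 4), mpg, hp), desc(mpg))

# Local variables: each step is readable on its own, but every
# intermediate name is a chance to pass the wrong object to the next step
four_cyl <- filter(mtcars, cyl == 4)
selected <- select(four_cyl, mpg, hp)
by_mpg   <- arrange(selected, desc(mpg))
```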
dplyr functions and pipes
Explicit indexing operations with pipes:
Optimize for readability: code is written once but read many times!
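As a sketch of the piped style, assuming dplyr is installed (it re-exports the %>% operator), with illustrative filter, select, and arrange steps on mtcars:

```r
library(dplyr)

# Read top to bottom: take mtcars, keep 4-cylinder cars,
# keep two columns, then sort by mpg in descending order
result <- mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, hp) %>%
  arrange(desc(mpg))
```

No intermediate names are needed, and the order of the code matches the order of the operations.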
select
Select specific columns (include):
Select specific columns (exclude):
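A short sketch of both forms on mtcars, assuming dplyr is installed. The chosen columns are illustrative.

```r
library(dplyr)

# Include: keep only the named columns, in the order given
included <- mtcars %>% select(mpg, cyl, hp)

# Exclude: a leading minus drops the named columns and keeps the rest
excluded <- mtcars %>% select(-mpg, -cyl)
```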
filter
Filter rows:
Filter rows with %in%:
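A sketch of both kinds of filter on mtcars, assuming dplyr is installed. The thresholds and values are illustrative.

```r
library(dplyr)

# Keep rows that satisfy a condition
high_mpg <- mtcars %>% filter(mpg > 25)

# Keep rows whose value is in a set; %in% avoids chaining == with |
four_or_six <- mtcars %>% filter(cyl %in% c(4, 6))
```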
mutate
Add a column:
Change a column type:
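A sketch of both uses on mtcars, assuming dplyr is installed. The new column name wt_lbs is illustrative; in mtcars, wt is recorded in units of 1000 lbs.

```r
library(dplyr)

cars <- mtcars %>%
  mutate(wt_lbs = wt * 1000) %>%     # add a column derived from an existing one
  mutate(cyl = as.factor(cyl))       # overwrite a column to change its type
```

Assigning to an existing name replaces that column; assigning to a new name appends one.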
mutate and str_replace
Change column values (replace with regex):
We use the stringr package here.
See str_replace, str_replace_all, and str_replace_na.
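A sketch on the mtcars model names, assuming dplyr, stringr, and tibble are installed. The model column and the Merc-to-Mercedes replacement are illustrative.

```r
library(dplyr)
library(stringr)
library(tibble)

cars <- as_tibble(mtcars, rownames = "model") %>%
  # str_replace changes only the first match in each string;
  # ^ anchors the pattern to the start of the model name
  mutate(model = str_replace(model, "^Merc", "Mercedes"))
```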
mutate and str_replace_all
Change column values (replace with regex):
We use the stringr package here.
Use c("<pattern>"="<new value>", ...) in str_replace_all.
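A sketch of the named-vector form, assuming dplyr, stringr, and tibble are installed. The model column and both replacements are illustrative.

```r
library(dplyr)
library(stringr)
library(tibble)

cars <- as_tibble(mtcars, rownames = "model") %>%
  mutate(model = str_replace_all(
    model,
    # names are regex patterns, values are their replacements
    c("^Merc\\b" = "Mercedes", "^Hornet\\b" = "AMC Hornet")
  ))
```

Each pattern is applied to every string, so one call can carry several substitutions.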
rename_all
group_by
A grouped tibble affects downstream operations.
Operations are applied to each group.
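A sketch of how grouping changes what later verbs compute, assuming dplyr is installed. The column names mean_mpg_cyl and n_cars are illustrative.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)    # same rows, now tagged with 3 groups

# mutate on a grouped tibble: mean(mpg) is computed within each cyl group
with_group_mean <- by_cyl %>%
  mutate(mean_mpg_cyl = mean(mpg)) %>%
  ungroup()                           # drop the grouping once it is no longer needed

# summarize on a grouped tibble: one output row per group; n() counts rows
counts <- by_cyl %>%
  summarize(n_cars = n())
```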
group_by and n
group_by, n, and ungroup
summarize
Handling of NA values
By default, the mean, median, etc. of a vector that includes NA values is NA.
Use na.rm = TRUE (or the shorthand na.rm = T) to drop NA values before computing.
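A sketch of the effect, assuming dplyr is installed. mtcars has no missing values, so one is introduced here purely for illustration.

```r
library(dplyr)

x <- c(1, 2, NA, 4)
mean_default <- mean(x)                 # NA, because one element is missing
mean_dropped <- mean(x, na.rm = TRUE)   # mean of 1, 2, and 4

cars <- mtcars
cars$mpg[1] <- NA                       # illustrative missing value (a 6-cylinder car)

per_cyl <- cars %>%
  group_by(cyl) %>%
  summarize(
    mean_mpg_raw = mean(mpg),               # NA for the group containing the NA
    mean_mpg     = mean(mpg, na.rm = TRUE)  # computed from the non-missing values
  )
```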
arrange
Order data by column:
arrange orders in ascending order by default
Use arrange(desc(n_flights)) to order in descending order.
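The n_flights column belongs to a data set that is not shown here, so this sketch uses a hypothetical n_cars count on mtcars as a stand-in, assuming dplyr is installed.

```r
library(dplyr)

counts <- mtcars %>%
  group_by(cyl) %>%
  summarize(n_cars = n())    # n_cars is an illustrative stand-in for n_flights

ascending  <- counts %>% arrange(n_cars)         # smallest count first (default)
descending <- counts %>% arrange(desc(n_cars))   # largest count first
```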
left_join