suppressPackageStartupMessages({
library(tidyverse)
library(gganimate)
library(gifski)
library(animation)
})
Simulations
Distributions and sampling
Packages used
Random seed
Fix the random seed for reproducibility.
set.seed(1)
The actual seed value doesn’t really matter here. We fix it so that we can refer to particular samples.
Uniform distribution
1M data points, uniformly distributed between 0 and 10.
<- tibble(Value=runif(1000000, 0, 10)) uni
This creates a tibble with a single column Value
.
Mean and standard deviation
mean(uni$Value)
[1] 4.999223
sd(uni$Value)
[1] 2.886293
Recall that uni$Value
indexes the column Value
in tibble (data frame) uni
.
Density plot
Plot the density of all values in column Value
in the tibble (data frame) uni
.
ggplot(uni, aes(x=Value)) + geom_density() + theme_bw()
Drawing samples
Create a total of 10 samples and sample 10 values from the population for each.
<- 10
S <- 100
N
<- replicate(S, sample(uni$Value, N, replace=F), simplify=F)
l
# Name each entry in the list
names(l) <- seq(1, S)
# Create a tibble from the list
<- as_tibble(l)
samples.wide
# What does this tibble look like?
samples.wide
# A tibble: 100 × 10
`1` `2` `3` `4` `5` `6` `7` `8` `9` `10`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 6.97 7.79 4.33 5.11 3.16 6.77 1.10 5.07 7.49 7.59
2 4.91 2.53 7.34 5.95 7.37 5.78 2.56 5.12 8.60 5.69
3 6.99 1.32 5.11 3.65 5.25 5.15 2.90 4.60 5.91 0.723
4 9.68 0.565 8.65 7.96 5.87 2.38 8.94 0.468 7.78 9.67
5 6.54 6.60 5.08 4.11 1.17 3.09 9.25 0.426 4.08 7.65
6 8.50 3.94 7.59 9.67 3.77 0.135 0.558 3.26 4.92 4.93
7 5.35 5.86 7.13 1.92 7.62 0.835 9.97 4.64 8.36 1.61
8 6.36 9.30 0.517 7.86 3.92 1.67 4.05 9.36 9.74 3.30
9 8.80 5.68 0.863 3.13 8.90 2.80 5.63 5.08 9.75 7.03
10 8.80 5.80 6.92 4.78 2.48 7.93 4.50 1.10 2.60 6.72
# ℹ 90 more rows
The end result is that samples.wide
is a tibble with S
columns – each column a sample of N
values from the population.
sample
samplesN
values fromuni$Value
without replacement.replicate
repeats the call tosample
S
times and returns a list of samples.seq
returns a vector ofS
values – simply enumerating the samples.names
uses the returned vector ofseq
to name the list elements from1
toS
.as_tibble
ultimately creates a tibble from the list with named elements.
We eventually want the samples to be in long format.
# samples.long <- ...
There many ways of creating a tibble in long format.
You can use your R magic to convert from wide to long, or create a tibble in long format directly (e.g., bind_rows
).
Plot population and samples.
The population from which we sample is plotted in black. The samples from this population are color-coded.
ggplot() + geom_density(data=samples.long, aes(x=Value, color=Sample)) +
geom_density(data=uni, aes(x=Value), color="black") +
theme_minimal()
Each geom_density here uses a different tibble (data frame).
Setting
color
orfill
within the aestheticsaes
leads to- grouping of values by labels and
- each density plot having a different line color. This example uses a default color palette, which can be changed. (Note: If
sample
was a numeric column, a color gradient would be used in stead of distinct colors.)
Setting
color
outside the aestheticsaes
leads to a single line color for the entire plot.When breaking lines, make sure to put the
+
at the end of the line, not on the new line.
Animations
Animated sampling
Let’s plot the population and each sample again, but with an animation.
<- ggplot() + geom_density(data=samples.long, aes(x=Value, color=Sample, fill=Sample)) +
p geom_density(data=uni, aes(x=Value), color="black") +
theme_minimal() +
labs(title = "Sample: {current_frame}") +
::transition_manual(Sample)
gganimate::animate(p, fps=20) gganimate
`nframes` and `fps` adjusted to match transition
gganimate
integrates really well with ggplot!
Galton board
Another example for a simulation: The Galton board.
<- function() {
makeplot ::quincunx(balls = 200, col.balls = rainbow(200))
animation
}::save_gif(makeplot(), "/tmp/galton.gif", 700, 500, delay=0.1) gifski
animation
package provides a number of example animations and simulations.
save_gif
function allows you to turn arbitrary sequences of plots (incl. non-ggplot ones) into an animation.
Computational notebooks
Rmarkdown notebooks
::render('in_class3.Rmd') rmarkdown
Rscript -e "rmarkdown::render('in_class3.Rmd')"
Rmarkdown has a number of desirable properties:
- Markdown syntax (as opposed to json).
- Integrated into the R ecosystem.
- Supported by modern rendering engines (e.g., Quarto).
Quarto notebooks
Quarto is a great option for developing notebooks. It supports code cells in different languages (incl. R and Python), and it supports a wide range of output formats.
Just like Rmarkdown, Quarto notebooks (.qmd) are written in markdown. (Quarto can render Rmd notebooks out of the box.)