Simulations

Distributions and sampling

Packages used

suppressPackageStartupMessages({
  library(tidyverse)
  library(gganimate)
  library(gifski)
  library(animation)
})

Random seed

Fix the random seed for reproducibility.

set.seed(1)

The actual seed value doesn’t really matter here. We fix it so that we can refer to particular samples.

Uniform distribution

1M data points, uniformly distributed between 0 and 10.

uni <- tibble(Value=runif(1000000, 0, 10))

This creates a tibble with a single column Value.

Mean and standard deviation

mean(uni$Value)
[1] 4.999223
sd(uni$Value)
[1] 2.886293

Recall that uni$Value indexes the column Value in tibble (data frame) uni.

Density plot

Plot the density of all values in column Value in the tibble (data frame) uni.

ggplot(uni, aes(x=Value)) + geom_density() + theme_bw()

Drawing samples

Create a total of 10 samples and sample 10 values from the population for each.

S <- 10
N <- 100

l <- replicate(S, sample(uni$Value, N, replace=F), simplify=F)

# Name each entry in the list
names(l) <- seq(1, S)

# Create a tibble from the list
samples.wide <- as_tibble(l)

# What does this tibble look like?
samples.wide
# A tibble: 100 × 10
     `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  6.97 7.79  4.33   5.11  3.16 6.77  1.10  5.07   7.49 7.59 
 2  4.91 2.53  7.34   5.95  7.37 5.78  2.56  5.12   8.60 5.69 
 3  6.99 1.32  5.11   3.65  5.25 5.15  2.90  4.60   5.91 0.723
 4  9.68 0.565 8.65   7.96  5.87 2.38  8.94  0.468  7.78 9.67 
 5  6.54 6.60  5.08   4.11  1.17 3.09  9.25  0.426  4.08 7.65 
 6  8.50 3.94  7.59   9.67  3.77 0.135 0.558 3.26   4.92 4.93 
 7  5.35 5.86  7.13   1.92  7.62 0.835 9.97  4.64   8.36 1.61 
 8  6.36 9.30  0.517  7.86  3.92 1.67  4.05  9.36   9.74 3.30 
 9  8.80 5.68  0.863  3.13  8.90 2.80  5.63  5.08   9.75 7.03 
10  8.80 5.80  6.92   4.78  2.48 7.93  4.50  1.10   2.60 6.72 
# ℹ 90 more rows
What does this code do?

The end result is that samples.wide is a tibble with S columns – each column a sample of N values from the population.

  • sample samples N values from uni$Value without replacement.

  • replicate repeats the call to sample S times and returns a list of samples.

  • seq returns a vector of S values – simply enumerating the samples.

  • names uses the returned vector of seq to name the list elements from 1 to S.

  • as_tibble ultimately creates a tibble from the list with named elements.

We eventually want the samples to be in long format.

# samples.long <- ...
Wide to long data

There many ways of creating a tibble in long format.

You can use your R magic to convert from wide to long, or create a tibble in long format directly (e.g., bind_rows).

Plot population and samples.

The population from which we sample is plotted in black. The samples from this population are color-coded.

ggplot() + geom_density(data=samples.long, aes(x=Value, color=Sample)) +
           geom_density(data=uni, aes(x=Value), color="black") +
           theme_minimal()

Notes
  • Each geom_density here uses a different tibble (data frame).

  • Setting color or fill within the aesthetics aes leads to

    1. grouping of values by labels and
    2. each density plot having a different line color. This example uses a default color palette, which can be changed. (Note: If sample was a numeric column, a color gradient would be used in stead of distinct colors.)
  • Setting color outside the aesthetics aes leads to a single line color for the entire plot.

  • When breaking lines, make sure to put the + at the end of the line, not on the new line.

Animations

Animated sampling

Let’s plot the population and each sample again, but with an animation.

p <- ggplot() + geom_density(data=samples.long, aes(x=Value, color=Sample, fill=Sample)) +
           geom_density(data=uni, aes(x=Value), color="black") +
           theme_minimal() +
           labs(title = "Sample: {current_frame}") +
           gganimate::transition_manual(Sample)
gganimate::animate(p, fps=20)
`nframes` and `fps` adjusted to match transition

gganimate integrates really well with ggplot!

Galton board

Another example for a simulation: The Galton board.

makeplot <- function() {
  animation::quincunx(balls = 200, col.balls = rainbow(200))
}
gifski::save_gif(makeplot(), "/tmp/galton.gif", 700, 500, delay=0.1)

The animation package provides a number of example animations and simulations.
The save_gif function allows you to turn arbitrary sequences of plots (incl. non-ggplot ones) into an animation.

Computational notebooks

Rmarkdown notebooks

rmarkdown::render('in_class3.Rmd')
Cmd-line: Rscript -e "rmarkdown::render('in_class3.Rmd')"

Rmarkdown has a number of desirable properties:

  • Markdown syntax (as opposed to json).
  • Integrated into the R ecosystem.
  • Supported by modern rendering engines (e.g., Quarto).

Quarto notebooks

Quarto is a great option for developing notebooks. It supports code cells in different languages (incl. R and Python), and it supports a wide range of output formats.

Just like Rmarkdown, Quarto notebooks (.qmd) are written in markdown. (Quarto can render Rmd notebooks out of the box.)