CSE 599K
Empirical Research Methods

Winter 2025

NHST: z test vs. t test

Packages used

Tidyverse (subset) and BSDA

library(dplyr)
library(ggplot2)
library(tidyr)
# Provides z.test as a reference for our computations
library(BSDA)

Example data set

Create a population and 100 samples

Create a tibble of samples with two columns (Value and Sample).
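
A minimal sketch of this step. The seed, the population (10,000 standard-normal values), and the sample size of 30 are assumptions made for illustration; only the count of 100 samples and the column names Value and Sample come from the slide.

```r
library(dplyr)

set.seed(1)          # assumed seed, only for reproducibility
n_samples   <- 100   # from the slide
sample_size <- 30    # assumed number of observations per sample

# Assumed population: 10,000 draws from a standard normal distribution
population <- rnorm(10000, mean = 0, sd = 1)

# replicate() returns a sample_size x n_samples matrix (one column per sample);
# as.vector() flattens it column by column, matching the Sample labels below
samples <- tibble(
  Value  = as.vector(replicate(n_samples, sample(population, sample_size))),
  Sample = rep(seq_len(n_samples), each = sample_size)
)
```

Each row holds one observation, so the tibble is already in long format and can be grouped by Sample later.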

How can we validate the samples tibble?
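
One possible set of checks, sketched under the same assumed setup (seed 1, population of 10,000 standard-normal values, 30 observations per sample): every sample should have the expected size, contain no missing values, and consist only of values that occur in the population.

```r
library(dplyr)

# Assumed setup, not taken from the slides
set.seed(1)
population <- rnorm(10000)
samples <- tibble(
  Value  = as.vector(replicate(100, sample(population, 30))),
  Sample = rep(1:100, each = 30)
)

# One row of diagnostics per sample
per_sample <- samples %>%
  group_by(Sample) %>%
  summarize(
    N            = n(),
    Missing      = sum(is.na(Value)),
    InPopulation = all(Value %in% population)
  )

# Fail loudly if any structural expectation is violated
stopifnot(
  nrow(per_sample) == 100,
  all(per_sample$N == 30),
  all(per_sample$Missing == 0),
  all(per_sample$InPopulation)
)

glimpse(samples)  # quick look at column types and the first values
```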

Plot all samples
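
A sketch of one way to show all samples at once: one box plot per sample, with the population mean as a dashed reference line. The choice of geom and the data setup are assumptions; the slide's original plot may look different.

```r
library(dplyr)
library(ggplot2)

# Assumed setup, not taken from the slides
set.seed(1)
population <- rnorm(10000)
samples <- tibble(
  Value  = as.vector(replicate(100, sample(population, 30))),
  Sample = rep(1:100, each = 30)
)

# group = Sample draws a separate box for each of the 100 samples
p <- ggplot(samples, aes(x = Sample, y = Value, group = Sample)) +
  geom_boxplot(outlier.size = 0.5) +
  geom_hline(yintercept = mean(population), linetype = "dashed") +
  labs(x = "Sample", y = "Value")

print(p)
```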

What is the expected number of samples whose p-value is < 0.05?
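
As a rough check of the reasoning: if the null hypothesis is true for every sample, each test rejects with probability 0.05, so the expected count is 100 × 0.05 = 5. The simulation below is a sketch that assumes a standard normal population, samples of size 30, and a two-tailed z test; the observed count varies from run to run around that expectation.

```r
alpha     <- 0.05
n_samples <- 100
expected  <- n_samples * alpha   # expected number of p-values below alpha

set.seed(1)                      # assumed seed, only for reproducibility
p_values <- replicate(n_samples, {
  x <- rnorm(30)                 # H0 is true: population mean 0, sd 1
  z <- mean(x) / (1 / sqrt(30))  # z score with known sigma = 1
  2 * pnorm(-abs(z))             # two-tailed p-value
})

observed <- sum(p_values < alpha)
cat("expected:", expected, " observed:", observed, "\n")
```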

Sample statistics

Augment the code to compute the z and t scores for each sample. Then, transform the aggregated tibble into long format.

Sample statistics (solution)

Augment the code to compute the z and t scores for each sample. Then, transform the aggregated tibble into long format.
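
The slide's own solution code is not reproduced here. The following is a sketch of one possible solution under the assumed setup (seed 1, standard-normal population of 10,000 values, samples of size 30). The column names Mean, SD, N, and the value column Value are hypothetical; Score matches the name used on the plotting slide.

```r
library(dplyr)
library(tidyr)

# Assumed setup, not taken from the slides
set.seed(1)
population <- rnorm(10000)
samples <- tibble(
  Value  = as.vector(replicate(100, sample(population, 30))),
  Sample = rep(1:100, each = 30)
)

mu    <- mean(population)  # population mean, used as the H0 value
sigma <- sd(population)    # population sd (N - 1 vs. N is negligible here)

# z uses the known population sd; t uses each sample's own sd
stats <- samples %>%
  group_by(Sample) %>%
  summarize(
    Mean = mean(Value),
    SD   = sd(Value),
    N    = n(),
    z    = (Mean - mu) / (sigma / sqrt(N)),
    t    = (Mean - mu) / (SD / sqrt(N))
  )

# Long format: one row per (Sample, Score) pair
stats_long <- stats %>%
  pivot_longer(cols = c(z, t), names_to = "Score", values_to = "Value")
```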

Plot the aggregated data

Plot all t and z scores as a function of the sample mean. Put Mean on the x axis and color-code Score.
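
A sketch of this plot under the same assumed setup as before. Because the z score divides by a fixed population standard deviation, it is an exact linear function of the sample mean, whereas the t score divides by each sample's own standard deviation.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Assumed setup, not taken from the slides
set.seed(1)
population <- rnorm(10000)
samples <- tibble(
  Value  = as.vector(replicate(100, sample(population, 30))),
  Sample = rep(1:100, each = 30)
)
mu    <- mean(population)
sigma <- sd(population)

# Aggregate per sample, then reshape so Score can be mapped to color
stats_long <- samples %>%
  group_by(Sample) %>%
  summarize(
    Mean = mean(Value),
    z    = (Mean - mu) / (sigma / sqrt(n())),
    t    = (Mean - mu) / (sd(Value) / sqrt(n()))
  ) %>%
  pivot_longer(cols = c(z, t), names_to = "Score", values_to = "Value")

p <- ggplot(stats_long, aes(x = Mean, y = Value, color = Score)) +
  geom_point(alpha = 0.7) +
  labs(x = "Mean", y = "Score value", color = "Score")

print(p)
```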

Questions

  • Why do the t scores vary around the diagonal, but the z scores do not?

  • Which of these z and t scores have a p-value < 0.05?

Finding the critical values for t and z

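Recall the xnorm and xt function families, where x is one of d (density), p (cumulative probability), q (quantile), or r (random draws); for example, qnorm and qt.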

Why do we use 0.025 and 0.975 here?
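
A short sketch of the computation, assuming a two-tailed test at a significance level of 0.05 and a sample size of 30 (the sample size is an assumption, not taken from the slides). A two-tailed test splits the 0.05 across both tails, which gives the quantiles 0.025 and 0.975.

```r
alpha <- 0.05
n     <- 30   # assumed sample size; determines the degrees of freedom for t

# Lower and upper tail each receive alpha / 2
probs <- c(alpha / 2, 1 - alpha / 2)   # 0.025 and 0.975

z_crit <- qnorm(probs)                 # critical z values, about -1.96 and 1.96
t_crit <- qt(probs, df = n - 1)        # critical t values, about -2.045 and 2.045

print(z_crit)
print(t_crit)
```

The critical t values are wider than the critical z values because the t distribution has heavier tails for small samples.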

Filter samples based on critical z values

Reason through the difference between a one-tailed and a two-tailed test.
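
A sketch that contrasts the two cases, again under the assumed setup (seed 1, standard-normal population, samples of size 30, significance level 0.05). The two-tailed filter keeps samples in either tail beyond the 0.025 and 0.975 quantiles; the one-tailed (upper) filter puts the full 0.05 into one tail and therefore uses the 0.95 quantile.

```r
library(dplyr)

# Assumed setup, not taken from the slides
set.seed(1)
population <- rnorm(10000)
samples <- tibble(
  Value  = as.vector(replicate(100, sample(population, 30))),
  Sample = rep(1:100, each = 30)
)
mu    <- mean(population)
sigma <- sd(population)

stats <- samples %>%
  group_by(Sample) %>%
  summarize(Mean = mean(Value), z = (Mean - mu) / (sigma / sqrt(n())))

z_crit_two <- qnorm(c(0.025, 0.975))  # two-tailed: alpha split across tails
z_crit_one <- qnorm(0.95)             # one-tailed (upper): all alpha in one tail

# Samples whose z score falls in a rejection region
two_tailed <- stats %>% filter(z < z_crit_two[1] | z > z_crit_two[2])
one_tailed <- stats %>% filter(z > z_crit_one)

print(two_tailed)
print(one_tailed)
```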

P-value for one of the samples

Compute the p-value for both the z and t score of sample 60.

Note that the optional argument mu of z.test and t.test defaults to 0, so pass the hypothesized mean explicitly whenever it differs from 0.
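
A sketch of the computation for sample 60, assuming two-tailed tests and the same setup as in the earlier sketches (seed 1, standard-normal population of 10,000 values, samples of size 30). The manual p-values are compared against t.test from base R and, if the BSDA package is installed, against z.test.

```r
# Assumed setup, not taken from the slides
set.seed(1)
population <- rnorm(10000)
draws <- replicate(100, sample(population, 30))  # one column per sample
x <- draws[, 60]                                 # sample 60

mu    <- mean(population)  # hypothesized mean under H0
sigma <- sd(population)    # treated as the known population sd
n     <- length(x)

z_score <- (mean(x) - mu) / (sigma / sqrt(n))
t_score <- (mean(x) - mu) / (sd(x) / sqrt(n))

# Two-tailed p-values: probability mass in both tails beyond |score|
p_z <- 2 * pnorm(-abs(z_score))
p_t <- 2 * pt(-abs(t_score), df = n - 1)

# Reference values; mu is passed explicitly because its default is 0
p_t_ref <- t.test(x, mu = mu)$p.value
if (requireNamespace("BSDA", quietly = TRUE)) {
  p_z_ref <- BSDA::z.test(x, mu = mu, sigma.x = sigma)$p.value
  print(c(manual = p_z, z.test = p_z_ref))
}
print(c(manual = p_t, t.test = p_t_ref))
```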