CSE 599K
Empirical Research Methods

Winter 2025

NHST: statistical significance and effect size

Packages used

tidyverse and effsize

library(tidyverse)
library(effsize)

Example data set

Create a data set with two groups

Create a tibble with two groups (Treat and Ctrl), each with 5 data points, e.g., the durations of a coding task.
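The slide's actual values are not reproduced in these notes. Here is a minimal sketch with made-up durations (hypothetical, no particular unit). It uses the name `t` for the wide tibble to match the `t$Ctrl` call used later:

```r
library(tidyverse)

# Hypothetical task durations, one column per group (made up, NOT the slide's data).
t <- tibble(
  Treat = c(25, 28, 30, 33, 36),
  Ctrl  = c(32, 35, 38, 41, 44)
)
t
```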

Is this a significant difference?

Tidy up the data
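The slide's exact code is not shown, so this assumes the data are tidied with tidyr's `pivot_longer()`. The column names `Grp` and `Duration` follow the `Duration~Grp` formula, and `t_long` is a hypothetical name:

```r
library(tidyverse)

# Hypothetical wide data (made up).
t <- tibble(
  Treat = c(25, 28, 30, 33, 36),
  Ctrl  = c(32, 35, 38, 41, 44)
)

# Long ("tidy") format: one row per observation, with its group label in Grp.
t_long <- t %>%
  pivot_longer(cols = c(Treat, Ctrl), names_to = "Grp", values_to = "Duration")
t_long
```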

Point plot of the data
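The slide's plotting code is not shown. This sketch draws the point plot with ggplot2 on made-up long-format data:

```r
library(tidyverse)

# Hypothetical long-format data (made up).
t_long <- tibble(
  Grp      = rep(c("Treat", "Ctrl"), each = 5),
  Duration = c(25, 28, 30, 33, 36, 32, 35, 38, 41, 44)
)

# One point per observation, grouped on the x axis.
p <- ggplot(t_long, aes(x = Grp, y = Duration)) +
  geom_point(size = 3) +
  labs(x = "Group", y = "Duration")
p
```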

Testing for significance

Parametric t-test
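The call below is a sketch on made-up long-format data. R's `t.test()` runs Welch's t-test by default (`var.equal = FALSE`), so this does not assume equal variances in the two groups:

```r
library(tidyverse)

# Hypothetical long-format data (made up).
t_long <- tibble(
  Grp      = rep(c("Treat", "Ctrl"), each = 5),
  Duration = c(25, 28, 30, 33, 36, 32, 35, 38, 41, 44)
)

# Welch two-sample t-test, two-sided by default.
res <- t.test(Duration ~ Grp, data = t_long)
res
```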

Formula syntax vs. passing individual vectors

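The formula syntax (Duration~Grp) on long data is equivalent to calling t.test with two vectors on wide data: t.test(t$Ctrl, t$Treat). The formula interface orders the groups by factor level (alphabetical by default), so Ctrl is the first sample and Treat the second.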
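This sketch checks the equivalence on hypothetical data. When the vectors are passed in factor-level order (Ctrl first), both calls should report the same t statistic and p-value:

```r
library(tidyverse)

# Hypothetical wide data; t is the name used on the slide.
t <- tibble(
  Treat = c(25, 28, 30, 33, 36),
  Ctrl  = c(32, 35, 38, 41, 44)
)
t_long <- pivot_longer(t, cols = c(Treat, Ctrl), names_to = "Grp", values_to = "Duration")

# Formula interface on long data: "Ctrl" is the first factor level.
res_formula <- t.test(Duration ~ Grp, data = t_long)
# Vector interface on wide data, samples passed in the same order.
res_vectors <- t.test(t$Ctrl, t$Treat)

c(formula = res_formula$p.value, vectors = res_vectors$p.value)
```

Passing the vectors in the opposite order flips the sign of the t statistic. The two-sided p-value stays the same, but the meaning of a one-sided alternative reverses.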

One-sided vs. two-sided tests

A two-sided test (alternative hypothesis: the groups differ in either direction) is the default.

Set the alternative argument ("less" or "greater") for a one-sided test.
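This sketch assumes the hypothesis is that the treatment shortens the task, so Ctrl durations are larger. Because `Ctrl` is the first factor level, "greater" refers to mean(Ctrl) - mean(Treat) > 0. The data are made up:

```r
library(tidyverse)

# Hypothetical long-format data (made up).
t_long <- tibble(
  Grp      = rep(c("Treat", "Ctrl"), each = 5),
  Duration = c(25, 28, 30, 33, 36, 32, 35, 38, 41, 44)
)

# Default: two-sided alternative.
res_two <- t.test(Duration ~ Grp, data = t_long)
# One-sided: H1 is that Ctrl (first factor level) has the larger mean.
res_one <- t.test(Duration ~ Grp, data = t_long, alternative = "greater")

c(two_sided = res_two$p.value, one_sided = res_one$p.value)
```

Here the t statistic is positive, so the one-sided p-value is half the two-sided one. With alternative = "less" it would instead be close to 1.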

Non-parametric U test

Compute the U test result “by hand”

Create example data
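Below is a hypothetical data set for the U test, with made-up values rather than the slide's. The values contain no ties, so R computes an exact p-value, which keeps the hand calculation simple. `u` is an illustrative name:

```r
library(tidyverse)

# Hypothetical durations (made up), no tied values within or across groups.
u <- tibble(
  Treat = c(12, 15, 18, 22, 30),
  Ctrl  = c(20, 25, 27, 35, 40)
)
u
```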

Is this a significant difference?

Tidy up the data (for plotting)

Point plot of the data

Expected result

Expected results of the one-sided wilcox.test (U test) and the A12 effect size
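This sketch shows the calls that produce these results, using made-up data with no ties. With two samples, `wilcox.test()` performs the Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test). `VD.A()` from effsize computes the Vargha-Delaney A12. Passing `ctrl` first together with alternative "greater" encodes the assumed hypothesis that Ctrl durations tend to be larger:

```r
library(effsize)

# Hypothetical durations (made up, no ties).
treat <- c(12, 15, 18, 22, 30)
ctrl  <- c(20, 25, 27, 35, 40)

# One-sided Wilcoxon rank-sum (Mann-Whitney U) test; exact p-value (small n, no ties).
u_res <- wilcox.test(ctrl, treat, alternative = "greater")
u_res$statistic  # W = 21
u_res$p.value    # 12/252, about 0.0476

# Vargha-Delaney A12: probability that a random value of the first sample
# exceeds a random value of the second (ties count as 1/2).
a12 <- VD.A(ctrl, treat)
a12$estimate     # 0.84
```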

How do we compute the p value and A12?

All possible pairs with expand
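This sketch covers the pairing step. The slide mentions expand; the code uses tidyr's `expand_grid()`, which keeps every pair. By contrast, `expand()` and `crossing()` work on unique values, so they would silently drop pairs if a group contained repeated values:

```r
library(tidyverse)

# Hypothetical durations (made up).
treat <- c(12, 15, 18, 22, 30)
ctrl  <- c(20, 25, 27, 35, 40)

# All (Ctrl, Treat) pairs: one row per combination, 5 x 5 = 25 rows.
pairs <- expand_grid(Ctrl = ctrl, Treat = treat)
pairs
```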

Compute “wins”

Sum all “wins”
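This sketch counts and sums the wins on made-up data. Ties count as half a win, which matches how both the W statistic and A12 treat ties between groups:

```r
library(tidyverse)

# Hypothetical durations (made up, no ties).
treat <- c(12, 15, 18, 22, 30)
ctrl  <- c(20, 25, 27, 35, 40)

pairs <- expand_grid(Ctrl = ctrl, Treat = treat) %>%
  # A "win" for Ctrl: 1 if its value is larger, 0.5 for a tie, 0 otherwise.
  mutate(win = case_when(
    Ctrl > Treat  ~ 1,
    Ctrl == Treat ~ 0.5,
    TRUE          ~ 0
  ))

# The sum of wins is the W statistic reported by wilcox.test(ctrl, treat).
W <- sum(pairs$win)
# Dividing by the number of pairs gives A12.
A12 <- W / nrow(pairs)
c(W = W, A12 = A12)
```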

Rank all observations
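W can also be computed from ranks instead of pairs. Rank all pooled observations, sum the ranks of one group, and subtract the smallest possible rank sum for that group, m(m + 1)/2. Here is a sketch on made-up values:

```r
# Hypothetical durations (made up, no ties).
treat <- c(12, 15, 18, 22, 30)
ctrl  <- c(20, 25, 27, 35, 40)
m <- length(ctrl)
n <- length(treat)

# Rank the pooled observations (tied values would receive average ranks).
r <- rank(c(ctrl, treat))

# Rank sum of Ctrl; its observations come first in the pooled vector.
R_ctrl <- sum(r[seq_len(m)])    # 36

# Subtract the minimum possible rank sum for m observations.
W <- R_ctrl - m * (m + 1) / 2   # 21, the number of Ctrl "wins"
A12 <- W / (m * n)              # 0.84
```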

Compute the p value

p-value: the probability, assuming the null hypothesis is true, of observing the given outcome (W score) or a more extreme one.

Exercise: work out the math

  • How many possible ranking permutations are there in total?
  • How many ranking permutations have the same W score as the observed ranking, or a more extreme one?
  • (For a two-tailed test, consider extremes on both ends of the distribution.)
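This sketch checks the exercise numerically, assuming no ties. Under the null hypothesis, every split of the ranks 1..N into a Ctrl set of size m and a Treat set of size n is equally likely. Counting all N! orderings instead multiplies both counts by m! n!, so the p-value is unchanged:

```r
# Hypothetical durations (made up, no ties).
treat <- c(12, 15, 18, 22, 30)
ctrl  <- c(20, 25, 27, 35, 40)
m <- length(ctrl)
n <- length(treat)
N <- m + n

# Observed W from the pooled ranks.
r <- rank(c(ctrl, treat))
W_obs <- sum(r[seq_len(m)]) - m * (m + 1) / 2

# Every way to give m of the ranks 1..N to Ctrl: one column per split.
splits <- combn(N, m)
W_null <- colSums(splits) - m * (m + 1) / 2

# One-tailed (H1: Ctrl larger): splits with W at least as large as observed.
n_extreme <- sum(W_null >= W_obs)
p_one <- n_extreme / ncol(splits)

# Two-tailed: W at least as far from its null mean m * n / 2, on either side.
center <- m * n / 2
p_two <- mean(abs(W_null - center) >= abs(W_obs - center))

c(splits = ncol(splits), extreme = n_extreme, p_one = p_one, p_two = p_two)
```

Both p-values should match the exact p-values from wilcox.test(ctrl, treat) with the corresponding alternative.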

How do we compute the z test or t test result “by hand”?
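This sketch computes Welch's t-test (the default in R's `t.test()`) by hand on made-up data. The last p-value uses the standard normal distribution instead of the t distribution, i.e., the large-sample z approximation:

```r
# Hypothetical durations (made up).
treat <- c(25, 28, 30, 33, 36)
ctrl  <- c(32, 35, 38, 41, 44)

# Squared standard errors of each group mean.
se2_c <- var(ctrl) / length(ctrl)
se2_t <- var(treat) / length(treat)

# Welch t statistic: difference in means divided by its standard error.
t_stat <- (mean(ctrl) - mean(treat)) / sqrt(se2_c + se2_t)

# Welch-Satterthwaite degrees of freedom.
welch_df <- (se2_c + se2_t)^2 /
  (se2_c^2 / (length(ctrl) - 1) + se2_t^2 / (length(treat) - 1))

# Two-sided p-values: t distribution vs. normal (z) approximation.
p_t <- 2 * pt(-abs(t_stat), welch_df)
p_z <- 2 * pnorm(-abs(t_stat))
c(t = t_stat, df = welch_df, p_t = p_t, p_z = p_z)
```

With only 5 observations per group, the normal approximation understates the p-value because the t distribution has heavier tails.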