CSEP 590
Building Data Analysis Pipelines

Fall 2024

T test and linear regression
(two sides of the same coin)

Packages used

Tidyverse package

library(tidyverse)

Example data set

Create a data set with two groups

Two groups (grp-1 and grp-2), each with 1000 data points drawn from normal distributions with means of -0.1 and 0.1, respectively.

Grp is stored as a categorical (factor) variable, which makes the lm output easier to interpret.
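A minimal sketch of how such a data set could be built with the tidyverse. The slide does not give a standard deviation or a seed, so `sd = 1` and `set.seed(590)` are assumptions, and `df` is a hypothetical name; the column names `Grp` and `Value` follow the model formula used later.

```r
library(tidyverse)

set.seed(590)  # assumption: arbitrary seed, for reproducibility
n <- 1000      # data points per group

# Assumption: both groups have standard deviation 1 (not stated on the slide)
df <- tibble(
  Grp   = factor(rep(c("grp-1", "grp-2"), each = n)),  # factor, levels grp-1 then grp-2
  Value = c(rnorm(n, mean = -0.1, sd = 1),
            rnorm(n, mean =  0.1, sd = 1))
)

df
```

Because `Grp` is a factor, its levels are ordered `grp-1`, `grp-2`, so `grp-1` becomes the reference level in `lm`.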

Difference in means
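The observed difference in means can be computed directly. This sketch re-creates the data so it runs on its own (again assuming `sd = 1` and an arbitrary seed); the names `grp_means` and `diff_means` are illustrative.

```r
library(tidyverse)

set.seed(590)  # assumption: arbitrary seed
n <- 1000
df <- tibble(
  Grp   = factor(rep(c("grp-1", "grp-2"), each = n)),
  Value = c(rnorm(n, -0.1, 1), rnorm(n, 0.1, 1))  # assumption: sd = 1
)

# Mean of Value within each group (rows come back in factor-level order)
grp_means <- df %>%
  group_by(Grp) %>%
  summarise(mean_value = mean(Value))

grp_means

# Observed difference: grp-2 minus grp-1 (the population difference is 0.2)
diff_means <- grp_means$mean_value[2] - grp_means$mean_value[1]
diff_means
```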

Plot the two distributions
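One way to plot the two distributions is with overlaid densities and a dashed line at each group mean. This is a sketch using ggplot2 (loaded by the tidyverse); the data are re-created under the same assumptions (`sd = 1`, arbitrary seed), and the styling choices are illustrative.

```r
library(tidyverse)

set.seed(590)  # assumption: arbitrary seed
n <- 1000
df <- tibble(
  Grp   = factor(rep(c("grp-1", "grp-2"), each = n)),
  Value = c(rnorm(n, -0.1, 1), rnorm(n, 0.1, 1))  # assumption: sd = 1
)

grp_means <- df %>%
  group_by(Grp) %>%
  summarise(mean_value = mean(Value))

# Overlaid density curves, one per group, plus a dashed line at each group mean
p <- ggplot(df, aes(x = Value, fill = Grp)) +
  geom_density(alpha = 0.4) +
  geom_vline(data = grp_means,
             aes(xintercept = mean_value, color = Grp),
             linetype = "dashed") +
  labs(x = "Value", y = "Density", fill = "Group", color = "Group")

print(p)
```

With a true difference of only 0.2 standard deviations, the two curves overlap heavily, which is why a formal test is useful.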

Testing for significance

T test
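A sketch of the two-sample t test in R, on data re-created under the same assumptions (`sd = 1`, arbitrary seed). `var.equal = TRUE` requests the classic pooled-variance (Student's) t test, which is the version that corresponds exactly to `lm`; R's default is Welch's test (`var.equal = FALSE`).

```r
library(tidyverse)

set.seed(590)  # assumption: arbitrary seed
n <- 1000
df <- tibble(
  Grp   = factor(rep(c("grp-1", "grp-2"), each = n)),
  Value = c(rnorm(n, -0.1, 1), rnorm(n, 0.1, 1))  # assumption: sd = 1
)

# Pooled-variance two-sample t test; formula interface compares grp-1 vs. grp-2
t_res <- t.test(Value ~ Grp, data = df, var.equal = TRUE)
t_res

t_res$statistic  # t = (mean grp-1 - mean grp-2) / pooled standard error
t_res$parameter  # degrees of freedom: n1 + n2 - 2
t_res$p.value
```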

Linear regression
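The same comparison as a linear model with `Grp` as the only predictor. Data are re-created under the same assumptions (`sd = 1`, arbitrary seed); `fit` is an illustrative name.

```r
library(tidyverse)

set.seed(590)  # assumption: arbitrary seed
n <- 1000
df <- tibble(
  Grp   = factor(rep(c("grp-1", "grp-2"), each = n)),
  Value = c(rnorm(n, -0.1, 1), rnorm(n, 0.1, 1))  # assumption: sd = 1
)

# Linear model with a single categorical predictor
fit <- lm(Value ~ Grp, data = df)
summary(fit)

coef(fit)  # "(Intercept)" and "Grpgrp-2" (grp-2 relative to reference grp-1)
```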

Interpretation: lm vs. t.test

The linear model uses a single, categorical predictor (Grp)

  • By default, lm uses dummy (treatment) coding for the two levels of Grp, with grp-1 as the reference level:
    • 0 = grp-1
    • 1 = grp-2
  • The model being fit is: Value = b0 + b1 * Grp + error, with:
    • b0 (intercept): the mean of grp-1, and
    • b1 (Grp coefficient): the difference between the group means, mean(grp-2) - mean(grp-1).
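The interpretation above can be checked numerically. This sketch (same assumptions: `sd = 1`, arbitrary seed) compares the `lm` coefficients with the group means, and the `lm` t statistic for `b1` with the pooled t test. Note the sign: `t.test(Value ~ Grp)` reports grp-1 minus grp-2, while `b1` is grp-2 minus grp-1, so the t statistics agree up to sign and the p-values match.

```r
library(tidyverse)

set.seed(590)  # assumption: arbitrary seed
n <- 1000
df <- tibble(
  Grp   = factor(rep(c("grp-1", "grp-2"), each = n)),
  Value = c(rnorm(n, -0.1, 1), rnorm(n, 0.1, 1))  # assumption: sd = 1
)

fit   <- lm(Value ~ Grp, data = df)
t_res <- t.test(Value ~ Grp, data = df, var.equal = TRUE)

grp1_mean <- mean(df$Value[df$Grp == "grp-1"])
grp2_mean <- mean(df$Value[df$Grp == "grp-2"])

b0 <- unname(coef(fit)[1])  # intercept
b1 <- unname(coef(fit)[2])  # Grp coefficient ("Grpgrp-2")

# b0 is the grp-1 mean; b1 is the grp-2 mean minus the grp-1 mean
all.equal(b0, grp1_mean)
all.equal(b1, grp2_mean - grp1_mean)

# t statistic for b1 vs. pooled t test: same magnitude, opposite sign; same p-value
lm_t <- summary(fit)$coefficients[2, "t value"]
lm_p <- summary(fit)$coefficients[2, "Pr(>|t|)"]
all.equal(lm_t, -unname(t_res$statistic))
all.equal(lm_p, t_res$p.value)
```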

Work out the math (OLS optimization) to see why the above must be true.
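One way to work it out. The notation here is introduced for this derivation: x_i in {0, 1} is the dummy-coded Grp of observation i, y_i is its Value, and ȳ_1, ȳ_2 are the two group means. OLS minimizes the sum of squared residuals S; setting both partial derivatives to zero gives:

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i} \left(y_i - b_0 - b_1 x_i\right)^2
  = \sum_{i \in \text{grp-1}} (y_i - b_0)^2 + \sum_{i \in \text{grp-2}} (y_i - b_0 - b_1)^2 \\
\frac{\partial S}{\partial b_1} &= -2 \sum_{i \in \text{grp-2}} (y_i - b_0 - b_1) = 0
  \;\Rightarrow\; b_0 + b_1 = \bar{y}_2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i \in \text{grp-1}} (y_i - b_0) - 2 \sum_{i \in \text{grp-2}} (y_i - b_0 - b_1) = 0
  \;\Rightarrow\; \sum_{i \in \text{grp-1}} (y_i - b_0) = 0
  \;\Rightarrow\; b_0 = \bar{y}_1 \\
b_1 &= \bar{y}_2 - \bar{y}_1
\end{aligned}
```

In the third line, the grp-2 sum vanishes because of the condition from the second line. Since S is a convex quadratic in (b0, b1), this stationary point is the minimum.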