NHST: a concrete application
1. Packages used
library(tidyverse)
library(effsize)
2. Load the dataset
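The loading code for this step is not shown here. As a minimal sketch, assume the benchmark results come as a CSV file with the five columns described in the next step. The file and its values below are made up for illustration; the temporary file stands in for the real dataset.

```r
library(tidyverse)

# Hypothetical stand-in for the real benchmark file: a few made-up rows
# written to a temporary CSV so that the example runs on its own.
csv_path <- tempfile(fileext = ".csv")
write_lines(
  c("Subject,VariantID,RunID,Baseline,MySystem",
    "tax,1,1,2.10,1.40",
    "tax,1,2,2.30,1.50",
    "triangle,1,1,0.80,0.70"),
  csv_path
)

# Declare the column types explicitly so that parsing problems are reported
# instead of being silently absorbed by type guessing.
runtimes <- read_csv(
  csv_path,
  col_types = cols(
    Subject   = col_character(),
    VariantID = col_integer(),
    RunID     = col_integer(),
    Baseline  = col_double(),
    MySystem  = col_double()
  )
)
```

With the real data, only csv_path needs to change.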
3. Inspect the data set
This dataset provides benchmark results (runtime data) for a new program analysis approach (MySystem), compared to a baseline approach (Baseline). Specifically, the columns provide the following information:
Subject: One of three benchmark programs (tax, tictactoe, triangle).
VariantID: ID of a program variant. Each subject program has a different number of variants. For a given subject program, all variants are enumerated, starting with an ID of 1.
RunID: ID of an analysis run. Each program variant was analyzed 5 times to account for variability in runtime measurements.
Baseline: The runtime of the baseline system.
MySystem: The runtime of the new system.
Additional data expectations: runtime is strictly positive.
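One possible way to inspect the data and check the stated expectation is sketched below. The rows are made up and only mimic the column layout described above; variants_per_subject is a name introduced for this example.

```r
library(tidyverse)

# Made-up rows in the wide format described above (illustration only).
runtimes <- tribble(
  ~Subject,    ~VariantID, ~RunID, ~Baseline, ~MySystem,
  "tax",       1L,         1L,     2.10,      1.40,
  "tax",       1L,         2L,     2.30,      1.50,
  "tictactoe", 1L,         1L,     5.00,      3.90,
  "triangle",  1L,         1L,     0.80,      0.70
)

glimpse(runtimes)   # column names and types
summary(runtimes)   # value ranges of the numeric columns

# Number of distinct variants per subject program.
variants_per_subject <- runtimes %>%
  group_by(Subject) %>%
  summarize(Variants = n_distinct(VariantID), .groups = "drop")

# Data expectation from the text: runtime is strictly positive.
stopifnot(all(runtimes$Baseline > 0), all(runtimes$MySystem > 0))
```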
4. Tidy up the data
The output data frame should have the following columns (the order does not matter):
Subject
VariantID
RunID
Approach
Runtime
5. Aggregate runtime data
Recall that each variant was analyzed 5 times (i.e., a deterministic program was executed 5 times on the same variant with identical inputs). For each variant and approach, aggregate the 5 related runtime results into a single value, using either the mean or the median.
Think about the pros/cons and justify your choice of mean vs. median. (Your choice may be informed by data or domain knowledge.)
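To compare the two options, the following sketch computes both aggregates side by side on made-up data in which one of the five Baseline runs is unusually slow. The column names RuntimeMean and RuntimeMedian are introduced for this example only.

```r
library(tidyverse)

# Made-up long-format data: one variant, 5 runs per approach.
runtimes_long <- tibble(
  Subject   = "tax",
  VariantID = 1L,
  RunID     = rep(1:5, times = 2),
  Approach  = rep(c("Baseline", "MySystem"), each = 5),
  Runtime   = c(2.0, 2.1, 2.2, 2.0, 6.0,   # Baseline, including one slow run
                1.4, 1.5, 1.5, 1.6, 1.5)   # MySystem
)

# Collapse the 5 runs of each (Subject, VariantID, Approach) into one row.
runtimes_agg <- runtimes_long %>%
  group_by(Subject, VariantID, Approach) %>%
  summarize(
    RuntimeMean   = mean(Runtime),
    RuntimeMedian = median(Runtime),
    .groups = "drop"
  )
```

In this example the single slow Baseline run pulls the mean up to 2.86, while the median stays at 2.1. Whether such a run is noise to be ignored or signal to be kept is exactly the judgment the choice encodes.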
6. Plot the aggregated data
We use color coding and faceting for this visualization.
Read the syntax facet_grid(Subject~.) as: group the data by Subject and plot each subject on a separate row. More generally, facet_grid allows you to group your data and plot these groups individually by rows or columns (Syntax: <rows> ~ <cols>). For example, the following four configurations group and render the same data in different ways:
facet_grid(Subject~.)
facet_grid(Subject~Approach)
facet_grid(.~Subject)
facet_grid(.~Subject+Approach)
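The plotting code itself is not shown here. As a sketch of how color coding and faceting can be combined, the following example draws one density curve per approach and one row per subject. The data is simulated purely for illustration, and the choice of geom_density is an assumption, not necessarily the plot type used in the lecture.

```r
library(tidyverse)

# Simulated aggregated runtimes (illustration only, not real benchmark data).
set.seed(1)
runtimes_agg <- expand_grid(
  Subject   = c("tax", "tictactoe", "triangle"),
  VariantID = 1:20,
  Approach  = c("Baseline", "MySystem")
) %>%
  mutate(
    Runtime = rlnorm(n(),
                     meanlog = if_else(Approach == "Baseline", 1.0, 0.7),
                     sdlog   = 0.4)
  )

# Color encodes the approach; facet_grid(Subject ~ .) puts each subject on its own row.
p <- ggplot(runtimes_agg, aes(x = Runtime, color = Approach)) +
  geom_density() +
  facet_grid(Subject ~ .)
p
```

Swapping the last layer for any of the four configurations above changes only the panel layout, not the data being drawn.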
A future lecture will discuss best practices for choosing a suitable visualization, depending on the underlying data and research questions.
7. Transform the data
It is reasonable to assume that the runtime data is log-normally distributed. Add a column RuntimeLog that takes the log of the Runtime column.
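A minimal sketch of this transformation with mutate, on made-up values:

```r
library(tidyverse)

# Made-up aggregated runtimes (illustration only).
runtimes_agg <- tibble(
  Subject   = "tax",
  VariantID = 1:3,
  Approach  = "Baseline",
  Runtime   = c(1, exp(1), 10)
)

# log() is the natural logarithm in R; it is defined for every row here
# because runtime is strictly positive.
runtimes_agg <- runtimes_agg %>%
  mutate(RuntimeLog = log(Runtime))
```

If base-10 units are easier to read, log10() works the same way and differs from log() only by a constant factor.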
8. Plot transformed data
9. NHST: test the difference(s)
This is a simple (arguably too simple) model that uses the full data set. Think about alternative modeling approaches and implement them.
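The test used in the lecture is not shown here. One plausible reading of a simple model on the full dataset is a two-sample t-test on the log runtimes that ignores both the subject and the pairing of variants, and one possible alternative is a paired t-test per variant. The sketch below shows both on simulated data; the data, the variable names, and the choice of tests are assumptions made for illustration.

```r
library(tidyverse)
library(effsize)

# Simulated aggregated runtimes (illustration only, not real benchmark data).
set.seed(42)
runtimes_agg <- expand_grid(
  Subject   = c("tax", "tictactoe", "triangle"),
  VariantID = 1:20,
  Approach  = c("Baseline", "MySystem")
) %>%
  mutate(
    Runtime    = rlnorm(n(),
                        meanlog = if_else(Approach == "Baseline", 1.0, 0.7),
                        sdlog   = 0.4),
    RuntimeLog = log(Runtime),
    Approach   = factor(Approach)
  )

# Simple model: Welch two-sample t-test over all variants of all subjects.
simple_test <- t.test(RuntimeLog ~ Approach, data = runtimes_agg)

# Effect size (Cohen's d) for the same comparison.
effect <- cohen.d(RuntimeLog ~ Approach, data = runtimes_agg)

# Alternative: both approaches ran on the same variants, so pair them.
paired <- runtimes_agg %>%
  select(Subject, VariantID, Approach, RuntimeLog) %>%
  pivot_wider(names_from = Approach, values_from = RuntimeLog)
paired_test <- t.test(paired$Baseline, paired$MySystem, paired = TRUE)

simple_test
effect
paired_test
```

Other alternatives worth trying include running the test separately per subject or fitting a model that includes Subject as a factor.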