{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "title: \"In-class exercise Statistical Significance and Power: Solutions\"\n", "author: \"Name1, Name2\"\n", "\n", "## Part 1: Parametric vs. non-parametric statistics\n", "\n", "### Instructions\n", "\n", "#### 1. Install (if needed) and load the following packages" ] }, { "cell_type": "code", "metadata": {}, "source": [ "suppressPackageStartupMessages({\n", " library(tidyverse)\n", " library(assertthat)\n", " library(effsize)\n", "})" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. Load the `runtime.csv` dataset from the course website" ] }, { "cell_type": "code", "metadata": {}, "source": [ "rt_data <- read_csv(\"https://homes.cs.washington.edu/~rjust/courses/CSEP590/in_class/04_stats/data/runtime.csv\", show_col_types=F) " ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For faster (local) exploration you may download the dataset and load it from a\n", "local file. *However, make sure that your final submission reads the data from\n", "the given URL.*\n", "\n", "#### 3. Inspect the data set" ] }, { "cell_type": "code", "metadata": {}, "source": [ "head(rt_data)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This dataset provides benchmark results (runtime data) for a new program\n", "analysis approach (`MySystem`), compared to a baseline approach (`Baseline`).\n", "Specifically, the columns provide the following information:\n", "\n", "* `Subject`: One of three benchmark programs (*tax*, *tictactoe*, *triangle*).\n", "* `VariantID`: ID of a program variant. *Each subject program has a different\n", " number of variants*. For a given subject program, all variants are\n", " enumerated, starting with an ID of 1. The expected number of\n", " variants are as follows:\n", " - *tax*: 99\n", " - *tictactoe*: 268\n", " - *triangle*: 122 \n", "* `RunID`: *Each* program *variant* was *analyzed 5 times* to account for\n", " variability in runtime measurements. \n", "* `Baseline`: The *runtime* of the *baseline system*.\n", "* `MySystem`: The *runtime* of the *new system*.\n", "\n", "Additional data expectations: runtime is strictly positive and the data set is\n", "complete.\n", "\n", "#### 4. Validate the data set\n", "Given the summary above, test for 3 expected properties of the data set, not\n", "counting the example assertion on number of subject programs (see Q1).\n", "\n", "(Optional: Thoroughly validate the data set beyond 3 expected properties.)\n", "\n", "*Note: If your validation reveals any failing assertions (1) ask the course\n", "staff whether these are expected and (2) comment out the assertion and move on.*" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Count unique subject names\n", "nSubj <- length(unique(rt_data$Subject))\n", "assert_that(3 == nSubj)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. Transform the data from wide to long format\n", "The output data frame should have the following columns (the order does not matter):\n", "\n", " * `Subject`\n", " * `VariantID`\n", " * `RunID`\n", " * `Approach`\n", " * `Runtime`" ] }, { "cell_type": "code", "metadata": {}, "source": [ "rt_data.long <- " ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 6. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 6. Aggregate runtime data\n", "Recall that each variant was analyzed 5 times (i.e., a deterministic program was\n", "executed 5 times on the same variant with identical inputs). Aggregate each set of\n", "5 related runtime results -- using the mean or the median.\n", "*Provide a brief justification for your choice of mean vs. median (see Q2).*\n", "(Your choice may be informed by data or domain knowledge.)" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "rt_data.agg <-" ], "execution_count": null, "outputs": [] },
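{ "cell_type": "markdown", "metadata": {}, "source": [ "One possible sketch for the aggregation (shown with the median as one robust choice; the mean works analogously), assuming the `dplyr` verbs from the `tidyverse` and the long-format data frame from step 5:\n", "\n", "```r\n", "# Collapse the 5 repeated runs per (Subject, VariantID, Approach) into one value.\n", "rt_data.agg <- rt_data.long %>%\n", "  group_by(Subject, VariantID, Approach) %>%\n", "  summarise(Runtime = median(Runtime), .groups = \"drop\")\n", "```" ] },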
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 7. Validate aggregation" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "assert_that(nrow(rt_data.agg) == nrow(rt_data.long)/5)" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "(Optional: Add additional assertions for data validation.)\n", "\n", "#### 8. Plot the aggregated data, using color coding and faceting" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "ggplot(rt_data.agg) +\n", " geom_density(aes(x=Runtime, color=Approach)) +\n", " facet_grid(Subject~.) +\n", " theme_bw() + theme(legend.position=\"top\")" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Read the syntax for `facet_grid` as: group the data by `Subject` and plot each\n", "subject on a separate row. More generally, `facet_grid` allows you to group your\n", "data and plot these groups individually by rows or columns (Syntax:\n", "`rows ~ columns`). For example, the following four configurations group and render the\n", "same data in different ways:\n", "\n", " * `facet_grid(Subject~.)`\n", " * `facet_grid(Subject~Approach)`\n", " * `facet_grid(.~Subject)`\n", " * `facet_grid(.~Subject+Approach)`\n", "\n", "A future lecture will discuss best practices for choosing a suitable\n", "visualization, depending on the underlying data and research questions.\n", "\n", "#### 9. Add a column for transformed data\n", "It is reasonable to assume that the runtime data is log-normally distributed.\n", "Add a column `RuntimeLog` that contains the `log` of the `Runtime` column." ] },
{ "cell_type": "code", "metadata": {}, "source": [ "rt_data.agg <-" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 10. Plot transformed data" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "ggplot(rt_data.agg) +\n", " geom_density(aes(x=RuntimeLog, color=Approach)) +\n", " facet_grid(Subject~.) +\n", " theme_bw() + theme(legend.position=\"top\")" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 11. Test the difference(s) -- `Runtime` using the full data set" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "t <- t.test(Runtime~Approach, rt_data.agg)\n", "d <- cohen.d(Runtime~Approach, rt_data.agg)\n", "t.res <- tibble(subj=\"all\", data=\"Linear\", test=\"T\", p=t$p.value, eff=d$estimate, eff_qual=d$magnitude)\n", "\n", "u <- wilcox.test(Runtime~Approach, rt_data.agg)\n", "a <- VD.A(Runtime~Approach, rt_data.agg)\n", "u.res <- tibble(subj=\"all\", data=\"Linear\", test=\"U\", p=u$p.value, eff=a$estimate, eff_qual=a$magnitude)\n", "\n", "results <- bind_rows(t.res, u.res)\n", "results" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 12. Test the difference(s) -- `Runtime` vs. `RuntimeLog` and per subject\n", "Extend the code above (and the results data frame): add test results for all combinations of\n", "`Subject` x `{Runtime, RuntimeLog}` x `{t.test, wilcox.test}`. The final results\n", "data frame should contain 16 rows -- the results for *each subject individually as well as\n", "for all subjects combined (see Q3 and Q4)*.\n", "\n", "*Note: You are not graded on coding style or code efficiency. However, try to be\n", "as concise as possible.*" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "# Add additional rows to the results data frame\n", "\n", "# Test for completeness\n", "assert_that(nrow(results) == 16)\n", "\n", "# Print the final results data frame\n", "results" ], "execution_count": null, "outputs": [] },
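{ "cell_type": "markdown", "metadata": {}, "source": [ "One possible sketch for filling in the cell above (not the official solution): instead of appending to the two rows from step 11, it simply recomputes all 16 combinations. The helper name `run_tests` and the `\"Log\"` label are illustrative choices.\n", "\n", "```r\n", "# Helper: run the T and U tests plus effect sizes on one data slice;\n", "# 'col' selects the runtime column (\"Runtime\" or \"RuntimeLog\").\n", "run_tests <- function(df, subj, label, col) {\n", "  f <- as.formula(paste(col, \"~ Approach\"))\n", "  t <- t.test(f, df);      d <- cohen.d(f, df)\n", "  u <- wilcox.test(f, df); a <- VD.A(f, df)\n", "  bind_rows(\n", "    tibble(subj=subj, data=label, test=\"T\", p=t$p.value, eff=d$estimate, eff_qual=d$magnitude),\n", "    tibble(subj=subj, data=label, test=\"U\", p=u$p.value, eff=a$estimate, eff_qual=a$magnitude))\n", "}\n", "\n", "# All 16 combinations: {all, tax, tictactoe, triangle} x {Linear, Log} x {T, U}.\n", "results <- bind_rows(lapply(c(\"all\", unique(rt_data.agg$Subject)), function(s) {\n", "  df <- if (s == \"all\") rt_data.agg else filter(rt_data.agg, Subject == s)\n", "  bind_rows(run_tests(df, s, \"Linear\", \"Runtime\"),\n", "            run_tests(df, s, \"Log\", \"RuntimeLog\"))\n", "}))\n", "```" ] },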
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: General properties of the U test\n", "\n", "*Note: This part is independent of Part 1 and not related to the runtime data\n", "set.* In particular, *independent samples* in questions Q5 and Q6 refer to\n", "samples that you can make up (encoded manually or simulated using a common\n", "distribution) such that these samples satisfy the stated properties.\n", "\n", "#### 13. Code for questions Q5 and Q6\n", "Supporting code for Q5" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "# Create two samples A and B" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Supporting code for Q6" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "# Create two samples A and B" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Questions *(5 pts)*\n", "* Q1 Briefly justify your choice of data validation assertions: what informed\n", " your choices? *(0.5 pts)*\n", "\n", "* Q2 Briefly justify your choice for aggregating the runtime data. *(0.5 pts)*\n", "\n", "* Q3 How did the data transformation of the aggregated `Runtime` values as well\n", " as the slicing by Subject affect the outcomes of the parametric and\n", " non-parametric tests (T vs. U)? Briefly explain your observations\n", " (considering differences in p values and effect sizes). *(1 pt)*\n", "\n", "* Q4 Given your understanding of the data-generation process and your\n", " observations about the data, indicate and justify which data analysis is\n", " preferable. (Consider possible decisions such as all subjects vs. per subject,\n", " transformed vs. non-transformed data, and parametric vs. non-parametric\n", " statistics.) *(1 pt)*\n", "\n", "* Q5 Consider the non-parametric U test: Create two independent samples A and B\n", " such that (1) each sample has five observations and (2) the p value is\n", " as small as possible (i.e., minimal) when comparing A and B. State the null hypothesis for this U\n", " test and visualize the two samples with a point plot. *(0.5 pts)*\n", "\n", "* Q6 Consider the non-parametric U test: Create two independent samples A and B\n", " such that the p value is significant (p<0.05) but the medians are the same.\n", " Describe your approach (with a justification) to creating the two samples and\n", " visualize the two samples. (Depending on the samples, a point plot, histogram,\n", " or density plot may be an appropriate choice.) *(1 pt)*\n", "\n", "* Q7 Under what assumption(s) can the U test of independent samples be\n", " interpreted as a significance test for the median? *(0.5 pts)*\n", "\n", "* Q8 (Optional) Additional validation efforts. *(up to 0.5 pts)*" ] } ], "metadata": { "kernelspec": { "name": "ir", "language": "R", "display_name": "R" } }, "nbformat": 4, "nbformat_minor": 4 }