{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "title: \"In-class exercise Statistical Significance and Power: Solutions\"\n", "author: \"Name1, Name2\"\n", "\n", "## Part 1: Parametric vs. non-parametric statistics\n", "\n", "### Instructions\n", "\n", "#### 1. Install (if needed) and load the following packages" ] }, { "cell_type": "code", "metadata": {}, "source": [ "suppressPackageStartupMessages({\n", " library(tidyverse)\n", " library(assertthat)\n", " library(effsize)\n", "})" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. Load the `runtime.csv` dataset from the course website" ] }, { "cell_type": "code", "metadata": {}, "source": [ "rt_data <- read_csv(\"https://homes.cs.washington.edu/~rjust/courses/CSEP590/in_class/04_stats/data/runtime.csv\", show_col_types=F) " ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For faster (local) exploration you may download the dataset and load it from a\n", "local file. *However, make sure that your final submission reads the data from\n", "the given URL.*\n", "\n", "#### 3. Inspect the data set" ] }, { "cell_type": "code", "metadata": {}, "source": [ "head(rt_data)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This dataset provides benchmark results (runtime data) for a new program\n", "analysis approach (`MySystem`), compared to a baseline approach (`Baseline`).\n", "Specifically, the columns provide the following information:\n", "\n", "* `Subject`: One of three benchmark programs (*tax*, *tictactoe*, *triangle*).\n", "* `VariantID`: ID of a program variant. *Each subject program has a different\n", " number of variants*. For a given subject program, all variants are\n", " enumerated, starting with an ID of 1. The expected number of\n", " variants are as follows:\n", " - *tax*: 99\n", " - *tictactoe*: 268\n", " - *triangle*: 122 \n", "* `RunID`: *Each* program *variant* was *analyzed 5 times* to account for\n", " variability in runtime measurements. \n", "* `Baseline`: The *runtime* of the *baseline system*.\n", "* `MySystem`: The *runtime* of the *new system*.\n", "\n", "Additional data expectations: runtime is strictly positive and the data set is\n", "complete.\n", "\n", "#### 4. Validate the data set\n", "Given the summary above, test for 3 expected properties of the data set, not\n", "counting the example assertion on number of subject programs (see Q1).\n", "\n", "(Optional: Thoroughly validate the data set beyond 3 expected properties.)\n", "\n", "*Note: If your validation reveals any failing assertions (1) ask the course\n", "staff whether these are expected and (2) comment out the assertion and move on.*" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Count unique subject names\n", "nSubj <- length(unique(rt_data$Subject))\n", "assert_that(3 == nSubj)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. Transform the data from wide to long format\n", "The output data frame should have the following columns (the order does not matter):\n", "\n", " * `Subject`\n", " * `VariantID`\n", " * `RunID`\n", " * `Approach`\n", " * `Runtime`" ] }, { "cell_type": "code", "metadata": {}, "source": [ "rt_data.long <- " ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 6. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 6. Aggregate runtime data\n", "Recall that each variant was analyzed 5 times (i.e., a deterministic program was\n", "executed 5 times on the same variant with identical inputs). Aggregate each set of\n", "5 related runtime results -- using the mean or the median.\n", "*Provide a brief justification for your choice of mean vs. median (see Q2).*\n", "(Your choice may be informed by data or domain knowledge.)" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "rt_data.agg <-" ], "execution_count": null, "outputs": [] },
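{ "cell_type": "markdown", "metadata": {}, "source": [ "One possible sketch for the aggregation (shown with the median as one robust choice; the mean works analogously), assuming the `dplyr` verbs from the `tidyverse` and the long-format data frame from step 5:\n", "\n", "```r\n", "# Collapse the 5 repeated runs per (Subject, VariantID, Approach) into one value.\n", "rt_data.agg <- rt_data.long %>%\n", "  group_by(Subject, VariantID, Approach) %>%\n", "  summarise(Runtime = median(Runtime), .groups = \"drop\")\n", "```" ] },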
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 7. Validate aggregation" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "assert_that(nrow(rt_data.agg) == nrow(rt_data.long)/5)" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "(Optional: Add additional assertions for data validation.)\n", "\n", "#### 8. Plot the aggregated data, using color coding and faceting" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "ggplot(rt_data.agg) +\n", " geom_density(aes(x=Runtime, color=Approach)) +\n", " facet_grid(Subject~.) +\n", " theme_bw() + theme(legend.position=\"top\")" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Read the syntax for `facet_grid` as: group the data by `Subject` and plot each\n", "subject on a separate row. More generally, `facet_grid` allows you to group your\n", "data and plot these groups individually by rows or columns (Syntax:\n", "`rows ~ columns`). For example, the following four configurations group and render the\n", "same data in different ways:\n", "\n", " * `facet_grid(Subject~.)`\n", " * `facet_grid(Subject~Approach)`\n", " * `facet_grid(.~Subject)`\n", " * `facet_grid(.~Subject+Approach)`\n", "\n", "A future lecture will discuss best practices for choosing a suitable\n", "visualization, depending on the underlying data and research questions.\n", "\n", "#### 9. Add a column for transformed data\n", "It is reasonable to assume that the runtime data is log-normally distributed.\n", "Add a column `RuntimeLog` that contains the `log` of the `Runtime` column." ] },
{ "cell_type": "code", "metadata": {}, "source": [ "rt_data.agg <-" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 10. Plot transformed data" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "ggplot(rt_data.agg) +\n", " geom_density(aes(x=RuntimeLog, color=Approach)) +\n", " facet_grid(Subject~.) +\n", " theme_bw() + theme(legend.position=\"top\")" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 11. Test the difference(s) -- `Runtime` using the full data set" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "t <- t.test(Runtime~Approach, rt_data.agg)\n", "d <- cohen.d(Runtime~Approach, rt_data.agg)\n", "t.res <- tibble(subj=\"all\", data=\"Linear\", test=\"T\", p=t$p.value, eff=d$estimate, eff_qual=d$magnitude)\n", "\n", "u <- wilcox.test(Runtime~Approach, rt_data.agg)\n", "a <- VD.A(Runtime~Approach, rt_data.agg)\n", "u.res <- tibble(subj=\"all\", data=\"Linear\", test=\"U\", p=u$p.value, eff=a$estimate, eff_qual=a$magnitude)\n", "\n", "results <- bind_rows(t.res, u.res)\n", "results" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 12. Test the difference(s) -- `Runtime` vs. `RuntimeLog` and per subject\n", "Extend the code above (and the results data frame): add test results for all combinations of\n", "`Subject` x `{Runtime, RuntimeLog}` x `{t.test, wilcox.test}`. The final results\n", "data frame should contain 16 rows -- the results for *each subject individually as well as\n", "for all subjects combined (see Q3 and Q4)*.\n", "\n", "*Note: You are not graded on coding style or code efficiency. However, try to be\n", "as concise as possible.*" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "# Add additional rows to the results data frame\n", "\n", "# Test for completeness\n", "assert_that(nrow(results) == 16)\n", "\n", "# Print the final results data frame\n", "results" ], "execution_count": null, "outputs": [] },
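{ "cell_type": "markdown", "metadata": {}, "source": [ "One possible sketch for filling in the cell above (not the official solution): instead of appending to the two rows from step 11, it simply recomputes all 16 combinations. The helper name `run_tests` and the `\"Log\"` label are illustrative choices.\n", "\n", "```r\n", "# Helper: run the T and U tests plus effect sizes on one data slice;\n", "# 'col' selects the runtime column (\"Runtime\" or \"RuntimeLog\").\n", "run_tests <- function(df, subj, label, col) {\n", "  f <- as.formula(paste(col, \"~ Approach\"))\n", "  t <- t.test(f, df);      d <- cohen.d(f, df)\n", "  u <- wilcox.test(f, df); a <- VD.A(f, df)\n", "  bind_rows(\n", "    tibble(subj=subj, data=label, test=\"T\", p=t$p.value, eff=d$estimate, eff_qual=d$magnitude),\n", "    tibble(subj=subj, data=label, test=\"U\", p=u$p.value, eff=a$estimate, eff_qual=a$magnitude))\n", "}\n", "\n", "# All 16 combinations: {all, tax, tictactoe, triangle} x {Linear, Log} x {T, U}.\n", "results <- bind_rows(lapply(c(\"all\", unique(rt_data.agg$Subject)), function(s) {\n", "  df <- if (s == \"all\") rt_data.agg else filter(rt_data.agg, Subject == s)\n", "  bind_rows(run_tests(df, s, \"Linear\", \"Runtime\"),\n", "            run_tests(df, s, \"Log\", \"RuntimeLog\"))\n", "}))\n", "```" ] },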
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: General properties of the U test\n", "\n", "*Note: This part is independent of Part 1 and not related to the runtime data\n", "set.* In particular, *independent samples* in questions Q5 and Q6 refer to\n", "samples that you can make up (encoded manually or simulated using a common\n", "distribution) such that these samples satisfy the stated properties.\n", "\n", "#### 13. Code for questions Q5 and Q6\n", "Supporting code for Q5" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "# Create two samples A and B" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Supporting code for Q6" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "# Create two samples A and B" ], "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Questions *(5 pts)*\n", "* Q1 Briefly justify your choice of data validation assertions: what informed\n", " your choices? *(0.5 pts)*\n", "\n", "* Q2 Briefly justify your choice for aggregating the runtime data. *(0.5 pts)*\n", "\n", "* Q3 How did the data transformation of the aggregated `Runtime` values as well\n", " as the slicing by Subject affect the outcomes of the parametric and\n", " non-parametric tests (T vs. U)? Briefly explain your observations\n", " (considering differences in p values and effect sizes). *(1 pt)*\n", "\n", "* Q4 Given your understanding of the data-generation process and your\n", " observations about the data, indicate and justify which data analysis is\n", " preferable. (Consider possible decisions such as all subjects vs. per subject,\n", " transformed vs. non-transformed data, and parametric vs. non-parametric\n", " statistics.) *(1 pt)*\n", "\n", "* Q5 Consider the non-parametric U test: Create two independent samples A and B\n", " such that (1) each sample has five observations and (2) the p value is\n", " as small as possible (i.e., minimal) when comparing A and B. State the null hypothesis for this U\n", " test and visualize the two samples with a point plot. *(0.5 pts)*\n", "\n", "* Q6 Consider the non-parametric U test: Create two independent samples A and B\n", " such that the p value is significant (p<0.05) but the medians are the same.\n", " Describe your approach (with a justification) to creating the two samples and\n", " visualize the two samples. (Depending on the samples, a point plot, histogram,\n", " or density plot may be an appropriate choice.) *(1 pt)*\n", "\n", "* Q7 Under what assumption(s) can the U test of independent samples be\n", " interpreted as a significance test for the median? *(0.5 pts)*\n", "\n", "* Q8 (Optional) Additional validation efforts. *(up to 0.5 pts)*" ] } ], "metadata": { "kernelspec": { "name": "ir", "language": "R", "display_name": "R" } }, "nbformat": 4, "nbformat_minor": 4 }