GENOME 560 Statistics for Genome Scientists
The data-intensive nature of 21st-century biology has made basic proficiency in statistics essential for scientists, as technological advances now allow complex, high-dimensional datasets to be collected routinely. Whether you are working with thousands of gene expression levels measured by microarrays, millions of polymorphisms genotyped for a case-control study of a disease phenotype, or more general questions of how to properly design an experiment, you will constantly be confronted with how to collect, analyze, and interpret data throughout your research careers.
This course provides the key statistical concepts and methods necessary for extracting biological insights from these types of datasets. As this is only a five-week course, we will not be able to cover every specific topic that might arise during your research. Thus, we will focus on a rigorous understanding of fundamental concepts that will provide you with the tools necessary to address routine statistical analyses and the foundation to understand and learn more specialized topics.
Throughout this course, we will often make use of the freely available statistical software R (available at http://www.r-project.org/). R has become one of the most widely used platforms for statistical analysis in genomics because it is powerful, makes code easy to share, and produces publication-quality graphics. Problem sets will require the use of statistical software, and while you are free to use whatever you feel comfortable with (such as MATLAB, SAS, Stata, or perhaps even Microsoft Excel), I highly encourage you to use R.