Introduction to Statistical and Computational Genomics

GENOME 559
Department of Genome Sciences
University of Washington School of Medicine


Course description

Rudiments of statistical and computational genomics. Emphasis on basic probability and statistics, and an introduction to computer programming. This course is intended to introduce students with non-computer science backgrounds to the major concepts of programming and statistics.

Learning objectives

After taking this course, students will be able to describe and perform basic analysis tasks relating to biological sequence analysis, phylogenetics, pedigree analysis, genetic association studies, population genetics and microarray analysis. Students will be able to demonstrate an understanding of fundamental statistical concepts, such as p-values, t-tests, chi-squared tests and multiple testing correction. Finally, students will be able to write computer programs to perform statistical and bioinformatics analyses.

Instructional staff

Instructor: William Stafford Noble
Email: noble@gs.washington.edu
Office hours: Friday, 1:30-2:30 pm
Office: Foege S220B (call from the first floor phone to access the building -- there is a directory next to the phone)

Instructor: Mary Kuhner
Email: mkkuhner@gs.washington.edu
Office: Foege S420C (call from the first floor phone to access the building -- there is a directory next to the phone)

Instructor: Larry Ruzzo
Email: ruzzo@cs.washington.edu
Office hours: Wednesday, 11:00 am-1:00 pm
Office: CSE 554

Meeting times and locations

Tue/Thu 3:30-4:50 pm in Hitchcock 220

The class meets in a computer lab and will involve writing computer programs during class time.

Web page

The course web page is located at http://noble.gs.washington.edu/~noble/genome559/.

Prerequisites

Substantial background in molecular and cellular biology, genetics, biochemistry or related disciplines.

Course materials

Mark Lutz, Learning Python, third edition. O'Reilly, 2007.

Course requirements

Students will complete eight homework assignments during the course. Assignments will typically involve some written questions and some programming problems.

Examinations

The final exam will be open book, and will cover the entire quarter. The final exam is scheduled for Thursday, March 19, 4:30-6:20 pm in Hitchcock 220.

Course grade

Each of the eight homework assignments counts for 10% of the course grade, and the final exam counts for the remaining 20%.
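
As a worked example of this weighting, here is a minimal Python sketch. It assumes every score is expressed as a percentage from 0 to 100; the function name course_grade and the sample scores are hypothetical and are not part of any course software.

```python
# Weights from the syllabus, in percentage points:
# eight homework assignments at 10 each, and the final exam at 20.
HOMEWORK_WEIGHT = 10
FINAL_EXAM_WEIGHT = 20
NUM_HOMEWORKS = 8


def course_grade(homework_scores, final_exam_score):
    """Return the weighted course grade as a percentage (0-100).

    Assumes each score is already a percentage; this is an
    illustration of the arithmetic, not an official grading tool.
    """
    if len(homework_scores) != NUM_HOMEWORKS:
        raise ValueError("expected %d homework scores" % NUM_HOMEWORKS)
    weighted_sum = HOMEWORK_WEIGHT * sum(homework_scores)
    weighted_sum += FINAL_EXAM_WEIGHT * final_exam_score
    return weighted_sum / 100.0


# Hypothetical scores, for illustration only.
example_grade = course_grade([90, 85, 100, 95, 80, 90, 100, 90], 85)
print(example_grade)
```

Because the weights sum to 100 percentage points, a student who scores 100 on every item receives a course grade of 100.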

Class schedule

Date | Instructor | Lecture topic | Concepts | Programming topic | Reading | Homework
Tu Jan 6 | Noble | Sequence comparison: intro and motivation | Substitution matrices, gap penalties | Introduction to Python | - | -
Th Jan 8 | Noble | Sequence comparison: dynamic programming | Dynamic programming, Needleman-Wunsch | Strings | Wikipedia: Needleman-Wunsch; (Eddy Nat Biotech 2004); Lutz: ch. 1-4, 7 | Hw1 assigned
Tu Jan 13 | Noble | Sequence comparison: more dynamic programming | - | Numbers, lists and tuples | (Nicholas Biotechniques 2000); Lutz: ch. 5, 8 | -
Th Jan 15 | Noble | Sequence comparison: local alignment | Smith-Waterman | File I/O, if-then-else | Wikipedia: Smith-Waterman; Lutz: ch. 9-12 | Hw1 due; Hw2 assigned
Tu Jan 20 | Noble | Sequence comparison: significance of similarity scores | distribution, p-value, extreme value distribution | for loops | Wikipedia: P-value; Altschul BLAST tutorial; Lutz: ch. 13 | -
Th Jan 22 | Kuhner | Phylogeny: Parsimony | heuristic search, "assumption-free" methods | while loops | Lutz 248-253; Small.txt, Large.txt | Hw2 due
Tu Jan 27 | Kuhner | Phylogeny: Models of the mutational process | least squares | Concepts in looping | Lutz 265-271 | Hw3 assigned, dna.txt input file
Th Jan 29 | Kuhner | Phylogeny: Likelihood | maximum likelihood | Dictionaries | Lutz 160-169 | -
Tu Feb 3 | Kuhner | Phylogeny: Bayesian methods and MCMC | Bayes' Theorem, Markov chains | Defining functions | Lutz 299-308 | Hw3 due (solution); Hw4 assigned, infile.txt input file
Th Feb 5 | Kuhner | Phylogeny: Validating phylogenies | likelihood ratio test, bootstrap | Sorting | - | -
Tu Feb 10 | Ruzzo | Motifs | Likelihood, LRT, MLE | Regular expressions; Testre.txt | Ref 1 below; Lutz 75; Tutorial; Library reference | Hw4 due (solution)
Th Feb 12 | Ruzzo | Motifs | Entropy | Regular expressions | - | -
Tu Feb 17 | Ruzzo | BLAST | Heuristics; BLAST strengths & weaknesses | Regular expressions; "Comprehensions" | Ref 2 below; Lutz 78, 272-275 | Hw5
Th Feb 19 | Ruzzo | Multiple Alignment | Progressive Alignment | Objects, I | Wikipedia: Multiple sequence alignment; Lutz ch. 22-23 | -
Tu Feb 24 | Ruzzo | Gene Prediction | Markov models | Objects, II | Ref 3 below | Hw5 due (solution)
Th Feb 26 | Ruzzo | Gene Prediction | Data Integration | Objects, III; Date.py | - | Hw6
Tu Mar 3 | Ruzzo | Probabilities on Pedigrees | LOD Scores | Biopython, I | Wikipedia: Genetic linkage, plus the section of Strachan & Read cited therein; Biopython, esp. Tutorial & Cookbook | -
Th Mar 5 | Ruzzo | Association | LD & Chi-square | Biopython, II | Wikipedia: Pearson's chi-square test; Mathworld: Bonferroni Correction | Hw6 due (solution); Hw7 assigned
Tu Mar 10 | Ruzzo | Association | LD & Chi-square | Biopython, III | - | -
Th Mar 12 | Ruzzo | RNA | Compensatory mutation; Mutual information | Exceptions; Ex1.py, Ex2.py | Either ref 4 or 5 below | Hw7 solutions

Electronic access to journals is generally free from on-campus computers. For off-campus access, follow the "[offcampus]" links or look at the library "proxy server" instructions.

References

  1. Stormo GD. "DNA binding sites: representation and discovery." Bioinformatics. 2000;16(1):16-23. PMID: 10812473 [offcampus]
  2. Pertsemlidis A, Fondon JW. "Having a BLAST with bioinformatics (and avoiding BLASTphemy)." Genome Biol. 2001;2(10):REVIEWS2002. PMID: 11597340 [offcampus]
  3. Harrow J, Nagy A, Reymond A, Alioto T, Patthy L, Antonarakis SE, Guigo R. "Identifying protein-coding genes in genomic sequences." Genome Biol. 2009 Jan 30;10(1):201. [Epub ahead of print] PMID: 19226436 [offcampus]
  4. Amaral PP, Dinger ME, Mercer TR, Mattick JS. "The eukaryotic genome as an RNA machine." Science. 2008 Mar 28;319(5871):1787-9. PMID: 18369136 [offcampus]
  5. Breaker RR. "Complex riboswitches." Science. 2008 Mar 28;319(5871):1795-7. PMID: 18369140 [offcampus]