Introduction to Statistical and Computational Genomics
GENOME 559
Department of Genome Sciences
University of Washington School of Medicine
Course description
Rudiments of statistical and computational genomics. Emphasis on basic probability and statistics, and an introduction to computer programming. This course is intended to introduce students with non-computer science backgrounds to the major concepts of programming and statistics.
Learning objectives
After taking this course, students will be able to describe and perform basic analysis tasks relating to biological sequence analysis, phylogenetics, pedigree analysis, genetic association studies, population genetics and microarray analysis. Students will be able to demonstrate an understanding of fundamental statistical concepts, such as p-values, t-tests, chi-squared tests and multiple testing correction. Finally, students will be able to write computer programs to perform statistical and bioinformatics analyses.
Instructor: William Stafford Noble
Office hours: Friday, 1:30-2:30 pm
Office: Foege S220B (call from the first floor phone to access the building -- there is a directory next to the phone)
Instructor: Mary Kuhner
Office: Foege S420C (call from the first floor phone to access the building -- there is a directory next to the phone)
Instructor: Larry Ruzzo
Office hours: Wednesday, 11:00-1:00
Office: CSE 554
Meeting times and locations
Tue/Thu 3:30-4:50 pm in Hitchcock 220
The class meets in a computer lab and will involve writing computer programs during class time.
The course web page is located at http://noble.gs.washington.edu/~noble/genome559/.
Prerequisites: Substantial background in molecular and cellular biology, genetics, biochemistry or related disciplines.
Textbook: Learning Python by Lutz, third edition. O'Reilly, 2007.
Students will complete eight homework assignments during the course. Assignments will typically involve some written questions and some programming problems.
The final exam will be open book, and will cover the entire quarter. The final exam is scheduled for Thursday, March 19, 4:30-6:20 pm in Hitchcock 220.
Course grade
10% for each homework assignment, and 20% for the final exam.
Schedule

| Lecture | Instructor | Lecture topic | Concepts | Programming topic | Reading | Homework |
|---|---|---|---|---|---|---|
| Tu Jan 6 | Noble | Sequence comparison: intro and motivation | Substitution matrices, gap penalties | Introduction to Python | | |
| Th Jan 8 | Noble | Sequence comparison: dynamic programming | Dynamic programming, Needleman-Wunsch | Strings | Wikipedia: Needleman-Wunsch; (Eddy Nat Biotech 2004); Lutz: ch. 1-4, 7 | Hw1 assigned |
| Tu Jan 13 | Noble | Sequence comparison: more dynamic programming | | Numbers, lists and tuples | (Nicholas Biotechniques 2000); Lutz: ch. 5, 8 | |
| Th Jan 15 | Noble | Sequence comparison: local alignment | Smith-Waterman | File I/O, | Wikipedia: Smith-Waterman; Lutz: ch. 9-12 | Hw1 due; Hw2 assigned |
| Tu Jan 20 | Noble | Sequence comparison: significance of similarity scores | distribution, p-value, extreme value distribution | | Wikipedia: P-value; Altschul BLAST tutorial; Lutz: ch. 13 | |
| Th Jan 22 | Kuhner | Phylogeny: Parsimony | heuristic search, "assumption-free" methods | | | Hw2 due |
| Tu Jan 27 | Kuhner | Phylogeny: Models of the mutational process | least squares | Concepts in looping | Lutz 265-271 | Hw3 assigned, dna.txt input file |
| Th Jan 29 | Kuhner | Phylogeny: Likelihood | maximum likelihood | Dictionaries | Lutz 160-169 | |
| Tu Feb 3 | Kuhner | Phylogeny: Bayesian methods and MCMC | Bayes' Theorem, Markov chains | Defining functions | Lutz 299-308 | Hw3 due (solution); Hw4 assigned, infile.txt input file |
| Th Feb 5 | Kuhner | Phylogeny: Validating phylogenies | likelihood ratio test, bootstrap | Sorting | | |
| Tu Feb 10 | Ruzzo | Motifs | Likelihood, LRT, MLE | Regular expressions; Testre.txt | Ref 1 below; Lutz 75; Tutorial; Library reference | Hw4 due (solution) |
| Th Feb 12 | Ruzzo | Motifs | Entropy | Regular expressions | | |
| Tu Feb 17 | Ruzzo | BLAST | Heuristics; BLAST strengths & weaknesses | Regular expressions; "Comprehensions" | Ref 2 below; Lutz 78, 272-275 | Hw5 |
| Th Feb 19 | Ruzzo | Multiple Alignment | Progressive Alignment | Objects, I | Wikipedia: Multiple sequence alignment; Lutz Ch 22-23 | |
| Tu Feb 24 | Ruzzo | Gene Prediction | Markov models | Objects, II | Ref 3 below | Hw5 due (solution) |
| Th Feb 26 | Ruzzo | Gene Prediction | Data Integration | Objects, III; Date.py | | Hw6 |
| Tu Mar 3 | Ruzzo | Probabilities on Pedigrees | LOD Scores | Biopython, I | Wikipedia: Genetic linkage plus the section of Strachan & Read cited therein; Biopython, esp. Tutorial & Cookbook | |
| Th Mar 5 | Ruzzo | Association | LD & Chi-square | Biopython, II | Wikipedia: Pearson's chi-square test; Mathworld: Bonferroni Correction | Hw6 due (solution); Hw7 assigned |
| Tu Mar 10 | Ruzzo | Association | LD & Chi-square | Biopython, III | | |
| Th Mar 12 | Ruzzo | RNA | Compensatory mutation; Mutual information | Exceptions; Ex1.py, Ex2.py | Either ref 4 or 5 below | Hw7 solutions |
Electronic access to journals is generally free from on-campus computers. For off-campus access, follow the "[offcampus]" links or look at the library "proxy server" instructions.
1. GD Stormo, "DNA binding sites: representation and discovery." Bioinformatics, 16, #1 (2000) 16-23. PMID: 10812473 [offcampus]
2. A Pertsemlidis, JW Fondon, "Having a BLAST with bioinformatics (and avoiding BLASTphemy)." Genome Biol., 2, #10 (2001) REVIEWS2002. PMID: 11597340 [offcampus]
3. Harrow J, Nagy A, Reymond A, Alioto T, Patthy L, Antonarakis SE, Guigo R, "Identifying protein-coding genes in genomic sequences." Genome Biol. 2009 Jan 30;10(1):201. [Epub ahead of print] PMID: 19226436 [offcampus]
4. Amaral PP, Dinger ME, Mercer TR, Mattick JS, "The eukaryotic genome as an RNA machine." Science. 2008 Mar 28;319(5871):1787-9. PMID: 18369136 [offcampus]
5. Breaker RR, "Complex riboswitches." Science. 2008 Mar 28;319(5871):1795-7. PMID: 18369140 [offcampus]