Homework #1
Due Thursday, January 15, 2009, at the beginning of class. Assignments turned in more than 5 minutes after the beginning of class will be penalized 10 points, with an additional 10 points every 24 hours thereafter.
(10 points) Here is a pair of aligned protein sequences:
GDIFYPGYCPDVKPVNKQFDLSAFAGAWHEIAKLP
GDNFHLGKCPSPLPVQENFDVKKYLGRWYEIEKIP
If this alignment were to be included in the data set used to generate statistics for the BLOSUM matrices, which of the following matrices would it be used to help generate: BLOSUM90, BLOSUM80, BLOSUM62, BLOSUM52, BLOSUM45? Why?
(5 points) You can find a copy of the BLOSUM45 matrix at ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM45. Which amino acid has the largest number of negative scores associated with it? Why?
(10 points)
RVVNLVP----WVLATDYKNY
QFFPLMPPAPYWILATDYENY
Score the above alignment using:
- BLOSUM45 and a linear gap penalty of -4
- BLOSUM80 with affine gap penalties: gap open of -9 and gap extension of -1.
You can find the BLOSUM80 matrix at ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM80. Be sure to show your work.
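The difference between the two gap schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed method: the function names are hypothetical, the penalty values and the example string are made up rather than taken from the question, and the affine version assumes the convention in which the first position of a gap costs the gap-open penalty and each additional position costs the gap-extension penalty. Some texts instead charge open plus extension for every position, so check which convention is used in class.

```python
from itertools import groupby


def gap_run_lengths(row):
    """Return the lengths of consecutive runs of '-' in one row of an alignment."""
    return [len(list(group)) for char, group in groupby(row) if char == "-"]


def linear_gap_cost(length, gap):
    """Linear penalty: every gap position costs the same amount."""
    return gap * length


def affine_gap_cost(length, gap_open, gap_extend):
    """Affine penalty (assumed convention): the first position costs gap_open,
    each further position in the same run costs gap_extend."""
    if length == 0:
        return 0
    return gap_open + (length - 1) * gap_extend


if __name__ == "__main__":
    # Made-up example row with one gap of length 3 and one of length 1.
    row = "AC---G-T"
    for n in gap_run_lengths(row):
        print(n, linear_gap_cost(n, gap=-2), affine_gap_cost(n, gap_open=-5, gap_extend=-1))
```

The total alignment score is then the sum of the substitution scores for the aligned residue pairs plus the cost of each gap run.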
(20 points) Draw and fill in the dynamic programming matrix to align these two sequences: CATTC and CGATC. Use this substitution matrix:
     A   C   G   T
A    2  -7  -3  -7
C   -7   2  -7  -3
G   -3  -7   2  -7
T   -7  -3  -7   2
and use a fixed gap penalty of -5. What is the score of the optimal global alignment?
(10 points) Write a program that takes as input the first three command line arguments (after the program name) and prints them in uppercase letters on a single line with spaces between.
> python get-three-args.py con stan tinople
CON STAN TINOPLE
(15 points) Write a program similar to the previous one, but print the three arguments without spaces between.
> python get-three-args.py con stan tinople
CONSTANTINOPLE
(15 points) Write a program that takes as input two command line arguments: the first argument is a DNA or protein sequence, and the second is an integer n. Print the nth character in the given sequence.
> python get-nth-character.py curmudgeon 5
u
(15 points) Write a program that takes as input two command line arguments, counts how many times the second one appears inside the first one, and then tells the user how many there are, like this:
> python count-substrings-in-string.py acgtacgtttgacgtacc acg
The substring acg appears in the sequence acgtacgtttgacgtacc 3 times.
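
The programming exercises above all read their input from the command line. As a starting point, here is a minimal sketch of how Python exposes those arguments through sys.argv. The helper name describe_args is hypothetical and the script solves none of the exercises; it only shows that the program name sits at index 0 and that every argument arrives as a string, so a numeric argument has to be converted with int() before it is used as a number.

```python
import sys


def describe_args(argv):
    """Split an argv-style list into the program name and the user's arguments."""
    program_name = argv[0]
    user_args = argv[1:]
    return program_name, user_args


if __name__ == "__main__":
    name, args = describe_args(sys.argv)
    print("program:", name)
    # Each argument is a str, even if the user typed digits.
    for position, value in enumerate(args, start=1):
        print(position, value, type(value).__name__)
```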