Homework Assignment #17
Objective
To employ dynamic programming to solve the gene sequence alignment problem.
Exercises
Question 1
Find the optimal alignment using dynamic programming (by hand only) of $x=$ ATGCC and $y=$ TACGCA . Use the same cost function we used in the lecture, namely: $c_{indel} = 5$; $c_{sub} = 1$; and $c_{match} = -3$.
(a) Show your DP table.
(b) Circle the optimal alignment cost.
(c ) Extract an optimal alignment from the table in the form of a sequence of edit operators. Break ties among optimal alternatives by preferring a backpointer leading to the left over a packpointer leading diagonally over a packpointer leading downward. Write the edit operators (not their coordinates) in forward order from start $(0,0)$ to finish $(m,n)$. Use the following notation:
to represent a match of an A with an A, for example, use Match(A)
to represent an indel involving a gap on the $x$ side and an A on the $y$ side, use Indel(-, A)
to represent a substitution of an A on the $x$ side for C on the $y$ side, use Sub(A, C)
(d) Translate the optimal alignment from part (c ) into the two-row, side-by-side form, where each letter in the first sequence (or a “-” for a gap) is adjacent to a letter (or a “-”) in the second sequence.
Question 2
Now find the optimal alignment using dynamic programming (by hand or in a spreadsheet) of the same sequences, ATGCC and TACGCA . Use a new cost function, namely: $c_{indel} = 1$; $c_{sub} = 2$; and $c_{match} = 0$.
(a) Show your DP table.
(b) Circle the optimal alignment cost.
(c ) Extract an optimal alignment from the table in the form of a sequence of edit operators by following the same instructions specified for question 1, part (c ).
(d) Translate the optimal alignment from part (c ) into the two-row, side-by-side form by following the same instructions specified for question 1, part (d).
Back to top