Troubleshooting Overlap Alignment Implementation in Python

64 views Asked by At

I am expected to solve the Overlap Alignment Problem where I have as: Input: A match reward, a mismatch penalty, an indel penalty, and two nucleotide strings v and w. Output: The score of an optimal overlap alignment of v and w, followed by an alignment of a suffix v' of v and a prefix w' of w achieving this maximum score.

An overlap alignment of strings v = v1 ... vn and w = w1 ... wm is a global alignment of a suffix of v with a prefix of w. An optimal overlap alignment of strings v and w maximizes the global alignment score between an i-suffix of v and a j-prefix of w (i.e., between vi ... vn and w1 ... wj) among all i and j.

def overlapAlignment(v, w, match_reward, mismatch_penalty, indel_penalty):
len1, len2 = len(v), len(w)

S = [[0 for j in range(len2 + 1)] for i in range(len1 + 1)]
backtrack = [[0 for j in range(len2 + 1)] for i in range(len1 + 1)]

maxScore = -float('inf')
maxIndex = None

for i in range(1, len1 + 1):
    for j in range(1, len2 + 1):
        # Match = match_reward, Mismatch/Indel = indel_penalty
        scoreList = [S[i - 1][j - 1] + (match_reward if v[i - 1] == w[j - 1] else -  mismatch_penalty),
                     S[i - 1][j] - indel_penalty, S[i][j - 1] - indel_penalty, 0]
        S[i][j] = max(scoreList)
        backtrack[i][j] = scoreList.index(S[i][j])

        if j == len2:
            if S[i][j] >= maxScore:
                maxScore = S[i][j]
                maxIndex = (i, j)

a, b = maxIndex
align1, align2 = v[:a], w[:b]

while a > 0 and b > 0:
    if backtrack[a][b] == 0:
        a -= 1
        b -= 1
    elif backtrack[a][b] == 1:
        a -= 1
        align2 = align2[:b] + '-' + align2[b:]
    else:
        b -= 1
        align1 = align1[:a] + '-' + align1[a:]

align1 = align1[a:]
align2 = align2[b:]

return str(maxScore), align1, align2

v = "GAGA"
w = "GAT"
match_reward = 1
mismatch_penalty = 1
indel_penalty = 2

score, aligned_v, aligned_w = overlapAlignment(v, w, match_reward, mismatch_penalty,    indel_penalty)
print("Alignment Score:", score)
print("Aligned v:", aligned_v)
print("Aligned w:", aligned_w)`

This is a code that I generated however it doesn't work in all instances. I tried modifying it for hours already but did not manage to make it work. Basically the problematic inputs are:

  1. Input: 1 1 5 ATCACT ATG
  2. Output: 0 CT AT

and

  1. Input: 2 3 1 CTT AGCATAAAGCATT

  2. Output: 0 --CT-T AGC-AT

There are of course more instances when this code doesn't produce desired outputs but these are the examples I have from the debug dataset.

Example of correct output would be the one for the input that is already in the code and for example: Input: 3 2 1 CAGAGATGGCCG ACG

Output: 5 -CG ACG

0

There are 0 answers