I have been working on the RNA Splicing exercise from the Rosalind Bioinformatics Stronghold, using Python 3.6. Running my code reports no errors, so I assumed the code was fine. However, it produces no output and no warning of any kind. Below is my code:
DNA_CODON_TABLE = {
'TTT': 'F', 'CTT': 'L', 'ATT': 'I', 'GTT': 'V',
'TTC': 'F', 'CTC': 'L', 'ATC': 'I', 'GTC': 'V',
'TTA': 'L', 'CTA': 'L', 'ATA': 'I', 'GTA': 'V',
'TTG': 'L', 'CTG': 'L', 'ATG': 'M', 'GTG': 'V',
'TCT': 'S', 'CCT': 'P', 'ACT': 'T', 'GCT': 'A',
'TCC': 'S', 'CCC': 'P', 'ACC': 'T', 'GCC': 'A',
'TCA': 'S', 'CCA': 'P', 'ACA': 'T', 'GCA': 'A',
'TCG': 'S', 'CCG': 'P', 'ACG': 'T', 'GCG': 'A',
'TAT': 'Y', 'CAT': 'H', 'AAT': 'N', 'GAT': 'D',
'TAC': 'Y', 'CAC': 'H', 'AAC': 'N', 'GAC': 'D',
'TAA': '-', 'CAA': 'Q', 'AAA': 'K', 'GAA': 'E',
'TAG': '-', 'CAG': 'Q', 'AAG': 'K', 'GAG': 'E',
'TGT': 'C', 'CGT': 'R', 'AGT': 'S', 'GGT': 'G',
'TGC': 'C', 'CGC': 'R', 'AGC': 'S', 'GGC': 'G',
'TGA': '-', 'CGA': 'R', 'AGA': 'R', 'GGA': 'G',
'TGG': 'W', 'CGG': 'R', 'AGG': 'R', 'GGG': 'G'
}
def result(s):
    result = ''
    lines = s.split()
    dna = lines[0]
    introns = lines[1:]
    for intron in introns:
        dna = dna.replace(intron, '')
    for i in range(0, len(dna), 3):
        codon = dna[i:i+3]
        protein = None
        if codon in DNA_CODON_TABLE:
            protein = DNA_CODON_TABLE[codon]
        if protein == '-':
            break
        if protein:
            result += protein
    return ''.join(list(result))

if __name__ == "__main__":
    """small_dataset = ' '"""
    large_dataset = open('rosalind_splc.txt').read().strip()
    print(result(large_dataset))
This is the content in rosalind_splc.txt text file:
>Rosalind_3363
ATGGGGCTGAGCCCATGTCTAAATGATATCTTGGTGCATTGCAATCTAACTATTTTTTCG
CAACCATGTTCCATCTGGCGCAAAATGGGCGTGTAGGGAGCTTCGCTATAGTCACTGAAG
AACATTCGCAACTTACAGCTCTCGAGAGGGTACAGCTGGACGGTGTTTGTTTGGTCTAAG
TCTGAGTCCAAAGTCGTTGAATGTCGAGCTAGGTTGACGTCATTCTTCGAGTTACGTCTT
CATTGATTCGCGGCGGCCGCCAGCATTTGATTGTACACATCCGACGTCTTTGGCAATCTA
CATAATTATATTGAGAGGGGCGCCATTACTCGAACCCATAACAAACAACTGTCCGTTTAC
AAGGTTATATTATCATGACCTAATGGTTGAGCTACGGAGTGGGGGGCCCTCGGCTACAGG
TGTTAAACTATCCTGCGGATGCGGATCTTAGCCCGATTTGCATGGCCCAGTAAGGCGCTG
ATTGTAAACCGCCTAGCATACATGTGCTTCTTACTCCAGGGTCCATTGCTACCAGTTCGC
TTCTGACGCCTCAATTGTACCTTCCTTTTTTGAATGGCAACCTGCAATAGCAGTCGACTG
ATGGGGCGTTACAGTATGAAGGCTATATTTACATTATCTCTAAACACACTGCTACCGCGA
AACCCCAACTCGGACCGGTCAGAGCGCTCGTGCTTTGTTCTTGGTCGCTAGCGACCAACA
GTGGATAGGTGGGCGCGGGCCTTGCACCTCCTAGAGCATCACGTGGAGTGGATGCAAACA
GTCTATGGTCCCCCGCTTCGGCTCACGGGTAACGTCTCTTGTGGTACTAGACCATAGGCA
TCCAGGTGAGGGCTACATCCGTATTTAATGAAACTGAGTTCCTCCAAAGCTCCTCGGGAC
GCAGGCAGGTTCATCCGCAGTCAGTAAGGGAGGGAAGAGCTTTCCCCGTTCCACCCAGAT
GCCCTGTGCACGGGAGAGAGATCCAGGTGGTAG
>Rosalind_0423
TCGCAACTTACAGCTCTCGAGAGGG
>Rosalind_5768
GCCCAGTAAGGCGCTGATTGTAAACCGCCTAGCATACAT
>Rosalind_6780
GTCTTCATTGATTCGCGGCGGCCGCCAGCA
>Rosalind_6441
GCAAACAGTCT
>Rosalind_3315
TTGGTCGCTAGCGACCAACAGTGGATAGGTGGGCGCGGGCCTTGCACCT
>Rosalind_7467
TTATCTCTAAACACACTGC
>Rosalind_3159
CGCAGTCAGTAAGGGAGG
>Rosalind_6420
TCTAAGTCTGAGTCCAAAGTCGTTGAATGTCGAGCTAGGTTGACGT
>Rosalind_8344
GGGGCGCCATTACTCGAACCCATAACAAACAACT
>Rosalind_2993
CCAGGTGAGGGCTACATCCGTAT
>Rosalind_0536
ATTATCATGACCTAATG
>Rosalind_3774
TCGCAACCATGTTCCAT
>Rosalind_7168
GGGCCCTCGGCTACAGGTGTTAAACTAT
>Rosalind_8059
CAATTGTACCTTCCTTTTTTGAATG
Since no output is produced, I would like to know which part of my code needs to be fixed for the output to appear. Thanks.
To understand which part of your code you need to change, it helps to understand what goes wrong in your code. If you have a code editor with a debugger, it helps to step through the code. If you don't have one, you can use the online tool http://pythontutor.com. Here is a direct link to your code with the first few lines of your input.
Click on the forward button under the code. At step 20 you jump into your function `result()`. After step 24 your input has been split on whitespace (here, the newlines), so `lines` is now a list that starts with the header `>Rosalind_3363`, followed by the sequence lines and the other headers.

In step 25, you assign the first item of `lines` to the variable `dna`. So `dna` is now equal to `>Rosalind_3363`. In the next step you assign the rest of the items in the list to the variable `introns`.

Here the first signs of trouble are already apparent. You probably expect `dna` to contain a DNA sequence, but it contains the sequence header of the FASTA file. Similarly, `introns` should contain only DNA sequences, but here it also contains FASTA sequence headers (`>Rosalind_0423`, `>Rosalind_5768`). So what happens in the next lines doesn't make sense anymore with the data you have now.
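The same effect can be reproduced outside the debugger. The snippet below is a minimal sketch that applies the question's `split()` logic to a short excerpt of the input (only the first line of the main sequence and the first intron record are included, to keep it small):

```python
# Excerpt of rosalind_splc.txt: first header, first sequence line,
# and the first intron record.
fasta_excerpt = """>Rosalind_3363
ATGGGGCTGAGCCCATGTCTAAATGATATCTTGGTGCATTGCAATCTAACTATTTTTTCG
>Rosalind_0423
TCGCAACTTACAGCTCTCGAGAGGG
"""

# Same steps as in result(): split on whitespace, then slice the list.
lines = fasta_excerpt.strip().split()
dna = lines[0]
introns = lines[1:]

print(dna)         # >Rosalind_3363  (a header, not a DNA sequence)
print(introns[1])  # >Rosalind_0423  (a header mixed in with the introns)
```

Because `split()` knows nothing about the FASTA format, headers and sequence lines end up in the same flat list.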
In the loop `for intron in introns: dna = dna.replace(intron, '')` you want to remove the introns from the DNA, but `dna` doesn't contain a DNA sequence string and `introns` contains things other than substrings of `dna`. So after this loop, `dna` still equals `>Rosalind_3363`. None of the three-letter chunks of `dna` (`>Ro`, `sal`, `ind`, ...) are valid codons, so they are not found in `DNA_CODON_TABLE`. Hence `result()` returns an empty string.

Now my guess as to what happened: you lifted the code verbatim from the internet (it is exactly equal to the code here) without understanding what it does and without realizing that the original author had already preprocessed the input data.
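To see concretely why the translation loop finds nothing, the header string can be chunked the same way the question's code does. This is a small sketch. The `is_codon` helper is hypothetical and stands in for the `DNA_CODON_TABLE` lookup, relying on the fact that every key of that table is three letters drawn from A, C, G and T:

```python
dna = ">Rosalind_3363"

# Removing an intron that is not a substring leaves the string unchanged.
dna = dna.replace("TCGCAACTTACAGCTCTCGAGAGGG", "")

# Same chunking as the translation loop in result().
chunks = [dna[i:i + 3] for i in range(0, len(dna), 3)]
print(chunks)  # ['>Ro', 'sal', 'ind', '_33', '63']


def is_codon(chunk):
    """Hypothetical stand-in for `chunk in DNA_CODON_TABLE`."""
    return len(chunk) == 3 and set(chunk) <= set("ACGT")


print(any(is_codon(c) for c in chunks))  # False
```

Since no chunk is a codon, nothing is ever appended to the result, and the function returns `''`.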
So, what do you need to do to fix the code?
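1. Parse the FASTA file properly, for example with `Bio.SeqIO.parse()`.
2. Assign the first sequence to the `dna` variable.
3. Assign the remaining sequences to the `introns` variable.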
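One possible way to put this fix into practice is sketched below. It is not the only approach: to stay free of third-party dependencies it uses a small hand-written reader (`read_fasta`, a hypothetical helper) instead of Biopython, and it rebuilds the codon table compactly so the snippet runs on its own. The names `read_fasta`, `splice_and_translate` and the toy input are assumptions made for illustration.

```python
from itertools import product

# Compact rebuild of the question's DNA_CODON_TABLE ('-' marks a stop codon).
# Codons are generated in TCAG order, with the last base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
DNA_CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_fasta(text):
    """Return the sequences of a FASTA string, in file order (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([])         # header line: start a new record
        elif records:
            records[-1].append(line)   # sequence line: extend the current record
    return ["".join(parts) for parts in records]


def splice_and_translate(fasta_text):
    """Remove the introns from the first sequence and translate the rest."""
    dna, *introns = read_fasta(fasta_text)  # first record is the DNA, the rest are introns
    for intron in introns:
        dna = dna.replace(intron, "")
    protein = []
    for i in range(0, len(dna) - 2, 3):     # only complete codons
        amino_acid = DNA_CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "-":               # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy input (not Rosalind data): one gene split over two lines, one intron.
example = """>seq_main
ATGGTCGGA
GGATACTAA
>intron_1
GGAGGA
"""
print(splice_and_translate(example))  # MVY
```

For the real dataset, the call would be `splice_and_translate(open('rosalind_splc.txt').read())`. If Biopython is installed, `Bio.SeqIO.parse()` with the `"fasta"` format could replace `read_fasta`, taking the `.seq` of each record it yields.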