I have code like this:
#!/usr/bin/perl
use strict;
use warnings;
my %proteins = qw/
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
/;
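# Three-argument open with an error check
open(my $input, '<', 'dna.txt') or die "Cannot open dna.txt: $!";
while (<$input>) {
    tr/acgt/ACGT/;    # upper-case the bases (tr takes plain character lists, no brackets or commas)
    y/GCTA/CGAU/;     # complement the DNA strand into RNA
    foreach my $protein (/(...)/g) {    # walk the line one codon (three bases) at a time
        if (defined $proteins{$protein}) {
            print $proteins{$protein};
        }
    }
}
close($input);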
This code comes from an answer to my other question: DNA to RNA and Getting Proteins with Perl
The output of the program is:
SIMQNISGREAT
How can I rewrite this code in Perl so that it runs from the command line and uses less code (a one-liner, if possible)?
PS 1: dna.txt looks like this:
TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT
PS 2: If it makes the code shorter, it is acceptable to move the %proteins table into a file.
Somebody (@kamaci) called my name in another thread. This is the best I can come up with while keeping the protein table on the command line:
(Shell quoting; for Windows quoting, swap the ' and " characters.) This version marks invalid codons with %; you can probably fix that by adding =~y/%//d at an appropriate spot.
Hint: This picks out 6 bits from the raw ASCII encoding of an RNA triple, giving 64 codes between 0 and 101058048; to get a string index, I reduce the result modulo 63, but this creates one double mapping, which regrettably had to code two different proteins. The s/GGG/GGC/i maps one of them to another that codes the right protein.
Also note the parentheses before the % operator, which both isolate the , operator from the argument list of substr and fix the precedence of & vs %. If you ever use that in production code, you're a bad, bad person.
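To make the hint easier to follow, here is a minimal long-form sketch of the indexing trick. It is not the original one-liner. It assumes the three codon bytes are padded with a NUL byte and read as one big-endian 32-bit integer, and the helper names codon_index and translate are made up for illustration. The lookup string is built from the question's table rather than typed in by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Codon table copied from the question (stop codons are not listed).
my %proteins = qw/
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
/;

# Hypothetical helper: keep bits 1 and 2 of each of the three ASCII bytes
# (mask 0x06060600 == 101058048), then reduce modulo 63.
sub codon_index {
    my ($codon) = @_;
    # Assumption: pad to four bytes and read them as a big-endian integer.
    my $raw = unpack 'N', $codon . "\0";
    # The parentheses matter: % binds tighter than &.
    return ($raw & 101058048) % 63;
}

# Build the 63-character lookup string; '%' marks slots with no amino acid.
my $lookup = '%' x 63;
for my $codon (sort keys %proteins) {
    next if $codon eq 'GGG';    # GGG shares slot 0 with AAA, see translate()
    substr($lookup, codon_index($codon), 1) = $proteins{$codon};
}

sub translate {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ y/GCTA/CGAU/;    # complement, as in the question
    my $out = '';
    for my $codon ($rna =~ /(...)/g) {
        # Dodge the AAA/GGG collision: GGC codes the same amino acid as GGG.
        my $key = $codon eq 'GGG' ? 'GGC' : $codon;
        $out .= substr $lookup, codon_index($key), 1;
    }
    $out =~ y/%//d;    # drop codons that have no entry in the table
    return $out;
}

print translate('TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT'), "\n";    # SIMQNISGREAT
```

Because only bits 1 and 2 of each byte survive the mask, upper and lower case letters land in the same slot, and so do T and U. AAA and GGG are the one pair that reduces to the same index (0), which is why the GGG case has to be rerouted.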