I have a Python script that runs:

for cmd in chr_commands:
    info("Run: " + cmd)
    subprocess.run(cmd, shell=True)  # cmd contains pipes, so it needs a shell
Each cmd is a shell pipeline equivalent to the following (it extracts the mapped reads for one region, computes each read's genomic span from its CIGAR string, and writes every group of at least 1000 reads sharing the same span to a gzipped FASTQ file, logging the groups to chr3.info):
samtools view -F 0x0004 input.bam chr3:0-500000 2>> log.txt | awk '\
BEGIN {OFS="\t"} \
{ \
    bpstart=$4; bpend=bpstart; split($6,a,"[MIDNSHP]"); n=0; \
    for (i=1; i in a; i++) { \
        n += 1+length(a[i]); \
        if (substr($6,n,1)=="S") { \
            if (bpend==$4) \
                bpstart-=a[i]; \
            else \
                bpend+=a[i]; \
        } \
        else if ( (substr($6,n,1)!="I") && (substr($6,n,1)!="H") ) \
            bpend+=a[i]; \
    } \
    if (($2 % 32) >= 16) \
        print $3,bpstart,bpend,"-",$1,$10,$11; \
    else \
        print $3,bpstart,bpend,"+",$1,$10,$11; \
}' | sort -k1,1 -k2,2n | awk '\
BEGIN { chr_id="NA"; bpstart=-1; bpend=-1; fastq_filename="NA"; num_records=0; fastq_records=""; fastq_record_sep=""; record_log_str="" } \
{ \
    if ( (chr_id!=$1) || (bpstart!=$2) || (bpend!=$3) ) { \
        if (fastq_filename!="NA") { \
            if (num_records < 1000) { \
                record_log_str = record_log_str chr_id"\t"bpstart"\t"bpend"\t"num_records"\tNA\n"; \
            } else { \
                print(fastq_records) > fastq_filename; close(fastq_filename); \
                system("gzip -f " fastq_filename); \
                record_log_str = record_log_str chr_id"\t"bpstart"\t"bpend"\t"num_records"\t"fastq_filename".gz\n"; \
            } \
        } \
        chr_id=$1; bpstart=$2; bpend=$3; \
        fastq_filename=sprintf("REGION_%s_%s_%s.fastq",$1,$2,$3); \
        num_records=0; \
        fastq_records=""; \
        fastq_record_sep=""; \
    } \
    fastq_records = fastq_records fastq_record_sep "@"$5"\n"$6"\n+\n"$7; \
    fastq_record_sep="\n"; \
    num_records++; \
} \
END { \
    if (fastq_filename!="NA") { \
        if (num_records < 1000) { \
            record_log_str = record_log_str chr_id"\t"bpstart"\t"bpend"\t"num_records"\tNA\n"; \
        } else { \
            printf("%s",fastq_records) > fastq_filename; close(fastq_filename); \
            system("gzip -f " fastq_filename); \
            record_log_str = record_log_str chr_id"\t"bpstart"\t"bpend"\t"num_records"\t"fastq_filename".gz\n"; \
        } \
    } \
    print record_log_str > "chr3.info"; \
}'
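
For context, the commands differ only in the target region; chr_commands is built along these lines (a sketch -- the region list and template wiring are illustrative, not my exact code):

# Sketch only: the regions list and template are hypothetical stand-ins.
regions = ["chr3:0-500000", "chr3:500000-1000000"]  # 22-1000 of these in practice
chr_commands = [
    "samtools view -F 0x0004 input.bam {} 2>> log.txt "
    "| awk '...' | sort -k1,1 -k2,2n | awk '...'".format(region)  # awk bodies as above
    for region in regions
]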
There are between 22 and 1000 of these commands to run, and each should take only a few seconds. Initially the loop runs as expected, but after 2-5 iterations the job starts hanging: htop shows 100% CPU, yet the command never finishes.
If I instead write all of chr_commands to example.sh and run it from a terminal with bash example.sh, everything completes in under a minute. I suspect something deadlocks, but I don't know how to prevent it.
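
For reference, the working path amounts to something like this (sketch):

# Dump the same command strings to a script and run it outside Python.
with open("example.sh", "w") as fh:
    fh.write("\n".join(chr_commands) + "\n")
# then, from a terminal:  bash example.sh   (finishes in under a minute)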
Originally the code used map_async, which ran into the same issue: some jobs finished quickly, but others never stopped. I have also tried subprocess.call and subprocess.Popen with wait(), all with shell=True.
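
Concretely, the variants I tried look roughly like this (sketches; run_one is a hypothetical wrapper name, and the pool size is illustrative):

import subprocess
from multiprocessing import Pool

def run_one(cmd):
    # hypothetical wrapper passed to the pool; same body as the loop above
    subprocess.run(cmd, shell=True)

# original approach (sketch): a worker pool driven by map_async
with Pool(4) as pool:
    pool.map_async(run_one, chr_commands)
    pool.close()
    pool.join()

# serial variants, all hitting the same hang:
for cmd in chr_commands:
    subprocess.call(cmd, shell=True)
    # or: p = subprocess.Popen(cmd, shell=True); p.wait()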
I'm not sure whether it makes a difference, but the Python code runs inside a Singularity image that I launch with Snakemake.