Why does wget fail when running multiple wget commands targeting the ENA in a single shell script?


I wanted to download FASTQ files associated with a particular BioProject (PRJEB21446) from the European Nucleotide Archive. There is a button to generate and download a shell script containing wget commands for all FASTQ files associated with the BioProject. Great! That gives me a script with the following commands:

wget -nc [ftp-link-to-sample1.fastq.gz]
wget -nc [ftp-link-to-sample2.fastq.gz]
...
wget -nc [ftp-link-to-sample40.fastq.gz]

EDIT: Here are the first 5 lines of the script from ENA:

wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/004/ERR2014384/ERR2014384_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/006/ERR2014386/ERR2014386_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/001/ERR2014361/ERR2014361_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/009/ERR2014369/ERR2014369_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/007/ERR2014367/ERR2014367_1.fastq.gz

However, when I tried to run the script using sh script_from_ENA.sh, the first file downloaded without any problems, but every file after that sat at 0% for about 20 seconds and then showed the following:

2023-08-14 10:54:01 (0.00 B/s) - Data transfer aborted.
Retrying.

wget then attempts to download the same file over and over again with no success.

After spending all morning trying various workarounds, I eventually solved the problem by putting all the URLs into a single file and running wget in a for loop, like so:

sed 's/wget -nc //' script_from_ENA.sh > url-list
for i in `cat url-list` ; do wget -nc $i ; done
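
(As an aside, a slightly more defensive version of that loop, reading the file line by line and quoting the URL so the shell never word-splits it, would be the following sketch:)

sed 's/wget -nc //' script_from_ENA.sh > url-list
# read one URL per line; quoting "$url" avoids word splitting and globbing
while IFS= read -r url ; do wget -nc "$url" ; done < url-list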

This worked like a charm and the files downloaded without any problem, but I'm still curious as to why the script generated by ENA didn't work. Was it an issue with wget, or were the ENA servers cutting me off?

If anyone can offer insight or an explanation, I'd be very grateful. Thanks!


1 Answer

Daweo:

Note that if you have a list of URLs, you do not need to do

sed 's/wget -nc //' script_from_ENA.sh > url-list
for i in `cat url-list` ; do wget -nc $i ; done

as wget has an option for exactly that case, namely -i file or --input-file=file, which the wget man page describes as follows:

Read URLs from a local or external file.

In your case, if you have a urls.txt like so:

ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/004/ERR2014384/ERR2014384_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/006/ERR2014386/ERR2014386_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/001/ERR2014361/ERR2014361_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/009/ERR2014369/ERR2014369_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/007/ERR2014367/ERR2014367_1.fastq.gz

you could just do

wget -i urls.txt
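
To keep the no-clobber behaviour of the original ENA script, -i combines freely with -nc. And if the failures really were the server dropping rapid back-to-back FTP connections (an assumption, not something confirmed here), wget's standard --wait and --tries options slow the requests down:

# -nc keeps the original script's no-clobber behaviour,
# --wait=1 pauses one second between retrievals,
# --tries=3 gives up after three attempts per file
wget -nc --wait=1 --tries=3 -i urls.txt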