Downloading a PDF using requests is not working


I wrote some code to download a PDF file from a link using requests:

import requests

url = "https://disclosure.bursamalaysia.com/FileAccess/apbursaweb/download?id=231746&name=EA_DS_ATTACHMENTS"

response = requests.get(url)

with open("EA_DS_ATTACHMENTS.pdf", "wb") as f:
    f.write(response.content)

print("PDF downloaded successfully!")

Of course, it doesn't work: instead, it downloads a PDF that can't be opened. I suspect it's because this isn't a proper PDF download link, but I'm not really sure since I'm new to this.
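One way to check that suspicion is to look at the first bytes of the saved file. A real PDF begins with the signature %PDF-, while an error page returned by a web server is usually HTML. The helper below is a hypothetical sketch (the name sniff_download is made up for illustration) and is only a heuristic, not a PDF validator.

```python
def sniff_download(path: str) -> str:
    """Guess what a downloaded file really contains from its first bytes.

    Returns "pdf", "html", or "unknown". Heuristic only.
    """
    with open(path, "rb") as f:
        head = f.read(1024)
    # Lenient check: accept the PDF signature anywhere in the first 1024 bytes.
    if b"%PDF-" in head:
        return "pdf"
    # Error pages served instead of the file are typically HTML documents.
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

Running sniff_download("EA_DS_ATTACHMENTS.pdf") on the file produced by the code above would tell you whether you actually saved a PDF or something else under a .pdf name.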



Answer by James

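The server returns a 403 (Forbidden) response when the request is made with requests. It appears to block requests based on the User-Agent header, which requests sets to python-requests/<version> by default. Because your code writes response.content without checking the status code, the file you saved is the body of that error response rather than a PDF, which is why it can't be opened. You can send custom headers that mimic your browser's user agent to get the actual PDF document.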

import requests

url = (
    'https://disclosure.bursamalaysia.com/FileAccess/apbursaweb/download?'
    'id=231746&name=EA_DS_ATTACHMENTS'
)

res_bad = requests.get(url)
print(res_bad, res_bad.request.headers)

# prints:
# <Response [403]> {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 
# 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}

# this is the Firefox user agent
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) '
                         'Gecko/20100101 Firefox/124.0'}

res_good = requests.get(url, headers=headers)
res_good.raise_for_status()  # raise an error instead of silently saving an error page
with open("EA_DS_ATTACHMENTS.pdf", "wb") as f:
    f.write(res_good.content)
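If you want to guard against the server changing its behaviour later, you can check the response before writing anything to disk. The function below, save_pdf_response, is a hypothetical helper (it is not part of requests). It assumes the object passed in exposes status_code, headers and content attributes, as a requests.Response does, and it treats the %PDF- signature in the first 1024 bytes as a heuristic sign of a real PDF.

```python
from typing import Any

PDF_MAGIC = b"%PDF-"


def save_pdf_response(resp: Any, path: str) -> int:
    """Write resp.content to path only if it looks like a real PDF.

    resp is assumed to behave like requests.Response (status_code,
    headers, content). Returns the number of bytes written.
    """
    # Refuse to save error pages such as the 403 returned for python-requests.
    if resp.status_code != 200:
        raise RuntimeError(f"HTTP {resp.status_code}: not saving response to {path}")
    # Heuristic: a PDF should contain its signature near the start of the file.
    if PDF_MAGIC not in resp.content[:1024]:
        content_type = resp.headers.get("Content-Type", "unknown")
        raise ValueError(f"Response does not look like a PDF (Content-Type: {content_type})")
    with open(path, "wb") as f:
        return f.write(resp.content)
```

For example, save_pdf_response(res_good, "EA_DS_ATTACHMENTS.pdf") can replace the final with open(...) block. With the default python-requests user agent it should raise RuntimeError instead of writing the 403 page to disk.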