I am trying to decode the response of a PycURL GET request for an HTML page, but nothing works so far. UTF-8 throws an error, and the closest I got was with the iso-8859-1 encoding, but the result is unintelligible.
This is my code:
import pycurl
import certifi
from io import BytesIO
## Create PycURL instance
c = pycurl.Curl()
## Define Options - Set URL we want to request
c.setopt(c.URL, 'http://pycurl.io/')
## Setup buffer to receive response
buffer = BytesIO()
c.setopt(c.WRITEDATA, buffer)
## Setup SSL certificates
c.setopt(c.CAINFO, certifi.where())
## Make Request
c.perform()
## Close Connection
c.close()
## Retrieve the content BytesIO & Decode
body = buffer.getvalue()
print(body.decode('iso-8859-1'))
This is part of the response when decoded as iso-8859-1:
EÑÛË Åð¦Öÿ¢ù|ÎB¥'Ñè:º#zD vìÚê0µiÐÃön×»<+LwW¯^ùå~²ài{Xi3Ñ,ësöAå
øDDþ9ÍÈåvÄ_u¾*¬(lg´(EÀ×
¬¸³mð#K¦\a»Ò
If I use UTF-8, the script throws an error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
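The byte 0x8b at position 1 matches the second byte of the gzip signature (0x1f 0x8b), so one possibility worth checking, assuming the server sent a gzip-compressed body, is that the bytes must be decompressed before they can be decoded as text. The sketch below uses only the standard library; `decode_body` is a hypothetical helper, and the compressed body is simulated locally rather than fetched with PycURL.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(body: bytes, charset: str = "utf-8") -> str:
    """Hypothetical helper: gunzip the body if it starts with the gzip
    signature, then decode it. It does not inspect the Content-Encoding
    response header, only the leading bytes."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(charset)


# Simulate a gzip-compressed response body without touching the network.
compressed = gzip.compress("<html><body>Hello, PycURL</body></html>".encode("utf-8"))
print(compressed[:2].hex())      # 1f8b: byte 1 is 0x8b, as in the error above
print(decode_body(compressed))   # the original HTML text
```

If this is the cause, an alternative is to let libcurl handle decompression itself by setting the `ENCODING` option on the handle (for example `c.setopt(c.ENCODING, 'gzip')`) before `c.perform()`; check the PycURL documentation for your version to confirm the option name.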
What is wrong with my code? The same code apparently works for other people, but not for me.