UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 5: invalid start byte


I tried to transfer an image file using Python socket programming, but when I run the code I get this error: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 5: invalid start byte"

Server code:

import socket
import tqdm

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost",9999))
server.listen()

client, addr = server.accept()

file_name = client.recv(1024).decode()
print(file_name)
file_size = client.recv(1024).decode()
print(file_size)

file = open(file_name,"wb")
file_bytes = b""
done = False

progress = tqdm.tqdm(unit="B", unit_scale=True, unit_divisor=1000,total=int(file_size))

while not done:
    data = client.recv(1024)
    if file_bytes[-5:] == b"<END>":
        done = True
    else:
        file_bytes+=data
    progress.update(1024)
file.write(file_bytes)
file.close()
client.close()
server.close()

Client code:

import os
import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("localhost",9999))

file = open("image.png","rb")
file_size = os.path.getsize("image.png")

client.send("received_image.png".encode())
client.send(str(file_size).encode())

data = file.read()
client.sendall(data)
client.send(b"\<END\>")

file.close()
client.close()

There is 1 answer

tripleee On

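Your wire format is ambiguous. TCP is a byte stream, not a sequence of messages, so the server has no way to know where the file name ends and where the size of the image begins; a single recv(1024) can return the name, the size, and the first bytes of the image all at once. That is most likely what happened here: 0x89 is the first byte of the PNG signature, so the chunk you tried to decode as the size already contained image data. Step back and figure out what bytes you are sending, and how to make it clear which bytes belong to the metainformation. A common solution is to transmit the length of the next field before the field itself. Another is to terminate each string with a null byte, which is disallowed in C strings and thus also e.g. in file names.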
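
For the first option, a length prefix could look roughly like the sketch below. The helper names pack_field and unpack_fields and the 4-byte big-endian prefix are assumptions made for illustration, not anything from your code, and the sketch works on a complete in-memory buffer rather than a live socket.

```python
import struct

def pack_field(payload):
    """Prefix payload with its length as a 4-byte big-endian unsigned int."""
    return struct.pack("!I", len(payload)) + payload

def unpack_fields(buffer):
    """Split a complete buffer of length-prefixed fields into payloads."""
    fields = []
    offset = 0
    while offset < len(buffer):
        # Read the 4-byte length, then slice out exactly that many bytes.
        (length,) = struct.unpack_from("!I", buffer, offset)
        offset += 4
        fields.append(buffer[offset:offset + length])
        offset += length
    return fields

# Binary payloads are safe: nothing inside a field is ever interpreted.
message = pack_field(b"received_image.png") + pack_field(b"\x89PNG\x00fake image bytes")
name, image = unpack_fields(message)
```

Because the receiver always knows how many bytes to expect next, it never has to guess where one field stops, and the payload may contain any byte values.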

Here's the latter idea implemented as changes in the client script. You'll obviously need to adapt the server side accordingly.

import os
import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("localhost",9999))

# Switch to using a context manager
with open("image.png","rb") as file:

    file_size = os.path.getsize("image.png")
    
    # sendall() keeps sending until the whole buffer has been handed off
    client.sendall("received_image.png".encode() + b'\x00')
    client.sendall(str(file_size).encode() + b'\x00')

    client.sendall(file.read())
    # Pretty redundant now; maybe remove?
    client.sendall(b"<END>")
    # (Removed the invalid backslash sequences, too)
client.close()
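
For the server side, one possible adaptation is sketched below. The function names read_until_null and receive_file are hypothetical, and the sketch assumes the client sends the name, a null byte, the size as decimal text, a null byte, and then the raw file bytes, as in the client above. Treat it as a starting point rather than a finished implementation.

```python
import socket

def read_until_null(conn, leftover=b""):
    """Read from conn until a null byte arrives.

    Returns (field, rest): the bytes before the terminator and any
    bytes received after it, which belong to whatever comes next.
    """
    buffer = leftover
    while b"\x00" not in buffer:
        chunk = conn.recv(1024)
        if not chunk:
            raise ConnectionError("connection closed before terminator arrived")
        buffer += chunk
    field, _, rest = buffer.partition(b"\x00")
    return field, rest

def receive_file(conn):
    """Receive name, size and contents; return (name, data)."""
    name, rest = read_until_null(conn)
    size_text, rest = read_until_null(conn, rest)
    size = int(size_text.decode())
    data = rest  # recv() may already have delivered part of the file
    while len(data) < size:
        chunk = conn.recv(min(65536, size - len(data)))
        if not chunk:
            raise ConnectionError("connection closed before the whole file arrived")
        data += chunk
    # Anything past `size` (such as a trailing <END> marker) is ignored.
    return name.decode(), data[:size]
```

After client, addr = server.accept() you would call receive_file(client) and write the returned bytes to disk. Only the name and the size are ever decoded as text, so the image bytes never reach decode().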

Reading the entire file into memory is also an unnecessary burden; you might want to change this to read a chunk at a time (say, 8k or 64k) and transmit each separately as long as there are chunks left to send.
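
A minimal sketch of chunked sending follows. The function name send_in_chunks and the 64 KiB chunk size are assumptions chosen for illustration; conn can be any connected socket, and file_obj any file opened in binary mode.

```python
CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an arbitrary choice

def send_in_chunks(conn, file_obj, chunk_size=CHUNK_SIZE):
    """Send file_obj through conn one chunk at a time; return bytes sent."""
    total = 0
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:  # an empty bytes object means end of file
            break
        conn.sendall(chunk)
        total += len(chunk)
    return total
```

In the client above, this would replace client.sendall(file.read()) with send_in_chunks(client, file), so memory use stays bounded by the chunk size no matter how large the image is.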