How to create an improperly closed gzip file using python?


I have an application that occasionally needs to be able to read improperly closed gzip files. The files behave like this:

>>> import gzip
>>> f = gzip.open("path/to/file.gz", 'rb')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.8/gzip.py", line 498, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

I wrote a function to handle this by reading the file line by line and catching the EOFError, and now I want to test it.

The input to my test should be a gz file that behaves in the same way as demonstrated. How do I make this happen in a controlled testing environment?

I would strongly prefer not to make a copy of the improperly closed files that I get in production.

Answer by Amadan

Very simple: do the compression, then snip the result.

import gzip
plain = b"Stuff"
compressed = gzip.compress(plain)
bad_compressed = compressed[:-1]

gzip.decompress(bad_compressed)     # EOFError
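To go through gzip.open exactly as in the question's traceback, the same snip can be written to a file. This is a minimal standard-library sketch; the helper name `write_truncated_gzip` and its `cut` parameter are hypothetical, chosen only for illustration.

```python
import gzip
import os
import tempfile


def write_truncated_gzip(path, payload, cut=1):
    """Hypothetical helper: write gzip-compressed `payload` to `path`
    with the last `cut` bytes removed, so the trailer is incomplete."""
    if cut < 1:
        # compressed[:-0] would be empty, not "untruncated"
        raise ValueError("cut must be at least 1")
    compressed = gzip.compress(payload)
    with open(path, "wb") as out:
        out.write(compressed[:-cut])


# Usage: recreate the question's failure inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "truncated.gz")
    write_truncated_gzip(path, b"Stuff\n" * 100)
    with gzip.open(path, "rb") as f:
        try:
            f.read()
        except EOFError as exc:
            print("EOFError:", exc)
```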

Even easier: just the two gzip magic bytes are enough for the gzip module to recognise the gzip format, yet they are obviously not a complete compressed file.

bad_compressed = b'\x1f\x8b'
gzip.decompress(bad_compressed)     # EOFError
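Since the function under test reads line by line, note that with this two-byte input the very first readline() already fails, so it exercises the "nothing recoverable" path. A short sketch, using gzip.GzipFile over an in-memory io.BytesIO:

```python
import gzip
import io

# Only the two gzip magic bytes: even the 10-byte header is incomplete.
stream = gzip.GzipFile(fileobj=io.BytesIO(b"\x1f\x8b"))
try:
    stream.readline()
    print("got a line")  # not reached for this input
except EOFError:
    print("EOFError on the very first readline()")
```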

This is in memory for simplicity of demonstration; it works the same way if you manipulate a file instead of a bytes object. For example:

echo Stuff | gzip | head -c 2 >file.gz
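Cutting a fixed number of bytes does not control where the cut lands relative to your lines. If the test should also check that lines written before the failure are recovered, a different technique (swapped in here, and an assumption about how the production files end up truncated) is to imitate a writer that flushed but was never closed. GzipFile.flush() defaults to zlib.Z_SYNC_FLUSH, which emits all pending data without ending the deflate stream; snapshotting the bytes at that point gives a stream with no end-of-stream marker and no CRC/size trailer. The helper name `make_unclosed_gzip` is hypothetical.

```python
import gzip
import io


def make_unclosed_gzip(lines):
    """Hypothetical helper: return gzip bytes as they would look if the
    writer had flushed but never been closed."""
    raw = io.BytesIO()
    writer = gzip.GzipFile(fileobj=raw, mode="wb")
    for line in lines:
        writer.write(line)
    # Z_SYNC_FLUSH (the default) pushes out everything written so far
    # without finishing the deflate stream.
    writer.flush()
    data = raw.getvalue()
    # close() now appends the final block and trailer to `raw`, but the
    # snapshot in `data` was taken before that, so it stays unterminated.
    writer.close()
    return data


data = make_unclosed_gzip([b"first\n", b"second\n"])
reader = gzip.GzipFile(fileobj=io.BytesIO(data))
print(reader.readline())  # b'first\n'
print(reader.readline())  # b'second\n'
try:
    reader.readline()
except EOFError as exc:
    print("EOFError:", exc)
```

The bytes can be written to a temporary file if the code under test insists on a path.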

Answer by notacorn

I really don't want to knock the other answer, as it technically answers my question, as stated, in the best possible way.

What I also managed to do was create a mock gzip.GzipFile that behaves exactly as I expect:

% python3 -m pytest test.py
============================================================= test session starts =============================================================
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /private/tmp
collected 1 item                                                                                                                              

test.py .                                                                                                                               [100%]

============================================================== 1 passed in 0.03s ==============================================================
% cat test.py
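from unittest import mock
import gzip
import pytest

def test_foo():
    # Mock restricted to the gzip.GzipFile interface.
    f = mock.Mock(spec=gzip.GzipFile)
    # First readline() returns a line; the second raises EOFError,
    # like an improperly closed gzip file.
    f.readline = mock.Mock(side_effect=["hello", EOFError()])
    assert f.readline() == "hello"
    with pytest.raises(EOFError):
        f.readline()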

I think that, for unit-testing purposes, this may be the cleaner solution compared with actually creating the file and reading it, since I can just mock the open function to return my mocked file.
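A minimal sketch of that idea, using unittest.mock.patch to replace gzip.open so the function under test receives the mock. The function `read_lines_tolerant` is a hypothetical stand-in for the line-by-line reader described in the question, which is not shown; it calls close() explicitly instead of using `with`, because a plain Mock does not implement the context-manager protocol.

```python
import gzip
from unittest import mock


def read_lines_tolerant(path):
    """Hypothetical function under test: collect lines until a clean EOF
    or until a truncated stream raises EOFError."""
    lines = []
    f = gzip.open(path, "rb")
    try:
        while True:
            try:
                line = f.readline()
            except EOFError:
                break  # improperly closed file: keep the lines read so far
            if not line:
                break  # normal end of file
            lines.append(line)
    finally:
        f.close()
    return lines


def test_read_lines_tolerant_stops_on_eoferror():
    fake = mock.Mock(spec=gzip.GzipFile)
    # One good line, then the truncated-stream error.
    fake.readline.side_effect = [b"hello\n", EOFError()]
    # "any/path.gz" is a dummy name; no file is opened.
    with mock.patch("gzip.open", return_value=fake) as fake_open:
        assert read_lines_tolerant("any/path.gz") == [b"hello\n"]
    fake_open.assert_called_once_with("any/path.gz", "rb")
    fake.close.assert_called_once()
```

If the real function uses `with gzip.open(...) as f:`, a `mock.MagicMock(spec=gzip.GzipFile)` with `fake.__enter__.return_value = fake` should work in place of the plain Mock.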