I am trying to use zlib to detect the end of a compressed gz data stream.
I do not need the uncompressed contents. My goal is to get pointers to the beginning and end of a stream. My code works on small files, but fails on large files. I tried allocating more memory for outbuf with no success. It's mainly copied and pasted from the zlib examples. What is wrong?
// fopen(filename,"rb")...fread(inbuf, inlen, 1, fd);
int gzip_dctest(unsigned char *inbuf, unsigned int inlen) {
    unsigned int outlen = 262144;
    unsigned char *outbuf = malloc(outlen);
    int ret = 0;
    z_stream infstream;
    /* allocate inflate state */
    infstream.zalloc = Z_NULL;
    infstream.zfree = Z_NULL;
    infstream.opaque = Z_NULL;
    infstream.avail_in = 0;
    infstream.next_in = Z_NULL;
    ret = inflateInit2(&infstream, MAX_WBITS | 32); // gzip/zlib header autodetect
    if (ret != Z_OK) {
        fprintf(stderr, "gzip_test: init fail (%d: %s)\n", ret, infstream.msg);
        return ret;
    }
    unsigned int ptr = 0;
    /* decompress until deflate stream ends or end of file */
    do {
        infstream.next_in = inbuf + ptr;
        infstream.avail_in = inlen - ptr;
        if (infstream.avail_in > (outlen / 8)) infstream.avail_in = (outlen / 8); // get chunk size
        if (infstream.avail_in == 0)
            break;
        /* run inflate() on input until output buffer not full */
        do {
            ptr += infstream.avail_in;
            infstream.avail_out = outlen;
            infstream.next_out = outbuf;
            ret = inflate(&infstream, Z_NO_FLUSH);
            if (ret < 0) {
                fprintf(stderr, "gzip_test: inflate fail at %u (%d: %s)\n", ptr - infstream.avail_in, ret, infstream.msg);
                return ret;
            }
        } while (infstream.avail_out == 0);
        /* done when inflate() says it's done */
    } while (ret != Z_STREAM_END);
    inflateEnd(&infstream);
    return ptr - infstream.avail_in;
}
Example output with a problem file (403709952 uncompressed, 99152355 compressed size):
gzip_test: inflate fail at 2374700 (-3: invalid code lengths set)
"gzip -d" on this file gives no error:
1.cpio.gz: 75.4% -- replaced with 1.cpio
If I compress it again (gzip command), I get another error in my code:
gzip_test: inflate fail at 2381863 (-3: invalid block type)
I am expecting the code to work on any file size.
Your ptr += infstream.avail_in; needs to be moved outside of the inner do loop. As written, every time the inner loop repeats because the output buffer filled up, the leftover avail_in is added to ptr again, so ptr runs ahead of the data actually consumed and the next chunk starts at the wrong offset. Once the statement is moved, it works fine.
There is no need to throttle avail_in based on the amount of output space. Your inner do loop will just keep going until avail_in is consumed.
For this to work on archives larger than 4GB, you'd need to use size_t instead of int for your inlen and ptr, and take care to set avail_in to a value in the range of an unsigned. I would recommend something much simpler with no ptr. Note that next_in is updated by inflate() for you, so it already tracks how far into the input you are.
Better still would be to not load the entire .gz file into memory, but rather to read a small buffer at a time and inflate as you go. Then keep track of the number of consumed bytes in a size_t or uintmax_t total.
You also need to add a free(outbuf); before you return, both on error and on success.
Note that this will not detect the end of a gzip stream. It will detect the end of a gzip member. A gzip stream can contain multiple members. You would need to loop on the whole thing until you got to the end or encountered an error, with the latter indicating some non-gzip data after the end of the gzip stream.