Why am I getting a negative compression ratio for a gz file?


I found something odd about a gz file: gzip -l reports a negative compression ratio.

[root@pridns named]# ll dns-query.log-2022083103*
-rw-r--r-- 1 named named 1.2G Aug 31 03:10 dns-query.log-2022083103.gz
[root@pridns named]#
[root@pridns named]# gzip -l dns-query.log-2022083103.gz
         compressed        uncompressed  ratio uncompressed_name
         1187103824           679547787 -74.7% dns-query.log-2022083103
[root@pridns named]#
[root@pridns named]# python -c "print((679547787-1187103824)/679547787.0)"
-0.7469026413
[root@pridns named]#
[root@pridns named]# gunzip -c dns-query.log-2022083103.gz > dns-query.log-2022083103
[root@pridns named]#
[root@pridns named]# ll dns-query.log-2022083103*
-rw-r--r-- 1 root  root  8.7G Sep 16 11:06 dns-query.log-2022083103
-rw-r--r-- 1 named named 1.2G Aug 31 03:10 dns-query.log-2022083103.gz
[root@pridns named]#
[root@pridns named]# ls -l dns-query.log-2022083103*
-rw-r--r-- 1 root  root  9269482379 Sep 16 11:06 dns-query.log-2022083103
-rw-r--r-- 1 named named 1187103824 Aug 31 03:10 dns-query.log-2022083103.gz
[root@pridns named]#
[root@pridns named]# python -c "print((9269482379-1187103824)/9269482379.0)"
0.871934184082
[root@pridns named]# gzip --version
gzip 1.5
Copyright (C) 2007, 2010, 2011 Free Software Foundation, Inc.
Copyright (C) 1993 Jean-loup Gailly.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Jean-loup Gailly.
[root@pridns named]#

The correct compression ratio should be 87.2%.

I also tested another file, and its compression ratio is reported correctly.

[root@pridns named]# gzip -l dns-security.log-2022083103.gz
         compressed        uncompressed  ratio uncompressed_name
          131503235          1275215408  89.7% dns-security.log-2022083103
[root@pridns named]# zcat dns-security.log-2022083103.gz > dns-security.log-2022083103
[root@pridns named]# ls -l dns-security.log-2022083103*
-rw-r--r-- 1 root  root  1275215408 Sep 16 11:31 dns-security.log-2022083103
-rw-r--r-- 1 named named  131503235 Aug 31 03:10 dns-security.log-2022083103.gz
[root@pridns named]# python -c "print((1275215408-131503235)/1275215408.0)"
0.896877630105
[root@pridns named]#

Answer by Donghua Liu:

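The comments by @pmqs are correct. According to the spec at https://www.rfc-editor.org/rfc/rfc1952, the original (uncompressed) size is stored in only 4 bytes, the ISIZE field of the gzip trailer, so it is recorded modulo 2^32. Here 9269482379 = 0x22881138B, which needs more than 32 bits; the high bits are dropped, so the stored size is 0x2881138B = 679,547,787. gzip -l reports that truncated value as the uncompressed size, and because it is smaller than the compressed size (1,187,103,824 bytes), the computed ratio comes out negative.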
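The arithmetic can be checked quickly in Python. The numbers are copied from the question, and the ratio helper uses the same formula as the question's python one-liners, which reproduces what gzip -l printed; treat it as an illustration rather than gzip's own code.

```python
# Numbers taken from the question.
real_size = 9269482379        # uncompressed size from `ls -l` after gunzip
compressed_size = 1187103824  # size of the .gz file

# ISIZE keeps only the low 32 bits of the real size.
isize = real_size % 2**32
print(hex(real_size), hex(isize))  # 0x22881138b 0x2881138b
print(isize)                       # 679547787


def ratio(uncompressed, compressed):
    """Space saving as a fraction: (uncompressed - compressed) / uncompressed."""
    return (uncompressed - compressed) / uncompressed


print(f"{ratio(isize, compressed_size):.1%}")      # -74.7% (what gzip -l shows)
print(f"{ratio(real_size, compressed_size):.1%}")  # 87.2% (the real ratio)
```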

      2.3.1. Member header and trailer
         ...
         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.
         ...
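To see both numbers for a particular file, here is a minimal Python sketch. It assumes a single-member .gz file (in a file made of concatenated members, the last 4 bytes describe only the last member), and the function names trailer_isize and true_size are hypothetical, chosen for illustration. The first function reads ISIZE as the RFC describes it (the last 4 bytes of the file, least significant byte first). The second counts bytes while decompressing, which is slow for a multi-GB log but exact for any size.

```python
import gzip
import os
import struct
import sys


def trailer_isize(path):
    """Return the ISIZE field: uncompressed size modulo 2**32 (last member only)."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)                  # ISIZE is the final 4 bytes
        (isize,) = struct.unpack("<I", f.read(4))  # little-endian unsigned 32-bit
    return isize


def true_size(path, chunk=1 << 20):
    """Decompress in 1 MiB chunks and count the bytes; exact even above 4 GiB."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += len(block)
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    p = sys.argv[1]
    print("ISIZE from trailer:     ", trailer_isize(p))
    print("actual uncompressed size:", true_size(p))
```

Run it as, for example, python gz_sizes.py dns-query.log-2022083103.gz (the script name is made up). From the shell, zcat dns-query.log-2022083103.gz | wc -c also gives the exact size without writing the decompressed file to disk.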