My problem is similar to the one solved HERE, with a few differences.
So a deflate block with dynamic Huffman codes is structured like this:
|Block Header|compressed block|end of block|
Then the block header's bit stream looks like this:
|block type|HLIT|HDIST|HCLEN|
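These fields are not byte-aligned. Deflate fills each byte starting from its least significant bit (RFC 1951 section 3.1.1). The 2-bit block type is preceded by a 1-bit BFINAL flag, and HLIT, HDIST and HCLEN are stored as counts minus 257, 1 and 4. Here is a minimal Python sketch that reads just these fixed fields. It assumes the data begins exactly at the start of the block, and the `BitReader` and `read_block_start` names are made up for illustration:

```python
class BitReader:
    """Reads bits LSB first, as deflate packs them (RFC 1951 section 3.1.1)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position

    def bits(self, n: int) -> int:
        # Fixed-width header fields are stored least significant bit first.
        value = 0
        for i in range(n):
            bit = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value


def read_block_start(data: bytes):
    r = BitReader(data)
    bfinal = r.bits(1)       # 1 if this is the last block
    btype = r.bits(2)        # 2 means dynamic Huffman codes
    hlit = r.bits(5) + 257   # number of literal/length code lengths
    hdist = r.bits(5) + 1    # number of distance code lengths
    hclen = r.bits(4) + 4    # number of code length code lengths
    return bfinal, btype, hlit, hdist, hclen


print(read_block_start(bytes.fromhex("EC 5D 69")))  # (0, 2, 286, 30, 14)
```

Applied to the first three bytes of the stream quoted in this question, it yields BTYPE 2 (dynamic), 286 literal/length codes, 30 distance codes and 14 code length codes, using 17 bits in total.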
So I ran Pyflate and got the following:
Dynamic Huffman tree:
length codes: 21, distances codes: 24, code_lengths_length: 16
lengths:
[3, 0, 6, 6, 5, 3, 3, 2, 3, 3, 0, 0, 0, 0, 0, 0, 5, 6, 6]
lengths to spans:
[(0, 3), (1, 0), (2, 6), (3, 6), (4, 5), (5, 3), (6, 3), (7, 2), (8, 3), (9, 3), (10, 0), (11, 0), (12, 0), (13, 0), (14, 0), (15, 0), (16, 5), (17, 6), (18, 6), (19, -1)]
From this byte stream:
EC 5D 69 73 AB 36 14 FD DC CE F4 3F A8 E9 3E 0D 0E 60 B0 8D 5F 92 36 CD D2 7D 4F F7 E9 64 64 10 09 0D 36 2E E0 26 E9 F2 DF 7B 24 B0 01 AF C2 30 5D E5 F7 92 D8 02 1D 6D F7 5E E9 1E 5D E4 8F DF FF F8 52 FB 9A C5 49 10 4D 86 C4 E8 E8 2F 3C 7F 41 53 36 24 5F CE 26 87 C4 B4 C9 15 1B 11 53 37 2D FC 1A 76 AD A1 35 20 6F EA A6 8E DB AE E2 68 3C 24 67 61 30 A2 23 DA 71 A3 31 39 A6 D9 87 B7 13 16 FF 12 B8 AC 43 8B 8B A7 2F 3C FF E5 6C F4 13 73 D3 21 B9 0A 26 1E 49 EF 18 89 83 DB BB 94 4C E3 C8 9B B9 69 42 FC 28 26 4F D1 2C 26 A3 59 12 4C 58 92 BC F0 FC F5 5D CC A8 A7 5D 47 D3 C0 AD 95 F1 63 FC A6 B7 4C 7B FF 62 48 8E 7B B6 37 1A 0C AC 41 A7 D7 D5 A9 6E 9A 7A A7 EF E8 D4 E9 F4 BD BE FF E5 C7 D7 9F BD FF C9 CD D9 C5 C5 E5 C5 CD 3B 5F 7C FA E1 E5 27 6F 8F 1F 3B B7 51 74 1B B2 BC E6 D7 D1 90 1C A4 3E 9D DC 87 C1 44 7F FB 76 4C 83 90 5F 3A 20 C7 4B A9 79 86 F3 68 92 B2 49 AA 5D C7 74 92 F8 2C D6 2E 27 6E E4 05 93 DB 21 F9 79 16 A5 CC D3 A6 71 30 49 E9 28 64 A5 9B 9F A6 E8 F7 94 3D A6 47 77 E9 38 7C 46 DC 3B 1A 27 2C 3D 39 98 A5 BE 36 38 78 E1 79 FE EF 98 5F 43 11 E4 F8 0E 3D 73 7A 62 EA 78 4F F2 3F C7 2F 6A DA 0F 81 4F 5E 1C 27 D1 8F A7 FC D3 E9 FC CA 98 A5 94 B8 59 51 27 DD 8B 83 F7 2F F1 9B 79 B7 EC E0 B4 94 F9 F8 C5 1F D8 C4 0B FC 1F 8B 9C 15 80 09 1D 33 9E FB 97 80 3D 4C A3 38 3D 28 43 3E 04 5E 7A 87 37 1E E3 A3 AF 89 8F 87 24 98 04 69 40 43 2D 71 69 C8 F3 9E BC F0 BC 51 94 99 A4 4F 21 23 E9 D3 54 C0 8A C6 BB 49 72 20 EA FE C2 F3 2F 45 B3 34 8C A2 7B 42 5F 78 FE B9 DF A6 D4 13 7D A8 3F FB E3 85 E7 47 91 F7 C4 13 45 29 43 43 D7 5F 79 31 18 F3 2A D1 49 FA 0C 17 C6 C1 44 2B AE 89 14 1A DF 06 13 E4 C6 FB 1C 2A FB A0 3D B0 D1 7D 90 6A BC 74 2D 09 7E 65 1A F5 7E 9A 25 A9 C8 C9 CB 12 03 75 48 52 8F 17 38 8A 62 0F 23 EA 46 61 48 A7 09 1B CE DF 88 22 92 48 13 37 6B 61 32 A5 2E 1B EA D3 B4 9A 1E 2F D2 39 70 30 BE 2D 20 B3 CA DC 31 2E DB 43 3A 4B 23 FE 31 84 3C 6B 59 DA A2 21 E8 14 9E 3C 9C 44 13 51 AA A8 B7 C7 DC 28 A6 29 D4 58 A4 73 F4 29 C7 F6 82 64 1A D2 A7 E1 28 8C DC FB 72 37 4C 
1F F9 3D 9D 2F 20 45 1F 27 B7 EF AC 74 A7 B8 4A 9F C6 37 A2 E2 37 0F 31 9D 6E EA 6F 7E EB 4B 23 EA DE DF C6 D1 6C E2 5D F3 0C FC DE 8D 3D BE 0E 64 B9 B5 AB 45 74 2E 1F 53 16 4F 68 78 1E D2 24 39 24 D5 CF 64 BA 92 82 BE 9E AC 24 FA 10
But I can't find the bytes corresponding to Pyflate's output, except for 21, which is the first 5 bits of byte EC. And Pyflate did actually decompress the data successfully.
I need to do this manually, without third-party software, to understand it better. I know about RFC 1951, but it didn't help with this one.
How does one correctly extract the Huffman tree from compressed data?
Or could someone give an idea of how the block is really structured? Clearly the structure above is just an overview.
My goal is to understand how much space the Huffman data takes up before the actual compressed data of the file begins.
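The 19 `lengths` values Pyflate printed appear to be the code lengths for the code length alphabet, symbols 0 to 18, where 16, 17 and 18 are repeat codes. Deflate never transmits the Huffman tree itself, only these lengths. Both sides then rebuild the same bit patterns with the canonical construction in RFC 1951 section 3.2.2. The following minimal Python sketch applies that construction to Pyflate's list. The function name is hypothetical:

```python
def canonical_codes(lengths):
    """Map each symbol with a nonzero length to (code value, bit length)."""
    max_len = max(lengths)
    # Step 1: count the codes of each length (length 0 means "unused").
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    # Step 2: smallest code value for each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Step 3: consecutive values for symbols of equal length, in symbol order.
    codes = {}
    for sym, n in enumerate(lengths):
        if n:
            codes[sym] = (next_code[n], n)
            next_code[n] += 1
    return codes


pyflate_lengths = [3, 0, 6, 6, 5, 3, 3, 2, 3, 3, 0, 0, 0, 0, 0, 0, 5, 6, 6]
for sym, (code, n) in canonical_codes(pyflate_lengths).items():
    print(sym, format(code, f"0{n}b"))
```

For this list, symbol 7 gets the 2-bit code 00, symbols 0, 5, 6, 8 and 9 get 010 through 110, and symbol 18 gets 111111. In the stream, these lengths are stored as 3-bit fields right after HCLEN. They appear in the fixed order 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15, and only the first HCLEN + 4 of them are present.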
RFC 1951 completely and exactly describes the deflate format, so it is not possible that it can't help. Unless, that is, you have not spent the time and effort to read and understand it.
The dynamic block header starts like that, but it continues for many more bits. For that example, the dynamic header is 495 bits, a little less than 62 bytes.
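To see where those 495 bits go, here is a minimal Python sketch that walks a dynamic block header as laid out in RFC 1951 section 3.2.7. It reads the 17 bits of fixed fields first. Next come HCLEN + 4 three-bit code length code lengths in a fixed permuted order. Last are the HLIT + HDIST literal/length and distance code lengths, run-length coded with symbols 16, 17 and 18. The sketch assumes the bytes begin exactly at the block boundary, as the quoted stream does. The helper names are made up, and it does only minimal validation:

```python
ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]


class BitReader:
    """Reads bits from `data`, starting at bit 0 of the first byte."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def bit(self) -> int:
        b = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
        self.pos += 1
        return b

    def bits(self, n: int) -> int:
        # Fixed-width fields and extra bits are stored LSB first.
        return sum(self.bit() << i for i in range(n))


def make_table(lengths):
    """Canonical Huffman codes (RFC 1951 3.2.2) as {(length, code): symbol}."""
    table, code = {}, 0
    for n in range(1, max(lengths) + 1):
        for sym, length in enumerate(lengths):
            if length == n:
                table[(n, code)] = sym
                code += 1
        code <<= 1
    return table


def decode_symbol(r: BitReader, table) -> int:
    # Huffman codes are packed starting with their most significant bit.
    code = 0
    for n in range(1, 16):
        code = (code << 1) | r.bit()
        if (n, code) in table:
            return table[(n, code)]
    raise ValueError("invalid Huffman code")


def dynamic_header(data: bytes):
    """Return (header_bits, literal/length lengths, distance lengths)."""
    r = BitReader(data)
    r.bits(1)  # BFINAL, not needed here
    if r.bits(2) != 2:
        raise ValueError("not a dynamic Huffman block")
    hlit = r.bits(5) + 257
    hdist = r.bits(5) + 1
    hclen = r.bits(4) + 4
    # Code length code lengths: 3 bits each, in the permuted ORDER.
    cl_lengths = [0] * 19
    for sym in ORDER[:hclen]:
        cl_lengths[sym] = r.bits(3)
    table = make_table(cl_lengths)
    # Literal/length and distance code lengths, decoded as one run-length coded sequence.
    lengths = []
    while len(lengths) < hlit + hdist:
        sym = decode_symbol(r, table)
        if sym < 16:
            lengths.append(sym)                          # a length 0..15
        elif sym == 16:
            lengths += [lengths[-1]] * (3 + r.bits(2))   # repeat previous 3..6 times
        elif sym == 17:
            lengths += [0] * (3 + r.bits(3))             # 3..10 zeros
        else:
            lengths += [0] * (11 + r.bits(7))            # 11..138 zeros
    return r.pos, lengths[:hlit], lengths[hlit:]


# First 63 bytes of the stream quoted in the question.
stream = bytes.fromhex(
    "EC 5D 69 73 AB 36 14 FD DC CE F4 3F A8 E9 3E 0D 0E 60 B0 8D 5F 92 36 CD"
    " D2 7D 4F F7 E9 64 64 10 09 0D 36 2E E0 26 E9 F2 DF 7B 24 B0 01 AF C2 30"
    " 5D E5 F7 92 D8 02 1D 6D F7 5E E9 1E 5D E4 8F"
)
header_bits, lit_lengths, dist_lengths = dynamic_header(stream)
print(header_bits, header_bits / 8)  # 495 61.875
```

On this stream it reports 495 bits. That breaks down as 17 bits for the fixed fields, 42 for the 14 three-bit code length code lengths, and 436 for the 316 run-length coded literal/length and distance code lengths. The compressed data itself starts at the next bit.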
Along with studying RFC 1951, you can use infgen to disassemble example streams and see the pieces. Here is the complete dynamic block header from your stream as shown by infgen, including the bits from the stream on the right (using the -dd option of infgen):
Note that the bits are read from right to left. So the bits 10 0 from the first line and 11101 from the end of the second line combine to 11101100, which is your first byte, EC.
Following the header are infgen comments (which start with an exclamation mark) showing the literal/length and distance codes that result from that dynamic block header:
That is then followed by the actual compressed data, along with the bits for those codes:
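In that data, each literal/length code from 257 to 285 is followed by extra bits that pin down the match length. A distance code with its own extra bits comes next (RFC 1951 section 3.2.5). Rather than copying the RFC tables, the following minimal Python sketch rebuilds them from the pattern they follow. The function and table names are hypothetical:

```python
def length_table():
    """Base match length and extra-bit count for symbols 257..285."""
    base, extra, length = [], [], 3
    for sym in range(257, 285):
        e = 0 if sym < 265 else (sym - 261) // 4
        base.append(length)
        extra.append(e)
        length += 1 << e
    base.append(258)  # symbol 285 is special: length 258, no extra bits
    extra.append(0)
    return base, extra


def distance_table():
    """Base distance and extra-bit count for distance symbols 0..29."""
    base, extra, dist = [], [], 1
    for sym in range(30):
        e = 0 if sym < 4 else sym // 2 - 1
        base.append(dist)
        extra.append(e)
        dist += 1 << e
    return base, extra


LEN_BASE, LEN_EXTRA = length_table()
DIST_BASE, DIST_EXTRA = distance_table()

# Symbol 265 means length 11 or 12, chosen by 1 extra bit.
print(LEN_BASE[265 - 257], LEN_EXTRA[265 - 257])  # 11 1
# Distance symbol 29 covers 24577..32768 with 13 extra bits.
print(DIST_BASE[29], DIST_EXTRA[29])  # 24577 13
```

A decoder adds the extra-bit value, read LSB first, to the base to get the actual length or distance.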