How to keep the header and trailer when decompressing and recompressing with zlib

borislav-penev

I have raw data extracted from a PDF. I decompressed it and then compressed it again.

I expected to get the same header and trailer, but the header changed.

  • Original Hex Header
48 89 EC 57 ....
  • Converted Hex Header
78 9C EC BD ...

I dug into zlib compression and found that 48 is also a valid first byte for a zlib header.

But 78 is what zlib compression uses most of the time.

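Here is my code, which decompresses and recompresses the data:

import zlib

raw_data = "stream.bin"  # hypothetical path to the raw stream extracted from the PDF
decompress_wbit = 12     # window bits used when decompressing
compress_variable = 6    # compression level used when recompressing

with open(raw_data, "rb") as input_file:
    output_data = zlib.decompress(input_file.read(), decompress_wbit)
output_data = zlib.compress(output_data, compress_variable)
with open(raw_data + "_", "wb") as output_file:
    output_file.write(output_data)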

I changed decompress_wbit and compress_variable, but the output header still starts with 78.

So I am not sure how to get 48 as the header.

Here is a short description of the zlib header.

  • CINFO (bits 12-15)

Indicates the window size as a power of two, from 0 (256 bytes) to 7 (32768 bytes). This will usually be 7. Higher values are not allowed.

  • CM (bits 8-11)

The compression method. Only Deflate (8) is allowed.

  • FLEVEL (bits 6-7)

Roughly indicates the compression level, from 0 (fast/low) to 3 (slow/high)

  • FDICT (bit 5)

Indicates whether a preset dictionary is used. This is usually 0. 1 is technically allowed, but I don't know of any Deflate formats that define preset dictionaries.

  • FCHECK (bits 0-4)

A checksum (5 bits, 0..31), whose value is chosen so that the entire 16-bit header value is divisible by 31.

Typically, only the CINFO and FLEVEL fields can be freely changed, and FCHECK must be calculated based on the final value. Assuming no preset dictionary, there is no choice in what the other fields contain, so a total of 32 possible headers are valid. Here they are:

      FLEVEL: 0       1       2       3
CINFO:
     0      08 1D   08 5B   08 99   08 D7
     1      18 19   18 57   18 95   18 D3
     2      28 15   28 53   28 91   28 CF
     3      38 11   38 4F   38 8D   38 CB
     4      48 0D   48 4B   48 89   48 C7
     5      58 09   58 47   58 85   58 C3
     6      68 05   68 43   68 81   68 DE
     7      78 01   78 5E   78 9C   78 DA
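
The table can be reproduced from the field layout described above. The following is a minimal sketch, assuming the two header bytes are read as one big-endian 16-bit value and that no preset dictionary is used by default. The names make_header and parse_header are illustrative only and are not part of the zlib module.

```python
def make_header(cinfo: int, flevel: int, fdict: int = 0) -> bytes:
    """Build the 2-byte zlib header for the given CINFO and FLEVEL."""
    if not (0 <= cinfo <= 7 and 0 <= flevel <= 3 and fdict in (0, 1)):
        raise ValueError("field out of range")
    cm = 8  # Deflate is the only allowed compression method
    value = (cinfo << 12) | (cm << 8) | (flevel << 6) | (fdict << 5)
    # Choose FCHECK so that the whole 16-bit value is a multiple of 31.
    value |= (31 - value % 31) % 31
    return value.to_bytes(2, "big")


def parse_header(header: bytes) -> dict:
    """Split the first two bytes of a zlib stream into their fields."""
    value = int.from_bytes(header[:2], "big")
    return {
        "cinfo": value >> 12,
        "cm": (value >> 8) & 0x0F,
        "flevel": (value >> 6) & 0x03,
        "fdict": (value >> 5) & 0x01,
        "fcheck": value & 0x1F,
        "window_size": 1 << ((value >> 12) + 8),  # CINFO 0 means 256 bytes
        "valid": value % 31 == 0,
    }


# Print one row per CINFO value, one column per FLEVEL value.
for cinfo in range(8):
    row = "   ".join(make_header(cinfo, flevel).hex(" ").upper() for flevel in range(4))
    print(cinfo, row)
```

Calling parse_header on the first two bytes of each file is a quick way to compare the window size and level hint of the original stream against the recompressed one.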

Please let me know how to keep the original zlib header when decompressing and recompressing.

Thanks for your time.

Mark Adler

I will first note that it doesn't matter. The data will be decompressed fine with that zlib header. Why do you care?

You are giving zlib.compress a small amount of data that permits a smaller window. Since it is permitted, the Python library is electing to compress with a smaller window.

A way to avoid that would be to use zlib.compressobj instead. Upon initiation, it doesn't know how much data you will be feeding it and will default to the largest window size.
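
zlib.compressobj also accepts an explicit wbits argument, so the window size written into the header can be chosen instead of left at its default. The following is a minimal sketch, assuming the original stream used a 4 KiB window (CINFO 4, i.e. wbits=12) and compression level 6, and using made-up sample data in place of the PDF stream. The helper name recompress is hypothetical. Matching the header does not guarantee that the compressed bytes in between are identical to the original, since those depend on the compressor that produced them. The 4-byte trailer is the Adler-32 of the uncompressed data, so it should stay the same whatever settings are used.

```python
import zlib


def recompress(data: bytes, level: int = 6, wbits: int = 12) -> bytes:
    """Compress data as a zlib stream with an explicit window size."""
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


# Stand-in for the decompressed PDF stream (made-up data).
sample = b"example stream contents " * 200

out = recompress(sample)

# First two bytes are the zlib header, last four are the Adler-32 trailer.
print("header: ", out[:2].hex(" ").upper())
print("trailer:", out[-4:].hex(" ").upper())
print("trailer matches Adler-32:", out[-4:] == zlib.adler32(sample).to_bytes(4, "big"))

# For comparison, zlib.compress with default settings on the same data.
print("default header:", zlib.compress(sample, 6)[:2].hex(" ").upper())
```

With a stock zlib build, wbits=12 and level 6 should give the header 48 89 from the question; a different level changes only the FLEVEL bits and FCHECK in the second byte.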
