Binary data gets written as string literal - how to convert it back to bytes?

I.F. Adams

I am writing compressed data as a bytes object to a black-box API (i.e. I cannot change what happens under the hood). When I get that data back, it is returned as a str, which I cannot decompress using the standard Python modules (zlib, bz2, etc.).

In more detail, part of the problem is that this string includes the leading 'b', e.g.
b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'
(this is a string type).

When I compare this to the original binary representation, it is identical apart from the quotes and the leading b.

If I try to simply convert it back to bytes (e.g. using the bytes function), it wraps the whole thing and escapes the backslashes, and I get something like the following:

b"b'x\\x9c\\xabV*HL\\xd1\\xcd\\xccK\\xcbW\\xb2RPJ\\xcb\\xcfOJ,R\\xaa\\x05\\x00T\\x83\\x07b'"

The question is: is it possible to convert this back to a bytes type so I can decompress it? If so, how?

I've seen a few different examples (e.g. How to cast a string to bytes without encoding), but they don't quite work for what I'm trying to do.

UPDATE:

Lots of good answers, thanks folks! I wish I could click accept on multiple of them. And yes, as many of you noted, it is zlib compressed. This is by design as we have extremely limited space to work with and would like to stay with JSON if possible (zlib was chosen arbitrarily to just get the quirks of binary data out, and may not be the final choice).

Mark Tolonen

Assuming your original string is of type str, you have the following raw string (each \xNN is four literal characters, not an actual escape code representing one byte):

s = r"b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'"

If you remove the leading b' and the trailing ', you can use the latin1 encoding to convert to bytes. latin1 is a 1:1 mapping of Unicode code points to byte values, because the first 256 Unicode code points represent the latin1 character set:

>>> b = s[2:-1].encode('latin1')
>>> b
b'x\\x9c\\xabV*HL\\xd1\\xcd\\xccK\\xcbW\\xb2RPJ\\xcb\\xcfOJ,R\\xaa\\x05\\x00T\\x83\\x07b'

The encoded result is a byte string, but it still contains literal escape codes. Now decode it with the unicode_escape codec to translate those escapes into a str of the actual code points:

>>> s2 = b.decode('unicode_escape')
>>> s2
'x\x9c«V*HLÑÍÌKËW²RPJËÏOJ,Rª\x05\x00T\x83\x07b'

This is now a Unicode string, with code points, but we still need a byte string. Encode with latin1 again:

>>> b2 = s2.encode('latin1')
>>> b2
b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'
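As a quick check of the claim that latin1 maps code points 0 to 255 one-to-one onto byte values, the following sketch round-trips every possible byte. The variable names are only illustrative and are not part of the steps above.

```python
# Decode every byte value 0..255 with latin1: each byte becomes the code
# point with the same number, and encoding the text restores the bytes.
all_bytes = bytes(range(256))
text = all_bytes.decode('latin1')

assert [ord(ch) for ch in text] == list(range(256))
assert text.encode('latin1') == all_bytes

# Code points above 255 have no latin1 byte, so encoding them fails.
try:
    '\u20ac'.encode('latin1')
    encodable = True
except UnicodeEncodeError:
    encodable = False

print('U+20AC encodable as latin1:', encodable)
```

This is why the final encode step is safe here: unicode_escape only ever produces code points below 256 for \xNN escapes.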

In one step:

>>> s = r"b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'"
>>> b = s[2:-1].encode('latin1').decode('unicode_escape').encode('latin1')
>>> b
b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'
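The one-liner can be wrapped in a small helper. The sketch below is not from the steps above as written: the function names literal_to_bytes and literal_to_bytes_ast are made up for illustration, and both assume the string really is the repr of a bytes object. The second variant swaps in a different technique, the standard library's ast.literal_eval, which parses the literal directly.

```python
import ast


def literal_to_bytes(s: str) -> bytes:
    """Hypothetical helper: undo repr(bytes_obj) with the codec chain."""
    # Accept b'...' or b"..." only; anything else is not a bytes repr.
    if len(s) < 3 or s[0] != 'b' or s[1] not in '\'"' or s[-1] != s[1]:
        raise ValueError('expected the repr of a bytes object')
    body = s[2:-1]
    return body.encode('latin1').decode('unicode_escape').encode('latin1')


def literal_to_bytes_ast(s: str) -> bytes:
    """Hypothetical alternative: let ast.literal_eval parse the literal."""
    value = ast.literal_eval(s)
    if not isinstance(value, bytes):
        raise ValueError('expected the repr of a bytes object')
    return value


s = r"b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'"
assert literal_to_bytes(s) == literal_to_bytes_ast(s)
print(literal_to_bytes(s))
```

ast.literal_eval only evaluates Python literals, so it does not execute arbitrary code, but it will raise ValueError or SyntaxError if the API ever returns something that is not a valid literal.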

It appears this sample data is a zlib-compressed JSON string:

>>> import zlib,json
>>> json.loads(zlib.decompress(b))
{'pad-info': 'foobar'}
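To see the whole round trip in one place, here is a minimal sketch. It assumes the black-box API effectively stores repr() of the bytes object; that is a guess based on the string shown in the question, not something the question confirms, so repr() below merely stands in for the API.

```python
import json
import zlib

payload = {'pad-info': 'foobar'}
compressed = zlib.compress(json.dumps(payload).encode('utf-8'))

# Assumption: the API hands back the textual repr of the bytes object.
returned = repr(compressed)
assert isinstance(returned, str)

# Strip the b'...' wrapper and resolve the literal escape codes.
recovered = (
    returned[2:-1]
    .encode('latin1')
    .decode('unicode_escape')
    .encode('latin1')
)

assert recovered == compressed
print(json.loads(zlib.decompress(recovered)))
```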
