Binary data gets written as string literal - how to convert it back to bytes?

I.F. Adams

I am writing compressed data as a bytes type to a black-box API (i.e. I cannot change what happens under the hood). When I get that data back, it is returned as a string type which I cannot decompress using the generic python modules (zlib, bz2, etc)

In more detail, part of the problem is that this string includes the leading 'b', e.g.
b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'
(this is a string type).

When I compare this to the original binary representation, outside of the quotes and leading B it is identical.

If I try to simply convert back to bytes (e.g. using the bytes function) it wraps the whole thing and escapes the slashes and I get something like the following:

b"b'x\\x9c\\xabV*HL\\xd1\\xcd\\xccK\\xcbW\\xb2RPJ\\xcb\\xcfOJ,R\\xaa\\x05\\x00T\\x83\\x07b'"

Questions is, is it possible to convert this back to a bytes type so I can decompress it? If so, how?

I've seen a few different examples (e.g. How to cast a string to bytes without encoding) that don't quite work out for what I'm trying.

UPDATE:

Lots of good answers, thanks folks! I wish I could click accept on multiple of them. And yes, as many of you noted, it is zlib compressed. This is by design as we have extremely limited space to work with and would like to stay with JSON if possible (zlib was chosen arbitrarily to just get the quirks of binary data out, and may not be the final choice).

Mark Tolonen

Assuming type str for your original string, you have the following raw string (literal length 4 escape codes not an actual escape code representing 1 byte):

s = r"b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'"

If you remove the leading b' and ', you can use the latin1 encoding to convert to bytes. latin1 is a 1:1 mapping of Unicode code points to byte values, because the first 256 Unicode code points represent the latin1 character set:

>>> s[2:-1].encode('latin1')
b'x\\x9c\\xabV*HL\\xd1\\xcd\\xccK\\xcbW\\xb2RPJ\\xcb\\xcfOJ,R\\xaa\\x05\\x00T\\x83\\x07b'

This is now a byte string, but contains literal escape codes. Now apply the unicode_escape encoding to translate back to a str of the actual code points:

>>> s2 = b.decode('unicode_escape')
>>> s2
'x\x9c«V*HLÑÍÌKËW²RPJËÏOJ,Rª\x05\x00T\x83\x07b'

This is now a Unicode string, with code points, but we still need a byte string. Encode with latin1 again:

>>> b2 = s2.encode('latin1')
>>> b2
b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'

In one step:

>>> s = r"b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'"
>>> b = s[2:-1].encode('latin1').decode('unicode_escape').encode('latin1')
>>> b
b'x\x9c\xabV*HL\xd1\xcd\xccK\xcbW\xb2RPJ\xcb\xcfOJ,R\xaa\x05\x00T\x83\x07b'

It appears this sample data is a zlib-compressed JSON string:

>>> import zlib,json
>>> json.loads(zlib.decompress(b))
{'pad-info': 'foobar'}

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2020-12-19

Comments

0 comments

TOP Ranking

Article

Binary data gets written as string literal - how to convert it back to bytes?

Binary data gets written as string literal - how to convert it back to bytes?

pump.io port in URL

How to import an asset in swift using Bundle.main.path() in a react-native native module

Failed to listen on localhost:8000 (reason: Cannot assign requested address)

Inner Loop design for webscrapping

Can't pre-populate phone number and message body in SMS link on iPhones when SMS app is not running in the background

mysql.connector.errors.InterfaceError: 2003: Can't connect to MySQL server on '127.0.0.1:3306' (111 Connection refused)

Removed zsh, but forgot to change shell back to bash, and now Ubuntu crashes (wsl)

ggplotly no applicable method for 'plotly_build' applied to an object of class "NULL" if statements

How to run blender on webserver?

Resetting Value of <input type="time"> in Firefox

Converting a class method to a property with a backing field

Ambiguous use of 'init' with CFStringTransform and Swift 3

Execute ./script.sh with a crontab

How to set tab order for array of cluster,where cluster elements have different data types in LabVIEW?

How to pass data to the ng2-bs3-modal?

Retrieve Element Tag Value XML Using Bash

Spring Boot JPA PostgreSQL Web App - Internal Authentication Error

SQL Server : need add a dot before two last character

Making Array From Page Elements in jQuery

Laravel's ORM sync with timestamps doesn't update timestamps

Do animations stop css changes after animation completion?