How can I convert literal escape sequences in a string to the corresponding bytes?

Rafael Almeida

I have a UTF-8 encoded string from an external source that contains the characters \xc3\x85lesund (literal backslash, literal "x", literal "c", etc.).

Printing it outputs the following:

\xc3\x85lesund

I want to convert it to a bytes variable:

b'\xc3\x85lesund'

That way I can decode it to:

'Ålesund'

How can I do this? I'm using Python 3.4.

ThisSuitIsBlackNot

Using unicode_escape

TL;DR You can decode bytes using the unicode_escape encoding to convert \xXX and \uXXXX escape sequences to the corresponding characters:

>>> r'\xc3\x85lesund'.encode('utf-8').decode('unicode_escape').encode('latin-1')
b'\xc3\x85lesund'
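If you need this in more than one place, the chain can be wrapped in a small function. This is a minimal sketch: the name unescape_to_bytes is hypothetical, and it assumes the input only contains \xXX escapes (or \u escapes no higher than \u00ff), because anything above U+00FF can't be encoded as Latin-1 in the last step:

```python
def unescape_to_bytes(text: str) -> bytes:
    r"""Hypothetical helper: turn literal escapes like \xc3\x85 into raw bytes.

    Assumes every escape names a code point <= U+00FF (see the Latin-1 step).
    """
    # str -> bytes, so the unicode_escape codec has something to decode
    raw = text.encode('utf-8')
    # bytes -> str: each \xXX escape becomes the single character U+00XX
    unescaped = raw.decode('unicode_escape')
    # str -> bytes: Latin-1 maps U+0000..U+00FF straight back to 0x00..0xFF
    return unescaped.encode('latin-1')


data = unescape_to_bytes(r'\xc3\x85lesund')
print(data)                  # b'\xc3\x85lesund'
print(data.decode('utf-8'))  # Ålesund
```

Calling .decode('utf-8') on the result is the step that finally gives you 'Ålesund'.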

First, encode the string to bytes so it can be decoded:

>>> r'\xc3\x85あ'.encode('utf-8')
b'\\xc3\\x85\xe3\x81\x82'

(I changed the string to show that this process works even for characters outside of Latin-1.)

Here's how each character is encoded (note that あ is encoded into multiple bytes):

  • \ (U+005C) -> 0x5c
  • x (U+0078) -> 0x78
  • c (U+0063) -> 0x63
  • 3 (U+0033) -> 0x33
  • \ (U+005C) -> 0x5c
  • x (U+0078) -> 0x78
  • 8 (U+0038) -> 0x38
  • 5 (U+0035) -> 0x35
  • あ (U+3042) -> 0xe3, 0x81, 0x82
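You can reproduce that table by printing each character's code point next to its UTF-8 bytes. This sketch uses str.format rather than f-strings, so it should also run on Python 3.4:

```python
s = r'\xc3\x85あ'  # raw string, so each backslash is a literal character

for ch in s:
    # the character, its code point, and the UTF-8 bytes it encodes to
    utf8 = ' '.join('0x{:02x}'.format(b) for b in ch.encode('utf-8'))
    print(ch, 'U+{:04X}'.format(ord(ch)), utf8)

# 8 ASCII characters (1 byte each) plus あ (3 bytes)
print(len(s), 'characters ->', len(s.encode('utf-8')), 'bytes')  # 9 characters -> 11 bytes
```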

Next, decode the bytes as unicode_escape to replace each escape sequence with its corresponding character:

>>> r'\xc3\x85あ'.encode('utf-8').decode('unicode_escape')
'Ã\x85ã\x81\x82'

Each escape sequence is converted to a separate character; each byte that is not part of an escape sequence is converted to the character with the corresponding ordinal value:

  • \\xc3 -> U+00C3
  • \\x85 -> U+0085
  • \xe3 -> U+00E3
  • \x81 -> U+0081
  • \x82 -> U+0082
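A quick way to confirm what unicode_escape produced is to list the code points of the result. Note that the three bytes of あ come back as three separate characters, not as あ itself, because the codec reads non-escape bytes as Latin-1:

```python
raw = r'\xc3\x85あ'.encode('utf-8')
decoded = raw.decode('unicode_escape')

# two escapes -> two characters, three plain bytes -> three characters
print(len(decoded))  # 5
print(['U+{:04X}'.format(ord(c)) for c in decoded])
# ['U+00C3', 'U+0085', 'U+00E3', 'U+0081', 'U+0082']
```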

Finally, encode the string to bytes again:

>>> r'\xc3\x85あ'.encode('utf-8').decode('unicode_escape').encode('latin-1')
b'\xc3\x85\xe3\x81\x82'

Encoding as Latin-1 simply converts each character to its ordinal value:

  • U+00C3 -> 0xc3
  • U+0085 -> 0x85
  • U+00E3 -> 0xe3
  • U+0081 -> 0x81
  • U+0082 -> 0x82
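The "character to ordinal" behaviour of Latin-1 can be checked directly. The second half shows the limit of this step, using \u3042 purely as an example: a \u escape above \u00ff decodes to a character that has no Latin-1 byte, so the final .encode('latin-1') raises:

```python
decoded = '\xc3\x85\xe3\x81\x82'  # the str produced by the unicode_escape step

# For U+0000..U+00FF, Latin-1 encoding is simply "byte value == code point"
print(decoded.encode('latin-1') == bytes(ord(c) for c in decoded))  # True

# A \u escape above \u00ff decodes to a character Latin-1 cannot represent
try:
    r'\u3042'.encode('utf-8').decode('unicode_escape').encode('latin-1')
except UnicodeEncodeError as exc:
    print('cannot encode:', exc.reason)  # cannot encode: ordinal not in range(256)
```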

And voilà, we have the byte sequence you're looking for.
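As a final check, decoding the recovered bytes as UTF-8 gives real text again, whereas stopping right after the unicode_escape step leaves mojibake:

```python
source = r'\xc3\x85あ'

# Stopping after unicode_escape: one character per byte (mojibake)
mojibake = source.encode('utf-8').decode('unicode_escape')
# Finishing the chain, then decoding the bytes as UTF-8
fixed = mojibake.encode('latin-1').decode('utf-8')

print(repr(mojibake))  # 'Ã\x85ã\x81\x82'
print(fixed)           # Åあ
```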

Using codecs.escape_decode

As an alternative, you can use the codecs.escape_decode function to interpret escape sequences in a bytes-to-bytes conversion, as user19087 posted in an answer to a similar question:

>>> import codecs
>>> codecs.escape_decode(r'\xc3\x85lesund'.encode('utf-8'))[0]
b'\xc3\x85lesund'

However, codecs.escape_decode is undocumented, so I wouldn't recommend using it.
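If you do try it, one way to compare the two approaches is to run both on a few inputs. The helper names below are hypothetical, and the samples only use \xXX and a simple escape like \t; other escapes (for example \u, which bytes literals don't support) may behave differently between the two:

```python
import codecs


def via_unicode_escape(text):
    # hypothetical helper: the encode/decode/encode chain from the first part
    return text.encode('utf-8').decode('unicode_escape').encode('latin-1')


def via_escape_decode(text):
    # hypothetical helper: escape_decode returns (bytes, length consumed)
    return codecs.escape_decode(text.encode('utf-8'))[0]


for sample in (r'\xc3\x85lesund', r'\xc3\x85あ', r'tab\there'):
    a, b = via_unicode_escape(sample), via_escape_decode(sample)
    # show the input, the resulting bytes, and whether both methods agree
    print(repr(sample), a, a == b)
```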

