How do i remove the special chars that show as `\uxxx` in python3 string object?

Merlin Published at Dev

merlin

python string object as follow:

The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833.

I want to remove these chars \u200b \ufeff that show as raw unicode.

think-maths

Encode it to ascii and ignore errors

>>> s = 'The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833'
>>> s.encode('ascii', 'ignore')
b'The site of the old observatory in Bern is the point of origin of the CH1903 coordinate system at 465708.66N 72622.50E / 46.9524056N 7.4395833E / 46.9524056; 7.4395833'

To replace unicode character with whitespace to keep the length same, you can use

#length of original string

>>> s = 'The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833'
>>> len(s)
179

#to maintain the same length

>>> new_s = s.encode('ascii',errors='ignore').decode('utf-8')
>>> final_s = new_s + ' ' * (len(s) - len(new_s))
>>> final_s
'The site of the old observatory in Bern is the point of origin of the CH1903 coordinate system at 465708.66N 72622.50E / 46.9524056N 7.4395833E / 46.9524056; 7.4395833            '
>>> len(final_s)
179

this will add additional space at last to maintain the length

Collected from the Internet

Please contact [email protected] to delete if infringement.