How can I extract text from single quotes, even if the text itself contains single quotes, using regex in Python?

Edwin Jose

I'm trying to extract data from a .txt file. My regex works for the most part, but it fails when the text I'm trying to extract itself contains single quotes.

{'pro_id':'1692423', 'pro_model':'SKUF42051', 'pro_category':'accessories', 'pro_name':'Gants tactiques Escalade en plein air Gants antidérapants résistants à l'usure Formation Gants de moto d'équitation', 'pro_current_price':'27.99', 'pro_raw_price':'27.99', 'pro_discount':'36', 'pro_likes_count':'11'}

This is what my text in the .txt file looks like.

I'm looping through and creating dicts from them. I do that by extracting the content from within the single quotes and appending the "key" and "value" pairs to a dict.

I first extracted the content between the curly brackets and split it at ", " to get a list of items. I then looped through the list and used key, value = re.findall(r"\'([^']+)\'", element) to extract each key and value.

I'm a novice at both regex and programming, so I could use some expert help.

I also asked ChatGPT, which suggested the regex '([^']+(?:\\'[^']+)*?)':'([^']+(?:\\'[^']+)*?)', but that fails too.

I want to get a list that holds ['pro_name', 'Gants tactiques Escalade en plein air Gants antidérapants résistants à l'usure Formation Gants de moto d'équitation'] from re.findall

but instead I get

['Gants tactiques Escalade en plein air Gants antidérapants résistants à l', 'équitation'].

Ξένη Γήινος

Your string is malformed. A string that contains literal single quotes should be enclosed in double quotes (or have the inner quotes escaped); otherwise it can't be parsed correctly.
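To see why the string counts as malformed, you can hand a shortened version of it to Python's own literal parser. This is only a small sketch using ast.literal_eval from the standard library; the two sample strings are cut-down, made-up variants of the data in the question, not the real line.

```python
import ast

# Shortened, illustrative records (not the full line from the question).
broken = "{'pro_name':'résistants à l'usure'}"
fixed = "{'pro_name':\"résistants à l'usure\"}"

try:
    ast.literal_eval(broken)
    parsed_ok = True
except SyntaxError:
    # The apostrophe in l'usure ends the string literal too early.
    parsed_ok = False

print(parsed_ok)
print(ast.literal_eval(fixed))
```

The double-quoted variant parses cleanly, which is why the data would have been easy to load if it had been written that way.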

It is extremely difficult to sort this out with a regex, or with a plain for loop.

But I have found a simple pattern in the data. Every string is enclosed in single quotes, the key-value pairs are separated by a comma followed by a space, and each key is separated from its value by a single colon. So the key-value pairs can be identified by first splitting the string on "', '" and then splitting each substring on "':'".

You can then convert the result to a dict, with cleanup where necessary.

Example:

import re

text = "{'pro_id':'1692423', 'pro_model':'SKUF42051', 'pro_category':'accessories', 'pro_name':'Gants tactiques Escalade en plein air Gants antidérapants résistants à l'usure Formation Gants de moto d'équitation', 'pro_current_price':'27.99', 'pro_raw_price':'27.99', 'pro_discount':'36', 'pro_likes_count':'11'}"
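# Strip the leading "{" and "'" and the trailing "'" and "}" left over after splitting.
def clean(s):
    return re.sub(r"^[{']+|[}']+$", '', s)

# Split into pairs on "', '", then split each pair once on "':'" into key and value.
# The maxsplit of 1 keeps a value intact even if it happens to contain "':'".
arr = [i.split("':'", 1) for i in text.split("', '")]

{clean(a): clean(b) for a, b in arr}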

The result is:

{'pro_id': '1692423',
 'pro_model': 'SKUF42051',
 'pro_category': 'accessories',
 'pro_name': "Gants tactiques Escalade en plein air Gants antidérapants résistants à l'usure Formation Gants de moto d'équitation",
 'pro_current_price': '27.99',
 'pro_raw_price': '27.99',
 'pro_discount': '36',
 'pro_likes_count': '11'}

Wrap it in a function:

def dictify(text):
    # Split each pair only once, so a value containing "':'" still unpacks into two parts.
    arr = [i.split("':'", 1) for i in text.split("', '")]
    return {clean(a): clean(b) for a, b in arr}
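
Since the question asked for re.findall specifically, the same delimiter idea can also be sketched as a single pattern. This is an alternative to the split approach, not a replacement for it, and it rests on the same assumptions: keys and values are joined by "':'" with no spaces, and pairs are separated by exactly ", ". The names sample and PAIR are made up for this illustration, and sample is a shortened record.

```python
import re

# Shortened, illustrative record in the same format as the question's data.
sample = "{'pro_id':'1692423', 'pro_name':'résistants à l'usure d'équitation', 'pro_likes_count':'11'}"

# A value ends only at a quote that is followed by ", '" (the next pair)
# or by the closing brace at the end of the record.
PAIR = re.compile(r"'(.*?)':'(.*?)'(?=, '|\}\s*$)")

pairs = PAIR.findall(sample)   # list of (key, value) tuples
record = dict(pairs)
print(record['pro_name'])
```

The lookahead is what lets apostrophes inside a value through: a quote followed by ordinary text is not treated as the end of the value.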

I assume your text file holds many more strings like the one above. Since I don't know the exact format, I can only demonstrate how to convert the file to a list of dicts on the assumption that it has one record per line.

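with open('/path/to/file', 'r') as f:
    text = f.read()
# Skip blank lines (such as the one left by a trailing newline), which dictify cannot unpack.
[dictify(row) for row in text.splitlines() if row.strip()]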

You need to change the file path placeholder to the actual path. The above won't work if your file isn't newline separated.
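
If the records turn out not to be one per line, one possible way to pull them apart first is to look for the "{'" and "'}" that open and close each record. This is a sketch under the assumption that no value contains the sequence "'}"; split_records is a hypothetical helper name and blob is made-up sample data.

```python
import re

def split_records(text):
    # Hypothetical helper: a record starts at "{'" and ends at the nearest "'}".
    # re.DOTALL lets "." match newlines, so records may be separated by anything.
    return re.findall(r"\{'.*?'\}", text, flags=re.DOTALL)

# Made-up data: two records on one line, a third on the next line.
blob = "{'a':'1', 'b':'l'usure'} {'a':'2', 'b':'x'}\n{'a':'3', 'b':'y'}"
records = split_records(blob)
print(len(records))
```

Each element of records can then be converted to a dict one at a time.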

My method also won't work if a string deviates from this format, for example if there are spaces after the colons that separate keys from values, or no spaces after the commas that separate key-value pairs.

If that is the case, you will need to work out a different method, but my example does work on the sample you have given.
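
If the only deviation is extra or missing whitespace around the commas and colons, one possible adaptation is to split with re.split instead of str.split. This is a minimal sketch, not something tested against the real file: dictify_tolerant is a hypothetical name, the sample string is made up, and a value that itself contains a quote-comma-quote or quote-colon-quote sequence would still be split in the wrong place.

```python
import re

def dictify_tolerant(text):
    # Drop the opening "{'" and the closing "'}" (whitespace allowed in between).
    body = re.sub(r"^\{\s*'|'\s*\}$", "", text.strip())
    result = {}
    # Pairs are separated by quote, comma, quote with optional whitespace.
    for pair in re.split(r"'\s*,\s*'", body):
        # Key and value are separated by quote, colon, quote; split only once.
        key, value = re.split(r"'\s*:\s*'", pair, maxsplit=1)
        result[key] = value
    return result

# Made-up record with inconsistent spacing and an apostrophe inside a value.
messy = "{'a' : 'x', 'b':'l'usure','c': '1'}"
print(dictify_tolerant(messy))
```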
