How can I extract text from single quotes, even if the text itself contains single quotes, using regex in Python?

Edwin Jose

I'm trying to extract data from a .txt file, and while my regex works for the most part, it fails when it comes across single quotes within the text I'm trying to extract.

{'pro_id':'1692423', 'pro_model':'SKUF42051', 'pro_category':'accessories', 'pro_name':'Gants tactiques Escalade en plein air Gants antidérapants résistants à l'usure Formation Gants de moto d'équitation', 'pro_current_price':'27.99', 'pro_raw_price':'27.99', 'pro_discount':'36', 'pro_likes_count':'11'}

This is what my text in the .txt file looks like.

I loop through the lines and create a dict from each one. I do that by extracting the content within the single quotes and adding the "key" and "value" pairs to a dict.

I first extract the content within the curly brackets, then split it at ", " to get the "items" as a list, and finally loop through the list and use key, value = re.findall(r"\'([^']+)\'", element) to extract the "key" and "value".
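A minimal reproduction of that per-element step shows where it breaks. It assumes the `pro_name` element looks exactly as it does after the split at ", " (the variable names are illustrative). `[^']+` cannot cross an apostrophe, so the value is cut at every embedded quote and `re.findall` returns three pieces instead of two, which also makes the `key, value = ...` unpacking fail.

```python
import re

# One element as produced by splitting the brace contents at ", "
element = ("'pro_name':'Gants tactiques Escalade en plein air Gants antidérapants "
           "résistants à l'usure Formation Gants de moto d'équitation'")

# The pattern from the question: a quote, a run of non-quote characters, a quote
pieces = re.findall(r"\'([^']+)\'", element)
print(pieces)
# ['pro_name', 'Gants tactiques Escalade en plein air Gants antidérapants résistants à l', 'équitation']

# Three pieces instead of two, so unpacking into key, value raises ValueError
try:
    key, value = pieces
except ValueError as exc:
    print('unpacking failed:', exc)
```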

I'm a novice at both regex and programming, so I could use some expert help.

I did ask ChatGPT for a regex '([^']+(?:\\'[^']+)*?)':'([^']+(?:\\'[^']+)*?)' but that fails too.

I want to get a list that holds ['pro_name', 'Gants tactiques Escalade en plein air Gants antidérapants résistants à l'usure Formation Gants de moto d'équitation'] from re.findall

but instead I get

['Gants tactiques Escalade en plein air Gants antidérapants résistants à l', 'équitation'].
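If you want to stay with a regex, one option is to treat a quote as the end of a value only when the next pair (`, '`) or the closing brace follows it. Below is a minimal sketch under that assumption; it also assumes keys consist of word characters and that no value itself contains `', '` or `'}`. `PAIR_RE` and `parse_record` are illustrative names.

```python
import re

# A quote closes a value only if it is followed by ", '" (next pair) or "}" (end of record).
# Assumption: keys are word characters and no value contains "', '" or "'}" itself.
PAIR_RE = re.compile(r"'(\w+)':'(.*?)'(?=, '|\})")

def parse_record(text):
    """Return a dict built from one "{'key':'value', ...}" record."""
    # findall with two groups yields (key, value) tuples, which dict() accepts
    return dict(PAIR_RE.findall(text))

text = ("{'pro_id':'1692423', 'pro_model':'SKUF42051', 'pro_category':'accessories', "
        "'pro_name':'Gants tactiques Escalade en plein air Gants antidérapants résistants "
        "à l'usure Formation Gants de moto d'équitation', 'pro_current_price':'27.99', "
        "'pro_raw_price':'27.99', 'pro_discount':'36', 'pro_likes_count':'11'}")

record = parse_record(text)
print(len(record))  # 8
print(record['pro_name'])
```

The lazy `(.*?)` keeps extending past apostrophes such as the one in `l'usure` until the lookahead succeeds, so embedded quotes stay inside the value.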

Ξένη Γήινος

Your string is malformed. Strings containing literal single quotes should be enclosed in double quotes; otherwise they can't be parsed unambiguously.
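To illustrate: if each record were a valid Python literal, with the apostrophe-containing value in double quotes, the standard library's `ast.literal_eval` could turn it into a dict directly, whereas the record as written is rejected with a `SyntaxError`. A minimal sketch using a shortened record; the sample strings are illustrative, not taken from the actual file.

```python
import ast

# Well-formed: the value that contains apostrophes is wrapped in double quotes
good = "{'pro_id':'1692423', 'pro_name':\"à l'usure Formation Gants de moto d'équitation\"}"
# Malformed, as in the question: apostrophes inside a single-quoted value
bad = "{'pro_id':'1692423', 'pro_name':'à l'usure Formation Gants de moto d'équitation'}"

# literal_eval evaluates only literal syntax (strings, numbers, dicts, ...), unlike eval()
record = ast.literal_eval(good)
print(record['pro_name'])  # à l'usure Formation Gants de moto d'équitation

try:
    ast.literal_eval(bad)
    bad_parsed = True
except SyntaxError as exc:
    bad_parsed = False
    print('cannot parse:', exc.msg)
```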

It is extremely difficult to sort this out with a regex, and it is also hard with a for loop.

But I have found a way, based on a few simple patterns. All strings are enclosed in single quotes, key-value pairs are separated by a comma followed by a space, and keys are separated from values by a single colon. So the key-value pairs are easy to identify: first split the string on "', '", then split each substring on "':'".

You can then convert the result to a dict, cleaning up the leftover braces and quotes where necessary.

Example:

import re

text = "{'pro_id':'1692423', 'pro_model':'SKUF42051', 'pro_category':'accessories', 'pro_name':'Gants tactiques Escalade en plein air Gants antidérapants résistants à l'usure Formation Gants de moto d'équitation', 'pro_current_price':'27.99', 'pro_raw_price':'27.99', 'pro_discount':'36', 'pro_likes_count':'11'}"

def clean(s):
    # strip leftover braces and quotes from either end of a key or value
    return re.sub(r"^[{']+|[}']+$", '', s)

# split into "key':'value" chunks at "', '", then split each chunk at "':'"
arr = [i.split("':'") for i in text.split("', '")]
result = {clean(a): clean(b) for a, b in arr}
print(result)

The result is:

{'pro_id': '1692423',
 'pro_model': 'SKUF42051',
 'pro_category': 'accessories',
 'pro_name': "Gants tactiques Escalade en plein air Gants antidérapants résistants à l'usure Formation Gants de moto d'équitation",
 'pro_current_price': '27.99',
 'pro_raw_price': '27.99',
 'pro_discount': '36',
 'pro_likes_count': '11'}

Wrap it in a function:

def dictify(text):
    arr = [i.split("':'") for i in text.split("', '")]
    return {clean(a): clean(b) for a, b in arr}

I assume your text file contains many more strings like the one above. Since I don't know the exact format, I can only demonstrate how to convert the file to a list of dicts assuming it has one record per line.

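# assuming the file is UTF-8 encoded (it contains accented characters)
with open('/path/to/file', encoding='utf-8') as f:
    text = f.read()
# one record per line; skip blank lines (e.g. a trailing newline), which would break the unpacking in dictify()
records = [dictify(row) for row in text.splitlines() if row.strip()]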

You need to replace the file path placeholder with the actual path. The above won't work if your file isn't newline-separated.

My method also won't work if your strings deviate from this format, for example if there are spaces after the colons that separate keys from values, if there are no spaces after the commas that separate key-value pairs, or if a value itself contains "', '" or "':'".

If that is the case, I cannot help you; you will need to find a different method. But my method does work on the example you have given.
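For the whitespace deviations specifically, one possible adjustment is to split with regular expressions that allow optional whitespace (`\s*`) around the comma and colon instead of splitting on fixed strings. A minimal sketch; `dictify_loose` is a hypothetical name, and it still assumes every key and value is single-quoted and that no value contains a quote-comma-quote sequence.

```python
import re

def dictify_loose(text):
    """Hypothetical variant of dictify() that tolerates optional whitespace
    around the ',' and ':' delimiters.

    Assumes every key and value is single-quoted and that no value contains
    a quote, optional spaces, comma, optional spaces, quote sequence.
    """
    body = text.strip().strip('{}').strip()  # drop outer braces and padding
    result = {}
    for pair in re.split(r"'\s*,\s*'", body):  # boundary between pairs
        # maxsplit=1: only the first quote-colon-quote separates key from value
        key, value = re.split(r"'\s*:\s*'", pair, maxsplit=1)
        result[key.strip("'")] = value.strip("'")  # remove leftover outer quotes
    return result

sample = "{'pro_id' : '1692423','pro_name':  'à l'usure Formation Gants de moto d'équitation' ,'pro_discount':'36'}"
print(dictify_loose(sample))
```

Like `clean()`, the final `.strip("'")` would also remove an apostrophe that legitimately ends a value.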

Edited at 2023-06-07
