How to write a regex when the string contains a comma (,), spaces, and other characters like (

itsmnthn

I have string:

<td class="cspan">Proximates</td>\n\t<td style="text-align:left">Total lipid (fat)\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>g</td>\n\t\t\n\t\t\t<td style="text-align:right;">78.30</td>

and I need a regex for it. I have tried many, such as this one:

Total lipid\(fat\)\\n\\t\\t\\n\\t\\t\\n\\t\\t\<\/td\>\\n\\t\\t\\n\\t\\t\<td\>g\<\/td\>\\n\\t\\t\\n\\t\\t\\t\<td style\=\"text\-align\:right\;\"\>(.*?)\<\/td\>

And also I have another string:

<td style="text-align:left">Vitamin C, total ascorbic acid\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>mg</td>\n\t\t\n\t\t\t<td style="text-align:right;">0.0</td>

and I have tried many regexes for that one too, such as:

Vitamin C\, total ascorbic acid\\n\\t\\t\\n\\t\\t\\n\\t\\t\<\/td\>\\n\\t\\t\\n\\t\\t\<td\>mg\<\/td\>\\n\\t\\t\\n\\t\\t\\t\<td style\=\"text\-align\:right\;\"\>(.*?)\<\/td\>

and my third string is:

<td style="text-align:left">Vitamin B-12\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>\xb5g</td>\n\t\t\n\t\t\t<td style="text-align:right;">0.07</td>

and I have tried this one and more like this:

data = re.search('Vitamin B\-12\\n\\t\\t\\n\\t\\t\\n\\t\\t\<\/td\>\\n\\t\\t\\n\\t\\t\<td\>µg\<\/td\>\\n\\t\\t\\n\\t\\t\\t\<td style\=\"text\-align\:right\;\"\>(.*?)\<\/td\>',tb)

From those strings I am trying to get the data which is:

  1. from the first string is: 78.30
  2. from the second: 0.0
  3. from the third: 0.07

I need a regex like the ones I have written above, with just minor changes, because I know I am missing something.

Stephen Rauch

As you have discovered, XML (HTML) and regexes do not mix well. However, this problem is quite straightforward when using BeautifulSoup:

Code:

from bs4 import BeautifulSoup
soup = BeautifulSoup(row, 'html.parser')
print(soup.find_all('td')[-1].text)  # text of the last <td> in the row

Test Code:

data = (
    """
    <td class="cspan">Proximates</td>
    <td style="text-align:left">Total lipid (fat)


    </td>
    <td>g</td>
        <td style="text-align:right;">78.30</td>
    """,
    """
    <td style="text-align:left">Vitamin C, total ascorbic acid


    </td>
    <td>mg</td>
    <td style="text-align:right;">0.0</td>
    """,
    """
    <td style="text-align:left">Vitamin B-12


    </td>
    <td>\xb5g</td>
    <td style="text-align:right;">0.07</td>
    """,
)


from bs4 import BeautifulSoup

for row in data:
    soup = BeautifulSoup(row, 'html.parser')
    # Print the text of the last <td> in each row
    print(soup.find_all('td')[-1].text)

Results:

78.30
0.0
0.07
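
If you would rather stay with a regex, the following is a minimal sketch rather than a definitive fix. One visible difference between the first pattern in the question and its string is that the pattern has no space between "lipid" and "(fat)". The helper name extract_value is hypothetical, and the sketch assumes tb holds real newline and tab characters. It uses \s* in place of the hard-coded \n and \t runs, and re.escape to handle the parentheses, comma, and hyphen in the label.

```python
import re


def extract_value(label, html):
    """Return the text of the right-aligned cell that follows `label`, or None.

    Hypothetical helper: assumes the row layout shown in the question
    (label cell, unit cell, then a right-aligned value cell).
    """
    pattern = (
        re.escape(label)         # escape any regex metacharacters in the label
        + r'\s*</td>'            # any run of newlines/tabs, then the closing tag
        + r'\s*<td>[^<]*</td>'   # the unit cell (g, mg, µg, ...)
        + r'\s*<td style="text-align:right;">(.*?)</td>'  # capture the value
    )
    match = re.search(pattern, html)
    return match.group(1) if match else None


# First string from the question, with real newline and tab characters
tb = (
    '<td class="cspan">Proximates</td>\n\t'
    '<td style="text-align:left">Total lipid (fat)'
    '\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>g</td>\n\t\t\n\t\t\t'
    '<td style="text-align:right;">78.30</td>'
)

print(extract_value("Total lipid (fat)", tb))
```

The same call should work for the other labels, for example extract_value("Vitamin B-12", tb), as long as the row follows the same three-cell layout. A pattern like this still breaks as soon as the markup changes, which is why the BeautifulSoup approach above is usually the safer choice.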
