I have this string:
<td class="cspan">Proximates</td>\n\t<td style="text-align:left">Total lipid (fat)\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>g</td>\n\t\t\n\t\t\t<td style="text-align:right;">78.30</td>
and I need a regex for it. I have tried many, like this one:
Total lipid\(fat\)\\n\\t\\t\\n\\t\\t\\n\\t\\t\<\/td\>\\n\\t\\t\\n\\t\\t\<td\>g\<\/td\>\\n\\t\\t\\n\\t\\t\\t\<td style\=\"text\-align\:right\;\"\>(.*?)\<\/td\>
I also have another string:
<td style="text-align:left">Vitamin C, total ascorbic acid\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>mg</td>\n\t\t\n\t\t\t<td style="text-align:right;">0.0</td>
and I have tried many regexes for that one too, like:
Vitamin C\, total ascorbic acid\\n\\t\\t\\n\\t\\t\\n\\t\\t\<\/td\>\\n\\t\\t\\n\\t\\t\<td\>mg\<\/td\>\\n\\t\\t\\n\\t\\t\\t\<td style\=\"text\-align\:right\;\"\>(.*?)\<\/td\>
and my third string is:
<td style="text-align:left">Vitamin B-12\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>\xb5g</td>\n\t\t\n\t\t\t<td style="text-align:right;">0.07</td>
and I have tried this one, and others like it:
data = re.search('Vitamin B\-12\\n\\t\\t\\n\\t\\t\\n\\t\\t\<\/td\>\\n\\t\\t\\n\\t\\t\<td\>µg\<\/td\>\\n\\t\\t\\n\\t\\t\\t\<td style\=\"text\-align\:right\;\"\>(.*?)\<\/td\>',tb)
From those strings I am trying to get the data, which is: 78.30, 0.0 and 0.07.
I need a regex like the ones I have written above, with just minor changes, because I know I am missing something.
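Staying with a regex, one likely culprit is visible in the first attempt: the pattern has `Total lipid\(fat\)` with no space, while the string has `Total lipid (fat)`, so it can never match. A sturdier sketch (written for Python 3, and assuming the strings contain real newline and tab characters rather than literal backslash sequences) escapes the label and unit with `re.escape` and lets `\s*` absorb whatever whitespace sits between the tags. The `find_value` helper and the `ROWS` sample data are hypothetical names for illustration.

```python
import re

# Hypothetical sample rows rebuilt from the question's strings,
# with real newlines and tabs between the tags.
ROWS = [
    '<td class="cspan">Proximates</td>\n\t<td style="text-align:left">Total lipid (fat)'
    '\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>g</td>'
    '\n\t\t\n\t\t\t<td style="text-align:right;">78.30</td>',
    '<td style="text-align:left">Vitamin C, total ascorbic acid'
    '\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>mg</td>'
    '\n\t\t\n\t\t\t<td style="text-align:right;">0.0</td>',
    '<td style="text-align:left">Vitamin B-12'
    '\n\t\t\n\t\t\n\t\t</td>\n\t\t\n\t\t<td>\xb5g</td>'
    '\n\t\t\n\t\t\t<td style="text-align:right;">0.07</td>',
]


def find_value(label, unit, html):
    """Return the right-aligned cell text that follows `label` and `unit`, or None."""
    pattern = (
        re.escape(label)          # literal label, so "(fat)" needs no manual escaping
        + r'\s*</td>\s*<td>'      # any run of whitespace between the tags
        + re.escape(unit)         # literal unit, e.g. "g", "mg" or "\xb5g"
        + r'</td>\s*<td style="text-align:right;">(.*?)</td>'
    )
    match = re.search(pattern, html)
    return match.group(1) if match else None


for label, unit, row in [
    ("Total lipid (fat)", "g", ROWS[0]),
    ("Vitamin C, total ascorbic acid", "mg", ROWS[1]),
    ("Vitamin B-12", "\xb5g", ROWS[2]),
]:
    print(find_value(label, unit, row))  # prints 78.30, then 0.0, then 0.07
```

Because `\s*` accepts any amount of whitespace, the pattern keeps working if the page's indentation changes, which the hand-counted `\n\t\t` sequences in the attempts above do not.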
As you have discovered, XML (HTML) and regexes do not mix well. However, this problem is quite straightforward when using BeautifulSoup:
Code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(row, 'html.parser')
print(soup.find_all('td')[-1].text)  # text of the last <td> in the row
Test Code:
from bs4 import BeautifulSoup
data = (
"""
<td class="cspan">Proximates</td>
<td style="text-align:left">Total lipid (fat)
</td>
<td>g</td>
<td style="text-align:right;">78.30</td>
""",
"""
<td style="text-align:left">Vitamin C, total ascorbic acid
</td>
<td>mg</td>
<td style="text-align:right;">0.0</td>
""",
"""
<td style="text-align:left">Vitamin B-12
</td>
<td>\xb5g</td>
<td style="text-align:right;">0.07</td>
""",
)
for row in data:
    # The value is always in the last cell, so no pattern matching is needed.
    soup = BeautifulSoup(row, 'html.parser')
    print(soup.find_all('td')[-1].text)
Results:
78.30
0.0
0.07
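If installing bs4 is not an option, the same idea (collect the `<td>` cells and keep the last one) can be sketched with the standard library's `html.parser.HTMLParser`. This swaps BeautifulSoup for a small hand-written subclass; `CellCollector` and `last_cell_text` are hypothetical names, and the sketch assumes flat, non-nested `<td>` cells like the ones in the question.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text content of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Only keep text that appears inside a <td>; whitespace between cells is ignored.
        if self._in_td:
            self.cells[-1] += data


def last_cell_text(html):
    """Return the stripped text of the last <td>, or None if there is none."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells[-1].strip() if parser.cells else None


print(last_cell_text('<td>mg</td>\n\t\t<td style="text-align:right;">0.0</td>'))  # prints 0.0
```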