<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>
How can I print "I Like your face" instead of "I Like to punch your face"?
I tried this:
lala = soup.find_all('span')
for p in lala:
    if not p.find(class_='unwanted'):
        print p.text
but it gives "TypeError: find() takes no keyword arguments".
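You can use extract() to remove the unwanted tag first and then get the text.
However, the text still contains all the '\n' characters and spaces, so you need some extra work to remove them.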
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
from bs4 import BeautifulSoup as BS
soup = BS(data, 'html.parser')
external_span = soup.find('span')
print("1 HTML:", external_span)
print("1 TEXT:", external_span.text.strip())
unwanted = external_span.find('span')
unwanted.extract()
print("2 HTML:", external_span)
print("2 TEXT:", external_span.text.strip())
Result:
1 HTML: <span>
I Like
<span class="unwanted"> to punch </span>
your face
<span></span></span>
1 TEXT: I Like
to punch
your face
2 HTML: <span>
I Like
your face
<span></span></span>
2 TEXT: I Like
your face
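The extra whitespace cleanup mentioned above could look something like the following. This is only a minimal sketch: it assumes the outer span is closed with a proper `</span>`, that every tag to drop carries the class `unwanted`, and that collapsing all runs of whitespace into single spaces is acceptable. The variable names `outer` and `cleaned` are just for illustration.

```python
from bs4 import BeautifulSoup

html = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("span")

# Remove every nested element with class "unwanted" from the tree.
for tag in outer.find_all(class_="unwanted"):
    tag.extract()

# str.split() with no arguments splits on any whitespace run,
# so newlines and repeated spaces collapse into single spaces.
cleaned = " ".join(outer.get_text().split())
print(cleaned)
```

Splitting and re-joining is a blunt tool: it also collapses whitespace that may be meaningful inside the text, so it suits short inline fragments like this one better than preformatted content.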
Alternatively, you can skip every Tag object inside the external span and keep only the NavigableString objects (the plain text in the HTML).
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
from bs4 import BeautifulSoup as BS
import bs4
soup = BS(data, 'html.parser')
external_span = soup.find('span')
text = []
for x in external_span:
    if isinstance(x, bs4.element.NavigableString):
        text.append(x.strip())
print(" ".join(text))
Result:
I Like your face
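The same idea can be wrapped in a small helper so it can be reused on other tags. The function name `direct_text` is hypothetical, not part of Beautiful Soup, and the sketch assumes well-formed HTML with a closing `</span>`. Unlike the loop above, it also drops text nodes that are empty after stripping, which avoids doubled spaces when two nested tags are separated only by whitespace.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def direct_text(tag):
    """Join the text nodes that are direct children of `tag`, ignoring nested tags."""
    parts = (
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    )
    # Skip fragments that were whitespace only.
    return " ".join(part for part in parts if part)


html = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>"""

soup = BeautifulSoup(html, "html.parser")
result = direct_text(soup.find("span"))
print(result)
```

Note that this keeps only text sitting directly inside the outer span, so text in any nested tag is skipped, whether or not it has the `unwanted` class.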