I'm new to BeautifulSoup. This is the HTML fragment I'm interested in:
<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>‹‹</span> Prev</a><span class="act">1</span><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2' >2</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3' >3</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4' >4</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5' >5</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6' >6</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7' >7</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8' >8</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9' >9</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10' >10</a><a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next
<span>››</span></a></div>
I want to check whether the last page number inside the 'a' tags is 10. I can get the enclosing div with the following command:
atags1=bSoup.find('div' ,attrs={'class' : 'jpag'})
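Now I want to iterate over the 'a' tags that do not have an attribute such as rel="prev" or rel="next", so that I only loop over the 'a' tags for the page numbers. Please help. Thanks in advance.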
There are many ways to do this. A simple one is to select the anchors inside the div and filter out every anchor that has a rel attribute:
html = """<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>‹‹</span> Prev</a><span class="act">1</span><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2' >2</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3' >3</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4' >4</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5' >5</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6' >6</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7' >7</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8' >8</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9' >9</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10' >10</a><a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next
<span>››</span></a></div>"""
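from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")

# Select anchors with an href inside the pagination div, then skip those carrying a rel attribute
for a in soup.select("#srchpagination a[href]"):
    if not a.get("rel"):
        print(a)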
This gives you:
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2">2</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3">3</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4">4</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5">5</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6">6</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7">7</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8">8</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9">9</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10">10</a>
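The question also asks whether the last page number is 10. One way to check that, sketched here under a few assumptions, is to convert the text of the filtered anchors to integers and compare the final one. The markup below is a shortened stand-in for the fragment in the question, and the names `page_numbers` and `last_page` are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the pagination markup in the question (assumption:
# the real page has the same structure, just with more page links).
html = """<div class="jpag" id="srchpagination">
<a rel='prev' class="dis"><span>‹‹</span> Prev</a><span class="act">1</span>
<a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>2</a>
<a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9'>9</a>
<a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10'>10</a>
<a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next <span>››</span></a>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Keep only anchors that have an href and no rel attribute (the page links),
# and read their text as integers; non-numeric text is skipped.
page_numbers = [
    int(a.get_text(strip=True))
    for a in soup.select("#srchpagination a[href]")
    if not a.get("rel") and a.get_text(strip=True).isdigit()
]

# The last entry is the highest page link shown, or None if there are no links.
last_page = page_numbers[-1] if page_numbers else None
print(last_page == 10)
```

This relies on the page links appearing in document order, which `select` preserves. If the order is not guaranteed, `max(page_numbers)` may be a safer choice than the last element.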