How to iterate over the tags inside a div tag

哈里巴利

I am new to BeautifulSoup. This is the HTML fragment I am interested in:

<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>&lsaquo;&lsaquo;</span> Prev</a><span class="act">1</span><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2' >2</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3' >3</a><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4' >4</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5' >5</a><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6' >6</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7' >7</a><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8' >8</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9' >9</a><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10' >10</a><a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next 
<span>&rsaquo;&rsaquo;</span></a></div>

I want to check whether the last page number inside the 'a' tags is 10. I am able to get the enclosing div with the following command:

atags1 = bSoup.find('div', attrs={'class': 'jpag'})

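Now I want to iterate over the 'a' tags that do not have an attribute such as rel="prev" or rel="next", so that I only loop over the 'a' tags that hold page numbers. Please help. Thanks in advance.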

Padraic Cunningham

There are many ways to do this. A simple one is to select the anchors inside the div and filter out every anchor that has a rel attribute:

html = """<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>&lsaquo;&lsaquo;</span> Prev</a><span class="act">1</span><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2' >2</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3' >3</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4' >4</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5' >5</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6' >6</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7' >7</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8' >8</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9' >9</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10' >10</a><a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next
<span>&rsaquo;&rsaquo;</span></a></div>"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning

for a in soup.select("#srchpagination a[href]"):
    if not a.get("rel"):
        print(a)

This gives you:

<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2">2</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3">3</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4">4</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5">5</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6">6</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7">7</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8">8</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9">9</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10">10</a>
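
To answer the original question about the last page number, the same filter can be used to build a list of page numbers and compare the last one with 10. The following is only a minimal sketch: the markup is a shortened stand-in for the question's HTML with placeholder example.com URLs, and the names page_links, page_numbers and last_page are made up for illustration. It also assumes every remaining anchor contains nothing but a number.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the pagination markup in the question
# (the example.com URLs are placeholders, not the real site).
html = """<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>&lsaquo;&lsaquo;</span> Prev</a><span class="act">1</span><a
href='http://example.com/page-2' >2</a><a href='http://example.com/page-3' >3</a><a
href='http://example.com/page-10' >10</a><a rel='next' href='http://example.com/page-2'>Next
<span>&rsaquo;&rsaquo;</span></a></div>"""

soup = BeautifulSoup(html, "html.parser")

# Keep only anchors that have an href and no rel attribute,
# which drops the Prev and Next links.
page_links = [
    a for a in soup.select("#srchpagination a[href]")
    if not a.get("rel")
]

# Assumption: the text of each remaining anchor is a plain integer.
page_numbers = [int(a.get_text(strip=True)) for a in page_links]

# Guard against an empty pagination block.
last_page = page_numbers[-1] if page_numbers else None

print(last_page == 10)
```

Note that the current page (1 in the question's markup) is rendered as a span rather than an anchor, so it does not appear in page_numbers. If the last page could itself be the current page, the span with class "act" would need to be checked as well.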
