Get nested div content starting from a div with a specific ID

Sandra Guilep Zouaoui Zandeh

I have the following div with id="participant":

<div id="participant" class="panel-collapse collapse in" role="tabpanel" aria-expanded="true" aria-labelledby="headingOne" style="">
<div class="panel-body">
<div class="row">
   <div class="col-sm-12">
      <div class="question-container">
         <div class="question-group">
            <h5 class="question">
               Organisation
            </h5>
            <div class="answer">
               <p>Ministerio de Hacienda [Ministry of Finance]</p>
               <p>Consejo de Contadores Públicos del Paraguay (Consejo) [Council of Public Accountants of Paraguay]</p>
               <p>Central Bank of Paraguay – Superintendence of Banks</p>
               <br>
            </div>
         </div>
         <div class="question-group">
            <h5 class="question">
               Role of the organisation
            </h5>
            <div class="answer">
               <p>The Ministry of Finance has authority to establish accounting standards for all entities in Paraguay other than banks and financial institutions.&nbsp; </p>
               <p>The Consejo is the professional association of public accountants in Paraguay.&nbsp; The Consejo advises the Ministry of Finance with regard to accounting standards.</p>
               <p>Accounting standards for banks and other financial institutions are established by the Central Bank of Paraguay.</p>
            </div>
         </div>
         <div class="question-group">
            <h5 class="question">
               Website
            </h5>
            <div class="answer">
               <p>Ministry of Finance: <a href="http://www.hacienda.gov.py" target="_blank">http://www.hacienda.gov.py</a></p>
               <p>Consejo: <a href="http://www.consejo.com.py" target="_blank">www.consejo.com.py</a></p>
               <p>Central Bank: <a href="http://www/bcp.gov.py" target="_blank">http://www/bcp.gov.py</a></p>
            </div>
         </div>
         <div class="question-group">
            <h5 class="question">
               Email contact
            </h5>
            <div class="answer">
               <p>Consejo: <a href="mailto:[email protected]">[email protected]</a><br>
                  Central Bank:
               </p>
               <ul>
                  <li><a href="mailto:[email protected]">[email protected]</a> and <a href="[email protected]">[email protected]</a></li>
                  <li><a href="mailto:[email protected]">[email protected]</a></li>
                  <li><a href="mailto:[email protected]">[email protected]</a></li>
               </ul>
            </div>
         </div>
      </div>
   </div>
</div>

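I want the content of every element with class="question" and class="answer", starting from <div id="participant">. The page contains many divs with the same structure and CSS classes, so the id is the only thing I can use to tell them apart.

This is my expected output: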

Organisation Ministerio de Hacienda [Ministry of Finance]
             Consejo de Contadores Públicos del Paraguay (Consejo) [Council of Public Accountants of Paraguay]
             Central Bank of Paraguay – Superintendence of Banks
Role of the  The Ministry of Finance has authority to establish accounting standards for all entities in Paraguay other than banks and financial institutions.
organisation The Consejo is the professional association of public accountants in Paraguay.  The Consejo advises the Ministry of Finance with regard to accounting standards.
             Accounting standards for banks and other financial institutions are established by the Central Bank of Paraguay.
Website      Ministry of Finance: http://www.hacienda.gov.py
             Consejo: www.consejo.com.py
             Central Bank: http://www/bcp.gov.py    
Emailcontact Consejo: [email protected]
             Central Bank:
             [email protected] and [email protected]
             [email protected]
             [email protected]          
         

This is what I have so far:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
# Site URL
url = "https://www.ifrs.org/use-around-the-world/use-of-ifrs-standards-by-jurisdiction/paraguay"
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text
# Parse HTML code for the entire site
soup = BeautifulSoup(html_content, "lxml")
divs = soup.find_all("div", attrs={"id": "participant"})
disp = []
d = []
for c in divs:
    disp.append(c.find('div', attrs={'class': 'question-group'}))
for t in disp:
    d.append(t.h5.text.strip())

Jack Fleeting

Leaving the final print formatting aside, something like this should work:

questions = [q.text.strip() for q in soup.select('div#participant h5.question') ]
answers = [a.text.strip() for a in soup.select('div#participant div.answer')]
for q, a in zip(questions,answers):
    print(q,": ",a)
    print('---')

Output:

Organisation :  Ministerio de Hacienda [Ministry of Finance]
Consejo de Contadores Públicos del Paraguay (Consejo) [Council of Public Accountants of Paraguay]
Central Bank of Paraguay – Superintendence of Banks
---

and so on.
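
If the two-column layout from the question matters, one possible approach is to walk each `question-group` under the chosen id, so that every question stays paired with its own answer, and then pad the question column. The following is only a sketch under a few assumptions: `SAMPLE_HTML` is a trimmed-down stand-in for the real page, `extract_pairs` and `format_pairs` are hypothetical helper names that appear in neither the question nor the answer, and each answer line is assumed to live in a `<p>` or `<li>` element.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the real page (an assumption for illustration);
# it keeps the same class names and nesting as the markup in the question.
SAMPLE_HTML = """
<div id="participant">
  <div class="question-group">
    <h5 class="question">
      Organisation
    </h5>
    <div class="answer">
      <p>Ministerio de Hacienda [Ministry of Finance]</p>
      <p>Central Bank of Paraguay</p>
    </div>
  </div>
  <div class="question-group">
    <h5 class="question">Website</h5>
    <div class="answer">
      <p>Consejo: <a href="http://www.consejo.com.py">www.consejo.com.py</a></p>
    </div>
  </div>
</div>
<div id="other">
  <div class="question-group">
    <h5 class="question">Ignored</h5>
    <div class="answer"><p>Not under #participant</p></div>
  </div>
</div>
"""


def extract_pairs(html, container_id):
    """Return [(question, [answer lines])] for the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # Scope the search to one container id, then handle one group at a time.
    for group in soup.select(f"div#{container_id} div.question-group"):
        question = group.select_one("h5.question")
        answer = group.select_one("div.answer")
        if question is None or answer is None:
            continue  # skip incomplete groups instead of raising
        # Collapse runs of whitespace (including newlines) into single spaces.
        q_text = " ".join(question.get_text().split())
        lines = []
        for element in answer.find_all(["p", "li"]):
            text = " ".join(element.get_text().split())
            if text:
                lines.append(text)
        pairs.append((q_text, lines))
    return pairs


def format_pairs(pairs):
    """Render the pairs as two columns, padding the question column."""
    width = max((len(q) for q, _ in pairs), default=0)
    rows = []
    for q, lines in pairs:
        for i, line in enumerate(lines or [""]):
            label = q if i == 0 else ""  # print the question only once
            rows.append(f"{label:<{width}} {line}".rstrip())
    return "\n".join(rows)


if __name__ == "__main__":
    print(format_pairs(extract_pairs(SAMPLE_HTML, "participant")))
```

Against the live page, the same helper could be called with `requests.get(url).text` in place of `SAMPLE_HTML`. Because whitespace is collapsed inside each element, a `<br>` in the middle of a paragraph ends up as a single space rather than a line break, and long question labels are not wrapped the way they are in the expected output.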
