I have the following div with id="participant":
<div id="participant" class="panel-collapse collapse in" role="tabpanel" aria-expanded="true" aria-labelledby="headingOne" style="">
<div class="panel-body">
<div class="row">
<div class="col-sm-12">
<div class="question-container">
<div class="question-group">
<h5 class="question">
Organisation
</h5>
<div class="answer">
<p>Ministerio de Hacienda [Ministry of Finance]</p>
<p>Consejo de Contadores Públicos del Paraguay (Consejo) [Council of Public Accountants of Paraguay]</p>
<p>Central Bank of Paraguay – Superintendence of Banks</p>
<br>
</div>
</div>
<div class="question-group">
<h5 class="question">
Role of the organisation
</h5>
<div class="answer">
<p>The Ministry of Finance has authority to establish accounting standards for all entities in Paraguay other than banks and financial institutions. </p>
<p>The Consejo is the professional association of public accountants in Paraguay. The Consejo advises the Ministry of Finance with regard to accounting standards.</p>
<p>Accounting standards for banks and other financial institutions are established by the Central Bank of Paraguay.</p>
</div>
</div>
<div class="question-group">
<h5 class="question">
Website
</h5>
<div class="answer">
<p>Ministry of Finance: <a href="http://www.hacienda.gov.py" target="_blank">http://www.hacienda.gov.py</a></p>
<p>Consejo: <a href="http://www.consejo.com.py" target="_blank">www.consejo.com.py</a></p>
<p>Central Bank: <a href="http://www/bcp.gov.py" target="_blank">http://www/bcp.gov.py</a></p>
</div>
</div>
<div class="question-group">
<h5 class="question">
Email contact
</h5>
<div class="answer">
<p>Consejo: <a href="mailto:[email protected]">[email protected]</a><br>
Central Bank:
</p>
<ul>
<li><a href="mailto:[email protected]">[email protected]</a> and <a href="[email protected]">[email protected]</a></li>
<li><a href="mailto:[email protected]">[email protected]</a></li>
<li><a href="mailto:[email protected]">[email protected]</a></li>
</ul>
</div>
</div>
</div>
</div>
</div>
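I want the content of each element with class="question" and class="answer", starting from <div id="participant">. I have many divs with the same structure and CSS, so the id is the only thing that lets me tell them apart.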
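Because the same markup repeats under different ids, putting the id into the CSS selector is enough to keep the panels apart. Below is a minimal sketch on made-up sample markup. The second id, `other-panel`, and the helper `answers_in` are hypothetical. It uses the built-in `html.parser` backend so that `lxml` isn't required.

```python
from bs4 import BeautifulSoup

# Made-up sample: two panels with identical structure, distinguished only by id.
# "participant" is the id from the question; "other-panel" is hypothetical.
html = """
<div id="participant">
  <div class="question-group">
    <h5 class="question">Organisation</h5>
    <div class="answer"><p>Ministerio de Hacienda [Ministry of Finance]</p></div>
  </div>
</div>
<div id="other-panel">
  <div class="question-group">
    <h5 class="question">Organisation</h5>
    <div class="answer"><p>Some other organisation</p></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def answers_in(soup, div_id):
    """Answer texts found only under the div with this id (hypothetical helper).

    Assumes div_id is a plain CSS identifier that needs no escaping.
    """
    return [a.get_text(" ", strip=True) for a in soup.select(f"div#{div_id} div.answer")]


print(answers_in(soup, "participant"))  # ['Ministerio de Hacienda [Ministry of Finance]']
print(answers_in(soup, "other-panel"))  # ['Some other organisation']
```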
This is my expected output:
Organisation Ministerio de Hacienda [Ministry of Finance]
Consejo de Contadores Públicos del Paraguay (Consejo) [Council of Public Accountants of Paraguay]
Central Bank of Paraguay – Superintendence of Banks
Role of the organisation The Ministry of Finance has authority to establish accounting standards for all entities in Paraguay other than banks and financial institutions.
The Consejo is the professional association of public accountants in Paraguay. The Consejo advises the Ministry of Finance with regard to accounting standards.
Accounting standards for banks and other financial institutions are established by the Central Bank of Paraguay.
Website Ministry of Finance: http://www.hacienda.gov.py
Consejo: www.consejo.com.py
Central Bank: http://www/bcp.gov.py
Email contact Consejo: [email protected]
Central Bank:
[email protected] and [email protected]
[email protected]
[email protected]
Here is my code so far:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
# Site URL
url = "https://www.ifrs.org/use-around-the-world/use-of-ifrs-standards-by-jurisdiction/paraguay"
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text
# Parse HTML code for the entire site
soup = BeautifulSoup(html_content, "lxml")
divs = soup.find_all("div", attrs={"id": "participant"})
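disp = []
d = []
# find() returns only the first question-group inside each participant div
for c in divs:
    disp.append(c.find('div', attrs={'class': 'question-group'}))
# Heading text of each collected question-group
for t in disp:
    d.append(t.h5.text.strip())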
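In the loop above, `c.find(...)` stops at the first `question-group` inside each `participant` div, so `d` ends up holding only the first heading ("Organisation"). A minimal sketch of the difference between `find()` and `find_all()`, on a trimmed copy of the markup that keeps only the headings:

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (headings only).
html = """
<div id="participant">
  <div class="question-group"><h5 class="question">Organisation</h5></div>
  <div class="question-group"><h5 class="question">Role of the organisation</h5></div>
  <div class="question-group"><h5 class="question">Website</h5></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
participant = soup.find("div", id="participant")

# find() returns the first matching element only.
first_only = participant.find("div", class_="question-group").h5.get_text(strip=True)

# find_all() returns every matching question-group inside the container.
all_headings = [
    group.h5.get_text(strip=True)
    for group in participant.find_all("div", class_="question-group")
]

print(first_only)    # Organisation
print(all_headings)  # ['Organisation', 'Role of the organisation', 'Website']
```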
Setting aside the final print format, an approach like this should work:
questions = [q.text.strip() for q in soup.select('div#participant h5.question')]
answers = [a.text.strip() for a in soup.select('div#participant div.answer')]
for q, a in zip(questions, answers):
    print(q, ": ", a)
    print('---')
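Zipping two independent lists assumes every `h5.question` has exactly one `div.answer`. If a group lacked an answer, every later pair would shift and the last question would be dropped. Below is a sketch that pairs question and answer inside each `question-group` instead. The group without an answer is invented to show the effect; on the real page every group appears to have one.

```python
from bs4 import BeautifulSoup

# Sample markup; the "Email contact" group deliberately has no answer div.
html = """
<div id="participant">
  <div class="question-group">
    <h5 class="question">Organisation</h5>
    <div class="answer"><p>Ministerio de Hacienda [Ministry of Finance]</p></div>
  </div>
  <div class="question-group">
    <h5 class="question">Email contact</h5>
  </div>
  <div class="question-group">
    <h5 class="question">Website</h5>
    <div class="answer"><p>Consejo: www.consejo.com.py</p></div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Two independent lists paired by position (the approach above).
questions = [q.get_text(strip=True) for q in soup.select("div#participant h5.question")]
answers = [a.get_text(strip=True) for a in soup.select("div#participant div.answer")]
zipped = list(zip(questions, answers))  # "Email contact" gets the Website answer

# Pairing within each question-group keeps every question with its own answer.
pairs = []
for group in soup.select("div#participant div.question-group"):
    question = group.select_one("h5.question")
    answer = group.select_one("div.answer")
    pairs.append((
        question.get_text(strip=True) if question else "",
        answer.get_text("\n", strip=True) if answer else "",  # "" when no answer div
    ))

print(zipped)
print(pairs)
```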
Output:
Organisation : Ministerio de Hacienda [Ministry of Finance]
Consejo de Contadores Públicos del Paraguay (Consejo) [Council of Public Accountants of Paraguay]
Central Bank of Paraguay – Superintendence of Banks
---
and so on.
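The answer leaves the final layout aside. One way to get close to the two-column expected output is to give each `<p>` or `<li>` its own line and pad the question column to a common width. The sketch below runs on a trimmed copy of the markup. The width rule (longest question plus two spaces) and the helper name `answer_lines` are arbitrary choices. A `<br>` inside a paragraph is not split onto its own line here, so the "Email contact" block would need extra handling.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
html = """
<div id="participant">
  <div class="question-group">
    <h5 class="question">Organisation</h5>
    <div class="answer">
      <p>Ministerio de Hacienda [Ministry of Finance]</p>
      <p>Central Bank of Paraguay – Superintendence of Banks</p>
    </div>
  </div>
  <div class="question-group">
    <h5 class="question">Website</h5>
    <div class="answer">
      <p>Ministry of Finance: <a href="http://www.hacienda.gov.py">http://www.hacienda.gov.py</a></p>
    </div>
  </div>
</div>
"""


def answer_lines(answer):
    """One line per <p> or <li>; inline tags such as <a> are joined into that line."""
    return [el.get_text(" ", strip=True) for el in answer.find_all(["p", "li"])]


soup = BeautifulSoup(html, "html.parser")
rows = []
for group in soup.select("div#participant div.question-group"):
    question = group.select_one("h5.question")
    answer = group.select_one("div.answer")
    rows.append((
        question.get_text(strip=True) if question else "",
        answer_lines(answer) if answer else [],
    ))

# Left column: as wide as the longest question, plus two spaces of padding.
width = max((len(q) for q, _ in rows), default=0) + 2

output = []
for question, lines in rows:
    first, *rest = lines or [""]
    output.append((question.ljust(width) + first).rstrip())
    output.extend(" " * width + line for line in rest)

print("\n".join(output))
```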