I am trying to scrape elements from the URL "https://www.sustainalytics.com/esg-ratings/?industry=Banks&currentpage=1":
<div class="company-row d-flex">
<div class="w-50">
<a class="primary-color d-block" href="../esg-rating/aecc-aviation-power-co-ltd/1031931293">AECC Aviation Power Co Ltd</a>
<small>SHG:600893</small>
</div>
<div class="company-score w-50">
<div class="row">
<div class="col-2">53.3</div>
<div class="col-4 d-none d-lg-block">
<div class="row cc-risk-rating-brackets">
<div class="col cc-risk-rating-bracket active"><span>&nbsp</span></div><div class="col cc-risk-rating-bracket active"><span>&nbsp</span></div><div class="col cc-risk-rating-bracket active"><span>&nbsp</span></div><div class="col cc-risk-rating-bracket active"><span>&nbsp</span></div><div class="col cc-risk-rating-bracket active"><span>&nbsp</span></div> </div>
</div>
<div class="col-lg-6 col-md-10">Severe ESG Risk</div>
</div>
</div>
</div>
Here is my Python code:
>>> import requests
>>> from bs4 import BeautifulSoup
>>> source = requests.get('https://www.sustainalytics.com/esg-ratings/?industry=Aerospace+%26+Defense&currentpage=1').text
>>> soup = BeautifulSoup(source, "html.parser")
>>> company_info = soup.find(class_='company-row d-flex')
>>> company_name = company_info.a.text
>>> company_exchange = company_info.find("small").text
>>> company_risk = soup.find("div", class_="company-score w-50").text
>>> company_risk = company_risk.split('\n')
>>> print(company_name, company_exchange, company_risk[2], company_risk[7])
AECC Aviation Power Co Ltd SHG:600893 53.3 Severe ESG Risk
In [15]: from bs4 import BeautifulSoup
In [16]: a = """<div class="company-row d-flex">
...: <div class="w-50">
...: <a class="primary-color d-block" href="../esg-rating/aareal-bank-ag/1008754176">Aareal Bank AG</a>
...: <small>ETR:ARL</small>
...: </div>
...: <div class="company-score w-50">
...: <div class="row">
...: <div class="col-2">22.3</div>
...: <div class="col-4 d-none d-lg-block">
...: <div class="row cc-risk-rating-brackets">
...: <div class="col cc-risk-rating-bracket active"><span> </span></div><div class="col cc-risk-rating-bracket active"><span> </span></div><div class="col cc-risk-rating-bracket active"><span> </span></div><div class="col cc-risk-rating-bracket"><span> </span></div><div class="col cc-risk-rating-bracket"><span> </span></div> </div>
...: </div>
...: <div class="col-lg-6 col-md-10">Medium ESG Risk</div>
...: </div>
...: </div>
...: </div>"""
In [17]: soup = BeautifulSoup(a, "html.parser")
In [18]: data = {}
In [19]: data["name"] = soup.find("a").get_text(strip=True)
In [20]: data["name_href"] = soup.find("a")["href"]
In [21]: data["small_text"] = soup.find("small").get_text(strip=True)
In [22]: data["company_score"] = soup.find("div", class_="row").find("div", class_="col-2").get_text(strip=True)
In [23]: data
Out[23]:
{'name': 'Aareal Bank AG',
'name_href': '../esg-rating/aareal-bank-ag/1008754176',
'small_text': 'ETR:ARL',
'company_score': '22.3'}
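The same lookups can probably be extended to every company row and to the risk label that the question also prints. The following is only a sketch under a few assumptions: `SAMPLE_HTML` is a trimmed, hand-made copy of the markup shown above (not a live response), `parse_company_rows` and `text_of` are hypothetical helper names, and the `company_risk` key is added for illustration alongside the keys used in the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two rows trimmed from the markup shown in this thread.
SAMPLE_HTML = """
<div class="company-row d-flex">
  <div class="w-50">
    <a class="primary-color d-block" href="../esg-rating/aareal-bank-ag/1008754176">Aareal Bank AG</a>
    <small>ETR:ARL</small>
  </div>
  <div class="company-score w-50">
    <div class="row">
      <div class="col-2">22.3</div>
      <div class="col-lg-6 col-md-10">Medium ESG Risk</div>
    </div>
  </div>
</div>
<div class="company-row d-flex">
  <div class="w-50">
    <a class="primary-color d-block" href="../esg-rating/aecc-aviation-power-co-ltd/1031931293">AECC Aviation Power Co Ltd</a>
    <small>SHG:600893</small>
  </div>
  <div class="company-score w-50">
    <div class="row">
      <div class="col-2">53.3</div>
      <div class="col-lg-6 col-md-10">Severe ESG Risk</div>
    </div>
  </div>
</div>
"""


def text_of(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_company_rows(html):
    """Collect one dict per <div class="company-row"> found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # class_ matches any element that has "company-row" among its classes.
    for row in soup.find_all("div", class_="company-row"):
        link = row.find("a")
        results.append({
            "name": text_of(link),
            "name_href": link["href"] if link is not None else None,
            "small_text": text_of(row.find("small")),
            "company_score": text_of(row.find("div", class_="col-2")),
            # Assumption: the risk label is the only div carrying "col-lg-6".
            "company_risk": text_of(row.find("div", class_="col-lg-6")),
        })
    return results


if __name__ == "__main__":
    for entry in parse_company_rows(SAMPLE_HTML):
        print(entry)
```

Searching inside each row by class name avoids relying on the position of a value after `split('\n')`, which tends to break when the page's whitespace changes. Whether the live page returns these rows in its initial HTML is not confirmed here; if the rows are loaded by JavaScript, `requests` alone may not see them.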