BeautifulSoup 如何从网站（电晕）中提取数据？

mo ta 发表于 Dev

摩他

我想以国家名称的形式保存每个国家的文章数量，文件中的文章数量，用于我从以下站点的研究工作。为此，我编写了这段代码，不幸的是它不起作用。

http://corona.sid.ir/

!pip install bs4
from bs4 import BeautifulSoup # this module helps in web scrapping.
import requests  # this module helps us to download a web page
url='http://corona.sid.ir/'
data  = requests.get(url).text 
soup = BeautifulSoup(data,"lxml")  # create a soup object using the variable 'data'
soup.find_all(attrs={"class":"value"})

结果= []

赤城88

您使用了错误的网址。尝试这个：

from bs4 import BeautifulSoup # this module helps in web scrapping.
import requests  # this module helps us to download a web page
import pandas as pd

url = 'http://corona.sid.ir/world.svg'
data  = requests.get(url).text 
soup = BeautifulSoup(data,"lxml")  # create a soup object using the variable 'data'
soup.find_all(attrs={"class":"value"})

rows = []
for each in soup.find_all(attrs={"class":"value"}):
    row = {}
    row['country'] = each.text.split(':')[0]
    row['count'] = each.text.split(':')[1].strip()
    rows.append(row)
    
df = pd.DataFrame(rows)

输出：

print(df)
                  country count
0                 Andorra    17
1    United Arab Emirates   987
2             Afghanistan    67
3                 Albania   143
4                 Armenia    49
..                    ...   ...
179                 Yemen    54
180               Mayotte     0
181          South Africa  1938
182                Zambia   127
183              Zimbabwe   120

[184 rows x 2 columns]

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2021-08-26

我来说两句

0 条评论

登录后参与评论

如何从带有请求和 BeautifulSoup4 的动态内容的网站中提取表格数据？

如何使用python beautifulsoup从网站中提取隐藏评论？

BeautifulSoup 如何用于从网站中提取“href”链接？

使用BeautifulSoup提取网站数据

BeautifulSoup 如何从网站（电晕）中提取数据？

BeautifulSoup 如何从网站（电晕）中提取数据？

Linux的官方Adobe Flash存储库是否已过时？

如何使用HttpClient的在使用SSL证书，无论多么“糟糕”是

错误：“ javac”未被识别为内部或外部命令，

在 Python 2.7 中。如何从文件中读取特定文本并分配给变量

Modbus Python施耐德PM5300

为什么Object.hashCode（）不遵循Java代码约定

如何检查字符串输入的格式

检查嵌套列表中的长度是否相同

错误TS2365：运算符'！=='无法应用于类型'“（”'和'“）”'

如何自动选择正确的键盘布局？-仅具有一个键盘布局

如何正确比较 scala.xml 节点？

在令牌内联程序集错误之前预期为 ')'

如何在JavaScript中获取数组的第n个元素？

如何将sklearn.naive_bayes与（多个）分类功能一起使用？

ValueError：尝试同时迭代两个列表时，解包的值太多（预期为 2）

如何监视应用程序而不是单个进程的CPU使用率？

解决类Koin的实例时出错

ES5的代理替代

有什么解决方案可以将android设备用作Cast Receiver？

VBA 自动化错误：-2147221080 (800401a8)

套接字无法检测到断开连接