How can I use BeautifulSoup to get multiple pieces of information from a table on a website?

Username

I am trying to figure out how to extract several pieces of information from the table at https://www.fda.gov/Safety/Recalls/

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.fda.gov/Safety/Recalls/")
soup = BeautifulSoup(res.text, "lxml")

for item in soup.select("table td"):
    if "Undeclared" in item.text:
        brand = item.find_parents()[0].select("td")[1].text
        reason = item.text
        print(brand,reason)

How can I also get brand_link from the HTML?

SIM

I think this is the output you are expecting:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.fda.gov/Safety/Recalls/")
soup = BeautifulSoup(res.text, "lxml")

for item in soup.select("table td"):
    if "Undeclared" in item.text:
        # item.parent is the enclosing row; its second cell holds the brand name.
        brand = item.parent.select("td")[1].text
        reason = item.text
        print(brand, reason)

Partial output:

N/A   Undeclared Milk
Colorado Nut Company and various other private labels   Undeclared milk
All Natural, Weis, generic   Undeclared milk
Dilettante Chocolates   Undeclared almonds
Hot Pockets   Undeclared egg, milk, soy, and wheat
Figi’s   Undeclared Milk
Germack   Undeclared Milk
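
The loop above checks every `td` and then climbs back to its parent row. A row-first variant can be easier to follow and skips rows that have too few cells. The following is only a minimal sketch: `SAMPLE_HTML`, its column order, the brand names, and the helper `undeclared_rows` are all made up for illustration, and the real page's markup may differ. It uses the built-in `html.parser` so that it runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not taken from the real page); column order is an assumption.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Brand</th><th>Reason</th></tr>
  <tr><td>01/02/2018</td><td><a href="/Safety/Recalls/example1.htm">Example Brand A</a></td><td>Undeclared milk</td></tr>
  <tr><td>01/03/2018</td><td><a href="/Safety/Recalls/example2.htm">Example Brand B</a></td><td>Salmonella</td></tr>
  <tr><td>01/04/2018</td><td><a href="/Safety/Recalls/example3.htm">Example Brand C</a></td><td>Undeclared almonds</td></tr>
</table>
"""


def undeclared_rows(html, keyword="Undeclared"):
    """Return (brand, reason) pairs for rows that have a cell containing keyword."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select("table tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # header rows (th only) or rows without a brand cell
        for cell in cells:
            if keyword in cell.get_text():
                # As in the answer, the brand is assumed to be the second cell.
                brand = cells[1].get_text(strip=True)
                reason = cell.get_text(strip=True)
                results.append((brand, reason))
                break  # one match per row is enough
    return results


for brand, reason in undeclared_rows(SAMPLE_HTML):
    print(brand, reason)
```

Passing `strip=True` to `get_text` removes the trailing whitespace that shows up between brand and reason in the output above.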

If you also want the link behind each brand name, you can do the following:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://www.fda.gov/Safety/Recalls/"
res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")

for item in soup.select("table td"):
    if "Undeclared" in item.text:
        # The brand is the second cell of the row that contains the matched cell.
        brand_cell = item.parent.select("td")[1]
        brand = brand_cell.text
        # Resolve the (possibly relative) href against the page URL.
        brand_link = urljoin(url, brand_cell.select("a")[0]["href"])
        reason = item.text
        print("Brand: {}\nBrand_link: {}\nReason: {}\n".format(brand, brand_link, reason))

Output:

Brand: N/A  
Brand_link: https://www.fda.gov/Safety/Recalls/ucm587012.htm
Reason: Undeclared Milk
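
Indexing `select("a")[0]` raises an `IndexError` when a brand cell has no link. A guarded lookup is one way around that. This is a sketch under assumptions: the helper `link_from_cell` and the two sample cells are hypothetical, and only the base URL and the `ucm587012.htm` path come from the answer above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.fda.gov/Safety/Recalls/"


def link_from_cell(cell, base_url=BASE_URL):
    """Return the absolute URL of the first link in cell, or None if it has none."""
    anchor = cell.find("a", href=True)  # only match <a> tags that carry an href
    if anchor is None:
        return None
    # urljoin handles relative, root-relative, and already absolute hrefs.
    return urljoin(base_url, anchor["href"])


# Hypothetical cells: one with a root-relative link, one without any link.
soup = BeautifulSoup(
    '<table><tr>'
    '<td><a href="/Safety/Recalls/ucm587012.htm">N/A</a></td>'
    '<td>No link here</td>'
    '</tr></table>',
    "html.parser",
)
cells = soup.find_all("td")

print(link_from_cell(cells[0]))
print(link_from_cell(cells[1]))
```

The first call prints the absolute URL and the second prints `None`, so the caller can decide whether to skip the row or report the brand without a link.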


Edited on 2020-11-19
