Web scraping with BeautifulSoup FindAll

Posted by Niccola Tartaglia in Dev

Niccola Tartaglia

I want to collect the hrefs of the 4 articles listed under NEED TO KNOW on the following website:

http://www.marketwatch.com/

But I can't identify them uniquely with findAll. The attempts below return the articles I am after, along with a bunch of other articles.

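# Attempt 1: every <a> tag with class "link" (each match is itself the link)
trend_articles = soup1.findAll("a", {"class": "link"})
for article in trend_articles:
    href = article["href"]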

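# Attempt 2: every <div> with class "content--secondary", then its first <a>
trend_articles = soup1.findAll("div", {"class": "content--secondary"})
for article in trend_articles:
    href = article.a["href"]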

Does anyone have a suggestion for how I can get these 4 articles, and only these 4?

Roman Alekseev

This seems to work for me:

from bs4 import BeautifulSoup
import requests

# Download the home page and parse it with the lxml parser
page = requests.get("http://www.marketwatch.com/").content
soup = BeautifulSoup(page, 'lxml')

# Start from the first secondary header, then take the first list group that follows it
header_secondary = soup.find('header', {'class': 'header--secondary'})
trend_articles = header_secondary.find_next_siblings('div', {'class': 'group group--list '})[0].findAll('a')

# Keep the first child of each link, which is its text
trend_articles = [article.contents[0] for article in trend_articles]
print(trend_articles)
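
The answer above prints the link text, while the question asked for the hrefs. Below is a minimal sketch of the same idea (find the section header first, then search only inside the block that follows it) that returns the `href` attributes instead. It runs against an inline HTML sample rather than the live site, so everything in `SAMPLE_HTML` is an assumption: the markup only borrows the class names used above, and the real MarketWatch page may be structured differently. The helper name `need_to_know_hrefs` and the idea of picking the header by its visible text are also illustrative choices, not something the answer confirms. `find_all` is the current spelling of `findAll`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only borrows the class names from the answer above.
# The real page may be structured differently.
SAMPLE_HTML = """
<div class="column">
  <header class="header--secondary"><h2>Need to Know</h2></header>
  <div class="group group--list">
    <a class="link" href="/story/one">First story</a>
    <a class="link" href="/story/two">Second story</a>
    <a class="link" href="/story/three">Third story</a>
    <a class="link" href="/story/four">Fourth story</a>
  </div>
</div>
<div class="column">
  <header class="header--secondary"><h2>Other</h2></header>
  <div class="group group--list">
    <a class="link" href="/story/other">Unrelated story</a>
  </div>
</div>
"""


def need_to_know_hrefs(html, heading="Need to Know"):
    """Return the hrefs of the links in the list that follows the given heading."""
    soup = BeautifulSoup(html, "html.parser")
    for header in soup.find_all("header", class_="header--secondary"):
        # Pick the section by its visible heading text (case-insensitive).
        if heading.lower() in header.get_text(strip=True).lower():
            # Search only the sibling block that follows this header.
            container = header.find_next_sibling("div", class_="group--list")
            if container is None:
                return []
            # href=True skips anchors that have no href attribute.
            return [a["href"] for a in container.find_all("a", href=True)]
    return []


hrefs = need_to_know_hrefs(SAMPLE_HTML)
print(hrefs)
```

Scoping the search to one container is what keeps the other `link` anchors on the page out of the result. Passing a single class name such as `class_="group--list"` matches any element that carries that class, whatever other classes it has, so the lookup does not depend on the exact text of the `class` attribute.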

Edited on 2021-06-2
