Why doesn't BeautifulSoup return child elements?

Jeremy Fox

I'm trying to use Python 3 and BeautifulSoup 4 to get a URL for downloading an xlsx file from this page: https://psnc.org.uk/funding-and-statistics/funding-distribution/retained-margin-category-m/

I need the URL of the most recent file, which is at index 0 in a list of <p> tags inside a <div>. I can get it in the browser console with JS like this:

var link = document.getElementsByClassName("toggle_container")[2].children[1].children[0].href

If I use BS4 to get all of the <p> tags on the page, the link I want is in the list:

import urllib
import requests
from bs4 import BeautifulSoup

cat_m_site = "https://psnc.org.uk/funding-and-statistics/funding-distribution/retained-margin-category-m/"

page = requests.get(cat_m_site)

soup = BeautifulSoup(page.text, 'html.parser')
p_elements = soup.find_all('p')

for item in p_elements:
    print(item)

If I try to reproduce the JS solution by getting the <div> that contains the links, I should get a list of 29 <p> elements, but the list is empty:

import urllib
import requests
from bs4 import BeautifulSoup

cat_m_site = "https://psnc.org.uk/funding-and-statistics/funding-distribution/retained-margin-category-m/"

page = requests.get(cat_m_site)

soup = BeautifulSoup(page.text, 'html.parser')

divs = soup.find_all('div', {'class':'toggle_container'})
children = divs[2].findChildren("p", recursive=True)

for child in children:
    print(child)

I'd prefer to do it this way because I "know" the link will be the 0th element in this div, but I feel like I'm missing something about the findChildren method.

赤城88

Use soup = BeautifulSoup(page.text, 'lxml') instead:

import urllib
import requests
from bs4 import BeautifulSoup

cat_m_site = "https://psnc.org.uk/funding-and-statistics/funding-distribution/retained-margin-category-m/"

page = requests.get(cat_m_site)

soup = BeautifulSoup(page.text, 'lxml')

divs = soup.find_all('div', {'class':'toggle_container'})
children = divs[2].findChildren("p", recursive=True)

for child in children:
    print(child) 

Output:

<p><a href="https://psnc.org.uk/wp-content/uploads/2019/10/Category-M-201920-Q3-Oct-Dec-with-Aug-19-combined.xlsx">Category M 2019/20 Q3 Oct-Dec (with Aug 19 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/08/Category-M-2019-August-with-Jul-19-combined.xlsx">Category M 2019 August (with Jul 19 combined</a>) (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/08/Category-M-2019-20-Q2-Jul-Sep-with-Apr-19-combined.xlsx">Category M 2019/20 Q2 Jul-Sep (with Apr 19 combined) </a>(MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/05/Cat-M-Apr-2019-1.xlsx">Category M: 2019/20 Q1 Apri-June (with Jan 2019 combined) </a>(MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/01/Category-M-2018.19-Q4-JanMar-with-Nov-18-combined.xlsx">Category M: 2018/19 Q4 Jan-Mar (with Nov 18 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/01/Category-M-Nov-18.xlsx">Category M: 2018 November (with Oct 18 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2018/09/Category-M-2018.19-Q3-OctDec-with-Aug-18-combined.xlsx">Category M: 2018/19 Q3 Oct-Dec (with Aug 18 combined)</a> (MS Excel)</p>
<p><a href="http://psnc.org.uk/wp-content/uploads/2018/06/Category-M-2018.19-Q2-JulSep-with-Apr-18-combined.xlsx">Category M: 2018/19 Q2 Jul-Sep (with Apr 18 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2018/04/Category-M-2018.19-Q1-AprJun-with-Jan-18-combined-v2.xlsx">Category M: 2018/19 Q1 Apr-Jun (with Jan 18 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2017/12/Category-M-Jan-18.xlsx">Category M: 2017/18 Q4 Jan-Mar (with Oct 17 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Oct-17.xlsx">Category M: 2017/18 Q3 Oct-Dec (with Aug 17 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Aug-17.xlsx">Category M: 2017 August (with Jul 17 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Jul-17.xlsx">Category M: 2017/18 Q2 Jul-Sep (with Apr 17 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Apr-17.xlsx">Category M: 2017/18 Q1 Apr-Jun (with Jan 17 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Jan-17.xlsx">Category M: 2016/17 Q4 Jan-Mar (with Oct 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Oct-16.xlsx">Category M: 2016/17 Q3 Oct-Dec (with Jul 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Jul-16.xlsx">Category M: 2016/17 Q2 Jul – Sep (with Jun 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-June-16.xlsx">Category M: 2016 June (with Apr 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-April-16.xlsx" rel="">Category M: 2016/17 Q1 Apr – Jun (with Jan 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-2015.16-Q4-Jan-Mar-with-Oct-15-combined.xlsx">Category M: 2015/16 Q4 Jan – Mar (with Oct 15 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-2015.16-Q3-Oct-Dec-with-Jul-15-combined.xlsx">Category M: 2015/16 Q3 Oct </a><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Jun-15-and-Apr-15-Cat-M-prices.xlsx">–</a><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-2015.16-Q3-Oct-Dec-with-Jul-15-combined.xlsx"> Dec (with Jul 15 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Jun-15-and-Apr-15-Cat-M-prices.xlsx">Category M: 2015/16 Q2 Jul – Sep (with Apr 15 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Apr_15_and_Jan_15_Cat_M_prices-2.xlsx">Category M: 2015/16 Q1 Apr – Jun (with Jan 15 combined) updated</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Jan_15_and_Oct_14_Cat_M_prices.xlsx">Category M: 2014/15 Q4 Jan – Mar (with Oct 14 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Oct_14_and_Jul_14_Cat_M_prices.xlsx">Catgegory M: 2014/15 Q3 Oct – Dec (with Jul 14 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2013/07/Jul_14_and_Apr_14_Cat_M_Prices.xlsx">Category M: 2014/15 Q2 Jul – Sep (with Apr 14 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2013/07/Apr_14_and-Jan_14_Cat_M_Prices.xls.xlsx">Category M: 2014/15 Q1 Apr – Jun (with Jan 14 combined)</a> (MS Excel)</p>
<p></p>
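
BeautifulSoup's parsers can build different trees from markup that is not well-formed, which is probably why 'html.parser' and 'lxml' disagree on this page (an assumption: the page's HTML was not inspected here). The sketch below counts the <p> tags each installed parser finds inside the chosen div, and pulls the href of the first link in that div. The function names `count_paragraphs_by_parser` and `first_link_href` are made up for illustration, and taking "the first <a> with an href" is a simplification of the JS `children[1].children[0]` lookup. The 'lxml' parser needs the separate lxml package installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_paragraphs_by_parser(html, div_class="toggle_container", div_index=2,
                               parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser: number of <p> tags in the chosen div} per installed parser.

    The value is None when the parser finds fewer than div_index + 1 divs.
    """
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            continue  # this parser is not installed; skip it
        divs = soup.find_all("div", class_=div_class)
        if len(divs) > div_index:
            counts[parser] = len(divs[div_index].find_all("p"))
        else:
            counts[parser] = None
    return counts


def first_link_href(html, parser="lxml", div_class="toggle_container", div_index=2):
    """Return the href of the first <a> in the chosen div, or None if absent."""
    soup = BeautifulSoup(html, parser)
    divs = soup.find_all("div", class_=div_class)
    if len(divs) <= div_index:
        return None
    link = divs[div_index].find("a", href=True)
    return link["href"] if link else None


# Possible usage with the page from the question (needs network access):
#   import requests
#   page = requests.get(cat_m_site)
#   print(count_paragraphs_by_parser(page.text))
#   print(first_link_href(page.text))
```

If the counts differ between parsers, that points at the parser rather than at findChildren. Since this relies on the newest file being listed first, it may be worth checking the returned URL before downloading.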
