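I'm trying to use Python 3 and BeautifulSoup 4 to get a URL so I can download an xlsx file from this page: https://psnc.org.uk/funding-and-statistics/funding-distribution/retained-margin-category-m/
I need the URL of the latest file, which is at index 0 in the list of <p> tags inside a <div>. I can get it with JS in the console like this: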
var link = document.getElementsByClassName("toggle_container")[2].children[1].children[0].href
If I use BS4 to get all the <p> tags on the page, the link I want is in the list:
import urllib
import requests
from bs4 import BeautifulSoup
cat_m_site = "https://psnc.org.uk/funding-and-statistics/funding-distribution/retained-margin-category-m/"
page = requests.get(cat_m_site)
soup = BeautifulSoup(page.text, 'html.parser')
p_elements = soup.find_all('p')
for item in p_elements:
    print(item)
If I try to reproduce the JS solution by getting the <div> that contains the link, I should get a list of 29 <p> elements, but the list is empty:
import urllib
import requests
from bs4 import BeautifulSoup
cat_m_site = "https://psnc.org.uk/funding-and-statistics/funding-distribution/retained-margin-category-m/"
page = requests.get(cat_m_site)
soup = BeautifulSoup(page.text, 'html.parser')
divs = soup.find_all('div', {'class':'toggle_container'})
children = divs[2].findChildren("p", recursive=True)
for child in children:
    print(child)
I'd prefer this approach, because I "know" the link will be element 0 in this div, but I feel I'm missing something about the findChildren method.
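On the findChildren question: in BeautifulSoup 4, findChildren is a legacy alias of find_all, so the method itself is unlikely to be what is missing. A minimal sketch on a made-up, well-formed snippet (SAMPLE_HTML and its example.com URLs are assumptions for illustration, not taken from the page) shows both calls returning the same list:

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed stand-in for one "toggle_container" div.
SAMPLE_HTML = """
<div class="toggle_container">
  <p><a href="https://example.com/a.xlsx">A</a> (MS Excel)</p>
  <p><a href="https://example.com/b.xlsx">B</a> (MS Excel)</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", class_="toggle_container")

# findChildren is the older spelling; find_all is the current name.
via_find_children = div.findChildren("p", recursive=True)
via_find_all = div.find_all("p")

print(len(via_find_all))
print(via_find_children == via_find_all)
```

If the two calls agree on clean markup like this, an empty result on the real page points at how the page was parsed rather than at the search method.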
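Use soup = BeautifulSoup(page.text, 'lxml') instead. The two parsers can build different trees from markup that is not well-formed, which would explain why html.parser leaves this div without any <p> children: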
import urllib
import requests
from bs4 import BeautifulSoup
cat_m_site = "https://psnc.org.uk/funding-and-statistics/funding-distribution/retained-margin-category-m/"
page = requests.get(cat_m_site)
soup = BeautifulSoup(page.text, 'lxml')
divs = soup.find_all('div', {'class':'toggle_container'})
children = divs[2].findChildren("p", recursive=True)
for child in children:
    print(child)
Output:
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/10/Category-M-201920-Q3-Oct-Dec-with-Aug-19-combined.xlsx">Category M 2019/20 Q3 Oct-Dec (with Aug 19 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/08/Category-M-2019-August-with-Jul-19-combined.xlsx">Category M 2019 August (with Jul 19 combined</a>) (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/08/Category-M-2019-20-Q2-Jul-Sep-with-Apr-19-combined.xlsx">Category M 2019/20 Q2 Jul-Sep (with Apr 19 combined) </a>(MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/05/Cat-M-Apr-2019-1.xlsx">Category M: 2019/20 Q1 Apri-June (with Jan 2019 combined) </a>(MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/01/Category-M-2018.19-Q4-JanMar-with-Nov-18-combined.xlsx">Category M: 2018/19 Q4 Jan-Mar (with Nov 18 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/01/Category-M-Nov-18.xlsx">Category M: 2018 November (with Oct 18 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2018/09/Category-M-2018.19-Q3-OctDec-with-Aug-18-combined.xlsx">Category M: 2018/19 Q3 Oct-Dec (with Aug 18 combined)</a> (MS Excel)</p>
<p><a href="http://psnc.org.uk/wp-content/uploads/2018/06/Category-M-2018.19-Q2-JulSep-with-Apr-18-combined.xlsx">Category M: 2018/19 Q2 Jul-Sep (with Apr 18 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2018/04/Category-M-2018.19-Q1-AprJun-with-Jan-18-combined-v2.xlsx">Category M: 2018/19 Q1 Apr-Jun (with Jan 18 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2017/12/Category-M-Jan-18.xlsx">Category M: 2017/18 Q4 Jan-Mar (with Oct 17 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Oct-17.xlsx">Category M: 2017/18 Q3 Oct-Dec (with Aug 17 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Aug-17.xlsx">Category M: 2017 August (with Jul 17 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Jul-17.xlsx">Category M: 2017/18 Q2 Jul-Sep (with Apr 17 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Apr-17.xlsx">Category M: 2017/18 Q1 Apr-Jun (with Jan 17 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Jan-17.xlsx">Category M: 2016/17 Q4 Jan-Mar (with Oct 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Oct-16.xlsx">Category M: 2016/17 Q3 Oct-Dec (with Jul 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-Jul-16.xlsx">Category M: 2016/17 Q2 Jul – Sep (with Jun 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-June-16.xlsx">Category M: 2016 June (with Apr 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-April-16.xlsx" rel="">Category M: 2016/17 Q1 Apr – Jun (with Jan 16 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-2015.16-Q4-Jan-Mar-with-Oct-15-combined.xlsx">Category M: 2015/16 Q4 Jan – Mar (with Oct 15 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-2015.16-Q3-Oct-Dec-with-Jul-15-combined.xlsx">Category M: 2015/16 Q3 Oct </a><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Jun-15-and-Apr-15-Cat-M-prices.xlsx">–</a><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Category-M-2015.16-Q3-Oct-Dec-with-Jul-15-combined.xlsx"> Dec (with Jul 15 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Jun-15-and-Apr-15-Cat-M-prices.xlsx">Category M: 2015/16 Q2 Jul – Sep (with Apr 15 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Apr_15_and_Jan_15_Cat_M_prices-2.xlsx">Category M: 2015/16 Q1 Apr – Jun (with Jan 15 combined) updated</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Jan_15_and_Oct_14_Cat_M_prices.xlsx">Category M: 2014/15 Q4 Jan – Mar (with Oct 14 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2014/09/Oct_14_and_Jul_14_Cat_M_prices.xlsx">Catgegory M: 2014/15 Q3 Oct – Dec (with Jul 14 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2013/07/Jul_14_and_Apr_14_Cat_M_Prices.xlsx">Category M: 2014/15 Q2 Jul – Sep (with Apr 14 combined)</a> (MS Excel)</p>
<p style="text-align: justify;"><a href="https://psnc.org.uk/wp-content/uploads/2013/07/Apr_14_and-Jan_14_Cat_M_Prices.xls.xlsx">Category M: 2014/15 Q1 Apr – Jun (with Jan 14 combined)</a> (MS Excel)</p>
<p></p>
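From that list, the URL of the latest file is the href of the first link. The sketch below shows one way to pull it out and derive a local filename. To stay runnable offline it parses SAMPLE_HTML, a two-entry excerpt of the output above wrapped in a div by hand; the helper name first_link_href is hypothetical, not part of BeautifulSoup.

```python
import posixpath
from urllib.parse import urlsplit

from bs4 import BeautifulSoup

# Excerpt of the output above, wrapped in a div for illustration.
SAMPLE_HTML = """<div class="toggle_container">
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/10/Category-M-201920-Q3-Oct-Dec-with-Aug-19-combined.xlsx">Category M 2019/20 Q3 Oct-Dec (with Aug 19 combined)</a> (MS Excel)</p>
<p><a href="https://psnc.org.uk/wp-content/uploads/2019/08/Category-M-2019-August-with-Jul-19-combined.xlsx">Category M 2019 August (with Jul 19 combined</a>) (MS Excel)</p>
</div>"""


def first_link_href(container):
    """Return the href of the first <a> found in the container's <p> tags, or None."""
    for p in container.find_all("p"):
        a = p.find("a", href=True)  # skips empty paragraphs such as <p></p>
        if a is not None:
            return a["href"]
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", class_="toggle_container")

latest_url = first_link_href(container)
# Take the last path segment of the URL as the local filename.
filename = posixpath.basename(urlsplit(latest_url).path)

print(latest_url)
print(filename)
```

On the live page the container would be divs[2] from the lxml-parsed soup above, and the file itself could then be fetched with requests.get(latest_url) and its .content written to filename in binary mode.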