使用 BeautifulSoup 或 Selenium 抓取网站时找不到数据

萨姆辛

我正在尝试抓取一个站点以获取最新情况说明书的链接。我尝试过使用 Selenium 和 BeautifulSoup，但是每次我都无法使用这些工具找到链接。例如，当使用 Soup 检查输出时，我无法从零件中得到任何信息。有什么建议？

使用硒：

#BIOG
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True

driver = webdriver.Chrome(options=options)
driver.get('https://www.biotechgt.com/performance/monthly-factsheets')
html = driver.page_source

driver.find_elements(By.XPATH, '/html/body/div/main/section/div/div/div/div/div[2]/div/div[1]/div[2]/div/table/tbody[1]/tr[2]/td/a')

安德烈·凯斯利

要从页面获取所有下载链接，您可以使用下一个示例：

import requests
from bs4 import BeautifulSoup

url = "https://www.biotechgt.com/performance/monthly-factsheets"

soup = BeautifulSoup(
    requests.get(url, cookies={"dp-disclaimer": "APPROVED"}).content,
    "html.parser",
)

for a in soup.select("a.gtm-downloads:has(.btn-download)"):
    print(a["href"])

印刷：

https://www.biotechgt.com/download_file/force/191/209
https://www.biotechgt.com/download_file/force/187/209
https://www.biotechgt.com/download_file/force/185/209
https://www.biotechgt.com/download_file/force/184/209

...

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。