我正在尝试抓取一个站点以获取最新情况说明书的链接。我尝试过使用 Selenium 和 BeautifulSoup,但是每次我都无法使用这些工具找到链接。例如,当使用 Soup 检查输出时,我无法从零件中得到任何信息。有什么建议?
使用硒:
#BIOG
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
driver = webdriver.Chrome(options=options)
driver.get('https://www.biotechgt.com/performance/monthly-factsheets')
html = driver.page_source
driver.find_elements(By.XPATH, '/html/body/div/main/section/div/div/div/div/div[2]/div/div[1]/div[2]/div/table/tbody[1]/tr[2]/td/a')
要从页面获取所有下载链接,您可以使用下一个示例:
import requests
from bs4 import BeautifulSoup
url = "https://www.biotechgt.com/performance/monthly-factsheets"
soup = BeautifulSoup(
requests.get(url, cookies={"dp-disclaimer": "APPROVED"}).content,
"html.parser",
)
for a in soup.select("a.gtm-downloads:has(.btn-download)"):
print(a["href"])
印刷:
https://www.biotechgt.com/download_file/force/191/209
https://www.biotechgt.com/download_file/force/187/209
https://www.biotechgt.com/download_file/force/185/209
https://www.biotechgt.com/download_file/force/184/209
...
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句