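The text I want to scrape is the title "123rd Meeting", from the page whose URL appears in the code below.
To do this, I am using this code: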
import urllib.request #get the HTML page from url
import urllib.error
from bs4 import BeautifulSoup
# set page to read
with urllib.request.urlopen('https://www.bcb.gov.br/en/#!/c/copomstatements/1724') as response:
    page = response.read()
# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, "html.parser")
print(soup)
# Inspect: <h3 class="BCTituloPagina ng-binding">123rd Meeting</h3>
title = soup.find("h3", attrs={"class": "BCTituloPagina ng-binding"})
print(title)
However, the command
print(soup)
returns neither the title, "123rd Meeting", nor the body text: "Given ... lowered the target by 25 basis points."
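You cannot extract the title with a plain HTTP library in Python (urllib or requests), because the element you want is rendered by JavaScript after the page loads, so it is not in the HTML the server returns. You will need a real browser driven by Selenium to get it.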
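One way to see why the plain request misses the content: everything after `#` in the URL is a fragment, which is never sent to the server and is presumably read by the page's JavaScript to decide what to render. A minimal sketch using only the standard library (the variable names are mine, for illustration):

```python
from urllib.parse import urlsplit

url = "https://www.bcb.gov.br/en/#!/c/copomstatements/1724"
parts = urlsplit(url)

# The part after '#' is the fragment; HTTP clients do not send it to the server.
print(parts.path)      # /en/
print(parts.fragment)  # !/c/copomstatements/1724

# What a plain HTTP client effectively requests once the fragment is dropped:
requested = parts._replace(fragment="").geturl()
print(requested)       # https://www.bcb.gov.br/en/
```

So urllib only ever receives the generic landing page, whatever statement number appears after the `#`.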
Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
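driver = webdriver.Chrome()
try:
    driver.get('https://www.bcb.gov.br/en/#!/c/copomstatements/1724')
    # Wait up to 10 seconds for the JavaScript-rendered <h3> to become visible;
    # until() returns the element, so no second lookup is needed
    # (find_element_by_xpath was removed in Selenium 4.3).
    heading = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//h3')))
    print(heading.text)
except TimeoutException:
    print('Timed out waiting for the title to render')
finally:
    driver.quit()  # end the browser session even if the wait fails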
Output:
123rd Meeting
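If you would rather keep the BeautifulSoup lookup from the question, it should work once it is given the rendered HTML instead of the raw response. The sketch below assumes you pass it Selenium's `driver.page_source` after the wait has succeeded; `extract_title` is a hypothetical helper name, and the sample string simply stands in for the rendered page, copied from the inspected element in the question.

```python
from bs4 import BeautifulSoup


def extract_title(rendered_html):
    """Return the text of the page-title <h3>, or None if it is absent."""
    soup = BeautifulSoup(rendered_html, "html.parser")
    # class_ matches any <h3> whose class list contains "BCTituloPagina",
    # so the Angular-added "ng-binding" class does not need to be spelled out.
    heading = soup.find("h3", class_="BCTituloPagina")
    return heading.get_text(strip=True) if heading else None


# Stand-in for driver.page_source (assumption: taken from the inspected element)
sample = '<h3 class="BCTituloPagina ng-binding">123rd Meeting</h3>'
print(extract_title(sample))
```

Matching on the single class `BCTituloPagina` is a design choice: it is less brittle than the exact string "BCTituloPagina ng-binding", which breaks if the framework adds or reorders classes.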