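I am scraping this website: https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173. I want to get the text "Arrow" that appears to the right of the "Choose your ship" text.
I tried using requests and BeautifulSoup to select the tag that contains the text. When I inspect the page, I can see the text sitting between the tags. I tried selecting it with soup.select(".name"), but I still get an empty string. The data is probably rendered with JavaScript, so I tried Selenium and waited for the element to load before selecting it, but I still get nothing. Here is my code: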
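from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # assumed; the original snippet did not show how the driver was created
driver.get("https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173")
try:
    # Wait up to 20 seconds for an element with class "name" to be present
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "name"))
    )
    select_tags = driver.find_elements(By.CSS_SELECTOR, ".name")
    for tag in select_tags:
        print(tag.text)
finally:
    driver.quit()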
Arrow
Selenium is probably overkill for a task like this, where you do not need to interact with the page. With requests_html it takes only a few lines:
from requests_html import HTMLSession
url = 'https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173'
session = HTMLSession()
r = session.get(url)
r.html.render()
print(r.html.find('.info > .name', first=True).text)
This prints Arrow, as expected.
For this particular site, you can also find the information you need elsewhere in the page content, without any JavaScript support. For example:
import json
import requests

url = 'https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173'
r = requests.get(url)
text = r.text

# The ship list is embedded in the page source as a JSON array after 'fromShips: '
json_start_text = 'fromShips: '
json_start = text.index(json_start_text) + len(json_start_text)
# Assumes the first ']' after the marker closes the array
json_end = text.index(']', json_start)
json_text = text[json_start:json_end + 1]
data = json.loads(json_text)

for ship in data:
    name = ship['name']
    msrp = ship['msrp']
    print(f'{name} {msrp}')
which results in:
Aurora ES $20.00
P52 Merlin $20.00
Aurora MR $25.00
P72 Archimedes $30.00
Mustang Alpha $30.00
Aurora LX $30.00
...
Arrow $75.00
...
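
The slicing above relies on the first ']' after 'fromShips: ' being the end of the array, which would stop working if any ship entry ever contained a nested list. As a minimal sketch of a more tolerant variant, the standard library's json.JSONDecoder.raw_decode can parse one complete JSON value starting at a given offset and ignore whatever follows. The helper names extract_json_array and find_ship, the sample string, and the "tags" field are all hypothetical and only serve the illustration; the real page source is much larger and may be laid out differently.

```python
import json


def extract_json_array(text, marker="fromShips: "):
    """Return the JSON value that immediately follows `marker` in `text`."""
    start = text.index(marker) + len(marker)
    # raw_decode parses exactly one JSON value beginning at `start` and
    # ignores the rest of the string, so nested "]" characters are fine.
    data, _end = json.JSONDecoder().raw_decode(text, start)
    return data


def find_ship(ships, name):
    """Return the first ship dict whose 'name' equals `name`, or None."""
    return next((s for s in ships if s.get("name") == name), None)


# Hypothetical stand-in for r.text; the "tags" lists are invented to show
# a nested "]" that would break a plain text.index(']') search.
sample = (
    'var cfg = { fromShips: ['
    '{"name": "Aurora ES", "msrp": "$20.00", "tags": ["starter"]}, '
    '{"name": "Arrow", "msrp": "$75.00", "tags": []}'
    '], other: 1 };'
)

ships = extract_json_array(sample)
arrow = find_ship(ships, "Arrow")
print(arrow["name"], arrow["msrp"])  # prints: Arrow $75.00
```

With the real response you would pass r.text instead of sample. Note that raw_decode does not skip leading whitespace, so the marker has to end right where the array begins.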