我正在尝试使用Python从网站中提取特定的'dd'元素
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
'AppleWebKit/537.36 (KHTML, like Gecko) '\
'Chrome/75.0.3770.80 Safari/537.36'}
url = "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
vehicle=[]
for i in soup.findAll("div", class_="message-userExtras"):
for item in soup.find_all("dd")[::-1]:
vehicle.append(item.get_text())
print(vehicle)
我正在尝试仅从网址中提取车辆列表,我的输出应如下所示
2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
Tahoe/Tundra/Fusion
2019 Ford Ranger Lariat - Saber; 2014 GMC Terrain
但是我的结果不是我所期望的
使用正则表达式re并dt
用文本搜索标签,Vehicle
然后找到下一个dd
标签。
import re
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
'AppleWebKit/537.36 (KHTML, like Gecko) '\
'Chrome/75.0.3770.80 Safari/537.36'}
url = "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
for item in soup.find_all("div",class_='message-userExtras'):
print(item.find('dt',text=re.compile("Vehicle")).find_next('dd').text.strip())
输出:
2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
Tahoe/Tundra/Fusion
2019 Ford Ranger Lariat - Saber; 2014 GMC Terrain
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
2019 Ranger Lariat - 2019 Honda CRV Touring
2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
2019 Ranger Lariat SuperCab
2019 Ranger Lariat
Ranger Lariat
2019 Ford Ranger Lariat
Ranger Lariat
Ranger Lariat
2019 Ranger XLT 301A SuperCrew 4X4 2015 Ecoboost Mustang 50 Year Appereance Package convertible
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句