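I am new to Python (and to programming in general), and I want to parse all the hrefs inside a div. My goal is to write a program that opens every link in that div so that I can save the photos associated with each href.
Link: https://www.opi.com/shop-products/nail-polish-powders/nail-lacquer
The part I want to parse is the div with id "all_nail_lacquer".
So far I have been able to get all the hrefs, and this is what I have: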
import urllib
import urllib.request
from bs4 import BeautifulSoup

theurl = "https://www.opi.com/shop-products/nail-polish-powders/nail-lacquer"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")
print(soup.title.text)
nail_lacquer = soup.find('div', {"id": "all_nail_lacquer"})
"""
for nail_lacquer in soup.find_all('div'):
    print(nail_lacquer.findAll('a'))
"""
# Print the href of every link inside the div with id "all_nail_lacquer"
for a in soup.findAll('div', {"id": "all_nail_lacquer"}):
    for b in a.findAll('a'):
        print(b.get('href'))
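If the goal is to open each of those links afterwards, the hrefs may first need cleaning up: `b.get('href')` returns `None` for anchors without an href, and links can be relative. Here is a minimal sketch of that step using only the standard library. The helper name `absolute_links` and the sample paths in the usage note are made up for illustration; I have not checked what the hrefs on this page actually look like.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.opi.com/shop-products/nail-polish-powders/nail-lacquer"


def absolute_links(hrefs, base_url=BASE_URL):
    """Return unique absolute URLs, keeping the order in which they first appear.

    Hypothetical helper: skips missing hrefs and same-page "#..." anchors,
    and resolves relative links against base_url.
    """
    seen = set()
    result = []
    for href in hrefs:
        if not href or href.startswith("#"):
            continue
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

You could collect the values from `b.get('href')` into a list and pass that list in, for example `absolute_links(["/some/product", None, "#top"])`, where the paths are placeholders.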
To print the image links (including the high-resolution versions) and the titles, you can use this script:
import urllib.request
from bs4 import BeautifulSoup

theurl = "https://www.opi.com/shop-products/nail-polish-powders/nail-lacquer"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")

for img in soup.select('#all_nail_lacquer [typeof="foaf:Image"][data-src]'):
    print(img['data-src'])
    print(img['data-src'].replace('shelf_image', 'photos'))  # <-- this is URL to hi-res image
    print(img['title'])
    print('-' * 80)
Prints:
https://www.opi.com/sites/default/files/styles/product_shelf_image/public/baby-take-a-vow-nlsh1-nail-lacquer-22850011001_0_0.jpg?itok=3b2ftHzc
https://www.opi.com/sites/default/files/styles/product_photos/public/baby-take-a-vow-nlsh1-nail-lacquer-22850011001_0_0.jpg?itok=3b2ftHzc
Baby, Take a Vow
--------------------------------------------------------------------------------
https://www.opi.com/sites/default/files/styles/product_shelf_image/public/suzi-without-a-paddle-nlf88-nail-lacquer-22006698188_21_0.jpg?itok=mgi1-rz3
https://www.opi.com/sites/default/files/styles/product_photos/public/suzi-without-a-paddle-nlf88-nail-lacquer-22006698188_21_0.jpg?itok=mgi1-rz3
Suzi Without a Paddle
--------------------------------------------------------------------------------
https://www.opi.com/sites/default/files/styles/product_shelf_image/public/coconuts-over-opi-nlf89-nail-lacquer-22006698189_24_1_0.jpg?itok=yasOZA4l
https://www.opi.com/sites/default/files/styles/product_photos/public/coconuts-over-opi-nlf89-nail-lacquer-22006698189_24_1_0.jpg?itok=yasOZA4l
Coconuts Over OPI
--------------------------------------------------------------------------------
https://www.opi.com/sites/default/files/styles/product_shelf_image/public/no-tan-lines-nlf90-nail-lacquer-22006698190_20_1_0.jpg?itok=ot_cu8c5
https://www.opi.com/sites/default/files/styles/product_photos/public/no-tan-lines-nlf90-nail-lacquer-22006698190_20_1_0.jpg?itok=ot_cu8c5
No Tan Lines
--------------------------------------------------------------------------------
...and so on.
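As the output shows, the only difference between the two URLs is the style segment in the path (`product_shelf_image` versus `product_photos`). Below is a small sketch of that rewrite as a function. The name `hires_url` is made up, and it matches the whole path segment rather than the bare `shelf_image` substring used in the script, which is just a slightly more defensive variant of the same idea.

```python
def hires_url(shelf_url):
    """Swap the thumbnail style segment for the hi-res one.

    Hypothetical helper: only the first occurrence is replaced, so a
    file name that happens to contain the same text is left alone.
    """
    return shelf_url.replace(
        "/styles/product_shelf_image/", "/styles/product_photos/", 1
    )
```

Inside the loop this would be called as `hires_url(img['data-src'])`.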
Edit: To save the images to disk, you can use this script:
import requests
from bs4 import BeautifulSoup

theurl = "https://www.opi.com/shop-products/nail-polish-powders/nail-lacquer"
thepage = requests.get(theurl)
soup = BeautifulSoup(thepage.content, "html.parser")

i = 1
for img in soup.select('#all_nail_lacquer [typeof="foaf:Image"][data-src]'):
    u = img['data-src'].replace('shelf_image', 'photos')
    # Files are numbered in page order: img_0001.jpg, img_0002.jpg, ...
    with open('img_{:04d}.jpg'.format(i), 'wb') as f_out:
        print('Saving {}'.format(u))
        f_out.write(requests.get(u).content)
    i += 1
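The script above names the files by counter. If you would rather keep the original file names, one option is to take the last path component of each URL and drop the `?itok=...` query string. This is a sketch using only the standard library; `filename_from_url` and its `default` parameter are hypothetical names, not part of `requests` or BeautifulSoup.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="image.jpg"):
    """Return the last path component of url, ignoring the query string.

    Hypothetical helper: falls back to `default` when the path has no
    file name (for example a URL that ends in "/").
    """
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    return name or default
```

In the loop you would then open `filename_from_url(u)` instead of `'img_{:04d}.jpg'.format(i)`. Note that two images with the same file name would overwrite each other, which the counter-based names avoid.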