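I'm trying to check whether an element contains other elements. A url element sometimes contains only a loc tag, and sometimes contains a loc tag plus image tags. I want to get the value of the loc tag only when there is an image tag. I tried something like this: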
import requests
import xml.etree.ElementTree as ET

def get_links():  # hypothetical wrapper name, so that `return links` is valid
    url = "https://www.aeroprecisionusa.com/media/sitemap_en.xml"
    response = requests.get(url)
    root = ET.fromstring(response.content)
    links = []
    for elm in root.findall(".//{http://www.sitemaps.org/schemas/sitemap/0.9}url"):
        # keep this <url>'s <loc> only if the <url> contains an <image> element
        if elm.find('.//{http://www.sitemaps.org/schemas/sitemap/0.9}image') is not None:
            link = elm.find('./{http://www.sitemaps.org/schemas/sitemap/0.9}loc').text
            links.append(link)
    return links
But it still returns the loc tag of every url parent element.
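ElementTree stores every tag as {namespace-URI}localname, so a find() only matches when the URI is exactly right. One way to see which namespace a child element really uses is to print the qualified tags of each <url>'s children. A minimal sketch, run on a hypothetical two-entry sample shaped like the sitemap (the example.com URLs are made up):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <url> with only <loc>, one with <loc> plus <image:image>.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/page-without-image</loc>
  </url>
  <url>
    <loc>https://example.com/page-with-image</loc>
    <image:image>
      <image:loc>https://example.com/img/photo.jpg</image:loc>
    </image:image>
  </url>
</urlset>"""

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
IMG = "{http://www.google.com/schemas/sitemap-image/1.1}"

root = ET.fromstring(SAMPLE)
for url_elem in root.findall(SM + "url"):
    # Fully qualified tag of every direct child of this <url>
    print([child.tag for child in url_elem])

second = root.findall(SM + "url")[1]
# Looking for <image> in the sitemap namespace finds nothing...
print(second.find(SM + "image"))  # None
# ...while the Google image namespace matches.
print(second.find(IMG + "image") is not None)  # True
```

The printed tags show that <image> carries the Google image namespace URI, not the sitemap one, which is what any lookup for it has to use.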
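The <image> elements are not in the sitemap namespace; they belong to the Google image namespace, http://www.google.com/schemas/sitemap-image/1.1, so they have to be searched for with that URI. This script prints the <loc> tag and the image URL of every <url> that contains an <image> tag: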
import requests
import xml.etree.ElementTree as ET

# <url> and its <loc> are in the sitemap namespace;
# <image:image> and its <image:loc> are in the Google image namespace.
url = "https://www.aeroprecisionusa.com/media/sitemap_en.xml"
response = requests.get(url)
root = ET.fromstring(response.content)
links = []
for elm in root.findall(".//{http://www.sitemaps.org/schemas/sitemap/0.9}url"):
    loc = elm.find(".//{http://www.sitemaps.org/schemas/sitemap/0.9}loc")
    img = elm.find(".//{http://www.google.com/schemas/sitemap-image/1.1}image")
    # Only <url> elements that contain both a <loc> and an <image> are printed
    if loc is not None and img is not None:
        img_loc = img.find(
            ".//{http://www.google.com/schemas/sitemap-image/1.1}loc"
        )
        print(loc.text)
        print(img_loc.text)
        print("-" * 80)
Prints:
...
--------------------------------------------------------------------------------
https://www.aeroprecisionusa.com/magpul-moe-grip-sl-s-stock-midnight-marshland-furniture-set
http://d2df4e9l5rljaz.cloudfront.net/media/catalog/product/cache/61578878e6753b4ec73e244e03a0515d/a/p/aprh101361c-magpul-moe-grip-sl-s-stock-midnight-marshland-furniture-set-3.jpg
--------------------------------------------------------------------------------
https://www.aeroprecisionusa.com/battle-rope-2pt0-357-38-cal-9mm-pistol
http://d2df4e9l5rljaz.cloudfront.net/media/catalog/product/cache/61578878e6753b4ec73e244e03a0515d/a/p/aprh101753-battle-rope-2pt0.jpg
--------------------------------------------------------------------------------
https://www.aeroprecisionusa.com/battle-2pt0-rope-22-223-cal-pistol-rifle
http://d2df4e9l5rljaz.cloudfront.net/media/catalog/product/cache/61578878e6753b4ec73e244e03a0515d/a/p/aprh101754-battle-rope-2pt0.jpg
--------------------------------------------------------------------------------
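If only the page URLs are needed (the links list from the question), ElementTree's limited XPath support can also do the filtering in a single path: the [tag] predicate keeps elements that have a direct child with that tag, and the namespaces argument of findall() maps short prefixes to the namespace URIs. A minimal sketch; the function name locs_with_images, the prefixes "sm" and "image", and the sample document are all hypothetical, and against the real site you would pass response.content instead of SAMPLE:

```python
import xml.etree.ElementTree as ET

# Prefix -> URI map; the prefixes are local names chosen here, only the URIs
# have to match the sitemap.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

# Hypothetical two-entry sitemap with the same shape as the real one.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/page-without-image</loc>
  </url>
  <url>
    <loc>https://example.com/page-with-image</loc>
    <image:image>
      <image:loc>https://example.com/img/photo.jpg</image:loc>
    </image:image>
  </url>
</urlset>"""


def locs_with_images(xml_bytes):
    """Return the <loc> text of every <url> that has a direct <image:image> child."""
    root = ET.fromstring(xml_bytes)
    # sm:url[image:image] keeps only <url> children of <urlset> that contain
    # an <image:image>; /sm:loc then selects each kept <url>'s own <loc>.
    return [loc.text for loc in root.findall("sm:url[image:image]/sm:loc", NS)]


print(locs_with_images(SAMPLE))  # ['https://example.com/page-with-image']
```

The predicate only inspects direct children, which is enough here because <image:image> sits directly inside <url>.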