How to extract only a specific type of link from a web page using beautifulsoup4

Moofin Explorer

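I am trying to extract specific links from a page that contains many links. The links I need contain the word "apartment".

However, whatever I try, I end up extracting far more data than just the links I need.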

<a href="https://www.website.com/en/ad/apartment/abcd123" title target="IWEB_MAIN">

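If anyone could help me, it would be much appreciated! Also, if you know of good resources where I could learn more about this, that would be appreciated too!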

KunduK

You can use a regular expression with the re module.

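import re
from bs4 import BeautifulSoup

# Pagesource is assumed to hold the page's HTML as a string
soup = BeautifulSoup(Pagesource, 'html.parser')
# Match every <a> tag whose href contains "apartment"
alltags = soup.find_all("a", attrs={"href": re.compile("apartment")})
for item in alltags:
    print(item['href'])  # grab href value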
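
As a small illustration of why the pattern matters: Beautiful Soup tests a compiled regular expression against each attribute value with its search() method, so re.compile("apartment") matches the word anywhere in the href. The sketch below uses only the re module on a few sample URLs to show how a tighter pattern could exclude unwanted links. Only the first URL comes from the question; the other URLs and the stricter "/ad/apartment/" pattern are assumptions made for illustration.

```python
import re

# Sample hrefs; only the first comes from the question, the rest are made up
hrefs = [
    "https://www.website.com/en/ad/apartment/abcd123",
    "https://www.website.com/en/search/apartment/for-sale",
    "https://www.website.com/en/ad/house/efgh456",
]

# Matches "apartment" anywhere in the href
loose = re.compile("apartment")
# Hypothetical tighter pattern: only individual ad pages
strict = re.compile(r"/ad/apartment/")

loose_matches = [h for h in hrefs if loose.search(h)]
strict_matches = [h for h in hrefs if strict.search(h)]

print(loose_matches)
print(strict_matches)
```

If the loose pattern returns too many links, passing a stricter compiled pattern to find_all in the same way may be enough to narrow the result.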

Or you can use a CSS selector.

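from bs4 import BeautifulSoup

soup = BeautifulSoup(Pagesource, 'html.parser')
# [href*='apartment'] matches any href containing the substring "apartment"
alltags = soup.select("a[href*='apartment']")
for item in alltags:
    print(item['href'])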
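
To try both approaches without fetching a real page, here is a minimal self-contained sketch. The HTML string is made up for the example (only the first link is taken from the question), and the names html, regex_links and css_links are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page source
html = """
<a href="https://www.website.com/en/ad/apartment/abcd123" title target="IWEB_MAIN">Flat</a>
<a href="https://www.website.com/en/ad/house/efgh456">House</a>
<a name="no-href">Anchor without href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Approach 1: regular expression on the href attribute
regex_links = [a["href"] for a in soup.find_all("a", attrs={"href": re.compile("apartment")})]

# Approach 2: CSS attribute-substring selector
css_links = [a["href"] for a in soup.select("a[href*='apartment']")]

print(regex_links)
print(css_links)
```

Both lists should contain the same single apartment link; anchors without an href, or with an href that lacks "apartment", are skipped by either approach.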

You can find more details in the official Beautiful Soup documentation.

Edit:

You need to target the parent div first, then find the anchor tag inside it.

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000")
soup = BeautifulSoup(res.text, 'html.parser')
# Select only anchors that are direct children of a result-item div
for item in soup.select("div[data-type='resultgallery-resultitem'] > a[href*='apartment']"):
    print(item['href'])
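
Since the live page may change or be unavailable, the same selector can be checked against a small inline document. This is only a sketch: the HTML below is invented, and it assumes the result links are direct children of a div with data-type="resultgallery-resultitem", as in the selector above.

```python
from bs4 import BeautifulSoup

# Invented HTML: one result item and one unrelated navigation link
html = """
<div data-type="resultgallery-resultitem">
  <a href="https://www.website.com/en/ad/apartment/abcd123">Listing</a>
</div>
<div class="nav">
  <a href="https://www.website.com/en/search/apartment/for-sale">Search</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div[...] > a[...]" keeps only anchors that are direct children of the result div
selector = "div[data-type='resultgallery-resultitem'] > a[href*='apartment']"
links = [a["href"] for a in soup.select(selector)]

print(links)
```

The navigation link also contains "apartment" in its href, but it is left out because its parent div does not carry the data-type attribute.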
