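I want to get GitHub repository links from GitHub search results. Right now my code collects both the username (owner) links and the repository links. How can I get only the repository links by targeting an attribute value on the anchor tag?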
My code:
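from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not treated as escapes
path = r"C:\programs\chromedriver.exe"
# Selenium 4 takes the driver path through a Service object
# (Selenium 3 accepted webdriver.Chrome(path) directly)
driver = webdriver.Chrome(service=Service(path))
url = 'https://github.com/topics/flutter-apps'
driver.get(url)

links_list = []
# Each repository card heading has the class "f3"
headings = driver.find_elements(By.CLASS_NAME, 'f3')
for heading in headings:
    # This collects BOTH anchors in the heading: the owner link and the repository link
    links = heading.find_elements(By.TAG_NAME, 'a')
    for l in links:
        links_list.append(l.get_attribute('href'))
print(links_list)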
This is the HTML I want to extract the links from:
<h1 class="f3 text-gray text-normal lh-condensed">
<a data-hydro-click="{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"OWNER","click_visual_representation":"REPOSITORY_OWNER_HEADING","actor_id":49521558,"record_id":484656,"originating_url":"https://github.com/topics/ios","user_id":49521558}}"
data-hydro-click-hmac="7b69680b468dda1b4e10ddab19c8034fd4c530bc57957662d8be320d79cc38f1"
data-ga-click="Explore, go to repository owner, location:explore feed" href="/vsouza">
vsouza
</a> /
<a data-hydro-click="{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":49521558,"record_id":21700699,"originating_url":"https://github.com/topics/ios","user_id":49521558}}"
data-hydro-click-hmac="c38ef14c5a72214b8e946bde857c36653301cb96a15a6b1108242526485221b8"
data-ga-click="Explore, go to repository, location:explore feed" href="/vsouza/awesome-ios" class="text-bold">
awesome-ios
</a>
</h1>
Of these two anchor elements, I want the href value of the one that has this attribute and value: data-ga-click="Explore, go to repository, location:explore feed"
To get only that specific link, pass this data-ga-click attribute into an XPath expression, which gives a unique result:
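from selenium.webdriver.common.by import By
for heading in headings:
    # ".//" searches only inside this heading; the predicate keeps just the repository anchor
    # (Selenium 3 spelling: heading.find_elements_by_xpath(...), removed in Selenium 4.3)
    links = heading.find_elements(By.XPATH, './/a[@data-ga-click="Explore, go to repository, location:explore feed"]')
    for l in links:
        links_list.append(l.get_attribute('href'))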
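The XPath filter can be tried offline, without a browser. A minimal sketch, assuming a simplified, well-formed copy of the heading HTML from the question (the data-hydro-* attributes are dropped so the fragment parses as XML): Python's standard xml.etree.ElementTree supports the same [@attribute="value"] predicate, so it returns only the repository anchor. The names HEADING, REPO_XPATH and repo_links are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the heading from the question (assumption: data-hydro-* attributes omitted)
HEADING = """
<h1 class="f3 text-gray text-normal lh-condensed">
  <a data-ga-click="Explore, go to repository owner, location:explore feed" href="/vsouza">vsouza</a> /
  <a data-ga-click="Explore, go to repository, location:explore feed" href="/vsouza/awesome-ios" class="text-bold">awesome-ios</a>
</h1>
"""

# Same XPath as in the Selenium answer
REPO_XPATH = './/a[@data-ga-click="Explore, go to repository, location:explore feed"]'


def repo_links(fragment: str) -> list[str]:
    """Return the href of every anchor whose data-ga-click marks it as a repository link."""
    root = ET.fromstring(fragment.strip())
    # ElementTree's limited XPath supports the [@attr="value"] equality predicate
    return [a.get("href") for a in root.findall(REPO_XPATH)]


print(repo_links(HEADING))  # ['/vsouza/awesome-ios']
```

Note that ElementTree returns the raw attribute value (/vsouza/awesome-ios), whereas Selenium's get_attribute('href') returns the resolved absolute URL (https://github.com/vsouza/awesome-ios).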
Or with a CSS selector:
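from selenium.webdriver.common.by import By
for heading in headings:
    # Attribute selector: match <a> elements whose data-ga-click equals the repository marker
    # (Selenium 3 spelling: heading.find_elements_by_css_selector(...), removed in Selenium 4.3)
    links = heading.find_elements(By.CSS_SELECTOR, 'a[data-ga-click="Explore, go to repository, location:explore feed"]')
    for l in links:
        links_list.append(l.get_attribute('href'))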
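Python's standard library has no CSS selector engine, but the attribute selector a[data-ga-click="..."] is simply an equality test on one attribute of an <a> tag. As a sketch under that framing, an html.parser subclass can apply the same test by hand to a simplified copy of the heading markup. RepoLinkParser and REPO_MARKER are hypothetical names for illustration, not part of any library.

```python
from html.parser import HTMLParser

# Attribute value that marks the repository anchor (taken from the question)
REPO_MARKER = "Explore, go to repository, location:explore feed"


class RepoLinkParser(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags whose data-ga-click
    equals REPO_MARKER, i.e. the same check as a[data-ga-click="..."]."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a dict makes the lookup easy
        attributes = dict(attrs)
        if tag == "a" and attributes.get("data-ga-click") == REPO_MARKER:
            self.links.append(attributes.get("href"))


# Simplified copy of the heading from the question (data-hydro-* attributes omitted)
snippet = """
<h1 class="f3 text-gray text-normal lh-condensed">
  <a data-ga-click="Explore, go to repository owner, location:explore feed" href="/vsouza">vsouza</a> /
  <a data-ga-click="Explore, go to repository, location:explore feed" href="/vsouza/awesome-ios" class="text-bold">awesome-ios</a>
</h1>
"""

parser = RepoLinkParser()
parser.feed(snippet)
parser.close()
print(parser.links)  # ['/vsouza/awesome-ios']
```

The owner anchor is skipped because its data-ga-click value ("...go to repository owner...") is not an exact match, which is the same reason the Selenium selector returns a single link per heading.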