I want to parse the GitHub trending page. Here is my code:
import requests
from bs4 import BeautifulSoup

url_github = "https://github.com/trending"


def request_github_trending(url):
    request = requests.get(url)
    return request


def extract(page):
    soup = BeautifulSoup(page.content, 'html.parser')
    return soup.find_all('article', class_="Box-row")


def transform(html_repos):
    for repo in html_repos:
        stars = repo.find('a', class_="Link--muted d-inline-block mr-3")
        print(stars)
        break


print(transform(extract(request_github_trending(url_github))))
I want to parse the number of stars, and this is the result I get:
<a class="Link--muted d-inline-block mr-3" data-view-component="true" href="/rocketseat-education/nlw6-discover/stargazers">
<svg aria-label="star" class="octicon octicon-star" data-view-component="true" height="16" role="img" version="1.1" viewbox="0 0 16 16" width="16">
<path d="M8 .25a.75.75 0 01.673.418l1.882 3.815 4.21.612a.75.75 0 01.416 1.279l-3.046 2.97.719 4.192a.75.75 0 01-1.088.791L8 12.347l-3.766 1.98a.75.75 0 01-1.088-.79l.72-4.194L.818 6.374a.75.75 0 01.416-1.28l4.21-.611L7.327.668A.75.75 0 018 .25zm0 2.445L6.615 5.5a.75.75 0 01-.564.41l-3.097.45 2.24 2.184a.75.75 0 01.216.664l-.528 3.084 2.769-1.456a.75.75 0 01.698 0l2.77 1.456-.53-3.084a.75.75 0 01.216-.664l2.24-2.183-3.096-.45a.75.75 0 01-.564-.41L8 2.694v.001z" fill-rule="evenodd"></path>
</svg>
128
</a>
None
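How can I get only the number? I also tried to parse the repository name and the developer names, but I messed that up: I can't get the developer names at all, and for the repository name I only get the part before the slash. I would appreciate any help!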
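To get the stars, you can use the .get_text() method, which returns only the text inside the tag. To get the repository name, you can use the .next_sibling attribute, which returns the text node that follows the element holding the owner name.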
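To see what these two calls return without hitting the network, here is a minimal sketch that runs them on a static fragment. SAMPLE_HTML is a hand-written, simplified stand-in modeled on the markup shown in the question; the `<h1>`/`<span>` nesting and the avatar `<img>` are assumptions about the page layout, and the real page carries more attributes and may change at any time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified fragment modeled on the markup in the question.
SAMPLE_HTML = """
<article class="Box-row">
  <h1>
    <a href="/rocketseat-education/nlw6-discover">
      <span class="text-normal">rocketseat-education /</span>
      nlw6-discover
    </a>
  </h1>
  <a class="Link--muted d-inline-block mr-3" href="/rocketseat-education/nlw6-discover/stargazers">
    <svg aria-label="star"></svg>
    128
  </a>
  <img alt="@jakeliny" src="avatar.png">
</article>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
article = soup.find("article", class_="Box-row")

# get_text(strip=True) drops the nested <svg> tag and surrounding whitespace,
# leaving only the visible text of the link.
stars = article.find("a", class_="Link--muted d-inline-block mr-3").get_text(strip=True)

# The owner is inside the element with class "text-normal"; the repository
# name is the bare text node right after it, which next_sibling returns.
owner_tag = article.find(class_="text-normal")
owner = owner_tag.get_text(strip=True).rstrip("/").strip()
repo = owner_tag.next_sibling.strip()

# In this fragment the developer handle is stored in the avatar's alt attribute.
usernames = [img["alt"] for img in article.find_all("img")]

print(stars, owner, repo, usernames)
```

Calling .text on the link alone would stop short of the repository name because that name is not inside the "text-normal" element; it is a separate sibling text node.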
This example shows how to get all of the information: the repository, the stars, and the developer names ("Built by").
import requests
from bs4 import BeautifulSoup

url_github = "https://github.com/trending"


def request_github_trending(url):
    request = requests.get(url)
    return request


def extract(page):
    soup = BeautifulSoup(page.content, "html.parser")
    # Each trending repository is rendered as one <article class="Box-row">.
    return soup.find_all("article", class_="Box-row")


def print_info(html):
    fmt_string = "{:<60} {:<30} {}"
    print(fmt_string.format("Repo", "Stars", "Built by"))
    print("-" * 150)
    for tag in html:
        # Owner text lives in the "text-normal" element; the repository
        # name is the text node that follows it.
        repository_info = tag.find(class_="text-normal")
        repository = repository_info.text.strip() + repository_info.next_sibling.strip()
        # get_text(strip=True) returns just the number, without the <svg> icon.
        stars = tag.find(class_="Link--muted d-inline-block mr-3").get_text(strip=True)
        # Developer handles are stored in the alt attribute of the avatar images.
        usernames = [user["alt"] for user in tag.find_all("img")]
        print(fmt_string.format(repository, stars, usernames))


print_info(extract(request_github_trending(url_github)))
Output:
Repo                                                         Stars                          Built by
------------------------------------------------------------------------------------------------------------------------------------------------------
rocketseat-education /nlw6-discover                          129                            ['@jakeliny']
six-ddc /plow                                                1,531                          ['@six-ddc', '@chenrui333', '@dependabot', '@musinit']
flutter /flutter                                             123,023                        ['@engine-flutter-autoroll', '@abarth', '@Hixie', '@jonahwilliams', '@HansMuller']
n8n-io /n8n                                                  15,781                         ['@janober', '@RicardoE105', '@ivov', '@Rupenieks', '@krynble']
PaddlePaddle /PaddleClas                                     1,521                          ['@littletomatodonkey', '@weisy11', '@dyning', '@Intsigstephon', '@cuicheng01']
...
...
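The star counts printed above are still strings with thousands separators, so they need one more step before they can be sorted or compared as numbers. The sketch below shows one way to do that, plus an alternative way to get the owner and repository name from the stargazers link (its href has the form /owner/repo/stargazers in the question's output). Both helper names are hypothetical, and the code assumes the count is a plain comma-separated integer as in the output above; any other format raises ValueError.

```python
from typing import Tuple


def parse_star_count(text: str) -> int:
    """Convert a display string such as '1,531' to an integer.

    Assumes a plain comma-separated integer; int() raises ValueError otherwise.
    """
    return int(text.replace(",", "").strip())


def split_stargazers_href(href: str) -> Tuple[str, str]:
    """Split an href like '/owner/repo/stargazers' into (owner, repo)."""
    parts = href.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"unexpected href: {href!r}")
    return parts[0], parts[1]


if __name__ == "__main__":
    print(parse_star_count("123,023"))
    print(split_stargazers_href("/rocketseat-education/nlw6-discover/stargazers"))
```

Reading the names from the href avoids depending on how the owner and repository text nodes are laid out inside the heading, at the cost of depending on the link format instead.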