How do I get information from nested tags when using Python bs4?

扎姆希德贝克·阿卜杜勒哈米多夫

I want to parse the GitHub trending page. Here is my code:

import requests
from bs4 import BeautifulSoup

url_github = "https://github.com/trending"


def request_github_trending(url):
    request = requests.get(url)
    return request


def extract(page):
    soup = BeautifulSoup(page.content, 'html.parser')
    return soup.find_all('article', class_="Box-row")


def transform(html_repos):
    for repo in html_repos:
        stars = repo.find('a', class_="Link--muted d-inline-block mr-3")
        print(stars)
        break


print(transform(extract(request_github_trending(url_github))))

I want to parse the number of stars, and this is the result I got:

<a class="Link--muted d-inline-block mr-3" data-view-component="true" href="/rocketseat-education/nlw6-discover/stargazers">
<svg aria-label="star" class="octicon octicon-star" data-view-component="true" height="16" role="img" version="1.1" viewbox="0 0 16 16" width="16">
<path d="M8 .25a.75.75 0 01.673.418l1.882 3.815 4.21.612a.75.75 0 01.416 1.279l-3.046 2.97.719 4.192a.75.75 0 01-1.088.791L8 12.347l-3.766 1.98a.75.75 0 01-1.088-.79l.72-4.194L.818 6.374a.75.75 0 01.416-1.28l4.21-.611L7.327.668A.75.75 0 018 .25zm0 2.445L6.615 5.5a.75.75 0 01-.564.41l-3.097.45 2.24 2.184a.75.75 0 01.216.664l-.528 3.084 2.769-1.456a.75.75 0 01.698 0l2.77 1.456-.53-3.084a.75.75 0 01.216-.664l2.24-2.183-3.096-.45a.75.75 0 01-.564-.41L8 2.694v.001z" fill-rule="evenodd"></path>
</svg>
        128
</a>
None

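How can I get only the number? I also tried to parse the repository name and the developer names, but I messed that up: I can't get the developer names at all, and for the repository name I only get the part before the slash. I would appreciate any help!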

孟德尔

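To get the "stars", you can use the .get_text() method. To get the "repository", you can use the .next_sibling attribute, which returns the node that immediately follows a tag.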
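
As a minimal offline sketch of these two calls, assuming a simplified, hypothetical HTML snippet modeled on the output shown in the question (the markup of the live page may differ), something like this could be tried:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the <a> tag shown in the
# question; the real page carries more attributes and may differ.
html = """
<h1>
  <a href="/rocketseat-education/nlw6-discover">
    <span class="text-normal">rocketseat-education /</span>
    nlw6-discover
  </a>
</h1>
<a class="Link--muted d-inline-block mr-3" href="/rocketseat-education/nlw6-discover/stargazers">
  <svg aria-label="star"></svg>
  128
</a>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) strips the whitespace around the number;
# the <svg> element contributes no text of its own.
stars = soup.find("a", class_="Link--muted").get_text(strip=True)

# The owner sits inside the <span>; the repository name is the text node
# right after it, which is what next_sibling returns.
owner_span = soup.find("span", class_="text-normal")
owner = owner_span.get_text(strip=True)
repo_name = owner_span.next_sibling.strip()

print(stars)
print(owner, repo_name)
```

With this snippet the script prints 128, followed by rocketseat-education / nlw6-discover.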

The complete example below shows how to get all of the information: the repository, the stars, and the developer names ("Built by").

import requests
from bs4 import BeautifulSoup


url_github = "https://github.com/trending"


def request_github_trending(url):
    request = requests.get(url)
    return request


def extract(page):
    soup = BeautifulSoup(page.content, "html.parser")
    return soup.find_all("article", class_="Box-row")


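def print_info(html):
    fmt_string = "{:<60} {:<30} {}"
    print(fmt_string.format("Repo", "Stars", "Built by"))
    print("-" * 150)
    for tag in html:
        # The owner ("owner /") is inside the span; the repository name is the
        # text node that follows it, hence next_sibling.
        repository_info = tag.find(class_="text-normal")
        repository = repository_info.text.strip() + repository_info.next_sibling.strip()

        # get_text(strip=True) returns only the number, without surrounding whitespace.
        stars = tag.find(class_="Link--muted d-inline-block mr-3").get_text(strip=True)

        # Each contributor avatar is an <img> whose alt attribute holds the username.
        usernames = [user["alt"] for user in tag.find_all("img")]
        print(fmt_string.format(repository, stars, usernames))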


print_info(extract(request_github_trending(url_github)))

Output:

Repo                                                         Stars                          Built by
------------------------------------------------------------------------------------------------------------------------------------------------------
rocketseat-education /nlw6-discover                          129                            ['@jakeliny']
six-ddc /plow                                                1,531                          ['@six-ddc', '@chenrui333', '@dependabot', '@musinit']
flutter /flutter                                             123,023                        ['@engine-flutter-autoroll', '@abarth', '@Hixie', '@jonahwilliams', '@HansMuller']
n8n-io /n8n                                                  15,781                         ['@janober', '@RicardoE105', '@ivov', '@Rupenieks', '@krynble']
PaddlePaddle /PaddleClas                                     1,521                          ['@littletomatodonkey', '@weisy11', '@dyning', '@Intsigstephon', '@cuicheng01']
...
...
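
One possible variation, which is not part of the original answer: passing a multi-class string such as "Link--muted d-inline-block mr-3" to class_ matches the exact value of the class attribute, so it stops matching if the classes are reordered or one is added. The sketch below uses CSS selectors instead, returns the data rather than printing it, and converts the star count to an integer. The SAMPLE markup, the parse_repos name, and the "h1 a" and href-suffix selectors are assumptions made for illustration, not something confirmed against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup for one trending entry; the live page
# has more attributes and may change at any time.
SAMPLE = """
<article class="Box-row">
  <h1><a href="/six-ddc/plow"><span class="text-normal">six-ddc /</span> plow</a></h1>
  <a class="Link--muted d-inline-block mr-3" href="/six-ddc/plow/stargazers">
    <svg aria-label="star"></svg> 1,531
  </a>
  <span>Built by
    <a href="/six-ddc"><img alt="@six-ddc" src="a.png"></a>
    <a href="/chenrui333"><img alt="@chenrui333" src="b.png"></a>
  </span>
</article>
"""


def parse_repos(html):
    """Return one dict per <article class="Box-row"> instead of printing."""
    soup = BeautifulSoup(html, "html.parser")
    repos = []
    for article in soup.select("article.Box-row"):
        # Assumed selector: pick the star link by its href suffix rather
        # than by the exact class string.
        star_link = article.select_one('a[href$="/stargazers"]')
        # Assumed selector: the heading link's href already holds "owner/name".
        repo_link = article.select_one("h1 a")
        repos.append({
            "repository": repo_link["href"].strip("/"),
            # "1,531" -> 1531: drop the thousands separator before int().
            "stars": int(star_link.get_text(strip=True).replace(",", "")),
            "built_by": [img["alt"] for img in article.select("img[alt]")],
        })
    return repos


print(parse_repos(SAMPLE))
```

To run it against the real page, the response body from requests.get(url_github).content could be passed to parse_repos in place of SAMPLE, provided the selectors still match the current markup.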

Edited on 2021-08-31
