Extract individual video links from a single YouTube playlist link using Python


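theCursedPirate:

I need a Python script that takes a link to a single YouTube playlist and returns a list containing the links to the individual videos in the playlist.

I realize the same question was asked a few years ago, but that was for Python 2.x, and the code in the answers doesn't work reliably. It behaves oddly: it sometimes works, but occasionally outputs nothing (maybe some of the packages have been updated, I don't know). I've included one of them below.

If you don't believe me, run this code several times: once in a while you'll get an empty list, but most of the time it breaks the playlist down into its links.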

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://www.youtube.com/playlist?list=PL3D7BFF1DDBDAAFE5')
page = r.text
soup=bs(page,'html.parser')
res=soup.find_all('a',{'class':'pl-video-title-link'})
for l in res:
    print(l.get("href"))

For some playlists, the code doesn't work at all.

Also, if BeautifulSoup can't do the job, any other popular Python library would do.

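Bertrand Martel:

It seems YouTube sometimes serves different versions of the page. Sometimes the HTML is organized the way you expect, with links carrying the pl-video-title-link class: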

<td class="pl-video-title">
   <a class="pl-video-title-link yt-uix-tile-link yt-uix-sessionlink  spf-link " dir="ltr" href="/watch?v=GtWXOzsD5Fw&amp;list=PL3D7BFF1DDBDAAFE5&amp;index=101&amp;t=0s" data-sessionlink="ei=TJbjXtC8NYri0wWCxarQDQ&amp;feature=plpp_video&amp;ved=CGoQxjQYYyITCNCSmqHD_OkCFQrxtAodgqIK2ij6LA">
   Android Application Development Tutorial - 105 - Spinners and ArrayAdapter
   </a>
   <div class="pl-video-owner">
      de <a href="/user/thenewboston" class=" yt-uix-sessionlink      spf-link " data-sessionlink="ei=TJbjXtC8NYri0wWCxarQDQ&amp;feature=playlist&amp;ved=CGoQxjQYYyITCNCSmqHD_OkCFQrxtAodgqIK2ij6LA" >thenewboston</a>
   </div>
   <div class="pl-video-bottom-standalone-badge">
   </div>
</td>
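For this first page version, a minimal sketch of pulling out the links and turning the relative href values into full URLs could look like the following. It uses the standard library's html.parser instead of BeautifulSoup only to stay dependency-free; the class name PlaylistLinkParser and the trimmed sample markup are illustrative assumptions, not part of the original answer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PlaylistLinkParser(HTMLParser):
    """Collect hrefs of <a> tags that carry the pl-video-title-link class."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute holds several space-separated class names
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if "pl-video-title-link" in classes and href:
            # hrefs are relative ("/watch?v=..."), so resolve them against the page URL
            self.links.append(urljoin(self.base_url, href))


# Trimmed copy of the markup shown above, used here as offline sample input
sample_html = '''
<td class="pl-video-title">
   <a class="pl-video-title-link yt-uix-tile-link  spf-link " dir="ltr" href="/watch?v=GtWXOzsD5Fw&amp;list=PL3D7BFF1DDBDAAFE5&amp;index=101&amp;t=0s">
   Android Application Development Tutorial - 105 - Spinners and ArrayAdapter
   </a>
   <div class="pl-video-owner">
      de <a href="/user/thenewboston" class=" yt-uix-sessionlink spf-link ">thenewboston</a>
   </div>
</td>
'''

parser = PlaylistLinkParser("https://www.youtube.com/playlist?list=PL3D7BFF1DDBDAAFE5")
parser.feed(sample_html)
video_links = parser.links
print(video_links)
```

The channel link is skipped because it lacks the pl-video-title-link class. This only helps when YouTube actually serves this version of the page, which is exactly what cannot be relied on.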

Other times the data is embedded in a JS variable and loaded dynamically:

window["ytInitialData"] = { .... very big json here .... };

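For the second version, you would need a regular expression to parse the JavaScript, unless you use a tool such as Selenium to grab the content after the page has loaded.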
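As a rough sketch of the regular-expression route, the snippet below locates the assignment with a regex and then lets the json module parse the object that follows. The layout of the real ytInitialData object is not shown in this thread, so the sample page and the idea of simply collecting every "videoId" key are assumptions for illustration; the helper names extract_initial_data and collect_video_ids are made up.

```python
import json
import re


def extract_initial_data(html):
    """Return the object assigned to window["ytInitialData"], or None if absent."""
    match = re.search(r'window\["ytInitialData"\]\s*=\s*', html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores what follows it (the trailing ";" and the rest of the page)
    data, _ = json.JSONDecoder().raw_decode(html, match.end())
    return data


def collect_video_ids(node, found=None):
    """Walk nested dicts/lists and collect each distinct string stored under "videoId"."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "videoId" and isinstance(value, str):
                if value not in found:
                    found.append(value)
            else:
                collect_video_ids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_video_ids(item, found)
    return found


# Made-up sample page, NOT real YouTube markup
sample_html = (
    '<script>window["ytInitialData"] = {"contents": [{"videoId": "abc"}, '
    '{"nested": {"videoId": "def"}}, {"videoId": "abc"}]};</script>'
)

data = extract_initial_data(sample_html)
video_ids = collect_video_ids(data)
print(video_ids)
```

Because the embedded structure is undocumented and can change without notice, this kind of scraping is fragile.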

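In my opinion, the best approach is to use the official API, which makes it straightforward to get the playlist items:

Go to the Google Developers Console, search for YouTube Data API, and enable YouTube Data API v3

Go to Credentials / Create credentials / API key

Install the Google API client for Python: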

pip3 install --upgrade google-api-python-client

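Use the API key in the script below. The script fetches the playlist items for the playlist with ID PL3D7BFF1DDBDAAFE5, uses pagination to get all of them, and then rebuilds the links from each videoId and the playlist ID: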

import googleapiclient.discovery
from urllib.parse import parse_qs, urlparse

# Extract the playlist ID from the URL
url = 'https://www.youtube.com/playlist?list=PL3D7BFF1DDBDAAFE5'
query = parse_qs(urlparse(url).query, keep_blank_values=True)
playlist_id = query["list"][0]

print(f'get all playlist items links from {playlist_id}')
youtube = googleapiclient.discovery.build("youtube", "v3", developerKey = "YOUR_API_KEY")

request = youtube.playlistItems().list(
    part = "snippet",
    playlistId = playlist_id,
    maxResults = 50
)
# Every page, including the first, is fetched inside the loop below;
# list_next returns None once there are no more pages

playlist_items = []
while request is not None:
    response = request.execute()
    playlist_items += response["items"]
    request = youtube.playlistItems().list_next(request, response)

print(f"total: {len(playlist_items)}")
print([ 
    f'https://www.youtube.com/watch?v={t["snippet"]["resourceId"]["videoId"]}&list={playlist_id}&t=0s'
    for t in playlist_items
])

Output:

get all playlist items links from PL3D7BFF1DDBDAAFE5
total: 195
[
    'https://www.youtube.com/watch?v=SUOWNXGRc6g&list=PL3D7BFF1DDBDAAFE5&t=0s', 
    'https://www.youtube.com/watch?v=857zrsYZKGo&list=PL3D7BFF1DDBDAAFE5&t=0s', 
    'https://www.youtube.com/watch?v=Da1jlmwuW_w&list=PL3D7BFF1DDBDAAFE5&t=0s',
    ...........
    'https://www.youtube.com/watch?v=1j4prh3NAZE&list=PL3D7BFF1DDBDAAFE5&t=0s', 
    'https://www.youtube.com/watch?v=s9ryE6GwhmA&list=PL3D7BFF1DDBDAAFE5&t=0s'
]
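To make the pagination step more concrete, here is a small offline sketch of the token-following loop that list_next takes care of in the script above. The API's responses carry a nextPageToken field while more pages remain; fetch_page and the fake pages below are hypothetical stand-ins so the example runs without an API key.

```python
def fetch_all_items(fetch_page):
    """Collect items from every page.

    fetch_page is a hypothetical callable: it takes a page token (None for
    the first page) and returns a dict shaped like a playlistItems.list
    response, with "items" and an optional "nextPageToken".
    """
    items = []
    token = None
    while True:
        response = fetch_page(token)
        items.extend(response.get("items", []))
        # No nextPageToken means this was the last page
        token = response.get("nextPageToken")
        if token is None:
            break
    return items


# Fake three-page result set standing in for real API responses
fake_pages = {
    None: {"items": ["a", "b"], "nextPageToken": "p2"},
    "p2": {"items": ["c", "d"], "nextPageToken": "p3"},
    "p3": {"items": ["e"]},
}

all_items = fetch_all_items(lambda token: fake_pages[token])
print(all_items)
```

With the real client, fetch_page could wrap a playlistItems().list(...) call that passes the token through the API's pageToken parameter, though letting list_next handle this, as the script does, is simpler.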
