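I am trying to parse the comments from this website link. I need to get 1000 comments, but by default the page shows only 10. More comments appear after clicking "View more", and I cannot find a way to fetch the content that the page displays after that click.
So far I have the following code: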
import sys
import urllib.request

from bs4 import BeautifulSoup

# Map characters outside the Basic Multilingual Plane to U+FFFD so they print safely
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)

response = urllib.request.urlopen("https://www.mygov.in/group-issue/share-your-ideas-pm-narendra-modis-mann-ki-baat-26th-march-2017/")
srcode = response.read()
soup = BeautifulSoup(srcode, "html.parser")

# Only the comments present in the initial HTML (10 by default) are found here
all_comments_div = soup.find_all('div', class_="comment_body")
all_comments = []
for div in all_comments_div:
    all_comments.append(div.find('p').text.translate(non_bmp_map))
print(all_comments)
print(len(all_comments))
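You can use a while loop to fetch the next page, that is, keep going while there is a next page and fewer than 1000 comments have been collected: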
import sys
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
all_comments = []
max_comments = 1000
base_url = 'https://www.mygov.in/'
# urljoin avoids the doubled slash that plain string concatenation would produce
next_page = urljoin(base_url, '/group-issue/share-your-ideas-pm-narendra-modis-mann-ki-baat-26th-march-2017/')

while next_page and len(all_comments) < max_comments:
    response = urllib.request.urlopen(next_page)
    srcode = response.read()
    soup = BeautifulSoup(srcode, "html.parser")

    all_comments_div = soup.find_all('div', class_="comment_body")
    for div in all_comments_div:
        all_comments.append(div.find('p').text.translate(non_bmp_map))

    # Follow the pager's "next" link; the loop ends when there is none
    next_page = soup.find('li', class_='pager-next first last')
    if next_page:
        next_page = urljoin(base_url, next_page.find('a').get('href'))
    print('comments: {}'.format(len(all_comments)))

print(all_comments)
print(len(all_comments))
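
The pagination logic in the loop above can also be separated from the network and HTML parsing, which makes it possible to check it offline. The following is only a minimal sketch under assumptions: `collect_paginated`, `fetch_page`, `fake_site`, and the `example.com` URLs are hypothetical names introduced for illustration and are not part of the original answer or of the mygov.in site. `fetch_page` is assumed to return the comments of one page together with the href of the next page, or `None` on the last page.

```python
from urllib.parse import urljoin


def collect_paginated(start_url, fetch_page, max_items=1000):
    """Follow "next page" links until none remain or max_items is reached.

    fetch_page(url) is a hypothetical callable returning a tuple
    (items, next_href); next_href is None on the last page.
    """
    items = []
    seen = set()  # guards against a pager that links back to a visited page
    url = start_url
    while url and url not in seen and len(items) < max_items:
        seen.add(url)
        page_items, next_href = fetch_page(url)
        items.extend(page_items)
        # Resolve relative hrefs against the current page URL
        url = urljoin(url, next_href) if next_href else None
    # The last page fetched may overshoot the limit, so trim the result
    return items[:max_items]


# Offline demonstration: three fake pages with two comments each
fake_site = {
    "https://example.com/comments": (["c1", "c2"], "/comments?page=1"),
    "https://example.com/comments?page=1": (["c3", "c4"], "/comments?page=2"),
    "https://example.com/comments?page=2": (["c5", "c6"], None),
}

result = collect_paginated("https://example.com/comments", fake_site.get, max_items=5)
print(result)
```

In real use, `fetch_page` would wrap the `urlopen` and `BeautifulSoup` calls from the answer and return the extracted comment texts plus the `href` of the pager link. Keeping a set of visited URLs is an extra safeguard that the original loop does not have.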