Web scraping: scraping multiple pages of a website with Python

from bs4 import BeautifulSoup
import requests

url = 'https://uk.trustpilot.com/review/thread.com'
for pg in range(1, 10):
  pg = url + '?page=' + str(pg)
  soup = BeautifulSoup(page.content, 'lxml')
  for paragraph in soup.find_all('p'):
     print(paragraph.text)

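I want to scrape the rating, the review text and the review date from https://uk.trustpilot.com/review/thread.com. However, I don't know how to scrape multiple pages and build a pandas DataFrame from the scraped results.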

Bitto Bennichan

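Hi, you need to send a request to each page and then process the response. Also, some items are not directly available as text inside a tag, so you can either get them from the JavaScript data embedded in the page (I used json.loads to read the date this way) or from a class name (that is how I got the rating).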

from bs4 import BeautifulSoup
import json
import pandas as pd
import requests

final_list = []  # rows that will become the DataFrame
url = 'https://uk.trustpilot.com/review/thread.com'
for page_number in range(1, 3):  # pages 1 and 2
    page_url = url + '?page=' + str(page_number)
    r = requests.get(page_url)
    soup = BeautifulSoup(r.text, 'lxml')
    for review in soup.find_all('section', class_='review__content'):
        title = review.find('h2', class_='review-content__title').text.strip()
        content = review.find('p', class_='review-content__text').text.strip()
        # the dates are embedded as JSON text inside this div
        datedata = json.loads(review.find('div', class_='review-content-header__dates').text)
        date = datedata['publishedDate'].split('T')[0]
        # the rating is encoded in the second class name, after the last '-'
        rating_class = review.find('div', class_='star-rating')['class']
        rating = rating_class[1].split('-')[-1]
        final_list.append([title, content, date, rating])
df = pd.DataFrame(final_list, columns=['Title', 'Content', 'Date', 'Rating'])
print(df)

Output

                                                Title                                            Content        Date Rating
0                      I ordered a jacket 2 weeks ago  I ordered a jacket 2 weeks ago.  Still hasn't ...  2019-01-13      1
1              I've used this service for many years…  I've used this service for many years and get ...  2018-12-31      4
2                                       Great website  Great website, tailored recommendations, and e...  2018-12-19      5
3              I was excited by the prospect offered…  I was excited by the prospect offered by threa...  2018-12-18      1
4       Thread set the benchmark for customer service  Firstly, their customer service is second to n...  2018-12-12      5
5                                    It's a good idea  It's a good idea.  I am in between sizes and d...  2018-12-02      3
6                             Great experience so far  Great experience so far. Big choice of clothes...  2018-10-31      5
7                    Absolutely love using Thread.com  Absolutely love using Thread.com.  As a man wh...  2018-10-31      5
8                 I'd like to give Thread a one star…  I'd like to give Thread a one star review, but...  2018-10-30      2
9            Really enjoying the shopping experience…  Really enjoying the shopping experience on thi...  2018-10-22      5
10                         The only way I buy clothes  I absolutely love Thread. I've been surviving ...  2018-10-15      5
11                                  Excellent Service  Excellent ServiceQuick delivery, nice items th...  2018-07-27      5
12             Convenient way to order clothes online  Convenient way to order clothes online, and gr...  2018-07-05      5
13                Superb - would thoroughly recommend  Recommendations have been brilliant - no more ...  2018-06-24      5
14                    First time ordering from Thread  First time ordering from Thread - Very slow de...  2018-06-22      1
15          Some of these criticisms are just madness  I absolutely love thread.com, and I can't reco...  2018-05-28      5
16                                       Top service!  Great idea and fantastic service. I just recei...  2018-05-17      5
17                                      Great service  Great service. Great clothes which come well p...  2018-05-05      5
18                                          Thumbs up  Easy, straightforward and very good costumer s...  2018-04-17      5
19                 Good idea, ruined by slow delivery  I really love the concept and the ordering pro...  2018-04-08      3
20                                      I love Thread  I have been using thread for over a year. It i...  2018-03-12      5
21      Clever simple idea but.. low quality clothing  Clever simple idea but.. low quality clothingL...  2018-03-12      2
22                      Initially I was impressed....  Initially I was impressed with the Thread shop...  2018-02-07      2
23                                 Happy new customer  Joined the site a few weeks ago, took a short ...  2018-02-06      5
24                          Style tips for mature men  I'm a man of mature age, let's say a "baby boo...  2018-01-31      5
25            Every shop, every item and in one place  Simple, intuitive and makes online shopping a ...  2018-01-28      5
26                     Fantastic experience all round  Fantastic experience all round.  Quick to regi...  2018-01-28      5
27          Superb "all in one" shopping experience …  Superb "all in one" shopping experience that i...  2018-01-25      5
28  Great for time poor people who aren’t fond of ...  Rally love this company. Super useful for thos...  2018-01-22      5
29                            Really is worth trying!  Quite cautious at first, however, love the way...  2018-01-10      4
30           14 days for returns is very poor given …  14 days for returns is very poor given most co...  2017-12-20      3
31                  A great intro to online clothes …  A great intro to online clothes shopping. Usef...  2017-12-15      5
32                           I was skeptical at first  I was skeptical at first, but the service is s...  2017-11-16      5
33            seems good to me as i hate to shop in …  seems good to me as i hate to shop in stores, ...  2017-10-23      5
34                          Great concept and service  Great concept and service. This service has be...  2017-10-17      5
35                                      Slow dispatch  My Order Dispatch was extremely slow compared ...  2017-10-07      1
36             This company sends me clothes in boxes  This company sends me clothes in boxes! I find...  2017-08-28      5
37          I've been using Thread for the past six …  I've been using Thread for the past six months...  2017-08-03      5
38                                             Thread  Thread, this site right here is literally the ...  2017-06-22      5
39                                       good concept  The website is a good concept in helping buyer...  2017-06-14      3
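
The two extraction tricks used in the answer's code (reading the date from JSON text and reading the rating from a class name) can be isolated into small helpers that are easy to check without hitting the site. This is only a sketch: the function names `parse_published_date` and `parse_rating` and the sample values are made up for illustration, and it assumes the JSON contains a `publishedDate` key and that the rating is the numeric suffix of one of the class names, as the code above implies. Unlike the answer's code, `parse_rating` returns an `int` rather than a string.

```python
import json


def parse_published_date(dates_json):
    """Return the YYYY-MM-DD part of the 'publishedDate' field."""
    data = json.loads(dates_json)
    return data["publishedDate"].split("T")[0]


def parse_rating(class_list):
    """Return the rating encoded in a class name such as 'star-rating-4'."""
    for name in class_list:
        suffix = name.rsplit("-", 1)[-1]
        if suffix.isdigit():
            return int(suffix)
    raise ValueError(f"no rating found in {class_list!r}")


# Sample values are invented for illustration, not copied from the site.
sample_dates = '{"publishedDate": "2019-01-13T10:15:00Z", "updatedDate": null}'
sample_classes = ["star-rating", "star-rating-1", "star-rating--medium"]

print(parse_published_date(sample_dates))  # 2019-01-13
print(parse_rating(sample_classes))        # 1
```

In the scraping loop, the first helper would take the `.text` of the dates `div` and the second would take the `['class']` list of the `star-rating` `div`.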

Note: although I was able to "hack" a way of getting results from this site, it is better to use Selenium to scrape dynamic pages.

Edit: code that works out the number of pages automatically

from bs4 import BeautifulSoup
import json
import math
import pandas as pd
import requests

final_list = []  # rows that will become the DataFrame
url = 'https://uk.trustpilot.com/review/thread.com'
# make one request up front to read the total number of reviews
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
review_count_h2 = soup.find('h2', class_="header--inline").text
review_count = int(review_count_h2.strip().split(' ')[0].strip())
# there are 20 reviews per page, so round up to get the number of pages
pages = int(math.ceil(review_count / 20))
# loop over pages 1 .. pages (inclusive)
for page_number in range(1, pages + 1):
    page_url = url + '?page=' + str(page_number)
    r = requests.get(page_url)
    soup = BeautifulSoup(r.text, 'lxml')
    for review in soup.find_all('section', class_='review__content'):
        try:
            title = review.find('h2', class_='review-content__title').text.strip()
            content = review.find('p', class_='review-content__text').text.strip()
            # the dates are embedded as JSON text inside this div
            datedata = json.loads(review.find('div', class_='review-content-header__dates').text)
            date = datedata['publishedDate'].split('T')[0]
            # the rating is encoded in the second class name, after the last '-'
            rating_class = review.find('div', class_='star-rating')['class']
            rating = rating_class[1].split('-')[-1]
            final_list.append([title, content, date, rating])
        except AttributeError:
            # skip reviews where an expected element is missing (find() returned None)
            pass
df = pd.DataFrame(final_list, columns=['Title', 'Content', 'Date', 'Rating'])
print(df)
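
The page arithmetic in the edit can be checked on its own, separately from the network calls. The sketch below assumes 20 reviews per page, as stated in the code comment above; `page_count` and `page_urls` are hypothetical helper names, not part of any library.

```python
import math


def page_count(review_count, per_page=20):
    """Number of pages needed to list review_count reviews."""
    return math.ceil(review_count / per_page)


def page_urls(base_url, review_count, per_page=20):
    """Build the '?page=N' URL for every page, starting at page 1."""
    pages = page_count(review_count, per_page)
    return [f"{base_url}?page={n}" for n in range(1, pages + 1)]


urls = page_urls("https://uk.trustpilot.com/review/thread.com", 40)
print(len(urls))  # 2
print(urls[-1])   # https://uk.trustpilot.com/review/thread.com?page=2
```

Rounding up matters because a partially filled last page still has to be requested: 41 reviews need 3 pages, not 2.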
