My code loops through the URLs but not through the pages within each URL in Python

Anonymous13

I am trying to extract names and comments from these URLs. My code loops through the URLs, but not through the pages within them.

len(name) gives 37

import requests
from bs4 import BeautifulSoup

urls = ['https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/','https://www.f150forum.com/f118/adaptive-cruise-control-sensor-blockage-446041/']

name=[]

for url in urls:
    with requests.Session() as req:
        for item in range(1,3):
            response = req.get(f"{url}index{item}/")
            soup = BeautifulSoup(response.content, "html.parser")
            posts = soup.find(id = "posts")
            threadtitle = soup.find('h1',attrs={"class":"threadtitle"})
            for item in soup.findAll('a',attrs={"class":"bigusername"}):
                result = [item.get_text(strip=True, separator=" ")]
                name.append(result)


When I run this code:

comments1 = []

for url in urls:
    with requests.Session() as req:
        for item in range(1,3):
            response = req.get(f"{url}index{item}/")
            soup = BeautifulSoup(response.content, "html.parser")
            posts = soup.find(id = "posts")
            threadtitle = soup.find('h1',attrs={"class":"threadtitle"})
        for item in soup.find_all('div', class_="ism-true"):
            try:
                item.find('div', class_="panel alt2").extract()                  
            except AttributeError:
                pass 
            try:
                item.find('label').extract()
            except AttributeError:
                pass
            result = [item.get_text(strip=True, separator=" ")]
            comments1.append(item.text.strip())

len(comments1) gives only 17: it extracts only page 2, the last page in the range. How can I make sure my code loops through all the pages?
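In the second snippet, the `for item in soup.find_all('div', class_="ism-true")` loop is indented at the same level as `for item in range(1,3)`, not inside it. It therefore runs once per URL, after the page loop has finished, against whatever `soup` was parsed last (index2/). Moving it one level deeper makes it run on every page. The sketch below shows that structure offline, under some assumptions: `extract_comments` and `collect_comments` are hypothetical helper names, the pages are passed in as HTML strings instead of being fetched, and each comment is stored as the `get_text(strip=True, separator=" ")` value the question computes in `result` rather than `item.text.strip()`.

```python
from bs4 import BeautifulSoup


def extract_comments(soup):
    """Collect post text from ONE parsed page, mirroring the question's cleanup."""
    comments = []
    for item in soup.find_all('div', class_="ism-true"):
        # Drop the quoted block and the label if present, as the question's code does.
        quote = item.find('div', class_="panel alt2")
        if quote is not None:
            quote.extract()
        label = item.find('label')
        if label is not None:
            label.extract()
        comments.append(item.get_text(strip=True, separator=" "))
    return comments


def collect_comments(pages_html):
    """Hypothetical helper: parse each page and extract comments INSIDE the page loop."""
    comments1 = []
    for html in pages_html:
        soup = BeautifulSoup(html, "html.parser")
        # This call sits inside the loop, so every page contributes, not just the last one.
        comments1.extend(extract_comments(soup))
    return comments1
```

In the original requests-based code, the equivalent change is simply indenting the whole `for item in soup.find_all(...)` block under `for item in range(1,3):`.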

KunduK

If you want to go through every page, you can keep locating the Next link and stop once it is disabled.

Code

import requests
from bs4 import BeautifulSoup

urls = ['https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/',
        'https://www.f150forum.com/f118/adaptive-cruise-control-sensor-blockage-446041/']

name = []

for url in urls:
    with requests.Session() as req:
        index = 1
        while True:
            page_url = url + "index{}/".format(index)
            # Print each page URL so you can check which pages were visited
            print(page_url)
            response = req.get(page_url)
            index = index + 1
            soup = BeautifulSoup(response.content, "html.parser")

            posts = soup.find(id="posts")
            threadtitle = soup.find('h1', attrs={"class": "threadtitle"})
            for item in soup.find_all('a', attrs={"class": "bigusername"}):
                result = [item.get_text(strip=True, separator=" ")]
                name.append(result)
            # Stop when the Next link is missing or its last class contains "disabled"
            next_link = soup.select_one('a#mb_pagenext')
            if next_link is None or 'disabled' in next_link.attrs['class'][-1]:
                break

print(len(name))

In the console you can see that it prints every page URL, followed by the total name count.

https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index1/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index2/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index3/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index4/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index5/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index6/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index7/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index8/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index9/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index10/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index11/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index12/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index13/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index14/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index15/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index16/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index17/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index18/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index19/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index20/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index21/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index22/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index23/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index24/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index25/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index26/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index27/
https://www.f150forum.com/f118/2018-adding-adaptive-cruise-control-415450/index28/
https://www.f150forum.com/f118/adaptive-cruise-control-sensor-blockage-446041/index1/
https://www.f150forum.com/f118/adaptive-cruise-control-sensor-blockage-446041/index2/
280
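The answer's loop can also be written as a generator that yields one parsed page at a time and stops when the Next link is disabled, which keeps the paging logic separate from whatever you extract. This is a sketch under assumptions: `iter_thread_pages`, `next_is_disabled`, and the `max_pages` safety cap are hypothetical names; `fetch` is any callable that returns a page's HTML; and, like the answer, it assumes the forum marks the last page with a class containing `disabled` on `a#mb_pagenext` (it also treats a missing link as the last page).

```python
from bs4 import BeautifulSoup


def next_is_disabled(soup):
    """True when a#mb_pagenext is absent or any of its classes contains 'disabled'."""
    link = soup.select_one('a#mb_pagenext')
    if link is None:
        return True
    return any('disabled' in cls for cls in link.get('class', []))


def iter_thread_pages(fetch, thread_url, max_pages=500):
    """Yield (page_url, soup) for index1/, index2/, ... until Next is disabled.

    fetch: hypothetical callable mapping a URL to HTML text.
    max_pages: hypothetical cap so a changed page layout cannot loop forever.
    """
    for index in range(1, max_pages + 1):
        page_url = f"{thread_url}index{index}/"
        soup = BeautifulSoup(fetch(page_url), "html.parser")
        yield page_url, soup
        if next_is_disabled(soup):
            break
```

With the answer's session, you would write `for page_url, soup in iter_thread_pages(lambda u: req.get(u).text, url):` and run the `bigusername` or `ism-true` extraction inside that loop, so every page is processed.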
