Problem with bs4 and Python

Tayyab Nasir
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
url = 'https://edition.cnn.com/'
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
# Collect every <h3 class="cd__headline"> element, then print the text and href of each link inside it.
al = soup.find_all("h3", attrs={'class': 'cd__headline'})
for divv in al:
    for links in divv.find_all('a'):
        print(links.text)
        print(links.get('href'))

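I am trying to extract the headlines from CNN. I gave BeautifulSoup the correct HTML element and class, but the output is still empty, and I get no error or traceback.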

Dan-Dev

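The web page is generated dynamically from JSON embedded in an HTML script element, so the headline elements are not present in the HTML that requests downloads. You can extract the JSON and parse it to get the data you need or, as mentioned in the comments above, use Selenium to render the JavaScript on the page. To extract the JSON: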

import requests
import json
from bs4 import BeautifulSoup

url = 'https://edition.cnn.com/'
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# Find the script element containing the JSON the web page is dynamically generated from.
anchor = "var CNN = CNN || {};CNN.isWebview = false;CNN.contentModel = "
s = soup.find(lambda tag: tag.name == "script" and anchor in tag.text)
# Extract the JSON: slice from just before "articleList" to just past the first "}]".
j = s.text[s.text.find("articleList")-2:s.text.find("}]")+4]
# Load the JSON.
d = json.loads(j)
# Read the headline of each article from the JSON.
for article in d['articleList']:
    print(article['headline'])

Output:

Here's how the show's cast reacted to the rant
Wanda Sykes quit show before it was cancelled
ABC took a moral stand on Roseanne. Spoiler alert: Trump won't.
<strong>Your questions on the 'Spider-Man' photo, answered</strong>
Trump, without proof, says Mueller team will meddle in 2018 elections
Trump wins by demonizing Mueller
2 police officers, passerby killed in Belgium
MH370 search ends but mystery remains
Israel responds to Gaza fire with airstrikes
French Open: Serena, Sharapova win
Duterte will 'go to war' over South China Sea
Giuliani gets booed on his birthday
<strong>Childhood obesity highest in home of Mediterranean diet</strong>
Top North Korea official heading to US to revive Trump talks
Suspected serial killer ID'd, but cops 'can't arrest him'
Pre-monsoon storms kill 48 in India
Lava 'river' engulfs home in minutes
Mugabe warned: Be at hearing or face jail 
Why supersonic air travel could boom in Asia 
'Unbreakable:' How tennis star Jelena Dokic overcame 'years of abuse' 
This guy survived Vesuvius eruption -- but not for long
Best travel photos of 2018
Online dating 'lowers self-esteem and increases depression'
Who is North Korea's go-to diplomat?
The best cities for swimming
Vatican unveils radical chapels
Why this country has the best libraries
The architect that changed our cities
<strong>Jill Filipovic:</strong> French Spider-Man's act of bravery you don't know about
<strong>Silvia Marchetti:</strong> Italy's chaos is more dangerous than Brexit
<strong>Jesse Williams and Judith Browne Dianis:</strong> Starbucks' incident proves 'Whites Only' spaces still exist 
<strong>Perez and O'Leary Carmona:</strong> How Trump is dehumanizing Latinos
Moment man climbs building to save child
Flash floods ravage US town 
See North Korea's nuclear tunnels go up in smoke
Meghan laughs off Harry's bee encounter
Blue flames burn during Kilauea eruption
Footage of NBA player's arrest released
Why Dubai is hungry for food delivery apps
Paris in spring? Must be Rafa Nadal time 
Fore! Golfers ignore erupting volcano
Take a tour of the Russia World Cup stadiums
Rugby World Cup 2019 Japan venues
Gorgeous Vietnam: Take a photo tour
Breathtaking architecture found underwater
India's problem with rape: Do women feel safe? 
Afghan who risked life for UK: 'They are sending me to get killed'
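
The slice in the code above relies on fixed offsets around "articleList" and the first "}]", so it can break if the page layout shifts. As a rough sketch of an alternative, the standard library's json.JSONDecoder.raw_decode can parse one JSON value starting at a given position and ignore whatever JavaScript follows it. The helper name extract_json_after and the sample string below are made up for illustration, and the sketch assumes the value after the anchor is valid JSON; the real page's structure may differ.

```python
import json


def extract_json_after(script_text, anchor):
    """Parse the JSON value that starts right after `anchor` in `script_text`.

    Hypothetical helper for illustration; assumes the text after the
    anchor is valid JSON, possibly followed by more JavaScript.
    """
    start = script_text.find(anchor)
    if start == -1:
        raise ValueError("anchor not found in script text")
    pos = start + len(anchor)
    # raw_decode does not skip leading whitespace, so skip it here.
    while pos < len(script_text) and script_text[pos].isspace():
        pos += 1
    # raw_decode returns (value, end_index) and ignores trailing text.
    value, _end = json.JSONDecoder().raw_decode(script_text, pos)
    return value


# Made-up script text standing in for the contents of the script element.
sample = (
    'var CNN = CNN || {};CNN.isWebview = false;CNN.contentModel = '
    '{"articleList": [{"headline": "First headline"}, '
    '{"headline": "Second headline"}], "other": 1};var x = 2;'
)

model = extract_json_after(sample, "CNN.contentModel = ")
headlines = [article["headline"] for article in model["articleList"]]
print(headlines)
```

With a real page, s.text from the code above would take the place of sample. If the anchor is missing, the helper raises a ValueError instead of silently producing an empty result.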
