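I am trying to scrape the subject titles of all the forum posts on this site. I don't know how to go about it, because the forum's HTML is laid out in a way I'm not familiar with.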
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'http://thailove.net/bbs/board.php?bo_table=ent'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
#I don't think this is correct, but not sure how else to do this...
containers = page_soup.findAll("td",{"class":"td_subject"})
for container in containers:
    subject = container.a.font.font.contents
    #similarly not sure this is correct
print("subject: ", subject)
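Please let me know how I should do this. Also keep in mind that the site is in Korean, but it can easily be translated into English if needed.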
Your code is fine up to the for loop. Inside the loop, use container.a.contents[0] to access the subject, and the print call should also be inside the for loop:
for container in containers:
    subject = container.a.contents[0]
    print("subject: ", subject)
Then run the script:
>>>
subject:
미성년자도 이용하는 게시판이므로 글 수위를 지켜주세요.
subject:
방콕의 대표 야시장 - 딸랏롯파이2
subject:
공항에서 제일 가까운 레드썬 마사지
.......
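If some rows have no link, or the link wraps its text in extra tags, container.a.contents[0] can fail or return a tag instead of a string. A slightly more defensive variant is sketched below. It swaps contents[0] for get_text(strip=True), which collects all text inside the link and trims whitespace. The extract_subjects helper and the SAMPLE_HTML markup are made up for illustration and only assume the td.td_subject / a structure used above; the live page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page; the real HTML may differ.
SAMPLE_HTML = """
<table>
  <tr><td class="td_subject"><a href="#1">
      First topic
  </a></td></tr>
  <tr><td class="td_subject"><a href="#2">Second topic</a></td></tr>
  <tr><td class="td_subject">Row without a link</td></tr>
</table>
"""


def extract_subjects(html):
    """Return the link text of every td.td_subject cell that contains an <a>."""
    page_soup = BeautifulSoup(html, "html.parser")
    subjects = []
    for container in page_soup.find_all("td", class_="td_subject"):
        link = container.find("a")
        if link is None:
            # Skip cells with no link instead of raising AttributeError.
            continue
        # get_text(strip=True) gathers all text under the link and trims it.
        subjects.append(link.get_text(strip=True))
    return subjects


if __name__ == "__main__":
    for subject in extract_subjects(SAMPLE_HTML):
        print("subject:", subject)
```

To use it on the real page, pass page_html from the question's code to extract_subjects instead of SAMPLE_HTML.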