I am trying to scrape all the information from multiple pages (for example, pages 1 through 20) of the ArchDaily website.
The HTML structure looks like this:
<div>
  <div class = 'afd-container-main afd-container-main--margin-bottom nft-container-main-search clearfix afd-mobile-margin search-container'>
    ::before
    <div>
      <div class='gridview'>
        <div>
          <div data-insights-category>
            <a href = '...'> # these are the links I want
The code I am using is:
soup = BeautifulSoup(html, 'html')
for foo in soup.find_all('div'):
    bar = foo.find('div', attrs={'class': 'afd-container-main afd-container-main--margin-bottom nft-container-main-search clearfix afd-mobile-margin search-container'})
    print(bar.text)
Error message:
AttributeError: 'NoneType' object has no attribute 'text'
Am I misunderstanding something?
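Note: Because the question does not show how you obtain your HTML, it is not easy to answer precisely.
If you fetch the page with requests, you will not get the results this way, because the site serves this content dynamically.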
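As for the AttributeError itself: find() returns None whenever the element it is called on has no matching descendant, and the loop in the question calls find() on every div, including the ones that cannot contain the container. A minimal sketch of that behaviour, assuming bs4 is installed; the raw_html sample and the project_links helper are made up for illustration and only imitate a page whose project grid has not been filled in yet:

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical stand-in for HTML fetched without a browser: the outer
# container exists, but the 'gridview' with the project links is missing.
raw_html = """
<div>
  <div class="afd-container-main search-container"></div>
</div>
"""


def project_links(html: str) -> List[str]:
    """Return the href of every link inside a 'gridview' div."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns an empty list when nothing matches, so there is
    # no None to trip over.
    return [a["href"] for a in soup.select("div.gridview a[href]")]


soup = BeautifulSoup(raw_html, "html.parser")
first_div = soup.find("div")
missing = first_div.find("div", class_="gridview")
print(missing)                  # None, so missing.text would raise AttributeError
print(project_links(raw_html))  # []
```

Checking for None before reading .text, or using select() as above, avoids the exception, but it does not bring back content that the server never sent.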
Get the information via the API with requests (this also provides more information: categories, companies, ...):
import requests

# iterate over pages
for p in range(1, 3):
    r = requests.get(f'https://www.archdaily.com/search/api/v1/us/projects/categories/residential-architecture?page={p}')  # URL of the next page
    for item in r.json()['results']:
        # iterate over results and print title + URL
        print(item['title'], item['url'])
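To cover a larger page range such as 1 to 20, the same loop can be wrapped in a small generator. This is only a sketch: the endpoint and the 'results', 'title' and 'url' keys come from the snippet above, while the names fetch_page and iter_projects are hypothetical, and stopping at the first empty page is an assumption about how the API behaves.

```python
from typing import Callable, Iterator, Tuple

# Endpoint as used in the answer; only the page number varies.
API_URL = ("https://www.archdaily.com/search/api/v1/us/projects/"
           "categories/residential-architecture?page={page}")


def fetch_page(page: int) -> dict:
    """Download one page of API results (needs network access and requests)."""
    import requests  # imported here so iter_projects can be tried offline

    response = requests.get(API_URL.format(page=page), timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()


def iter_projects(first_page: int, last_page: int,
                  fetch: Callable[[int], dict] = fetch_page
                  ) -> Iterator[Tuple[str, str]]:
    """Yield (title, url) for every result on pages first_page..last_page."""
    for page in range(first_page, last_page + 1):
        results = fetch(page).get("results", [])
        if not results:
            # Assumption: an empty page means there are no further results.
            break
        for item in results:
            yield item["title"], item["url"]
```

Looping over iter_projects(1, 20) and printing each pair would then walk pages 1 through 20. Passing a different fetch function makes it possible to try the paging logic without hitting the site.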
Alternatively, render the HTML with Selenium.
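A rough sketch of the Selenium route, assuming Selenium 4 and a local Chrome installation. The selector is derived from the 'gridview' container in the question and may need adjusting against the live page; rendered_links and unique_in_order are hypothetical helper names.

```python
from typing import Iterable, List


def unique_in_order(links: Iterable[str]) -> List[str]:
    """Drop empty values and duplicates while keeping the first-seen order."""
    seen = set()
    ordered = []
    for link in links:
        if link and link not in seen:
            seen.add(link)
            ordered.append(link)
    return ordered


def rendered_links(url: str,
                   selector: str = "div.gridview a[href]",
                   wait_seconds: int = 15) -> List[str]:
    """Open url in Chrome, wait for the links to appear, return their hrefs."""
    # Imported lazily so the helper above can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(url)
        # Wait until the dynamically loaded anchors are present in the DOM.
        anchors = WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        return unique_in_order(a.get_attribute("href") for a in anchors)
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

This is slower than calling the API because a full browser has to start for each run, so it is mainly worth it when no API endpoint is available.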
Example using the API:
import requests

for p in range(1, 2):
    r = requests.get(f'https://www.archdaily.com/search/api/v1/us/projects/categories/residential-architecture?page={p}')  # URL of the next page
    for item in r.json()['results']:
        print(item['title'], item['url'])
Output:
Wooden House / derksen | windt architecten https://www.archdaily.com/972995/wooden-house-derksen-windt-architecten?ad_source=search&ad_medium=projects_tab
PLA2 House / Dersyn Studio https://www.archdaily.com/972939/pla2-house-dersyn-studio?ad_source=search&ad_medium=projects_tab
gjG House / BLAF Architecten https://www.archdaily.com/951845/gjg-house-blaf-architecten?ad_source=search&ad_medium=projects_tab
Leopoldo 1201 Residential Building / aflalo/gasperini arquitetos https://www.archdaily.com/972959/leopoldo-1201-residential-building-aflalo-gasperini-arquitetos?ad_source=search&ad_medium=projects_tab
Sayang House / Carlos Gris Studio https://www.archdaily.com/972773/sayang-house-carlos-gris-studio?ad_source=search&ad_medium=projects_tab
Nong Ho 17 House / Skarn Chaiyawat https://www.archdaily.com/972911/nong-ho-17-house-skarn-chaiyawat?ad_source=search&ad_medium=projects_tab
LÂM’s Home / AD+studio https://www.archdaily.com/972794/lams-home-ad-plus-studio?ad_source=search&ad_medium=projects_tab
Limestone House / John Wardle Architects https://www.archdaily.com/972958/limestone-house-john-wardle-architects?ad_source=search&ad_medium=projects_tab
Quay Wall House / Thomas Kemme Architects https://www.archdaily.com/971781/quay-wall-house-thomas-kemme-architects?ad_source=search&ad_medium=projects_tab
...