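Relative beginner here. There are similar topics, but I can see how my solution works and just need help connecting the last few dots. I want to get the follower count from Instagram without using the API. Here is what I have so far: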
Python 3.7.0
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
> DevTools listening on ws://.......
driver.get("https://www.instagram.com/cocacola")
soup = BeautifulSoup(driver.page_source)
elements = soup.find_all(attrs={"class":"g47SY "})
# Note the full class is 'g47SY lOXF2' but I can't get this to work
for element in elements:
    print(element)
>[<span class="g47SY ">667</span>,
<span class="g47SY " title="2,598,456">2.5m</span>, # Need what's in title, 2,598,456
<span class="g47SY ">582</span>]
for element in elements:
    t = element.get('title')
    if t:
        count = t
        count = count.replace(",","")
    else:
        pass
print(int(count))
>2598456 # Success
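Is there a simpler or quicker way to get the number 2,598,456? My initial hope was that I could just use the class 'g47SY lOXF2', but as far as I can tell, spaces in class names don't work in BS4. I just want to make sure this code is concise and functional.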
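On the class-name point: a space inside a class attribute separates two distinct classes, so 'g47SY lOXF2' is two class names rather than one name containing a space. Below is a minimal sketch, assuming the parsed markup really carries both classes (the output above shows only 'g47SY ', so the live page source may differ). The HTML string is a made-up stand-in for the profile page, not Instagram's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the rendered profile markup. The class names are
# copied from the question and are likely to change on the live site.
html = """
<a href="/cocacola/followers/">
  <span class="g47SY lOXF2" title="2,598,456">2.5m</span> followers
</a>
<span class="g47SY ">667</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Chain both class names in a CSS selector to require both of them.
both = soup.select("span.g47SY.lOXF2")
print(both[0]["title"])   # 2,598,456

# class_ with a single name matches every tag that carries that class.
any_g47 = soup.find_all("span", class_="g47SY")
print(len(any_g47))       # 2
```

Selecting on the title attribute, as in the loop above, is probably more robust than either form, since generated class names like these tend to change between site builds.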
I had to use the headless option and add executable_path for testing. You can remove both.
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
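# options= replaces the deprecated chrome_options= keyword.
# executable_path is Selenium 3 style; recent Selenium 4 releases expect a Service object instead.
driver = webdriver.Chrome(executable_path="chromedriver.exe", options=options)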
driver.get('https://www.instagram.com/cocacola')
soup = BeautifulSoup(driver.page_source,'lxml')
# span[title] on its own matches every span with a title attribute, which can give multiple results.
# The follower count is in a span directly inside an <a> tag, so restrict the match to that.
followers = soup.select_one('a > span[title]')['title'].replace(',','')
print(followers)
#Output 2598552
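One caveat with the one-liner: select_one returns None when nothing matches (for example, if the page served is a login wall), and indexing None raises a TypeError. A hedged sketch of the same selector with a guard follows. The helper name follower_count and the sample HTML are illustrative assumptions, not part of the answer above.

```python
from bs4 import BeautifulSoup

def follower_count(page_source):
    """Return the follower count as an int, or None if no matching span is found.

    follower_count is a hypothetical helper; it reuses the 'a > span[title]'
    selector from the answer.
    """
    soup = BeautifulSoup(page_source, "html.parser")
    span = soup.select_one("a > span[title]")
    if span is None:
        return None
    # The title holds the full number with thousands separators, e.g. "2,598,456".
    return int(span["title"].replace(",", ""))

# Made-up sample markup shaped like the spans shown in the question.
sample = '<a href="#"><span class="g47SY " title="2,598,456">2.5m</span> followers</a>'
print(follower_count(sample))          # 2598456
print(follower_count("<p>login</p>"))  # None
```

With a live driver this would be called as follower_count(driver.page_source).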