I am trying to web scrape this part of the HTML:
<td class="zebraTable__td zebraTable__td--companyName"><a href="/unternehmen/8116602/schneider-electric-holding-germany-gmbh" data-gtm="companySearch__searchResult--76">
Schneider Electric Holding Germany GmbH
</a></td>
from this site:
using this code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd
import time
driver = webdriver.Chrome('/Users/rieder/Anaconda3/chromedriver_win32/chromedriver.exe')
driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=500&employeesTo=100000000&sortMethod=revenueDesc&p=1')
driver.find_element_by_id("cookiesNotificationConfirm").click();
company_name = driver.find_element_by_class_name('zebraTable__td zebraTable__td--companyName')
print(company_name)
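I have been trying for 4 hours and still cannot get it. I tried different approaches such as XPath and link text, but all I get is an empty company name, e.g. "[]".
Does anyone know how Selenium can find the exact text "Liebherr-Hausgeräte Ochsenhausen GmbH"?
Thank you very much.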
To print the text Schneider Electric Holding Germany GmbH, you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following two locator strategies:
Using CSS_SELECTOR and the text attribute:
driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#cookiesNotificationConfirm"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.zebraTable.zebraTable--companies tr:nth-child(2)>td.zebraTable__td.zebraTable__td--companyName>a"))).text)
Using XPATH and get_attribute("innerHTML"):
driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='cookiesNotificationConfirm']"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='zebraTable zebraTable--companies']//following::tr[2]/td[@class='zebraTable__td zebraTable__td--companyName']/a"))).get_attribute("innerHTML"))
Console output:
Schneider Electric Holding Germany GmbH
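A note on why the call in the question does not match anything: find_element_by_class_name expects a single class name, while the class attribute of the cell holds two names separated by a space. Both locators above avoid this, the CSS one by chaining the classes with dots. The sketch below shows that conversion in plain Python. The helper name css_for_classes is made up for illustration and is not part of Selenium.

```python
def css_for_classes(tag: str, class_attr: str) -> str:
    """Build a CSS selector from a tag name and an HTML class attribute value.

    Hypothetical helper for illustration: every whitespace-separated class
    becomes its own ".name" part, so the selector only matches an element
    that carries all of the classes at once.
    """
    classes = class_attr.split()
    return tag + "".join("." + name for name in classes)


# Class attribute copied from the <td> in the question.
selector = css_for_classes("td", "zebraTable__td zebraTable__td--companyName")
print(selector)  # td.zebraTable__td.zebraTable__td--companyName
```

The resulting string can be passed with By.CSS_SELECTOR in the same way as the selector in the first snippet.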
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
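Putting the imports and the waits together, here is a minimal sketch of a complete script that collects every company name on a result page instead of a single row. It assumes Selenium 4 (where the find_element_by_* helpers from the question are no longer available and webdriver.Chrome() can locate a driver by itself), and it assumes the page still uses the class names and the cookie button id shown above. The function names page_url and scrape_company_names are my own. Selenium is imported inside the function only so that the URL helper can be used without a browser.

```python
from urllib.parse import urlencode

BASE_URL = "https://de.statista.com/companydb/suche"


def page_url(page: int) -> str:
    """Rebuild the search URL from the answer for a given result page."""
    params = {
        "idCountry": "276",
        "idBranch": "0",
        "revenueFrom": "-1000000000000000000",
        "revenueTo": "1000000000000000000",
        "employeesFrom": "0",
        "employeesTo": "100000000",
        "sortMethod": "revenueDesc",
        "p": str(page),
    }
    return BASE_URL + "?" + urlencode(params)


def scrape_company_names(page: int, timeout: int = 20) -> list:
    """Return the visible text of every company-name link on one page."""
    # Imported here so page_url() works even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(page_url(page))
        wait = WebDriverWait(driver, timeout)
        # Dismiss the cookie notice first, as in the snippets above.
        wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "button#cookiesNotificationConfirm"))
        ).click()
        # Wait until all matching links are visible, then read their text.
        links = wait.until(
            EC.visibility_of_all_elements_located(
                (By.CSS_SELECTOR, "td.zebraTable__td.zebraTable__td--companyName > a")
            )
        )
        return [link.text for link in links]
    finally:
        driver.quit()
```

Calling scrape_company_names(4) should then return a list of names for the page used in the answer, provided the site layout has not changed. If the cookie notice does not appear, the first wait raises a TimeoutException, so that step may need its own try/except.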
You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".
Links to useful documentation:
get_attribute() method: Gets the given attribute or property of the element.
text attribute: returns The text of the element.
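One practical difference between the two: text gives the rendered text, whereas get_attribute("innerHTML") gives the markup inside the element as it is serialized, which here may include the line breaks around the name and HTML entities such as &amp;. Below is a small sketch of how the innerHTML value could be tidied up, assuming the link contains only text and no nested tags. The function name and the second sample string are made up for illustration.

```python
import html


def normalize_inner_html_text(raw: str) -> str:
    """Decode HTML entities and collapse runs of whitespace to single spaces."""
    decoded = html.unescape(raw)
    return " ".join(decoded.split())


# The <a> in the question has a line break before and after the name.
print(normalize_inner_html_text("\nSchneider Electric Holding Germany GmbH\n"))

# Made-up example containing an entity and a double space.
print(normalize_inner_html_text("Beispiel &amp;  Partner GmbH"))
```

With the text attribute this extra step is usually not needed, which is one reason to prefer it when only the visible name is wanted.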