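I built a Selenium bot that iterates over a list of territorial codes and sends each code to the search box of a website, which resolves the code to a city name. I then scrape that name, so that I end up with a list of cities in place of the list of codes. The problem is that as my for loop walks the list, it sometimes "skips" the commands for an iteration and goes straight to the next one, so I don't get the complete list of cities. Some codes in the list are missing or not suitable for passing to the website, so I handle those cases separately.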
import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = "D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from excel sheet and redo it into the list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()

cities = []
iteration = 0
for code in codes:
    time.sleep(0.05)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').clear()
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').send_keys(code)
        # Search
        try:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                    '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        except:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                    '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        # Scrape city name
        city = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="body_TabContainer1_TabPanel1_GVTERC"]/tbody/tr[2]/td[1]/strong'))).text.split()
        print(code)
        print(city)
        cities.append(city)

table = {
    "Cities": cities
}
df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()
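This is part of my console log. As you can see, after printing iteration number 98 it jumps to 99, where everything works normally and both the territorial code and the city are printed. The problem shows up again deeper into the loop, but it always starts at iteration 98. The territorial code for that iteration is not one of the exceptions.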
96 <-- Iteration
2201025 <-- Territorial Code
['Kędzierzyn-Koźle', '(2201025)'] <-- City Name
97
2262011
['Bytów', '(2262011)']
98 !<-- Just iteration!
99
2205084
['Gdynia', '(2208011)']
**!Quick Note due to the answers! Here is the order of the print statements in the console. First: number of the iteration, Second: Territorial Code related to the iteration, Third: City Name**
There are several problems here.
I tried to make your code a bit better.
Please give it a try.
import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Raw string, so the backslashes in the Windows path are taken literally.
chrome_driver_path = r"D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from excel sheet and redo it into the list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()

# Locators based on element ids instead of long absolute paths.
code_input_xpath = '//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]'
search_button_xpath = '//input[@id="body_TabContainer1_TabPanel1_BJPTWyszukaj"]'
city_xpath = '//table[@id="body_TabContainer1_TabPanel1_GVTERC"]//td/strong'

cities = []
iteration = 0
for code in codes:
    time.sleep(0.1)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        driver.find_element_by_xpath(code_input_xpath).clear()
        driver.find_element_by_xpath(code_input_xpath).send_keys(code)
        # Search
        button = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, search_button_xpath)))
        button.click()
        # Scrape city name (the fixed pause gives the results table time to refresh)
        time.sleep(2)
        city = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, city_xpath))).text.split()
        print(code)
        print(city)
        cities.append(city)

table = {
    "Cities": cities
}
df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()
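One more thing you might try instead of the fixed `time.sleep(2)`: in the log from the question, iteration 99 prints code 2205084 next to a result ending in (2208011), which could mean the table still held an older result when it was read. Below is a minimal sketch of a polling helper that only accepts the scraped text once it ends with the code that was just submitted. The names `result_matches_code` and `wait_for_matching_result` are made up for illustration, and the sketch assumes the result cell always reads "City (code)" with the searched code in parentheses, which the site may not guarantee for every code.

```python
import time


def result_matches_code(cell_text, code):
    """Return True if text such as 'Bytów (2262011)' ends with '(<code>)'."""
    return cell_text.strip().endswith(f"({code})")


def wait_for_matching_result(read_cell_text, code, timeout=20.0, poll=0.25):
    """Poll read_cell_text() until its text belongs to `code`.

    read_cell_text is any zero-argument callable returning the cell text.
    Exceptions it raises (for example a missing or stale element) are
    treated as "not ready yet". Returns the split text, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            text = read_cell_text()
        except Exception:
            text = ""
        if result_matches_code(text, code):
            return text.split()
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

Inside the loop this could replace the sleep and the final wait, for example `city = wait_for_matching_result(lambda: driver.find_element_by_xpath(city_xpath).text, code)`, followed by appending a marker such as "Timeout" when `city` is None, so the output list keeps the same length as the list of codes.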