Python for loop skips iterations

Maksymilian Mrowka

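I made a Selenium bot that iterates over a list of territorial codes and sends each code to the search box of a website, which converts the code into a city name. I then scrape that name to build a list of cities in place of the list of codes. The problem is that when my for loop goes through the list, it sometimes "skips" the commands it was given and goes straight to the next iteration, so I don't get the full list of cities. Some codes in the list are missing or not suitable for passing to the website, so I handle those cases separately as exceptions.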

import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = r"D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from excel sheet and redo it into the list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()


cities = []


iteration = 0

for code in codes:
    time.sleep(0.05)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').clear()
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').send_keys(code)
        # Search
        try:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                                                '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        except:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                                                '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        # Scrape city name
        city = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="body_TabContainer1_TabPanel1_GVTERC"]/tbody/tr[2]/td[1]/strong'))).text.split()
        print(code)
        print(city)
        cities.append(city)


table = {
    "Cities": cities
}

df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()

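Here is part of my console log. As you can see, after printing iteration number 98 it jumps to 99, where it works perfectly fine and prints the city and the territorial code. The problem occurs again deeper in the loop, but it always starts at iteration 98. The territorial code for that iteration is not one of the exceptions.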

96 <-- Iteration
2201025 <-- Territorial Code
['Kędzierzyn-Koźle', '(2201025)'] <-- City Name
97
2262011
['Bytów', '(2262011)']
98 !<-- Just iteration!
99
2205084
['Gdynia', '(2208011)']

**Quick note in response to the answers: the console prints, in order, the iteration number, then the territorial code for that iteration, then the city name.**
Prophet

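There are several issues here:

  1. Your locators are terrible. Absolute XPaths such as /html/body/form/section/... break whenever the page layout changes.
  2. I see that your results are incorrect. For example, for input "2262011" the output is "Gdynia (2262011)", whereas you show this output for input "2205084".
  3. Your except code is the same as your try code. That makes no sense. If it did not work in the try block, why would it work on a second attempt with nothing changed?
  4. It is also better to wait for element visibility rather than presence, because at the moment an element has just been rendered it may not yet be fully ready to be clicked, etc.
  5. It is better to keep element locators at least at the top of the class rather than hardcoding them throughout the code.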
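
Points 2 and 4 can be combined: instead of sleeping for a fixed time, poll until the result cell shows the code that was just submitted, so a row left over from the previous search is never scraped. The following is only a minimal sketch in plain Python. The helper names `parse_result` and `wait_for_code` are hypothetical, and it assumes the site echoes the submitted code in parentheses after the city name, as most rows in the log suggest.

```python
import re
import time


def parse_result(text):
    """Split a result cell such as 'Bytów (2262011)' into (name, code).

    Returns None when the text does not end with a parenthesised number.
    """
    match = re.fullmatch(r"\s*(.+?)\s*\((\d+)\)\s*", text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def wait_for_code(fetch_text, expected_code, timeout=20.0, interval=0.2):
    """Poll fetch_text() until the parsed code equals expected_code.

    Returns the city name, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            parsed = parse_result(fetch_text())
        except Exception:
            parsed = None  # e.g. the element is not rendered yet
        if parsed is not None and parsed[1] == int(expected_code):
            return parsed[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Simulated page: the first two reads still show the previous search result.
reads = iter([
    "Kędzierzyn-Koźle (2201025)",
    "Kędzierzyn-Koźle (2201025)",
    "Bytów (2262011)",
])
city = wait_for_code(lambda: next(reads), 2262011, interval=0.0)
print(city)  # Bytów
```

In the bot, `fetch_text` could be something like `lambda: driver.find_element(By.XPATH, city_xpath).text`. A return value of None then marks a code whose result never appeared, which can be logged instead of being silently lost.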

I tried to make your code a bit better.
Please give it a try.

import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = r"D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from excel sheet and redo it into the list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()

code_input_xpath = '//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]'
search_button_xpath = '//input[@id="body_TabContainer1_TabPanel1_BJPTWyszukaj"]'
city_xpath = '//table[@id="body_TabContainer1_TabPanel1_GVTERC"]//td/strong'



cities = []


iteration = 0

for code in codes:
    time.sleep(0.1)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        code_input = driver.find_element(By.XPATH, code_input_xpath)
        code_input.clear()
        code_input.send_keys(code)
        # Search
        button = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, search_button_xpath)))
        button.click()
        # Scrape city name
        time.sleep(2)
        city = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, city_xpath))).text.split()
        print(code)
        print(city)
        cities.append(city)


table = {
    "Cities": cities
}

df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()
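
If a retry is still wanted (point 3), something has to change between attempts, and a failed lookup should leave a visible placeholder so the output keeps one row per input code. A rough sketch of that idea in plain Python follows. `lookup_with_retry`, the `recover` callback, and the "Failed" placeholder are hypothetical names chosen to mirror the "Absence", "Error" and "Manual" markers in the question, and the lookup here is simulated rather than a real Selenium call.

```python
def lookup_with_retry(code, lookup, recover, attempts=2):
    """Call lookup(code); after a failure run recover() and try again.

    Returns the lookup result, or the placeholder "Failed" once every
    attempt has failed, so the caller always gets exactly one entry per code.
    """
    for attempt in range(1, attempts + 1):
        try:
            return lookup(code)
        except Exception as error:
            print(f"code {code}: attempt {attempt} failed ({error!r})")
            if attempt < attempts:
                recover()  # change something before retrying
    return "Failed"


# Simulated lookup that fails on its very first call only.
names = {2262011: "Bytów", 2201025: "Kędzierzyn-Koźle"}
calls = {"lookup": 0, "recover": 0}


def flaky_lookup(code):
    calls["lookup"] += 1
    if calls["lookup"] == 1:
        raise TimeoutError("result table did not appear")
    return [names[code], f"({code})"]


def fake_recover():
    calls["recover"] += 1


cities = [lookup_with_retry(code, flaky_lookup, fake_recover)
          for code in [2262011, 2201025]]
print(cities)
```

In the real bot, `recover` might reload the page with `driver.refresh()` before the second attempt. Because every code yields either a result or "Failed", the final list lines up with the input column and any gap is easy to spot.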
