用硒进行网页抓取以单击按钮并抓取所有内容

罗伯特_553z8

我已经在这个刮刀上工作了一段时间,我认为它可以改进,但我不知道从哪里开始。

最初的刮板看起来像这样,我相信它可以完成我需要它做的一切:

                                                                                                                                                        
url = "https://matrix.heartlandmls.com/Matrix/Public/Portal.aspx?L=1&k=990316X949Z&p=DE-74613894-421"   

h_table = []
driver = webdriver.Firefox()
driver.get(url)
driver.find_element_by_xpath("/html/body/form/div[3]/div/div/div[5]/div[3]/span[2]/div/div/div[2]/div[1]/div/div/div[2]/div[2]/div[1]/span/a").click()
time.sleep(10)
i = 200
while i > 0:
    h_table.append(driver.find_element_by_id("wrapperTable").text)
    driver.find_element_by_xpath("/html/body/form/div[3]/div/div/div[5]/div[2]/div/div[1]/div/div/span/ul/li[2]/a").click()
    time.sleep(10)
    i -= 1

这将所有内容输出到我可以清理的表中

['210 Sitter Street\nPleasant Hill, MO 64080\nMLS#:2178982\nMap\n$15,000\nSold\n4Bedrms\n2Full Bath(s)\n0Half Bath(s)\n1,848Sqft\nBuilt in1950\n0.27Acres\nSingle Family\n1 / 10\nThis Home sits on a level, treed, and nice .279 acre sizeable double lot. The property per taxes, is identified as a Single Family Home however it has 2 separate utility meters and 2 living spaces, each with 2 bedrooms and 1 full bath and laundry areas, and was utilized as a Duplex for Rental income for 2 units. This property is a CASH ONLY sale and is being sold "In It\'s Present Condition". Home and detached garage are in need of repair OR would be a candidate for a tear down and complete rebuild on the lot.\nAbout 210 Sitter Street, Pleasant Hill, MO 64080\nDirections:I-70 to 7 Hwy, to Broadway, to Sitter St, to property.\nGeneral Description\nMLS Number\n2178982\nCounty\nCass\nCity\nPleasant Hill\nSub Div\nWalkers & Sitlers\nType\nSingle Family\nFloor Plan Description\nRanch\nBdrms\n4\nBaths Full\n2\nBaths Half\n0\nAge Description\n51-75 Years\nYear Built\n1950\nSqft Main\n1848\nSQFT MAIN SOURCE\nPublic Record\nBelow Grade Finished Sq Ft\n0\nBelow Grade Finished Sq Ft Source\nPublic Record\nSqft\n1848\nLot Size\n12,155\nAcres\n0.27\nSchools E\nPleasant Hill Prim\nSchools M\nPleasant Hill\nSchools H\nPleasant Hill\nSchool District\nPleasant Hill\nLegal Description\nWALKER & SITLERS LOT 47 & 48 BLK 5\nS Terms\nCash\nInterior Features\nFireplace?\nY\nFireplace Description\nLiving Room, Wood Burning\nBasement\nN\nBasement Description\nBlock, Crawl Space\nDining Area Description\nEat-In Kitchen\nUtility Room\nMultiple, Main Level\nInterior Features\nFixer Up\nRooms\nBathroom Full\nLevel 1\n2nd Full Bath\nLevel 1\nMaster Bedroom\nLevel 1\nSecond Bedroom\nLevel 1\nMaster BR- 2nd\nLevel 1\nFourth Bedroom\nLevel 1\nKitchen\nLevel 1\nKitchen- 2nd\nLevel 1\nLiving Room\nLevel 1\nFamily Rm- 2nd\nLevel 1\nExterior / Construction\nGarage/Parking?\nY\nGarage/Parking #\n2\nGarage Description\nDetached, Front Entry\nConstruction\nFrame\nArchitecture\nTraditional\nRoof\nComposition\nLot Description\nCity Limits, City Lot, Level, Treed\nIn Floodplain\nNo\nInside City Limits\nYes\nStreet Maintenance\nPub Maint, Paved\nExterior Features\nFixer Up\nUtility Information\nCentral Air\nY\nHeat\nForced Air Gas\nCool\nCentral Electric, Window Unit(s)\nWater\nCity/Public\nSewer\nCity/Public\nFinancial Information\nS Terms\nCash\nHoa Amount\n$0\nTax\n$1,066\nSpecial Tax\n$0\nTotal Tax\n$1,066\nExclusions\nEntire Property\nType Of Ownership\nPrivate\nWill Sell\nCash\nAssessment & Tax\nAssessment Year\n2019\n2018\n2017\nAssessed Value - Total\n$17,240\n$15,380\n$15,380\nAssessed Value - Land\n$2,400\n$1,920\n$1,920\nAssessed Value - Improved\n$14,840\n$13,460\n$13,460\nYOY Change ($)\n$1,860\n$\nYOY Change (%)\n12%\n0%\nTax Year\n2019\n2018\n2017\nTotal Tax\n$1,178.32\n$1,065.64\n$1,064.30\nYOY Change ($)\n$113\n$1\nYOY Change (%)\n11%\n0%\nNotes for you and your agent\nAdd Note\nMap data ©2020\nTerms of Use\nReport a map error\nMap\n200 ft \nParcel Disclaimer'

然而,我已经看到了其他一些 WebDriverWait 的例子,但到目前为止我没有成功,我认为它会大大加快刮板速度,这是我写的代码

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://matrix.heartlandmls.com/Matrix/Public/Portal.aspx?L=1&k=990316X949Z&p=DE-74613894-421"
h_table = []
xpath = '/html/body/form/div[3]/div/div/div[5]/div[2]/div/div[1]/div/div/span/ul/li[2]/a'
driver = webdriver.Firefox()
driver.get(url)
driver.find_element_by_xpath("/html/body/form/div[3]/div/div/div[5]/div[3]/span[2]/div/div/div[2]/div[1]/div/div/div[2]/div[2]/div[1]/span/a").click()
time.sleep(10)
while True:
    button = driver.find_elements_by_xpath("/html/body/form/div[3]/div/div/div[5]/div[2]/div/div[1]/div/div/span/ul/li[2]/a")
    if len(button) < 1:
        print('done')
        break
    else:
        h_table.append(driver.find_element_by_id("wrapperTable").text)
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, 'xpath'))).click()

这似乎给出了所有结果,但它给出了重复项,而且我无法在没有键盘中断的情况下停止它

calling len(h_table) = 258, where it should be 200
巴斯蒂安-p1

如果您的列表长度有问题,为什么不使用:

    if len(h_table) >= 200:
        print("done")
        break

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章