I am trying to define two functions to easily grab any table off the web as a pandas DataFrame, given a link and an XPath. However, when I call pd.read_html I get the error 'ValueError: No tables found'. I added a print(html) and, to my surprise, the output contains my data as plain text; all the HTML tags have disappeared. Any idea why this is happening and how to convert from a WebElement to a pandas DataFrame?
my code:
import pandas as pd

def openchrome():
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    # open browser
    opt = webdriver.ChromeOptions()
    opt.add_argument('headless')
    serv = Service("d:\webdrivers\chromedriver")
    browser = webdriver.Chrome(service=serv, options=opt)
    return browser

def scrape(browser, link, xpath):
    from selenium.webdriver.common.by import By
    browser.get(link)
    html = browser.find_element(By.XPATH, xpath)
    print(html)
    df = pd.read_html(html)
    return df
    # df=pd.dataframe()
    # return df

browser = openchrome()
df = scrape(browser, 'https://www.multpl.com/s-p-500-pe-ratio/table/by-year', '/html/body/div[2]/div[2]/div[2]/div[1]/div[3]/div/div[1]/table')
As the error states, no tables are being found. Why? pd.read_html can't parse a WebElement; it only accepts a URL, a file-like object, or a raw string containing HTML. That said, you can use html.get_attribute('outerHTML') to get the element's raw HTML and pass that to pd.read_html:
def scrape(browser, link, xpath):
    from selenium.webdriver.common.by import By
    browser.get(link)
    html = browser.find_element(By.XPATH, xpath)
    print(html.get_attribute('outerHTML'))
    df = pd.read_html(html.get_attribute('outerHTML'))
    return df
    # df=pd.dataframe()
    # return df

browser = openchrome()
df = scrape(browser, 'https://www.multpl.com/s-p-500-pe-ratio/table/by-year',
            '/html/body/div[2]/div[2]/div[2]/div[1]/div[3]/div/div[1]/table')
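As a side note, pd.read_html always returns a list of DataFrames, even when only one table is matched, so you usually want to index into the result. Below is a minimal sketch of how you might wrap this up; element_to_dataframe is a hypothetical helper name, and the io.StringIO wrapping assumes a recent pandas version, which warns when a literal HTML string is passed directly to read_html:

from io import StringIO
import pandas as pd
from selenium.webdriver.common.by import By

def element_to_dataframe(element):
    # wrap the element's raw HTML in StringIO so newer pandas versions
    # don't complain about literal-string input to read_html
    tables = pd.read_html(StringIO(element.get_attribute('outerHTML')))
    return tables[0]  # read_html returns a list of DataFrames

# usage, reusing openchrome() from above
browser = openchrome()
browser.get('https://www.multpl.com/s-p-500-pe-ratio/table/by-year')
table_el = browser.find_element(
    By.XPATH,
    '/html/body/div[2]/div[2]/div[2]/div[1]/div[3]/div/div[1]/table')
df = element_to_dataframe(table_el)
print(df.head())
browser.quit()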