I am new to Python. I want to extract data from Yelp:
https://www.yelp.com/search?find_desc=nails+salons&find_loc=San+Francisco%2C+CA&ns=1
and then, by clicking each business name on the first page, e.g.
https://www.yelp.com/biz/joy-joy-nail-and-spa-san-francisco?osq=nails+salons
it should extract
Name
Address
Website
Contact No
Review count, as a number
and it should continue doing so for the whole page. Example output:
Joy Joy Nail & Spa
4023 24th St San Francisco, CA 94114
joyjoynailspa.com
(415) 655-3216
6 Reviews
Sunset Nails
1810 Irving St
San Francisco, CA 94122
(415) 566-9888
1185 reviews
If any element is not present (e.g. the website), it should skip that field and continue.
So, basically you have to go to the page, use find_elements to see how many items there are to scrape, then select the first one, scrape the desired elements, go back to the previous page, and repeat for the other businesses.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time

# driver_path should point to your chromedriver binary
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(50)
driver.get("https://www.yelp.com/search?find_desc=nails+salons&find_loc=San+Francisco%2C+CA&ns=1")
wait = WebDriverWait(driver, 20)

# count how many business links are on the results page
lnght = len(driver.find_elements(By.XPATH, "//div[contains(@class,'businessName')]/descendant::a"))
j = 0
for item in range(lnght):
    # re-locate the result cards each iteration, since going back reloads the page
    elements = driver.find_elements(By.XPATH, "//div[contains(@class,'arrange-unit') and contains(@class,'arrange-unit-fill')]//ancestor::div[contains(@class,'container') and contains(@class,'hover')]")
    time.sleep(1)
    #driver.execute_script("arguments[0].scrollIntoView(true);", elements[j])
    eles = driver.find_elements(By.XPATH, "//h4/descendant::a")
    ActionChains(driver).move_to_element(eles[j]).click().perform()
    #elements[j].click()
    time.sleep(2)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'headingLight')]//h1"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Business website']/following-sibling::p/a"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Phone number']/following-sibling::p"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//a[text()='Get Directions']/../following-sibling::p"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(text(),'reviews')]"))).text)
    # navigate back to the search results for the next business
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
    j = j + 1
Update 1:
Whichever line is causing the issue, wrap it like this:
try:
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Business website']/following-sibling::p/a"))).text)
except:
    pass
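The try/except pattern above can be factored into a small helper so each field lookup stays on one line and missing fields return a default instead of aborting the loop. This is just a sketch; `safe_text` is a hypothetical name, not part of Selenium, and the callable you pass in would typically be a lambda wrapping one of the `wait.until(...)` lookups:

```python
def safe_text(fetch, default=""):
    """Run a zero-argument callable that returns an element-like object
    (e.g. lambda: wait.until(EC.visibility_of_element_located(...)))
    and return its .text, or `default` if the lookup raises (element
    missing, wait timed out, etc.)."""
    try:
        return fetch().text
    except Exception:
        return default
```

In the scraping loop this would read, for example, `print(safe_text(lambda: wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Business website']/following-sibling::p/a")))))`, printing an empty string when the business has no website listed.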