I am new to Python. I want to extract data from Yelp:
https://www.yelp.com/search?find_desc=nails+salons&find_loc=San+Francisco%2C+CA&ns=1
and then, by clicking each business name on the first page, e.g.
https://www.yelp.com/biz/joy-joy-nail-and-spa-san-francisco?osq=nails+salons
it should extract
Name
Address
Website
Contact No
Review count, as a number
and it should continue doing so for the whole page. Example output:
Joy Joy Nail & Spa
4023 24th St San Francisco, CA 94114
joyjoynailspa.com
(415) 655-3216
6 Reviews
Sunset Nails
1810 Irving St
San Francisco, CA 94122
(415) 566-9888
1185 reviews
If any element is not present (e.g. the website), it should skip that field and continue.
So, basically you have to go to the page, use find_elements to see how many items there are to scrape, then select the first one, scrape the desired elements, go back to the previous page, and repeat for the other businesses.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time

# driver_path should point to your chromedriver binary
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(50)
driver.get("https://www.yelp.com/search?find_desc=nails+salons&find_loc=San+Francisco%2C+CA&ns=1")
wait = WebDriverWait(driver, 20)

# count how many business links are on the results page
lnght = len(driver.find_elements(By.XPATH, "//div[contains(@class,'businessName')]/descendant::a"))
j = 0
for item in range(lnght):
    # re-locate the result cards each iteration, since going back reloads the page
    elements = driver.find_elements(By.XPATH, "//div[contains(@class,'arrange-unit') and contains(@class,'arrange-unit-fill')]//ancestor::div[contains(@class,'container') and contains(@class,'hover')]")
    time.sleep(1)
    #driver.execute_script("arguments[0].scrollIntoView(true);", elements[j])
    eles = driver.find_elements(By.XPATH, "//h4/descendant::a")
    ActionChains(driver).move_to_element(eles[j]).click().perform()
    #elements[j].click()
    time.sleep(2)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'headingLight')]//h1"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Business website']/following-sibling::p/a"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Phone number']/following-sibling::p"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//a[text()='Get Directions']/../following-sibling::p"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(text(),'reviews')]"))).text)
    # navigate back to the search results for the next business
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
    j = j + 1
Update 1:
Whichever line is causing the issue, wrap it like this:
try:
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Business website']/following-sibling::p/a"))).text)
except:
    pass
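The try/except pattern above can be factored into a small helper so each field lookup stays on one line and missing fields return a default instead of aborting the loop. This is just a sketch; `safe_text` is a hypothetical name, not part of Selenium, and the callable you pass in would typically be a lambda wrapping one of the `wait.until(...)` lookups:

```python
def safe_text(fetch, default=""):
    """Run a zero-argument callable that returns an element-like object
    (e.g. lambda: wait.until(EC.visibility_of_element_located(...)))
    and return its .text, or `default` if the lookup raises (element
    missing, wait timed out, etc.)."""
    try:
        return fetch().text
    except Exception:
        return default
```

In the scraping loop this would read, for example, `print(safe_text(lambda: wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Business website']/following-sibling::p/a")))))`, printing an empty string when the business has no website listed.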