How do I extract data from Yelp using Selenium and Python?

Muazma Tech

I am new to Python. I want to extract data from Yelp, starting from this search page:

https://www.yelp.com/search?find_desc=nails+salons&find_loc=San+Francisco%2C+CA&ns=1

and then, after clicking a business name on the first page, e.g.

https://www.yelp.com/biz/joy-joy-nail-and-spa-san-francisco?osq=nails+salons

it should extract:

Name
Address
Website
Contact No
Number of reviews (as a number)

It should then do the same for every business on the page. Example output:

Joy Joy Nail & Spa 
4023 24th St San Francisco, CA 94114
joyjoynailspa.com
(415) 655-3216
6 Reviews




Sunset Nails
1810 Irving St 
San Francisco, CA 94122
(415) 566-9888
1185 reviews

If any element, such as the website, is not present, it should skip that field and continue.

cruisepandey

Basically, you have to open the search page, use find_elements to count how many results there are to scrape, click the first one, scrape the desired elements, go back to the previous page, and then repeat the same steps for the remaining results.

import time

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

# Selenium 4.6+ can locate chromedriver on its own; older versions need the driver path passed in.
driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(50)
driver.get("https://www.yelp.com/search?find_desc=nails+salons&find_loc=San+Francisco%2C+CA&ns=1")
wait = WebDriverWait(driver, 20)
# Count the business links on the results page.
lnght = len(driver.find_elements(By.XPATH, "//div[contains(@class,'businessName')]/descendant::a"))
for j in range(lnght):
    time.sleep(1)
    # Re-find the links on every pass: references found before navigating away go stale.
    eles = driver.find_elements(By.XPATH, "//h4/descendant::a")
    ActionChains(driver).move_to_element(eles[j]).click().perform()
    time.sleep(2)
    # Name, website, phone number, address, review count (in that order).
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'headingLight')]//h1"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Business website']/following-sibling::p/a"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Phone number']/following-sibling::p"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//a[text()='Get Directions']/../following-sibling::p"))).text)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(text(),'reviews')]"))).text)
    # Go back to the search results before handling the next business.
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
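
The question asks for the number of reviews as a number, while the script above prints the raw text (for example "1185 reviews"). A minimal sketch of turning that text into an integer is below. The name parse_review_count is hypothetical, and the sketch assumes the count appears as plain digits, optionally with thousands separators; abbreviated forms such as "1.2k" are not handled.

```python
import re


def parse_review_count(text):
    """Return the first integer found in text such as '1185 reviews', or None."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return int(match.group().replace(",", ""))


print(parse_review_count("6 Reviews"))
print(parse_review_count("1185 reviews"))
```

The text returned by the last print call in the loop could be passed through this function before it is stored or printed.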

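Update 1:

Wrap whichever line is causing the issue like this, so that a missing element is skipped instead of stopping the script:

from selenium.common.exceptions import TimeoutException

try:
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//p[text()='Business website']/following-sibling::p/a"))).text)
except TimeoutException:
    # wait.until gave up: this business has no such element, so skip it.
    pass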
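
As an alternative to wrapping every line in try/except, the optional fields could be read with find_elements, which returns an empty list rather than raising when nothing matches. The sketch below is only an illustration: first_text, scrape_business and FIELDS are hypothetical names, the XPaths are copied from the answer above and may no longer match Yelp's markup, and it assumes a short implicit wait, since find_elements waits up to the implicit timeout before returning an empty list.

```python
def first_text(driver, xpath):
    """Return the stripped text of the first element matching xpath, or None."""
    # "xpath" is the string value of By.XPATH.
    matches = driver.find_elements("xpath", xpath)
    return matches[0].text.strip() if matches else None


# XPaths taken from the answer above (assumed to still be valid).
FIELDS = {
    "name": "//div[contains(@class,'headingLight')]//h1",
    "address": "//a[text()='Get Directions']/../following-sibling::p",
    "website": "//p[text()='Business website']/following-sibling::p/a",
    "phone": "//p[text()='Phone number']/following-sibling::p",
    "reviews": "//span[contains(text(),'reviews')]",
}


def scrape_business(driver):
    """Collect every field that is present on the current business page."""
    found = {}
    for label, xpath in FIELDS.items():
        value = first_text(driver, xpath)
        if value:  # skip fields that are missing or empty
            found[label] = value
    return found
```

Inside the loop, scrape_business(driver) would be called once the business page has loaded, and the returned dictionary simply has no "website" key when the business does not list one.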
