How can I loop scraping data for multiple pages in a website using python and beautifulsoup4

Gonzalo68

I am trying to scrape data from the PGA.com website to get a table of all of the golf courses in the United States. In my CSV table I want to include the name of the golf course, address, ownership, website, and phone number. With this data I would like to geocode it, place it on a map, and have a local copy on my computer.

I utilized Python and BeautifulSoup4 to extract my data. I have gotten as far as extracting the data and importing it into a CSV, but I am now having a problem scraping data from multiple pages on the PGA website. I want to extract ALL THE GOLF COURSES, but my script is limited to one page. I want to loop it in a way that it will capture all data for golf courses from all pages found on the PGA site. There are about 18000 golf courses and 900 pages to capture data from.

Attached below is my script. I need help creating code that will capture ALL data from the PGA website, not just one page but all of them. In this manner it will provide me with all the data on golf courses in the United States.

Here is my script below:

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.pga.com/golf-courses/search?searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"

r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly

g_data1 = soup.find_all("div", {"class": "views-field-nothing-1"})
g_data2 = soup.find_all("div", {"class": "views-field-nothing"})

courses_list = []

for item in g_data2:
    try:
        name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text
    except:
        name = ''
    try:
        address1 = item.contents[1].find_all("div", {"class": "views-field-address"})[0].text
    except:
        address1 = ''
    try:
        address2 = item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text
    except:
        address2 = ''
    try:
        website = item.contents[1].find_all("div", {"class": "views-field-website"})[0].text
    except:
        website = ''
    try:
        Phonenumber = item.contents[1].find_all("div", {"class": "views-field-work-phone"})[0].text
    except:
        Phonenumber = ''

    course = [name, address1, address2, website, Phonenumber]
    courses_list.append(course)

# Write once, after the loop, instead of reopening the file on every iteration
with open('filename5.csv', 'wb') as file:
    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow(row)

#for item in g_data1:
     #try:
          #print item.contents[1].find_all("div",{"class":"views-field-counter"})[0].text
     #except:
          #pass  
     #try:
          #print item.contents[1].find_all("div",{"class":"views-field-course-type"})[0].text
     #except:
          #pass

#for item in g_data2:
   #try:
      #print item.contents[1].find_all("div",{"class":"views-field-title"})[0].text
   #except:
      #pass
   #try:
      #print item.contents[1].find_all("div",{"class":"views-field-address"})[0].text
   #except:
      #pass
   #try:
      #print item.contents[1].find_all("div",{"class":"views-field-city-state-zip"})[0].text
   #except:
      #pass

This script only captures 20 at a time, and I want to capture everything in one script, which accounts for 18000 golf courses and 900 pages to scrape from.

liamdiprose

The PGA website's search has multiple pages; the URL follows the pattern:

http://www.pga.com/golf-courses/search?page=1 # Additional info after page parameter here

This means you can read the content of one page, increment the value of the page parameter by 1, read the next page, and so on.

import csv
import requests 
from bs4 import BeautifulSoup
for i in range(907):      # pages are numbered 0 through 906
    url = "http://www.pga.com/golf-courses/search?page={}&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(i)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly

    # Your code for each individual page here 
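Putting the two pieces together, the pagination loop and the per-page extraction can be combined so that the CSV is written once at the end rather than reopened inside the loop. This is only a sketch: it assumes the `views-field-*` class names from the question's script still match the page markup, it uses `item.find` in place of `item.contents[1].find_all(...)[0]` for brevity, and the header names in the CSV are my own.

```python
import csv
import requests
from bs4 import BeautifulSoup

BASE_URL = ("http://www.pga.com/golf-courses/search?page={}&searchbox=Course+Name"
            "&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0")

def parse_courses(soup):
    """Return one [name, address, city/state/zip, website, phone] row per course."""
    rows = []
    for item in soup.find_all("div", {"class": "views-field-nothing"}):
        def field(cls):
            # Missing fields become empty strings instead of raising
            tag = item.find("div", {"class": cls})
            return tag.text.strip() if tag else ""
        rows.append([field("views-field-title"),
                     field("views-field-address"),
                     field("views-field-city-state-zip"),
                     field("views-field-website"),
                     field("views-field-work-phone")])
    return rows

def scrape_all(pages=907, out_path="courses.csv"):
    """Fetch every results page, then write the CSV once at the end."""
    courses_list = []
    for page in range(pages):
        r = requests.get(BASE_URL.format(page))
        soup = BeautifulSoup(r.content, "html.parser")
        courses_list.extend(parse_courses(soup))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Address", "City/State/ZIP", "Website", "Phone"])
        writer.writerows(courses_list)
    return len(courses_list)
```

Calling `scrape_all()` would make roughly 900 sequential requests, so it will take a while; writing the file once at the end avoids truncating and rewriting it on every iteration as the original script did.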

