How can I extract href links from within a table th using BeautifulSoup

SlowBear

I am trying to create a list of all football teams/links from any one of a number of tables within the base URL: https://fbref.com/en/comps/10/stats/Championship-Stats

I would then use the link from each href to scrape the individual team's data. The href sits on an 'a' tag nested inside the th tag, as shown below:

<th scope="row" class="left " data-stat="squad"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th>

   <a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a>

The following code gives me a list of the th elements that contain the 'a' tags

import requests
from bs4 import BeautifulSoup

page = "https://fbref.com/en/comps/10/Championship-Stats"
pageTree = requests.get(page)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')
Teams = pageSoup.find_all("th", {"class": "left"})

Output (one entry for each th with class 'left'):

<th class="left" data-stat="squad" scope="row"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th>,

I have tried the guidance from a previous Stack question (Extract links after th in beautifulsoup). However, the following code, based on that thread, produces this error:

AttributeError: 'NoneType' object has no attribute 'find_parent'

def import_TeamList():
    BASE_URL = "https://fbref.com/en/comps/10/Championship-Stats"
    r = requests.get(BASE_URL)
    soup = BeautifulSoup(r.text, 'lxml')
    team_list = []
    team_tr = soup.find('a', {'data-stat': 'squad'}).find_parent('tr')
    for tr in team_tr.find_next_siblings('tr'):
        if tr.find('a').text != 'squad':
            break
        team_list.append(BASE_URL + tr.find('a')['href'])
    return team_list

Johannes Kasimir

Here is a version using CSS selectors, which I find simpler than most other methods.

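import requests
from bs4 import BeautifulSoup


url = 'https://fbref.com/en/comps/10/stats/Championship-Stats'
data = requests.get(url).text
# Parse once and name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(data, 'html.parser')

# 'th a' matches every <a> nested inside a <th>
links = soup.select('th a')
urls = [link['href'] for link in links]
print(urls)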
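
Two notes on the code in the question. In the markup quoted there, data-stat="squad" is an attribute of the th, not of the a, so soup.find('a', {'data-stat': 'squad'}) returns None, and that is where the AttributeError comes from. Also, the hrefs are root-relative, so appending them to the full page URL would not give a usable link. Below is a minimal sketch that narrows the selector to the squad cells, resolves each href against the site root, and drops the repeats you get when the same team appears in several tables. It runs on a static sample copied from the question's markup rather than the live page; SITE_ROOT, SAMPLE_HTML and team_urls are names I made up for illustration, and I am assuming the live page keeps the same th/a structure.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed site root, used only to resolve the root-relative hrefs.
SITE_ROOT = "https://fbref.com"

# Static sample modelled on the markup quoted in the question. The row is
# repeated on purpose, to mimic the same team showing up in several tables.
SAMPLE_HTML = """
<table>
  <tr><th scope="col" data-stat="squad">Squad</th></tr>
  <tr><th scope="row" class="left " data-stat="squad"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th></tr>
  <tr><th scope="row" class="left " data-stat="squad"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th></tr>
</table>
"""


def team_urls(html, root=SITE_ROOT):
    """Return absolute squad URLs in document order, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # Only <a> tags that carry an href and sit inside a squad <th> are matched,
    # so header cells without a link are skipped.
    for link in soup.select('th[data-stat="squad"] a[href]'):
        url = urljoin(root, link["href"])
        if url not in urls:
            urls.append(url)
    return urls


print(team_urls(SAMPLE_HTML))
```

For the live page you would pass requests.get(url).text in place of SAMPLE_HTML. Because select() returns an empty list when nothing matches, this cannot hit the NoneType error that the chained find().find_parent() call does.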
