How can I extract href links from an 'a' tag within a table th using BeautifulSoup?

SlowBear Published at Dev

SlowBear

I am trying to create a list of all football teams/links from any one of a number of tables within the base URL: https://fbref.com/en/comps/10/stats/Championship-Stats

I would then use the link from the href to scrape each individual team's data. The href is embedded within the th tag, as shown below:

<th scope="row" class="left " data-stat="squad"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th>

<a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a>

The following code gives me a list of the th elements that contain the 'a' tags:

import requests
from bs4 import BeautifulSoup

page = "https://fbref.com/en/comps/10/Championship-Stats"
pageTree = requests.get(page)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')
Teams = pageSoup.find_all("th", {"class": "left"})

Output (one entry for each th with class 'left'):

<th class="left" data-stat="squad" scope="row"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th>,

I have tried the guidance from a previous Stack question (Extract links after th in beautifulsoup). However, the following code, based on that thread, produces an error:

AttributeError: 'NoneType' object has no attribute 'find_parent'

def import_TeamList():
    BASE_URL = "https://fbref.com/en/comps/10/Championship-Stats"
    r = requests.get(BASE_URL)
    soup = BeautifulSoup(r.text, 'lxml')
    team_list = []
    team_tr = soup.find('a', {'data-stat': 'squad'}).find_parent('tr')
    for tr in reels_tr.find_next_siblings('tr'):
        if tr.find('a').text != 'squad':
            break
        midi_list.append(BASE_URL + tr.find('a')['href'])
    return TeamList

Johannes Kasimir

Here is a version using CSS selectors, which I find simpler than most other methods.

import requests
from bs4 import BeautifulSoup


url = 'https://fbref.com/en/comps/10/stats/Championship-Stats'
data = requests.get(url).text
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(data, 'html.parser')

# 'th a' matches every <a> nested inside a <th>
links = soup.select('th a')
urls = [link['href'] for link in links]
print(urls)
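
If you only want the team links, and want them as full URLs you can request later, something along these lines might help. It is a rough sketch run against a one-row sample copied from the markup in the question rather than the live page, so I can't promise the real page behaves identically. The names SAMPLE_HTML, PAGE_URL and team_urls are made up for the illustration. It also shows why the code in the question raises the AttributeError: data-stat is an attribute of the th, not of the a, so find('a', {'data-stat': 'squad'}) returns None.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Sample markup modelled on the row quoted in the question (not a live download).
SAMPLE_HTML = """
<table>
  <tr><th scope="row" class="left " data-stat="squad"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th></tr>
</table>
"""

# Page the relative hrefs are resolved against (URL taken from the question).
PAGE_URL = "https://fbref.com/en/comps/10/stats/Championship-Stats"


def team_urls(html, page_url=PAGE_URL):
    """Return absolute URLs for links inside <th data-stat="squad"> cells."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select('th[data-stat="squad"] a[href]')
    # urljoin resolves the site-relative href against the page URL;
    # dict.fromkeys drops repeats (the same team can appear in several
    # tables) while keeping first-seen order.
    return list(dict.fromkeys(urljoin(page_url, a["href"]) for a in links))


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# data-stat sits on the <th>, so no <a> carries it and find() returns None;
# calling .find_parent('tr') on that None is what raises the AttributeError.
print(soup.find("a", {"data-stat": "squad"}))
print(team_urls(SAMPLE_HTML))
```

Using urljoin rather than BASE_URL + href matters here because the hrefs start with /en/..., so plain concatenation would append them to the full page path instead of the site root.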

Edited at 2021-09-22
