I am trying to find and print all the h3 tags that contain the months I am interested in. To do this, I loop over my bs4 object (head) and use an if statement inside the loop to print each row that satisfies the condition, which in this case is that a string (the month) is in the row. The problem is that even though the months I specified exist in the rows of the bs4 object, nothing is printed by my if statement.
I have tried adding the year to the months, and this seemed to solve the issue, though it is not ideal. Additionally, I tested the logic behind my method by manually making a short list of some of the rows and running the for loop over that list instead of the bs4 object (head).
import requests
from bs4 import BeautifulSoup
page=requests.get('https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/')
soup=BeautifulSoup(page.text,'html.parser')
text=soup.find(class_='rich-text')
head = text.find_all('h3')
for row in head:
    for r1 in ['January','February']:
        if r1 in row:
            print(row)
        else:
            continue
The expected results are:
<h3>February 2019</h3>
<h3>January 2019</h3>
The actual result is that nothing is printed at all.
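A likely explanation, sketched on a small inline sample rather than the live page: applying `in` to a bs4 Tag checks the tag's list of children, so a month only matches when it equals a whole child string such as "February 2019". That would also explain why adding the year appeared to work. Testing against the tag's text does a substring search instead. The markup and the names `headings`, `direct` and `matched` are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<div class="rich-text">
  <h3>February 2019</h3>
  <h3>January 2019</h3>
  <h3>December 2018</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
headings = soup.find(class_="rich-text").find_all("h3")
months = ["January", "February"]

# `month in tag` compares the month against the tag's children, so only an
# exact match of a whole child string ("February 2019") would succeed.
direct = [h for h in headings if any(m in h for m in months)]

# Checking the tag's text performs a substring search instead.
matched = [h for h in headings if any(m in h.get_text() for m in months)]

for h in matched:
    print(h)
```

Here `direct` stays empty while `matched` holds the February and January headings.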
Another way of getting the DTOC monthly publications, using bs4 4.7.1:
import requests
from bs4 import BeautifulSoup as bs
url = 'https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/'
r = requests.get(url)
soup = bs(r.content, 'lxml')
publications = [item.next_sibling.next_sibling.text for item in soup.select('#main-content p:has(+h3)')][1:]
print(publications)
For the page, the selector #main-content p:has(+h3) filters for p tags that sit inside the element with id main-content and are immediately followed by a sibling h3 tag. The [1:] slice ignores the first item in the returned list, as this is not a month but the Statistical Press Notice header.
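The selector can be tried offline on a small sample, as a minimal sketch. The markup below is a hypothetical stand-in for the real page, and `find_next_sibling('h3')` is swapped in for `next_sibling.next_sibling` so the result does not depend on whitespace nodes between the tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the layout described above.
html = """
<div id="main-content">
  <p>Intro</p>
  <h3>Statistical Press Notice</h3>
  <p>Notice text</p>
  <h3>February 2019</h3>
  <p>Links</p>
  <h3>January 2019</h3>
  <p>Links</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <p> under #main-content that is immediately followed by an <h3>.
paragraphs = soup.select("#main-content p:has(+h3)")

# Take the heading after each matched paragraph, then drop the first
# heading, which is the press notice rather than a month.
publications = [p.find_next_sibling("h3").get_text() for p in paragraphs][1:]

print(publications)
```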