Is there any reason why my if statement for finding text in a bs4 tag element fails?

KvothesLute

I am trying to find and print all the h3 tags that contain the months I am interested in. To do this, I loop over my bs4 result (head) and, inside the loop, use an if statement to print each row that satisfies the condition, which in this case is that a string (the month) is in the row. The problem is that even though the months I specified exist in the bs4 object's rows, the if statement never prints them.

I tried adding the year to the months, and this seemed to solve the issue, though it is not ideal. I also tested the logic behind my method by manually making a short list of some of the rows and running the for loop over that list instead of the bs4 object (head).

import requests
from bs4 import BeautifulSoup

page=requests.get('https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/')

soup=BeautifulSoup(page.text,'html.parser')
text=soup.find(class_='rich-text')
head = text.find_all('h3')

for row in head:
    for r1 in ['January','February']:
        if r1 in row:
            print(row)
        else:
            continue

The expected results are <h3>February 2019</h3> and <h3>January 2019</h3>.

The actual result is empty: nothing is printed.

QHarr

Another way of getting the DTOC monthly publications, using bs4 4.7.1 (the :has() selector needs bs4 4.7 or later):

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/'

r = requests.get(url)
soup = bs(r.content, 'lxml')
publications = [item.next_sibling.next_sibling.text for item in soup.select('#main-content p:has(+h3)')][1:]
print(publications)

For the page:

#main-content p:has(+h3)

selects p tags that are descendants of the element with id main-content and are immediately followed by a sibling h3 tag. The [1:] slice drops the first item in the returned list, as this is not a month but the Statistical Press Notice header.
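As for why the original if statement prints nothing: on a bs4 Tag, `x in tag` checks whether `x` is one of the tag's children (`tag.contents`), not whether it is a substring of the tag's text. That would explain why 'February' fails while 'February 2019' matches. A minimal sketch of a substring check on `get_text()` instead, using a small inline HTML snippet as a hypothetical stand-in for the real page so it runs offline:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's markup (not the real page content).
html = """
<div class="rich-text">
  <h3>Statistical Press Notice</h3>
  <h3>February 2019</h3>
  <h3>January 2019</h3>
  <h3>December 2018</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
head = soup.find(class_="rich-text").find_all("h3")
months = ["January", "February"]

# Original approach: `m in row` tests membership in row.contents,
# so a month only matches if a child equals the whole string.
direct = [row for row in head if any(m in row for m in months)]

# Substring test against the tag's text instead.
matched = [row for row in head if any(m in row.get_text() for m in months)]

for row in matched:
    print(row)
```

Here `direct` stays empty, while `matched` holds the February and January headings. Comparing against `row.get_text()` (or `row.text`) keeps the month list free of years.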


Edited at 2021-05-18
