Is there any reason why my if statement for finding text in a bs4 tag element fails?

KvothesLute

I am trying to find and print all the h3 tags that contain the months I am interested in. To do this, I loop over my bs4 result (head) and, inside the loop, use an if statement to print each row that satisfies the condition, which in this case is that a string (the month) is in the row. The problem is that even though the months I specified exist in the rows, nothing is printed by my if statement.

I have tried adding the year to the months, and this seemed to solve the issue, though it is not ideal. Additionally, I tested the logic behind my method by manually making a short list of some of the rows and running the for loop over that list instead of the bs4 object (head).

import requests
from bs4 import BeautifulSoup

page=requests.get('https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/')

soup=BeautifulSoup(page.text,'html.parser')
text=soup.find(class_='rich-text')
head = text.find_all('h3')

for row in head:
    for r1 in ['January','February']:
        if r1 in row:
            print(row)
        else:
            continue

The expected results are <h3>February 2019</h3> <h3>January 2019</h3>

I am getting no results: nothing is printed.

QHarr

Another way of getting the DTOC monthly publications, using bs4 4.7.1:

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/'

r = requests.get(url)
soup = bs(r.content, 'lxml')
publications = [item.next_sibling.next_sibling.text for item in soup.select('#main-content p:has(+h3)')][1:]
print(publications)

For the page:

#main-content p:has(+h3)

filters for p tags, inside the element with id main-content, that are immediately followed by a sibling h3 tag. The [1:] ignores the first item in the returned list, as this is not a month but the Statistical Press Notice header.
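As for the loop in the question itself, a likely explanation is that `in` applied to a bs4 Tag tests membership in the tag's list of children rather than doing a substring search, so "January" does not match a child string "January 2019", while the full "January 2019" does. The sketch below illustrates this on a small inline HTML sample (a hypothetical stand-in for the page, not its real markup) and compares the original check with a substring test on `get_text()`. The names `html`, `months`, `direct_hits` and `text_hits` are made up for the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's headings, so the sketch runs offline.
html = """
<div class="rich-text">
  <h3>Statistical Press Notice</h3>
  <h3>March 2019</h3>
  <h3>February 2019</h3>
  <h3>January 2019</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
headings = soup.find(class_="rich-text").find_all("h3")
months = ("January", "February")

# Original check: `m in h` looks for a child of the tag equal to m,
# so a partial string such as "January" never matches.
direct_hits = [h for h in headings if any(m in h for m in months)]

# Substring test against the tag's text instead.
text_hits = [h for h in headings if any(m in h.get_text() for m in months)]

for h in text_hits:
    print(h)
```

On this sample, `direct_hits` stays empty while `text_hits` holds the February and January headings. That matches the observation in the question that adding the year to the search string made the original check succeed.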

