In beautifulsoup4, when scraping a website purely based off of an element and the text within, how do you return more than one result?

Senuvox

I'm currently at the end of my rope dealing with a frustrating program, and I'm posting here for help for the first time. Using beautifulsoup4, I'm attempting to scrape a website that has no reliable HTML classes or IDs to work with. All I have is the anchor element and its text. For the example I'm providing below, I am attempting to grab the phrase "Where the Red Fern Grows" using only the lowercase text "red fern". In short, I want to identify and collect/print the text of each unclassified/unidentified anchor element that contains the phrase "Where the Red Fern Grows", without having to type the entire string, and the match should be case insensitive.

I've tried a multitude of things so far, with my greatest success being only a half measure. I was able to successfully collect the very first anchor element that contained 'WTRFG'. Unfortunately, despite my best efforts, that's about as much as I've been able to get. I've used both find and find_all, tried to use re.search with regex, and tried a number of other things I found in other Stack Overflow answers. No dice. Here's what I have right now.

import bs4
import requests
import re
import pretty_errors

url = "http://fake.site/search.php?req=where+the+red+fern+grows&lg_topic=fakesite&open=0&view=simple&res=25&phrase=1&column=def"
page = requests.get(url)
fernSoup = bs4.BeautifulSoup(page.content, "html.parser")
redFern = "red fern"

print(type(fernSoup))
print(type(redFern))

anchor = fernSoup.find_all("a", class_=False, text=lambda text: text and redFern in text.lower())

print(anchor)

Which outputs as:

<class 'bs4.BeautifulSoup'>
<class 'str'>
[<a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a>]

# This is only the first of three different results, but the only one I can access usually. The other two contain the exact same structure, minus differences in the href url and ID number.

Any advice would be greatly appreciated, and thank you for taking the time to read my post.

Edit: The three anchors I am attempting to access, copied and pasted directly from the result of print(fernSoup):

<td width="500"><a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a></td>

<td width="500"><a href="book/index.php?md5=3C96145628CC4759595FB3C1A767673A" id="1157998" title="">Where the Red Fern Grows<br/> <font color="green" face="Times"><i>0553274295</i></font></a></td>

<td width="500"><a href="book/index.php?md5=9DD3079644E043E530682DA95C95B999" id="2413155" title="">Where the Red Fern Grows: The Story of Two Dogs and a Boy<br/> <font color="green" face="Times"><i>978-0-307-78156-7, 0307781569, 0553274295, 9780440412670</i></
Andrej Kesely

To select multiple <a> tags with the text "red fern", you can pass find_all a function that checks each tag's full text:

from bs4 import BeautifulSoup

html_doc = """
 <td width="500"><a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a></td> <td width="500"><a href="book/index.php?md5=3C96145628CC4759595FB3C1A767673A" id="1157998" title="">Where the Red Fern Grows<br/> <font color="green" face="Times"><i>0553274295</i></font></a></td> 
"""

fernSoup = BeautifulSoup(html_doc, "html.parser")
redFern = "red fern"

anchor = fernSoup.find_all(
    lambda tag: tag.name == "a" and redFern in tag.text.lower()
)

print(anchor)

Prints:

[<a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a>, <a href="book/index.php?md5=3C96145628CC4759595FB3C1A767673A" id="1157998" title="">Where the Red Fern Grows<br/> <font color="green" face="Times"><i>0553274295</i></font></a>]
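As for why the original filter in the question returned only the first anchor: as far as I understand it, a string filter on find_all is tested against each tag's .string, and .string is None whenever a tag has more than one child. The second and third anchors contain <br/> and <font> besides the title, so they never get as far as the "red fern" check. A minimal sketch of that behaviour, reusing the two anchors from above (the variable names are only for illustration, and string= is used here as the newer spelling of text=):

```python
from bs4 import BeautifulSoup

html_doc = """
<td width="500"><a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a></td>
<td width="500"><a href="book/index.php?md5=3C96145628CC4759595FB3C1A767673A" id="1157998" title="">Where the Red Fern Grows<br/> <font color="green" face="Times"><i>0553274295</i></font></a></td>
"""

soup = BeautifulSoup(html_doc, "html.parser")
first, second = soup.find_all("a")

# The first anchor has a single text child, so .string returns that text.
print(repr(first.string))
# The second anchor has several children (text, <br/>, <font>), so .string is None.
print(repr(second.string))

# Filtering on the tag's string only sees anchors whose .string is not None.
string_matches = soup.find_all(
    "a", string=lambda s: s and "red fern" in s.lower()
)

# Filtering with a tag function can look at all of the text inside the tag.
text_matches = soup.find_all(
    lambda tag: tag.name == "a" and "red fern" in tag.get_text().lower()
)

print(len(string_matches), len(text_matches))
```

That is why switching from a string filter to a function that receives the whole tag picks up the anchors with nested markup.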

Or use a CSS selector (note that this match is case sensitive):

print(fernSoup.select('a:-soup-contains("Red Fern")'))

Prints:

[<a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a>, <a href="book/index.php?md5=3C96145628CC4759595FB3C1A767673A" id="1157998" title="">Where the Red Fern Grows<br/> <font color="green" face="Times"><i>0553274295</i></font></a>]
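Since the question asked for the text of each match and not the whole tag, one possible follow-up is to keep only the first piece of text in each matching anchor, which leaves out the ISBN line inside <font>. This is a rough sketch, assuming the title is always the first string inside the anchor (true for the anchors shown here, but only an assumption for the rest of the page). matching_titles is a hypothetical helper name, not part of beautifulsoup4.

```python
import re

from bs4 import BeautifulSoup

html_doc = """
<td width="500"><a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a></td>
<td width="500"><a href="book/index.php?md5=3C96145628CC4759595FB3C1A767673A" id="1157998" title="">Where the Red Fern Grows<br/> <font color="green" face="Times"><i>0553274295</i></font></a></td>
"""


def matching_titles(html, phrase):
    """Return the leading text of every <a> whose full text contains phrase.

    Hypothetical helper for illustration; the match ignores case.
    """
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    titles = []
    for a in soup.find_all("a"):
        if pattern.search(a.get_text()):
            # stripped_strings yields the text nodes in document order,
            # skipping whitespace-only ones; take the first (assumed title).
            titles.append(next(a.stripped_strings, ""))
    return titles


print(matching_titles(html_doc, "red fern"))
# ['Where The Red Fern Grows', 'Where the Red Fern Grows']
```

Because the phrase is compiled with re.IGNORECASE, "red fern", "Red Fern" and "RED FERN" all give the same result, unlike the CSS selector above.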
