I keep running into a cookie agreement page...
What I am doing:
import requests
from bs4 import BeautifulSoup  # this import was missing; BeautifulSoup is used below
url = "https://stockhouse.com/community/bullboards/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
print(soup)
This returns the HTML of a cookie agreement page. I want to bypass that page and scrape the content of the actual page, as it appears once the cookies are accepted...
I tried the code from this question:
cookies = dict(BCPermissionLevel='PERSONAL')
html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, cookies=cookies)
but I still get the HTML of the cookie page.
Note: I succeeded using Selenium, but Selenium is a very inefficient last resort...
For this site, it's enough to send a "dummy" privacy-policy cookie:
import requests
from bs4 import BeautifulSoup
url = "https://stockhouse.com/community/bullboards/"
cookies = {
    'privacy-policy': '1,XXXXXXXXXXXXXXXXXXXXXX'
}
r = requests.get(url, cookies=cookies)
soup = BeautifulSoup(r.content, "html.parser")
for h3 in soup.select('h3'):
    print(h3.get_text(strip=True))
Prints the titles:
Perfect timing: Mach offer no good as per AMF
'Explosive' Move Up Next Week"
Repsol/ Tullow
Assessment
$5.96
Possible Deal?
Massive Investor(s) Buys Over 1 Million JE Shares Last Close
This CEO is really on the ball , right flubber
slow bb
Situation
Loadddddd
Numerology of the number 36
TIMBERRRR!!.. it will go down fast to $1.50
Employees in the know do the right thing Whistelblow
News finally
Will be bought out...halt coming
Green today
Somebody is buying
re re :350 mil is not enough
And Trump fk up another day
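If you need to request several pages, the same dummy cookie can be set once on a requests.Session so that it is sent with every request. The following is only a sketch under assumptions: the helper name make_session and the User-Agent header are illustrative additions, and the cookie value is the same placeholder used in the answer above, not a value the site is known to require.

```python
import requests

URL = "https://stockhouse.com/community/bullboards/"
# Placeholder value copied from the answer above; the site appears to accept a dummy.
COOKIE_VALUE = "1,XXXXXXXXXXXXXXXXXXXXXX"


def make_session(cookie_value=COOKIE_VALUE):
    """Return a Session that sends the privacy-policy cookie on every request.

    make_session is a hypothetical helper name, not part of requests.
    """
    session = requests.Session()
    # Store the cookie in the session's cookie jar once, instead of
    # passing cookies=... to each individual requests.get call.
    session.cookies.set("privacy-policy", cookie_value)
    # Assumption: a browser-like User-Agent, as in the question's attempt.
    session.headers.update({"User-Agent": "Mozilla/5.0"})
    return session
```

With this in place, session = make_session() followed by session.get(URL).content can be passed to BeautifulSoup exactly as in the answer, and later requests on the same session reuse the cookie.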