How can I bypass a cookie agreement page while web scraping using Python?

Vincent Labrecque

I ran into a cookie agreement page...

What I am doing:

import requests
from bs4 import BeautifulSoup

url = "https://stockhouse.com/community/bullboards/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
print(soup)

which returns the HTML of a cookie agreement page. What I want is to get past this page and scrape the content of the actual page, as if the cookies had been accepted...

I tried the code from this question:

cookies = dict(BCPermissionLevel='PERSONAL')
html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, cookies=cookies)

but I still get the HTML of the cookie agreement page.

Note: I succeeded using Selenium, but Selenium is a very inefficient last resort...

Andrej Kesely

For this site, it's enough to send a "dummy" privacy-policy cookie:

import requests
from bs4 import BeautifulSoup

url = "https://stockhouse.com/community/bullboards/"

cookies = {
    'privacy-policy': '1,XXXXXXXXXXXXXXXXXXXXXX'
}

r = requests.get(url, cookies=cookies)
soup = BeautifulSoup(r.content, "html.parser")

for h3 in soup.select('h3'):
    print(h3.get_text(strip=True))

Prints the titles:

Perfect timing: Mach offer no good as per AMF
'Explosive' Move Up Next Week"
Repsol/ Tullow
Assessment
$5.96
Possible Deal?
Massive Investor(s) Buys Over 1 Million JE Shares Last Close
This CEO is really on the ball , right flubber
slow bb
Situation
Loadddddd
Numerology of the number 36
TIMBERRRR!!.. it will go down fast to $1.50
Employees in the know do the right thing Whistelblow
News finally
Will be bought out...halt coming
Green today
Somebody is buying
re re :350 mil is not enough
And Trump fk up another day
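
If you need to request several pages, it may be more convenient to attach the cookie to a requests.Session once instead of passing cookies= on every call. The following is only a minimal sketch: the cookie name and value are the ones used above, while make_session, fetch_html, the User-Agent string and the timeout are illustrative choices of mine, not anything the site requires.

```python
import requests

# Cookie name and value taken from the snippet above; the value looks like an
# arbitrary placeholder that the site happens to accept (an assumption).
CONSENT_COOKIES = {"privacy-policy": "1,XXXXXXXXXXXXXXXXXXXXXX"}


def make_session(cookies=CONSENT_COOKIES, user_agent="Mozilla/5.0"):
    """Return a Session that sends the consent cookie with every request."""
    session = requests.Session()
    session.cookies.update(cookies)  # stored in the session's cookie jar
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session, url, timeout=10):
    """GET a page through the prepared session and return its HTML text."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


# Example usage (performs a real network request, so it is left commented out):
# session = make_session()
# html = fetch_html(session, "https://stockhouse.com/community/bullboards/")
```

The returned HTML can then be passed to BeautifulSoup exactly as in the code above. Whether the site keeps accepting a placeholder value is not guaranteed, so treat the cookie as something that may need updating.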
