Why does BeautifulSoup give me the wrong text?

Jem

I've been trying to get the availability status of a product on IKEA's website. The website says (in Dutch, translated here): 'not available for delivery', 'only available in the shop', 'not in stock' and 'you've got 365 days to change your mind'.

But my code gives me: 'not available for delivery', 'only available for order and pickup', 'checking inventory' and 'you've got 365 days to change your mind'.

What am I doing wrong that makes the text differ from what the website shows?

This is my code:

import requests
from bs4 import BeautifulSoup

# Get the url of the IKEA page and set up the bs4 stuff
url = 'https://www.ikea.com/nl/nl/p/flintan-bureaustoel-vissle-zwart-20336841/'
thepage = requests.get(url)
soup = BeautifulSoup(thepage.text, 'lxml')

# Locate the part where the availability stuff is
availabilitypanel = soup.find('div', {'class' : 'range-revamp-product-availability'})

# Get the text of the things inside of that panel
availabilitysectiontext = [part.getText() for part in availabilitypanel]
print(availabilitysectiontext)
Rajesh Alwar

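The availability text on that page is filled in by JavaScript after the initial server response. requests only downloads that initial HTML and BeautifulSoup only parses it; neither of them executes JavaScript, so you get the placeholder text the server sends (such as 'checking inventory') instead of what the browser shows once its scripts have run. If you want the JavaScript to run, you'll need a headless browser. Otherwise, you'll have to reverse-engineer the JavaScript and see what it does, for example which request it makes to fetch the stock data.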
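To see the effect in isolation, here is a small sketch using a hypothetical HTML snippet (not IKEA's real markup): the span holds placeholder text, and an inline script would overwrite it in a real browser. BeautifulSoup parses the markup but never runs the script, so it returns the placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical server response (not IKEA's actual markup): the stock message
# is a placeholder that the inline script would overwrite in a browser.
html = """
<div class="range-revamp-product-availability">
  <span id="stock">Checking inventory</span>
</div>
<script>
  document.getElementById("stock").textContent = "Not in stock";
</script>
"""

soup = BeautifulSoup(html, "html.parser")  # parses only; never runs <script>
stock_text = soup.find("span", id="stock").get_text(strip=True)
print(stock_text)  # Checking inventory
```

requests plus BeautifulSoup always sees this first version of the page; a browser driven by Selenium runs the scripts before you read its page_source.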

You could get this to work with Selenium. I modified your code a bit and got it to work.

Get Selenium:

pip3 install selenium

Install Firefox + geckodriver or Chrome + chromedriver (Selenium 4.6 and later can also download the matching driver automatically), then run:

from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Get the url of the IKEA page
url = 'https://www.ikea.com/nl/nl/p/flintan-bureaustoel-vissle-zwart-20336841/'

# Uncomment the following lines if using Firefox + geckodriver (downloaded from https://github.com/mozilla/geckodriver/releases)
#from selenium.webdriver.firefox.service import Service as FirefoxService
#driver = webdriver.Firefox(service=FirefoxService('/Users/ralwar/Downloads/geckodriver'))

# Using Chrome + chromedriver (downloaded from https://chromedriver.chromium.org/downloads).
# Selenium 4 takes the driver path via a Service object; executable_path was removed in 4.10.
op = webdriver.ChromeOptions()
op.add_argument('--headless')
driver = webdriver.Chrome(service=Service('/Users/ralwar/Downloads/chromedriver'), options=op)

try:
    driver.get(url)
    time.sleep(5)  # give the page and its JavaScript time to finish loading; adjust as needed
    source = driver.page_source
finally:
    driver.quit()  # close the browser even if loading fails

soup = BeautifulSoup(source, 'lxml')

# Locate the part where the availability stuff is
availabilitypanel = soup.find('div', {'class': 'range-revamp-product-availability'})
if availabilitypanel is None:
    raise SystemExit('Availability panel not found; the page layout may have changed.')

# Get the text of the things inside of that panel
availabilitysectiontext = [part.getText() for part in availabilitypanel]
print(availabilitysectiontext)

The above code prints:

['Niet beschikbaar voor levering', 'Alleen beschikbaar in de winkel', 'Niet op voorraad in Amersfoort', 'Je hebt 365 dagen om van gedachten te veranderen. ']
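Iterating over a Tag, as both versions of the code do with `for part in availabilitypanel`, yields its direct children, which can include whitespace-only strings between tags depending on how the page is formatted. A slightly more defensive sketch, using a hypothetical snippet modelled on the output above (IKEA's real markup may be structured differently), keeps only the child tags and joins the strings inside each one with a space:

```python
from bs4 import BeautifulSoup

# Hypothetical rendered markup modelled on the output above;
# the real IKEA structure may differ.
html = """
<div class="range-revamp-product-availability">
  <div>Niet beschikbaar voor levering</div>
  <div>Alleen beschikbaar in de winkel</div>
  <div>Niet op voorraad in <b>Amersfoort</b></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
panel = soup.find("div", class_="range-revamp-product-availability")

if panel is None:
    texts = []  # panel missing, e.g. because the page layout changed
else:
    # recursive=False returns only direct child tags, so whitespace strings are skipped;
    # the " " separator keeps "in" and "Amersfoort" apart after stripping.
    texts = [child.get_text(" ", strip=True) for child in panel.find_all(recursive=False)]

print(texts)
# ['Niet beschikbaar voor levering', 'Alleen beschikbaar in de winkel', 'Niet op voorraad in Amersfoort']
```

By comparison, `list(panel.stripped_strings)` would split the last message into 'Niet op voorraad in' and 'Amersfoort'; collecting text per child element keeps one entry per message.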
