Scraping the link of an embedded YouTube video

Mel Published at Dev


I'm trying to scrape a web page (the page I'm trying to crawl). The data I want to collect is the link of the YouTube video embedded in the page. The problem is that urllib2 can't execute the JavaScript, so the link doesn't appear in the HTML my code retrieves:

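import urllib2
OPENER = urllib2.build_opener()  # assumed: stands in for the crawler's existing opener
response = OPENER.open("https://www.hopenglish.com/how-sugar-affects-the-brain?ref=category")
html_text = response.read()
print html_text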

Is there a way to retrieve this link without adding another library to my scraper? (Almost all of my crawler is already implemented; I just need the YouTube link of the embedded video.)

Naveen Kumar R B

After going through the entire HTML response, I found the lead: the YouTube video ID is assigned in inline JavaScript inside a script tag.

Part of the HTML response (the part that contains the video ID):

<script type="text/javascript" language="javascript">
    var vID = "lEXBxijQREo";
    var srt_name = "sugaraffectsbrain";
    var user_id = 0;
    var post_id = 8349;
    var share_link = 'https://www.hopenglish.com/how-sugar-affects-the-brain';
    var share_img_link = 'https://s3-ap-northeast-1.amazonaws.com/hopenglish/wp/wp-content/uploads/2014/10/how-sugar-affects-the-brain.jpg';
</script>

From the HTML response above, retrieve the vID value with a regular expression as follows:

import urllib2
import re

response = urllib2.urlopen("https://www.hopenglish.com/how-sugar-affects-the-brain?ref=category")
html_text = response.read() 
# print html_text

# The parentheses capture the value between the quotes; group(1) is the ID alone.
m = re.search(r'vID = "(.*?)"', html_text)
print m.group(1)

which yields:

lEXBxijQREo

You can then append the vID value lEXBxijQREo to the YouTube watch URL as follows:

https://www.youtube.com/watch?v=lEXBxijQREo
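
The extraction and URL-building steps can be wrapped in two small helpers. This is only a sketch: the names extract_video_id and build_watch_url are hypothetical, the pattern assumes the page keeps the `var vID = "..."` assignment shown above, and the sample HTML is a shortened copy of that script tag so the example runs without a network request.

```python
import re

# Assumption: the page assigns the ID as `var vID = "..."` in an inline script.
# Single or double quotes are accepted; the ID is letters, digits, "_" or "-".
VID_PATTERN = re.compile(r'var\s+vID\s*=\s*["\']([\w-]+)["\']')


def extract_video_id(html_text):
    """Return the video ID found in the inline script, or None if absent."""
    match = VID_PATTERN.search(html_text)
    return match.group(1) if match else None


def build_watch_url(video_id):
    """Build the YouTube watch URL for a video ID."""
    return "https://www.youtube.com/watch?v=" + video_id


# Shortened copy of the script tag from the response above.
sample_html = '''
<script type="text/javascript" language="javascript">
    var vID = "lEXBxijQREo";
    var srt_name = "sugaraffectsbrain";
</script>
'''

video_id = extract_video_id(sample_html)
if video_id is not None:
    print(build_watch_url(video_id))  # https://www.youtube.com/watch?v=lEXBxijQREo
```

Returning None instead of raising lets the crawler skip pages that have no embedded video. With urllib2 on Python 2, the string from response.read() can be passed to extract_video_id directly; on Python 3 the response body is bytes and would need decoding first.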


Edited at 2021-04-13
