Scraping the link of an embedded YouTube video

mel

I'm trying to scrape a page on a website (the URL is in the code below). The data I want to collect is the link of the YouTube video embedded in the page. The problem is that urllib2 does not execute JavaScript, so the link never appears in the HTML I get back:

response = OPENER.open("https://www.hopenglish.com/how-sugar-affects-the-brain?ref=category")
html_text = response.read() 
print html_text

Is there a way to retrieve this link without adding another library to my scraper? (Almost all of my crawler is already implemented; I just need the YouTube link of the embedded video.)

Naveen Kumar R B

After going through the entire HTML response, I found the lead: the YouTube video ID is set in inline JavaScript, inside a script tag.

Part of the HTML response (the part that gives the video ID):

<script type="text/javascript" language="javascript">
                var vID = "lEXBxijQREo";
                var srt_name = "sugaraffectsbrain";
                var user_id = 0;
                var post_id = 8349;
                var share_link = 'https://www.hopenglish.com/how-sugar-affects-the-brain';
                var share_img_link = 'https://s3-ap-northeast-1.amazonaws.com/hopenglish/wp/wp-content/uploads/2014/10/how-sugar-affects-the-brain.jpg';
            </script>

From the HTML response above, retrieve the vID value using a regular expression as follows:

import urllib2
import re

response = urllib2.urlopen("https://www.hopenglish.com/how-sugar-affects-the-brain?ref=category")
html_text = response.read() 
# print html_text

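# Capture only the quoted value that follows "vID = "
m = re.search(r'vID = "(.*?)"', html_text)
if m:
    # group(1) is the captured ID; group(0) would be the whole match
    print m.group(1)

which yields:

lEXBxijQREo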

You can then append the vID value lEXBxijQREo to the YouTube watch URL as follows:

https://www.youtube.com/watch?v=lEXBxijQREo
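
The two steps (extract the ID, then build the link) can be combined in one small helper. This is only a minimal sketch: the function names extract_video_id and build_watch_url are hypothetical, and it assumes the page keeps declaring the ID as var vID = "..."; in an inline script. It works on a sample string rather than fetching the page, so it adds no dependency beyond re.

```python
import re

# Assumption: the page declares the ID as  var vID = "...";  in an inline script.
# The character class covers the letters, digits, "_" and "-" used in YouTube IDs.
VID_PATTERN = re.compile(r'var\s+vID\s*=\s*["\']([A-Za-z0-9_-]+)["\']')


def extract_video_id(html_text):
    """Return the video ID found in html_text, or None if it is absent."""
    match = VID_PATTERN.search(html_text)
    return match.group(1) if match else None


def build_watch_url(video_id):
    """Build the YouTube watch URL for a video ID."""
    return "https://www.youtube.com/watch?v=" + video_id


# Sample input copied from the HTML fragment shown in the answer.
sample_html = '''
<script type="text/javascript" language="javascript">
    var vID = "lEXBxijQREo";
    var srt_name = "sugaraffectsbrain";
</script>
'''

video_id = extract_video_id(sample_html)
if video_id is not None:
    print(build_watch_url(video_id))
```

In the crawler, html_text from response.read() would be passed in place of sample_html. Returning None when nothing matches lets the crawler skip pages without a video instead of raising an AttributeError on a failed match. If the code is run under Python 3, the bytes returned by read() would need to be decoded to a string first.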
