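I'm trying to extract a link from a script tag on a website. For some reason, my regular expression currently returns the entire code block.
Here is the content of the script tag I want to get the link from: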
<script type="text/javascript">
var key = '';
var url = 'http://stream1.song365.me/h1/20160129/1772422101/The%20Beatles%20-%20There%27s%20a%20Place%20%28Studio%20Outtake%20Takes%205%20%26%206%29_(song365.cc).mp3';
var hqurl = 'http://stream1.song365.me/h1/20160129/1772422101/The%20Beatles%20-%20There%27s%20a%20Place%20%28Studio%20Outtake%20Takes%205%20%26%206%29_(song365.cc).mp3';
$(document).ready(function(){
$("div[rel='digg']").click(function(){
var method = $(this).attr("method");
var v = parseInt($(this).find('em').html());
var p = this;
$.post("/track/digg/2788951/" + method, function(data){
if(data.status==0)
{
alert("today you have been digg it!")
}
else
{
$(p).find('em').html(data.number);
}
}, "JSON")
})
if(url.length!=0)
{
$("#download-link").attr("href", url + "?key=" + key).css("display","");;
}
if(hqurl.length!=0)
{
$("#download-link-hq").attr("href", hqurl + "?key=" + key).css("display","");
}
});
</script>
This is the code I have so far:
request = requests.get(dummy_link)
data = request.text
soup = BeautifulSoup(data, 'html.parser')
link = soup.findAll(text=re.compile('var hqurl.*?mp3'))
It returns the entire script tag, but I only want the link that is assigned to the hqurl variable.
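The whole block comes back because a pattern passed as text= acts as a filter on text nodes: BeautifulSoup returns every string that contains a match, not the matched substring. The following is a minimal sketch of that behaviour, assuming a shortened stand-in for the page (the html string and the example.com URL are placeholders, not the real page). It uses string=, the newer name for the text= argument.

```python
import re
from bs4 import BeautifulSoup

# Shortened stand-in for the page; the URL is a placeholder.
html = """<script type="text/javascript">
var key = '';
var hqurl = 'http://example.com/high/track.mp3';
</script>"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"var hqurl = '(.*?mp3)';")

# The pattern only selects text nodes; each result is a complete node.
nodes = soup.find_all(string=pattern)
print(len(nodes))
print(nodes[0] == soup.script.string)  # the node is the entire script body

# Searching inside that node is what isolates the URL.
hq_link = pattern.search(nodes[0]).group(1)
print(hq_link)
```

So the regex has to be applied a second time, to the text of the node that was found, to get just the link.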
Current code, with help from @alecxe:
request = requests.get('https://www.song365mp3.biz/download/the-beatles-there039s-a-place-studio-outtake-takes-5-amp-6-2788951.html')
data = request.text
soup = BeautifulSoup(data, 'html.parser')
pattern = re.compile("var hqurl = '(.*?mp3)';$", re.MULTILINE | re.DOTALL)
link = soup.find("script", text=pattern)
print(pattern.search(link.text).group(1))
But it throws an error:
print((link.text).group(1))
AttributeError: 'ResultSet' object has no attribute 'text'
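The ResultSet named in the message is the list-like object that find_all() (or findAll()) returns, so this error most likely means the code that actually ran still called findAll rather than find. Below is a small sketch of the difference, assuming a made-up one-line page with a placeholder URL:

```python
import re
from bs4 import BeautifulSoup

# Made-up one-line page; the URL is a placeholder.
html = "<script>var hqurl = 'http://example.com/high/track.mp3';</script>"
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"var hqurl = '(.*?mp3)';")

# find_all() returns a ResultSet (a list of tags), which has no .text of its own.
all_scripts = soup.find_all("script", string=pattern)
# find() returns the first matching Tag, or None when nothing matches.
first_script = soup.find("script", string=pattern)

print(type(all_scripts).__name__)
print(type(first_script).__name__)

# Either index into the ResultSet or use the Tag from find().
link = pattern.search(all_scripts[0].string).group(1)
print(link)
```

Because find() can return None, it may be worth checking the result before calling .text or .string on it.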
Precompile the pattern and reuse it, both to find the element and to extract the link:
pattern = re.compile("var hqurl = '(.*?mp3)';", re.MULTILINE | re.DOTALL)
link = soup.find("script", text=pattern)
print(pattern.search(link.text).group(1))
Note that I've improved the expression and added a capturing group, which saves the actual link in a group that you then access via .group(1).
Prints:
http://stream1.song365.me/h1/20160129/1772422101/The%20Beatles%20-%20There%27s%20a%20Place%20%28Studio%20Outtake%20Takes%205%20%26%206%29_(song365.cc).mp3
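To make the role of the capturing group concrete, here is a short sketch using only the re module. The line variable is a hypothetical one-line stand-in for the script text, with a placeholder URL.

```python
import re

# Hypothetical single line of script text; the URL is a placeholder.
line = "var hqurl = 'http://example.com/high/track.mp3';"
pattern = re.compile(r"var hqurl = '(.*?mp3)';")

match = pattern.search(line)
print(match.group(0))  # the whole match, including "var hqurl = " and the quotes
print(match.group(1))  # only the part inside the parentheses, i.e. the URL
```

group(0) is always the full match, so without the parentheses there would be no way to get the URL without the surrounding JavaScript.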