My code is:
html_doc = "file:///C:/Users/Me/Desktop/Convert%20URL%20to%20HTML%20Link.html"
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.p)
Using soup.a / soup.p / soup.title also returns None, even though I am sure the HTML document contains at least some of these elements.
Here is the actual URL of the HTML document: https://www.textfixer.com/html/convert-url-to-html-link.php
Assuming the HTML is a file in the directory you downloaded it to, you must open the file first, read its contents, and only then parse it. For example:
from bs4 import BeautifulSoup

file_dir = "C:/Users/Me/Desktop/Convert URL to HTML Link.html"
# Read the file's contents; the with-block closes the file automatically,
# so an explicit close() call is not needed.
with open(file_dir, "r") as files_f:
    content = files_f.read()

soup = BeautifulSoup(content, "html.parser")
selections_p = soup.find_all("p")
print(selections_p)
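The reason the original snippet returned None is that BeautifulSoup parses whatever string you hand it as markup; a bare URL string contains no tags at all, so every tag lookup fails. A minimal sketch illustrating both cases, using an inline HTML string as a stand-in for the file's contents:

```python
from bs4 import BeautifulSoup

# Passing a URL string: BeautifulSoup treats it as plain text, it does not fetch anything.
bad_soup = BeautifulSoup("file:///C:/Users/Me/Desktop/page.html", "html.parser")
print(bad_soup.p)  # None -- the string contains no <p> tag

# Passing actual HTML content works as expected.
html = "<html><body><p>first</p><p>second</p></body></html>"
soup = BeautifulSoup(html, "html.parser")
print(soup.p)                   # <p>first</p>
print(len(soup.find_all("p")))  # 2
```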
If you want to scrape a website instead, request the page first, then parse it:
import requests
from bs4 import BeautifulSoup

url_to_scrape = 'https://www.textfixer.com/html/convert-url-to-html-link.php'
# Use a Session so connection settings (cookies, headers) are reused across requests.
with requests.Session() as s_request:
    request_page = s_request.get(url_to_scrape)

# Parse the response body, not an undefined variable.
soup = BeautifulSoup(request_page.content, 'html.parser')
selections_p = soup.find_all("p")
print(selections_p)
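Once find_all returns the tags, you usually want their text or attributes rather than the raw tag objects. A short sketch (the inline HTML here is a made-up stand-in, not the actual structure of the textfixer.com page):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML; the real page's markup may differ.
html = '<p>Try it: <a href="https://example.com">example</a></p>'
soup = BeautifulSoup(html, "html.parser")

# Pull the href attribute and visible text out of each anchor tag.
for a in soup.find_all("a"):
    print(a.get("href"), a.get_text())  # https://example.com example
```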