I wrote the following code:
require "http/client"
require "myhtml"

puts "Give me the URL of the page to be scraped."
url = gets
html=<<-HTML
[Here goes the html of the website to be scraped]
HTML
myhtml = Myhtml::Parser.new(html)
myhtml.nodes(:div).each do |node|
  id = node.attribute_by("id")
  if first_link = node.scope.nodes(:a).first?
    href = first_link.attribute_by("href")
    link_text = first_link.inner_text
    puts "div with id #{id} have link [#{link_text}](#{href})"
  else
    puts "div with id #{id} have no links"
  end
end
How can I fetch the HTML of the web page I am trying to scrape, so that I can replace
html=<<-HTML
[Here goes the html of the website to be scraped]
HTML
with something like
response = requests.get(url)
html = BeautifulSoup(response.text, 'html.parser')
from the following Python code:
url = input("What is the address of the web page in question?\n")
response = requests.get(url)
html = BeautifulSoup(response.text, 'html.parser')
or
let html = reqwest::get(url).await?.text().await?;
from the following Rust code:
println!("Give me the URL of the page to be scraped.");
let mut url = String::new();
io::stdin().read_line(&mut url).expect("Failed to read line");
let html = reqwest::get(url).await?.text().await?;
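The documentation of the myhtml shard doesn't give me enough examples to work this out. Can it be done with the HTTP client from Crystal's standard library? When I replace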
html=<<-HTML
[Here goes the html of the website to be scraped]
HTML
with
response = HTTP::Client.get url
html = response.body
I get the following error:
response = HTTP::Client.get url #no overload matches 'HTTP::Client.get' with type (String | Nil)
^--
Error: no overload matches 'HTTP::Client.get' with type (String | Nil)
Overloads are:
- HTTP::Client.get(url : String | URI, headers : HTTP::Headers | ::Nil = nil, body : BodyType = nil, tls : TLSContext = nil)
- HTTP::Client.get(url : String | URI, headers : HTTP::Headers | ::Nil = nil, body : BodyType = nil, tls : TLSContext = nil, &block)
- HTTP::Client.get(url, headers : HTTP::Headers | ::Nil = nil, tls : TLSContext = nil, *, form : String | IO | Hash)
- HTTP::Client.get(url, headers : HTTP::Headers | ::Nil = nil, tls : TLSContext = nil, *, form : String | IO | Hash, &block)
Couldn't find overloads for these types:
- HTTP::Client.get(Nil)
I can get the text of the web page by hard-coding the URL, e.g. response = HTTP::Client.get "https://github.com/monero-project/monero/releases"
but that isn't enough, because I want the application to be interactive.
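You're close; it is the type system that is complaining. HTTP::Client.get expects a String (or, more precisely, String | URI). In your code, however, the url variable can also be nil: its type is String?, which is shorthand for String | Nil. If you hard-code the URL, it can never be nil and is always of type String, so the HTTP::Client.get call compiles.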
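The possible nil comes from gets, which is documented as:
def gets(chomp = true) : String?
Reads a line from this IO. A line is terminated by the \n character. Returns nil if called at the end of this IO.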
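To see the nilable return type in isolation, here is a small sketch. It assumes an IO::Memory as a stand-in for STDIN (that substitution is mine, not part of the question), so it runs without any keyboard input.

```crystal
# IO::Memory stands in for STDIN here (an assumption for illustration only).
io = IO::Memory.new("https://example.com\n")

first = io.gets  # one line is available, so this is a String
second = io.gets # the IO is exhausted, so this is nil

# The compile-time type is the union, whatever the runtime value is.
puts typeof(first)

p first  # => "https://example.com"
p second # => nil
```

Because the compiler only knows the union type, it rejects HTTP::Client.get(first) until the nil case has been ruled out.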
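There are several ways to solve this, but the basic idea is that you must make sure url cannot be nil at the point where you make the HTTP call. For example: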
url = gets
if url
  # now url cannot be nil
  response = HTTP::Client.get url
  html = response.body
  puts html
end
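The same idea can be wrapped in small helpers, which keeps the nil check in one place. The following is only a sketch: read_url and fetch_html are hypothetical names I introduce for illustration, and trimming whitespace and treating a blank line as "no URL" are my assumptions rather than anything the question requires.

```crystal
require "http/client"

# Hypothetical helper: returns the entered line with surrounding whitespace
# removed, or nil at end of input or when the line is blank.
def read_url(io : IO = STDIN) : String?
  if line = io.gets
    url = line.strip
    url.empty? ? nil : url
  end
end

# Hypothetical helper: the parameter is typed String, so the compiler
# refuses any call where the argument could still be nil.
def fetch_html(url : String) : String
  HTTP::Client.get(url).body
end

puts "Give me the URL of the page to be scraped."
if url = read_url
  # Inside this branch url is a String, never nil.
  html = fetch_html(url)
  puts html
else
  puts "No URL given."
end
```

The resulting html string can then be passed to Myhtml::Parser.new(html) in place of the heredoc from the question. Shorter variants of the same narrowing exist, such as url = gets || abort("No URL given."), or gets.not_nil!, which raises at runtime if the value is nil.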
Further reading: if var