I'm trying to use Go to parse HTML. I would like to print the HTML to the terminal, and I don't understand why this doesn't print anything:
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	r, err := http.Get("https://google.com")
	if err != nil {
		log.Panicln(err)
	}
	defer func() {
		err := r.Body.Close()
		if err != nil {
			fmt.Println(err)
		}
	}()
	node, err := html.Parse(r.Body)
	if err != nil {
		log.Panicln(err)
	}
	fmt.Println(node.Data)
}
I know there are different ways to print the HTML, but I don't understand why this in particular never prints anything, no matter what website I use. Is this intended behavior?
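Yes, this is expected. html.Parse returns the root of a node tree, and that root is a document node (html.DocumentNode) whose Data field is empty; the actual elements are its descendants. To get anything useful you have to walk the tree. For example, to print every link URL found in the page: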
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	r, err := http.Get("https://google.com")
	if err != nil {
		log.Panicln(err)
	}
	defer func() {
		err := r.Body.Close()
		if err != nil {
			fmt.Println(err)
		}
	}()

	// node is the root of the parsed tree: a document node with empty Data.
	node, err := html.Parse(r.Body)
	if err != nil {
		log.Panicln(err)
	}
	fmt.Println(node.Data) // prints an empty line

	// f visits n and all of its descendants, printing the href of every <a>.
	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					fmt.Println(a.Val)
					break
				}
			}
		}
		// Recurse into the children, left to right.
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
	f(node)
}