Multiple http.Get calls hanging randomly

Dane Caro :

I am trying to learn Go and took on a simple project: calling all the craigslist cities and querying them for a specific search. In the code below I removed all but one of the links in locationMap, but there are over 400 links in the real version, so the loop is fairly large. I thought this would be a good test to put what I am learning into application, but I am running into a strange issue.

Some of the time, most of the http.Get() calls get no response from the server, while other times every request completes with no problem. So I started adding prints to show how many error out (and we recover) and how many successfully make it through. Also, while this is running, it will randomly hang and never respond. The program doesn't freeze, but the page just sits there trying to load and the terminal shows no activity.

I am making sure my response body is closed, and I defer the cleanup function that recovers from panics, but it still doesn't seem to work reliably. Is there something that jumps out to anyone that maybe I am missing?

Thanks in advance guys!

package main

import (
    "encoding/xml"
    "fmt"
    "html/template"
    "io/ioutil"
    "net/http"
    "sync"
)

var wg sync.WaitGroup

// The real map has over 400 entries; all but one are elided here.
var locationMap = map[string]string{"https://auburn.craigslist.org/": "auburn" /* ... */}

// NOTE: these counters are updated from many goroutines without any
// synchronization, which is a data race.
var totalRecovers int = 0
var successfulReads int = 0

type Listings struct {
    Links []string `xml:"item>link"`
    Titles []string `xml:"item>title"`
    Descriptions []string `xml:"item>description"`
    Dates []string `xml:"item>date"`
}

type Listing struct {
    Title string
    Description string
    Date string
}

type ListAggPage struct {
        Title string
        Listings map[string]Listing
        SearchRequest string
}

func cleanUp(link string) {
    defer wg.Done()
    if r := recover(); r != nil {
        totalRecovers++
//      recoverMap <- link
    }
}

func cityRoutine(c chan Listings, link string) {
    defer cleanUp(link)

    var i Listings
    address := link + "search/sss?format=rss&query=motorhome"
    resp, rErr := http.Get(address)
    if rErr != nil {
        fmt.Println("Fatal error occurred while getting response.")
        fmt.Println(rErr)
        return // without this, resp is nil and resp.Body panics below
    }
    defer resp.Body.Close() // close on every path, not just the happy one

    bytes, bErr := ioutil.ReadAll(resp.Body)
    if bErr != nil {
        fmt.Println("Fatal error occurred while reading bytes.")
        fmt.Println(bErr)
        return
    }
    xml.Unmarshal(bytes, &i)
    c <- i
    successfulReads++
}

func listingAggHandler(w http.ResponseWriter, r *http.Request) {
    queue := make(chan Listings, 99999)
    listing_map := make(map[string]Listing)

    for key := range locationMap {
        wg.Add(1)
        go cityRoutine(queue, key)
    }

    wg.Wait()
    close(queue)

    for elem := range queue {
        for index := range elem.Links {
            // NOTE: Titles[index*2] assumes the parsed feed yields two
            // title entries per item; that may be specific to this feed.
            listing_map[elem.Links[index]] = Listing{elem.Titles[index*2], elem.Descriptions[index], elem.Dates[index]}
        }
    }

    p := ListAggPage{Title: "Craigslist Aggregator", Listings: listing_map}
    t, tErr := template.ParseFiles("basictemplating.html")
    if tErr != nil {
        // Ignoring this error would leave t nil and panic on Execute.
        http.Error(w, tErr.Error(), http.StatusInternalServerError)
        return
    }
    fmt.Println(t.Execute(w, p))

    fmt.Println("Successfully loaded: ", successfulReads)       
    fmt.Println("Recovered from: ", totalRecovers)
}

func indexHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "<h1>Whoa, Go is neat!</h1>")
}

func main() {
    http.HandleFunc("/", indexHandler)
    http.HandleFunc("/agg/", listingAggHandler)
    http.ListenAndServe(":8000", nil) 
}
maxm :

I'm having trouble finding the golang mailing list discussion I was reading in reference to this, but you generally don't want to open hundreds of requests at once. There's some information here: How Can I Effectively 'Max Out' Concurrent HTTP Requests?

Craigslist might also just be rate limiting you. Either way, I recommend limiting yourself to around 20 simultaneous requests or so; here's a quick update to your listingAggHandler.

queue := make(chan Listings, 99999)
listing_map := make(map[string]Listing)

request_queue := make(chan string)
for i := 0; i < 20; i++ {
    go func() {
        // Each worker pulls links until request_queue is closed,
        // then exits instead of spinning on a closed channel.
        for key := range request_queue {
            cityRoutine(queue, key)
        }
    }()
}

for key := range locationMap {
    wg.Add(1)
    request_queue <- key
}

wg.Wait()
close(request_queue)
close(queue)

The application should still be very fast. I agree with the other comments on your question as well; I would also try to avoid putting so much in the global scope.

You could also spruce my changes up a little by using the wait group only inside the request pool and having each worker goroutine clean itself up and decrement the wait group. That would limit some of the global scope; a rough sketch of that idea follows.
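A minimal sketch of that variant, assuming cityRoutine is changed so it no longer calls wg.Done() itself (the pool owns the wait group now); the helper name fetchAll and the worker count are purely illustrative:

// fetchAll fans the links out to a fixed pool of workers and returns
// the results channel once every link has been processed. The wait
// group lives entirely inside this function, so nothing is global.
func fetchAll(links []string, workerCount int) chan Listings {
    queue := make(chan Listings, len(links))
    requests := make(chan string)

    var wg sync.WaitGroup
    wg.Add(workerCount)
    for i := 0; i < workerCount; i++ {
        go func() {
            defer wg.Done() // each worker cleans itself up
            for link := range requests {
                cityRoutine(queue, link)
            }
        }()
    }

    for _, link := range links {
        requests <- link
    }
    close(requests) // lets the workers drain the queue and exit

    wg.Wait()
    close(queue)
    return queue
}

The handler would then just range over the channel fetchAll returns to build listing_map.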
