I'm trying to use multiprocessing along with BeautifulSoup but am encountering a maximum recursion depth exceeded
error:
def process_card(card):
    result = card.find("p")
    # Do some more parsing with beautifulsoup
    return result

pool = multiprocessing.Pool(processes=4)
soup = BeautifulSoup(url, 'html.parser')
cards = soup.findAll("li")
articles = []
for card in cards:
    result = pool.apply_async(process_card, [card])
    article = result.get()
    if article is not None:
        print article
        articles.append(article)
pool.close()
pool.join()
From what I can gather, card is of type <class 'bs4.element.Tag'>, and the problem appears to be that this object cannot be pickled, which multiprocessing requires in order to send arguments to a worker process. It's not clear how I'd have to modify my code to resolve this.
It was pointed out in the comments that one could simply cast card to unicode. However, this resulted in the process_card function erroring out with slice indices must be integers or None or have an __index__ method. It turns out that this happens because card is no longer a bs4 object and therefore has no access to bs4 methods: it is now a plain unicode string, so card.find("p") calls the string method (which returns an index, not a Tag), and the subsequent parsing fails with a string-related error. So one needs to turn card back into soup inside the worker first and then proceed from there. This works!
def process_card(unicode_card):
    # rebuild a bs4 object from the serialized markup
    card = BeautifulSoup(unicode_card, 'html.parser')
    result = card.find("p")
    # Do some more parsing with beautifulsoup
    return result

pool = multiprocessing.Pool(processes=4)
soup = BeautifulSoup(url, 'html.parser')
cards = soup.findAll("li")
articles = []
for card in cards:
    result = pool.apply_async(process_card, [unicode(card)])
    article = result.get()
    if article is not None:
        print article
        articles.append(article)
pool.close()
pool.join()
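For readers on Python 3, where unicode() is spelled str(), the same round-trip can be sketched as below. The HTML snippet and tag contents here are illustrative, and the Pool is omitted so the serialize/re-parse step itself is easy to see; with a real Pool you would pass str(card) to apply_async exactly as above.

```python
from bs4 import BeautifulSoup

html = "<ul><li><p>first</p></li><li><p>second</p></li></ul>"
soup = BeautifulSoup(html, "html.parser")

def process_card(markup):
    # markup is a plain string, so it pickles cleanly across
    # the process boundary; re-parse it to get bs4 methods back
    card = BeautifulSoup(markup, "html.parser")
    p = card.find("p")
    return p.get_text() if p is not None else None

# str(card) serializes the Tag back to markup before handing it off
articles = [process_card(str(card)) for card in soup.find_all("li")]
print(articles)  # ['first', 'second']
```

Returning plain strings (rather than Tag objects) from the worker also avoids pickling a bs4 object on the way back.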