Python: Pyppeteer with asyncio

htf Published at Dev

HTF

I was running some tests, and I wonder whether the script below is actually running asynchronously.

For a single site:

# python test.py
It took 1.3439464569091797 seconds.

For all 31 sites, run concurrently:

# python test.py
It took 28.129364728927612 seconds.

Run one after another, 31 sites × 1.34 s ≈ 41.5 s, so the concurrent run saves only about 13 seconds. In theory, shouldn't it take only as long as the longest request?

Perhaps launching a browser is not asynchronous here, and I should use an executor for it?

# cat test.py 
import asyncio
import time

from pyppeteer import launch
from urllib.parse import urlparse

WEBSITE_LIST = [
    'http://envato.com',
    'http://amazon.co.uk',
    'http://amazon.com',
    'http://facebook.com',
    'http://google.com',
    'http://google.fr',
    'http://google.es',
    'http://google.co.uk',
    'http://internet.org',
    'http://gmail.com',
    'http://stackoverflow.com',
    'http://github.com',
    'http://heroku.com',
    'http://djangoproject.com',
    'http://rubyonrails.org',
    'http://basecamp.com',
    'http://trello.com',
    'http://yiiframework.com',
    'http://shopify.com',
    'http://airbnb.com',
    'http://instagram.com',
    'http://snapchat.com',
    'http://youtube.com',
    'http://baidu.com',
    'http://yahoo.com',
    'http://live.com',
    'http://linkedin.com',
    'http://yandex.ru',
    'http://netflix.com',
    'http://wordpress.com',
    'http://bing.com',
]

start = time.time()

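async def fetch(url):
    # Each call launches its own headless Chromium process
    browser = await launch(headless=True, args=['--no-sandbox'])
    try:
        page = await browser.newPage()
        await page.goto(url, {'waitUntil': 'load'})
        # Name the screenshot after the host, e.g. img/github.com (the img/ directory must exist)
        await page.screenshot({'path': f'img/{urlparse(url).netloc}.png'})
    finally:
        # Close the browser even if navigation or the screenshot fails
        await browser.close()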

async def run():
    tasks = []

    for url in WEBSITE_LIST:
        task = asyncio.ensure_future(fetch(url))
        tasks.append(task)

    responses = await asyncio.gather(*tasks)
    #print(responses)

#asyncio.get_event_loop().run_until_complete(fetch('http://yahoo.com'))
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run())
loop.run_until_complete(future)

print(f'It took {time.time()-start} seconds.')
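To check the expectation that a concurrent run should take about as long as its slowest request, one can time each task as well as the whole batch. The sketch below is a minimal illustration, assuming asyncio.sleep as a stand-in for the purely network-bound part of a page load; the names fake_fetch and timed are hypothetical and not part of pyppeteer. When the work is only waiting, the wall time comes out close to the slowest task rather than the sum.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float) -> str:
    # Hypothetical stand-in for fetch(): asyncio.sleep models time spent
    # waiting on the network, during which the event loop is free.
    await asyncio.sleep(delay)
    return url


async def timed(coro):
    # Measure how long one awaitable takes from the moment it starts running.
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


async def main():
    delays = {"a": 0.2, "b": 0.4, "c": 0.6}
    start = time.perf_counter()
    results = await asyncio.gather(
        *(timed(fake_fetch(url, d)) for url, d in delays.items())
    )
    total = time.perf_counter() - start
    per_task = {url: elapsed for url, elapsed in results}
    return per_task, total


if __name__ == "__main__":
    per_task, total = asyncio.run(main())
    print(f"sum of tasks: {sum(per_task.values()):.2f}s, wall time: {total:.2f}s")
```

If the same per-task timing applied to the real fetch() shows a total far above the slowest task, the tasks are contending for something other than the network, such as CPU.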

Fantix King

According to the pyppeteer source code, it uses subprocess (without pipes) to manage the Chromium processes and WebSockets to communicate with them, so it is asynchronous.
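The general idea can be illustrated with asyncio's own subprocess API. This is not pyppeteer's code (it manages Chromium in its own way); it is a sketch in which a short Python child process that just sleeps stands in, hypothetically, for a browser. While the children run, the coroutine waiting on them yields to the event loop, so four half-second children finish in roughly half a second rather than two, provided they are not competing for CPU (here they only sleep).

```python
import asyncio
import sys
import time

# Hypothetical child workload: a Python process that sleeps for 0.5 s,
# standing in for a browser that is busy outside this process.
CHILD_CODE = "import time; time.sleep(0.5)"


async def run_child() -> int:
    # Start the child; awaiting proc.wait() suspends this coroutine and
    # lets the event loop run other tasks instead of blocking.
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", CHILD_CODE)
    return await proc.wait()


async def main(n: int = 4):
    start = time.perf_counter()
    codes = await asyncio.gather(*(run_child() for _ in range(n)))
    return codes, time.perf_counter() - start


if __name__ == "__main__":
    codes, elapsed = asyncio.run(main())
    print(codes, f"{elapsed:.2f}s")
```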

You have 31 sites, so you will have 31 + 1 processes (one Chromium per site plus the Python process). Unless your CPU has 32 cores, they won't all run fully in parallel. (Threads, other system processes, locks, hyper-threading and many other factors also affect the result, so this is only a rough estimate.) I therefore think the bottleneck is the CPU: opening browsers, rendering web pages and dumping them into images. Using an executor won't help.

However, it is still async. That means your Python process is not blocked: you can still run other code or wait for network results concurrently. It's just that when the CPU is fully loaded by other processes, it becomes harder for the Python process to "steal" CPU time.
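A small illustration of "the Python process is not blocked": while one coroutine waits on a slow operation, another coroutine on the same event loop keeps running. This is a sketch under stated assumptions; slow_page_load and heartbeat are hypothetical names, and asyncio.sleep stands in for awaiting a real page load or screenshot.

```python
import asyncio


async def slow_page_load():
    # Hypothetical stand-in for awaiting page.goto()/page.screenshot():
    # the coroutine is suspended while the (simulated) browser works.
    await asyncio.sleep(0.5)
    return "done"


async def heartbeat(ticks, interval=0.1):
    # Runs on the same event loop; it only gets to tick because
    # slow_page_load() yields control while it waits.
    while True:
        ticks.append(asyncio.get_running_loop().time())
        await asyncio.sleep(interval)


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    result = await slow_page_load()
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If the machine's cores are saturated by Chromium processes, the heartbeat still runs but its ticks may arrive late, which matches the answer's point about the Python process having to compete for CPU time.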


Edited at 2020-10-30
